Cloud Computing IU

Class Summary

In this course you will learn basic concepts in Cloud Computing, including how to write your own software using key cloud programming models and tools to support data mining and data analysis applications.

Course Organization

This course is structured with a mix of recorded lectures, programming labs and forum discussions.

Prerequisites

B649 Cloud Computing online is a programming-intensive course with requirements similar to those of the residential graduate-level CS version. Expect weekly (or biweekly) programming homework. General programming experience on Windows or Linux using Java (2-3 years) and scripting is required. A background in parallel and cluster computing is a plus but not necessary.

Course Staff

Judy Qiu

Judy Qiu is an Assistant Professor in the School of Informatics and Computing at Indiana University. Her research interests focus on data-intensive computing at the intersection of cloud and multicore technologies, with an emphasis on life science applications using MapReduce as well as traditional parallel and distributed computing approaches.

Class Progress Distribution

The course progress is the percentage of mandatory items completed for the course.

Exams (40%) - Midterm (20%), Final (20%)
Written Assignments (30%) - 5 Quizzes (each 2 points), 4 Homework assignments (each 5 points)
Projects (30%) - 6 Projects: Hadoop WordCount (5), Hadoop PageRank (5), Hadoop BLAST (10), HBase WordCount (5), Building an Inverted Index (10), Build a Search Engine (20); 2 Optional Projects: Twister/Giraph PageRank (20), Twister K-means (20)

Class Schedule

Week 1 - Jan 27th to Feb 2nd

  • Course Info
  • Introduction
  • Data Center Model
  • Data Intensive Sciences
  • IaaS, PaaS, and SaaS
  • Challenges

Week 2 - Feb 3rd to Feb 9th

  • Computational Clusters
  • Term Projects 1
  • Term Projects 2

Week 3 - Feb 10th to Feb 16th

  • Apache Data Analysis Open Stack
  • MapReduce
  • Hadoop Framework
  • Hadoop Tasks
  • Fault Tolerance

Week 4 - Feb 17th to Feb 23rd

  • Programming on a Computer Cluster
  • How Hadoop Runs a MapReduce Job
  • Literature Review
  • Introduction to BLAST
  • BLAST Parallelization
  • SIMD vs MIMD; SPMD vs MPMD
  • Data Locality
  • Optimal Data Locality
  • Task Granularity
  • Resource Utilization and Speculative Execution

Week 5 - Feb 24th to Mar 2nd

  • Growth of Virtual Machines
  • Virtualization Implementation Levels
  • Virtualization Structures/Tools and Mechanisms
  • Virtualization of CPU, Memory and I/O Devices
  • Virtual Clusters and Resource Management
  • Virtualization for Data Center Automation

Week 6 - Mar 3rd to Mar 9th

  • MapReduce Refresher
  • Google Search Engine 1
  • Google Search Engine 2
  • Hadoop PageRank
  • Discussions and Parallel Thinking
  • Hadoop Extensions

Week 7 - Mar 10th to Mar 16th

  • There is no new lecture for this week. Continue to work on the Hadoop PageRank project from the previous lesson.

Week 8 - Mar 17th to Mar 23rd

  • Midterm

Week 9 - Mar 24th to Mar 30th

  • RDBMS vs. NoSQL
  • NoSQL Characteristics
  • BigTable
  • HBase
  • HBase Coding

Week 10 - Mar 31st to Apr 6th

  • Applying for FutureGrid Account
  • FutureGrid India OpenStack
  • Hadoop WordCount on VMs

Week 11 - Apr 7th to Apr 13th

  • There is no new lecture for this week. Continue to work on the previous project.

Week 12 - Apr 14th to Apr 20th

  • There is no new lecture for this week. Continue to work on the previous project.

Week 13 - Apr 21st to Apr 27th

  • MapReduce Models
  • Designing for Big Data
  • Twister Iterative MapReduce
  • Application Performance
  • Twister K-means Explained
  • Twister K-means Code

Week 14 - Apr 28th to May 4th

  • Hangout Lab 1
  • Hangout Lab 2
  • Hangout Lab 3

Frequently Asked Questions

What will I learn?

At the end of this course, you will have learned key concepts in cloud computing and enough programming to be able to solve data analysis problems on your own.

What are the class projects?

The class includes several projects that give students firsthand experience with the technologies taught here. Projects are carried out on VirtualBox appliances or academic clouds such as FutureSystems.

Course Number: B649
Estimated Effort: 6 hours per week