OnCourse

Please use OnCourse to submit some assignments and to check your grades.

Go to OnCourse

Use of Google+

The class will interact through postings on a Google community group. The online section will also meet via Google Hangout or an equivalent. Google+ Hangout times will be posted for the online-only data science section.

Go to Community

Office Mix

The lessons are created with and available in Office Mix, which supports combined voice, video, and slides.

Check out Office Mix

Grading

Grading will be based on participation (10%), ABDS deployment (30%) and Project (60%).

Check your grades on OnCourse
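
As an illustration of how the stated weights combine, here is a minimal sketch in Python. It assumes each component is scored on a 0 to 100 scale and that the final score is a plain weighted sum; the course page does not spell out either detail, and the names `WEIGHTS` and `final_score` are hypothetical.

```python
# Weights taken from the grading statement above.
WEIGHTS = {
    "participation": 0.10,
    "abds_deployment": 0.30,
    "project": 0.60,
}


def final_score(scores):
    """Combine component scores (assumed to be on a 0-100 scale) into one weighted score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError("missing component scores: " + ", ".join(sorted(missing)))
    # Assumption: a simple weighted sum, with no curving or rounding.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Under these assumptions, scores of 100 for participation, 80 for the ABDS deployment, and 90 for the project work out to 0.10·100 + 0.30·80 + 0.60·90 = 88.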

Technology Used

We will use FutureSystems (previously FutureGrid) facilities. Cloud computing experience is helpful but not essential. Good working experience with Java is required.

Go to FutureSystems

FutureSystems Tutorial

Free Online Tutorials

Instructor

Geoffrey Charles Fox
Senior Associate Dean for Research
Distinguished Professor of Physics,
Computer Science and Informatics
gcf@indiana.edu

More Details


Course description

This course covers software used in many commercial activities to study Big Data. The backdrop for the course is the ~120 software subsystems illustrated here. We will describe the software architecture represented by this collection, which we term HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack).

A paper discussing this can be found here and presentations here and here. Copies of this material may be found in resources. You can download the syllabus (PDF).

Course Overview


Lessons

Unit 6: More on Software Stack (only one part)


FutureSystems Access

Getting started with hands-on access:

  1. Create an account on the FutureSystems Portal.
  2. Request to be added to project FG-452.
  3. Upload a public SSH key to the FutureSystems portal in order to access FutureSystems systems. The initial steps are described in two videos: 1) Get a Portal Account, and 2) Upload an SSH key.
  4. Explore the OpenStack Tutorial.
  5. Instructions for account creation, joining a project and uploading an SSH key are all available here.
  6. If you are using Windows, the simplest solution for using SSH keys is the Putty SSH client and its SSH authentication agent, Pageant. Putty and its associated programs are available here.
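
Before uploading a key in step 3, it can help to confirm that the file you are about to paste is a public key in the usual one-line OpenSSH format (`<type> <base64-data> [comment]`). The following is a minimal sketch of such a check, not part of the FutureSystems instructions; the function name `check_public_key_line` is hypothetical. It relies only on the fact that the base64 data of an OpenSSH public key begins with a length-prefixed copy of the key type.

```python
import base64
import binascii
import struct


def check_public_key_line(line):
    """Return the key type if `line` looks like an OpenSSH public key, else raise ValueError."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-data> [comment]'")
    key_type, b64_data = parts[0], parts[1]
    try:
        blob = base64.b64decode(b64_data, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("key data is not valid base64") from exc
    if len(blob) < 4:
        raise ValueError("key data is too short")
    # The blob starts with a 4-byte big-endian length followed by the key type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_len]
    if embedded_type != key_type.encode("ascii"):
        raise ValueError("key type does not match the encoded data")
    return key_type
```

A check like this only verifies the outer format, not that the key is cryptographically sound. Its main use is catching a truncated paste or an attempt to upload the private key file by mistake, since a private key file does not follow this one-line layout.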

Free Online Training via Lynda.com


More Information

The course covers the following material:

  1. The cloud computing architecture underlying ABDS and how it contrasts with HPC.
  2. The software architecture and its different layers, as shown in the HPC-ABDS Kaleidoscope, covering the broad functionality of and rationale for each layer.
  3. We will give application examples.
  4. We will then go through selected software systems, about 10% of those in the Kaleidoscope, which have already been deployed on FutureSystems systems using OpenStack and Chef recipes.
  5. Each student will choose one other open source member of the Kaleidoscope and deploy it as in 4).
  6. The main activity of the course will be building a significant project using multiple HPC-ABDS subsystems combined with user code and data.
  7. Teams of up to 3 students can be formed, with a corresponding increase in scope for activities 5) and 6).