Using SalsaHadoop on FutureGrid

PLEASE NOTE: THIS MANUAL PAGE IS A DRAFT. PLEASE PROVIDE FEEDBACK IN THE COMMENT SECTION.
 

SalsaHadoop Introduction

Apache Hadoop is widely used by domain scientists to run their scientific applications in parallel. For research convenience, the SalsaHPC research group is developing SalsaHadoop, an automated method for starting Hadoop without worrying about the Hadoop configuration. SalsaHadoop can run on any general cluster and across multiple machines. It has been used by the SalsaHPC research group and in the graduate-level course CSCI B649 Cloud Computing for Data Intensive Sciences.

Running SalsaHadoop on FutureGrid

SalsaHadoop can be run in various modes within FutureGrid (FG), in either the FutureGrid HPC or the FutureGrid Cloud/IaaS environment. The following tutorials provide step-by-step instructions for using SalsaHadoop in these modes, and also show some examples of running Hadoop applications once Hadoop has started. In general, the HPC environment is easier to use if you do not have experience with the Eucalyptus IaaS platform.