Information Processing and Remote Computing
Foundation of Computer Science USA
IPRC - Number 1 |
August 2012 |
Authors: G Sudha Sadasivam, S Sangeetha, R Radhakrishnan |
G Sudha Sadasivam, S Sangeetha, R Radhakrishnan. Energy Efficient and Reliable Job Submission in Hadoop Clusters. Information Processing and Remote Computing. IPRC, 1 (August 2012), 6-11.
The MapReduce paradigm is highly suitable for large-scale, data-intensive applications in the cloud environment. The scale of these applications necessitates minimizing cluster power consumption to reduce operational costs and carbon footprint. Energy consumption can be reduced by selectively powering down nodes during periods of low utilization. Hadoop is primarily used for batch processing of large jobs. Before jobs are submitted, the files they use are uploaded into the cluster; each file is split into a number of chunks that are distributed across the Hadoop cluster. This paper addresses the problem of block allocation in the distributed file system to improve reliability and energy efficiency. A framework has been designed to reduce the power requirements of a cluster by identifying the number of replicas and their placement needed for reliable completion of a job. It addresses issues such as block allocation, reliable job submission, and minimization of active cluster nodes to reduce power consumption. The framework is integrated with Hadoop's NameNode, and the Hadoop scheduler component has been modified to submit jobs to active DataNodes containing the data to be operated on. A greedy approach and an evolutionary approach using Particle Swarm Optimization (PSO) have been designed to identify suitable nodes to activate in a cluster. Experimental results demonstrate the performance of these approaches.
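The abstract mentions a greedy approach for choosing which nodes to keep active. As a rough illustration only (the paper's actual algorithm is not given here), the sketch below models node activation as a set-cover problem: given a hypothetical mapping from DataNodes to the block replicas they hold, it repeatedly picks the node that covers the most still-unserved blocks of a job until every block is available on an active node. All names (greedy_active_nodes, node_blocks, job_blocks) are illustrative assumptions, not part of Hadoop or the paper.

```python
def greedy_active_nodes(node_blocks, job_blocks):
    """Greedy set-cover heuristic for selecting nodes to power on.

    node_blocks: dict mapping a node id to the set of block ids it stores
                 (a hypothetical stand-in for NameNode block metadata).
    job_blocks:  set of block ids the submitted job needs to read.
    Returns a list of node ids to keep active.
    """
    uncovered = set(job_blocks)
    active = []
    while uncovered:
        # Pick the node whose replicas cover the most still-uncovered blocks.
        best = max(node_blocks, key=lambda n: len(node_blocks[n] & uncovered))
        gain = node_blocks[best] & uncovered
        if not gain:
            # No active-able node holds a replica of the remaining blocks.
            raise ValueError(f"blocks without replicas: {uncovered}")
        active.append(best)
        uncovered -= gain
    return active

# Example: three DataNodes, a job reading blocks b1..b4; two nodes suffice.
nodes = {"dn1": {"b1", "b2"}, "dn2": {"b2", "b3", "b4"}, "dn3": {"b4"}}
print(greedy_active_nodes(nodes, {"b1", "b2", "b3", "b4"}))  # ['dn2', 'dn1']
```

Greedy set cover is a natural baseline for this kind of node-minimization problem; an evolutionary method such as the PSO variant the abstract mentions would instead search over candidate node subsets, trading the greedy heuristic's speed for potentially smaller activation sets.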