Research Article

Energy Efficient and Reliable Job Submission in Hadoop Clusters

Published on August 2012 by G Sudha Sadasivam, S Sangeetha, R Radhakrishnan
Information Processing and Remote Computing
Foundation of Computer Science USA
IPRC - Number 1

G Sudha Sadasivam, S Sangeetha, R Radhakrishnan. Energy Efficient and Reliable Job Submission in Hadoop Clusters. Information Processing and Remote Computing. IPRC, 1 (August 2012), 6-11.

@article{
author = { G Sudha Sadasivam, S Sangeetha, R Radhakrishnan },
title = { Energy Efficient and Reliable Job Submission in Hadoop Clusters },
journal = { Information Processing and Remote Computing },
issue_date = { August 2012 },
volume = { IPRC },
number = { 1 },
month = { August },
year = { 2012 },
issn = { 0975-8887 },
pages = { 6-11 },
numpages = 6,
url = { /specialissues/iprc/number1/7997-1004/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Special Issue Article
%1 Information Processing and Remote Computing
%A G Sudha Sadasivam
%A S Sangeetha
%A R Radhakrishnan
%T Energy Efficient and Reliable Job Submission in Hadoop Clusters
%J Information Processing and Remote Computing
%@ 0975-8887
%V IPRC
%N 1
%P 6-11
%D 2012
%I International Journal of Computer Applications
Abstract

The MapReduce paradigm is well suited to large-scale, data-intensive applications in the cloud environment. The scale of these applications necessitates minimizing cluster power consumption to reduce operational costs and carbon footprint. Energy consumption can be reduced by selectively powering down nodes during periods of low utilization. Hadoop is used primarily for batch processing of large jobs. Before jobs are submitted, the files they use are uploaded into the cluster. Each file is split into a number of chunks that are distributed across the Hadoop cluster. This paper addresses the problem of block allocation in a distributed file system to improve reliability and energy efficiency. A framework has been designed to reduce the power requirements of a cluster by identifying the number of replicas and their placement needed for reliable completion of a job. It addresses block allocation, reliable job submission, and minimization of the number of active cluster nodes to reduce power consumption. The framework is integrated with Hadoop's NameNode. The scheduler component in Hadoop has also been modified to submit jobs to active data nodes that contain the data to be operated on. A greedy approach and an evolutionary approach using Particle Swarm Optimization (PSO) have been designed to identify suitable nodes to be activated in a cluster. Experimental results demonstrate the performance of these approaches.
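
The abstract does not give the details of the greedy approach, so the following is only an illustrative sketch of one common way such a selection can be framed: as a greedy set cover over block replicas. The function name `greedy_active_nodes`, the node and block identifiers, and the tie-breaking rule are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Set


def greedy_active_nodes(replicas: Dict[str, Set[str]]) -> List[str]:
    """Choose a small set of nodes that together hold every block.

    `replicas` maps a (hypothetical) node name to the set of block ids
    stored on that node. This is a generic greedy set-cover heuristic,
    not the algorithm from the paper.
    """
    # Every block that appears on at least one node must stay reachable.
    uncovered: Set[str] = set().union(*replicas.values())
    active: List[str] = []

    while uncovered:
        # Pick the node holding the most still-uncovered blocks.
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(replicas), key=lambda n: len(replicas[n] & uncovered))
        active.append(best)
        uncovered -= replicas[best]

    return active


if __name__ == "__main__":
    # Made-up placement of five blocks across four nodes.
    placement = {
        "node1": {"b1", "b2", "b3"},
        "node2": {"b2", "b4"},
        "node3": {"b3", "b4", "b5"},
        "node4": {"b1", "b5"},
    }
    print(greedy_active_nodes(placement))
```

In this made-up placement, two of the four nodes are enough to keep every block available, so the remaining two would be candidates for powering down. A reliability-aware variant would additionally require each block to be covered by more than one active node.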

Index Terms

Computer Science
Information Sciences

Keywords

Energy Efficiency, Hadoop, Reliability, PSO