Research Article

A Survey on Data Placement and Workload Scheduling Algorithms in Heterogeneous Network for Hadoop

Published on August 2015 by Ruchi Mittal, Harpreet Kaur
International Conference on Advancements in Engineering and Technology
Foundation of Computer Science USA
ICAET2015 - Number 3
August 2015
Authors: Ruchi Mittal, Harpreet Kaur

Ruchi Mittal, Harpreet Kaur. A Survey on Data Placement and Workload Scheduling Algorithms in Heterogeneous Network for Hadoop. International Conference on Advancements in Engineering and Technology. ICAET2015, 3 (August 2015), 22-28.

@article{
author = { Ruchi Mittal, Harpreet Kaur },
title = { A Survey on Data Placement and Workload Scheduling Algorithms in Heterogeneous Network for Hadoop },
journal = { International Conference on Advancements in Engineering and Technology },
issue_date = { August 2015 },
volume = { ICAET2015 },
number = { 3 },
month = { August },
year = { 2015 },
issn = { 0975-8887 },
pages = { 22-28 },
numpages = 7,
url = { /proceedings/icaet2015/number3/22223-4044/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Advancements in Engineering and Technology
%A Ruchi Mittal
%A Harpreet Kaur
%T A Survey on Data Placement and Workload Scheduling Algorithms in Heterogeneous Network for Hadoop
%J International Conference on Advancements in Engineering and Technology
%@ 0975-8887
%V ICAET2015
%N 3
%P 22-28
%D 2015
%I International Journal of Computer Applications
Abstract

The elastic scalability and fault tolerance of cloud computing have led to a wide range of real-world applications. However, the Big Data processing requirements of these applications make it a major challenge to achieve the desired performance levels. MapReduce is an effective parallel, distributed programming model for handling large unstructured datasets in cloud applications. Hadoop, an open-source implementation of the MapReduce model, is currently employed for high-performance processing of Big Data. The current Hadoop implementation assumes a homogeneous cluster in which every node has the same computing capacity and workload. In real-world deployments, however, nodes may differ in computing capacity and workload, resulting in a heterogeneous environment. In such a heterogeneous environment, the default Hadoop implementation does not yield the expected performance. This paper surveys the algorithms proposed by different authors for (a) data placement and (b) workload scheduling for Hadoop in heterogeneous networks.

References
  1. Julio C. S. Anjos, Ivan Carrera, Wagner Kolberg, Andre Luis Tibols, Luciana B. Arantes, Claudio R. Geyer, "MRA++: Scheduling and data placement on MapReduce for heterogeneous environments", in Future Generation Computer Systems, vol. 42, pp. 22-35, January 2015.
  2. Xiaofei Hou, Ashwin Kumar T K, Johnson P Thomas, Vijay Vardharajan, "Dynamic Workload Balancing for Hadoop MapReduce", in proceedings of 4th International Conference on Big Data and Cloud Computing, IEEE, Dec. 2014.
  3. Zhou Tang, Min Liu, Almoalmi Ammar, Kenli Li, Keqin Li, "An Optimized MapReduce Workload Scheduling Algorithm for Heterogeneous Computing", in The Journal of Supercomputing, Nov. 2014.
  4. Krish K. R., Ali Anwar, Ali R. Butt, "φSched: A Heterogeneity-Aware Hadoop Workflow Scheduler", in proceedings of 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, IEEE, pp. 255-264, Sep. 2014.
  5. Chia-Wei Lee, Kuang-Yu Hsieh, Sun-Yuan Hsieh, Hung-Chang Hsiao, "A Dynamic Data Placement Strategy for Hadoop in Heterogeneous Environments", in Big Data Research, vol. 1, pp. 14-22, July 2014.
  6. Feng Yan, Ludmila Cherkasova, Zhuoyao Zhang, Evgenia Smirni, "Optimizing Power and Performance Trade-offs of MapReduce Job Processing with Heterogeneous Multi-Core Processors", in proceedings of 7th IEEE International Conference on Cloud Computing, pp. 240-247, July 2014.
  7. Jessica Hartog, Renan DelValle, Madhusudhan Govindaraju, Michael J. Lewis, "Configuring A MapReduce Framework For Performance Heterogeneous Clusters", in proceedings of IEEE International Congress on Big Data, pp. 120-127, July 2014.
  8. Aysan Rasooli, Douglas G. Down, "COSHH: A Classification and Optimization Based Scheduler for Heterogeneous Hadoop Systems", in Future Generation Computer Systems, vol. 36, pp. 1-15, July 2014.
  9. Xiaolong Xu, Lingling Cao, Xinheng Wang, "Adaptive Task Scheduling Strategy based on Dynamic Workload Adjustment (ATSDWA) for Heterogeneous Hadoop Clusters", in IEEE Systems Journal, issue 99, pp. 1-12, June 2014.
  10. Ashwin Kumar T K, Jongyeop Kim, K M George, Nohpill Park, "Dynamic Data Rebalancing in Hadoop", in proceedings of IEEE/ACIS 13th International Conference on Computer and Information Science, pp. 315-320, June 2014.
  11. Zhao Li, Yao Shen, Bin Yao, Minyi Guo, "OFScheduler: A Dynamic Network Optimizer for MapReduce in Heterogeneous Cluster", in International Journal of Parallel Programming, Oct. 2013.
  12. Bin Ye, Xiaoshe Dong, Pengfei Zheng, Zengdong Zhu, Qiang Liu, Zhe Wang, "A Delay Scheduling Algorithm based on History Time in Heterogeneous Environments", in proceedings of 8th ChinaGrid Annual Conference, IEEE, pp. 86-91, Aug. 2013.
  13. Sutariya Kapil B., Sowmya Kamath S., "Resource Aware Scheduling in Hadoop for Heterogeneous Workloads based on Load Estimation", in proceedings of 4th International Conference on Computing, Communications and Networking Technologies, pp. 1-5, July 2013.
  14. Quan Chen, Minyi Guo, Qianni Deng, Long Zheng, Song Guo, Yao Shen, "HAT: History based Auto-Tuning MapReduce in Heterogeneous Environments", in The Journal of Supercomputing, vol. 64, pp. 1038-1054, June 2013.
  15. Yuanquan Fan, Weiguo Wu, Haijun Cao, Huo Zhu, Xu Zhao, Wei Wei, "A Heterogeneity Aware Data Distribution and Rebalance Method in Hadoop Cluster", in proceedings of 7th ChinaGrid Annual Conference, IEEE, pp. 255-264, Sep. 2012.
  16. Visalakshi P and Karthik TU, "MapReduce Scheduler Using Classifiers for Heterogeneous Workloads", in IJCSNS, vol. 11, no. 4, April 2011.
  17. Jiong Xie, Shu Yin, Xiaojun Ruan, Zhiyang Ding, Yun Tian, James Majors, Adam Manzanares, Xiao Qin, "Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters", in proceedings of International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum, pp. 1-9, Apr. 2010.
  18. Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng, Song Guo, "SAMR: A Self-adaptive MapReduce Scheduling Algorithm In Heterogeneous Environment", in proceedings of 10th IEEE International Conference on CIT, pp. 2736-2743, 2010.
  19. Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica, "Improving MapReduce Performance in Heterogeneous Environments", in proceedings of 8th USENIX Symposium on Operating Systems Design and Implementation, pp. 29-42, ACM Press, 2008.
  20. Ivanilton Polato, Reginaldo Re, Alfredo Goldman, Fabio Kon, "A comprehensive view of Hadoop research", in Journal of Network and Computer Applications, vol. 46, pp. 1-25, Nov. 2014.
  21. B G. Babu, Shabeera T P, Madhu Kumar S D, "Dynamic Colocation Algorithm for Hadoop", in proceedings of IEEE International Conference on Advances in Computing, Communications and Informatics, pp. 2643-2647, Sep. 2014.
  22. S. Sujitha, Suresh Jaganathan, "Aggrandizing Hadoop in terms of Node Heterogeneity & Data Locality", in proceedings of IEEE International Conference on Smart Structures & Systems, pp. 145-151, Mar. 2013.
Index Terms

Computer Science
Information Sciences

Keywords

Cloud Computing, Big Data, MapReduce, Hadoop, Heterogeneous Network