Research Article

Size based Multithreaded Scheduler for Hadoop Framework

Published on December 2015 by Poonam S. Patil, Rajesh N. Phursule
National Conference on Advances in Computing
Foundation of Computer Science USA
NCAC2015 - Number 6
December 2015
Authors: Poonam S. Patil, Rajesh N. Phursule

Poonam S. Patil, Rajesh N. Phursule. Size based Multithreaded Scheduler for Hadoop Framework. National Conference on Advances in Computing. NCAC2015, 6 (December 2015), 20-23.

@article{
author = { Poonam S. Patil and Rajesh N. Phursule },
title = { Size based Multithreaded Scheduler for Hadoop Framework },
journal = { National Conference on Advances in Computing },
issue_date = { December 2015 },
volume = { NCAC2015 },
number = { 6 },
month = { December },
year = { 2015 },
issn = { 0975-8887 },
pages = { 20-23 },
numpages = 4,
url = { /proceedings/ncac2015/number6/23397-5068/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Advances in Computing
%A Poonam S. Patil
%A Rajesh N. Phursule
%T Size based Multithreaded Scheduler for Hadoop Framework
%J National Conference on Advances in Computing
%@ 0975-8887
%V NCAC2015
%N 6
%P 20-23
%D 2015
%I International Journal of Computer Applications
Abstract

The majority of large-scale, data-intensive applications executed in data centers are based on MapReduce or its open-source implementation, Hadoop. To process huge amounts of data in parallel, the Hadoop programming framework provides the Hadoop Distributed File System (HDFS) [2] and the MapReduce programming model [3]. Job scheduling is an essential process in Hadoop MapReduce. Hadoop ships with three schedulers: FIFO, Fair, and Capacity. In some processing scenarios, these traditional scheduling algorithms cannot meet the performance requirements and fairness criteria of Big Data processing. To address this issue, a new, efficient scheduler is required that first identifies the data size and then processes the data accordingly to improve performance. The proposed MapReduce scheduling scheme is intended to improve MapReduce performance and ensure high-speed data processing. The proposed system analyzes the data size on each individual DataNode and creates threads based on a threshold value decided by the scheduler. The TaskTracker processes these threads in parallel on each DataNode, which improves data processing performance. As a result, the TaskTracker is expected to complete its work in less time than it would need under the traditional schedulers.
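The size-based threading idea in the abstract can be illustrated with a small sketch. The abstract does not say how the threshold is chosen or how it maps to a thread count, so the rule below (one thread per `threshold_bytes` of local data, capped at `max_threads`) is an assumption, and every name in the code is hypothetical. It is written in Python for brevity and runs on a single machine; it is not the authors' Hadoop implementation, and word counting merely stands in for an arbitrary map task.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def thread_count(data_size_bytes, threshold_bytes, max_threads=8):
    """Assumed rule: one thread per `threshold_bytes` of local data, capped."""
    if threshold_bytes <= 0:
        raise ValueError("threshold_bytes must be positive")
    if data_size_bytes <= 0:
        return 1
    return min(max_threads, math.ceil(data_size_bytes / threshold_bytes))


def split_into_chunks(records, n_chunks):
    """Split records into at most n_chunks contiguous, near-equal slices."""
    n_chunks = max(1, min(n_chunks, len(records) or 1))
    base, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def count_words(lines):
    """Stand-in for a map task: count word occurrences in one chunk."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def process_node_data(lines, threshold_bytes, max_threads=8):
    """Measure the local data size, pick a thread count, process chunks in parallel."""
    size = sum(len(line.encode("utf-8")) for line in lines)
    chunks = split_into_chunks(lines, thread_count(size, threshold_bytes, max_threads))
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(count_words, chunks))
    # Merge the per-thread partial results into one result for the node.
    total = {}
    for partial in partials:
        for word, c in partial.items():
            total[word] = total.get(word, 0) + c
    return total, len(chunks)
```

For example, `process_node_data(["a b", "b c", "c c"], threshold_bytes=4)` measures 9 bytes of input, so the assumed rule yields three threads, one per line. A small data set therefore gets a single thread, while a large one is split across several, up to the cap.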

References
  1. Apache Hadoop. Available at http://hadoop.apache.org
  2. Apache HDFS. Available at http://hadoop.apache.org/hdfs
  3. Apache MapReduce. Available at http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
  4. Apache Fair Scheduler. Available at http://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html
  5. Apache Capacity Scheduler. Available at http://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html. Yang Xia, Lei Wang, "Research on Job Scheduling Algorithm in Hadoop", Journal of Computational Information Systems 7: 16 (2011) 5769-5775. Available at http://www.Jofcis.com
  6. A community white paper developed by leading researchers across the United States, "Challenges and Opportunities with Big Data"
  7. Jeffrey Dean and Sanjay Ghemawat, Google, Inc., "MapReduce: Simplified Data Processing on Large Clusters"
  8. Kyuseok Shim, Seoul National University, shim@ee.snu.ac.kr, "MapReduce Algorithms for Big Data Analysis"
  9. Vasiliki Kalavri, Vladimir Vlassov, KTH The Royal Institute of Technology, Stockholm, Sweden, kalavri@kth.se, "MapReduce: Limitations, Optimizations and Open Issues", TrustCom/ISPA/IUCC, pp. 1031-1038, IEEE, 2013
  10. Yi Yao, Jianzhe Tai, Bo Sheng, Ningfang Mi, "LsPS: A Job Size-Based Scheduler for Efficient Task Assignments in Hadoop", In proceedings of the IEEE transaction, Copyright (c) 2014 IEEE
  11. Qutaibah Althebyan, Omar ALQudah, Yaser Jararweh, Qussai Yaseen, "Multi-Threading Based Map Reduce Tasks Scheduling", 2014 5th International Conference on Information and Communication Systems (ICICS)
  12. Jisha S Manjaly, Varghese S Chooralil, "TaskTracker Aware Scheduling for Hadoop MapReduce", 2014 5th International Conference on Information and Communication Systems (ICICS)
  13. Runhui Li, Patrick P. C. Lee, Yuchong Hu, "Degraded-First Scheduling for MapReduce in Erasure-Coded Storage Clusters", AoE/E-02/08 and ECS CUHK419212 from the University Grants Committee of Hong Kong, IEEE 2013
  14. Bin Ye, Xiaoshe Dong, Pengfei Zheng, "A delay scheduling algorithm based on history time in heterogeneous environments", 2013 8th Annual ChinaGrid Conference
  15. S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan, "An analysis of traces from a production mapreduce cluster," in CCGRID'10, 2010, pp. 94–103.
Index Terms

Computer Science
Information Sciences

Keywords

MapReduce, Big Data, Scheduling, HDFS