Research Article

A Survey on Workload Classification and Job Scheduling by using Johnson's Algorithm under Hadoop Environment

Published on May 2014 by R. Manopriya, C. P. Saranya
International Conference on Simulations in Computing Nexus
Foundation of Computer Science USA
ICSCN - Number 3

R. Manopriya, C. P. Saranya. A Survey on Workload Classification and Job Scheduling by using Johnson's Algorithm under Hadoop Environment. International Conference on Simulations in Computing Nexus. ICSCN, 3 (May 2014), 11-14.

@article{
author = { R. Manopriya and C. P. Saranya },
title = { A Survey on Workload Classification and Job Scheduling by using Johnson's Algorithm under Hadoop Environment },
journal = { International Conference on Simulations in Computing Nexus },
issue_date = { May 2014 },
volume = { ICSCN },
number = { 3 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 11-14 },
numpages = 4,
url = { /proceedings/icscn/number3/16160-1034/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Simulations in Computing Nexus
%A R. Manopriya
%A C. P. Saranya
%T A Survey on Workload Classification and Job Scheduling by using Johnson's Algorithm under Hadoop Environment
%J International Conference on Simulations in Computing Nexus
%@ 0975-8887
%V ICSCN
%N 3
%P 11-14
%D 2014
%I International Journal of Computer Applications
Abstract

Big data deals with large datasets and focuses on storing, sharing and processing that data. Organisations face difficulties in creating, manipulating and managing such datasets. Take the social media site Facebook as an example: each post on a page can receive many likes, shares and comments every second, which creates large datasets that are hard to store and process. Big data involves massive volumes of both structured and unstructured data. A major problem in the big data community is workload classification and the scheduling of jobs with respect to the disks. MapReduce concepts are used to identify the computation time of individual jobs on a machine, rather than to minimize the overall computation time of the entire set of jobs. The MapReduce algorithm is applied first to reduce the larger datasets to a smaller output dataset. MapReduce processes the data in two phases: map and reduce. In the map phase, the given radar input dataset is split into individual key-value pairs and an intermediate output is produced; in the reduce phase, those key-value pairs undergo shuffle and sort operations. Intermediate files created by map tasks are written to local disk, and output files are written to the Hadoop distributed file system. Different types of jobs are assigned to different disks for scheduling. Johnson's algorithm is used to obtain the minimum (optimal) solution among the different jobs submitted to the Hadoop environment. Job type and data locality are two important factors in the job scheduling process. The performance of individual disks is analysed on the basis of the size of the dataset and the number of nodes formed.
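
The scheduling idea in the abstract can be illustrated with the classic two-stage form of Johnson's rule. The sketch below is not the authors' implementation: it assumes each job is reduced to a pair of durations (a map stage followed by a reduce stage), that each stage handles one job at a time, and that the job names and durations are made-up values chosen only for illustration. The helper names `johnson_order` and `makespan` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical job layout: (name, map_time, reduce_time).
Job = Tuple[str, float, float]


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by Johnson's rule for a two-stage pipeline."""
    # Jobs whose map stage is no longer than their reduce stage go first,
    # sorted by increasing map time.
    first = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    # The remaining jobs go last, sorted by decreasing reduce time.
    last = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return first + last


def makespan(order: List[Job]) -> float:
    """Completion time of the last reduce stage for a given job order."""
    map_done = 0.0
    reduce_done = 0.0
    for _, map_time, reduce_time in order:
        map_done += map_time
        # A reduce stage starts only when its own map stage and the
        # previous job's reduce stage have both finished.
        reduce_done = max(reduce_done, map_done) + reduce_time
    return reduce_done


# Illustrative durations only (assumed, not taken from the paper).
jobs = [("J1", 4, 5), ("J2", 4, 1), ("J3", 30, 4), ("J4", 6, 30), ("J5", 2, 3)]
schedule = johnson_order(jobs)

print([name for name, _, _ in schedule])
print("submission order makespan:", makespan(jobs))
print("Johnson order makespan:", makespan(schedule))
```

Under these assumptions, jobs with short map stages are started early so that the reduce stage is kept busy, and jobs with short reduce stages are pushed to the end so that little work remains after the last map stage finishes.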

Index Terms

Computer Science
Information Sciences

Keywords

Dataset Classification (Radar Dataset), MapReduce Algorithm, Job Scheduling Using Johnson's Algorithm, Hadoop Distributed File System (HDFS)