Research Article

Survey on Data Processing and Scheduling in Hadoop

by Somya Singh, Neetu Narayan, Gaurav Raj
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 119 - Number 22
Year of Publication: 2015
Authors: Somya Singh, Neetu Narayan, Gaurav Raj
10.5120/21370-4411

Somya Singh, Neetu Narayan, Gaurav Raj. Survey on Data Processing and Scheduling in Hadoop. International Journal of Computer Applications. 119, 22 (June 2015), 27-30. DOI=10.5120/21370-4411

@article{ 10.5120/21370-4411,
author = { Somya Singh and Neetu Narayan and Gaurav Raj },
title = { Survey on Data Processing and Scheduling in Hadoop },
journal = { International Journal of Computer Applications },
issue_date = { June 2015 },
volume = { 119 },
number = { 22 },
month = { June },
year = { 2015 },
issn = { 0975-8887 },
pages = { 27-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume119/number22/21370-4411/ },
doi = { 10.5120/21370-4411 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:04:45.983124+05:30
%A Somya Singh
%A Neetu Narayan
%A Gaurav Raj
%T Survey on Data Processing and Scheduling in Hadoop
%J International Journal of Computer Applications
%@ 0975-8887
%V 119
%N 22
%P 27-30
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The volume of data in the world is growing explosively. Its sources include individuals, social media, and organizations, and the data may be structured, semi-structured, or unstructured. Extracting knowledge from this data and using it for competitive advantage is a primary focus of organizations. In the last few years, Big Data has found its way into almost every field, from government to the private sector and from industry to academia. The major challenges associated with Big Data are data organization, modeling, analysis, and retrieval. Hadoop is a widely used software framework for the large-scale management and analysis of data. Its main components, HDFS and MapReduce, enable the distributed storage and processing of data across a large number of commodity servers. This paper provides an overview of MapReduce and its capabilities and discusses the related issues.

Index Terms

Computer Science
Information Sciences

Keywords

MapReduce Scheduling