International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 119 - Number 22
Year of Publication: 2015
Authors: Somya Singh, Neetu Narayan, Gaurav Raj
DOI: 10.5120/21370-4411
Somya Singh, Neetu Narayan, Gaurav Raj. Survey on Data Processing and Scheduling in Hadoop. International Journal of Computer Applications. 119, 22 (June 2015), 27-30. DOI=10.5120/21370-4411
The volume of data in the world is growing explosively. Its sources include individuals, social media, and organizations, and the data may be structured, semi-structured, or unstructured. Extracting knowledge from this data and using it for competitive advantage is a primary focus of organizations. In the last few years, Big Data has found its way into almost every field, from government to the private sector and from industry to academia. The major challenges associated with Big Data are data organization, modeling, analysis, and retrieval. Hadoop is a widely used software framework for the large-scale management and analysis of data. Its main components, HDFS and MapReduce, enable the distributed storage and processing of data across a large number of commodity servers. This paper provides an overview of MapReduce and its capabilities and discusses the related issues.
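As a rough illustration of the MapReduce programming model mentioned in the abstract, the following single-process Python sketch counts words in three phases: map, shuffle, and reduce. It is not taken from the paper and does not use Hadoop's API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and in a real cluster the phases would run in parallel across many machines rather than in one loop.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence (illustrative only, no parallelism)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, the work can in principle be split across independent workers, which is the property a framework such as Hadoop exploits.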