Improved Input Data Splitting in MapReduce

Research Article

Published in September 2015 by Reema Rhine, Nikhila T Bhuvan

International Conference on Emerging Trends in Technology and Applied Sciences

Foundation of Computer Science USA

ICETTAS2015 - Number 2

Reema Rhine, Nikhila T Bhuvan. Improved Input Data Splitting in MapReduce. International Conference on Emerging Trends in Technology and Applied Sciences. ICETTAS2015, 2 (September 2015), 23-26.

@article{
author = { Reema Rhine, Nikhila T Bhuvan },
title = { Improved Input Data Splitting in MapReduce },
journal = { International Conference on Emerging Trends in Technology and Applied Sciences },
issue_date = { September 2015 },
volume = { ICETTAS2015 },
number = { 2 },
month = { September },
year = { 2015 },
issn = 0975-8887,
pages = { 23-26 },
numpages = 4,
url = { /proceedings/icettas2015/number2/22383-2582/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Proceeding Article
%1 International Conference on Emerging Trends in Technology and Applied Sciences
%A Reema Rhine
%A Nikhila T Bhuvan
%T Improved Input Data Splitting in MapReduce
%J International Conference on Emerging Trends in Technology and Applied Sciences
%@ 0975-8887
%V ICETTAS2015
%N 2
%P 23-26
%D 2015
%I International Journal of Computer Applications

Abstract

The performance of MapReduce depends heavily on the input data splitting process that takes place before the map phase. Splitting is usually done with naive methods that are far from optimal. This paper describes Improved Input Splitting, a locality-based technique that addresses the input data splitting problems that seriously affect job performance. Improved Input Splitting clusters the data blocks that reside on the same node into a single partition, so that they are processed by one map task. This avoids the time otherwise spent on slot reallocation and on initializing multiple tasks. Experimental results show that the technique improves MapReduce processing performance considerably over the traditional Hadoop implementation.
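The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Block` record, the `naive_splits` and `locality_splits` functions, and the simplification that each block belongs to exactly one node (replicas are ignored) are all assumptions made for illustration. The sketch only contrasts one split per block with one split per node.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Block(NamedTuple):
    """Hypothetical record for one data block (names are illustrative)."""
    block_id: str
    node: str      # assumption: the single node chosen to serve this block
    length: int    # block size in bytes


def naive_splits(blocks: List[Block]) -> List[List[Block]]:
    """Baseline for comparison: one split, and so one map task, per block."""
    return [[b] for b in blocks]


def locality_splits(blocks: List[Block]) -> Dict[str, List[Block]]:
    """Cluster all blocks that sit on the same node into a single split."""
    by_node: Dict[str, List[Block]] = defaultdict(list)
    for b in blocks:
        by_node[b.node].append(b)
    return dict(by_node)


if __name__ == "__main__":
    blocks = [
        Block("b1", "node-a", 64),
        Block("b2", "node-b", 64),
        Block("b3", "node-a", 64),
        Block("b4", "node-b", 32),
    ]
    print(len(naive_splits(blocks)), "map tasks with one split per block")
    print(len(locality_splits(blocks)), "map tasks with one split per node")
```

With fewer, node-local splits, fewer map tasks have to be initialized and scheduled, which is the saving the abstract attributes to the technique. A real implementation would also have to deal with block replicas and with very uneven per-node data volumes, which this sketch does not attempt.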

Index Terms

Computer Science

Information Sciences

Keywords

HDFS, Improved Input Splitting, MapReduce