Research Article

Improved Input Data Splitting in MapReduce

Published in September 2015 by Reema Rhine, Nikhila T Bhuvan
International Conference on Emerging Trends in Technology and Applied Sciences
Foundation of Computer Science USA
ICETTAS2015 - Number 2
September 2015
Authors: Reema Rhine, Nikhila T Bhuvan

Reema Rhine, Nikhila T Bhuvan. Improved Input Data Splitting in MapReduce. International Conference on Emerging Trends in Technology and Applied Sciences. ICETTAS2015, 2 (September 2015), 23-26.

@article{
author = { Reema Rhine, Nikhila T Bhuvan },
title = { Improved Input Data Splitting in MapReduce },
journal = { International Conference on Emerging Trends in Technology and Applied Sciences },
issue_date = { September 2015 },
volume = { ICETTAS2015 },
number = { 2 },
month = { September },
year = { 2015 },
issn = { 0975-8887 },
pages = { 23-26 },
numpages = 4,
url = { /proceedings/icettas2015/number2/22383-2582/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Emerging Trends in Technology and Applied Sciences
%A Reema Rhine
%A Nikhila T Bhuvan
%T Improved Input Data Splitting in MapReduce
%J International Conference on Emerging Trends in Technology and Applied Sciences
%@ 0975-8887
%V ICETTAS2015
%N 2
%P 23-26
%D 2015
%I International Journal of Computer Applications
Abstract

The performance of MapReduce depends heavily on the input data splitting step that runs before the map phase. This step is usually performed with naive methods that are far from optimal. This paper describes Improved Input Splitting, a locality-based technique that addresses the input data splitting problems that seriously affect job performance. Improved Input Splitting clusters data blocks from the same node into a single partition, so that they are processed by one map task. This avoids the time spent on slot reallocation and on initializing multiple tasks. Experimental results show that the technique substantially improves MapReduce processing performance compared with the traditional Hadoop implementation.
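
The grouping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Hadoop API; the `Block` type, the function names, and the rule for choosing among replica hosts (pick the host with the fewest blocks so far, ties broken by host name) are assumptions made only for illustration. It contrasts a baseline of one split per block with one split per node.

```python
from collections import defaultdict
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical stand-in for an HDFS block and its replica locations."""
    block_id: str
    hosts: tuple  # names of the nodes that hold a replica of this block


def split_per_block(blocks):
    """Baseline: one split (and therefore one map task) per block."""
    return [[b.block_id] for b in blocks]


def split_by_node(blocks):
    """Cluster blocks so that each node gets a single split.

    Assumption (not from the paper): a block is assigned to whichever of
    its replica hosts currently has the fewest blocks, with ties broken
    by host name.
    """
    splits = defaultdict(list)
    for b in blocks:
        target = min(b.hosts, key=lambda h: (len(splits[h]), h))
        splits[target].append(b.block_id)
    # Drop hosts that were only looked up and never received a block.
    return {host: ids for host, ids in splits.items() if ids}


if __name__ == "__main__":
    blocks = [
        Block("b1", ("n1", "n2")),
        Block("b2", ("n1", "n3")),
        Block("b3", ("n2", "n3")),
        Block("b4", ("n1",)),
    ]
    print(len(split_per_block(blocks)), "map tasks with one split per block")
    print(len(split_by_node(blocks)), "map tasks with one split per node")
```

With one split per node, the number of map tasks is bounded by the number of nodes holding data rather than by the number of blocks, which is where the savings in task initialization described in the abstract would come from.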

Index Terms

Computer Science
Information Sciences

Keywords

HDFS, Improved Input Splitting, MapReduce