Research Article

A Review on Apache Hadoop Performance Enhancement by using Network Levitated Merge

Published in December 2015 by Prashant B. Kanhere, Sathish Kumar Penchala
National Conference on Advances in Computing
Foundation of Computer Science USA
NCAC2015 - Number 4

Prashant B. Kanhere, Sathish Kumar Penchala. A Review on Apache Hadoop Performance Enhancement by using Network Levitated Merge. National Conference on Advances in Computing. NCAC2015, 4 (December 2015), 28-31.

@article{kanhere2015review,
author = { Prashant B. Kanhere and Sathish Kumar Penchala },
title = { A Review on Apache Hadoop Performance Enhancement by using Network Levitated Merge },
journal = { National Conference on Advances in Computing },
issue_date = { December 2015 },
volume = { NCAC2015 },
number = { 4 },
month = { December },
year = { 2015 },
issn = { 0975-8887 },
pages = { 28-31 },
numpages = 4,
url = { /proceedings/ncac2015/number4/23382-5048/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Advances in Computing
%A Prashant B. Kanhere
%A Sathish Kumar Penchala
%T A Review on Apache Hadoop Performance Enhancement by using Network Levitated Merge
%J National Conference on Advances in Computing
%@ 0975-8887
%V NCAC2015
%N 4
%P 28-31
%D 2015
%I International Journal of Computer Applications
Abstract

Hadoop is a popular, large-scale, open-source software framework written in Java for distributed storage and processing, and it is a widely used open-source implementation of the MapReduce programming model for cloud computing [1]. Nowadays, Hadoop faces a number of problems in obtaining the best performance from the underlying system. These include a serialization barrier that delays the reduce phase, repetitive merges and disk accesses, and a lack of portability to current high-speed interconnects. To keep up with the increasing volume of data sets, Hadoop also requires efficient I/O capability from the underlying system nodes to process and analyze data. To address these issues, the Hadoop-A architecture [12] was developed. Hadoop-A is an acceleration framework that optimizes Hadoop with plug-in components for fast data movement, overcoming its existing limitations. This paper explains a novel network-levitated merge algorithm that merges data without repetitive merges or disk access. In addition, a full pipeline is designed to overlap the shuffle, merge, and reduce phases. The experimental results show that Hadoop-A significantly speeds up data processing in MapReduce and doubles Hadoop's throughput. Hadoop-A also significantly reduces the disk accesses caused by intermediate data.
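The abstract describes the network-levitated merge only in outline. The Python sketch below illustrates the general idea under simplifying assumptions; it is not Hadoop-A's actual implementation. Each map output segment is a sorted in-memory list standing in for data on a remote node. The class `RemoteSegment`, its `chunk_size` parameter, and the functions `segment_stream`, `levitated_merge`, and `reduce_sum` are hypothetical names introduced here, and the standard library's `heapq.merge` stands in for Hadoop-A's merge logic. Records are fetched chunk by chunk only when the merge needs them, no merged intermediate file is written to disk, and the reduce function consumes merged records as they are produced, which loosely mimics overlapping the merge and reduce phases.

```python
import heapq
from itertools import groupby
from typing import Iterator, List, Tuple

# A (key, value) pair emitted by a map task.
Record = Tuple[str, int]


class RemoteSegment:
    """Hypothetical stand-in for one map task's sorted output partition.

    The records would normally sit on a remote node; here a local list
    simulates that storage, and fetch_chunk() models one network
    transfer of up to chunk_size records.
    """

    def __init__(self, records: List[Record], chunk_size: int = 2) -> None:
        self._records = sorted(records)  # map output is sorted by key
        self._chunk_size = chunk_size
        self._offset = 0
        self.fetches = 0  # number of simulated network transfers so far

    def fetch_chunk(self) -> List[Record]:
        chunk = self._records[self._offset:self._offset + self._chunk_size]
        self._offset += len(chunk)
        if chunk:
            self.fetches += 1
        return chunk


def segment_stream(seg: RemoteSegment) -> Iterator[Record]:
    """Yield a segment's records, fetching the next chunk only when needed."""
    while True:
        chunk = seg.fetch_chunk()
        if not chunk:
            return
        yield from chunk


def levitated_merge(segments: List[RemoteSegment]) -> Iterator[Record]:
    """Merge sorted remote segments by key, entirely in memory.

    heapq.merge holds only the current record of each stream, so each
    segment keeps at most one fetched chunk buffered, and no merged
    intermediate file is written to disk.
    """
    streams = [segment_stream(s) for s in segments]
    return heapq.merge(*streams, key=lambda kv: kv[0])


def reduce_sum(merged: Iterator[Record]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sums values per key while the merge is still running."""
    for key, group in groupby(merged, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


if __name__ == "__main__":
    maps = [
        RemoteSegment([("a", 1), ("c", 2), ("e", 3), ("g", 4)]),
        RemoteSegment([("a", 5), ("b", 6), ("f", 7), ("g", 8)]),
    ]
    results = reduce_sum(levitated_merge(maps))
    print(next(results))                 # ('a', 6)
    print(sum(m.fetches for m in maps))  # 2 of 4 chunks fetched so far
    print(list(results))                 # [('b', 6), ('c', 2), ('e', 3), ('f', 7), ('g', 12)]
```

In the usage example, the first reduced result is available after only two of the four simulated chunk transfers, which is the effect the pipelining described above aims for. A real system would also overlap the network transfers themselves with computation, which this single-threaded sketch does not model.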

References
  1. J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Proc. Sixth Symp. Operating System Design and Implementation (OSDI '04), pp. 137-150, Dec. 2004.
  2. Test-TCP, http://www.pcausa.com/Utilities/pcattcp.htm.
  3. D. Jiang, B. C. Ooi, L. Shi, and S. Wu, "The Performance of MapReduce: An In-Depth Study," Proc. VLDB Endowment, vol. 3, no. 1, pp. 472-483, 2010.
  4. M. Zaharia, A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica, "Improving MapReduce Performance in Heterogeneous Environments," Proc. Eighth USENIX Symp. Operating Systems Design and Implementation (OSDI '08), Dec. 2008.
  5. T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears, "MapReduce Online," Proc. Seventh USENIX Symp. Networked Systems Design and Implementation (NSDI), pp. 312-328, Apr. 2010.
  6. Apache Hadoop Project, http://hadoop.apache.org/, 2013.
  7. J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Proc. Sixth Symp. Operating System Design and Implementation (OSDI), pp. 137-150, Dec. 2004.
  8. InfiniBand Trade Association, http://www.infinibandta.org.
  9. D. Jiang, B. C. Ooi, L. Shi, and S. Wu, "The Performance of MapReduce: An In-Depth Study," Proc. 36th Int'l Conf. Very Large Data Bases (VLDB), vol. 3, pp. 472-483, 2010.
  10. Y. Mao, R. Morris, and F. Kaashoek, "Optimizing MapReduce for Multicore Architectures," Technical Report MIT-CSAIL-TR-2010-020, MIT, May 2010.
  11. K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System," Proc. 2010 IEEE 26th Symp. Mass Storage Systems and Technologies (MSST), pp. 1-10, 2010.
  12. Y. Wang, X. Que, and W. Yu, "Hadoop Acceleration through Network Levitated Merge," pages 3 12, http://mmc.geofisica.unam.mx/acl/edp/SC11/src/pdf/papers/tp50.
  13. W. Yu, Y. Wang, and X. Que, "Design and Evaluation of Network-Levitated Merge for Hadoop Acceleration," IEEE Trans. Parallel and Distributed Systems, vol. 25, no. 3, Mar. 2014.
Index Terms

Computer Science
Information Sciences

Keywords

Serialization, Repetitive Merges, Disk Access, Network Portability, Network-levitated Merge, Pipelined Shuffle, Merge and Reduce.