Research Article

Improving Current Hadoop MapReduce Workflow and Performance

by Hamoud Alshammari, Jeongkyu Lee, Hassan Bajwa
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 116 - Number 15
Year of Publication: 2015
10.5120/20414-2828

Hamoud Alshammari, Jeongkyu Lee, Hassan Bajwa. Improving Current Hadoop MapReduce Workflow and Performance. International Journal of Computer Applications. 116, 15 (April 2015), 38-42. DOI=10.5120/20414-2828

@article{ 10.5120/20414-2828,
author = { Hamoud Alshammari, Jeongkyu Lee, Hassan Bajwa },
title = { Improving Current Hadoop MapReduce Workflow and Performance },
journal = { International Journal of Computer Applications },
issue_date = { April 2015 },
volume = { 116 },
number = { 15 },
month = { April },
year = { 2015 },
issn = { 0975-8887 },
pages = { 38-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume116/number15/20414-2828/ },
doi = { 10.5120/20414-2828 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:57:13.840192+05:30
%A Hamoud Alshammari
%A Jeongkyu Lee
%A Hassan Bajwa
%T Improving Current Hadoop MapReduce Workflow and Performance
%J International Journal of Computer Applications
%@ 0975-8887
%V 116
%N 15
%P 38-42
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This study proposes and implements an enhanced Hadoop MapReduce workflow that improves the performance of the current Hadoop MapReduce. The architecture speeds up the processing of BigData by improving several parameters of the processing jobs. BigData must be divided into many datasets, or blocks, that are distributed across the nodes of the cluster, so that tasks can access these blocks in parallel and process them easily. However, accessing the same datasets every time a job is executed causes a data overloading problem. We therefore modified the current MapReduce workflow to improve performance by reducing the amount of data that related jobs read. This work uses bioinformatics DNA datasets to implement the solution.
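To make the general idea concrete, here is a minimal Python sketch under stated assumptions; it is a hypothetical illustration, not the authors' implementation. It splits a small set of DNA reads into blocks, runs one map task per block in parallel, and keeps a hypothetical `JobBlocksTable` that records which blocks matched an earlier search pattern. The assumed reuse rule is: if a new pattern contains an earlier pattern as a substring, every read matching the new pattern also matches the earlier one, so only the blocks that matched the earlier pattern need to be read. All names (`split_into_blocks`, `run_job`, `JobBlocksTable`, `search`), the block size, and the reuse rule are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(reads: list[str], reads_per_block: int) -> list[list[str]]:
    """Divide the dataset (a list of DNA reads) into fixed-size blocks."""
    return [reads[i:i + reads_per_block] for i in range(0, len(reads), reads_per_block)]


def map_block(block: list[str], pattern: str) -> list[str]:
    """Map task: return the reads in one block that contain the pattern."""
    return [read for read in block if pattern in read]


def run_job(blocks: list[list[str]], pattern: str, block_ids: list[int]) -> dict[int, list[str]]:
    """Run one map task per selected block in parallel; return matches keyed by block id."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda b: map_block(blocks[b], pattern), block_ids)
        return dict(zip(block_ids, results))


class JobBlocksTable:
    """Hypothetical table recording which blocks matched each earlier pattern."""

    def __init__(self) -> None:
        self.matching_blocks: dict[str, set[int]] = {}

    def candidate_blocks(self, pattern: str, num_blocks: int) -> list[int]:
        # A read containing `pattern` also contains every substring of it,
        # so blocks that did not match a recorded sub-pattern can be skipped.
        candidates = set(range(num_blocks))
        for earlier, ids in self.matching_blocks.items():
            if earlier in pattern:
                candidates &= ids
        return sorted(candidates)

    def record(self, pattern: str, per_block: dict[int, list[str]]) -> None:
        # Skipped blocks cannot contain matches, so recording only the
        # scanned blocks that produced matches is still complete.
        self.matching_blocks[pattern] = {b for b, hits in per_block.items() if hits}


def search(blocks: list[list[str]], pattern: str, table: JobBlocksTable) -> tuple[list[str], int]:
    """Run a search job; return (matching reads, number of blocks read)."""
    ids = table.candidate_blocks(pattern, len(blocks))
    per_block = run_job(blocks, pattern, ids)
    table.record(pattern, per_block)
    matches = [read for b in ids for read in per_block[b]]
    return matches, len(ids)
```

Under these assumptions, searching for `ACGTT` after `ACGT` reads only the blocks that matched `ACGT`, while a pattern unrelated to any earlier one falls back to reading every block. A real Hadoop deployment would keep such a table outside the individual jobs and would work with HDFS blocks rather than in-memory lists.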

Index Terms

Computer Science
Information Sciences

Keywords

Cloud Computing, Hadoop, Bioinformatics, BigData