Improving Current Hadoop MapReduce Workflow and Performance

Hamoud Alshammari; Jeongkyu Lee; Hassan Bajwa

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 116 - Number 15

Year of Publication: 2015

Authors: Hamoud Alshammari, Jeongkyu Lee, Hassan Bajwa

DOI: 10.5120/20414-2828

Hamoud Alshammari, Jeongkyu Lee, Hassan Bajwa. Improving Current Hadoop MapReduce Workflow and Performance. International Journal of Computer Applications. 116, 15 (April 2015), 38-42. DOI=10.5120/20414-2828

@article{10.5120/20414-2828,
  author = {Hamoud Alshammari and Jeongkyu Lee and Hassan Bajwa},
  title = {Improving Current Hadoop MapReduce Workflow and Performance},
  journal = {International Journal of Computer Applications},
  issue_date = {April 2015},
  volume = {116},
  number = {15},
  month = {April},
  year = {2015},
  issn = {0975-8887},
  pages = {38-42},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume116/number15/20414-2828/},
  doi = {10.5120/20414-2828},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:57:13.840192+05:30
%A Hamoud Alshammari
%A Jeongkyu Lee
%A Hassan Bajwa
%T Improving Current Hadoop MapReduce Workflow and Performance
%J International Journal of Computer Applications
%@ 0975-8887
%V 116
%N 15
%P 38-42
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This study proposes and implements an enhanced Hadoop MapReduce workflow that improves on the performance of the current Hadoop MapReduce. The architecture speeds up the processing of BigData by enhancing several parameters of the processing jobs. BigData must be divided into many datasets, or blocks, and distributed across many nodes within the cluster, so that tasks can access and process these blocks in parallel. However, accessing the same datasets every time a job is executed causes a data overloading problem, so we modified the current MapReduce workflow to improve performance in terms of the amount of data read by related jobs. This work uses bioinformatics DNA datasets to implement the solution.
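The idea in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract does not say how the amount of data read is reduced, so the `BlockIndex` table below, which remembers the blocks that matched a motif in an earlier job, is a hypothetical mechanism chosen only for illustration. The names `split_into_blocks`, `map_block`, and `run_job` are likewise assumptions, and the blocks are processed sequentially rather than in parallel across nodes.

```python
def split_into_blocks(records, records_per_block):
    """Group records into fixed-size blocks, a stand-in for HDFS blocks."""
    return [records[i:i + records_per_block]
            for i in range(0, len(records), records_per_block)]


def map_block(block, motif):
    """Map step: count non-overlapping occurrences of motif in one block."""
    return sum(read.count(motif) for read in block)


class BlockIndex:
    """Hypothetical table of the blocks that matched a motif in earlier jobs."""

    def __init__(self):
        self._hits = {}  # motif -> ids of blocks with at least one match

    def run_job(self, blocks, motif):
        """Run one job and return (total_count, number_of_blocks_read)."""
        known = self._hits.get(motif)
        # First run for this motif: read every block. Later runs: read only
        # the blocks recorded as matching (assumes the data has not changed).
        block_ids = known if known is not None else list(range(len(blocks)))
        total, matched = 0, []
        for block_id in block_ids:
            count = map_block(blocks[block_id], motif)  # map
            if count:
                matched.append(block_id)
            total += count  # reduce: sum the per-block counts
        self._hits[motif] = matched
        return total, len(block_ids)


# Made-up DNA reads, two reads per block.
reads = ["ACGTAC", "TTTTGG", "GGCCAA", "ACGTTT", "CCCCCC", "AATTGG"]
blocks = split_into_blocks(reads, 2)
index = BlockIndex()
first = index.run_job(blocks, "ACGT")   # reads all three blocks
second = index.run_job(blocks, "ACGT")  # reads only the two matching blocks
print(first, second)
```

In this toy run the second job returns the same count while reading fewer blocks, which is the kind of saving the abstract describes for related jobs. The shortcut is only valid while the underlying blocks stay unchanged.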

Index Terms

Computer Science

Information Sciences

Keywords

Cloud Computing, Hadoop, bioinformatics, BigData