Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster

Amresh Kumar; Kiran M.; Saikat Mukherjee; Ravi Prakash G.

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 72 - Number 8

Year of Publication: 2013

Authors: Amresh Kumar, Kiran M., Saikat Mukherjee, Ravi Prakash G.

DOI: 10.5120/12518-9099

Amresh Kumar, Kiran M., Saikat Mukherjee, Ravi Prakash G. Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster. International Journal of Computer Applications. 72, 8 (June 2013), 48-55. DOI=10.5120/12518-9099

@article{10.5120/12518-9099,
  author = {Amresh Kumar and Kiran M. and Saikat Mukherjee and Ravi Prakash G.},
  title = {Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster},
  journal = {International Journal of Computer Applications},
  issue_date = {June 2013},
  volume = {72},
  number = {8},
  month = {June},
  year = {2013},
  issn = {0975-8887},
  pages = {48--55},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume72/number8/12518-9099/},
  doi = {10.5120/12518-9099},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:37:25.465289+05:30
%A Amresh Kumar
%A Kiran M.
%A Saikat Mukherjee
%A Ravi Prakash G.
%T Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster
%J International Journal of Computer Applications
%@ 0975-8887
%V 72
%N 8
%P 48-55
%D 2013
%I Foundation of Computer Science (FCS), NY, USA

Abstract

With the development of information technology, a growing volume of data is being generated and stored electronically. The data volumes processed by many applications will routinely cross the petabyte threshold, which in turn increases computational requirements. Efficient processing algorithms and implementation techniques are key to meeting the scalability and performance requirements of such scientific data analyses. To this end, this paper analyzes several MapReduce programs and a parallel clustering algorithm (PKMeans) on a Hadoop cluster, using the MapReduce concept. In the experiment, we verified and validated several MapReduce applications: wordcount, grep, terasort, and the parallel K-Means clustering algorithm. We found that execution time decreases as the number of nodes increases, although some interesting cases were also observed during the experiment; the resulting performance changes were recorded and plotted as performance graphs. The experiment is essentially a research study of the above MapReduce applications, and it verifies and validates the MapReduce program model for the parallel K-Means algorithm on a four-node Hadoop cluster.
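To make the idea of K-Means expressed in the MapReduce model concrete, here is a minimal sketch of one iteration, simulated in memory with plain Python. It is not the authors' implementation and does not use the Hadoop API: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`), the sample points, and the starting centroids are all hypothetical, and the dictionary grouping only stands in for Hadoop's shuffle phase.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average the (vector, count) pairs grouped under one key."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for vector, n in values:
        for d in range(dims):
            total[d] += vector[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """Run one map -> group-by-key -> reduce pass and return new centroids."""
    groups = defaultdict(list)  # hypothetical stand-in for the shuffle phase
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [kmeans_reduce(groups[i]) if i in groups else tuple(centroids[i])
            for i in range(len(centroids))]


# Illustrative data only; not taken from the paper's experiments.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
print(new_centroids)
```

In a real cluster the map calls would run in parallel across nodes, and the driver would repeat the iteration, feeding each pass the centroids from the previous one, until they stop changing.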

Index Terms

Computer Science

Information Sciences

Keywords

Machine learning, Hadoop, MapReduce, k-means, wordcount, grep, terasort