Research Article

Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster

by Amresh Kumar, Kiran M., Saikat Mukherjee, Ravi Prakash G.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 72 - Number 8
Year of Publication: 2013
10.5120/12518-9099

Amresh Kumar, Kiran M., Saikat Mukherjee, Ravi Prakash G. Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster. International Journal of Computer Applications. 72, 8 (June 2013), 48-55. DOI=10.5120/12518-9099

@article{ 10.5120/12518-9099,
author = { Amresh Kumar and Kiran M. and Saikat Mukherjee and Ravi Prakash G. },
title = { Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster },
journal = { International Journal of Computer Applications },
issue_date = { June 2013 },
volume = { 72 },
number = { 8 },
month = { June },
year = { 2013 },
issn = { 0975-8887 },
pages = { 48-55 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume72/number8/12518-9099/ },
doi = { 10.5120/12518-9099 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:37:25.465289+05:30
%A Amresh Kumar
%A Kiran M.
%A Saikat Mukherjee
%A Ravi Prakash G.
%T Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster
%J International Journal of Computer Applications
%@ 0975-8887
%V 72
%N 8
%P 48-55
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With the development of information technology, the volume of data stored electronically is growing rapidly. The data processed by many applications will routinely cross the petabyte threshold, which in turn increases computational requirements. Efficient processing algorithms and implementation techniques are key to meeting the scalability and performance requirements of such scientific data analyses. To this end, we analyze several MapReduce programs and a parallel clustering algorithm (PKMeans) on a Hadoop cluster, using the MapReduce concept. In this experiment we verify and validate MapReduce applications including wordcount, grep, terasort, and the parallel K-Means clustering algorithm. We find that execution time decreases as the number of nodes increases, but some interesting cases were also observed during the experiment; we record the resulting performance changes and plot them as performance graphs. This work is a research study of the above MapReduce applications and a verification and validation of the MapReduce program model for the parallel K-Means algorithm on a four-node Hadoop cluster.
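
To make the parallel K-Means idea concrete, the following is a minimal sketch of one K-Means iteration expressed as a map step and a reduce step. It is not the authors' implementation and does not use Hadoop: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`) and the sample data are hypothetical, and the shuffle phase is simulated in-process with a dictionary.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans_map(point, centroids):
    """Map step (hypothetical name): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: combine (vector_sum, count) pairs into a new centroid."""
    total = None
    count = 0
    for vector, n in values:
        if total is None:
            total = list(vector)
        else:
            total = [a + b for a, b in zip(total, vector)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """Simulate one MapReduce job in-process: map, group by key, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # Assumption: a centroid with no assigned points keeps its old position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


if __name__ == "__main__":
    sample_points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
    initial_centroids = [(0.0, 0.0), (10.0, 10.0)]
    print(kmeans_iteration(sample_points, initial_centroids))
```

In a real cluster the map calls would run on different nodes over separate input splits, and the iteration would be repeated as a sequence of jobs until the centroids stop moving. Because the reduce step only adds sums and counts, the same function could also serve as a combiner under the assumptions of this sketch.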

Index Terms

Computer Science
Information Sciences

Keywords

Machine learning, Hadoop, MapReduce, k-means, wordcount, grep, terasort