International Journal of Computer Applications |
Foundation of Computer Science (FCS), NY, USA |
Volume 72 - Number 8 |
Year of Publication: 2013 |
Authors: Amresh Kumar, Kiran M., Saikat Mukherjee, Ravi Prakash G. |
10.5120/12518-9099 |
Amresh Kumar, Kiran M., Saikat Mukherjee, Ravi Prakash G. Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster. International Journal of Computer Applications. 72, 8 (June 2013), 48-55. DOI=10.5120/12518-9099
With the development of information technology, a large and growing volume of data is being stored electronically. As a result, the data volumes processed by many applications will routinely cross the petabyte threshold, which in turn increases the computational requirements. Efficient processing algorithms and implementation techniques are key to meeting the scalability and performance requirements of such scientific data analyses. To that end, this paper analyzes several MapReduce programs and a parallel clustering algorithm (PKMeans) on a Hadoop cluster, using the concept of MapReduce. In this experiment we have verified and validated several MapReduce applications: wordcount, grep, terasort, and the parallel K-Means clustering algorithm. It was found that execution time decreases as the number of nodes increases, although some interesting cases were also observed during the experiment; the various performance changes were recorded and plotted as performance graphs. The experiment is primarily a research study of the above MapReduce applications, and it also serves to verify and validate the MapReduce program model for the parallel K-Means algorithm on a Hadoop cluster of four nodes.
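To make the MapReduce formulation of parallel K-Means concrete, the following is a minimal in-memory sketch of a single iteration in Python. It is not the authors' Hadoop implementation: the function names (`kmeans_map`, `kmeans_combine`, `kmeans_reduce`, `kmeans_iteration`), the use of Euclidean distance, and the combiner step are assumptions made for illustration, and the input splits stand in for the data blocks that separate mappers would process on a cluster.

```python
from collections import defaultdict
from math import dist


def kmeans_map(point, centroids):
    """Map: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (tuple(point), 1)


def kmeans_combine(pairs):
    """Combine (assumed step): pre-sum points and counts per cluster on one mapper."""
    partial = {}
    for key, (vec, count) in pairs:
        if key in partial:
            acc, n = partial[key]
            partial[key] = (tuple(a + b for a, b in zip(acc, vec)), n + count)
        else:
            partial[key] = (vec, count)
    return list(partial.items())


def kmeans_reduce(key, values):
    """Reduce: merge partial sums for one cluster and emit its new centroid."""
    total, n = None, 0
    for vec, count in values:
        total = vec if total is None else tuple(a + b for a, b in zip(total, vec))
        n += count
    return key, tuple(s / n for s in total)


def kmeans_iteration(splits, centroids):
    """Run one map/combine/shuffle/reduce pass over all input splits."""
    shuffled = defaultdict(list)  # stands in for the shuffle phase
    for split in splits:  # each split plays the role of one mapper's input
        mapped = [kmeans_map(p, centroids) for p in split]
        for key, value in kmeans_combine(mapped):
            shuffled[key].append(value)
    new_centroids = list(centroids)  # clusters with no points keep their centroid
    for key, values in shuffled.items():
        _, new_centroids[key] = kmeans_reduce(key, values)
    return new_centroids
```

In a full run, `kmeans_iteration` would be called repeatedly, feeding each result back in as the next set of centroids until they stop changing or an iteration limit is reached. On a Hadoop cluster each iteration would correspond to one MapReduce job.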