Research Article

Parallel Optimal Grid-Clustering algorithm exploration on MapReduce Framework

by B. Hanmanthu, R. Rajesh, P. Niranjan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 180 - Number 39
Year of Publication: 2018
Authors: B. Hanmanthu, R. Rajesh, P. Niranjan
DOI: 10.5120/ijca2018917041

B. Hanmanthu, R. Rajesh, P. Niranjan. Parallel Optimal Grid-Clustering algorithm exploration on MapReduce Framework. International Journal of Computer Applications. 180, 39 (May 2018), 35-39. DOI=10.5120/ijca2018917041

@article{ 10.5120/ijca2018917041,
author = { B. Hanmanthu and R. Rajesh and P. Niranjan },
title = { Parallel Optimal Grid-Clustering algorithm exploration on MapReduce Framework },
journal = { International Journal of Computer Applications },
issue_date = { May 2018 },
volume = { 180 },
number = { 39 },
month = { May },
year = { 2018 },
issn = { 0975-8887 },
pages = { 35-39 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume180/number39/29389-2018917041/ },
doi = { 10.5120/ijca2018917041 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:03:08.605064+05:30
%A B. Hanmanthu
%A R. Rajesh
%A P. Niranjan
%T Parallel Optimal Grid-Clustering algorithm exploration on MapReduce Framework
%J International Journal of Computer Applications
%@ 0975-8887
%V 180
%N 39
%P 35-39
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The MapReduce framework has proven to be highly suitable for big data analytics, which plays a vital role in real-time data analysis applications. Among conventional data mining techniques, clustering has proven to be one of the most useful for effective data analysis. Our literature review found that there are not enough clustering techniques suitable for processing big data. To address this gap, we explore optimal grid-clustering techniques for big data analysis on the MapReduce architecture. Initial experiments with the proposed model show promising results.

Index Terms

Computer Science
Information Sciences

Keywords

Clustering algorithm, Parallel OptiGrid, Data analytics