Research Article

A Modified Projected K-Means Clustering Algorithm with Effective Distance Measure

by B. Shanmugapriya, M. Punithavalli
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 44 - Number 8
Year of Publication: 2012
Authors: B. Shanmugapriya, M. Punithavalli
10.5120/6285-8468

B. Shanmugapriya, M. Punithavalli. A Modified Projected K-Means Clustering Algorithm with Effective Distance Measure. International Journal of Computer Applications. 44, 8 (April 2012), 32-36. DOI=10.5120/6285-8468

@article{ 10.5120/6285-8468,
author = { B. Shanmugapriya and M. Punithavalli },
title = { A Modified Projected K-Means Clustering Algorithm with Effective Distance Measure },
journal = { International Journal of Computer Applications },
issue_date = { April 2012 },
volume = { 44 },
number = { 8 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 32-36 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume44/number8/6285-8468/ },
doi = { 10.5120/6285-8468 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:35:01.988321+05:30
%A B. Shanmugapriya
%A M. Punithavalli
%T A Modified Projected K-Means Clustering Algorithm with Effective Distance Measure
%J International Journal of Computer Applications
%@ 0975-8887
%V 44
%N 8
%P 32-36
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering high-dimensional data is a persistent challenge for clustering algorithms because of the intrinsic sparsity of the data points. Several recent research results indicate that, for high-dimensional data, even the notion of proximity or clustering may not be meaningful. K-Means is one of the basic clustering algorithms and is commonly used in many applications, but it cannot discover subspace clusters, in which the relevant subspaces are specific to the clusters themselves. In this paper, an algorithm called Modified Projected K-Means Clustering Algorithm with Effective Distance Measure is designed to generalize the K-Means algorithm with the objective of handling high-dimensional data. The experimental results confirm that the proposed algorithm is efficient, with better clustering accuracy and much shorter execution time than the Standard K-Means and General K-Means algorithms.
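
The abstract does not spell out the algorithm or its distance measure, so the following is only a minimal sketch of the general idea of projected clustering on top of K-Means, not the authors' method. It assumes that each cluster keeps the `n_dims` dimensions along which its members vary least, and that distances to a cluster are measured only in that cluster's subspace. The names `projected_distance` and `projected_kmeans`, the variance-based dimension selection, and all parameters are hypothetical choices made for illustration.

```python
import random
from statistics import pvariance


def projected_distance(point, center, dims):
    """Euclidean distance computed only over the dimensions in `dims`."""
    return sum((point[d] - center[d]) ** 2 for d in dims) ** 0.5


def projected_kmeans(points, k, n_dims, n_iter=20, seed=0):
    """Illustrative K-Means variant with one subspace per cluster.

    Assumption (not from the paper): a cluster's subspace is the set of
    `n_dims` dimensions with the smallest within-cluster variance.
    """
    rng = random.Random(seed)
    n_features = len(points[0])
    # Initial centers are k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    # Before any cluster statistics exist, every dimension is used.
    subspaces = [list(range(n_features)) for _ in range(k)]
    labels = [0] * len(points)

    for _ in range(n_iter):
        # Assignment step: nearest center, measured in that center's subspace.
        labels = [
            min(range(k),
                key=lambda c: projected_distance(p, centers[c], subspaces[c]))
            for p in points
        ]
        # Update step: recompute each center and its low-variance subspace.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # keep the previous center and subspace
            columns = list(zip(*members))
            centers[c] = [sum(col) / len(col) for col in columns]
            variances = [pvariance(col) for col in columns]
            ranked = sorted(range(n_features), key=variances.__getitem__)
            subspaces[c] = sorted(ranked[:n_dims])

    return labels, centers, subspaces
```

Calling `projected_kmeans(points, k=2, n_dims=2)` on a list of equal-length numeric lists returns one label per point, one center per cluster, and the dimension indices retained for each cluster. Standard K-Means corresponds to the special case where every subspace contains all dimensions.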

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, Projected Clustering, K-means, High Dimensional Data, General K-means, Efficient Projected Clustering (EPC)