Research Article

Avoiding Objects with few Neighbors in the K-Means Process and Adding ROCK Links to Its Distance

by Hadi A. Alnabriss, Wesam Ashour
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 28 - Number 10
Year of Publication: 2011
Authors: Hadi A. Alnabriss, Wesam Ashour
DOI: 10.5120/3421-4040

Hadi A. Alnabriss, Wesam Ashour. Avoiding Objects with few Neighbors in the K-Means Process and Adding ROCK Links to Its Distance. International Journal of Computer Applications. 28, 10 (August 2011), 12-17. DOI=10.5120/3421-4040

@article{ 10.5120/3421-4040,
author = { Hadi A. Alnabriss, Wesam Ashour },
title = { Avoiding Objects with few Neighbors in the K-Means Process and Adding ROCK Links to Its Distance },
journal = { International Journal of Computer Applications },
issue_date = { August 2011 },
volume = { 28 },
number = { 10 },
month = { August },
year = { 2011 },
issn = { 0975-8887 },
pages = { 12-17 },
numpages = { 6 },
url = { https://ijcaonline.org/archives/volume28/number10/3421-4040/ },
doi = { 10.5120/3421-4040 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Hadi A. Alnabriss
%A Wesam Ashour
%T Avoiding Objects with few Neighbors in the K-Means Process and Adding ROCK Links to Its Distance
%J International Journal of Computer Applications
%@ 0975-8887
%V 28
%N 10
%P 12-17
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

K-means is considered one of the most common and powerful algorithms in data clustering. In this paper we present new techniques to address two problems of the traditional K-means clustering algorithm. The first problem is its sensitivity to outliers. Here we rely on a function that decides whether an object is an outlier; if it is, the object is excluded from the centroid calculations, which allows K-means to produce good results even when more outlier points are added. In the second part we make K-means depend on ROCK links in addition to its traditional distance. A ROCK link counts the common neighbors shared by two objects, which enables K-means to detect shapes that the traditional algorithm cannot.

Index Terms

Computer Science
Information Sciences

Keywords

Robust K-means, ROCK links, Initializing K-means, electing centroids, Optimizing K-means distance measurement