Research Article

Implementation of Clustering using K-Means in Python

by Ahmad Farhan AlShammari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 40
Year of Publication: 2024
Authors: Ahmad Farhan AlShammari
10.5120/ijca2024923990

Ahmad Farhan AlShammari. Implementation of Clustering using K-Means in Python. International Journal of Computer Applications. 186, 40 (Sep 2024), 12-17. DOI=10.5120/ijca2024923990

@article{ 10.5120/ijca2024923990,
author = { Ahmad Farhan AlShammari },
title = { Implementation of Clustering using K-Means in Python },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2024 },
volume = { 186 },
number = { 40 },
month = { Sep },
year = { 2024 },
issn = { 0975-8887 },
pages = { 12-17 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number40/implementation-of-clustering-using-k-means-in-python/ },
doi = { 10.5120/ijca2024923990 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-09-27T00:46:19+05:30
%A Ahmad Farhan AlShammari
%T Implementation of Clustering using K-Means in Python
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 40
%P 12-17
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The goal of this research is to develop a clustering program using the k-means method in Python. Clustering divides data into clusters (or groups) based on their features. K-means assigns each data point to the cluster whose center is closest, using Euclidean distance to measure the distance between data points and centers. It is an iterative method: the centers are updated repeatedly until the final clusters are obtained. The basic steps of clustering using k-means are explained: preparing data, initializing centers, computing labels (computing distances, finding the minimum distance, and assigning labels), computing clusters, computing the error function, updating centers, and plotting clusters. The developed program was tested on an experimental dataset. It successfully performed the basic steps of clustering using k-means and provided the required results.
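
The steps listed in the abstract can be illustrated with a minimal sketch. This is not the program from the paper: the function names `assign_labels` and `kmeans`, the random initialization from data points, the iteration cap, the convergence check, and the use of the sum of squared distances as the error function are all assumptions made for illustration. Plotting is left out.

```python
import numpy as np


def assign_labels(X, centers):
    """Return (labels, error) for data X of shape (n, d) and centers of shape (k, d)."""
    # Euclidean distance from every point to every center: shape (n, k).
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Each point gets the index of its closest center.
    labels = distances.argmin(axis=1)
    # Assumed error function: sum of squared distances to the assigned centers.
    error = float((distances[np.arange(len(X)), labels] ** 2).sum())
    return labels, error


def kmeans(X, k, n_iters=100, seed=0):
    """Hypothetical helper: iterate assignment and center updates until the centers settle."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random (an assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels, _ = assign_labels(X, centers)
        # Move each center to the mean of its cluster; keep it unchanged if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so that labels and error match the returned centers.
    labels, error = assign_labels(X, centers)
    return centers, labels, error


# Toy data (made up for this sketch): two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels, error = kmeans(X, k=2)
print(labels, error)
```

On the toy data, the first three points should end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initialization.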

Index Terms

Computer Science
Information Sciences

Keywords

Artificial Intelligence, Machine Learning, Clustering, K-Means, Euclidean Distance, Centers, Labels, Clusters, Error Function, Python, Programming