Implementation of Clustering using K-Means in Python

Ahmad Farhan AlShammari

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 186 - Number 40

Year of Publication: 2024

Authors: Ahmad Farhan AlShammari

10.5120/ijca2024923990

Ahmad Farhan AlShammari. Implementation of Clustering using K-Means in Python. International Journal of Computer Applications. 186, 40 (Sep 2024), 12-17. DOI=10.5120/ijca2024923990

@article{ 10.5120/ijca2024923990,

author = { Ahmad Farhan AlShammari },

title = { Implementation of Clustering using K-Means in Python },

journal = { International Journal of Computer Applications },

issue_date = { Sep 2024 },

volume = { 186 },

number = { 40 },

month = { Sep },

year = { 2024 },

issn = { 0975-8887 },

pages = { 12-17 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume186/number40/implementation-of-clustering-using-k-means-in-python/ },

doi = { 10.5120/ijca2024923990 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-09-27T00:46:19+05:30

%A Ahmad Farhan AlShammari

%T Implementation of Clustering using K-Means in Python

%J International Journal of Computer Applications

%@ 0975-8887

%V 186

%N 40

%P 12-17

%D 2024

%I Foundation of Computer Science (FCS), NY, USA

Abstract

The goal of this research is to develop a clustering program using the k-means method in Python. Clustering divides data into clusters (or groups) based on their features. K-means assigns each data point to the cluster whose center is closest, using Euclidean distance to measure the distance between data points and centers. It is an iterative method: the centers are updated repeatedly until the final clusters are obtained. The basic steps of clustering using k-means are explained: preparing the data, initializing the centers, computing the labels (computing distances, finding the minimum distance, and assigning labels), computing the clusters, computing the error function, updating the centers, and plotting the clusters. The developed program was tested on an experimental dataset. It performed these steps successfully and produced the required results.
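The steps listed in the abstract can be illustrated with a minimal sketch. This is not the paper's program: the function name `kmeans`, its parameters, the random choice of initial centers, the stopping rule (labels no longer change), and the sample points are all assumptions made for illustration. It uses only the Python standard library (`math.dist` requires Python 3.8 or later), and the plotting step is omitted.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    # Initialize centers: pick k data points at random (assumed strategy).
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Compute labels: for each point, compute the Euclidean distance to
        # every center and take the index of the minimum distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Stop when the assignments no longer change (assumed stopping rule).
        if new_labels == labels:
            break
        labels = new_labels
        # Compute clusters and update each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Error function: sum of squared distances from each point to its center.
    error = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, error


# Prepare data: two small, well-separated groups of 2-D points (made up).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels, error = kmeans(points, k=2)
print("labels:", labels)
print("centers:", centers)
print("error:", error)
```

On data this well separated, the first three points end up in one cluster and the last three in the other; which cluster is numbered 0 depends on the random initialization.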

Index Terms

Computer Science

Information Sciences

Keywords

Artificial Intelligence, Machine Learning, Clustering, K-Means, Euclidean Distance, Centers, Labels, Clusters, Error Function, Python, Programming.