Research Article

Implementation of Feature Selection using Correlation Matrix in Python

by Ahmad Farhan AlShammari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 58
Year of Publication: 2024
Authors: Ahmad Farhan AlShammari
10.5120/ijca2024924341

Ahmad Farhan AlShammari. Implementation of Feature Selection using Correlation Matrix in Python. International Journal of Computer Applications 186, 58 (Dec 2024), 29-34. DOI=10.5120/ijca2024924341

@article{ 10.5120/ijca2024924341,
  author = { Ahmad Farhan AlShammari },
  title = { Implementation of Feature Selection using Correlation Matrix in Python },
  journal = { International Journal of Computer Applications },
  issue_date = { Dec 2024 },
  volume = { 186 },
  number = { 58 },
  month = { Dec },
  year = { 2024 },
  issn = { 0975-8887 },
  pages = { 29-34 },
  numpages = { 6 },
  url = { https://ijcaonline.org/archives/volume186/number58/implementation-of-feature-selection-using-correlation-matrix-in-python/ },
  doi = { 10.5120/ijca2024924341 },
  publisher = { Foundation of Computer Science (FCS), NY, USA },
  address = { New York, USA }
}
%0 Journal Article
%A Ahmad Farhan AlShammari
%T Implementation of Feature Selection using Correlation Matrix in Python
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 58
%P 29-34
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The goal of this research is to develop a feature selection program using a correlation matrix in Python. Feature selection is used to determine the most important features in data. It helps to reduce the number of features, decrease the complexity of computations, increase the accuracy, and improve the performance of the applied model. The correlation matrix is used to measure the correlation between the input (independent) features and the output (dependent) feature. The input features that are highly correlated with the output feature are identified, filtered, and selected. The basic steps of feature selection using a correlation matrix are explained: preparing data (input and output), creating the transpose of the input data, creating the data matrix, computing the correlation matrix, plotting the correlation matrix, selecting features (adding relevant features and removing redundant features), and printing the selected features. The developed program was tested on an experimental dataset. The program successfully performed the basic steps of feature selection using a correlation matrix and provided the required results.
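The steps listed in the abstract can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the paper's actual program: the synthetic dataset, column names, and the two thresholds (0.5 for relevance, 0.9 for redundancy) are assumptions chosen for demonstration.

```python
import numpy as np
import pandas as pd

# Prepare data (input and output): a small synthetic dataset where
# x2 is redundant (nearly a copy of x1) and x3 is irrelevant to y.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)     # highly correlated with x1
x3 = rng.normal(size=n)                     # unrelated to the output
y = 2 * x1 + rng.normal(scale=0.5, size=n)  # output depends on x1

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3, "y": y})

# Compute the correlation matrix (Pearson correlation by default).
corr = df.corr()

# Select relevant features: inputs whose absolute correlation with
# the output exceeds an assumed relevance threshold.
relevance_threshold = 0.5
relevant = [c for c in ("x1", "x2", "x3")
            if abs(corr.loc[c, "y"]) >= relevance_threshold]

# Remove redundant features: of any pair of inputs correlated above
# an assumed redundancy threshold, keep only the first one.
redundancy_threshold = 0.9
selected = []
for c in relevant:
    if all(abs(corr.loc[c, s]) < redundancy_threshold for s in selected):
        selected.append(c)

print("Selected features:", selected)  # x2 dropped as redundant, x3 as irrelevant
```

With this data, x1 and x2 pass the relevance filter but x2 is then removed as redundant with x1, so only x1 survives. The correlation matrix itself can be plotted as a heatmap (e.g. with `matplotlib.pyplot.imshow`), as the paper's step list describes.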

References
  1. Sammut, C., & Webb, G. I. (2011). "Encyclopedia of Machine Learning". Springer Science & Business Media.
  2. Jung, A. (2022). "Machine Learning: The Basics". Singapore: Springer.
  3. Kubat, M. (2021). "An Introduction to Machine Learning". Cham, Switzerland: Springer.
  4. Li, H. (2023). "Machine Learning Methods". Springer Nature.
  5. Dey, A. (2016). "Machine Learning Algorithms: A Review". International Journal of Computer Science and Information Technologies, 7 (3), 1174-1179.
  6. Bonaccorso, G. (2018). "Machine Learning Algorithms: Popular Algorithms for Data Science and Machine Learning". Packt Publishing.
  7. Jo, T. (2021). "Machine Learning Foundations: Supervised, Unsupervised, and Advanced Learning". Springer.
  8. Jordan, M. I., & Mitchell, T. M. (2015). "Machine Learning: Trends, Perspectives, and Prospects". Science, 349(6245), 255-260.
  9. Forsyth, D. (2019). "Applied Machine Learning". Cham, Switzerland: Springer.
  10. Chopra, D., & Khurana, R. (2023). "Introduction to Machine Learning with Python". Bentham Science Publishers.
  11. Müller, A. C., & Guido, S. (2016). "Introduction to Machine Learning with Python: A Guide for Data Scientists". O'Reilly Media.
  12. Zollanvari, A. (2023). "Machine Learning with Python: Theory and Implementation". Springer Nature.
  13. Raschka, S. (2015). "Python Machine Learning". Packt Publishing.
  14. Sarkar, D., Bali, R., & Sharma, T. (2018). "Practical Machine Learning with Python". Apress.
  15. Swamynathan, M. (2019). "Mastering Machine Learning with Python in Six Steps: A Practical Implementation Guide to Predictive Data Analytics using Python". Apress.
  16. Kong, Q., Siauw, T., & Bayen, A. (2020). "Python Programming and Numerical Methods: A Guide for Engineers and Scientists". Academic Press.
  17. Yale, K., Nisbet, R., & Miner, G. D. (2017). "Handbook of Statistical Analysis and Data Mining Applications". Elsevier.
  18. Unpingco, J. (2022). "Python for Probability, Statistics, and Machine Learning". Cham, Switzerland: Springer.
  19. Brandt, S. (2014). "Data Analysis: Statistical and Computational Methods for Scientists and Engineers". Springer.
  20. VanderPlas, J. (2017). "Python Data Science Handbook: Essential Tools for Working with Data". O'Reilly Media.
  21. James, G., Witten, D., Hastie, T., Tibshirani, R., & Taylor, J. (2023). "An Introduction to Statistical Learning: With Applications in Python". Springer Nature.
  22. Hall, M. A. (1999). "Correlation-based Feature Selection for Machine Learning". (Doctoral Dissertation, The University of Waikato).
  23. Raschka, S. (2018). "Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning". arXiv preprint arXiv:1811.12808.
  24. Gopika, N., & ME, A. M. K. (2018). "Correlation based Feature Selection Algorithm for Machine Learning". In 2018 3rd International Conference on Communication and Electronics Systems (ICCES) (pp. 692-695). IEEE.
  25. Cai, J., Luo, J., Wang, S., & Yang, S. (2018). "Feature Selection in Machine Learning: A New Perspective". Neurocomputing, 300, 70-79.
  26. Blum, A. L., & Langley, P. (1997). "Selection of Relevant Features and Examples in Machine Learning". Artificial Intelligence, 97(1-2), 245-271.
  27. Solorio-Fernández, S., Carrasco-Ochoa, J. A., & Martínez-Trinidad, J. F. (2020). "A Review of Unsupervised Feature Selection Methods". Artificial Intelligence Review, 53(2), 907-948.
  28. Python: https://www.python.org
  29. Numpy: https://www.numpy.org
  30. Pandas: https://pandas.pydata.org
  31. Matplotlib: https://matplotlib.org
  32. NLTK: https://www.nltk.org
  33. SciPy: https://scipy.org
  34. scikit-learn: https://scikit-learn.org
  35. Kaggle: https://www.kaggle.com
Index Terms

Computer Science
Information Sciences

Keywords

Artificial Intelligence, Machine Learning, Feature Selection, Filtering, Correlation Matrix, Correlation Coefficient, Features, Relevant, Redundant, Python Programming