Research Article

Statistical Approach for Predicting the Most Accurate Classification Algorithm for a Data Set in Analysis

by Shriniwas Nayak, Aditya Mahaddalkar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 28
Year of Publication: 2020
DOI: 10.5120/ijca2020920306

Shriniwas Nayak, Aditya Mahaddalkar. Statistical Approach for Predicting the Most Accurate Classification Algorithm for a Data Set in Analysis. International Journal of Computer Applications 176, 28 (Jun 2020), 1-7. DOI=10.5120/ijca2020920306

@article{10.5120/ijca2020920306,
author = {Shriniwas Nayak and Aditya Mahaddalkar},
title = {Statistical Approach for Predicting the Most Accurate Classification Algorithm for a Data Set in Analysis},
journal = {International Journal of Computer Applications},
issue_date = {Jun 2020},
volume = {176},
number = {28},
month = {Jun},
year = {2020},
issn = {0975-8887},
pages = {1-7},
numpages = {7},
url = {https://ijcaonline.org/archives/volume176/number28/31373-2020920306/},
doi = {10.5120/ijca2020920306},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Shriniwas Nayak
%A Aditya Mahaddalkar
%T Statistical Approach for Predicting the Most Accurate Classification Algorithm for a Data Set in Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 176
%N 28
%P 1-7
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Classification algorithms, a category of data mining techniques, have widespread applications in the modern world, finding use in almost every field that aims to predict an outcome class for a data instance. As a result, many supervised classification algorithms have been studied in machine learning; K-Nearest Neighbor, Gaussian Naive Bayes, and Decision Tree, to name a few, can all serve this purpose. However, even today it is a time-consuming and complex task to decide the most suitable algorithm for the data under consideration. This article discusses an approach that predicts the algorithm likely to produce the best accuracy for given data, based on internal data parameters: size of the data, ratio of numerical attributes, count of outliers, average correlation, number of classes in the target, and average number of classes in the attributes. The paper analyses the relation between these internal data parameters and the performance of the K-Nearest Neighbor, Logistic Regression, Gaussian Naive Bayes, and Decision Tree classification algorithms, thereby evaluating a generic approach to determine the most accurate algorithm. It also studies some limitations, such as the inability to incorporate external factors like memory requirements.
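The workflow the abstract outlines can be sketched in code: extract the internal data parameters (meta-features) from a data set, then compare the four candidate classifiers by cross-validated accuracy. This is a minimal illustration under stated assumptions, not the paper's exact method: the outlier rule (1.5 × IQR) and the precise meta-feature definitions are choices made here for the sketch.

```python
# Sketch: meta-feature extraction and classifier comparison, assuming
# scikit-learn's built-in Iris data set as a stand-in for "the data set
# in analysis". Feature definitions are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def meta_features(X: pd.DataFrame, y: pd.Series) -> dict:
    """Compute internal data parameters similar to those the abstract lists."""
    numeric = X.select_dtypes(include=np.number)
    # Outliers counted with the common 1.5 * IQR rule (an assumption here).
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outliers = ((numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)).sum().sum()
    # Mean absolute pairwise correlation, excluding the diagonal.
    corr = numeric.corr().abs()
    n = len(corr)
    avg_corr = (corr.sum().sum() - n) / max(n * n - n, 1)
    return {
        "size": len(X),
        "numeric_ratio": numeric.shape[1] / X.shape[1],
        "outlier_count": int(outliers),
        "avg_correlation": float(avg_corr),
        "target_classes": int(y.nunique()),
    }


def best_classifier(X, y, cv=5):
    """Return the name of the model with the highest mean CV accuracy."""
    models = {
        "knn": KNeighborsClassifier(),
        "logreg": LogisticRegression(max_iter=1000),
        "gnb": GaussianNB(),
        "tree": DecisionTreeClassifier(random_state=0),
    }
    scores = {name: cross_val_score(m, X, y, cv=cv).mean()
              for name, m in models.items()}
    return max(scores, key=scores.get), scores


data = load_iris(as_frame=True)
feats = meta_features(data.data, data.target)
name, scores = best_classifier(data.data, data.target)
```

The paper's contribution is the mapping from `feats` to the predicted best algorithm; here that step is replaced by directly running the cross-validation, which is the expensive computation such a prediction would avoid.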

Index Terms

Computer Science
Information Sciences

Keywords

Supervised Learning, Classification Algorithm, Decision Tree, Logistic Regression, K-Nearest Neighbors, Naive Bayes