International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 41
Year of Publication: 2020
Authors: Md. Istiaq Habib Khan, M. Rubaiyat Hossain Mondal
DOI: 10.5120/ijca2020920549
Md. Istiaq Habib Khan, M. Rubaiyat Hossain Mondal. Data-Driven Diagnosis of Heart Disease. International Journal of Computer Applications. 176, 41 (Jul 2020), 46-54. DOI=10.5120/ijca2020920549
This paper focuses on the data-driven diagnosis of heart disease using three freely available datasets. The first dataset has 303 instances with 14 attributes, the second has 462 instances with 10 attributes, and the third has 70,000 instances with 12 attributes. The scikit-learn library for the Python programming language is used for data analysis. A univariate feature selection algorithm is applied to find the most valuable attributes and risk factors associated with heart disease. Experimental results show that the most important attribute of the first dataset is the maximum heart rate achieved by a patient, while that of the second and third datasets is the patient's age. Next, heart disease is predicted using several machine learning algorithms, including support vector machine (SVM), decision tree, k-nearest neighbors (kNN), logistic regression, naïve Bayes, random forest, and majority voting. Each dataset is split into training and testing portions using holdout and cross-validation methods. The performance of the different algorithms on the three datasets is evaluated in terms of testing accuracy, precision, recall, and F1-score. Majority voting over logistic regression, SVM, and naïve Bayes exhibits the best accuracy, 88.89%, when applied to the first dataset.
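The workflow summarized in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code. The data are a synthetic stand-in produced by `make_classification`, sized to resemble the first dataset on the assumption that its 14 attributes are 13 predictors plus a target. The ANOVA F-test (`f_classif`) as the univariate scoring function, the value `k=5`, the 70/30 holdout split, the 10 folds, and the feature scaling step are likewise assumptions, since the abstract does not state them. The numbers this sketch produces therefore say nothing about the results reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a heart disease table (assumption: 303 rows,
# 13 predictors, binary target). Replace with the real data to reproduce.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=5, random_state=0
)

# Univariate feature selection: score each feature against the target
# independently. f_classif and k=5 are illustrative choices.
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
ranking = selector.scores_.argsort()[::-1]  # feature indices, best first

# Holdout split (assumed 70/30, stratified on the target).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hard (majority) voting over logistic regression, SVM, and naive Bayes.
# Scaling is added here because SVM and logistic regression benefit from it.
voter = make_pipeline(
    StandardScaler(),
    VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("svm", SVC()),
            ("nb", GaussianNB()),
        ],
        voting="hard",
    ),
)
voter.fit(X_train, y_train)
y_pred = voter.predict(X_test)

# The four metrics named in the abstract, computed on the holdout test set.
holdout_scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}

# Cross-validation as the alternative evaluation protocol (assumed 10 folds).
cv_accuracy = cross_val_score(voter, X, y, cv=10, scoring="accuracy")

print("Feature ranking (best first):", ranking.tolist())
print("Holdout scores:", holdout_scores)
print("Mean 10-fold CV accuracy:", cv_accuracy.mean())
```

Wrapping the scaler and the voting ensemble in one pipeline means the scaler is refit on each training fold during cross-validation, which avoids leaking test-fold statistics into training.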