Data-Driven Diagnosis of Heart Disease

Md. Istiaq Habib Khan; M. Rubaiyat Hossain Mondal

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 176 - Number 41

Year of Publication: 2020

Authors: Md. Istiaq Habib Khan, M. Rubaiyat Hossain Mondal

10.5120/ijca2020920549

Md. Istiaq Habib Khan, M. Rubaiyat Hossain Mondal. Data-Driven Diagnosis of Heart Disease. International Journal of Computer Applications. 176, 41 (Jul 2020), 46-54. DOI=10.5120/ijca2020920549

@article{ 10.5120/ijca2020920549,
author = { Md. Istiaq Habib Khan, M. Rubaiyat Hossain Mondal },
title = { Data-Driven Diagnosis of Heart Disease },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2020 },
volume = { 176 },
number = { 41 },
month = { Jul },
year = { 2020 },
issn = { 0975-8887 },
pages = { 46-54 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume176/number41/31477-2020920549/ },
doi = { 10.5120/ijca2020920549 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article

%1 2024-02-07T00:41:03.842305+05:30

%A Md. Istiaq Habib Khan

%A M. Rubaiyat Hossain Mondal

%T Data-Driven Diagnosis of Heart Disease

%J International Journal of Computer Applications

%@ 0975-8887

%V 176

%N 41

%P 46-54

%D 2020

%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper focuses on the data-driven diagnosis of heart disease using three freely available datasets. The first dataset has 303 instances with 14 attributes, the second has 462 instances with 10 attributes, and the third has 70,000 instances with 12 attributes. The Scikit-learn library for the Python programming language is used for data analysis. A univariate feature selection algorithm is applied to find the most valuable attributes and risk factors associated with heart disease. Experimental results show that the most important attribute of the first dataset is the maximum heart rate achieved by a patient, while that of the second and third datasets is patient age. Next, heart disease is predicted using several machine learning algorithms, including support vector machine (SVM), decision tree, k-nearest neighbors (kNN), logistic regression, naïve Bayes, random forest, and majority voting. The training and testing portions of each dataset are separated using holdout and cross-validation methods. The performance of the algorithms on the three datasets is evaluated in terms of testing accuracy, precision, recall, and F1-score. Majority voting over logistic regression, SVM, and naïve Bayes exhibits the best accuracy, 88.89%, when applied to the first dataset.
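The workflow summarized in the abstract (univariate feature selection, then majority voting over logistic regression, SVM, and naïve Bayes, evaluated by holdout and cross-validation) might look roughly like the following Scikit-learn sketch. It is an illustration, not the authors' code. Several things here are assumptions not stated in the abstract: the data are synthetic and only mirror the shape of the first dataset (303 rows, 13 predictors plus a label), the scoring function is the ANOVA F-test (f_classif), five features are kept, the holdout split is 70/30, cross-validation uses 10 folds, and all hyperparameters are library defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): same shape as the first dataset,
# but NOT the real heart disease records.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=0
)

# Holdout method: keep 30% of the rows aside for testing (split ratio assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Univariate feature selection: score every feature on its own against the
# label and rank the features by that score (f_classif is an assumed choice).
selector = SelectKBest(score_func=f_classif, k=5).fit(X_train, y_train)
ranking = sorted(enumerate(selector.scores_), key=lambda p: p[1], reverse=True)

# Majority (hard) voting over the three classifiers named in the abstract.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC()),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)

# Scaling and selection live inside the pipeline so they are refit on the
# training portion only, both for the holdout split and for every CV fold.
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=5), voter)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Holdout metrics: testing accuracy, precision, recall, and F1-score.
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary"
)

# Cross-validation method: 10 folds over the full dataset (fold count assumed).
cv_scores = cross_val_score(model, X, y, cv=10)

print("Feature indices ranked by F-score:", [i for i, _ in ranking])
print(f"Holdout accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} F1={f1:.4f}")
print(f"10-fold CV mean accuracy={cv_scores.mean():.4f}")
```

Because the data are synthetic, the printed scores say nothing about the reported 88.89% accuracy. To work with a real dataset, X and y would be loaded from one of the cited sources instead of being generated.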

Index Terms

Computer Science

Information Sciences

Keywords

feature selection, heart disease, SVM, logistic regression, recall, machine learning, disease prediction