International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 184 - Number 48
Year of Publication: 2023
Authors: Hossam Meshref
DOI: 10.5120/ijca2023922598
Hossam Meshref. Investigating the Impact of Prominent Factors on the Diagnosis of Diabetes and its Associated Diseases using Ensemble Machine Learning Models. International Journal of Computer Applications. 184, 48 (Feb 2023), 19-30. DOI=10.5120/ijca2023922598
Diabetes Mellitus (DM) is a prevalent chronic condition that can lead to serious health consequences and even death. It is marked by hyperglycemia, in which blood sugar levels are abnormally high. According to recent data, there will be 642 million people with diabetes by 2040, which implies that one in every ten people will have the disease. This worrying figure demands attention. Diabetes screening can be made more affordable, faster, and more widely available if a patient's diabetic status can be predicted from just a few key attributes. The purpose of this study is two-fold. First, it investigates the impact of salient features on the diagnosis of diabetes. Essential features were identified using random forest and recursive feature elimination, combined through majority voting, and prediction models were then built on those features. Thirteen distinct machine learning classifiers were employed, achieving state-of-the-art performance. Experimental results on patient data collected from 130 hospitals in the US suggest that ensemble models outperformed individual models in overall performance. The data was further analyzed to identify the salient risk factors and how they affect diabetes classification. Second, once a patient is diagnosed with diabetes, many factors may affect that patient's chances of developing diabetes-related diseases. This research investigated these factors to build models that predict patients' diabetes-related diseases, such as diseases of the circulatory, nervous, and digestive systems. These prediction models also achieved state-of-the-art performance by deploying ensemble machine learning techniques. In addition, to increase confidence in the designed models, interpretations of some of the decisions made by these prediction models are provided.
Thus, the designed models are expected to help physicians, clinicians, and patients better understand the risk of acquiring diabetes.
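The abstract describes combining several feature selectors (random forest and recursive feature elimination) through majority voting. The following is a minimal sketch of that voting step only, not the author's implementation: the function name `majority_vote_features`, the selector labels, and the feature names are all hypothetical placeholders, and the selector outputs are hard-coded rather than computed from data.

```python
from collections import Counter


def majority_vote_features(selections, min_votes=None):
    """Return features chosen by a strict majority of selectors.

    selections: dict mapping a selector label to the list of features
                that selector picked (hypothetical inputs).
    min_votes:  optional override for the vote threshold.
    """
    if not selections:
        return []
    if min_votes is None:
        # Strict majority: more than half of the selectors.
        min_votes = len(selections) // 2 + 1
    votes = Counter()
    for chosen in selections.values():
        # set() ensures each selector votes at most once per feature.
        votes.update(set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Placeholder selector outputs, assumed for illustration only.
selections = {
    "random_forest_importance": ["feature_a", "feature_b", "feature_c"],
    "rfe_estimator_1": ["feature_b", "feature_a", "feature_d"],
    "rfe_estimator_2": ["feature_c", "feature_b", "feature_e"],
}

selected = majority_vote_features(selections)
print(selected)
```

With three selectors the threshold is two votes, so a feature is kept only when at least two selectors agree on it. In practice the input lists would come from fitted models rather than literals.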