Research Article

Investigating the Impact of Prominent Factors on the Diagnosis of Diabetes and its Associated Diseases using Ensemble Machine Learning Models

by Hossam Meshref
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 184 - Number 48
Year of Publication: 2023
Authors: Hossam Meshref
10.5120/ijca2023922598

Hossam Meshref. Investigating the Impact of Prominent Factors on the Diagnosis of Diabetes and its Associated Diseases using Ensemble Machine Learning Models. International Journal of Computer Applications. 184, 48 (Feb 2023), 19-30. DOI=10.5120/ijca2023922598

@article{ 10.5120/ijca2023922598,
author = { Hossam Meshref },
title = { Investigating the Impact of Prominent Factors on the Diagnosis of Diabetes and its Associated Diseases using Ensemble Machine Learning Models },
journal = { International Journal of Computer Applications },
issue_date = { Feb 2023 },
volume = { 184 },
number = { 48 },
month = { Feb },
year = { 2023 },
issn = { 0975-8887 },
pages = { 19-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume184/number48/32630-2023922598/ },
doi = { 10.5120/ijca2023922598 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:24:18.934198+05:30
%A Hossam Meshref
%T Investigating the Impact of Prominent Factors on the Diagnosis of Diabetes and its Associated Diseases using Ensemble Machine Learning Models
%J International Journal of Computer Applications
%@ 0975-8887
%V 184
%N 48
%P 19-30
%D 2023
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Diabetes Mellitus (DM) is a prevalent chronic condition that can lead to serious health consequences and even death. It is marked by hyperglycemia, in which blood sugar levels are abnormally high. According to recent data, there will be 642 million people with diabetes by 2040, which implies that one in every ten persons will have diabetes. This worrying figure requires a great deal of attention. Diabetes screening can be made more affordable, faster, and more widely available by making it possible to predict a patient's diabetic status from just a few key attributes. The purpose of this study is two-fold. First, the impact of salient features on the diagnosis of diabetes cases is investigated. Essential features for the prediction models were first identified using random forest and recursive feature elimination with majority voting procedures. State-of-the-art model performance was achieved by employing 13 distinct machine learning classifiers. Experimental results using patient data collected from 130 hospitals in the US suggest that ensemble models outperformed the individual ones in terms of overall performance. The data was further analyzed to discover the salient risk factors and how they affect diabetes classification. Second, once a patient is diagnosed with diabetes, many factors may affect the patient's chances of developing diabetes-related diseases. This research investigated these factors to build models for predicting patients' diabetes-related diseases, such as diseases of the circulatory, nervous, and digestive systems. These prediction models achieved state-of-the-art performance by deploying ensemble machine learning techniques. In addition, to increase confidence in the designed machine learning models, a few interpretations of the decisions made by these prediction models were provided. Thus, it is believed that the designed models can help physicians, clinicians, and patients better understand the risk of acquiring diabetes.
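
The abstract mentions two uses of voting: combining the outputs of feature selectors (random forest and recursive feature elimination) and combining classifiers into an ensemble. The abstract does not state the exact voting rule, so the following is only a minimal sketch that assumes a strict-majority rule. The function names and the feature names are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def majority_vote_features(selections, min_votes=None):
    """Keep features chosen by at least `min_votes` selectors.

    `selections` is a list of feature-name lists, one per selector.
    By default a strict majority of the selectors is required (assumption).
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    # Count each feature at most once per selector.
    votes = Counter(f for selected in selections for f in set(selected))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def majority_vote_label(predictions):
    """Return the most common label among the classifiers' predictions.

    Ties resolve to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of three feature selectors (illustrative names only).
rf_top = ["hba1c", "age", "num_medications"]
rfe_a = ["hba1c", "num_medications", "time_in_hospital"]
rfe_b = ["hba1c", "age", "insulin"]

selected = majority_vote_features([rf_top, rfe_a, rfe_b])
print(selected)

# Hypothetical predictions of three classifiers for one patient (1 = diabetic).
print(majority_vote_label([1, 0, 1]))
```

Here a feature survives only if at least two of the three selectors picked it, and the ensemble label is whichever class most classifiers predicted. A weighted or soft-voting scheme would be an equally plausible reading of the abstract.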

Index Terms

Computer Science
Information Sciences

Keywords

Diabetes Mellitus, Machine Learning, Ensemble Techniques, HbA1c, Diabetic Cases, Interpretability