Comprehensive Benchmarking of several Machine Learning and Bayesian Models for Early-Stage Diabetes Risk Prediction: A Large-Scale Comparative Study

Md. Iqbal Hossain; Najila Alam Porno

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 187 - Number 57

Year of Publication: 2025

Authors: Md. Iqbal Hossain, Najila Alam Porno

10.5120/ijca2025925995

Md. Iqbal Hossain, Najila Alam Porno. Comprehensive Benchmarking of several Machine Learning and Bayesian Models for Early-Stage Diabetes Risk Prediction: A Large-Scale Comparative Study. International Journal of Computer Applications. 187, 57 (Nov 2025), 9-16. DOI=10.5120/ijca2025925995

@article{ 10.5120/ijca2025925995,

author = { Md. Iqbal Hossain and Najila Alam Porno },

title = { Comprehensive Benchmarking of several Machine Learning and Bayesian Models for Early-Stage Diabetes Risk Prediction: A Large-Scale Comparative Study },

journal = { International Journal of Computer Applications },

issue_date = { Nov 2025 },

volume = { 187 },

number = { 57 },

month = { Nov },

year = { 2025 },

issn = { 0975-8887 },

pages = { 9-16 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume187/number57/comprehensive-benchmarking-of-several-machine-learning-and-bayesian-models-for-early-stage-diabetes-risk-prediction-a-large-scale-comparative-study/ },

doi = { 10.5120/ijca2025925995 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2025-11-18T21:11:12.091035+05:30

%A Md. Iqbal Hossain

%A Najila Alam Porno

%T Comprehensive Benchmarking of several Machine Learning and Bayesian Models for Early-Stage Diabetes Risk Prediction: A Large-Scale Comparative Study

%J International Journal of Computer Applications

%@ 0975-8887

%V 187

%N 57

%P 9-16

%D 2025

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Diabetes remains a critical global health challenge, and early detection is crucial for effective management. This study presents a comprehensive benchmarking analysis of 14 diverse machine learning and Bayesian models for early-stage diabetes risk prediction using clinical data [2] from Sylhet, Bangladesh. The study evaluated traditional methods (Logistic Regression, Decision Trees), ensemble techniques (Random Forest, XGBoost, LightGBM), Bayesian approaches (BART, Bayesian Logistic Regression), and advanced neural architectures (Deep Belief Networks) using both 70-30 train-test splits and 10-fold cross-validation. The results demonstrate that ensemble methods consistently outperformed other approaches, with Random Forest (RF) achieving the highest cross-validated AUC (0.9951) and accuracy (0.9699). The study provides valuable insights into model selection for clinical decision support systems and highlights the robustness of tree-based ensemble methods for medical diagnosis tasks.
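The evaluation protocol named in the abstract (10-fold cross-validation scored by AUC and accuracy) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic binary "symptom" features rather than the Sylhet dataset, the Bernoulli naive Bayes model is only a stand-in for the 14 benchmarked models, and every function name here (`auc`, `accuracy`, `stratified_folds`, `cross_validate`, `fit_naive_bayes`, `make_toy_data`) is hypothetical. The scores it produces are unrelated to the figures reported above.

```python
import math
import random
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases where the thresholded score matches the label."""
    return mean(int((s >= threshold) == bool(l)) for l, s in zip(labels, scores))


def stratified_folds(labels, k, seed=0):
    """Assign indices to k folds so each fold has a similar class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validate(fit, X, y, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and average over folds."""
    aucs, accs = [], []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        truth = [y[i] for i in held_out]
        scores = [predict(X[i]) for i in held_out]
        aucs.append(auc(truth, scores))
        accs.append(accuracy(truth, scores))
    return {"auc": mean(aucs), "accuracy": mean(accs)}


def fit_naive_bayes(X, y):
    """Stand-in model (assumption): Bernoulli naive Bayes with Laplace smoothing."""
    n_feat = len(X[0])
    counts = {0: [1] * n_feat, 1: [1] * n_feat}
    totals = {0: 2, 1: 2}
    for row, label in zip(X, y):
        totals[label] += 1
        for j, v in enumerate(row):
            counts[label][j] += v

    def predict(row):
        log_odds = math.log(totals[1] / totals[0])
        for j, v in enumerate(row):
            p1 = counts[1][j] / totals[1]
            p0 = counts[0][j] / totals[0]
            log_odds += math.log(p1 / p0) if v else math.log((1 - p1) / (1 - p0))
        return 1 / (1 + math.exp(-log_odds))

    return predict


def make_toy_data(n=300, n_feat=5, seed=0):
    """Synthetic data (assumption): symptoms are likelier in positive cases."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        p = 0.7 if label else 0.3
        X.append([int(rng.random() < p) for _ in range(n_feat)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(cross_validate(fit_naive_bayes, X, y, k=10))
```

Any model can be compared under the same folds by passing a different `fit` function that returns a `predict` callable, which is the sense in which a benchmark holds the protocol fixed while varying the model.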

References

[1] American Diabetes Association. https://diabetes.org/about-diabetes/statistics/about-diabetes, 2025. [Online; accessed 19-Oct-2025].
[2] UC Irvine Machine Learning Repository. https://archive.ics.uci.edu/dataset/529/early+stage+diabetes+risk+prediction+dataset, 2025. [Online; accessed 19-Oct-2025].
[3] Ramya Akula, Ni Nguyen, and Ivan Garibay. Supervised machine learning based ensemble model for accurate prediction of type 2 diabetes. In 2019 SoutheastCon, pages 1–8. IEEE, 2019.
[4] Zainab Abood Ahmed Al Bairmani and Aasha Abdulkhleq Ismael. Using logistic regression model to study the most important factors which affects diabetes for the elderly in the city of hilla/2019. In Journal of Physics: Conference Series, volume 1818, page 012016. IOP Publishing, 2021.
[5] Norou Diawara, Tiffany Henley, Samuel L Brown, and Md Iqbal Hossain. In search of the rational voter in the 2020 presidential election: Understanding the impact of voter costs and benefits on turnout. In Understanding Voter Behavior With Predictive Modeling, pages 35–60. IGI Global Scientific Publishing, 2026.
[6] Masoud M Hassan. A fully bayesian logistic regression model for classification of zada diabetes dataset. Science Journal of University of Zakho, 8(3):105–111, 2020.
[7] Tiffany Henley, Samuel Brown, Norou Diawara, Md Iqbal Hossain, and Gregory Rivera. Contemporary voter suppression: Impact on the 2020 general election. Ralph Bunche Journal of Public Affairs, 7(1):4, 2024.
[8] Tiffany Henley, Norou Diawara, Md Iqbal Hossain, and Sam Brown. In search of the rational voter in the 2020 presidential election: Understanding the impact of voter costs and benefits on turnout. 2024.
[9] Koushik Chandra Howlader, Md Shahriare Satu, Md Abdul Awal, Md Rabiul Islam, Sheikh Mohammed Shariful Islam, Julian MW Quinn, and Mohammad Ali Moni. Machine learning models for classification and identification of significant attributes to detect type 2 diabetes. Health information science and systems, 10(1):2, 2022.
[10] Yazan Jian, Michel Pasquier, Assim Sagahyroon, and Fadi Aloul. A machine learning approach to predicting diabetes complications. In Healthcare, volume 9, page 1712. MDPI, 2021.
[11] Lionel P Joseph, Erica A Joseph, and Ramendra Prasad. Explainable diabetes classification using hybrid bayesian-optimized tabnet architecture. Computers in Biology and Medicine, 151:106178, 2022.
[12] Ram D Joshi and Chandra K Dhakal. Predicting type 2 diabetes using logistic regression and machine learning approaches. International journal of environmental research and public health, 18(14):7346, 2021.
[13] Nada Ali Noori and Ali A Yassin. A comparative analysis for diabetic prediction based on machine learning techniques. Journal of Basrah Researches (Sciences), 47(1), 2021.
[14] Monalisa Panda, Debani Prashad Mishra, Sopa Mousumi Patro, and Surender Reddy Salkuti. Prediction of diabetes disease using machine learning algorithms. IAES International Journal of Artificial Intelligence, 11(1):284, 2022.
[15] P Prabhu and S Selvabharathi. Deep belief neural network model for prediction of diabetes mellitus. In 2019 3rd international conference on imaging, signal processing and communication (ICISPC), pages 138–142. IEEE, 2019.
[16] Natalya Pya and Simon N Wood. Shape constrained additive models. Statistics and computing, 25(3):543–559, 2015.
[17] Priyanka Rajendra and Shahram Latifi. Prediction of diabetes using logistic regression and ensemble techniques. Computer Methods and Programs in Biomedicine Update, 1:100032, 2021.
[18] Derara Duba Rufo, Taye Girma Debelee, Achim Ibenthal, and Worku Gachena Negera. Diagnosis of diabetes mellitus using gradient boosting machine (lightgbm). Diagnostics, 11(9):1714, 2021.
[19] Rodney A Sparapani, Lisa E Rein, Sergey S Tarima, Tourette A Jackson, and John R Meurer. Non-parametric recurrent events analysis with bart and an application to the hospital admissions of patients with diabetes. Biostatistics, 21(1):69–85, 2020.
[20] Yaya Xie, Xiu Li, EWT Ngai, and Weiyun Ying. Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3):5445–5449, 2009.

Index Terms

Computer Science

Information Sciences

Keywords

Diabetes Prediction, Machine Learning Benchmarking, Cross-Validation, Ensemble Methods, Bayesian Models, Clinical Decision Support