| International Journal of Computer Applications |
| Foundation of Computer Science (FCS), NY, USA |
| Volume 187 - Number 57 |
| Year of Publication: 2025 |
| Authors: Md. Iqbal Hossain, Najila Alam Porno |
| DOI: 10.5120/ijca2025925995 |
Md. Iqbal Hossain, Najila Alam Porno. Comprehensive Benchmarking of several Machine Learning and Bayesian Models for Early-Stage Diabetes Risk Prediction: A Large-Scale Comparative Study. International Journal of Computer Applications. 187, 57 (Nov 2025), 9-16. DOI=10.5120/ijca2025925995
Diabetes remains a critical global health challenge, and early detection is crucial for effective management. This study presents a comprehensive benchmarking analysis of 14 diverse machine learning and Bayesian models for early-stage diabetes risk prediction using clinical data [2] from Sylhet, Bangladesh. The study evaluated traditional methods (Logistic Regression, Decision Trees), ensemble techniques (Random Forest, XGBoost, LightGBM), Bayesian approaches (BART, Bayesian Logistic Regression), and advanced neural architectures (Deep Belief Networks) under two protocols: a 70-30 train-test split and 10-fold cross-validation. The results show that ensemble methods consistently outperformed the other approaches, with Random Forest (RF) achieving the highest cross-validated AUC (0.9951) and accuracy (0.9699). The study provides insights into model selection for clinical decision support systems and highlights the robustness of tree-based ensemble methods for medical diagnosis tasks.

Keywords: Diabetes Prediction, Machine Learning Benchmarking, Cross-Validation, Ensemble Methods, Bayesian Models, Clinical Decision Support
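The two evaluation protocols named in the abstract (a 70-30 hold-out split and 10-fold cross-validation scored by AUC and accuracy) can be illustrated with a small, dependency-free Python sketch. This is not the authors' pipeline. The function names, the stratified round-robin fold assignment, the unstratified random hold-out, the 0.5 decision threshold, and the toy nearest-centroid scorer standing in for the 14 benchmarked models are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
# Hypothetical model interface (an assumption): fit on (X_train, y_train),
# then return one score in [0, 1] per row of X_test.
FitPredict = Callable[[List[Row], List[int], List[Row]], List[float]]


def holdout_split(n: int, test_frac: float = 0.3, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Random (unstratified) train/test split of indices, e.g. 70-30."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def stratified_kfold_indices(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Assign indices to k folds so each fold keeps roughly the overall class mix."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for i in idx:  # deal this class's samples round-robin across folds
            folds[pos % k].append(i)
            pos += 1
    return folds


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the evaluation set")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X: List[Row], y: List[int], fit_predict: FitPredict,
                   k: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Mean AUC and mean accuracy (threshold 0.5) over stratified k folds."""
    aucs, accs = [], []
    for test in stratified_kfold_indices(y, k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        scores = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        y_test = [y[i] for i in test]
        aucs.append(roc_auc(y_test, scores))
        preds = [1 if s >= 0.5 else 0 for s in scores]
        accs.append(sum(p == t for p, t in zip(preds, y_test)) / len(y_test))
    return sum(aucs) / k, sum(accs) / k


def nearest_centroid(X_train: List[Row], y_train: List[int], X_test: List[Row]) -> List[float]:
    """Toy stand-in model: score in [0, 1], higher when closer to the class-1 centroid."""
    def centroid(rows: List[Row]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0 = centroid([x for x, t in zip(X_train, y_train) if t == 0])
    c1 = centroid([x for x, t in zip(X_train, y_train) if t == 1])
    out = []
    for x in X_test:
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        out.append(0.5 if d0 + d1 == 0 else d0 / (d0 + d1))
    return out


if __name__ == "__main__":
    # Synthetic, perfectly separable data (not the Sylhet dataset).
    X = [[float(i)] for i in range(20)] + [[100.0 + i] for i in range(20)]
    y = [0] * 20 + [1] * 20
    print(cross_validate(X, y, nearest_centroid, k=10))
```

Run as a script, it prints `(1.0, 1.0)` on this separable toy data. Any of the benchmarked models could be plugged in the same way, provided it is wrapped to match the assumed `fit_predict` signature. Stratifying the folds keeps both classes in every test fold, which the rank-based AUC computation requires.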