Improving Diabetes Prediction Accuracy and Interpretability with SMOTE and SHAP

Authors

  • Wajeeha Iftikhar, Faculty of Computing, Riphah International University, Lahore, Pakistan
  • Muhammad Yaseen, Faculty of Computing, Riphah International University, Lahore, Pakistan
  • Gohar Rahman, Faculty of Computing and Informatics, University Malaysia Sabah (UMS), Kota Kinabalu, Sabah, Malaysia
  • Muhammad Asif Nauman, Faculty of Computing, Riphah International University, Lahore, Pakistan
  • Umar Farooq Khattak, Faculty of Artificial Intelligence and Frontier Technologies, UNITAR International University, Selangor, Malaysia

Volume: 16 | Issue: 2 | Pages: 34276-34282 | April 2026 | https://doi.org/10.48084/etasr.16247

Abstract

Diabetes can lead to serious health complications, which makes early detection important. This study proposes a hybrid machine learning framework that aims to improve both the accuracy and the interpretability of diabetes prediction by combining the Synthetic Minority Over-Sampling Technique (SMOTE) with Shapley Additive Explanations (SHAP). Five machine learning models, Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and XGBoost, were evaluated on two datasets. RF and XGBoost achieved the highest predictive performance. SMOTE improved class balance and model robustness, while SHAP provided transparent explanations of key predictors such as glucose, BMI, and age. The results indicate that combining SMOTE and SHAP enhances both the reliability and the interpretability of models for practical diabetes prediction.
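
The two techniques named in the abstract can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation: the function names `smote` and `shapley_values`, the toy records, and the weights in `toy_risk` are all hypothetical and chosen only for illustration. The first function shows the core SMOTE step (interpolating between a minority-class sample and one of its nearest minority neighbours); the second computes exact Shapley values by enumerating feature coalitions, which is the quantity SHAP estimates with more efficient algorithms.

```python
import itertools
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (base itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x. Features outside a
    coalition are replaced by the corresponding baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy records (glucose, BMI) for the minority (diabetic) class.
minority = [(148.0, 33.6), (183.0, 23.3), (137.0, 43.1), (168.0, 38.0)]
new_points = smote(minority, n_new=6, k=2)


# Hypothetical linear risk score over (glucose, BMI, age); weights are made up.
def toy_risk(z):
    glucose, bmi, age = z
    return 0.02 * glucose + 0.03 * bmi + 0.01 * age


x = [150.0, 35.0, 50.0]
baseline = [100.0, 25.0, 30.0]
phi = shapley_values(toy_risk, x, baseline)
print(new_points)
print(phi)
```

For a linear score such as `toy_risk`, each Shapley value reduces to the weight times the feature's deviation from the baseline, and the values sum to the difference between the prediction for `x` and the prediction for the baseline. Exact enumeration grows exponentially with the number of features, so in practice a library such as `imbalanced-learn` (for SMOTE) and `shap` (for SHAP) would normally be used instead, with oversampling applied to the training split only.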

Keywords:

diabetes prediction, machine learning, random forest, SMOTE, Explainable AI (XAI), SHAP

How to Cite

[1] W. Iftikhar, M. Yaseen, G. Rahman, M. A. Nauman, and U. F. Khattak, "Improving Diabetes Prediction Accuracy and Interpretability with SMOTE and SHAP," Eng. Technol. Appl. Sci. Res., vol. 16, no. 2, pp. 34276–34282, Apr. 2026.
