Improving Early Detection of Type 2 Diabetes from Primary Care Records with Sparse-Balanced SVM

Authors

  • Umna Iftikhar, Faculty of Engineering Science and Technology, Iqra University, Karachi (https://orcid.org/0009-0002-8853-2548)
  • Theodora C., School of Science and Technology, International Hellenic University, Thermi

DOI:

https://doi.org/10.70062/jmih.v1i2.168

Keywords:

Type 2 diabetes, Electronic health records, Imbalanced learning, Sparse SVM, Interpretability, Clinical decision support, Primary care data

Abstract

Early detection of Type 2 Diabetes (T2D) from primary-care Electronic Health Records (EHRs) is challenged by high-dimensional features, class imbalance, and limited model interpretability. We introduce a Sparse-Balanced Support Vector Machine (SB-SVM) that combines sparsity-promoting regularization with class-dependent weighting to enhance detection of the minority class while maintaining clinical interpretability. Using the FIMMG primary-care EHR dataset from Italian general practitioners, we tested SB-SVM in three progressively complex scenarios and compared it with linear and Gaussian SVM, KNN, Decision Tree, Random Forest, and deep models (MLP, DBN). Performance was evaluated using stratified cross-validation, with AUC and recall reported. Sparsity was measured using the  norm. Training, validation, and testing efficiency were also analyzed. SB-SVM achieved mean AUCs of 0.91, 0.81, and 0.69 across the three cases, with higher recall than most baselines. Gains in recall and AUC were statistically significant compared to most competitors (p < 0.05), though differences with Decision Tree and Random Forest were not always significant. The model produced sparse, interpretable coefficients ( = 0.39, 0.91, 0.57) that consistently highlighted clinically relevant predictors (e.g., HbA1c, age, renal function, hypertension, and antidiabetic prescriptions). SB-SVM also showed lower runtime than ensemble and deep models, supporting real-time applications. By combining class balancing and sparsity within a linear margin-based classifier, SB-SVM offers accurate, interpretable, and computationally efficient T2D risk prediction suitable for integration into Clinical Decision Support Systems in primary care.
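
To make the idea of combining class-dependent weighting with sparsity-promoting regularization concrete, the following is a minimal sketch in plain Python. It is not the authors' formulation, solver, or sparsity measure. It assumes a linear classifier trained on the objective (1/n) Σ c(y_i) · max(0, 1 − y_i (w·x_i + b)) + λ‖w‖₁, with inverse-frequency class weights c and a proximal (soft-thresholding) subgradient update. All function names, the toy data, and the hyperparameters `lam`, `lr`, and `epochs` are hypothetical choices made for illustration.

```python
"""Toy sketch of a class-weighted, L1-regularized linear SVM (illustrative only)."""


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_sparse_balanced_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Proximal subgradient descent on a class-weighted hinge loss plus L1 penalty.

    Labels must be +1 / -1. lam, lr and epochs are assumed, untuned values.
    """
    n, d = len(X), len(X[0])
    class_weight = balanced_class_weights(y)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to the subgradient
                c = class_weight[yi] / n
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j]
                grad_b -= c * yi
        # Gradient step on the hinge term, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is not penalized
    return w, b


def predict(w, b, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


def zero_fraction(w):
    """Share of exactly-zero coefficients (an assumed, simple sparsity measure)."""
    return sum(1 for wj in w if wj == 0.0) / len(w)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is uninformative.
# The positive class (+1) is the minority, mimicking class imbalance.
X_TOY = [[2.0, 0.0], [1.5, 0.0], [-1.0, 0.0], [-2.0, 0.0], [-1.5, 0.0], [-2.5, 0.0]]
Y_TOY = [1, 1, -1, -1, -1, -1]

w, b = fit_sparse_balanced_svm(X_TOY, Y_TOY)
predictions = [predict(w, b, x) for x in X_TOY]
true_positives = sum(1 for p, t in zip(predictions, Y_TOY) if p == 1 and t == 1)
recall = true_positives / sum(1 for t in Y_TOY if t == 1)
print("weights:", w, "bias:", b)
print("zero fraction:", zero_fraction(w), "minority recall:", recall)
```

In this sketch the class weights raise the cost of misclassifying the minority class, which is the mechanism that targets recall, while the soft-thresholding step drives uninformative coefficients to exactly zero, which is what makes the remaining coefficients readable as a short list of predictors. A real evaluation would replace the toy data with held-out folds from stratified cross-validation and report AUC alongside recall.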

Published

2025-09-19