Development and validation of machine learning models to predict unplanned hospitalizations of patients with diabetes within the next 12 months

A. E. Andreychenko; A. D. Ermak; D. V. Gavrilov; R. E. Novitskiy; A. V. Gusev

doi:10.14341/DM13065

Abstract

BACKGROUND: The incidence of diabetes mellitus (DM) has been rising steadily for several decades, both in the Russian Federation and worldwide. The steady growth of the patient population and the current epidemiological profile of DM impose enormous economic costs and significant social losses worldwide. The disease often progresses to specific complications, which substantially increases the likelihood of hospitalization. Building and deploying a machine learning model that predicts inpatient hospitalizations of patients with DM would make it possible to personalize medical care and optimize the load on the healthcare system as a whole.

AIM: To develop and validate models that predict unplanned hospitalizations of patients with diabetes caused by the disease itself or its complications, using machine learning algorithms and data from real clinical practice.

MATERIALS AND METHODS: The study included 170,141 depersonalized electronic health records of 23,742 patients with diabetes. Anamnestic, constitutional, clinical, instrumental and laboratory data widely used in routine medical practice were considered as potential predictors, 33 features in total. Logistic regression (LR), gradient boosting methods (LightGBM, XGBoost, CatBoost), decision tree-based methods (RandomForest and ExtraTrees) and a neural network-based algorithm (multi-layer perceptron) were compared. External validation was performed on data from a separate region of the Russian Federation.
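
The abstract does not say how records were partitioned into training and test sets. Because the data set averages about seven records per patient (170,141 records for 23,742 patients), one common precaution is to split by patient so that no patient contributes records to both subsets. The sketch below illustrates that idea only; the `patient_id` key, the `split_by_patient` helper and the 20% test fraction are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that each patient's records land in exactly one subset.

    `records` is a list of dicts carrying a hypothetical "patient_id" key.
    Returns (train_records, test_records).
    """
    # Group records by patient so that patients, not records, are shuffled.
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    # Sort first so the shuffle is reproducible for a given seed.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Reserve at least one patient for the test subset.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test = [r for pid in patient_ids[:n_test] for r in by_patient[pid]]
    train = [r for pid in patient_ids[n_test:] for r in by_patient[pid]]
    return train, test
```

Splitting at the patient level keeps repeated visits of the same person from leaking between subsets, which would otherwise tend to inflate internal test metrics.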

RESULTS: The LightGBM model showed the best performance and the greatest stability on external validation data, with an AUC of 0.818 (95% CI 0.802–0.834) in internal testing and 0.802 (95% CI 0.773–0.832) in external validation.
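
The abstract reports each AUC with a 95% confidence interval but does not state how the intervals were computed. One common approach is a percentile bootstrap over the test cases, sketched below under that assumption. The function names, the 1,000 resamples and the fixed seed are illustrative choices, not details from the study.

```python
import random


def auc(labels, scores):
    """AUC from the Mann-Whitney rank statistic; tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point AUC, lower bound, upper bound) using a percentile bootstrap."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper
```

Calling `bootstrap_auc_ci(y_true, y_score)` separately on the internal test set and on the external validation set would yield the kind of paired estimates quoted above. If several records belong to the same patient, resampling patients rather than records may be more appropriate.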

CONCLUSION: The best model's metrics exceeded those reported in previously published studies. External validation showed that the model is relatively stable on new data from another region, which suggests that it can be applied in real clinical practice.

Keywords

diabetes mellitus, hospitalization, predictive models, machine learning, artificial intelligence

About the Authors

A. E. Andreychenko

K-SkAI LLC
Russian Federation

Anna E. Andreychenko, PhD in Physics and Mathematics.

17 Varkaus Embankment, 185901 Petrozavodsk

Competing Interests:

none

A. D. Ermak

K-SkAI LLC
Russian Federation

Andrey D. Ermak - Data analyst, Artificial Intelligence Department, K-SkAI.

Petrozavodsk

Competing Interests:

none

D. V. Gavrilov

K-SkAI LLC
Russian Federation

Denis V. Gavrilov

Petrozavodsk

Competing Interests:

none

R. E. Novitskiy

K-SkAI LLC
Russian Federation

Roman E. Novitskiy.

Petrozavodsk

Competing Interests:

none

A. V. Gusev

Federal Research Institute for Health Organization and Informatics; Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies
Russian Federation

Alexander V. Gusev - PhD in Engineering, Senior Researcher.

Moscow

Competing Interests:

none

Supplementary files

1. Figure 1. Study design.
2. Figure 2. Algorithm for assigning a case/record to a class with the presence or absence of the target event.
3. Figure 3. Algorithm for selecting the final model.
4. Figure 4. Completeness of feature values (A) and distribution of the number of completed features in treatment cases/records (B) in the final data set.
5. Figure 5. Prediction of hospitalizations in patients with diabetes mellitus. AUC on internal testing and external validation sets for selected models of all investigated machine learning methods. Black vertical lines indicate the limits of the 95% confidence intervals.
6. Figure 6. ROC curve with 95% CI of the final LightGBM model obtained from prediction results on the external validation data set.
7. Figure 7. Top 10 significant features of the best LightGBM model.
8. Figure 8. Probability calibration curve of the best LightGBM model.

For citations:

Andreychenko A.E., Ermak A.D., Gavrilov D.V., Novitskiy R.E., Gusev A.V. Development and validation of machine learning models to predict unplanned hospitalizations of patients with diabetes within the next 12 months. Diabetes mellitus. 2024;27(2):142-157. (In Russ.) https://doi.org/10.14341/DM13065

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).

ISSN 2072-0351 (Print)
ISSN 2072-0378 (Online)

Supplementary file identifiers (all dated 2024-05-08):

Figure 1: https://doi.org/10.14341/DM13065-6996
Figure 2: https://doi.org/10.14341/DM13065-6997
Figure 3: https://doi.org/10.14341/DM13065-6998
Figure 4: https://doi.org/10.14341/DM13065-6999
Figure 5: https://doi.org/10.14341/DM13065-7000
Figure 6: https://doi.org/10.14341/DM13065-7001
Figure 7: https://doi.org/10.14341/DM13065-7002
Figure 8: https://doi.org/10.14341/DM13065-7003
