Early Detection of Subclinical Myocardial Dysfunction in Diabetes Using Leak-Safe Machine Learning on Routine Clinical Data
Abstract Body:
Background Patients with diabetes may develop subclinical myocardial dysfunction despite preserved ejection fraction (EF≥50%) and normal Doppler indices. Speckle-tracking echocardiography (STE) detects these abnormalities earlier, but it is resource-intensive and not universally available. A machine-learning model using routine clinical data could identify high-risk patients for targeted imaging and earlier intervention.
Objective To develop and validate a leakage-safe machine-learning approach for identifying subclinical myocardial dysfunction (defined as abnormal STE despite EF≥50% and normal Doppler parameters) in adults with diabetes.
Methods We analyzed a single-center cohort of adults with diabetes. Positive class: EF≥50%, normal Doppler indices (E/e′≤14, tricuspid regurgitation velocity≤2.8 m/s, left atrial volume index≤34 mL/m²), and abnormal global longitudinal strain (GLS). Negative class: the same EF/Doppler criteria with normal GLS. Patients not meeting the EF/Doppler criteria were excluded. Input features included demographics, vitals, anthropometrics, comorbidities, medications, and laboratory values. All STE variables were excluded from the inputs to prevent circularity. Preprocessing (median imputation, one-hot encoding, scaling) used a unified pipeline. Grouped 5-fold cross-validation (CV) by patient ID prevented patient-level leakage. Primary metrics were the area under the receiver operating characteristic curve (AUROC) and average precision (AP). Models tested: logistic regression, random forest, and XGBoost.
Results Among 233 eligible patients, 199 (85.4%) were GLS-abnormal and 34 (14.6%) were GLS-normal. XGBoost achieved perfect discrimination (AUROC 1.000, AP 1.000) in both the cross-validation and out-of-fold (OOF) evaluations. Random forest performed strongly (CV AUROC 0.963±0.022, AP 0.994±0.003; OOF AUROC 0.962, AP 0.994), surpassing logistic regression (CV AUROC 0.788±0.059, AP 0.961±0.008; OOF AUROC 0.800, AP 0.961). Routine clinical variables thus identified STE-positive patients with high accuracy in this cohort. Given the perfect XGBoost performance in a small, imbalanced cohort, sensitivity analyses and external validation are warranted.
Conclusions Leakage-aware ML using standard clinical data can flag diabetic patients with STE-defined subclinical dysfunction despite preserved EF and normal Doppler indices, enabling earlier risk stratification and targeted imaging before symptomatic heart failure. Future work includes multi-center validation, calibration assessment, and clinical workflow integration.
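The evaluation scheme described in the Methods (median imputation, one-hot encoding, and scaling inside one pipeline, scored with grouped 5-fold cross-validation by patient ID) might look roughly like the following scikit-learn sketch. It is an illustration only, not the authors' code: the data are synthetic, the column names (`patient_id`, `age`, `hba1c`, `sex`) are hypothetical, repeated patient rows are assumed purely to show why grouping matters, and only logistic regression is shown in place of the three models compared in the abstract.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the study cohort); all column names are hypothetical.
rng = np.random.default_rng(0)
n = 240
df = pd.DataFrame({
    "patient_id": rng.integers(0, 120, size=n),  # repeated IDs mimic repeat visits
    "age": rng.normal(60, 10, size=n),
    "hba1c": rng.normal(7.5, 1.2, size=n),
    "sex": rng.choice(["F", "M"], size=n),
})
df.loc[rng.choice(n, size=20, replace=False), "hba1c"] = np.nan  # simulate missing labs

# Synthetic binary label (1 = "GLS-abnormal"), loosely tied to age plus noise.
signal = 0.08 * (df["age"] - 60) + rng.normal(0, 1, size=n)
y = (signal > -1.0).astype(int).to_numpy()

numeric = ["age", "hba1c"]
categorical = ["sex"]
X = df[numeric + categorical]
groups = df["patient_id"].to_numpy()

# All preprocessing lives inside the pipeline, so imputation medians and
# scaling statistics are learned from the training fold only.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grouped 5-fold CV: every row of a given patient falls in exactly one test fold.
cv = GroupKFold(n_splits=5)
oof = np.zeros(n)
fold_auroc = []
for train_idx, test_idx in cv.split(X, y, groups):
    # No patient may appear on both sides of the split.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
    fitted = clone(model).fit(X.iloc[train_idx], y[train_idx])
    oof[test_idx] = fitted.predict_proba(X.iloc[test_idx])[:, 1]
    fold_auroc.append(roc_auc_score(y[test_idx], oof[test_idx]))

# Per-fold mean and spread ("CV") versus pooled out-of-fold ("OOF") metrics.
oof_auroc = roc_auc_score(y, oof)
oof_ap = average_precision_score(y, oof)
print(f"CV AUROC {np.mean(fold_auroc):.3f} ± {np.std(fold_auroc):.3f}")
print(f"OOF AUROC {oof_auroc:.3f}, OOF AP {oof_ap:.3f}")
```

When reading AP values from such a run, it helps to recall that the expected AP of an uninformative classifier is approximately the positive-class prevalence, which would be about 0.854 for a cohort with 199 positives among 233 patients.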
Tran, Tam (Washington University School of Medicine, Saint Louis, Missouri, United States)
Lee, Wei Jun (SUNY Downstate Health Sciences University, Brooklyn, New York, United States)
Nguyen, Dang (Harvard University, Cambridge, Massachusetts, United States)
Marzouk, Sammer (Northwestern Feinberg School of Medicine, Chicago, Illinois, United States)
Le, Trang (Cardiovascular Research, Methodist, Brooklyn, New York, United States)
Truong, Hieu (Prime Saint Francis Hospital, Evanston, Illinois, United States)
Erkelens, Bryce (University of Southern California, Los Angeles, California, United States)
Huynh, Han (Cardiovascular Research, Methodist, Brooklyn, New York, United States)
Le, Minh (Taipei Medical University, Houston, Texas, United States)