Early Detection of Subclinical Myocardial Dysfunction in Diabetes Using Leak-Safe Machine Learning on Routine Clinical Data
Abstract Body:
Background Diabetic patients may develop subclinical myocardial dysfunction despite preserved ejection fraction (EF≥50%) and normal Doppler indices. Speckle-tracking echocardiography (STE) detects these abnormalities earlier, but STE is resource-intensive and not universally available. A machine-learning model using routine clinical data could identify high-risk patients for targeted imaging and earlier intervention.
Objective To develop and validate a leakage-safe machine-learning approach identifying subclinical myocardial dysfunction (defined as abnormal STE despite EF≥50% and normal Doppler parameters) in adults with diabetes.
Methods We analyzed a single-center cohort of diabetic adults. Positive class: EF≥50%, normal Doppler indices (E/e′≤14, tricuspid regurgitation velocity≤2.8 m/s, left atrial volume index≤34 mL/m^2), and abnormal global longitudinal strain (GLS). Negative class: same EF/Doppler criteria with normal GLS; others were excluded. Input features included demographics, vitals, anthropometrics, comorbidities, medications, and laboratories. All STE variables were excluded to prevent circularity. Preprocessing (median imputation, one-hot encoding, scaling) used a unified pipeline. Grouped 5-fold cross-validation by patient ID prevented patient-level leakage. Primary metrics: AUROC and average precision (AP). Models tested: logistic regression, random forest, and XGBoost.
Results Among 233 eligible patients, 199 (85.4%) were GLS-abnormal and 34 (14.6%) GLS-normal. XGBoost achieved perfect discrimination (AUROC 1.000, AP 1.000) in cross-validation and out-of-fold testing. Random Forest performed strongly (CV AUROC 0.963±0.022, AP 0.994±0.003; OOF AUROC 0.962, AP 0.994), surpassing logistic regression (CV AUROC 0.788±0.059, AP 0.961±0.008; OOF AUROC 0.800, AP 0.961). Routine clinical variables accurately identified STE-positive patients. Given exceptional XGBoost performance, sensitivity analyses and external validation are warranted.
Conclusions Leakage-aware ML using standard clinical data can flag diabetic patients with STE-defined subclinical dysfunction despite preserved EF and normal Doppler indices, enabling earlier risk stratification and targeted imaging before symptomatic heart failure. Future work includes multi-center validation, calibration assessment, and clinical workflow integration.
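The leakage-safe evaluation described in the Methods (median imputation, one-hot encoding, and scaling inside a single pipeline, with grouped 5-fold cross-validation by patient ID) could be sketched roughly as follows with scikit-learn. This is only an illustrative sketch, not the authors' code: the data are synthetic, the column names (`patient_id`, `age`, `hba1c`, `sex`) are hypothetical, and logistic regression stands in for the three models compared in the abstract.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import GroupKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in cohort (assumption): two rows per hypothetical patient.
rng = np.random.default_rng(0)
n = 240
df = pd.DataFrame({
    "patient_id": np.repeat(np.arange(n // 2), 2),
    "age": rng.normal(60, 10, n),
    "hba1c": rng.normal(8, 1.5, n),
    "sex": rng.choice(["F", "M"], n),
})
# Inject missing lab values so the imputer has something to do.
df.loc[rng.choice(n, 20, replace=False), "hba1c"] = np.nan
# Synthetic binary label loosely tied to hba1c (not a clinical definition).
y = (df["hba1c"].fillna(8) + rng.normal(0, 1.5, n) > 7).astype(int)

numeric = ["age", "hba1c"]
categorical = ["sex"]

# All preprocessing lives inside the pipeline, so medians, scaling statistics,
# and category levels are learned from the training folds only.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = df[numeric + categorical]
groups = df["patient_id"]
cv = GroupKFold(n_splits=5)

# Sanity check: no patient appears in both the train and test side of a fold.
for train_idx, test_idx in cv.split(X, y, groups):
    assert set(groups.iloc[train_idx]).isdisjoint(groups.iloc[test_idx])

# Out-of-fold (OOF) probabilities: each row is scored by a model that never
# saw that row's patient during fitting.
oof = cross_val_predict(
    model, X, y, groups=groups, cv=cv, method="predict_proba"
)[:, 1]

oof_auroc = roc_auc_score(y, oof)
oof_ap = average_precision_score(y, oof)
print(f"OOF AUROC={oof_auroc:.3f}  OOF AP={oof_ap:.3f}")
```

Because a classifier with no skill has an expected AP close to the positive-class prevalence (85.4% in the cohort reported above), AP values are best read against that baseline rather than against 0.5.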
Tran, Tam (Washington University School of Medicine, Saint Louis, Missouri, United States)
Lee, Wei Jun (SUNY Downstate Health Sciences University, Brooklyn, New York, United States)
Nguyen, Dang (Harvard University, Cambridge, Massachusetts, United States)
Marzouk, Sammer (Northwestern Feinberg School of Medicine, Chicago, Illinois, United States)
Le, Trang (Cardiovascular Research, Methodist, Brooklyn, New York, United States)
Truong, Hieu (Prime Saint Francis Hospital, Evanston, Illinois, United States)
Erkelens, Bryce (University of Southern California, Los Angeles, California, United States)
Huynh, Han (Cardiovascular Research, Methodist, Brooklyn, New York, United States)
Le, Minh (Taipei Medical University, Houston, Texas, United States)