Prediction of Mortality and Morbidity After Acute Myocardial Infarction Using Medication-Enriched Real-World Clinical Data and Machine Learning
Background: Despite advances in acute myocardial infarction (AMI) management, long-term risks such as cardiovascular mortality, recurrent AMI, and heart failure remain substantial. Traditional risk scores often lack sufficient accuracy because they do not capture the nonlinear interplay among clinical variables. Machine learning (ML) models can leverage high-dimensional structured data to improve cardiovascular risk prediction.
Objective: To develop and validate ML models using real-world electronic health records (EHRs) to predict four post-AMI outcomes (cardiovascular mortality, all-cause mortality, recurrent AMI, and heart failure), and to compare model performance and interpretability across algorithms.
Methods: In this retrospective cohort study, we analyzed 3,929 patients with first-time AMI from three tertiary hospitals in Taiwan (2014–2021). Patients with prior percutaneous coronary intervention (PCI), coronary artery bypass grafting (CABG), or heart failure were excluded. A total of 49 clinical and pharmacological predictors were derived from demographics, comorbidities, laboratory values, vital signs, and medications. Preprocessing included multiple imputation, z-score standardization and Yeo-Johnson transformation, and class-imbalance correction with Random Over-Sampling Examples (ROSE). Random Forest, AdaBoost, and XGBoost models were trained and evaluated with stratified 5-fold cross-validation. Performance was assessed using the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score, and SHapley Additive exPlanations (SHAP) values were used for model interpretation.
Results: Among the four outcomes, model performance was most robust for all-cause and cardiovascular mortality. For all-cause mortality, Random Forest achieved the highest AUC (0.882) and F1-score (0.734), with age, serum creatinine, and body weight identified as the top predictors. For cardiovascular mortality, AdaBoost performed best (AUC 0.779), with strong contributions from renal function, cardiac enzymes, and specific medications. Performance for recurrent AMI and heart failure was comparatively modest, so these outcomes are not emphasized. SHAP analysis revealed nonlinear, heterogeneous feature effects across individual patients.
Conclusions: ML models trained on EHR data can effectively predict mortality after AMI. Random Forest and AdaBoost showed strong, interpretable performance for all-cause and cardiovascular mortality, respectively. These findings highlight the potential of explainable ML for post-AMI risk stratification and long-term care planning.
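The evaluation protocol described in Methods (stratified 5-fold cross-validation, ROSE class-imbalance correction, and AUC, precision, recall, and F1-score) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code. All function names are hypothetical. The oversampler is a simplified ROSE-style smoothed bootstrap (Gaussian jitter with a Silverman-type bandwidth) that only oversamples the minority class; it is not the R ROSE package. Applying it only inside training folds is also an assumption, since the abstract does not say where ROSE was applied. The `fit` argument is a hypothetical stand-in for training Random Forest, AdaBoost, or XGBoost and must return a function that maps one patient row to a risk score.

```python
import random
import statistics
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split row indices into k folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across the folds
    return folds


def rose_like_oversample(X, y, minority=1, shrink=1.0, seed=0):
    """Balance a binary training set with a simplified ROSE-style smoothed bootstrap.

    Minority rows are drawn with replacement and jittered with Gaussian noise whose
    per-feature scale is a Silverman-type bandwidth times that feature's standard
    deviation. Unlike the R ROSE package, majority rows are kept unchanged.
    """
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_min, d = len(minority_rows), len(X[0])
    n_maj = len(y) - n_min
    h = shrink * (4.0 / ((d + 2) * n_min)) ** (1.0 / (d + 4))
    sds = [statistics.pstdev(col) for col in zip(*minority_rows)]
    X_out, y_out = [list(row) for row in X], list(y)
    for _ in range(n_maj - n_min):
        base = rng.choice(minority_rows)
        X_out.append([v + rng.gauss(0.0, h * s) for v, s in zip(base, sds)])
        y_out.append(minority)
    return X_out, y_out


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1-score for binary labels (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true, scores, positive=1):
    """ROC AUC: chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, fit, k=5, threshold=0.5, seed=0):
    """Stratified k-fold CV; oversampling touches only training folds, never the test fold."""
    results = []
    for test_idx in stratified_kfold(y, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr, y_tr = rose_like_oversample(
            [X[i] for i in train_idx], [y[i] for i in train_idx], seed=seed
        )
        score = fit(X_tr, y_tr)  # hypothetical model: returns row -> risk score
        y_te = [y[i] for i in test_idx]
        s_te = [score(X[i]) for i in test_idx]
        p, r, f = precision_recall_f1(y_te, [int(s >= threshold) for s in s_te])
        results.append({"auc": auc(y_te, s_te), "precision": p, "recall": r, "f1": f})
    return results
```

Keeping the oversampling inside the loop means synthetic patients never appear in a test fold, so each fold's metrics are computed on the original, imbalanced class distribution.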
Hsiao, Buyuan (Taipei Medical University Hospital, Taipei, Taiwan)
Author Disclosures:
Buyuan Hsiao: Does not have relevant financial relationships.