Explainable Machine Learning-Based Identification of Clinical and Nutritional Determinants of Cardiovascular Diseases
Background: Cardiovascular diseases (CVDs) remain the leading cause of death globally. Traditional risk models have emphasized adverse clinical factors, yet few studies have jointly quantified potentially protective nutritional influences alongside clinical measures.
Hypothesis: We hypothesize that integrating clinical and dietary variables will accurately discriminate prevalent CVDs, with age, hypertension, kidney function, and lipid measures as dominant features, and that SHapley Additive exPlanations (SHAP) will yield clinically coherent directions of effect.
Methods: We conducted a retrospective cross-sectional analysis of adults aged ≥20 years from the National Health and Nutrition Examination Survey (NHANES). Prevalent CVD was defined as self-reported heart failure, coronary heart disease, angina, myocardial infarction, or stroke. Elastic-net (EN) logistic regression, random forest (RF), and XGBoost machine learning (ML) models were trained on an 80/20 stratified train/test split. Test-set performance was assessed with the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, precision, and F1 score. Predictor contributions were interpreted with SHAP for clinical-only, intake-only, and combined models.
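The evaluation protocol in Methods (a stratified 80/20 split, then threshold-based metrics and AUROC on the held-out set) can be illustrated with a small, dependency-free Python sketch. It is not the authors' pipeline: the function names, the fixed seed, and the pairwise (Mann-Whitney) AUROC estimator are illustrative assumptions, and a real analysis would more likely rely on a library such as scikit-learn (for example, `train_test_split(..., stratify=y)`).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=42):
    """Split row indices into train/test sets, keeping each class's proportion
    (e.g., ~11% CVD cases) approximately equal in both partitions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-class test count
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(y_true, y_pred):
    """Threshold-dependent metrics from the 2x2 confusion matrix (1 = CVD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(a, b):
        return a / b if b else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall
    precision = ratio(tp, tp + fp)    # positive predictive value
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random case scores higher than a random
    non-case (ties count 1/2); O(n_pos * n_neg), fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because precision depends on prevalence as well as on sensitivity and specificity, a model operating at high sensitivity can show modest precision when only about one in nine participants is a case, even with good AUROC. For example, with a hypothetical sensitivity of 0.80, specificity of 0.75, and prevalence of 0.111, precision is 0.80 × 0.111 / (0.80 × 0.111 + 0.25 × 0.889) ≈ 0.29; these operating values are illustrative, not figures from this study.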
Results: Among 22,516 participants, 2,502 (11.1%) reported CVD. All ML models showed strong discrimination: AUROC was 0.862 for XGBoost, 0.857 for EN, and 0.854 for RF. XGBoost achieved the highest accuracy (0.769), and RF the highest sensitivity (0.836). Precision ranged from 0.278 to 0.298, and the highest F1 score was 0.434 (XGBoost). SHAP results were clinically concordant. In the clinical-only model, age, hypertension, lower estimated glomerular filtration rate (eGFR), smoking, and lipid measures were most influential; diabetes and HbA1c contributed moderately. In the intake-only model, higher intakes of potassium, protein, calcium, iron, folate, and vitamin D were among the most informative features. In the combined model, age and hypertension remained dominant, with additional contributions from non-HDL cholesterol, income-to-poverty ratio, eGFR, diabetes, bilirubin, smoking, HbA1c, and HDL cholesterol.
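SHAP attributes a model's output for one participant to its input features using Shapley values, and the sign of each value is what supplies a "direction of effect." The brute-force sketch below computes exact Shapley values by enumerating feature coalitions, with features outside a coalition set to a single reference (baseline) profile. The three-feature logistic model, its coefficients, the feature scaling, and the baseline are hypothetical and are not estimates from this study. Exhaustive enumeration is only feasible for a handful of features, which is why SHAP implementations rely on model-specific algorithms (such as TreeSHAP for tree ensembles) or approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x.

    Features outside a coalition are set to their baseline values
    (single-reference, interventional formulation). Cost grows as 2**n,
    so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 (excluding feature i)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                z_without = [x[j] if j in coalition else baseline[j] for j in range(n)]
                z_with = list(z_without)
                z_with[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(z_with) - f(z_without))
    return phi


# Hypothetical logistic model on the log-odds scale (illustrative coefficients):
# age in decades, hypertension (0/1), eGFR in units of 10 mL/min/1.73 m^2.
WEIGHTS = [0.6, 0.8, -0.3]
INTERCEPT = -3.0


def log_odds(z):
    return INTERCEPT + sum(w * v for w, v in zip(WEIGHTS, z))


if __name__ == "__main__":
    person = [7.0, 1.0, 6.0]     # age 70, hypertensive, eGFR 60
    reference = [5.0, 0.0, 9.0]  # age 50, normotensive, eGFR 90
    phi = shapley_values(log_odds, person, reference)
    for name, value in zip(["age", "hypertension", "eGFR"], phi):
        # positive = raises the log-odds of CVD relative to the reference profile
        print(f"{name:>12}: {value:+.3f}")
```

For a linear log-odds model with a single baseline, each value reduces to w_i(x_i - b_i), and the values sum to f(x) - f(baseline). A negative eGFR coefficient combined with a below-baseline eGFR therefore yields a positive contribution, which is the kind of signed reading behind a statement such as "lower eGFR was influential."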
Conclusions: Age and hypertension were the principal determinants, with kidney function and lipid measures strongly influential; dietary intake added smaller, complementary contributions. These findings support interpretable ML-based risk profiling that integrates routine clinical data with lifestyle information, although rigorous phenotype definitions are needed.
Le, Minh (Taipei Medical University, Taipei, Taiwan)
Chau, Lam (School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States)
Rutledge-jukes, Heath (Washington University in St. Louis, St. Louis, Missouri, United States)
Jonnalagadda, Pallavi (Washington University in St. Louis, St. Louis, Missouri, United States)
Sabet, Cameron (Georgetown Medicine, Washington, District of Columbia, United States)
Ashar, Perisa (Duke University, Durham, North Carolina, United States)
Tamirisa, Ketan (Washington University in St. Louis, St. Louis, Missouri, United States)
Olaniran, Olabiyi (Harvard T.H. Chan School of Public Health, Harvard University, College Park, Maryland, United States)
Natsume-kitatani, Yayoi (National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan)
Nguyen, Thanh-huy (School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States)
Tran, Tam (Washington University School of Medicine, Saint Louis, Missouri, United States)
Vu, Thien (National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan)
Xu, Min (Carnegie Mellon University, Pittsburgh, Pennsylvania, United States)
Huynh, Phat (North Carolina A&T State University, Greensboro, North Carolina, United States)
Kpodonu, Jacques (Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, United States)
Le, Nguyen Quoc Khanh (College of Medicine, Taipei Medical University, Taipei, Taiwan)
Vinh, Tuan (Oxford University, Oxford, United Kingdom)
Nguyen, Dang (Harvard T.H. Chan School of Public Health, Harvard University, College Park, Maryland, United States)
Huynh, Han (Taipei Medical University, Taipei, Taiwan)
Nguyen, Le Kim Chi (National Cerebral and Cardiovascular Center, Suita, Osaka, Japan)
Nguyen, Tu N (Woolcock Institute of Medical Research, Ho Chi Minh, Viet Nam)
Nguyen, Thanh T. (University of Sydney, Sydney, New South Wales, Australia)
Le, Thu Huynh Minh (School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States)