AHA Conference Repository

American Heart Association

Final ID: MPTU05

Explainable Stroke Risk Prediction Using Machine Learning and Large Language Models: Toward a Mobile-Enabled Clinical Decision Support Application

Abstract Body: Background: Stroke is the second leading cause of death worldwide, accounting for approximately eleven percent of all deaths. Effective prevention depends on early identification of at-risk individuals and clear communication of modifiable risk factors. Traditional risk calculators often lack transparency and adaptability for diverse clinical settings. Advances in artificial intelligence now allow machine learning to be combined with natural language generation to improve both prediction accuracy and patient communication. We therefore evaluated machine learning models for stroke risk prediction and LLM-based natural language generation for communicating model-driven risk to patients.

Methods: A dataset of 5,110 patient records containing demographic, behavioral, and clinical variables such as age, hypertension, heart disease, body mass index, glucose level, and smoking status was analyzed. Multiple machine learning algorithms (logistic regression, decision tree, random forest, gradient-boosted trees) were trained and compared for stroke prediction. Model interpretability was assessed using SHAP-based feature attribution to identify key predictors. Large language models (e.g., GPT, Claude, Gemini) were used to generate patient-friendly explanations, providing reasoning and context alongside predictions. Semantic consistency and differences across LLM explanations were compared. A mobile-responsive prototype tool was developed, allowing participants to enter individual variable values and receive risk prediction and interpretive feedback via QR code.
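The feature-attribution step can be illustrated with a minimal sketch. The abstract does not give the fitted models or the SHAP configuration, so the code below is not the authors' pipeline: it computes exact Shapley values by enumerating feature coalitions, the quantity that SHAP-style methods estimate, for a hypothetical three-feature logistic risk model. The weights, bias, baseline patient, and the names `predict` and `shapley_values` are all assumptions made for illustration.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical toy risk model: weights and bias are illustrative only,
# NOT fitted to the 5,110-record dataset described in the abstract.
WEIGHTS = {"age": 0.04, "hypertension": 0.9, "glucose": 0.01}
BIAS = -5.0


def predict(features):
    """Return a logistic risk score in (0, 1) for a dict of feature values."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))


def shapley_values(patient, baseline):
    """Exact Shapley attribution of predict(patient) - predict(baseline).

    Features outside a coalition are set to their baseline value.
    """
    names = list(patient)
    n = len(names)
    contributions = {}
    for target in names:
        others = [name for name in names if name != target]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = {
                    k: (patient[k] if k in subset else baseline[k]) for k in names
                }
                with_target = dict(without)
                with_target[target] = patient[target]
                total += weight * (predict(with_target) - predict(without))
        contributions[target] = total
    return contributions


# Hypothetical patient and reference ("baseline") profile.
patient = {"age": 72, "hypertension": 1, "glucose": 180}
baseline = {"age": 45, "hypertension": 0, "glucose": 100}

phi = shapley_values(patient, baseline)
for name, value in sorted(phi.items(), key=lambda item: -abs(item[1])):
    print(f"{name:>12}: {value:+.4f}")
```

The attributions sum to the difference between the patient's predicted risk and the baseline's, which is what makes a per-feature breakdown usable as input to a plain-language explanation. Enumeration is only practical for a handful of features; tree-based models would normally rely on an approximating or model-specific algorithm instead.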

Results: Random Forest achieved the highest performance with 97% accuracy, 97% precision, 97% recall, and 97% F1-score, followed by Decision Tree (94% accuracy, 94% precision, 94% recall, 94% F1-score), Voting Classifier (91% accuracy, 91% precision, 91% recall, 91% F1-score), and Logistic Regression (87% accuracy). Feature attribution highlighted the most influential clinical and behavioral predictors. LLMs successfully generated individualized, understandable explanations of predicted risk, enabling patients to grasp modifiable factors. Comparative analysis of LLM outputs revealed subtle differences in reasoning and communication style.
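For reference, the four reported metrics can be computed from true and predicted labels as in the sketch below. The abstract does not state how per-class scores were averaged, so this example assumes scores for the positive (stroke) class only; the function name `classification_metrics` and the toy labels are hypothetical and unrelated to the study data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (1 = stroke, 0 = no stroke); illustrative only.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy case accuracy is 0.80 while precision, recall, and F1 for the positive class are each 0.50, showing that the four metrics need not coincide.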

Conclusions: This work presents an explainable and accessible stroke prediction framework combining machine learning and LLMs. This approach enhances patient engagement, promotes early intervention, and supports equitable stroke prevention.

Gupta, Isheeta (Washington University in St. Louis, Saint Louis, Missouri, United States)
Thaker, Vishrut (Emory University School of Medicine, Morrow, Georgia, United States)
Pandey, Saugat (Washington University in St. Louis, University City, Missouri, United States)
Yepuri, Harita (University of California San Francisco School of Medicine, San Francisco, California, United States)
Bita Ongolo, Pierre Manuel (Emory University School of Medicine, Morrow, Georgia, United States)
Jaiswal, Vikash (JCCR Cardiology Research, Jaunpur, India)

Author Disclosures:

Meeting Info:

EPI-Lifestyle Scientific Sessions 2026

2026

Boston, Massachusetts

Session Info:

Health Tech/Big Data/Machine Learning/AI + Mobile Health Tech and Wearables

Tuesday, 03/17/2026, 05:00PM - 07:00PM

Moderated Poster Session
