AHA Conference Repository

American Heart Association

Final ID: MP2346

Phenotyping Cardiac Surgery Patients Using Retrieval-Augmented Large Language Models

Introduction:
Large Language Models (LLMs) are powerful tools for text extraction, but their tendency to hallucinate limits their reliability in clinical domains. We present a novel application of retrieval-augmented generation (RAG) to reduce hallucinations. Our approach restricts context to short, high-similarity segments within cardiac imaging reports, enabling more focused, conservative inference. We applied RAG to extract echocardiographic features from intraoperative transesophageal echocardiography (TEE) reports in a mixed cardiac surgery population to identify distinct patient phenotypes.
Hypothesis:
We hypothesized that RAG would outperform direct LLM querying in extracting key echocardiographic features by reducing hallucinations. We aimed to group patients into clinically meaningful clusters by their echocardiographic features.
Methods:
We developed a RAG pipeline that restricts LLM input to the most semantically relevant portions of TEE reports (Figure 1). We validated this pipeline on 500 manually labeled reports, extracting pre- and post-intervention left ventricular ejection fraction (LVEF), tricuspid regurgitation (TR), and right ventricular systolic function (RVSF), as well as pre-intervention aortic stenosis (AS), aortic regurgitation (AR), and mitral regurgitation (MR). RAG performance was compared to direct querying on these validation reports. Next, the pipeline was scaled to 7106 TEE reports to extract the features and intervention types. Patients were clustered using k-means, and each cluster’s characteristics were analyzed.
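The retrieval step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the embedding model, segment length, similarity metric, or prompt wording, so a bag-of-words cosine similarity stands in for a semantic embedding, and the names split_segments, retrieve, build_prompt, top_k, and min_score are all hypothetical.

```python
import math
import re
from collections import Counter


def split_segments(report: str) -> list[str]:
    """Split a report into short, sentence-level segments."""
    parts = re.split(r"(?<=[.;])\s+|\n+", report)
    return [p.strip() for p in parts if p.strip()]


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(report: str, query: str, top_k: int = 2, min_score: float = 0.2) -> list[str]:
    """Return up to top_k segments whose similarity to the query is at least min_score."""
    q = bag_of_words(query)
    scored = [(cosine(bag_of_words(s), q), s) for s in split_segments(report)]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [segment for _, segment in kept[:top_k]]


def build_prompt(report: str, feature: str):
    """Build an LLM prompt from retrieved segments only.

    Returns None when nothing relevant is retrieved, so the caller can
    record "not found" without querying the model at all.
    """
    context = retrieve(report, feature)
    if not context:
        return None
    joined = "\n".join(context)
    return (
        f"Using only the text below, report the {feature}. "
        f"If it is not stated, answer 'not found'.\n\n{joined}"
    )


report = (
    "Pre-bypass: left ventricular ejection fraction 55%. "
    "Mild tricuspid regurgitation. "
    "Post-bypass: left ventricular ejection fraction 60%."
)
lvef_segments = retrieve(report, "left ventricular ejection fraction")
as_prompt = build_prompt(report, "aortic stenosis")
```

In this toy report, the query for ejection fraction retrieves only the two LVEF sentences, and the query for aortic stenosis retrieves nothing, so no prompt is built. A real pipeline would replace bag_of_words with an embedding model and tune the threshold on labeled reports.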
Results:
RAG’s conservative behavior, favoring “not found” over potential fabrications, resulted in fewer hallucinations than direct LLM queries (Figure 2). RAG improved adjusted accuracy for all nine validation features (LVEF pre: +1.24%, LVEF post: +0.47%, TR pre: +3.64%, TR post: +4.67%, RVSF pre: +5.31%, RVSF post: +4.33%, AS pre: +11.44%, AR pre: +3.93%, MR pre: +1.94%). Clustering revealed five distinct phenotypes: (1) an aortic disease group, (2) a CABG-dominant low-risk group, (3) an advanced heart failure group, (4) a mixed valve disease group, and (5) a tricuspid disease group (Table 1).
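The k-means clustering step named in Methods can be illustrated with a small, self-contained sketch. Everything about the data here is an assumption: the abstract does not state how extracted features were encoded or scaled, or how the number of clusters was chosen. The toy vectors below (LVEF as a fraction, severity grades scaled to 0 to 1) and the choice of k=2 are hypothetical and serve only to show the algorithm; the study itself reports five clusters.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical encoding: (LVEF fraction, AS severity 0-1, TR severity 0-1).
patients = [
    (0.60, 0.00, 0.00),
    (0.55, 0.00, 0.33),
    (0.65, 0.00, 0.00),
    (0.20, 1.00, 0.67),
    (0.25, 0.67, 1.00),
    (0.15, 1.00, 1.00),
]
labels, centroids = kmeans(patients, k=2)
```

On this toy data the first three and last three patients fall into separate clusters. Characterizing each cluster then amounts to summarizing the feature values of its members, for example by inspecting the centroids.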
Conclusions:
Our RAG pipeline improves the reliability of LLM-based clinical data extraction from TEE reports, enabling large-scale phenotyping of heterogeneous cardiac surgery populations. This approach has potential applications for personalized risk stratification and targeted clinical decision support in cardiac surgery.

Goldfinger, Shir (University of Pennsylvania, Cherry Hill, New Jersey, United States)
Chan, Trevor (University of Pennsylvania, Philadelphia, Pennsylvania, United States)
Grasfield, Rachel (Des Moines University, Des Moines, Iowa, United States)
Eswar, Vikram (University of Pennsylvania, Cherry Hill, New Jersey, United States)
Li, Kelly (Harvard University, Boston, Massachusetts, United States)
Cao, Quy (University of Pennsylvania, Cherry Hill, New Jersey, United States)
Pouch, Alison (University of Pennsylvania, Cherry Hill, New Jersey, United States)
Mackay, Emily (University of Pennsylvania, Cherry Hill, New Jersey, United States)

Author Disclosures:

Shir Goldfinger:

DO NOT have relevant financial relationships

Trevor Chan:

DO NOT have relevant financial relationships

Rachel Grasfield:

No Answer

Vikram Eswar:

No Answer

Kelly Li:

DO NOT have relevant financial relationships

Quy Cao:

No Answer

Alison Pouch:

DO NOT have relevant financial relationships

Emily Mackay:

No Answer

Meeting Info:

Scientific Sessions 2025

2025

New Orleans, Louisiana

Session Info:

Artificial Intelligence in Mechanistic and Preclinical Research: Bridging Discovery and Translational Science

Monday, 11/10/2025, 01:45PM - 02:40PM

Moderated Digital Poster Session
