American Heart Association

Final ID: MP2346

Phenotyping Cardiac Surgery Patients Using Retrieval-Augmented Large Language Models

Introduction:
Large Language Models (LLMs) are powerful tools for text extraction, but their tendency to hallucinate limits their reliability in clinical domains. We present a novel application of retrieval-augmented generation (RAG) to reduce hallucinations. Our approach restricts context to short, high-similarity segments within cardiac imaging reports, enabling more focused, conservative inference. We applied RAG to extract echocardiographic features from intraoperative transesophageal echocardiography (TEE) reports in a mixed cardiac surgery population to identify distinct patient phenotypes.
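The retrieval step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The abstract does not specify the embedding model, segment length, number of retrieved segments, or similarity cutoff. This sketch therefore splits a report into sentences and uses a bag-of-words cosine similarity as a hypothetical stand-in for a neural embedding. It keeps at most `top_k` segments scoring at least `min_sim`. All function names and parameter values are hypothetical, and the LLM call itself is omitted.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_segments(report: str) -> list[str]:
    """Assumed segmentation rule: one short segment per sentence or clause."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", report) if s.strip()]


def retrieve(report: str, query: str, top_k: int = 2, min_sim: float = 0.2) -> list[str]:
    """Return up to top_k segments most similar to the query, dropping weak matches."""
    q = embed(query)
    scored = [(cosine(embed(seg), q), seg) for seg in split_segments(report)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for sim, seg in scored[:top_k] if sim >= min_sim]


def build_prompt(report: str, feature: str, query: str) -> str | None:
    """Build a restricted-context prompt, or None when nothing relevant is retrieved."""
    context = retrieve(report, query)
    if not context:
        return None  # caller records "not found" without querying the LLM
    excerpt = "\n".join(context)
    return (f"Using ONLY the excerpt below, report the {feature}. "
            f"If it is not stated, answer 'not found'.\n\nExcerpt:\n{excerpt}")


report = ("Pre-bypass: The left ventricle is normal in size. LVEF is estimated at 55%. "
          "Mild tricuspid regurgitation. The aortic valve is trileaflet with no stenosis.")
print(retrieve(report, "LVEF ejection fraction"))  # ['LVEF is estimated at 55%.']
```

When no segment clears the cutoff, `build_prompt` returns `None`. This is one way to realize the conservative "not found" behavior: the model is never asked to answer from irrelevant text.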
Hypothesis:
We hypothesized that RAG would outperform direct LLM querying in extracting key echocardiographic features by reducing hallucinations. We further aimed to group patients into clinically meaningful clusters based on their echocardiographic features.
Methods:
We developed a RAG pipeline that restricts LLM input to the most semantically relevant portions of each TEE report (Figure 1). We validated this pipeline on 500 manually labeled reports, extracting pre- and post-intervention left ventricular ejection fraction (LVEF), tricuspid regurgitation (TR), and right ventricular systolic function (RVSF), as well as pre-intervention aortic stenosis (AS), aortic regurgitation (AR), and mitral regurgitation (MR). RAG performance on these validation reports was compared with that of direct querying. The pipeline was then scaled to 7,106 TEE reports to extract these features and the intervention type. Patients were clustered using k-means, and the characteristics of each cluster were analyzed.
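The clustering step can be illustrated with a plain k-means (Lloyd's algorithm) sketch. The abstract reports k-means with five resulting phenotypes but not the feature encoding, scaling, or initialization. The following therefore makes several assumptions. Each patient is a vector of hypothetical ordinal grades (0 = none or normal to 3 = severe) for LVEF reduction, AS, AR, MR, and TR. Seeding uses a deterministic farthest-point initialization, a simple alternative to k-means++. The data are six made-up patients, clustered with k = 3.

```python
from __future__ import annotations


def sq_dist(p: list[float], q: list[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first patient, then repeatedly add
    the patient farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy patients: [LVEF reduction, AS, AR, MR, TR] as hypothetical 0-3 grades.
patients = [
    [0, 3, 1, 0, 0],  # aortic-predominant
    [0, 3, 2, 0, 0],
    [3, 0, 0, 2, 1],  # reduced LVEF with MR
    [3, 0, 0, 3, 1],
    [0, 0, 0, 1, 3],  # tricuspid-predominant
    [0, 0, 0, 0, 3],
]
labels, centroids = kmeans(patients, k=3)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

All features share the same 0 to 3 scale here, so no standardization is applied. Mixing continuous LVEF values with graded severities would normally call for scaling first. In practice, a library implementation such as scikit-learn's `KMeans` (with `k-means++` initialization and several restarts) would be the usual choice.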
Results:
RAG’s conservative behavior, favoring “not found” over potential fabrications, resulted in fewer hallucinations than direct LLM querying (Figure 2). RAG improved adjusted accuracy across all validation features (LVEF pre: +1.24%, LVEF post: +0.47%, TR pre: +3.64%, TR post: +4.67%, RVSF pre: +5.31%, RVSF post: +4.33%, AS pre: +11.44%, AR pre: +3.93%, MR pre: +1.94%). Clustering revealed five distinct phenotypes: (1) an aortic disease group, (2) a CABG-dominant low-risk group, (3) an advanced heart failure group, (4) a mixed valve disease group, and (5) a tricuspid disease group (Table 1).
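The validation comparison can be sketched as a per-feature score of each method against the manual labels. The abstract does not define "adjusted accuracy" or state how hallucinations were counted. This sketch therefore uses plain exact-match accuracy and assumes a hallucination is any concrete value reported where the manual label is "not found". The labels and predictions below are invented toy values, not study data, and the function names are hypothetical.

```python
from __future__ import annotations

NOT_FOUND = "not found"


def score(preds: list[str], labels: list[str]) -> tuple[float, int]:
    """Plain exact-match accuracy (%) and hallucination count for one feature."""
    correct = sum(p == y for p, y in zip(preds, labels))
    # Assumed definition: a value was reported although the manual label says the report is silent.
    hallucinations = sum(y == NOT_FOUND and p != NOT_FOUND for p, y in zip(preds, labels))
    return 100.0 * correct / len(labels), hallucinations


def compare(rag: list[str], direct: list[str], labels: list[str]) -> dict[str, float]:
    """Accuracy gain of RAG over direct querying (percentage points) plus hallucination counts."""
    acc_rag, h_rag = score(rag, labels)
    acc_direct, h_direct = score(direct, labels)
    return {"accuracy_gain": acc_rag - acc_direct, "halluc_rag": h_rag, "halluc_direct": h_direct}


# Invented toy values for one feature (e.g. pre-intervention TR) across four reports.
labels = ["mild", NOT_FOUND, "severe", NOT_FOUND]
rag = ["mild", NOT_FOUND, NOT_FOUND, NOT_FOUND]
direct = ["mild", "moderate", "severe", "mild"]
print(compare(rag, direct, labels))  # {'accuracy_gain': 25.0, 'halluc_rag': 0, 'halluc_direct': 2}
```

The toy example also shows the trade-off behind RAG's conservative behavior. Answering "not found" avoids hallucinations but can miss a finding that is present, as in the third report.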
Conclusions:
Our RAG pipeline improves the reliability of LLM-based clinical data extraction from TEE reports, enabling large-scale phenotyping of heterogeneous cardiac surgery populations. This approach has potential applications for personalized risk stratification and targeted clinical decision support in cardiac surgery.
  • Goldfinger, Shir (University of Pennsylvania, Cherry Hill, New Jersey, United States)
  • Chan, Trevor (University of Pennsylvania, Philadelphia, Pennsylvania, United States)
  • Grasfield, Rachel (Des Moines University, Des Moines, Iowa, United States)
  • Eswar, Vikram (University of Pennsylvania, Cherry Hill, New Jersey, United States)
  • Li, Kelly (Harvard University, Boston, Massachusetts, United States)
  • Cao, Quy (University of Pennsylvania, Cherry Hill, New Jersey, United States)
  • Pouch, Alison (University of Pennsylvania, Cherry Hill, New Jersey, United States)
  • Mackay, Emily (University of Pennsylvania, Cherry Hill, New Jersey, United States)
  • Author Disclosures:
    Shir Goldfinger: DO NOT have relevant financial relationships | Trevor Chan: DO NOT have relevant financial relationships | Rachel Grasfield: No Answer | Vikram Eswar: No Answer | Kelly Li: DO NOT have relevant financial relationships | Quy Cao: No Answer | Alison Pouch: DO NOT have relevant financial relationships | Emily Mackay: No Answer
Meeting Info:

Scientific Sessions 2025

New Orleans, Louisiana
