American Heart Association

Final ID: MDP374

Scalable Phenotyping of Heart Failure Across Multicenter, Non-Interoperable Health Systems Using Retrieval-Augmented Generation and Large Language Models

Background: Identifying patient characteristics is critical to all electronic health record (EHR)-based research, but multicenter studies are impeded by differences in data structures that prevent tools from generalizing across EHRs. Large language models (LLMs) can be paired with retrieval-augmented generation (RAG) to enable EHR-structure-agnostic queries for cohort characterization with minimal a priori knowledge of EHR structure. We developed and validated a tabular RAG model to extract clinical characteristics across multiple domains among patients with heart failure (HF) in 2 distinct health system EHRs.

Methods: Our approach employs a novel RAG architecture that combines information retrieval with a generative text model (Llama2-13b) to enhance data extraction from medical records. The retrieval step identifies data relevant to the query for a clinical feature, and the generative model then synthesizes the output in an interpretable form. We evaluated this model on 1000 HF patients from the Yale New Haven Health System and 1000 deidentified records from Beth Israel Deaconess Medical Center (MIMIC-IV). Clinical knowledge-based queries extracted information from patient records across categorical features (demographics, conditions, and medications) and continuous features (vital signs and labs) [A]. We tested the RAG model's performance against variables manually extracted from the tables.
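
The retrieve-then-generate flow described in the Methods can be illustrated with a minimal sketch. This is not the authors' architecture: the table names, row contents, and helper functions (`serialize_row`, `retrieve`, `build_prompt`) are hypothetical, a bag-of-words cosine score stands in for whatever retriever the study used, and the call to the generative model is omitted.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def serialize_row(table_name, row):
    """Flatten one table row into text so it can be retrieved without
    knowing the EHR schema in advance (hypothetical format)."""
    fields = ", ".join(f"{key}={value}" for key, value in row.items())
    return f"{table_name}: {fields}"


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents with a nonzero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, context):
    """Assemble the text that would be passed to the generative model."""
    lines = "\n".join(f"- {doc}" for doc in context)
    return f"Context:\n{lines}\n\nQuestion: {query}\nAnswer:"


# Hypothetical rows from three differently structured tables.
records = [
    ("medications", {"drug": "metoprolol", "class": "beta blocker", "dose_mg": 25}),
    ("labs", {"test": "creatinine", "value": 1.2, "unit": "mg/dL"}),
    ("vitals", {"measure": "heart rate", "value": 72, "unit": "bpm"}),
]
docs = [serialize_row(table, row) for table, row in records]

query = "Is the patient on a beta blocker?"
top = retrieve(query, docs, k=1)
prompt = build_prompt(query, top)
```

Serializing rows to text is one way to keep the query independent of column names; a production retriever would more likely use dense embeddings than token overlap.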

Results: The RAG model performed robustly across key variables in both cohorts, with overall extraction accuracy of 81% for the Yale cohort and 82.9% for the MIMIC cohort. For categorical variables such as myocardial infarction, peripheral arterial disease, and medications (beta blockers, ACE inhibitors), Cohen's kappa values indicated strong agreement with ground truth (Yale: 0.8, 0.76, 1.0, and 0.82; MIMIC: 0.66, 0.83, 0.94, and 0.95). Continuous variables such as creatinine, heart rate, and systolic blood pressure were generally highly correlated with ground truth (Yale: 0.99, 0.90, and 0.92; MIMIC: 1.0, 0.87, and 0.51) [B]. No statistically significant difference was found between ground truth and extracted values for any categorical variable (McNemar's p-value > 0.05).
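
For readers unfamiliar with the agreement statistics reported above, the following sketch shows how they could be computed for one extracted variable. The labels and values are made up for illustration and are not study data. The abstract does not state which correlation coefficient or which variant of McNemar's test was used, so Pearson's r and the exact (binomial) two-sided McNemar test are assumptions here.

```python
from math import comb, sqrt


def cohen_kappa(truth, pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, pred)) / n
    labels = set(truth) | set(pred)
    expected = sum((truth.count(l) / n) * (pred.count(l) / n) for l in labels)
    if expected == 1:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1 - expected)


def mcnemar_exact(truth, pred):
    """Exact two-sided McNemar p-value from the discordant pairs
    (assumed variant; binary labels coded 0/1)."""
    b = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    c = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, nothing to test
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed coefficient)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Illustrative labels only: 1 = condition present, 0 = absent.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

kappa = cohen_kappa(truth, pred)
p_value = mcnemar_exact(truth, pred)
r = pearson_r([0.9, 1.2, 1.5, 2.1], [0.9, 1.3, 1.5, 2.0])
```

A kappa near 1 together with a McNemar p-value above 0.05 corresponds to the pattern reported for the categorical variables: high agreement and no systematic bias toward over- or under-extraction.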

Conclusion: LLM-optimized RAGs can accurately extract clinical information across multiple EHRs with varying data architectures. This introduces the potential for phenotype extraction at scale, with applications in federated multicenter research, spanning clinical trials and electronic clinical quality assessment.
  • Vasisht Shankar, Sumukh (Yale University, New Haven, Connecticut, United States)
  • Thangaraj, Phyllis (Yale University, New Haven, Connecticut, United States)
  • Adejumo, Philip (Yale University, New Haven, Connecticut, United States)
  • Khera, Rohan (Yale School of Medicine, New Haven, Connecticut, United States)
Author Disclosures:
  • Sumukh Vasisht Shankar: no relevant financial relationships
  • Phyllis Thangaraj: no relevant financial relationships
  • Philip Adejumo: no relevant financial relationships
  • Rohan Khera: has relevant financial relationships. Research Funding (PI or named investigator): Bristol-Myers Squibb, BridgeBio, Novo Nordisk (all active). Ownership Interest: Ensight-AI, Inc, Evidence2Health LLC (both active).
Meeting Info: Scientific Sessions 2024, Chicago, Illinois
