American Heart Association

Final ID: MDP374

Scalable Phenotyping of Heart Failure Across Multicenter, Non-Interoperable Health Systems Using Retrieval-Augmented Generation and Large Language Models

Abstract Body:

Background: Identifying patient characteristics is critical to all electronic health record (EHR)-based research, but multicenter studies are impeded by differences in data structures, which prevent tools from generalizing across EHRs. Large language models (LLMs) can be optimized with Retrieval-Augmented Generation (RAG) to enable EHR-structure-agnostic queries for cohort characterization with minimal a priori knowledge of EHR structure. We developed and validated a tabular RAG model to extract clinical characteristics across multiple domains among patients with heart failure (HF) in two distinct health system EHRs.

Methods: Our approach employs a novel RAG architecture that combines information retrieval with a generative text model (Llama2-13b) to enhance data extraction from medical records. The retrieval step identifies data relevant to the query for a clinical feature, and the generative model then synthesizes an interpretable output. We evaluated this model on 1000 HF patients from the Yale New Haven Health System and 1000 deidentified records from Beth Israel Deaconess Medical Center (MIMIC-IV). Clinical knowledge-based queries extracted patient records across categorical features (demographics, conditions, and medications) and continuous features (vital signs and labs) [A]. We tested the RAG's performance against variables manually extracted from the tables.
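
The retrieve-then-generate flow described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the retrieval method, so bag-of-words cosine similarity is used here as a stand-in, and the table names, row contents, and function names (serialize_row, retrieve, build_prompt) are hypothetical. The generative model (Llama2-13b in the abstract) is not called; the sketch stops at assembling the prompt that would be passed to it.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def serialize_row(table, row):
    """Flatten one table row into text, so no fixed EHR schema is assumed."""
    return table + ": " + ", ".join(f"{key}={value}" for key, value in row.items())


def retrieve(query, documents, k=2):
    """Return up to k serialized rows most similar to the query (score > 0)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, context):
    """Assemble the text that a generative model would receive."""
    lines = ["Answer using only the records below.", "Records:"]
    lines += [f"- {doc}" for doc in context]
    lines += [f"Question: {query}", "Answer:"]
    return "\n".join(lines)


# Hypothetical rows from three differently structured tables.
rows = [
    ("medications", {"drug": "metoprolol", "class": "beta blocker", "dose": "25 mg"}),
    ("labs", {"test": "creatinine", "value": 1.1, "unit": "mg/dL"}),
    ("vitals", {"measure": "heart rate", "value": 72, "unit": "bpm"}),
]
docs = [serialize_row(table, row) for table, row in rows]

query = "Is the patient on a beta blocker medication?"
top = retrieve(query, docs, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

Serializing each row as text is one way to keep the query independent of column names and table layout; a production system would more likely use dense embeddings for retrieval, which this sketch does not attempt.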

Results: The RAG model performed robustly across key variables in both cohorts, with overall extraction accuracy of 81% for the Yale cohort and 82.9% for the MIMIC cohort. For categorical variables such as myocardial infarction, peripheral arterial disease, and medications (beta blockers, ACE inhibitors), Cohen's kappa values indicated strong agreement with ground truth (Yale: 0.8, 0.76, 1.0, and 0.82; MIMIC: 0.66, 0.83, 0.94, and 0.95, respectively). Correlations with ground truth for continuous variables such as creatinine, heart rate, and systolic blood pressure were generally high (Yale: 0.99, 0.90, and 0.92; MIMIC: 1.0, 0.87, and 0.51, respectively) [B]. No statistically significant difference was found between ground truth and extracted values for any categorical variable (McNemar’s p-value > 0.05).
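
The agreement statistics reported in the Results can be computed as in the following sketch, which uses only the Python standard library. The data are toy values, not study data. Two choices are assumptions, because the abstract does not state them: the exact (binomial) form of McNemar's test and the Pearson coefficient for the continuous variables.

```python
from math import comb, sqrt


def cohen_kappa(truth, pred):
    """Cohen's kappa: agreement between two label lists, corrected for chance."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, pred)) / n
    labels = set(truth) | set(pred)
    expected = sum((truth.count(l) / n) * (pred.count(l) / n) for l in labels)
    if expected == 1.0:  # both lists contain a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mcnemar_exact_p(truth, pred):
    """Two-sided exact McNemar p-value computed from the discordant pairs."""
    b = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    c = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    n = b + c
    if n == 0:  # no discordant pairs, so no evidence of a difference
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy binary feature (1 = present) for ten patients: manual vs. extracted.
manual_flag = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
rag_flag = [1, 1, 0, 0, 0, 0, 0, 1, 0, 1]

# Toy continuous feature (heart rate, bpm) for four patients.
manual_hr = [72, 88, 95, 60]
rag_hr = [72, 90, 95, 61]

kappa = cohen_kappa(manual_flag, rag_flag)
p_value = mcnemar_exact_p(manual_flag, rag_flag)
r = pearson_r(manual_hr, rag_hr)
print(f"kappa={kappa:.2f}, McNemar p={p_value:.2f}, r={r:.3f}")
```

Kappa and McNemar's test answer different questions: kappa measures how often the two sources agree beyond chance, while McNemar's test checks whether the disagreements are skewed in one direction.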

Conclusion: LLM-optimized RAGs can accurately extract clinical information across multiple EHRs with varying data architectures. This introduces the potential for phenotype extraction at scale, with applications in federated multicenter research, spanning clinical trials and electronic clinical quality assessment.
  • Vasisht Shankar, Sumukh (Yale University, New Haven, Connecticut, United States)
  • Thangaraj, Phyllis (Yale University, New Haven, Connecticut, United States)
  • Adejumo, Philip (Yale University, New Haven, Connecticut, United States)
  • Khera, Rohan (Yale School of Medicine, New Haven, Connecticut, United States)
  • Author Disclosures:
    Sumukh Vasisht Shankar: DO NOT have relevant financial relationships
    Phyllis Thangaraj: DO NOT have relevant financial relationships
    Philip Adejumo: DO NOT have relevant financial relationships
    Rohan Khera: DO have relevant financial relationships; Research Funding (PI or named investigator): Bristol-Myers Squibb: Active (exists now); Ownership Interest: Ensight-AI, Inc: Active (exists now); Ownership Interest: Evidence2Health LLC: Active (exists now); Research Funding (PI or named investigator): BridgeBio: Active (exists now); Research Funding (PI or named investigator): Novo Nordisk: Active (exists now)
Meeting Info:

Scientific Sessions 2024

2024

Chicago, Illinois
