Leveraging Large Language Models to Extract Unstructured Cardiovascular Data from the EHR for Preventive Cardiology
Background: Preventive cardiology relies on a comprehensive view of patient health, including biomarkers and imaging findings. However, critical data, such as coronary calcium scores (CCS) and measures from computed tomography angiography (CTA)-derived fractional flow reserve (CT-FFR), often reside in unstructured free text within the electronic health record (EHR), making them difficult to access and use in clinical decision-making.
Objective: We developed and validated a large language model (LLM)-based approach to extract novel biomarkers from unstructured cardiovascular data and integrated the results into a clinician-friendly interface to facilitate clinical decision-making.
Methods: We compared a traditional natural language processing (NLP) technique with LLMs for extracting total and vessel-specific CCS and vessel-specific CT-FFR from free-text reports. We validated the extracted measures against chart review, the gold standard. After six iterations of prompt engineering across two LLMs, we achieved 100% accuracy for both total CCS and CT-FFR. We applied the final prompt and LLM to all CTA and CT-FFR reports available for patients seen by the preventive cardiology program. The resulting discrete values were displayed in the preventive care dashboard to give the clinical team a comprehensive view for better management.
Results: Among 255 CTA reports extracted for 12/01/2023-11/22/2024, traditional NLP extracted total CCS from only 137 (54%) reports, whereas the LLM extracted CCS from all reports. Among 40 randomly selected CTA reports, 32 were coronary calcium score reports, and the LLM identified all clinical measures (i.e., vessel-specific CCS and total CCS) with 100% accuracy. Of the remaining 8 CT-FFR reports, only 2 reported at least one vessel-specific CT-FFR value, and the LLM captured all values correctly.
Applying the final LLM to 498 patients referred to the Preventive Cardiology Institute during 12/06/2023-5/20/2025, we identified 560 reports with a total CCS; the mean total CCS was 147 (SD 352), 24% of reports had an elevated CCS (i.e., 100 or higher), and 38.2% had a score of 0. Among 138 CT-FFR reports, 21 (15.2%) had at least one vessel with a CT-FFR value below 0.8, indicating elevated risk for stroke or heart attack.
Conclusion: Using LLMs to unlock valuable cardiovascular data long hidden in EHR free text has the potential to integrate structured and unstructured data, giving clinicians comprehensive clinical information and facilitating proactive, data-driven care.
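The thresholds reported in the Results (elevated CCS at 100 or higher, CT-FFR below 0.8) can be applied to extracted values in a small post-processing step before display. The following is a minimal sketch, not the authors' pipeline: it assumes the LLM returns one JSON object per report, and the schema (keys `total_ccs` and `ct_ffr`, with vessel names as sub-keys) and the names `flag_report` and `ReportFlags` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional

CCS_ELEVATED = 100  # abstract: elevated CCS is 100 or higher
FFR_CUTOFF = 0.8    # abstract: CT-FFR values below 0.8 are flagged


@dataclass
class ReportFlags:
    total_ccs: Optional[float]
    ccs_zero: bool
    ccs_elevated: bool
    min_ffr: Optional[float]
    ffr_below_cutoff: bool


def flag_report(llm_json: str) -> ReportFlags:
    """Turn one report's extracted values into dashboard-style flags.

    Assumed (hypothetical) input schema:
      {"total_ccs": 147, "ct_ffr": {"LAD": 0.76, "RCA": 0.91}}
    Missing or null fields are treated as "not reported".
    """
    data = json.loads(llm_json)
    total = data.get("total_ccs")

    # Keep only vessels for which a CT-FFR value was actually reported.
    ffr_by_vessel = data.get("ct_ffr") or {}
    ffr_values = [v for v in ffr_by_vessel.values() if v is not None]
    min_ffr = min(ffr_values) if ffr_values else None

    return ReportFlags(
        total_ccs=total,
        ccs_zero=total is not None and total == 0,
        ccs_elevated=total is not None and total >= CCS_ELEVATED,
        min_ffr=min_ffr,
        ffr_below_cutoff=min_ffr is not None and min_ffr < FFR_CUTOFF,
    )
```

Keeping "not reported" distinct from a score of 0 matters here, because a CCS of 0 is itself a clinically meaningful result rather than a missing value.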
Yan, Xiaowei (Sutter Health, Walnut Creek, California, United States)
Romero, Nick (Sutter Health, Walnut Creek, California, United States)
Dinglasan, Patricia (Sutter Health, Walnut Creek, California, United States)
Narayan, Girish (Sutter Health, Walnut Creek, California, United States)
Strangemore, Brenda (Sutter Health, Walnut Creek, California, United States)
Li, Jiang (Sutter Health, Walnut Creek, California, United States)
Author Disclosures:
Xiaowei Yan: No relevant financial relationships
Nick Romero: No relevant financial relationships
Patricia Dinglasan: No relevant financial relationships
Girish Narayan: No Answer
Brenda Strangemore: No Answer
Jiang Li: No Answer