
American Heart Association



Final ID: MP465

Performance Benchmarking of Smaller Language Models Against GPT-4 for Predicting Reasons for Oral Anticoagulation Nonprescription in Atrial Fibrillation

Background:
Oral anticoagulation (OAC) reduces stroke risk in atrial fibrillation (AF), yet nonprescription rates approach 50%, and the reasons for nonprescription are poorly characterized. Proprietary large language models (LLMs) such as GPT-4 can identify documented reasons for OAC nonprescription in clinical notes, but their cost and privacy constraints are barriers to widespread deployment. We investigated whether smaller, open-source LLMs (Gemma-2-9B-IT and Phi-4-14B) can achieve comparable performance.
Hypothesis:
Open-source LLMs can match the performance of GPT-4 when augmented with techniques such as chain-of-thought (CoT) prompting.
Methods:
We identified all patient encounters with clinician-billed ICD-10 AF diagnosis codes at Stanford Health Care from January 1, 2015 through December 31, 2023. Three reviewers annotated 10% of AF-related note excerpts to identify reasons for OAC nonprescription. We developed zero-shot prompts for GPT-4, Gemma-2-9B-IT, and Phi-4-14B, plus CoT prompts for the two open-source models (Graphic 1). Performance was assessed using weighted macro-F1 scores.
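The abstract does not spell out how its "weighted macro-F1" was computed. As a rough illustration only, the sketch below shows the two common ways per-category F1 scores are aggregated: an unweighted macro average and a support-weighted average. The function names and the toy labels are hypothetical and are not taken from the study; it also assumes a single label per excerpt, whereas the study may have scored each reason category separately.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: (f1, support)}, scoring each label one-vs-rest."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (f1, tp + fn)  # support = number of true instances
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1: every category counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(f1 for f1, _ in scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels):
    """Mean of per-class F1 weighted by each category's support."""
    scores = per_class_f1(y_true, y_pred, labels)
    total = sum(support for _, support in scores.values())
    return sum(f1 * s for f1, s in scores.values()) / total if total else 0.0


# Hypothetical toy annotations (not study data).
labels = ["antiplatelet", "contraindication", "none"]
y_true = ["antiplatelet", "contraindication", "none", "antiplatelet"]
y_pred = ["antiplatelet", "none", "none", "contraindication"]
print(round(macro_f1(y_true, y_pred, labels), 3))     # 0.444
print(round(weighted_f1(y_true, y_pred, labels), 3))  # 0.5
```

The two averages diverge when categories are imbalanced. Macro-F1 treats rare reasons (for example, history of AF ablation) as equal to common ones, while the support-weighted score leans toward frequent categories such as antiplatelet use.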
Results:
Of 35,737 AF encounters, 7,712 (21.6%) lacked active OAC prescriptions. From 9,143 associated notes, we extracted 21,573 AF/OAC-related excerpts, of which 10% (911 notes, 2,175 excerpts) were manually annotated. Reasons for nonprescription appeared in 497 (54.6%) notes, most commonly antiplatelet use (18.6%), perceived contraindication (14.7%), and low AF burden (13.9%). Gemma-2-9B-IT with CoT achieved the highest average macro-F1 score (0.81), followed by GPT-4 (0.80), Gemma-2-9B-IT without CoT (0.76), Phi-4-14B (0.71), and Phi-4-14B with CoT (0.68). Gemma-2-9B-IT with CoT outperformed the other models in four categories (perceived contraindication, low stroke risk, low AF burden, and already on OAC), while GPT-4 performed best for patient preference and antiplatelet alternatives, and Gemma-2-9B-IT without CoT performed best for history of AF ablation (Graphic 2).
Conclusions:
Gemma-2-9B-IT, an open-source LLM, categorized reasons for OAC nonprescription with performance comparable to GPT-4. This demonstrates that much smaller, freely available, privacy-preserving LLMs can identify barriers to guideline-directed AF care and could be deployed across health systems to help close care gaps in OAC prescribing.
  • Somani, Sulaiman  ( Stanford Health Care , Menlo Park , California , United States )
  • Kim, Dale  ( Stanford University , Highlands Ranch , Colorado , United States )
  • Perez Guerrero, Eduardo  ( Stanford University , Stanford , California , United States )
  • Ngo, Summer  ( Stanford University , Highlands Ranch , Colorado , United States )
  • Nguyen, Minh  ( Stanford University , Highlands Ranch , Colorado , United States )
  • Sandhu, Alexander  ( Stanford University , Millbrae , California , United States )
  • Alsentzer, Emily  ( Stanford University , Highlands Ranch , Colorado , United States )
  • Hernandez-Boussard, Tina  ( Stanford University , Highlands Ranch , Colorado , United States )
  • Rodriguez, Fatima  ( Stanford University , Palo Alto , California , United States )
  • Author Disclosures:
    Sulaiman Somani: DO NOT have relevant financial relationships | Dale Kim: DO NOT have relevant financial relationships | Eduardo Perez Guerrero: DO NOT have relevant financial relationships | Summer Ngo: DO NOT have relevant financial relationships | Minh Nguyen: DO NOT have relevant financial relationships | Alexander Sandhu: DO have relevant financial relationships ; Consultant:Reprieve Cardiovascular:Active (exists now) ; Consultant:Clearly:Active (exists now) ; Research Funding (PI or named investigator):NOVO NORDISK:Active (exists now) ; Research Funding (PI or named investigator):Novartis:Active (exists now) ; Research Funding (PI or named investigator):Bayer:Active (exists now) ; Research Funding (PI or named investigator):Astra Zeneca:Active (exists now) | Emily Alsentzer: DO have relevant financial relationships ; Consultant:Fourier Health:Active (exists now) | Tina Hernandez-Boussard: No Answer | Fatima Rodriguez: DO have relevant financial relationships ; Consultant:HealthPals:Past (completed) ; Consultant:Cleerly Health:Active (exists now) ; Consultant:Amgen:Active (exists now) ; Consultant:iRhythm:Active (exists now) ; Consultant:HeartFlow:Active (exists now) ; Consultant:Arrowhead Pharmaceuticals:Active (exists now) ; Consultant:Edwards:Active (exists now) ; Consultant:Inclusive Health:Active (exists now) ; Consultant:Esperion Therapeutics:Past (completed) ; Consultant:Kento Health:Active (exists now) ; Consultant:Movano Health:Active (exists now) ; Consultant:NovoNordisk:Past (completed) ; Consultant:Novartis:Active (exists now)
Meeting Info:

Scientific Sessions 2025

2025

New Orleans, Louisiana

Session Info:

Arrhythmias Unplugged: Equity, Innovation, and Risk in the Real World

Saturday, 11/08/2025 , 09:15AM - 10:30AM

Moderated Digital Poster Session

