American Heart Association

Final ID: MDP1054

Retrieval-Augmented Generation for Extracting CHADS-VASc and HAS-BLED Risk Factors from Unstructured Clinical Notes in Patients with Atrial Fibrillation

Background: Assessment of stroke and bleeding risk in patients with atrial fibrillation (AF) is crucial for guiding anticoagulation therapy. CHADS-VASc and HAS-BLED are widely used scores for defining these risks, but current assessments rely on manual calculation by clinicians or approximations from structured EHR data elements. Unstructured clinical notes contain rich information that could enhance risk assessment. We developed and validated a Retrieval-Augmented Generation (RAG) approach to extract CHADS-VASc and HAS-BLED risk factors from unstructured notes in patients with AF.

Methods: We employed a RAG architecture paired with the large language model Llama3 to extract features relevant to CHADS-VASc and HAS-BLED scores from unstructured notes. The model was deployed on a random set of 1,000 clinical notes (416 AF patients) from Yale New Haven Hospital. To establish a gold standard, 2 clinicians manually reviewed and labeled CHADS-VASc and HAS-BLED risk factors in a random subset of 200 notes. The CHADS-VASc and HAS-BLED scores were calculated for each patient using structured data alone and by incorporating risk factors identified with RAG. We assessed performance across risk factors using the macro-averaged area under the receiver operating characteristic curve (AUROC).
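The score calculation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the standard published point weights for the score the abstract calls CHADS-VASc (commonly written CHA2DS2-VASc) and for HAS-BLED, and it assumes that "incorporating" note-derived factors means taking the union of structured and RAG-extracted flags. The function names, dictionary keys, and example patient are all hypothetical.

```python
# Sketch under stated assumptions: standard score weights, and a simple
# logical OR to combine structured and note-derived risk factors.

# Non-age components of CHA2DS2-VASc and their points.
CHA2DS2_VASC_WEIGHTS = {
    "chf": 1,
    "hypertension": 1,
    "diabetes": 1,
    "stroke_tia": 2,
    "vascular_disease": 1,
    "female": 1,
}

# Non-age components of HAS-BLED, one point each.
HAS_BLED_ITEMS = (
    "hypertension",
    "abnormal_renal",
    "abnormal_liver",
    "stroke",
    "bleeding",
    "labile_inr",
    "drugs",
    "alcohol",
)


def cha2ds2_vasc(age, factors):
    """Sum the weighted factors, then add the age points (65-74: 1, >=75: 2)."""
    score = sum(w for name, w in CHA2DS2_VASC_WEIGHTS.items() if factors.get(name, False))
    if age >= 75:
        score += 2
    elif age >= 65:
        score += 1
    return score


def has_bled(age, factors):
    """One point per present factor, plus one point for age > 65."""
    score = sum(1 for name in HAS_BLED_ITEMS if factors.get(name, False))
    if age > 65:
        score += 1
    return score


def merge_factors(structured, from_notes):
    """Assumed combination rule: a factor counts if either source reports it."""
    keys = set(structured) | set(from_notes)
    return {k: bool(structured.get(k, False) or from_notes.get(k, False)) for k in keys}


# Hypothetical 70-year-old patient: structured data records only hypertension,
# while the notes also mention diabetes and a prior stroke.
structured = {"hypertension": True}
from_notes = {"diabetes": True, "stroke_tia": True}

print(cha2ds2_vasc(70, structured))                             # 2
print(cha2ds2_vasc(70, merge_factors(structured, from_notes)))  # 5
```

Under a union rule, adding note-derived factors can only raise or preserve a patient's score, which is consistent with the direction of the score changes reported in the Results.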

Results: The RAG model demonstrated robust performance in extracting risk factors from clinical notes. In the 1,000 clinical notes, RAG identified several risk factors more frequently than structured elements, including hypertension (59.7% vs 46.6%), stroke (19.2% vs 12.7%), vascular disease (35.6% vs 27.2%), diabetes (33.9% vs 27.8%), medication usage predisposing to bleeding (29.1% vs 20.9%), and alcohol use (14.4% vs 8.9%). The RAG approach achieved an accuracy of 91% and macro-AUROC of 0.88, significantly outperforming structured data (accuracy 78%, macro-AUROC 0.76) in 200 expert-annotated notes. Incorporating risk factors identified by RAG increased both CHADS-VASc and HAS-BLED scores compared with using structured data, with the mean CHADS-VASc score increasing from 3.3 ± 1.9 using structured data to 4.2 ± 2.1 with RAG. Similarly, the mean HAS-BLED score increased from 2.2 ± 1.3 to 2.7 ± 1.5.
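For readers unfamiliar with the metric, a macro-averaged AUROC is the unweighted mean of the per-risk-factor AUROCs. The sketch below is illustrative only: the abstract does not state how extractions were scored, so it assumes one binary gold label and one prediction (or score) per note and risk factor. The labels and predictions are made-up toy values, not study data, and the function names are hypothetical.

```python
# Sketch: AUROC via pairwise comparison of positives and negatives
# (the Mann-Whitney formulation), then an unweighted mean across factors.

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(labels_by_factor, scores_by_factor):
    """Unweighted mean of the per-factor AUROCs."""
    aucs = [auroc(labels_by_factor[f], scores_by_factor[f]) for f in labels_by_factor]
    return sum(aucs) / len(aucs)


# Toy gold labels and binary predictions for four notes (made-up values).
gold = {"hypertension": [1, 1, 0, 0], "diabetes": [1, 0, 1, 0]}
pred = {"hypertension": [1, 0, 0, 0], "diabetes": [1, 0, 1, 0]}

print(auroc(gold["hypertension"], pred["hypertension"]))  # 0.75
print(macro_auroc(gold, pred))                            # 0.875
```

With binary predictions, each per-factor AUROC reduces to the mean of sensitivity and specificity, so every risk factor contributes equally to the macro average regardless of how common it is.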

Conclusion: An LLM-optimized RAG can accurately extract CHADS-VASc and HAS-BLED risk factors from unstructured clinical notes in AF patients. This approach can enable computable risk assessment and guide appropriate anticoagulation therapy.
  • Adejumo, Philip (Yale University, New Haven, Connecticut, United States)
  • Thangaraj, Phyllis (Yale University, New Haven, Connecticut, United States)
  • Vasisht Shankar, Sumukh (Yale University, New Haven, Connecticut, United States)
  • Khera, Rohan (Yale School of Medicine, New Haven, Connecticut, United States)
  • Author Disclosures:
    Philip Adejumo: DO NOT have relevant financial relationships | Phyllis Thangaraj: DO NOT have relevant financial relationships | Sumukh Vasisht Shankar: DO NOT have relevant financial relationships | Rohan Khera: DO have relevant financial relationships; Research Funding (PI or named investigator): Bristol-Myers Squibb: Active (exists now); Ownership Interest: Ensight-AI, Inc: Active (exists now); Ownership Interest: Evidence2Health LLC: Active (exists now); Research Funding (PI or named investigator): BridgeBio: Active (exists now); Research Funding (PI or named investigator): Novo Nordisk: Active (exists now)
Meeting Info:

Scientific Sessions 2024

2024

Chicago, Illinois

Session Info:

CardioVibes: AI-Powered Heart Screening

Sunday, 11/17/2024, 11:10AM - 12:35PM

Moderated Digital Poster Session
