
American Heart Association



Final ID: MP2575

Optimizing the Accuracy of Natural Language Processing Models for Pulmonary Embolism Detection Through Integration with Claims Data: The PE-EHR+ Study

Abstract Body:

Background: Rule-based natural language processing (NLP) tools are easy to implement and can identify pulmonary embolism (PE) via radiology reports, but their accuracy is limited when used in isolation, and their external validity remains uncertain.
Methods: In this cross-sectional study, we analyzed data from a prespecified sample of 1,712 hospitalized patients (with and without PE) at Mass General Brigham (MGB) hospitals (2016–2021) and applied two previously published NLP algorithms (Verma et al. and Johnson et al.) to radiology reports to identify PE. Chart review by two independent physicians using prespecified criteria was the reference standard. We tested three approaches: (A) NLP applied to all patients; (B) NLP limited to patients with primary or secondary International Classification of Diseases (ICD)-10 PE discharge codes; and (C) NLP applied to patients with PE discharge codes or a Present-on-Admission (POA) indicator (“Y” or “N”) for PE. In Approaches B and C, all other patients were assumed PE-negative to minimize NLP false positives. Weighted estimates derived from the full MGB hospitalized cohort (n=381,642) were used to calculate F1 scores, which summarize model performance by combining sensitivity and positive predictive value (PPV): F1 = 2 × (PPV × sensitivity) / (PPV + sensitivity).
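The three approaches described in the Methods can be read as a gating rule applied on top of the NLP output. The sketch below is only an illustration of that reading, not the study's implementation: the `Patient` fields and the `classify_pe` function are hypothetical names, and each claims-based criterion is reduced to a single boolean.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All three fields are hypothetical simplifications for illustration.
    nlp_positive: bool           # rule-based NLP flagged PE in a radiology report
    has_pe_discharge_code: bool  # primary or secondary ICD-10 PE discharge code
    has_pe_poa_indicator: bool   # POA indicator ("Y" or "N") recorded for PE


def classify_pe(patient: Patient, approach: str) -> bool:
    """Return the predicted PE status under Approach A, B, or C."""
    if approach == "A":
        # NLP applied to all patients.
        return patient.nlp_positive
    if approach == "B":
        # NLP only for patients with a PE discharge code; others assumed negative.
        return patient.has_pe_discharge_code and patient.nlp_positive
    if approach == "C":
        # NLP for patients with a PE discharge code or a POA indicator for PE.
        eligible = patient.has_pe_discharge_code or patient.has_pe_poa_indicator
        return eligible and patient.nlp_positive
    raise ValueError(f"unknown approach: {approach!r}")
```

Under this reading, a patient flagged by NLP but lacking any PE claims signal counts as positive only in Approach A, which is consistent with the stated aim of reducing NLP false positives in Approaches B and C.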
Results: In total, 7,708 (2.0%) patients had PE. In Approach A, both NLP models showed high sensitivity (82.5%, 93.0%) and specificity (98.9%, 98.7%) but low PPV (60.3%, 59.6%) (Figure). Approach B improved PPV (95.2%, 94.9%) at the cost of reduced sensitivity (74.1%, 76.2%), while Approach C preserved both high sensitivity (82.5%, 93.0%) and PPV (95.6%, 95.8%). Approach C demonstrated the best performance, yielding significantly higher F1 scores for both NLP models (88.6%, 94.4%) compared with Approach A (69.7%, 72.6%) and Approach B (83.3%, 84.5%) (P<0.001).
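As a consistency check, the F1 scores can be recomputed from the rounded PPV and sensitivity values reported above using the formula given in the Methods. This is a minimal sketch: the `f1_score` helper and the `reported` dictionary are illustrative names, and the pairs are listed in the order the two NLP models appear in the Results.

```python
def f1_score(ppv: float, sensitivity: float) -> float:
    """F1 = 2 x (PPV x sensitivity) / (PPV + sensitivity), inputs as fractions."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# (PPV %, sensitivity %) for the two NLP models, copied from the Results.
reported = {
    "A": [(60.3, 82.5), (59.6, 93.0)],
    "B": [(95.2, 74.1), (94.9, 76.2)],
    "C": [(95.6, 82.5), (95.8, 93.0)],
}

for approach, pairs in reported.items():
    scores = [round(100 * f1_score(ppv / 100, sens / 100), 1) for ppv, sens in pairs]
    print(approach, scores)
# A [69.7, 72.6]
# B [83.3, 84.5]
# C [88.6, 94.4]
```

The recomputed values match the F1 scores stated in the Results for all three approaches.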
Conclusions: The accuracy of PE detection improves when rule-based NLP models are operationalized within a screening framework using administrative claims data in addition to radiology reports.
  • Rashedi, Sina (Brigham and Women's Hospital, Boston, Massachusetts, United States)
  • Hussain, Mohamad (Brigham and Women's Hospital, Boston, Massachusetts, United States)
  • Mojibian, Hamid (Yale School of Medicine, New Haven, Connecticut, United States)
  • Goldhaber, Samuel (Brigham and Women's Hospital, Boston, Massachusetts, United States)
  • Zhou, Li (Brigham and Women's Hospital, Boston, Massachusetts, United States)
  • Yang, Richard (Brigham and Women's Hospital, Boston, Massachusetts, United States)
  • Wang, Liqin (Brigham and Women's Hospital, Boston, Massachusetts, United States)
  • Krumholz, Harlan (Yale University, New Haven, Connecticut, United States)
  • Piazza, Gregory (Brigham and Women's Hospital, Boston, Massachusetts, United States)
  • Bikdeli, Behnood (Brigham and Women's Hospital, Boston, Massachusetts, United States)
  • Bukhari, Syed (Johns Hopkins University, Baltimore, Maryland, United States)
  • Krishnathasan, Darsiya (Brigham and Women's Hospital, Bellerose, New York, United States)
  • Khairani, Candrika (Rochester General Hospital, Rochester, New York, United States)
  • Bejjani, Antoine (UPMC, Pittsburgh, Pennsylvania, United States)
  • Pfeferman, Mariana (Brigham and Women's Hospital, Boston, Massachusetts, United States)
  • Zarghami, Mehrdad (Jamaica Hospital, New York, New York, United States)
  • Secemsky, Eric (BIDMC, Boston, Massachusetts, United States)
  • Rahaghi, Farbod (Harvard School of Medicine, Boston, Massachusetts, United States)
Author Disclosures:
  • Sina Rashedi: DO NOT have relevant financial relationships
  • Mohamad Hussain: DO have relevant financial relationships; Research Funding (PI or named investigator): Vascular Therapies: Active (exists now); Consultant: Venova: Past (completed); Consultant: Humacyte: Past (completed); Research Funding (PI or named investigator): Voyager: Active (exists now); Research Funding (PI or named investigator): VenoStent: Active (exists now); Research Funding (PI or named investigator): Humacyte: Active (exists now)
  • Hamid Mojibian: No Answer
  • Samuel Goldhaber: No Answer
  • Li Zhou: No Answer
  • Richard Yang: No Answer
  • Liqin Wang: No Answer
  • Harlan Krumholz: No Answer
  • Gregory Piazza: No Answer
  • Behnood Bikdeli: DO NOT have relevant financial relationships
  • Syed Bukhari: No Answer
  • Darsiya Krishnathasan: No Answer
  • Candrika Khairani: No Answer
  • Antoine Bejjani: DO NOT have relevant financial relationships
  • Mariana Pfeferman: DO NOT have relevant financial relationships
  • Eric Secemsky: DO have relevant financial relationships; Consultant: Abbott, Asahi, BD, Boston Scientific, Conavi, Cook, Cordis, Endovascular Engineering, Evident Vascular, Gore, InfraRedx, Medtronic, Philips, RapidAI, Rampart, R3, Shockwave, Siemens, SoniVie, Teleflex, Terumo, Thrombolex, VentureMed, Zoll: Active (exists now)
  • Farbod Rahaghi: No Answer
Meeting Info:

Scientific Sessions 2025

2025

New Orleans, Louisiana

Session Info:

Best of Vascular Imaging

Monday, 11/10/2025, 10:45AM - 12:00PM

Moderated Digital Poster Session

More abstracts from these authors:
Age- and Sex- Differences in the Accuracy of Rule-based Natural Language Processing Models for Identification of Pulmonary Embolism

Krishnathasan Darsiya, Jimenez David, Monreal Manuel, Secemsky Eric, Klok Erik, Hunsaker Andetta, Aghayev Ayaz, Muriel Alfonso, Hussain Mohamad, Appah-sampong Abena, Aneja Sanjay, Rashedi Sina, Mojibian Hamid, Goldhaber Samuel, Wang Liqin, Zhou Li, Krumholz Harlan, Piazza Gregory, Bikdeli Behnood, Khairani Candrika, Bejjani Antoine, Lo Ying-chih, Zarghami Mehrdad, Mahajan Shiwani, Caraballo Cesar, Jimenez Ceja Jose Victor

Accuracy of Rule-based Natural Language Processing Models for Identification of Pulmonary Embolism

Rashedi Sina, Jimenez David, Monreal Manuel, Secemsky Eric, Klok Erik, Hunsaker Andetta, Aghayev Ayaz, Muriel Alfonso, Hussain Mohamad, Appah-sampong Abena, Aneja Sanjay, Krishnathasan Darsiya, Mojibian Hamid, Goldhaber Samuel, Wang Liqin, Zhou Li, Krumholz Harlan, Piazza Gregory, Bikdeli Behnood, Khairani Candrika, Bejjani Antoine, Lo Ying-chih, Zarghami Mehrdad, Mahajan Shiwani, Caraballo Cesar, Jimenez Ceja Jose Victor
