American Heart Association

Final ID: MP1526

High-Fidelity, Low-Cost Extraction of Valvular Disease Parameters from Echocardiography Reports Using Rule-Based NLP

Background:
Extraction of data from transthoracic echocardiography (TTE) reports is critical for cardiovascular research, particularly in valvular heart disease. While machine learning (ML) and large language models (LLMs) offer automated solutions, they remain computationally intensive, opaque and cost-prohibitive for many health systems. The AHA highlights the need for scalable and interpretable informatics tools to bridge these gaps.

Objective:
Building on our prior work demonstrating high-accuracy extraction from TTE reports, we developed and validated an enhanced rule-based natural language processing (NLP) pipeline that emphasizes the extraction of valvular heart disease variables including aortic valve (AV) stenosis from structured and free-text sections.

Methods:
We analyzed 1,405 adult TTE reports (1,000 for all parameters, plus an additional 405 for AV parameters) dated from 09/2020 to 03/2023 at a tertiary academic center. Using iteratively refined regular expressions in R, we extracted and structured 20 variables across demographic, functional and structural domains, with a focus on AV parameters, including degree of stenosis, AV area, gradients and velocities. Improving on our prior model, this pipeline parses unstructured narrative text to extract manually typed variables relevant to valvular disease. Performance was validated by comparison with manually extracted data. The pipeline was further evaluated on 7,800 reports for computational performance on a single-core central processing unit (CPU).
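
The regular-expression approach described in the Methods can be illustrated with a minimal sketch. The authors' pipeline is written in R and its patterns are not published in this abstract; the Python version below is only an illustration, and every pattern, field name (`av_area_cm2`, `av_peak_velocity_ms`, and so on) and the sample report text are hypothetical assumptions, not the study's rules or data.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical patterns for illustration only; the abstract does not
# publish the regular expressions used in the actual R pipeline.
NUM = r"(\d+(?:\.\d+)?)"          # integer or decimal value
AV = r"\b(?:aortic\s+valve|AV)"   # "aortic valve" or the abbreviation "AV"
GAP = r"\D{0,20}?"                # short run of non-digits (":", "=", qualifiers)

NUMERIC_PATTERNS = {
    "av_area_cm2": re.compile(AV + r"\s+area" + GAP + NUM + r"\s*cm\s*(?:2|²)", re.I),
    "av_peak_velocity_ms": re.compile(AV + r"\s+peak\s+velocity" + GAP + NUM + r"\s*m/s", re.I),
    "av_peak_gradient_mmhg": re.compile(AV + r"\s+peak\s+gradient" + GAP + NUM + r"\s*mm\s*Hg", re.I),
    "av_mean_gradient_mmhg": re.compile(AV + r"\s+mean\s+gradient" + GAP + NUM + r"\s*mm\s*Hg", re.I),
}

STENOSIS_PATTERN = re.compile(
    r"\b(no|trace|mild|moderate|severe)\s+(?:aortic(?:\s+valve)?|AV)\s+stenosis", re.I
)

Value = Optional[Union[float, str]]


def extract_av_parameters(report: str) -> Dict[str, Value]:
    """Return one value per field; None marks a field that was not found."""
    result: Dict[str, Value] = {}
    for field, pattern in NUMERIC_PATTERNS.items():
        match = pattern.search(report)
        result[field] = float(match.group(1)) if match else None
    match = STENOSIS_PATTERN.search(report)
    result["av_stenosis_degree"] = match.group(1).lower() if match else None
    return result


# Made-up report text, not taken from the study data.
SAMPLE_REPORT = (
    "Aortic valve: Moderate aortic stenosis. AV peak velocity: 3.4 m/s. "
    "AV peak gradient 46 mmHg, AV mean gradient = 27 mm Hg. "
    "AV area (continuity) 1.1 cm2."
)

if __name__ == "__main__":
    for field, value in extract_av_parameters(SAMPLE_REPORT).items():
        print(field, value)
```

Returning `None` for an unmatched field keeps missed extractions (false negatives) visible, so they can be counted when the output is compared with manually extracted data.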

Results:
Overall mean extraction accuracy was 99.8%, and 16 of 20 variables demonstrated 100% precision (positive predictive value [PPV]). AV area (PPV 99.5%, false positives [FP] 2) and AV peak velocity (PPV 99.6%, FP 2) showed slightly lower precision, whereas AV peak gradient (sensitivity [SN] 97.2%, false negatives [FN] 5) and AV dimensionless index (SN 97.4%, FN 6) showed lower sensitivity, primarily due to formatting inconsistencies in free-text sections rather than extraction failure. For the 7,800-report evaluation, overall run-time was 14.5 min, with an average inference time of 111.5 ms/report, a throughput of 9 reports/second and a maximum RAM usage of 47 MB per 1,000 reports.
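
The reported metrics follow standard definitions: PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The short Python sketch below shows those formulas and checks that the per-report time and throughput follow from the stated 14.5 min run over 7,800 reports. The function names are hypothetical, and the abstract does not report true-positive counts, so the counts in the PPV and sensitivity example are toy values, not study data.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("PPV is undefined when nothing was extracted")
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined when no true values exist")
    return tp / (tp + fn)


def timing_summary(total_minutes: float, n_reports: int) -> dict:
    """Derive per-report latency and throughput from a total run-time."""
    total_seconds = total_minutes * 60
    return {
        "ms_per_report": 1000 * total_seconds / n_reports,
        "reports_per_second": n_reports / total_seconds,
    }


if __name__ == "__main__":
    # Toy counts for illustration only; not taken from the study.
    print(ppv(tp=90, fp=10))          # 0.9
    print(sensitivity(tp=80, fn=20))  # 0.8

    # Run-time figures as stated in the Results: 14.5 min for 7,800 reports.
    summary = timing_summary(total_minutes=14.5, n_reports=7800)
    print(round(summary["ms_per_report"], 1))       # 111.5
    print(round(summary["reports_per_second"], 1))  # 9.0
```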

Conclusion:
This rule-based NLP pipeline achieves near-perfect accuracy in extracting structured and free-text variables from TTE reports, without the energy consumption, cost or opacity of ML. The pipeline is scalable, interpretable and energy-efficient, making it an effective solution for real-world clinical data environments.
  • Boxley, Peter (University of Illinois Chicago, Chicago, Illinois, United States)
  • Praveen, Nischal (University of Illinois Chicago, Chicago, Illinois, United States)
  • Seaney, Darren (University of Illinois Chicago, Chicago, Illinois, United States)
  • Salem, Edward (University of Illinois Chicago, Chicago, Illinois, United States)
  • Gupta, Neil (University of Illinois Chicago, Chicago, Illinois, United States)
  • Seshadri, Suhas (University of Illinois-Chicago, Chicago, Illinois, United States)
  • Tofovic, David (University of Illinois Hospital, Chicago, Illinois, United States)
Author Disclosures:
Peter Boxley: DO NOT have relevant financial relationships | Nischal Praveen: DO NOT have relevant financial relationships | Darren Seaney: DO NOT have relevant financial relationships | Edward Salem: No Answer | Neil Gupta: DO NOT have relevant financial relationships | Suhas Seshadri: DO NOT have relevant financial relationships | David Tofovic: DO NOT have relevant financial relationships
Meeting Info:

Scientific Sessions 2025

2025

New Orleans, Louisiana

Session Info:

Transforming Healthcare with Large Language Models and NLP: From Unstructured Data to Clinical Insight

Sunday, 11/09/2025, 11:50AM - 01:00PM

Moderated Digital Poster Session

More abstracts from these authors:
Utilization and Efficacy of an Automated Transthoracic Echocardiographic Report Data Extraction

Praveen Nischal, Shah Anish, Hill Michael, Tofovic David

A Remedy for the Heart and the Hemoglobin: Improvement in Anemia Post Transcatheter Aortic Valve Replacement

Matta Raghav, Roy Aanya, Hammad Bayan, Draffen Arvind, Natsheh Zachary, Tiu Daniel, Tiu David, Salem Edward, Balami Jesse, Kalagara Swetha, Gupta Neil, Uraizee Omar, Sahgal Savina, Mishra Atreya, Ene Adriana, Hattab Aleyah, Arora Aarushi, Sufyaan Humam, Dau Trang, Silberstein Jonathan, Yu Julia, Torres Kayla, Seshadri Suhas, Navarro Laura, Singam Manisha, Ismail Mariam, Rana Riya, Habeel Samer, Liu Simon, Chaganti Srinidhi, Gurbuxani Vidur, Dwyer Kaluzna Stephanie, Groo Vicki, Carlson Andrew, Shroff Adhir, Bhayani Siddharth, Khan Azmer, Bhattaram Rohan, Zhang Runze, Shah Pal
