Automatic assignment of CAD-RADS categories in coronary CTA reports using large language model AHA Conference Repository

American Heart Association

Final ID: MDP1524

Automatic assignment of CAD-RADS categories in coronary CTA reports using large language model

Abstract Body (Do not enter title and authors here):
Background
The effectiveness of different large language models (LLM) in extracting important information from CT angiography (CTA) radiology reports is unknown.
Purpose
The Coronary Artery Disease-Reporting and Data System (CAD-RADS) is a standardized reporting system for CTA that categorizes the severity of coronary artery disease to aid in clinical decision-making. To evaluate the agreement between human readers and LLMs in assigning CAD-RADS categories based on CTA reports.
Materials and Methods
This retrospective study analyzed reports from patients who underwent coronary CTA for screening or diagnosis at Johns Hopkins Hospital. Reports with CAD-RADS 0–5 findings were collected from Jan. 2020 to Oct. 2023. Initially, 6,212 CTA reports were retrieved. After excluding non-organized reports and retaining only those with CAD-RADS scores, 590 reports remained. Board-certified radiologists and three large language models (GPT-3.5, GPT-4, and Llama3) assigned CAD-RADS categories based on the original radiologists' findings. Two radiologists reviewed the CAD scores, with only one discrepancy in the 590 reports, indicating high inter-agreement. Agreement between human readers and LLMs for CAD-RADS categories was assessed using the Gwet agreement coefficient (AC1 value). Frequencies of changes in CAD-RADS category assignments that could potentially affect clinical management (CAD-RADS 0-2 vs. 3-5) were calculated and compared. Typically, CAD-RADS 0-2 indicates mild disease leading to conservative management, while CAD-RADS 3-5 denotes further testing or disease requiring aggressive intervention to prevent adverse cardiovascular events.
Results
Among 590 reports, agreement between original and reviewing radiologists was almost perfect (AC1 = 0.985). Good agreement was also found between original radiologists and Llama3, GPT-3.5, and GPT-4 (AC1 = 0.861, 0.907, 0.941, respectively). Differences in CAD-RADS category upgrades/downgrades potentially affecting clinical management were observed: 0 of 590 (0%) for human readers, 56 of 590 (9.4%) for Llama3, 33 of 590 (5.5%) for GPT-3.5, and 23 of 590 (3.9%) for GPT-4 upgrading CAD-RADs; 8 of 590 (1.4%) for human readers, 19 of 590 (3.2%) for Llama3, 22 of 590 (3.7%) for GPT-3.5, and 12 of 590 (2.0%) for GPT-4 downgrading RAD-CADs.
Conclusion
LLMs showed good agreement with human readers in assigning CAD-RADS categories on CTA reports. However, they were more likely to upgrade CAD-RADS category when compared to radiologists.

Wang, Yuli ( Johns Hopkins University , Baltimore , Maryland , United States )
Hsu, Wen-chi ( Johns Hopkins University , Baltimore , Maryland , United States )
Dai, Yuwei ( Central South University , Changsha , China )
Shi, Victoria ( Johns Hopkins University , Baltimore , Maryland , United States )
Lin, Gigin ( Chang Chung Memorial Hospital , Taoyuan , Taiwan )
Bai, Harrison ( Johns Hopkins University , Baltimore , Maryland , United States )
Lin, Cheng Ting ( Johns Hopkins University , Baltimore , Maryland , United States )

Author Disclosures:

Yuli Wang:

DO NOT have relevant financial relationships

Wen-Chi Hsu:

No Answer

Yuwei Dai:

No Answer

Victoria Shi:

No Answer

Gigin Lin:

DO NOT have relevant financial relationships

Harrison Bai:

No Answer

Cheng Ting Lin:

DO have relevant financial relationships

Meeting Info:

Scientific Sessions 2024

2024

Chicago, Illinois

Session Info:

Navigating the Cardiac Landscape: A Guide to AI-Driven Diagnostics

Monday, 11/18/2024 , 09:30AM - 10:35AM

Moderated Digital Poster Session

American Heart Association

Automatic assignment of CAD-RADS categories in coronary CTA reports using large language model

Meeting Info:

Session Info:

More abstracts on this topic: