American Heart Association
Final ID: MP1524

Comparison of ChatGPT-4o and OpenEvidence for Clinical Decision Support in Transcatheter Tricuspid Valve Interventions

Background:
Artificial intelligence (AI) tools are increasingly used to support clinical decision-making. However, data on their performance in complex, subspecialty areas such as structural heart disease remain limited, particularly in light of recent FDA approvals of the TriClip (Abbott) and EVOQUE (Edwards) systems for transcatheter tricuspid interventions.
Hypothesis:
We hypothesized that ChatGPT-4o and OpenEvidence would differ in clinical accuracy, response time, and consistency when addressing clinician-facing questions on tricuspid transcatheter edge-to-edge repair (T-TEER) and transcatheter tricuspid valve replacement (TTVR).
Methods:
Fifteen clinical questions related to T-TEER and TTVR were submitted to both ChatGPT-4o and OpenEvidence in April 2025. Two board-certified cardiac imaging specialists independently graded responses as accurate, partially accurate, or inaccurate. Disagreements were resolved through consensus. Inter-rater agreement was measured using the free-marginal Fleiss kappa (κ). Response times were recorded and compared using Welch’s t-test.
Results:
ChatGPT-4o responses were rated as fully accurate in 10/15 cases (66.7%), partially accurate in 3/15 (20.0%), and inaccurate in 2/15 (13.3%) (Figure 1). OpenEvidence produced 4/15 fully accurate answers (26.7%), 3/15 partially accurate (20.0%), and 8/15 inaccurate (53.3%). ChatGPT-4o had significantly faster response times (mean 2.11 ± 0.84 s vs. 14.15 ± 2.91 s; p < 0.001) (Figure 2). Inter-rater agreement was higher for ChatGPT-4o (κ = 0.87) vs. OpenEvidence (κ = 0.69). The likelihood of a fully accurate response was significantly greater with ChatGPT-4o (RR 2.5; 95% CI 1.02–6.14; p = 0.043).
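The headline statistics can be reproduced from the reported summary numbers. The sketch below recomputes the relative risk and the Welch t statistic from the counts and means ± SDs given above; the confidence-interval method is an assumption (the abstract does not state which one was used), so the bounds come out close to, but not identical with, the reported 1.02–6.14.

```python
import math

# Accuracy counts from the abstract: fully accurate answers out of 15 questions
chatgpt_hits, openevidence_hits, n = 10, 4, 15

# Relative risk of a fully accurate response (ChatGPT-4o vs. OpenEvidence)
rr = (chatgpt_hits / n) / (openevidence_hits / n)

# 95% CI via the Katz log-RR normal approximation -- an assumption,
# since the abstract does not name its CI method
se_log_rr = math.sqrt(1 / chatgpt_hits - 1 / n + 1 / openevidence_hits - 1 / n)
ci_lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
ci_hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

# Welch's t statistic from the reported response times (mean +/- SD, n = 15 each)
m_gpt, sd_gpt = 2.11, 0.84    # ChatGPT-4o, seconds
m_oe, sd_oe = 14.15, 2.91     # OpenEvidence, seconds
t = (m_oe - m_gpt) / math.sqrt(sd_gpt**2 / n + sd_oe**2 / n)

print(f"RR = {rr:.2f}, 95% CI ~{ci_lo:.2f}-{ci_hi:.2f}, Welch t = {t:.1f}")
```

The point estimate (RR = 2.50) matches the abstract exactly, and the large t statistic (≈15) is consistent with the reported p < 0.001 at roughly 16 Welch degrees of freedom.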
Conclusion:
ChatGPT-4o outperformed OpenEvidence in accuracy, speed, and consistency when answering complex questions about tricuspid valve therapies, suggesting potential utility in real-time clinical support. However, both platforms showed limitations, emphasizing the need for expert oversight. ChatGPT-4o’s contextual flexibility may be especially beneficial in busy clinical settings, while OpenEvidence’s citation-based outputs may better serve academic tasks.
  • Hajj, Joseph (Cleveland Clinic, Cleveland, Ohio, United States)
  • Mdaihly, Mohamad (Cleveland Clinic, Cleveland, Ohio, United States)
  • Kassab, Joseph (University of Texas Southwestern, Dallas, Texas, United States)
  • Miyasaka, Rhonda (Cleveland Clinic, Cleveland, Ohio, United States)
  • Harb, Serge (Cleveland Clinic, Cleveland, Ohio, United States)
  • Author Disclosures:
    Joseph Hajj: DO NOT have relevant financial relationships | Mohamad Mdaihly: DO NOT have relevant financial relationships | Joseph Kassab: DO NOT have relevant financial relationships | Rhonda Miyasaka: No Answer | Serge Harb: No Answer
Meeting Info:

Scientific Sessions 2025

New Orleans, Louisiana

Session Info:

Transforming Healthcare with Large Language Models and NLP: From Unstructured Data to Clinical Insight

Sunday, 11/09/2025, 11:50 AM - 1:00 PM

Moderated Digital Poster Session

