American Heart Association


Final ID: Su3130

Evaluating the Clinical Reasoning of GPT-4, Grok, and Gemini in Different Fields of Cardiology

Introduction: The use of Large Language Models (LLMs) in clinical practice seems increasingly feasible, although comparative studies are needed to standardize their use. GPT-4, Grok and Gemini are three commonly used LLMs. While these LLMs may exhibit some level of clinical reasoning, their performance may differ among models and across areas of cardiology.
Aim: To evaluate the clinical reasoning of GPT-4, Grok and Gemini in distinct fields of cardiology.
Methods: A total of 12 cases were selected from the AHA Circulation Journal, representing six cardiology areas: General Cardiology (GC), Interventional Cardiology (IC), Cardiac Imaging (CI), Electrophysiology (EP), Heart Failure (HF) and Congenital Heart Disease (CHD). Each area included two cases. We provided only the case presentation to the LLMs, so that they were unaware of the diagnosis and management. Four questions were asked with the following prompts: 1. What are the diagnostic steps? 2. What is the most likely diagnosis? 3. What is the differential diagnosis? 4. What is the appropriate management? The responses were systematically evaluated with a digital rubric through an international collaboration of 12 cardiologists from various institutions. Each cardiologist evaluated two cases in their area of expertise and was aware of the diagnosis and management. One-way ANOVA was used for analysis.
Results: GPT-4 had the best overall performance by a small margin (Table 1) and the highest scores in GC and CHD; Grok excelled in determining management and was superior in CI, HF and IC; Gemini had the highest performance in EP. These findings, however, were not statistically significant (Table 2).
Conclusions: This is one of the first studies comparing the clinical reasoning of LLMs in cardiology. Our results suggest that GPT-4 has the best overall performance and that Grok is superior in determining management. More studies comparing LLMs are required to further standardize their use.
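As a rough illustration of the one-way ANOVA named in the Methods, the sketch below computes the F statistic for rubric scores grouped by model. The scores and the group names (model_a, model_b, model_c) are hypothetical placeholders, not study data, and the function name one_way_anova_f is an assumption introduced here; the abstract does not describe the authors' software or exact procedure.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical rubric scores (NOT study data), one list per model.
scores = {
    "model_a": [4.0, 5.0, 6.0],
    "model_b": [5.0, 6.0, 7.0],
    "model_c": [6.0, 7.0, 8.0],
}

f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A p-value would then come from the F distribution with those degrees of freedom; a statistics package such as SciPy (scipy.stats.f_oneway) reports it directly. A large p-value would match the abstract's statement that the differences between models were not statistically significant.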
  • Reyes-Rivera, Jonathan (Facultad de Medicina Autonoma de San Luis Potosi, San Luis Potosi, Mexico)
  • Chi, Gerald (Beth Israel Deaconess Medical CTR, Boston, Massachusetts, United States)
  • Angulo, Stephanie (Instituto Nacional de Cardiologia, Ciudad de Mexico, Mexico)
  • Moore, Michelle (Emory University, Atlanta, Georgia, United States)
  • Lopez-Quijano, Juan M. (Facultad de Medicina Autonoma de San Luis Potosi, San Luis Potosi, Mexico)
  • Samman, Abdallah (CENA RESEARCH INSTITUTE, Houston, Texas, United States)
  • Gordillo-Moscoso, Antonio A. (Facultad de Medicina Autonoma de San Luis Potosi, San Luis Potosi, Mexico)
  • Ali, Asif (UT Health Science Center Houston, Houston, Texas, United States)
  • Castro Molina, Alberto (Beth Israel Deaconess Medical CTR, Boston, Massachusetts, United States)
  • Romero-Lorenzo, Marco (CENA RESEARCH INSTITUTE, Houston, Texas, United States)
  • Ali, Sajid (CENA RESEARCH INSTITUTE, Houston, Texas, United States)
  • Gibson, Charles (Beth Israel Deaconess Medical CTR, Boston, Massachusetts, United States)
  • Saucedo, Jorge (Medical College of Wisconsin, Milwaukee, Wisconsin, United States)
  • Calandrelli, Matias (Hospital de la Santa Creu, Barcelona, Spain)
  • García Cruz, Edgar (Instituto Nacional de Cardiologia, Ciudad de Mexico, Mexico)
  • Bahit, Cecilia (Hospital Italiano de Buenos Aires, Buenos Aires, Argentina)
  • Author Disclosures:
    Jonathan Reyes-Rivera: DO NOT have relevant financial relationships
    Gerald Chi: DO have relevant financial relationships; Researcher: CSL Behring: Active (exists now); Researcher: Bayer: Active (exists now); Researcher: Janssen Research: Active (exists now)
    Stephanie Angulo: No Answer
    Michelle Moore: No Answer
    Juan M. Lopez-Quijano: DO NOT have relevant financial relationships
    Abdallah Samman: DO NOT have relevant financial relationships
    Antonio A. Gordillo-Moscoso: DO NOT have relevant financial relationships
    Asif Ali: DO have relevant financial relationships; Other: McGraw Hill publisher: Past (completed); Consultant: Signature Care: Past (completed); Speaker: Lumi Health: Past (completed); Speaker: ZOLL LifeVest: Past (completed); Consultant: Thrive360: Past (completed); Consultant: Qardio: Past (completed); Speaker: Boehringer Ingelheim: Active (exists now); Consultant: Avive AED: Active (exists now); Consultant: First History: Active (exists now); Consultant: CSMI: Past (completed); Consultant: DocNexus: Active (exists now); Consultant: Ainthoven: Active (exists now); Consultant: Valencell: Active (exists now); Executive Role: HealthSeers CMO: Active (exists now); Executive Role: Tabia Health AI CMO: Active (exists now)
    Alberto Castro Molina: DO NOT have relevant financial relationships
    Marco Romero-Lorenzo: DO NOT have relevant financial relationships
    Sajid Ali: No Answer
    Charles Gibson: No Answer
    Jorge Saucedo: DO NOT have relevant financial relationships
    Matias Calandrelli: DO NOT have relevant financial relationships
    Edgar García Cruz: No Answer
    Cecilia Bahit: No Answer
Meeting Info:

Scientific Sessions 2024

2024

Chicago, Illinois

Session Info:

LLMs Friend or Foe?

Sunday, 11/17/2024, 03:15PM - 04:15PM

Abstract Poster Session

