
American Heart Association



Final ID: Wed160

Addressing Racial Bias in GPT-4 Cardiovascular Clinical Reasoning

Abstract Body:

Background:
Large language models (LLMs) like GPT-4 are increasingly used for clinical diagnostic reasoning and treatment. However, historical biases in training data may lead to inequitable outcomes for racial/ethnic minority groups.

Hypothesis:
GPT-4’s diagnostic performance and care recommendations differ across racial/ethnic groups, reflecting potential biases. Instructing GPT-4 to disregard race/ethnicity will mitigate these disparities.

Aims:
1) Evaluate GPT-4’s diagnostic accuracy and care recommendations for different racial/ethnic groups using real cardiovascular data; 2) Determine whether ignoring race/ethnicity reduces biases.

Methods:
Ten cardiovascular patient notes from the MIMIC-IV-Note database were each altered to reflect four racial/ethnic identities (Black, White, Hispanic, Asian). For each variation, GPT-4 generated the top 10 differential diagnoses, workups, and treatments; this was repeated across replicate sets, yielding 2,000 total prompts. A further 2,000 prompts were generated with an added instruction to disregard race/ethnicity. Diagnostic accuracy was assessed by comparing GPT-4 outputs with the actual discharge diagnoses, using Mann-Whitney U tests with false discovery rate (FDR) correction. Recommendations for advanced imaging (e.g., echocardiography, cardiac MRI) and specialist referral were analyzed with multivariable regression adjusted for gender.
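The abstract does not say which FDR procedure was applied to the Mann-Whitney p-values. As an illustration only, the sketch below assumes the commonly used Benjamini-Hochberg step-up adjustment. The function name `benjamini_hochberg` and the input p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the abstract only says "FDR-corrected"; BH is one common
    choice and may not be the procedure the authors used.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = p_values[i] * m / rank
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four pairwise group comparisons (not study data).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

A comparison would then be called significant when its adjusted p-value falls below the chosen threshold (conventionally 0.05).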

Results:
Altering only race/ethnicity affected diagnostic accuracy for 20% of sampled cases (2/10). Hispanic patients were diagnosed less accurately for coronary artery disease than White (p=0.023), Asian (p=0.023), and Black (p=0.037) patients (mean rank difference=1.8 [SD 4.3], 1.9 [SD 5.1], 2.0 [SD 5.2], respectively). Black patients were diagnosed less accurately for myocardial infarction than Hispanic patients (p=0.016; mean rank difference=1.7 [SD 3.5]). After adjusting for gender, Black patients received fewer advanced imaging recommendations (83%) than White (86%; p=0.042) and Hispanic (87%; p=0.009) patients, and fewer specialist referrals than Asian patients (21% vs 28%; p=0.015). When GPT-4 was instructed to disregard race/ethnicity, these disparities were no longer significant.

Conclusions:
GPT-4 showed racial/ethnic disparities in diagnosing critical cardiovascular conditions and recommending advanced workups for Black and Hispanic patients. Prompt-based instructions to ignore race/ethnicity mitigated these biases. Future work should refine bias detection, mitigation, and monitoring before broader LLM deployment in clinical care.
  • Krieger, Katherine (Weill Cornell Medicine, New York, New York, United States)
  • Rossi, Camilla (Weill Cornell Medicine, New York, New York, United States)
  • Rahouma, Mohamed (Weill Cornell Medicine, New York, New York, United States)
  • Gaudino, Mario (Weill Cornell Medicine, New York, New York, United States)
  • Hameed, Irbaz (Yale University, Hamden, Connecticut, United States)
  • Quer, Giorgio (Scripps Research Translational Inst, La Jolla, California, United States)
  • Mack, Charles (Weill Cornell Medicine, New York, New York, United States)
  • Savic, Marco (Weill Cornell Medicine, New York, New York, United States)
  • Mantaj, Polina (Weill Cornell Medicine, New York, New York, United States)
  • Hirofuji, Aina (Weill Cornell Medicine, New York, New York, United States)
  • Gregg, Alexander (Weill Cornell Medicine, New York, New York, United States)
  • Soletti, Giovanni (Weill Cornell Medicine, New York, New York, United States)
Author Disclosures:
  • Katherine Krieger: Does not have relevant financial relationships
  • Camilla Rossi: Does not have relevant financial relationships
  • Mohamed Rahouma: No Answer
  • Mario Gaudino: Has relevant financial relationships; Advisor: Abbott Vascular: Past (completed)
  • Irbaz Hameed: No Answer
  • Giorgio Quer: No Answer
  • Charles Mack: Does not have relevant financial relationships
  • Marco Savic: No Answer
  • Polina Mantaj: Does not have relevant financial relationships
  • Aina Hirofuji: No Answer
  • Alexander Gregg: No Answer
  • Giovanni Soletti: Does not have relevant financial relationships
Meeting Info:

Basic Cardiovascular Sciences 2025

2025

Baltimore, Maryland

Session Info:

Poster Session and Reception 1

Wednesday, 07/23/2025, 04:30PM - 07:00PM

Poster Session and Reception
