Addressing Racial Bias in GPT-4 Cardiovascular Clinical Reasoning AHA Conference Repository

American Heart Association


Final ID: Wed160

Addressing Racial Bias in GPT-4 Cardiovascular Clinical Reasoning

Abstract Body:

Background:
Large language models (LLMs) like GPT-4 are increasingly used for clinical diagnostic reasoning and treatment. However, historical biases in training data may lead to inequitable outcomes for racial/ethnic minority groups.

Hypothesis:
GPT-4’s diagnostic performance and care recommendations differ across racial/ethnic groups, reflecting potential biases. Instructing GPT-4 to disregard race/ethnicity will mitigate these disparities.

Aims:
1) Evaluate GPT-4’s diagnostic accuracy and care recommendations for different racial/ethnic groups using real cardiovascular data; 2) Determine whether ignoring race/ethnicity reduces biases.

Methods:
Ten cardiovascular patient notes from the MIMIC-IV-Note database were each altered to reflect four racial/ethnic identities (Black, White, Hispanic, Asian). For each variation, GPT-4 generated the top 10 differential diagnoses, workups, and treatments; this was repeated across replicate sets, yielding 2,000 total prompts. Another 2,000 prompts were generated with instructions to disregard race/ethnicity. Diagnostic accuracy was assessed by comparing GPT-4 outputs to actual discharge diagnoses (Mann-Whitney U tests with false discovery rate [FDR] correction). Workups and treatments were examined via multivariable regression for recommendations of advanced imaging (e.g., echocardiography, cardiac MRI) and specialist referrals, adjusting for gender.
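The prompt design described above can be sketched as a simple grid of notes, identities, and replicates. This is an illustrative sketch only, not the authors' code: the replicate count of 50 is an assumption chosen so that 10 notes x 4 identities reaches the stated 2,000 prompts (the abstract does not give the number of replicates, and other factors such as gender may also have been varied), and the instruction wording, the `{RACE}` placeholder, and the function name `build_prompts` are hypothetical.

```python
from itertools import product

IDENTITIES = ["Black", "White", "Hispanic", "Asian"]  # from the abstract
N_NOTES = 10        # from the abstract
N_REPLICATES = 50   # ASSUMPTION: chosen so that 10 * 4 * 50 = 2,000

# Hypothetical instruction wording; the abstract does not quote the prompts.
TASK = "List the top 10 differential diagnoses, workups, and treatments."
BLIND = "Disregard the patient's race/ethnicity in your reasoning."


def build_prompts(notes, blind=False):
    """Return one prompt record per (note, identity, replicate) combination."""
    instruction = f"{TASK} {BLIND}" if blind else TASK
    prompts = []
    for (note_id, note), identity, rep in product(
        enumerate(notes), IDENTITIES, range(N_REPLICATES)
    ):
        # Only the race/ethnicity descriptor changes between variations.
        text = note.replace("{RACE}", identity)
        prompts.append(
            {
                "note_id": note_id,
                "identity": identity,
                "replicate": rep,
                "prompt": f"{instruction}\n\n{text}",
            }
        )
    return prompts


# Placeholder notes standing in for the MIMIC-IV-Note records (hypothetical text).
notes = [
    f"Placeholder note {i}: {{RACE}} patient presenting with chest pain."
    for i in range(N_NOTES)
]
baseline = build_prompts(notes)             # race/ethnicity left in play
blinded = build_prompts(notes, blind=True)  # with the disregard instruction
```

Keeping the note text fixed and substituting only the identity descriptor is what lets any difference in output be attributed to race/ethnicity rather than to the clinical content.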

Results:
Altering only race/ethnicity affected diagnostic accuracy for 20% of sampled cases (2/10). Hispanic patients were diagnosed less accurately for coronary artery disease than White (p=0.023), Asian (p=0.023), and Black (p=0.037) patients (mean rank difference=1.8 [SD 4.3], 1.9 [SD 5.1], 2.0 [SD 5.2], respectively). Black patients were diagnosed less accurately for myocardial infarction than Hispanic patients (p=0.016; mean rank difference=1.7 [SD 3.5]). After adjusting for gender, Black patients received fewer advanced imaging recommendations (83%) than White (86%; p=0.042) and Hispanic (87%; p=0.009) patients, and fewer specialist referrals than Asian patients (21% vs 28%; p=0.015). When GPT-4 was instructed to disregard race/ethnicity, these disparities were no longer significant.
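The pairwise p-values above come from FDR-corrected tests. The abstract does not say which FDR procedure was used; the sketch below assumes the common Benjamini-Hochberg adjustment and shows how raw p-values from several pairwise comparisons would be adjusted. The input p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise group comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)

# A comparison is reported as significant if its adjusted p-value is < 0.05.
significant = [p < 0.05 for p in adj]
```

With many pairwise comparisons across four groups and several diagnoses, an adjustment of this kind limits the expected share of false positives among the comparisons declared significant.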

Conclusions:
GPT-4 showed racial/ethnic disparities in diagnosing critical cardiovascular conditions and recommending advanced workups for Black and Hispanic patients. Prompt-based instructions to ignore race/ethnicity mitigated these biases. Future work should refine bias detection, mitigation, and monitoring before broader LLM deployment in clinical care.

Krieger, Katherine (Weill Cornell Medicine, New York, New York, United States)
Rossi, Camilla (Weill Cornell Medicine, New York, New York, United States)
Rahouma, Mohamed (Weill Cornell Medicine, New York, New York, United States)
Gaudino, Mario (Weill Cornell Medicine, New York, New York, United States)
Hameed, Irbaz (Yale University, Hamden, Connecticut, United States)
Quer, Giorgio (Scripps Research Translational Inst, La Jolla, California, United States)
Mack, Charles (Weill Cornell Medicine, New York, New York, United States)
Savic, Marco (Weill Cornell Medicine, New York, New York, United States)
Mantaj, Polina (Weill Cornell Medicine, New York, New York, United States)
Hirofuji, Aina (Weill Cornell Medicine, New York, New York, United States)
Gregg, Alexander (Weill Cornell Medicine, New York, New York, United States)
Soletti, Giovanni (Weill Cornell Medicine, New York, New York, United States)

Author Disclosures:

Katherine Krieger:

DO NOT have relevant financial relationships

Camilla Rossi:

DO NOT have relevant financial relationships

Mohamed Rahouma:

No Answer

Mario Gaudino:

DO have relevant financial relationships

Irbaz Hameed:

No Answer

Giorgio Quer:

No Answer

Charles Mack:

DO NOT have relevant financial relationships

Marco Savic:

No Answer

Polina Mantaj:

DO NOT have relevant financial relationships

Aina Hirofuji:

No Answer

Alexander Gregg:

No Answer

Giovanni Soletti:

DO NOT have relevant financial relationships

Meeting Info:

Basic Cardiovascular Sciences 2025

2025

Baltimore, Maryland

Session Info:

Poster Session and Reception 1

Wednesday, 07/23/2025, 04:30PM - 07:00PM

Poster Session and Reception

More abstracts from these authors:

Dynamic Intraoperative Changes in Left Ventricular Global Longitudinal Strain are associated with Post Operative Atrial Fibrillation in Cardiothoracic Surgery

Falco Giorgia, Rahouma Mohamed, Leshem Edan, Dellaquila Michele, Ciccone Marco, Gaudino Mario, Rong Lisa

Surgical options for myocardial bridge and coronary dissections

Gaudino Mario
