Addressing Racial Bias in GPT-4 Cardiovascular Clinical Reasoning
Background: Large language models (LLMs) such as GPT-4 are increasingly used to support clinical diagnostic reasoning and treatment decisions. However, historical biases in their training data may lead to inequitable outcomes for racial/ethnic minority groups.
Hypothesis: GPT-4’s diagnostic performance and care recommendations differ across racial/ethnic groups, reflecting potential biases. Instructing GPT-4 to disregard race/ethnicity will mitigate these disparities.
Aims: 1) Evaluate GPT-4’s diagnostic accuracy and care recommendations across racial/ethnic groups using real cardiovascular patient data; 2) determine whether instructing GPT-4 to disregard race/ethnicity reduces these biases.
Methods: Ten cardiovascular patient notes from the MIMIC-IV-Note database were each altered to reflect four racial/ethnic identities (Black, White, Hispanic, Asian). For each variant, GPT-4 generated its top 10 differential diagnoses, workups, and treatments; repeating this across replicate sets yielded 2,000 prompts in total. A further 2,000 prompts included an instruction to disregard race/ethnicity. Diagnostic accuracy was assessed against the actual discharge diagnoses, and groups were compared with Mann-Whitney U tests with false discovery rate (FDR) correction. Recommendations for advanced imaging (e.g., echocardiography, cardiac MRI) and specialist referrals were analyzed with multivariable regression adjusted for gender.
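The group comparison described in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes each racial/ethnic group contributes a list of per-replicate diagnostic ranks (the abstract does not state the exact input to the test), uses the normal approximation to the Mann-Whitney U test with a tie correction, and applies the Benjamini-Hochberg procedure as the FDR correction (the abstract says only "FDR-corrected"). The function names and example ranks are hypothetical. In practice, `scipy.stats.mannwhitneyu` and `statsmodels.stats.multitest.multipletests(method="fdr_bh")` provide tested implementations.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U, reduced by the tie correction term sum(t^3 - t).
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every observation tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-replicate ranks of the true diagnosis (1 = listed first).
    ranks_by_group = {
        "White": [1, 1, 2, 1, 3, 1, 2, 1],
        "Hispanic": [2, 4, 1, 11, 3, 2, 5, 2],
        "Black": [1, 2, 1, 1, 2, 3, 1, 2],
    }
    pairs = [("Hispanic", "White"), ("Hispanic", "Black"), ("Black", "White")]
    raw = [mann_whitney_u(ranks_by_group[a], ranks_by_group[b])[1] for a, b in pairs]
    for (a, b), p_raw, q in zip(pairs, raw, benjamini_hochberg(raw)):
        print(f"{a} vs {b}: p={p_raw:.3f}, FDR-adjusted={q:.3f}")
```

Correcting across all pairwise group comparisons at once, as in the usage block, is one reasonable choice; the abstract does not specify the family of tests over which the FDR was controlled.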
Results: Altering only race/ethnicity affected diagnostic accuracy in 2 of 10 sampled cases (20%). Coronary artery disease was diagnosed less accurately in Hispanic patients than in White (mean rank difference 1.8 [SD 4.3]; p=0.023), Asian (1.9 [SD 5.1]; p=0.023), and Black (2.0 [SD 5.2]; p=0.037) patients. Myocardial infarction was diagnosed less accurately in Black patients than in Hispanic patients (mean rank difference 1.7 [SD 3.5]; p=0.016). After adjustment for gender, Black patients received fewer advanced imaging recommendations (83%) than White (86%; p=0.042) and Hispanic (87%; p=0.009) patients, and fewer specialist referrals than Asian patients (21% vs 28%; p=0.015). When GPT-4 was instructed to disregard race/ethnicity, these disparities were no longer significant.
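The abstract reports accuracy as mean rank differences but does not define the rank. One plausible reading is sketched below, assuming that accuracy is the 1-based position of the actual discharge diagnosis within GPT-4's top-10 differential, that a missed diagnosis receives a hypothetical penalty rank of 11, and that replicates are paired by index across racial/ethnic variants of the same note. These choices, the exact case-insensitive string matching, and the names `diagnosis_rank`, `mean_rank_difference`, and `MISS_RANK` are all illustrative assumptions; a real analysis would likely need clinician adjudication or synonym mapping to decide whether two diagnosis strings match.

```python
import statistics
from typing import List, Sequence, Tuple

# Hypothetical penalty rank when the true diagnosis is absent from the top 10.
MISS_RANK = 11


def diagnosis_rank(differential: Sequence[str], true_dx: str) -> int:
    """1-based position of true_dx in the top-10 differential, or MISS_RANK."""
    top10 = [d.strip().lower() for d in differential[:10]]
    target = true_dx.strip().lower()
    return top10.index(target) + 1 if target in top10 else MISS_RANK


def mean_rank_difference(
    ranks_a: Sequence[int], ranks_b: Sequence[int]
) -> Tuple[float, float]:
    """Mean and SD of paired differences (a - b); a positive mean means the
    true diagnosis sat lower in group a's differentials (less accurate)."""
    if len(ranks_a) != len(ranks_b) or len(ranks_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 replicates")
    diffs: List[int] = [a - b for a, b in zip(ranks_a, ranks_b)]
    return statistics.mean(diffs), statistics.stdev(diffs)


if __name__ == "__main__":
    # Hypothetical GPT-4 differentials for one replicate of two variants of one note.
    variant_a = ["Unstable angina", "Aortic stenosis", "Coronary artery disease"]
    variant_b = ["Coronary artery disease", "Unstable angina"]
    print(diagnosis_rank(variant_a, "coronary artery disease"))  # → 3
    print(diagnosis_rank(variant_b, "coronary artery disease"))  # → 1
    # Hypothetical ranks across three paired replicates.
    print(mean_rank_difference([3, 11, 2], [1, 1, 2]))
```

Under this reading, a mean rank difference of 1.8 would mean the true diagnosis appeared, on average, about two positions further down the differential for one group than for the other.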
Conclusions: GPT-4 showed racial/ethnic disparities in diagnosing critical cardiovascular conditions and recommending advanced workups for Black and Hispanic patients. Prompt-based instructions to ignore race/ethnicity mitigated these biases. Future work should refine bias detection, mitigation, and monitoring before broader LLM deployment in clinical care.
Krieger, Katherine (Weill Cornell Medicine, New York, New York, United States)
Rossi, Camilla (Weill Cornell Medicine, New York, New York, United States)
Rahouma, Mohamed (Weill Cornell Medicine, New York, New York, United States)
Gaudino, Mario (Weill Cornell Medicine, New York, New York, United States)
Hameed, Irbaz (Yale University, Hamden, Connecticut, United States)
Quer, Giorgio (Scripps Research Translational Inst, La Jolla, California, United States)
Mack, Charles (Weill Cornell Medicine, New York, New York, United States)
Savic, Marco (Weill Cornell Medicine, New York, New York, United States)
Mantaj, Polina (Weill Cornell Medicine, New York, New York, United States)
Hirofuji, Aina (Weill Cornell Medicine, New York, New York, United States)
Gregg, Alexander (Weill Cornell Medicine, New York, New York, United States)
Soletti, Giovanni (Weill Cornell Medicine, New York, New York, United States)
Author Disclosures:
Katherine Krieger: DO NOT have relevant financial relationships
Camilla Rossi: DO NOT have relevant financial relationships
Mohamed Rahouma: No Answer
Mario Gaudino: DO have relevant financial relationships; Advisor: Abbott Vascular: Past (completed)
Irbaz Hameed: No Answer
Giorgio Quer: No Answer
Charles Mack: DO NOT have relevant financial relationships
Marco Savic: No Answer
Polina Mantaj: DO NOT have relevant financial relationships
Aina Hirofuji: No Answer
Alexander Gregg: No Answer
Giovanni Soletti: DO NOT have relevant financial relationships