Racial Disparities in Knowledge of Cardiovascular Disease by Chat-Based Artificial Intelligence Models
Background Patients and their families often turn to online resources for information about their health. Dialogue-based artificial intelligence (AI) language models (ChatGPT, Perplexity, Bing AI, and Google Bard) have been developed to answer complex questions, even as they continue to evolve. We sought to assess whether these AI models had knowledge of racial disparities in cardiovascular disease (CVD), including disparities associated with CVD risk factors and related conditions.
Methods To assess the responses of various AI models on topics in cardiovascular disease disparities, we created twelve questions covering a range of topics and patient demographics. Each question was entered into 4 different AI models and asked three times to assess variability in responses, with the application closed after each attempt.
Results A total of 144 responses were tabulated from ChatGPT, Google Bard, Perplexity, and Bing AI answers to the 12 questions in triplicate to assess consistency. Most responses to the same prompt were consistent across different question-and-answer sessions. ChatGPT's responses were appropriate for 75% of questions (9 of 12), inappropriate for 25% (3 of 12), and unreliable for none. Google Bard's responses were appropriate for 91.7% (11 of 12), inappropriate for 8.3% (1 of 12), and unreliable for none. Perplexity's responses were appropriate for 66.7% (8 of 12), inappropriate for 25% (3 of 12), and unreliable for 8.3% (1 of 12). Bing AI's responses were appropriate for 75% (9 of 12), inappropriate for 16.7% (2 of 12), and unreliable for 8.3% (1 of 12). Of the 144 prompt entries across ChatGPT, Google Bard, Perplexity, and Bing AI, 122 (84.7%) were correct, 11 (7.64%) were hedged responses that could not be binarized as correct or incorrect, and 11 (7.64%) were incorrect.
Conclusion Our study showed that online chat-based AI models have broad knowledge of CVD racial disparities; however, persistent gaps remain in their knowledge about minority groups. Given that these models may be used by the general public, caution is advised against taking their responses at face value.
Eromosele, Benjamin
( Boston University
, Everett
, Massachusetts
, United States
)
Ughagwu, Kelechukwu
( University of Ibadan/University College Hospital
, Ibadan
, Nigeria
)
Johnson, Ayodeji
( VN Karazin National University
, Kharkiv
, Ukraine
)
Das, Herlyne
( St. George's University School of Medicine
, True Blue
, Grenada
)
Sobodu, Temitope
( Massachusetts College of Pharmacy and Health Sciences
, Boston
, Massachusetts
, United States
)
Ouyang, David
( Cedars Sinai
, Los Angeles
, California
, United States
)
Author Disclosures:
Benjamin Eromosele:DO NOT have relevant financial relationships
| Kelechukwu Ughagwu:No Answer
| Ayodeji Johnson:DO NOT have relevant financial relationships
| Herlyne Das:DO NOT have relevant financial relationships
| Temitope Sobodu:No Answer
| David Ouyang:DO have relevant financial relationships
;
Consultant:invision:Active (exists now)
; Consultant:ultromics:Past (completed)
; Consultant:echoiq:Past (completed)
; Consultant:astrazeneca:Active (exists now)
; Consultant:alexion:Active (exists now)