
American Heart Association



Final ID: WE488

A Comparison of English and Spanish AI Chatbot Responses to Online Cardiovascular Questions: Differences in Response Length, Readability, and Guidance

Abstract Body: Introduction
Large language models (LLMs) such as ChatGPT (OpenAI) and Gemini (Google) have become a widely used source of information. These tools can provide nuanced and personalized information, but they are also prone to hallucinations, societal biases, and sycophancy. While studies suggest LLMs are capable of providing high-quality health information, less is known about how this information varies by language and reading level. Because cardiovascular disease remains a leading cause of morbidity and mortality, and access to understandable health information is critical to promoting health equity, we evaluated how well these models provide approachable health information in English and Spanish.

Methods
Twelve common cardiovascular questions were queried in English and Spanish using two popular sources of LLM information for patients: ChatGPT (GPT-4o) and Google's AI summaries (Table 1). Response length, sentence length, disclaimers, and clinical guidance elements were recorded and summarized with descriptive statistics, and paired t-tests were performed to assess between-language differences. Readability was assessed using validated tools: the Flesch-Kincaid Grade Level for English and the Fernández-Huerta index for Spanish.
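The readability metrics and the between-language test named in the Methods can be sketched in Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the functions take pre-computed word, sentence, and syllable counts (syllable counting for English and Spanish needs its own tokenizer and is out of scope here), and the Fernández-Huerta sentence term is an assumption, because published versions of the index define it differently. This sketch uses average words per sentence, mirroring the Flesch Reading Ease formula from which the index was adapted.

```python
import math
import statistics


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level for English text (approximate U.S. school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fernandez_huerta(words: int, sentences: int, syllables: int) -> float:
    """Fernández-Huerta reading ease for Spanish text (higher = easier to read).

    ASSUMPTION: the sentence term is 1.02 x average words per sentence;
    some published versions define this term differently.
    """
    syllables_per_100_words = 100 * syllables / words
    words_per_sentence = words / sentences
    return 206.84 - 0.60 * syllables_per_100_words - 1.02 * words_per_sentence


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom.

    a[i] and b[i] must describe the same item (e.g. the same question
    answered in English and in Spanish).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if se == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / se, n - 1


if __name__ == "__main__":
    # Illustrative counts only, not study data.
    print(flesch_kincaid_grade(words=100, sentences=5, syllables=150))
    print(fernandez_huerta(words=100, sentences=5, syllables=200))
    # Hypothetical mean sentence lengths for four questions, English vs Spanish.
    print(paired_t([20, 22, 19, 24], [18, 21, 17, 20]))
```

Note that the Fernández-Huerta index is a reading-ease score rather than a grade level, so comparing it with a 6th-grade target requires an interpretation table. In practice, `scipy.stats.ttest_rel` returns both the t statistic and a p-value for paired samples; the hand-rolled version above only shows what the test computes on per-question pairs.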

Results
Across 48 responses (12 questions in 2 languages on 2 platforms), ChatGPT produced longer answers than Google in both languages. Mean sentence length was significantly longer in ChatGPT English responses than in ChatGPT Spanish responses. Conversely, Google responses were slightly longer in Spanish than in English. ChatGPT offered further medical guidance in 10 of 12 English and 11 of 12 Spanish responses, whereas Google provided none. Medical disclaimers were present in all Google responses and in approximately half of ChatGPT responses. Readability exceeded the recommended 6th-grade level across both languages and platforms.

Conclusion
AI response length, readability, and guidance for cardiovascular questions varied by language and platform. AI companies should target health-related outputs to reading levels appropriate for their users. Our finding that medical disclaimers were often absent from ChatGPT responses is consistent with other work in this field. Our study has major limitations, most notably its small sample size and the fact that our evaluation did not assess response quality or meaning. Further research should evaluate the quality of responses in different languages with validated metrics. Greater attention to readability and linguistic tailoring of AI outputs may enhance equitable access to understandable online health information.
  • Benarroch, Yoel  ( Beth Israel Deaconess Medical Center , Boston , Massachusetts , United States )
  • Gusdorf, Jason  ( Beth Israel Deaconess Medical Center , Boston , Massachusetts , United States )
  • Isaza, Nicolas  ( Beth Israel Deaconess Medical Center , Boston , Massachusetts , United States )
  • Rodman, Adam  ( Beth Israel Deaconess Medical Center , Boston , Massachusetts , United States )
Meeting Info:

EPI-Lifestyle Scientific Sessions 2026

2026

Boston, Massachusetts

Session Info:

Poster Session 2

Wednesday, 03/18/2026, 05:00 PM - 07:00 PM

Poster Session
