Artificial Intelligence for Clinical Risk Stratification: Expert Based Risk Scores versus Online Open Source Generative Pre-Trained Transformers
Background: We explored the potential of publicly available artificial intelligence in modern clinical practice. Our study evaluated the efficacy of an online open-source generative pre-trained transformer (ChatGPT) in predicting cardiovascular risk in patients with heart failure and preserved ejection fraction, comparing its performance with expert-based clinical stratification.
Methods: We retrospectively included 772 patients presenting with heart failure symptoms (mean age: 69±6 years, 56% female, mean ejection fraction: 61±5%, all >50%). They were followed for a median of 3.9 years for death and hospitalization due to heart failure (HF). A script incorporating 12 variables (see Figure 1) was generated and submitted to the ChatGPT website, and the returned score was used for analysis. The H2FPEF score was also computed as per guidelines. We then compared the ability of the two scores to predict outcomes.
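As an illustration of the expert-based comparator, the following is a minimal sketch of a point-based H2FPEF calculation. It assumes the commonly published categorical weights (obesity 2, two or more antihypertensive drugs 1, atrial fibrillation 3, pulmonary artery systolic pressure >35 mmHg 1, age >60 years 1, E/e' >9 1); the abstract does not list the inputs or cutoffs actually used, and the `Patient` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical field names; the abstract does not specify its data layout.
    bmi: float                   # body mass index, kg/m^2
    n_antihypertensives: int     # number of antihypertensive drugs
    atrial_fibrillation: bool    # paroxysmal or persistent
    pasp_mmhg: float             # pulmonary artery systolic pressure (echo)
    age_years: float
    e_over_e_prime: float        # Doppler E/e' ratio


def h2fpef_score(p: Patient) -> int:
    """Sum the categorical H2FPEF points (range 0 to 9).

    Weights and cutoffs are the commonly published ones and are an
    assumption here, not taken from the abstract.
    """
    score = 0
    score += 2 if p.bmi > 30 else 0                  # Heavy
    score += 1 if p.n_antihypertensives >= 2 else 0  # Hypertensive
    score += 3 if p.atrial_fibrillation else 0       # atrial Fibrillation
    score += 1 if p.pasp_mmhg > 35 else 0            # Pulmonary hypertension
    score += 1 if p.age_years > 60 else 0            # Elder
    score += 1 if p.e_over_e_prime > 9 else 0        # Filling pressure
    return score
```

A call such as `h2fpef_score(Patient(bmi=32, n_antihypertensives=2, atrial_fibrillation=False, pasp_mmhg=30, age_years=70, e_over_e_prime=8))` adds the obesity, hypertension, and age points only.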
Results: During follow-up, 17 patients died, 52 were hospitalized, and 67 experienced the combined outcome. The mean ChatGPT score was 6.1±1.7 and the mean H2FPEF score was 3.1±1.5, with a modest correlation between the two (r=0.51, p<0.001). Receiver operating characteristic curve analysis suggested thresholds of 6 for ChatGPT and 3 for H2FPEF for predicting the combined outcome, with comparable accuracy (AUC: 0.71 vs 0.72, both p<0.001). Both models similarly accounted for patients' comorbidities, exercise capacity, and baseline and post-exercise diastolic function. Survival curves illustrated the discriminative power of both H2FPEF and ChatGPT scores in predicting death, HF hospitalization, and the combined outcome. While the agreement between the two classifications was modest (Kappa 0.4, p=0.032), ChatGPT facilitated the reclassification of high-risk patients identified by H2FPEF.
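The two comparison statistics reported above, the area under the ROC curve and Cohen's kappa, can be sketched as follows. This is a minimal illustration on made-up toy values, not the study data; the function names are hypothetical, and treating a score at or above the threshold as high risk is an assumption, since the abstract does not state the direction of the cutoff.

```python
def auc(scores, events):
    """AUC as the probability that a patient with the event has a higher
    score than a patient without it (ties count one half)."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(a, b):
    """Chance-corrected agreement between two label lists.

    Undefined (division by zero) when expected agreement is exactly 1.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)


# Toy values for illustration only (NOT the study data).
chatgpt = [4, 5, 6, 7, 8, 6, 5, 7]
h2fpef = [1, 2, 3, 4, 5, 2, 3, 4]
event = [0, 0, 1, 1, 1, 0, 0, 1]

# Assumed rule: a score at or above the reported threshold is high risk.
chatgpt_high = [int(s >= 6) for s in chatgpt]
h2fpef_high = [int(s >= 3) for s in h2fpef]

print("AUC ChatGPT:", auc(chatgpt, event))
print("AUC H2FPEF:", auc(h2fpef, event))
print("Kappa:", cohen_kappa(chatgpt_high, h2fpef_high))
```

The pairwise AUC is quadratic in the number of patients, which is acceptable for a cohort of this size; a rank-based formulation would scale better.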
Conclusions: Open-source large language models such as ChatGPT can complement existing methods for predicting cardiac risk, with the potential for significant savings in cost and expertise. Future research should explore broader applications in diagnosis and management, always prioritizing rigorous ethical and equitable considerations.
Veshtaj, Marinela (Mount Sinai, Morningside, Bronx, New York, United States)
Omar, Alaa (Mount Sinai, Morningside, Bronx, New York, United States)
Alam, Loba (Mount Sinai, Morningside, Bronx, New York, United States)
Kim, Ga Hee (Mount Sinai, Morningside, Bronx, New York, United States)
Pinney, Sean (Mount Sinai, Morningside, Bronx, New York, United States)
Argulian, Edgar (Mount Sinai, Morningside, Bronx, New York, United States)
Author Disclosures:
Marinela Veshtaj: DO NOT have relevant financial relationships
Alaa Omar: DO NOT have relevant financial relationships
Loba Alam: No Answer
Ga Hee Kim: DO NOT have relevant financial relationships
Sean Pinney: No Answer
Edgar Argulian: DO NOT have relevant financial relationships