Machine Learning Models for Cancer-Associated Thrombosis (CAT): A Systematic Review and Meta-Analysis of Predictive Ability
Abstract Body: Introduction: Cancer-associated thrombosis (CAT) is one of the leading causes of mortality in patients with active malignancy. Accurate risk stratification is essential to guide prophylaxis, yet current clinical risk scores, derived primarily using standard logistic regression (LR), often have modest predictive accuracy and may fail to capture the complex, non-linear interactions between cancer-specific therapies and patient risk factors. We sought to systematically compare the discrimination of aggregated machine learning (ML) algorithms with that of traditional LR models to evaluate their potential as enhanced predictive tools.
Methods: Per PRISMA guidelines, a comprehensive search was conducted in PubMed/MEDLINE and Google Scholar to identify studies developing or validating predictive models for CAT in patients with solid and hematologic malignancies. Algorithms were stratified into a "Combined ML" cohort (Random Forest, XGBoost, SVM, Deep Learning [DL], and Ensemble methods) and a traditional LR cohort. The primary endpoint was model discrimination (Area Under the Receiver Operating Characteristic Curve [AUC]). Pooled AUCs were calculated using an inverse-variance weighted random-effects model.
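The inverse-variance weighted random-effects pooling described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the abstract does not name the between-study variance estimator, so the DerSimonian-Laird estimator is assumed here, AUCs are pooled on their raw scale (a logit transform is a common alternative), and standard errors are back-calculated from 95% confidence intervals. The function names and the input values are hypothetical and are not taken from the included studies.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Approximate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * z)


def pool_random_effects(estimates, std_errors):
    """Pool study estimates with inverse-variance weights.

    Uses the DerSimonian-Laird estimate of between-study variance (tau^2),
    which is an assumed choice, not one stated in the abstract.
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching SEs")

    # Fixed-effect (inverse-variance) weights and pooled mean
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the DerSimonian-Laird tau^2 (truncated at zero)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    se_pooled = math.sqrt(1.0 / sum_w_star)

    return {
        "pooled": pooled,
        "ci_low": pooled - Z_95 * se_pooled,
        "ci_high": pooled + Z_95 * se_pooled,
        "tau2": tau2,
        "q": q,
    }


# Hypothetical per-study AUCs and 95% CIs, for illustration only
aucs = [0.80, 0.86, 0.83]
cis = [(0.76, 0.84), (0.82, 0.90), (0.78, 0.88)]
result = pool_random_effects(aucs, [se_from_ci(lo, hi) for lo, hi in cis])
print(result)
```

When the studies agree closely, tau^2 is truncated to zero and the result reduces to the fixed-effect inverse-variance estimate; greater heterogeneity widens the pooled interval.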
Results: A total of 25 studies with 125,112 patients were included in the analysis. On internal validation, the Combined ML algorithms (pooled AUC: 0.83; 95% CI: 0.80–0.85; p=0.01) demonstrated similar discrimination to the standard LR models (pooled AUC: 0.81; 95% CI: 0.77–0.84; p<0.0001). However, XGBoost (pooled AUC: 0.87; 95% CI: 0.81–0.93; p<0.0001), DL (pooled AUC: 0.87; 95% CI: 0.84–0.90; p<0.0001), and Random Forest (pooled AUC: 0.85; 95% CI: 0.81–0.89; p=0) achieved the highest individual performance. External validation data, though limited (11 studies), indicated that ML models maintained discrimination (AUC range 0.76–0.85), whereas LR models frequently showed performance degradation (AUC <0.72 in multiple external cohorts).
Conclusion: In our analysis, advanced ML models show promise for improved CAT prediction compared with standard LR models, but current evidence is limited by study heterogeneity, limited external validation data, and interpretability challenges. Given these limitations and the "black box" nature of complex algorithms such as XGBoost, rigorous prospective studies are required to confirm clinical utility before widespread adoption.
Mylavarapu, Maneeth (Endeavor Health Cardiovascular Institute, Endeavor Health Glenbrook Hospital, Glenview, Illinois, United States)
Tanwar, Niharika (Advocate Illinois Masonic Medical Center, Chicago, Illinois, United States)
Kiyani, Madiha (Medstar Georgetown University Hospital, Washington, District of Columbia, United States)
Vats, Vaibhav (Jacobi Medical Center/Albert Einstein College of Medicine, Bronx, New York, United States)
Ananthaneni, Anil (LSU Health Shreveport, Shreveport, Louisiana, United States)
Parada Cabrera, Fabio (Instituto Guatemalteco de Seguridad Social (IGSS), Guatemala, Guatemala)
Medarametla, Ravi (Baptist Medical Center South, Montgomery, Alabama, United States)
Sam, Riya (Endeavor Health Cardiovascular Institute, Endeavor Health Glenbrook Hospital, Glenview, Illinois, United States)
Runkana, Ashok (Baptist Medical Center South, Montgomery, Alabama, United States)
Pursnani, Amit (Endeavor Health Cardiovascular Institute, Endeavor Health Glenbrook Hospital, Glenview, Illinois, United States)
Author Disclosures:
Maneeth Mylavarapu: DO NOT have relevant financial relationships
Amit Pursnani: No Answer
Niharika Tanwar: DO NOT have relevant financial relationships
Madiha Kiyani: No Answer
Vaibhav Vats: No Answer
Anil Ananthaneni: DO NOT have relevant financial relationships
Fabio Parada Cabrera: DO NOT have relevant financial relationships
Ravi Medarametla: DO NOT have relevant financial relationships
Riya Sam: No Answer
Ashok Runkana: DO NOT have relevant financial relationships