A Multi-Tier, Natural-Language Processing Framework to Automate Labeling of Acute Cerebrovascular Events From Radiology Reports and Diagnosis Codes
Abstract Body: Background: Machine learning (ML) approaches can solve predictive problems in stroke care but often require large, labeled datasets. We sought to support rapid development of an ML model to predict unnecessary acute stroke alerts, which can strain resources and delay care. To this end, we developed a multi-tiered “weak labeling” algorithm that labels a large sample of stroke alerts as associated with either cerebrovascular disease or a stroke mimic, based on diagnostic codes and selected neuroradiology reports.
Methods: Using an unlabeled, previously unanalyzed institutional stroke alert registry, we developed a 4-tiered weak labeling heuristic to assign binary labels (presence/absence of acute cerebrovascular disease) to stroke alerts at each tier. In Tier 1, we developed natural language processing (NLP) rules based on clinical expertise, using report text of brain CT and MRI studies associated with each alert. In Tier 2, we created rules based on combinations of Tier 1 outputs within 48 hours after the first neuroimaging study. In Tier 3, we used validated stroke diagnosis codes at discharge. In Tier 4, we combined labels from Tiers 2 and 3 to assign a final binary label. We conducted manual, clinical expert validation of Tiers 1, 2, and 4 across three statistically significant samples. We determined performance by calculating accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC), with manual review as ground truth.
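The tiered heuristic described above can be illustrated with a minimal sketch. The abstract does not give the actual NLP rules, the validated code list, or the logic used to combine tiers, so everything here is an assumption made for illustration: the keyword patterns, the ICD-10 prefixes, the "any positive study within 48 hours" rule for Tier 2, the logical OR used for Tier 4, and all function names.

```python
import re
from datetime import datetime, timedelta

# Hypothetical Tier 1 patterns; the real rules came from clinical expertise
# and are not listed in the abstract.
ACUTE = r"(?:acute\s+(?:infarct\w*|hemorrhage)|restricted\s+diffusion)"
POSITIVE = re.compile(rf"\b{ACUTE}", re.I)
NEGATED = re.compile(rf"\bno\s+(?:evidence\s+of\s+)?{ACUTE}", re.I)

# Illustrative ICD-10 prefixes only; the validated code set is not given.
STROKE_CODE_PREFIXES = ("I60", "I61", "I63")


def tier1_label(report_text):
    """Tier 1: label one report from its text, ignoring negated findings."""
    without_negations = NEGATED.sub(" ", report_text)
    return bool(POSITIVE.search(without_negations))


def tier2_label(studies, window=timedelta(hours=48)):
    """Tier 2 (assumed rule): positive if any study within the window
    after the first neuroimaging study is Tier 1 positive.

    studies: list of (datetime, report_text) tuples for one alert.
    """
    if not studies:
        return False
    first = min(when for when, _ in studies)
    return any(
        tier1_label(text) for when, text in studies if when - first <= window
    )


def tier3_label(discharge_codes):
    """Tier 3: positive if any discharge code matches a stroke prefix."""
    return any(
        code.upper().startswith(STROKE_CODE_PREFIXES) for code in discharge_codes
    )


def tier4_label(studies, discharge_codes):
    """Tier 4 (assumed rule): logical OR of the Tier 2 and Tier 3 labels."""
    return tier2_label(studies) or tier3_label(discharge_codes)


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1, 8, 0)
    studies = [
        (t0, "No evidence of acute infarct."),
        (t0 + timedelta(hours=20), "Restricted diffusion in the left MCA territory."),
    ]
    print(tier4_label(studies, ["R51"]))
```

Other combination rules, such as requiring agreement between imaging and codes, would trade sensitivity for specificity; the OR rule is used here only because it is the simplest to read.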
Results: We identified 16,515 stroke alerts with 28,548 neuroradiology studies between 2011 and 2021. We reviewed random, stratified samples (Tiers 1 and 2: 800 reports each; Tier 4: 240 encounters). Tier 1 achieved 91.9% accuracy, 89.4% sensitivity, 93.2% specificity, and 91.4% AUROC. Tier 2 showed 91.7% accuracy, 92.8% sensitivity, 90.8% specificity, and 91.8% AUROC. Tier 4 had 86.8% accuracy, 90.7% sensitivity, 82.7% specificity, and 87.1% AUROC.
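For readers who want to reproduce this kind of evaluation, the sketch below computes the four reported metrics from a confusion matrix of weak labels against manual review. The function name and the counts are hypothetical, and the abstract does not state how AUROC was computed. The sketch assumes hard binary labels, for which the ROC curve has a single operating point and the area under it equals the mean of sensitivity and specificity.

```python
def binary_metrics(tp, fp, tn, fn):
    """Metrics for hard binary labels against a ground-truth review.

    tp, fp, tn, fn: counts of true/false positives and negatives.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # With a single operating point, the trapezoidal area under the ROC
    # curve reduces to the average of sensitivity and specificity.
    auroc = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "auroc": auroc,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for name, value in binary_metrics(tp=90, fp=10, tn=90, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Under this assumption, the reported Tier 2 sensitivity (92.8%) and specificity (90.8%) average to 91.8%, which agrees with the reported Tier 2 AUROC.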
Conclusions: We developed an algorithm to weakly label stroke alerts with a binary outcome that demonstrated good classification performance and required minimal manual review. However, tiers based on NLP alone generally performed better than tiers using both NLP and diagnostic codes. Our results suggest that similar approaches can be used for labeling clinical cohorts, particularly those comprising diseases or treatments that rely heavily on diagnostic imaging. The benefits of this approach should be weighed against the required technical expertise and computational resources.
Erekat, Asala (Icahn School of Medicine at Mount Sinai, Union City, New Jersey, United States)
Stein, Laura (Icahn School of Medicine at Mount Sinai, Union City, New Jersey, United States)
Delman, Bradley (Mount Sinai Hospital, New York, New York, United States)
Karp, Adam (Icahn School of Medicine at Mount Sinai, Union City, New Jersey, United States)
Kupersmith, Mark (Mount Sinai Health System, New York, New York, United States)
Kummer, Benjamin (Mount Sinai Health System, New York, New York, United States)
Author Disclosures:
Asala Erekat: DO NOT have relevant financial relationships
Laura Stein: DO have relevant financial relationships; Research Funding (PI or named investigator): American Heart Association: Active (exists now)
Bradley Delman: DO NOT have relevant financial relationships
Adam Karp: DO NOT have relevant financial relationships
Mark Kupersmith: No Answer
Benjamin Kummer: DO NOT have relevant financial relationships