Utilization and Efficacy of an Automated Transthoracic Echocardiographic Report Data Extraction
Background: Extraction of unstructured and semi-structured medical data is a key prerequisite for the application of bioinformatics. Portability, scalability, and protection of health information remain key problems in data analytics in medicine that cannot easily be solved using machine learning techniques alone, highlighting the importance of multi-faceted approaches.
Research Question / Hypothesis: Can rule-based algorithms reliably identify and extract transthoracic echocardiographic (TTE) report findings for use in a data analytics pipeline?
Methods: Deidentified adult TTE reports dated between 09/14/2020 and 03/30/2023 were obtained from a single urban academic healthcare system. A rule-based algorithm was developed using derivatives of regular expressions in R to capture chamber parameters, cardiac function, and valvular disease. Accuracy was evaluated by study cardiologists in a manually adjudicated subset of reports.
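The rule-based approach described in the Methods can be illustrated with a minimal sketch. The study's algorithm was written in R and its actual rules are not given in this abstract, so the Python code below is only an assumed illustration: the patterns, the function names (`extract_ef`, `extract_la_size`, `extract_report`), and the sample report wording are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; real TTE report wording
# varies by institution and the study's actual R rules are not shown here.
EF_PATTERN = re.compile(
    r"ejection fraction\D{0,40}?(\d{1,3})(?:\s*(?:-|to)\s*(\d{1,3}))?\s*%",
    re.IGNORECASE,
)
LA_SIZE_PATTERN = re.compile(
    r"left atrium (?:is|appears)\s+(normal|mildly|moderately|severely)",
    re.IGNORECASE,
)


def extract_ef(report: str) -> Optional[float]:
    """Return the ejection fraction in percent, or None if not reported.

    A range such as "55-60%" is reduced to its midpoint.
    """
    match = EF_PATTERN.search(report)
    if match is None:
        return None  # plays the role of NA in the R workflow
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    return (low + high) / 2


def extract_la_size(report: str) -> Optional[str]:
    """Return a categorical left atrial size label, or None if not reported."""
    match = LA_SIZE_PATTERN.search(report)
    if match is None:
        return None
    return match.group(1).lower()


def extract_report(report: str) -> dict:
    """Apply every rule to one report and collect one row of variables."""
    return {
        "lvef_percent": extract_ef(report),
        "la_size": extract_la_size(report),
    }


# Hypothetical report text, not taken from the study data.
sample = (
    "The left ventricular ejection fraction is estimated at 55-60%. "
    "The left atrium is mildly dilated."
)
row = extract_report(sample)
print(row)
```

Returning `None` when a rule does not match keeps "not reported" distinct from any real value, which mirrors the populated versus NA data points evaluated in this study.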
Results: Of the 1000 reports obtained, we extracted 23079 (78.4%) populated data points out of 29423 maximal data points for 37 variables. Out of 803 manually verified NA data points, 743 (92.5%) were accurate. The mean net accuracy of all variables was 99.8% (see Table 1). Continuous data points showed 100% accuracy. Extraction failures were confined to categorical variables (7.5% of the 23 features), most commonly left atrial size (n=6), mitral valve structure (n=5), aortic valve structure (n=13), tricuspid valve structure (n=2), and right ventricular function (n=7). All other categorical variables showed 93.6% mean accuracy of NA data points.
Conclusions: A rule-based algorithm is effective at converting cardiologist-read TTE reports into datasets ready for use in data analytics. Future work should test the tool's speed, scalability, and performance under cluster computing.
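The two proportions reported in the Results can be reproduced from the stated counts. The short Python check below uses only figures from this abstract; the helper name `percent` is an assumption introduced for illustration.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage, rounded to 1 decimal."""
    return round(100 * numerator / denominator, 1)


# Populated data points extracted out of the maximal possible data points.
completeness = percent(23079, 29423)

# Manually verified NA data points that were confirmed as truly absent.
na_accuracy = percent(743, 803)

print(completeness, na_accuracy)
```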
Praveen, Nischal (University of Illinois Chicago, Chicago, Illinois, United States)
Shah, Anish (University of Illinois Chicago, Chicago, Illinois, United States)
Hill, Michael (University of Illinois Chicago, Chicago, Illinois, United States)
Tofovic, David (University of Illinois Chicago, Chicago, Illinois, United States)
Author Disclosures:
Nischal Praveen: DO NOT have relevant financial relationships
Anish Shah: DO NOT have relevant financial relationships
Michael Hill: No Answer
David Tofovic: DO NOT have relevant financial relationships