Novel Strategies for Inferential Error Management in Reused Clinical Datasets AHA Conference Repository

American Heart Association

Final ID: MP616

Novel Strategies for Inferential Error Management in Reused Clinical Datasets

Abstract Body (Do not enter title and authors here):
Introduction
The use of publicly available datasets and large-scale registries, such as the Society of Thoracic Surgeons (STS) National Databases and the Medicare database, has revolutionized accessibility to large sample data across research centers. However, there are few existing data governance protocols for managing the risk of Type I and Type II across the expanding portfolio of studies requesting from the same registry.
Research Questions
We investigate how repeated and uncoordinated reuse of datasets increases the riskiness of Type I/II errors due to dependent risks, undermining the reliability of research findings. We also examine strategies to manage this risk by differentiating between actively managed and passively managed databases.
Methods
We adopt a decision-theoretic perspective to analyze how reuse of datasets can result in a dependence structure between tests that increases the disutility of the portfolio of Type I/II errors as measured by the actuarial notion of stop loss order.
Results
Figure 1 shows the distribution of Type I errors for a two-sample t-test comparing the means of seven treatment groups against a common control group with data reuse of the control (Design 1) vs. without reuse of the control (Design 2) in 10,000 simulations of the global null. While the FWER of Design 1 was lower than that of Design 2, the error distribution of Design 1 is strictly preferable in stop loss order. We further demonstrate how subsampling strategies and portfolio optimization techniques can be deployed in a variety of contexts to mitigate the effects of data reuse.
Conclusion
We are the first to propose a novel quantitative framework for reducing false positives and false negatives across multiple requests of the same database, providing a foundation for database managers to implement error control policies.
Existing measures of error control, such as per-comparison error rates, false discovery rates, and familywise error rates fail to address the complex error structures that arise from multiple, overlapping studies.
As reliance on large clinical datasets grows, especially those with high usage, robust error management strategies are crucial. Active dataset management offers a way to maintain the validity of conclusions from large registries. We advocate for increased funding of small, well-powered studies and the development of guidelines and software for dataset managers to effectively allocate inferential resources.

Dale, Reid ( Stanford University , San Jose , California , United States )
Leipzig, Matt ( Stanford University , San Jose , California , United States )
Baiocchi, Mike ( Stanford University , San Jose , California , United States )
Currie, Maria ( Stanford University , San Ramon , California , United States )

Author Disclosures:

Reid Dale:

No Answer

Matt Leipzig:

DO NOT have relevant financial relationships

Mike Baiocchi:

DO NOT have relevant financial relationships

Maria Currie:

No Answer

Meeting Info:

Scientific Sessions 2025

2025

New Orleans, Louisiana

Session Info:

From Systems to Solutions: Innovation, Equity, and Implementation at the Frontlines of Cardiovascular Care

Saturday, 11/08/2025 , 10:45AM - 11:55AM

Moderated Digital Poster Session

American Heart Association