Comparative Tools in SAS: Analyzing Self-Reported versus Imputed Values
September 23, 2024: 1:00 AM - 2:00 AM
Data Collection, Management & Manipulation, Brookside B

Authors
Neesha Nathwani, Sarah Woodruff

Abstract
Imputation is often a critical step in compensating for missing values before an analysis is performed. The need for effective imputation has been demonstrated within the Sentinel Initiative, a program sponsored by the U.S. Food and Drug Administration (FDA) to monitor the safety of FDA-regulated medical products through an active surveillance system. Because Sentinel functions as a distributed data network, data completeness can vary, which underscores the importance of high-quality data imputation. This paper focuses on comparing imputed values with reported values, in particular for race, ethnicity, sex, and gender. While current data partners primarily use the Bayesian Improved Surname and Geocoding (BISG) method and a standardized imputation method developed by RTI International (RTI), the tools and approaches available for comparing reported versus imputed values vary much more widely. This paper describes and discusses the use of PROC COMPARE; crosstabulation, particularly with PROC FREQ and PROC MEANS; hash methods; regular expressions; and PROC SQL. These methods were tested on fabricated descriptive reporting datasets that emulate those aggregated across the Sentinel population. The paper includes a comparative analysis that evaluates the effectiveness of each method against a range of priorities and relevant study outcomes.

Paper