Tianyi Yang
This presentation explores the challenges and techniques of handling large-scale de-identified electronic health record (EHR) data in SAS, with a focus on exploratory data summaries. The data come from patients across multiple health systems, including Duke University, the University of Iowa, and Dartmouth-Hitchcock, and cover patient demographics, hospital encounters, diagnoses, and death records. Key tasks in this project involve:
1. Efficiently managing and filtering large datasets using SAS, particularly focusing on handling missing data and cleaning irregular variables such as dates and identifiers.
2. Creating new summary variables for exploratory data analysis, including calculating the number of unique patients, identifying first occurrences of specific diagnoses, and filtering records based on time constraints such as visits or diagnoses within two years of a patient's last visit.
The presentation will also highlight some of the more complex SAS tasks that were performed, such as the creation of dynamic categorical variables and large-scale filtering based on temporal relationships. These techniques are crucial when working with de-identified datasets where precise, individual-level tracking is obscured.
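
The abstract does not include the SAS programs themselves. As a rough, language-neutral illustration of the logic behind the summary variables in task 2, the Python sketch below counts unique patients, finds each patient's first occurrence of a diagnosis, and keeps records within two years of the patient's last visit. The record layout, the sample rows, the function names, and the 730-day definition of "two years" are all assumptions made for this example and are not taken from the project.

```python
from datetime import date, timedelta

# Hypothetical, made-up encounter records: (patient_id, visit_date, diagnosis_code).
# The real de-identified EHR tables will have a different layout.
encounters = [
    ("P1", date(2015, 3, 1), "I10"),
    ("P1", date(2018, 6, 15), "E11"),
    ("P1", date(2019, 1, 10), "I10"),
    ("P2", date(2017, 5, 20), "E11"),
    ("P2", date(2017, 5, 20), "I10"),
    ("P3", date(2016, 8, 2), None),  # encounter with a missing diagnosis
]

# Assumption: "two years" is approximated as 730 days.
WINDOW = timedelta(days=730)


def unique_patients(rows):
    """Count distinct patient identifiers."""
    return len({pid for pid, _, _ in rows})


def first_occurrence(rows, code):
    """Map each patient to the earliest visit date carrying the given diagnosis."""
    first = {}
    for pid, visit, dx in rows:
        if dx == code and (pid not in first or visit < first[pid]):
            first[pid] = visit
    return first


def within_window_of_last_visit(rows, window=WINDOW):
    """Keep rows dated no more than `window` before that patient's last visit."""
    last = {}
    for pid, visit, _ in rows:
        if pid not in last or visit > last[pid]:
            last[pid] = visit
    return [row for row in rows if last[row[0]] - row[1] <= window]


n_patients = unique_patients(encounters)
first_i10 = first_occurrence(encounters, "I10")
recent = within_window_of_last_visit(encounters)
```

In SAS, the same three summaries would typically be written with grouped aggregation (for example a distinct count, a per-patient minimum date, and a join against the per-patient maximum date); the sketch only conveys the logic, not the project's actual code.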