Michael A. Raithel |
Occasionally, we receive very wide SAS data sets with hundreds or thousands of variables. The data could have been extracted from Blaise, downloaded from a SQL Server table, obtained from an Oracle table, or sent by a client. Some of the variables in such a data set may have no data stored in them. This can happen for any number of reasons, such as partial surveys, test tables, database tables with variables reserved for future use, and upstream processing issues.
It is useful to know which variables actually have values stored in them. Analysts can then concentrate on analyzing those variables and skip the others. Completely empty variables may indicate a problem with a file. Missing values might signify an issue with the collection instrument's skip patterns. Perhaps the chronically missing variables can be dropped from the SAS data set or from the originating database tables. Additionally, it is sometimes helpful to know which variables hold values for the individual cases within the SAS data set.
The SAS Data Set Missing Value Analyzer identifies all of the variables in a given SAS data set that have one or more non-missing values. The program creates four types of Excel report files:
• Count of the Variables That Have Non-Missing Values For Each Distinct ID Variable
• Characteristics of All Variables With At Least One Non-Missing Value
• Characteristics of Variables With ALL Missing Values
• Optionally, a separate report for each distinct ID Variable titled: Characteristics of The Variables with One or More Non-Missing Values
This presentation reviews the SAS Data Set Missing Value Analyzer macro program and the report files it creates. Attendees can begin using the macro directly after attending this session. |