Michael A. Raithel
SAS programmers are finding themselves working with larger and larger SAS data sets. As the size of a SAS data set increases, so does the strain on computing resources. Programs that once took minutes to execute run longer and longer as the number of observations grows. Sometimes the SAS data sets are so large that standard programming practices, such as match-merging files, simply fail because they overwhelm the available computing resources. This issue can surface when you are using a file of ID variables to subset an impossibly large SAS data set. What to do?
This presentation illustrates a methodology for overcoming computing resource issues when matching a SAS data set of ID variables to a very large SAS data set in order to extract the matching observations. The method starts with the creation of a SAS format from the ID variables in the smaller SAS data set. Once created, the format is used in a sequential read of the larger SAS data set to select only those observations whose ID values match the values stored in the format. Consequently, this methodology requires only a single pass of each SAS data set, so no sorting or match-merging is necessary.
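The steps can be sketched in a short program: build a CNTLIN data set from the ID file, create the format with PROC FORMAT, and then read the large data set once, keeping only observations whose formatted ID value is 'Y'. The sketch below is illustrative only; the data set names (WORK.IDLIST, BIGLIB.CLAIMS, WORK.MATCHED), the character ID variable PATID, and the format name $IDFMT are assumptions for the example, not names from the paper.

   /* Remove duplicate IDs so the format has non-overlapping ranges */
   proc sort data=work.idlist(keep=patid) out=ids nodupkey;
       by patid;
   run;

   /* Build the CNTLIN data set: each ID maps to 'Y',
      plus one OTHER record that maps everything else to 'N' */
   data cntlin;
       retain fmtname '$idfmt' type 'C';
       set ids end=last;
       start = patid;        /* PATID assumed to be character */
       label = 'Y';
       output;
       if last then do;
           hlo   = 'O';      /* OTHER range */
           label = 'N';
           output;
       end;
       keep fmtname type start label hlo;
   run;

   /* Create the $IDFMT format from the control data set */
   proc format cntlin=cntlin;
   run;

   /* Single sequential pass of the large data set:
      keep only observations whose ID is in the format */
   data work.matched;
       set biglib.claims;
       if put(patid, $idfmt.) = 'Y';
   run;

Because the PUT function performs an in-memory look-up against the format, the large data set is read exactly once and never has to be sorted.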
Attendees can immediately begin using this technique in their own programs that subset impossibly large SAS data sets.