SESUG 2014 Conference Abstracts
Application Development
Using SAS® software to shrink the Data used in Apache Flex® Application
Ahmed Al-Attar
AD-44
This paper discusses the techniques I used at the Census Bureau to overcome the
issue of dealing with large amounts of data while modernizing some of their
public-facing web applications by using Service Oriented Architecture (SOA) to
deploy SAS-powered Flex web applications. These techniques reduced 142,293 XML
lines (3.6 MB) to 15,813 XML lines (1.8 MB), a 50% size reduction, on the server
side (HTTP response), and reduced 196,167 observations to 283 observations of
summarized data, a 99.8% reduction, on the client side (XML lookup file).
%Destroy() a Macro With Permutations
Brandon Welch and James Vaughan
AD-103
The SAS® Macro is a powerful tool. It minimizes repetitive tasks and provides
portable tools for users. These tools are sometimes delivered to clients and a
quality macro is necessary. For example, when a macro is developed to perform a
complicated statistical test, we want it to produce accurate results and
a clean log. To accomplish this, we insert parameter checks. Depending on the
complexity of the macro, it is sometimes difficult to perform a thorough check. We introduce the %Destroy() macro, which uses the CALL RANPERK routine to permute
a list of arguments. These arguments are then passed to the macro you are testing. We
show how to add appropriate parameter checks to ensure on subsequent runs of
%Destroy() the testing macro produces the desired results. While this article
targets a clinical computing audience, the techniques we present offer a good
overview of macro processing that will educate SAS programmers of all levels
across various disciplines.
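As a rough illustration of the idea (this is not the authors' %Destroy() code; the macro %mytest and its arguments are placeholders), CALL RANPERK can shuffle a small set of argument strings and CALL EXECUTE can hand each ordering to the macro under test:

  data _null_;
    array parm[3] $200 ('data=work.dm' 'var=age' 'alpha=0.05');
    seed = 20140101;
    do rep = 1 to 5;
      /* randomly reorder the full argument list (k = n = 3) */
      call ranperk(seed, 3, parm[1], parm[2], parm[3]);
      /* pass the permuted list to the macro being tested */
      call execute(cats('%nrstr(%mytest)(', catx(',', parm[1], parm[2], parm[3]), ')'));
    end;
  run;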
This is the Modern World: Simple, Overlooked SAS® Enhancements
Bruce Gilsen
AD-18
At my job as a SAS® consultant at the Federal Reserve Board, while reading the
SAS-L internet newsgroup, and at SAS conferences, I've noticed that some
smaller, less dramatic SAS enhancements seem to have fallen through the cracks. Users continue to use older, more cumbersome methods when simpler solutions are
available. Some of these enhancements were introduced in Version 9.2, but others were introduced in Version 9, Version 8,
or even Version 6! This paper reviews underutilized enhancements that allow you to more easily do the following (a brief sketch of a few of these appears after the list).
- Write date values in the form yyyymmdd
- Increment date values with the INTNX function
- Create transport files: PROC CPORT/CIMPORT versus PROC COPY with the XPORT engine
- Count the number of times a character or substring occurs in a character string or the number of words in a character string
- Concatenate character strings
- Check if any of a list of variables contains a value
- Sort by the numeric portion of character values
- Retrieve DB2 data on z/OS mainframes
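A minimal sketch of a few of the items above (hedged: the paper's own examples and option choices may differ):

  data _null_;
    today = '15SEP2014'd;
    put today yymmddn8.;                    /* writes 20140915                   */
    next_month = intnx('month', today, 1);  /* first day of the next month       */
    put next_month date9.;

    s       = 'SESUG 2014 conference abstracts';
    n_words = countw(s);                    /* number of words in a string       */
    n_s     = countc(s, 's', 'i');          /* occurrences of "s", ignoring case */
    full    = catx(' ', 'SESUG', 2014);     /* concatenation with a delimiter    */
    put n_words= n_s= full=;
  run;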
DFAST & CCAR: One size does not fit all
Charyn Faenza
AD-127
In 2014, for the first time, mid-market banks (consisting of banks and bank
holding companies with $10-50 billion in consolidated assets) were required to submit
Capital Stress Tests to the federal regulators under the Dodd-Frank Wall Street
Reform and Consumer Protection Act (DFAST). This is a process large banks have
been going through since 2011; however, mid-market banks are not positioned to
commit as many resources to their annual stress tests as their largest peers. Limited human and technical resources, incomplete or non-existent detailed
historical data, lack of enterprise-wide cross functional analytics teams, and
limited exposure to rigorous model validations are all challenges mid-market
banks face. While there are fewer deliverables required from the DFAST banks,
the scrutiny the regulators are placing on the analytical models is just as high
as their expectations for CCAR banks. This session is designed to discuss the
differences in how DFAST and CCAR banks execute their stress tests, the
challenges facing DFAST banks, and potential ways DFAST banks can leverage the
analytics behind this exercise.
PROC RANK, PROC SUMMARY and PROC FORMAT Team Up and a Legend is Born!
Christianna Williams
AD-73
The task was to produce a figure legend that gave the quintile ranges of a
continuous measure corresponding to each color on a five-color choropleth map. Actually, figure legends for several dozen maps for several dozen different
continuous measures and time periods…so, the process needed to be automated. A method was devised using PROC RANK to generate the quintiles, PROC SUMMARY to
get the data value ranges within each quintile, and PROC FORMAT (with the
CNTLIN= option) to generate and store the legend labels. And then, of course,
these were rolled into a few macros to apply the method for the many different
figure legends. Each part of the method is quite simple – even mundane –
but together these techniques allowed us to standardize and automate an
otherwise very tedious process. The same basic strategy could be used whenever
one needs to dynamically generate data “buckets” but then keep track of the
bucket boundaries – whether for producing labels or legends or so that future
data can be benchmarked against the stored categories.
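A stripped-down sketch of the approach, assuming a data set WORK.MAPDATA with a continuous variable MEASURE (both names hypothetical):

  proc rank data=work.mapdata groups=5 out=ranked;
    var measure;
    ranks quintile;
  run;

  proc summary data=ranked nway;
    class quintile;
    var measure;
    output out=ranges (drop=_type_ _freq_) min=min_val max=max_val;
  run;

  data legend_fmt;                      /* CNTLIN data set: fmtname, start, label */
    set ranges;
    retain fmtname 'qlegend' type 'n';
    start = quintile;
    label = catx(' - ', put(min_val, best8.), put(max_val, best8.));
  run;

  proc format cntlin=legend_fmt;
  run;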
Useful Tips for Building Your Own SAS® Cloud
Danny Hamrick
AD-146
Everyone has heard about SAS® Cloud. Now come learn how you can build and
manage your own cloud using the same SAS® virtual application (vApp) technology.
More Hash: Some Unusual Uses of the SAS Hash Object
Haikuo Bian, Carlos Jimenez and David Maddox
AD-102
Since the introduction of the SAS Hash Object in SAS 9.0, and with recent
enhancements, the popularity of the methodology has grown. The significant
effect of the technique, in conjunction with the large memory capacity of
modern computing devices, has brought new and exciting capabilities to the DATA
step. The most often cited application of the SAS Hash Object is table lookup. This paper will highlight several unusual applications of the methodology
including random sampling, “sledge-hammer matching”, anagram searching,
dynamic data splitting, matrix computation, and unconventional transposing.
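For readers new to the technique, the canonical table-lookup use looks roughly like this (data set and variable names are hypothetical):

  data matched;
    if 0 then set work.lookup;                 /* define DESCRIPTION in the PDV     */
    if _n_ = 1 then do;
      declare hash h(dataset:'work.lookup');   /* load the lookup table into memory */
      h.defineKey('id');
      h.defineData('description');
      h.defineDone();
    end;
    set work.transactions;
    if h.find() = 0;                           /* keep only rows with a match       */
  run;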
Your Database can do SAS too!
Harry Droogendyk
AD-57
How often have you pulled oodles of data out of the corporate data warehouse
down into SAS for additional processing? Additional processing, sometimes
thought to be uniquely SAS's, such as FIRST. logic, cumulative totals, lag
functionality, specialized summarization or advanced date manipulation? Using
the analytical/OLAP and windowing functionality available in many databases (e.g., Teradata, Netezza), all of this processing can be performed directly in
the database without moving and reprocessing detail data unnecessarily.
This presentation will illustrate how to increase your coding and execution
efficiency by utilizing the database's power through your SAS environment.
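As one hedged example of the idea, a cumulative total can be computed inside the database with explicit SQL pass-through and a windowed SUM (the connection options, schema, and table names are hypothetical; the windowing syntax shown is Teradata's):

  proc sql;
    connect to teradata (user=&td_user password=&td_pw server=&td_server);
    create table work.cum_spend as
    select * from connection to teradata (
      select cust_id,
             txn_date,
             txn_amt,
             sum(txn_amt) over (partition by cust_id
                                order by txn_date
                                rows unbounded preceding) as cum_amt
      from dw.transactions
    );
    disconnect from teradata;
  quit;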
Moving Data and Results Between SAS® and Microsoft Excel
Harry Droogendyk
AD-58
Microsoft Excel spreadsheets are often the format of choice for our users, both
when supplying data to our processes and as a preferred means for receiving
processing results and data. SAS® offers a number of ways to import Excel data
quickly and efficiently. There are equally flexible methods to move data and
results from SAS to Excel. This paper will outline the many techniques
available and identify useful tips for moving data and results between SAS and
Excel efficiently and painlessly.
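Two of the simpler routes, sketched with hypothetical file paths (both require SAS/ACCESS Interface to PC Files):

  /* read one worksheet of a workbook into a SAS data set */
  proc import datafile="C:\reports\clients.xlsx"
       out=work.clients dbms=xlsx replace;
    sheet="Data";
  run;

  /* send a SAS data set back out to an Excel workbook */
  proc export data=work.summary
       outfile="C:\reports\summary.xlsx" dbms=xlsx replace;
    sheet="Summary";
  run;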
Before and After: Implementing a Robust Outlier Identification Routine using SAS®
Jack Shoemaker
AD-126
Due to the long memory of the Internet, the author still receives frequent
questions about a paper presented in the early 1990s that described a set of
SAS ® macros to implement Tukey’s robust outlier (non-parametric) methods. The UNIVARIATE procedure formed the core of these macros. The paper was done
prior to the advent of the Output Delivery System (ODS) and SAS Enterprise
Guide™ (SAS/EG). As a way of demonstrating how SAS technologies have evolved
and improved over time, this paper starts with that original 1990
implementation and then implements the same methods taking advantage first of
ODS and then the data-analysis features built into SAS/EG.
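The core of the Tukey fence idea is easy to sketch with PROC UNIVARIATE (data set and variable names are hypothetical; the paper's macros do considerably more):

  proc univariate data=work.have noprint;
    var charge;
    output out=stats q1=q1 q3=q3 qrange=iqr;
  run;

  data flagged;
    if _n_ = 1 then set stats;        /* attach Q1, Q3, and IQR to every row */
    set work.have;
    outlier = (charge < q1 - 1.5*iqr) or (charge > q3 + 1.5*iqr);
  run;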
SAS® Debugging 101
Kirk Paul Lafler
AD-38
SAS® users are almost always surprised to discover their programs contain
bugs. In fact, when asked, users will emphatically stand by their programs and
logic, saying they are bug free. But the vast number of experiences, along
with the realities of writing code, says otherwise. Bugs in software can appear
anywhere, whether accidentally built into the software by developers or
introduced by programmers when writing code. No matter where bugs originate,
the one thing that all SAS users know is that debugging SAS program
errors and warnings can be a daunting, and humbling, task. This presentation
explores the world of SAS bugs, providing essential information about the types
of bugs, how bugs are created, the symptoms of bugs, and how to locate bugs. Attendees learn how to apply effective techniques to better understand,
identify, and repair bugs and enable program code to work as intended.
Top Ten SAS® Performance Tuning Techniques
Kirk Paul Lafler
AD-39
The Base-SAS® software provides users with many choices for accessing,
manipulating, analyzing, and processing data and results. Partly due to the
power offered by the SAS software and the size of data sources, many
application developers and end-users are in need of guidelines for more
efficient use. This presentation highlights my personal top ten list of
performance tuning techniques for SAS users to apply in their applications. Attendees learn DATA and PROC step language statements and options that can
help conserve CPU, I/O, data storage, and memory resources while accomplishing
tasks involving processing, sorting, grouping, joining (merging), and
summarizing data.
The Power of SAS® Macro Programming – One Example
Milorad Stojanovic
AD-114
When we are using SAS macro programming, macro tools and features can make our
life more difficult at the beginning and a lot easier at the end. We should
envision the work of macros in different combinations of input data and
relationships between variables. Also, macro code should prevent processing
data if ‘critical’ files are missing. In this paper we present examples of
creating macro variables, using one or more ampersands (&), creating dynamic SAS
code, using %Sysfunc, %IF, %THEN, %ELSE and delivering flexible reports. Code
is data driven by using macro programming tools.
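A tiny hedged example of the "stop if a critical file is missing" idea (the macro and data set names are made up):

  %macro run_report(ds=);
    %if %sysfunc(exist(&ds)) %then %do;
      proc print data=&ds;
      run;
    %end;
    %else %put WARNING: critical file &ds not found - report skipped.;
  %mend run_report;

  %run_report(ds=work.monthly_totals)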
The New Tradition: SAS® Clinical Data Integration
Vincent Amoruccio
AD-138
Base SAS® programming has been around for a very long time. And over that
time, there have been many changes. New and enhanced procedures, new features,
new functions and even operating systems have been added. Over time, there have
been many windows and wizards that help to more easily generate code that can
be used in programs. Through it all, programmers always come back to their SAS
roots, the basic programs with which they started. But, as we move into the
future, is this the best use of time, to sit and manually code everything? Or
can we take advantage of the new tools and solutions that generate code and use
metadata to describe data, validate output and document exactly what the
programmer has done? This paper will show you how we can change the current
process using the graphical user interface of SAS Clinical Data Integration to
integrate data from disparate data sources and transform that data into
industry standards in a methodical, repeatable, more automated fashion.
Building Blocks
Hidden in Plain Sight: My Top Ten Underpublicized Enhancements in SAS ® Versions 9.2 and 9.3
Bruce Gilsen
BB-17
SAS ® Versions 9.2 and 9.3 contain many interesting enhancements. While the most significant enhancements have been widely publicized in online documentation, conference papers, the SAS-L internet newsgroup/listserv, and elsewhere, some smaller enhancements have received little attention. This paper reviews my ten favorite underpublicized features (a brief sketch of two of them follows the list).
- Eliminate observations with a unique sort key (BY groups of size one)
- DATA step sort
- String delimiter on input
- String delimiter on output
- The PRINT procedure: printing blank lines
- Data set lists in the SET and MERGE statements
- Append SAS log files
- Simpler macro variable range
- Trim leading and trailing blanks from a value stored by the SQL procedure in a single macro variable
- Create a directory or folder in a LIBNAME statement
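A brief sketch of two of the items above (data set names are hypothetical):

  /* BY groups of size one: send unique-key rows one way, duplicate-key rows another */
  proc sort data=work.claims nouniquekey uniqueout=singles out=dups;
    by member_id;
  run;

  /* data set lists in the SET statement */
  data all_years;
    set sales2011-sales2014;    /* numbered range list */
  run;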
SAS®, Excel®, and JMP® Connectivity — HOW
Charlie Shipp and Kirk Paul Lafler
BB-123
Microsoft Excel is the most used software on planet Earth, and connectivity with Excel is increasingly important to everyone. JMP is the best in the world for statistical graphics and data discovery, and SAS software is the gold standard for robust and reliable statistical analysis! Combine these three heavyweight software products with easy connectivity and you have a profound competitive edge. Depending on requirements, your (1) input, (2) discovery and analysis, and (3) final display and reporting can begin with any of the three and end with any of the three. We demonstrate the most likely paths that emphasize SAS and JMP capabilities. You will leave the workshop appreciating the many possibilities to utilize Excel with SAS and JMP, including using the powerful Output Delivery System.
PROC SQL for PROC SUMMARY Stalwarts
Christianna Williams
BB-69
One of the endlessly fascinating features of SAS is that the software often provides multiple ways to accomplish the same task. A perfect example of this is the aggregation and summarization of data across multiple rows (“BY groups”) of interest.
These groupings can be study participants, time periods, geographical areas, or really just about any type of discrete classification that one desires. While many SAS programmers may be accustomed to accomplishing these aggregation tasks with PROC SUMMARY (or equivalently, PROC MEANS), PROC SQL can also do a bang-up job of aggregation – often with less code and fewer steps. The purpose of this step-by-step paper is to explain how to use PROC SQL for a variety of summarization and aggregation tasks, and will use a series of concrete, task-oriented examples to do so. For each example, both the PROC SUMMARY method and the PROC SQL method will be presented, along with discussion of pros and cons of each approach. Thus, the reader familiar with either technique can learn a new strategy that may have benefits in certain circumstances. The presentation style will be similar to that used in the author’s previous paper, “PROC SQL for DATA Step Die-Hards”.
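A small taste of the side-by-side style, using the SASHELP.CLASS sample data:

  proc summary data=sashelp.class nway;
    class sex;
    var height weight;
    output out=stats_means (drop=_type_ _freq_) mean= / autoname;
  run;

  proc sql;
    create table sql_means as
    select sex,
           avg(height) as height_mean,
           avg(weight) as weight_mean
    from sashelp.class
    group by sex;
  quit;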
FORMATs Top Ten
Christianna Williams
BB-70
SAS FORMATs can be used in so many different ways! Even the most basic FORMAT use of modifying the way a SAS data value is displayed (without changing the underlying data value) holds a variety of nifty tricks, such as nesting formats, formats that affect various style attributes (such as color, font, etc.), and conditional formatting. Add in PICTURE formats, multi-label FORMATs, using FORMATs for data cleaning, and FORMATs for joins and table look-ups, and we have quite a bag of tricks for the humble SAS FORMAT and the PROC FORMAT used to generate them. The purpose of this paper is to describe a few handfuls of very useful programming techniques that employ SAS FORMATs. While this paper will be appropriate for the newest SAS user, it will also focus on some of the lesser-known features of FORMATs and PROC FORMAT and so should be useful for even quite experienced users of SAS.
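One of the simplest of those tricks, a format used as a table lookup, looks like this (the format, data set, and variable names are hypothetical):

  proc format;
    value $region 'NC','SC','GA','FL' = 'Southeast'
                  'NY','NJ','PA'      = 'Northeast'
                  other               = 'Other';
  run;

  data tagged;
    set work.customers;
    region = put(state, $region.);   /* lookup without a merge */
  run;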
A Non-Standard Report Card - Informing Parents About What Their Children Know
Daniel Ralyea
BB-122
This is a non-traditional use of the power of SAS. A typical report card lists the subject and a letter or number grade; it does not identify the skills that lead to that grade. SAS allows us to read the grades from PowerSchool's Oracle database, combine them with test scores from an outside vendor, and summarize multiple grades into more generalized standards. Using SAS ODS, individual report cards are sorted by school and teacher and printed for each student.
A Quick View of SAS Views
Elizabeth Axelrod
BB-63
Looking for a handy technique to have in your toolkit? Consider SAS® Views, especially if you work with large datasets. After a brief introduction to Views, I’ll show you several cool ways to use them that will streamline your code and save workspace.
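A quick hedged example of a DATA step view (library and variable names are hypothetical):

  /* no rows are stored; the WHERE clause runs each time the view is read */
  data work.recent_v / view=work.recent_v;
    set warehouse.transactions;
    where txn_date >= '01JAN2014'd;
  run;

  proc means data=work.recent_v sum;
    var txn_amt;
  run;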
Combining Multiple Date-Ranged Historical Data Sets with Dissimilar Date Ranges into a Single Change History Data Set
Jim Moon
BB-66
This paper describes a method that uses some simple SAS® macros and SQL to merge data sets containing related data that contains rows with varying effective date ranges. The data sets are merged into a single data set that represents a serial list of snapshots of the merged data, as of a change in any of the effective dates. While simple conceptually, this type of merge is often problematic when the effective date ranges are not consecutive or consistent, or when the ranges overlap, or when there are missing ranges from one or more of the merged data sets. The technique described is used by the Fairfax County Human Resources Department to combine various employee data sets (Employee Name and Personal Data, Personnel Assignment and Job Classification, Personnel Actions, Position-Related data, Pay Plan and Grade, Work Schedule, Organizational Assignment, and so on) from the County's SAP-HCM ERP system into a single Employee Action History/Change Activity file for historical reporting purposes. The technique currently is used to combine nineteen data sets, but is easily expandable by inserting a few lines of code using the existing macros.
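The flavor of the underlying overlap join, heavily simplified and with hypothetical tables (the paper's macros generalize this to many inputs and handle gaps and inconsistencies):

  proc sql;
    create table job_pay_snapshots as
    select a.emp_id,
           a.job_class,
           b.pay_grade,
           case when a.start_dt > b.start_dt then a.start_dt else b.start_dt end
             as eff_start format=date9.,
           case when a.end_dt   < b.end_dt   then a.end_dt   else b.end_dt   end
             as eff_end   format=date9.
    from job_history a
    inner join pay_history b
      on a.emp_id = b.emp_id
     and a.start_dt <= b.end_dt     /* date ranges overlap */
     and b.start_dt <= a.end_dt;
  quit;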
PROC TRANSPOSE® For Fun And Profit
John Cohen
BB-59
Occasionally we are called upon to transform data from one format into a “flipped,” sort of mirror image. Namely if the data were organized in rows and columns, we need to transpose these same data to be arranged instead in columns and rows. A perfectly reasonable view of incoming lab data, ATM transactions, or web “click” streams may look “wrong” to us. Alternatively extracts from external databases and production systems may need massaging prior to proceeding in SAS®. Finally, certain SAS procedures may require a precise data structure, there may be particular requirements for data visualization and graphing (such as date or time being organized horizontally/along the row rather than values in a date/time variable), or the end user/customer may have specific deliverable requirements.
Traditionalists prefer using the DATA step and combinations of Array, Retain, and Output statements. This approach works well but for simple applications may require more effort than is necessary. For folks who intend to do much of the project work in, say, MS/Excel®, the resident transpose option when pasting data is a handy short cut. However, if we want a simple, reliable method in SAS which once understood will require little on-going validation with each new run, then PROC TRANSPOSE is a worthy candidate. We will step through a series of examples, elucidating some of the internal logic of this procedure and its options. We will also touch on some of the issues which cause folks to shy away and rely on other approaches.
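A representative call, with hypothetical data set and variable names (the input must be sorted or grouped by the BY variable):

  /* one row per subject and month in, one column per month out */
  proc transpose data=work.lab_long
                 out=work.lab_wide (drop=_name_)
                 prefix=month;
    by subject_id;
    id month_num;
    var lab_value;
  run;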
The Nuances of Combining Hospital Data
Jontae Sanders, Charlotte Baker and Perry Brown
BB-100
Hospital data can be used for the surveillance of various health conditions in a population. To maximize our ability to tell the story of a population's health, it is often necessary to combine multiple years of data. This step can be tedious as there are many factors to take into account such as changes in variable names or data formats between years. Once you have resolved these issues, the data can be successfully combined for analysis. This paper will demonstrate many factors to look for and how to handle them when combining data from hospitals.
Move over MERGE, SQL and SORT. There is a faster game in town! #Hash Table
Karen Price
BB-121
The purpose of this paper and presentation is to introduce the basics of what a hash table is and to illustrate practical applications of this powerful Base SAS® DATA Step construct. We will highlight features of the hash object and show examples of how these features can improve programmer productivity and system performance of table lookup and sort operations. We will show relatively simple code to perform typical “look-up” match-merge usage as well as a method of sorting data through hashing as an alternative to the SORT procedure.
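A hedged sketch of the hash-as-sort idea (data set and key names are hypothetical):

  data _null_;
    if 0 then set work.claims;                    /* define the PDV variables  */
    declare hash h(dataset:'work.claims',
                   ordered:'a', multidata:'y');   /* ascending key order       */
    h.defineKey('member_id', 'claim_date');
    h.defineData(all:'y');
    h.defineDone();
    h.output(dataset:'work.claims_sorted');       /* rows written in key order */
    stop;
  run;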
Point-and-Click Programming Using SAS® Enterprise Guide®
Kirk Paul Lafler and Mira Shapiro
BB-34
SAS® Enterprise Guide® (EG) empowers organizations with all the capabilities that SAS has to offer. Programmers, business analysts, statisticians and end-users have a powerful graphical user interface (GUI) with built-in wizards to perform reporting and analytical tasks, access to multi-platform enterprise data sources, deliver data and results to a variety of mediums and outlets, construct data manipulations without the need to learn complex coding constructs, and support data management and documentation requirements. Attendees learn how to use the GUI to access tab-delimited and Excel input files; subset and summarize data; join two or more tables together; flexibly export results to HTML, PDF and Excel; and visually manage projects using flowcharts and diagrams.
Formatting Data with Metadata – Easy and Powerful
Leanne Tang
BB-113
In our organization, a lot of effort is put into building and maintaining our organizational-level metadata databases. Metadata databases are used to store the information, or metadata, about our data. Many times we have to “decipher” the keys and codes associated with our data so that they can be presented to our data users for data analysis. One of the options to interpret our data is to generate user-defined formats from our metadata using PROC FORMAT. One advantage of using formats is that we do not have to create a new SAS dataset for data lookup. The second advantage is that the formats generated can be used in any programs in need of data interpretation. The best advantage is that, without maintaining the metadata myself, I can generate the formats with the most up-to-date information available in the metadata database with a simple PROC FORMAT execution. In this paper we are going to explore some of the powerful options available in PROC FORMAT and how we apply the formats generated from our metadata to our data.
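In outline, and with a hypothetical metadata table of codes and descriptions, the technique looks like this:

  data cntl;                          /* CNTLIN needs fmtname, start, and label */
    set meta.product_codes (rename=(code=start description=label));
    retain fmtname '$prodcd';
  run;

  proc format cntlin=cntl;
  run;

  proc freq data=work.sales;
    tables product_code;
    format product_code $prodcd.;     /* codes display as their descriptions */
  run;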
SAS® Macro Magic is not Cheesy Magic! Let the SAS Macros Do the Magic of Rewriting Your SAS Code
Robert Williams
BB-105
Many times, we need to rewrite weekly or monthly SAS programs to change certain key statements such as the conditions inside the WHERE statements, import/export file paths, reporting dates and SAS data set names. If these statements are hard coded, reading through the SAS code to rewrite them is cumbersome and a chore. Sometimes, we might miss an important statement that needs to be re-coded, resulting in inaccurate data extracts and reports. This paper will show how the SAS Magic Macros can streamline and eliminate the process of rewriting the SAS code. Two types of SAS macros will be reviewed with examples:
- Defining SAS macro variables with %LET and resolving their values in SAS statements with the ampersand (&).
- Creating SAS macro programs with %MACRO and %MEND to generate a series of SAS statements in the code.
You will be amazed how useful the SAS Magic Macro is for many of your routine weekly and monthly reports. Let the SAS Magic Macro relieve you of the tedious task of rewriting many of the SAS statements!
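A small hedged example of the first technique (the paths, names, and variables are made up):

  %let rpt_month = 2014-09;
  %let in_path   = C:\data\monthly;

  proc import datafile="&in_path.\claims_&rpt_month..csv"
       out=work.claims dbms=csv replace;
  run;

  proc freq data=work.claims;
    tables claim_status;
    where service_month = "&rpt_month";
    title "Claim status for &rpt_month";
  run;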
Flat Pack Data: Converting and ZIPping SAS® Data for Delivery
Sarah Woodruff
BB-25
Clients or collaborators often need SAS data converted to a different format. Delivery or even storage of individual data sets can become cumbersome, especially as the number of records or variables grows. The process of converting SAS data sets into other forms of data and saving files into compressed ZIP storage has become not only more efficient, but easier to integrate into new or existing programs. This paper describes and explores various methods to convert SAS data as well as effective strategies to ZIP data sets along with any other files that might need to accompany them.
PROC IMPORT and PROC EXPORT have been long standing components of the SAS toolbox, so much so that they have their own wizards, but understanding their syntax is important to effectively use them in code being run in batch or to include them in programs that may be run interactively but “hands free”. The syntax of each is described with a particular focus on moving between SAS, STATA and SPSS, though some attention is also given to Excel. Once data sets and their attendant files are ready for delivery or need to be put into storage, compressing them into ZIP files becomes helpful. The process of using ODS PACKAGE to create such ZIP files is laid out and can be connected programmatically to the creation of the data sets or documents in the first place.
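The ODS PACKAGE portion, in outline and with hypothetical paths:

  ods package(deliver) open nopf;
  ods package(deliver) add file='/project/out/analysis.sas7bdat';
  ods package(deliver) add file='/project/out/codebook.pdf';
  ods package(deliver) publish archive
      properties(archive_name='delivery.zip' archive_path='/project/out/');
  ods package(deliver) close;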
The Power of PROC APPEND
Ted Logothetti
BB-33
PROC APPEND is the fastest way to concatenate SAS® data sets. This paper discusses some of the features of this procedure, including how much it lessens processing time, some tips and tricks, and a correction to the online SAS® documentation. It also lists some limitations of the procedure.
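The basic call is a one-liner (library and data set names are hypothetical):

  /* add this month's records without re-reading the existing base table */
  proc append base=perm.claims_master
              data=work.claims_new
              force;                 /* tolerate minor attribute differences */
  run;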
Coder's Corner
Creating a Hyperbolic Graph Using the SAS® Annotate Facility
Bill Bland and Liza Thompson
CC-129
In order to optimize their rate design, electric utilities analyze their customers’ bills and costs for electricity by looking at each hour’s use of demand. These graphs of kWh energy usage versus hours of use are produced on a monthly basis (from 0 to 730 hours). The resulting graphs are typically curvilinear. To allow for an easier rate and cost comparison, we wanted the ability to plot hyperbolic hours, as well as cost curves, on the same graph. To do this, we applied a hyperbolic transformation to the hours use axis. This linearized the graph and made it easier to interpret. For our analysis, it is necessary to graph cost and price versus the hyperbolic axis, but at the same time show the original hours use axis. Proc GPLOT does not allow multiple X axes. Therefore, we solve the problem using the annotate facility. In this presentation, we will show you a step-by-step example of how we changed cost graphs for easier analysis and explain the code we used.
Debugging SAS ® code in a macro
Bruce Gilsen
CC-19
Debugging SAS ® code contained in a macro can be frustrating because the SAS error messages refer only to the line in the SAS log where the macro was invoked. This can make it difficult to pinpoint the problem when the macro contains a large amount of SAS code.
Using a macro that contains one small DATA step, this paper shows how to use the MPRINT and MFILE options along with the fileref MPRINT to write just the SAS code generated by a macro to a file. The "de-macroified" SAS code can be easily executed and debugged.
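In outline, the technique looks like this (the macro name and path are placeholders):

  filename mprint 'C:\temp\demacro.sas';   /* destination for the generated code */
  options mprint mfile;
  %mymacro(data=work.have, var=age)
  options nomfile;

  /* C:\temp\demacro.sas now contains plain SAS code that can be run and debugged */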
Using PROC FCOMP to Do Fuzzy Name and Address Matching
Christy Warner
CC-119
This paper discusses how to utilize PROC FCOMP to create your own fuzzy-matching functions. Name and address matching is a common task for SAS programmers, and this presentation will provide some code and guidance on how to handle tough name-matching exercises. The code shared in this presentation has been utilized to cross-check with the List of Excluded Individuals and Entities (LEIE) file, maintained by the Department of Health and Human Services' OIG. It has also been incorporated to match against the Specially Designated Nationals (SDN) Terrorist Watch List and in name-matching for Customs and Border Protection (CBP). * Christy Warner is a Senior Associate with Integrity Management Services, LLC (IMS) and has 22 years of SAS programming experience, as well as a degree in Math and a minor in Statistics. She has served as the Deputy Project Director of a Medicaid Integrity Contract (MIC) Audit, and has spent the last 14 years developing algorithms to identify healthcare fraud, waste, and abuse.
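A hedged sketch of the general pattern, using the FCMP procedure and the COMPGED generalized edit distance (the function, data set, and cutoff here are illustrative, not the author's production code):

  proc fcmp outlib=work.funcs.match;
    function name_score(a $, b $);
      /* smaller generalized edit distance = closer match */
      return (compged(upcase(strip(a)), upcase(strip(b))));
    endsub;
  run;

  options cmplib=work.funcs;

  data checked;
    set work.applicants;
    score          = name_score(last_name, 'SMITH');
    possible_match = (score <= 100);    /* tune the cutoff to the data */
  run;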
Manage Variable Lists for Improved Readability and Modifiability
David Abbott
CC-76
Lists of variables occur frequently in statistical analysis code, for example, lists of explanatory variables, variables used as rows in demographics tables, and so forth. These lists may be long, say 10-30 or more variables and the same list, or a major portion of it, may occur in multiple places in the code. The lists are often replicated using cut and paste by the programmer during program composition. Readers of the code may find themselves doing repeated “stare and compare” to determine if the list in location A is really the same list as in location B or location C. Simply adding a variable to the list may require changing numerous lines of code since the list occurs in the code numerous times. If managed naively, variable lists can impair code readability and modifiability.
The SAS macro facility provides the tools needed to eliminate repeated entry of lengthy variable lists. Related groups of variables can be assigned to macro variables and the macro variables concatenated as needed to generate the list of variables needed at different points in the code. Certain SAS macros can be used to programmatically alter the list, for example, remove specific variables from the list (not needed for a given regression) or change the delimiter character to comma (when the list is used with PROC SQL). The macro variable names can express the purpose of the groups of variables, e.g., ExplanVars, OutcomeVars, DemographicOnlyVars, etc. Employing this approach makes data analysis code easier to read and modify.
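A compact hedged example (variable and data set names are hypothetical):

  %let DemogVars    = age sex race ethnicity;
  %let ClinicalVars = bmi sbp dbp ldl;

  proc freq data=work.analysis;
    tables &DemogVars;
  run;

  proc means data=work.analysis n mean std;
    var &ClinicalVars;
  run;

  /* comma-delimited version of the same list for PROC SQL */
  %let ClinicalCsv = %sysfunc(translate(&ClinicalVars, %str(,), %str( )));
  proc sql;
    select &ClinicalCsv from work.analysis (obs=5);
  quit;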
Integrating Data and Variables between SAS® and R via PROC IML: Enable Powerful Functionalities for Coding in SAS®
Feng Liu
CC-106
Programming in R provides additional features and functions that augment SAS procedures for many statisticians and scientists in areas like bioinformatics, finance, and education. Being able to call R from within SAS programs is a feature in high demand among SAS users. However, while existing papers show how to import an R data set into SAS or vice versa, they lack a comprehensive solution for transferring variables in formats other than data sets. In this paper, we present solutions that use PROC IML to interface with R, enabling variables and data sets to be transferred transparently between SAS and R. We also provide examples of calling R functions directly from SAS, which offers much flexibility for coding in SAS, especially for big projects involving intensive coding. This can also be used to pass parameters from SAS to R. In this paper, you will see a step-by-step demonstration of a SAS project that integrates calling R functions via PROC IML.
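In outline, the interface looks like this (it requires SAS/IML and a session started with the RLANG system option; the example computes a new column in R from the SASHELP.CLASS sample data):

  proc iml;
    call ExportDataSetToR("sashelp.class", "cls");      /* SAS -> R data frame */
    submit / R;
      cls$bmi <- 703 * cls$Weight / (cls$Height ^ 2)    # work on the data in R
    endsubmit;
    call ImportDataSetFromR("work.class_bmi", "cls");   /* R -> SAS data set   */
  quit;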
How to Build a Data Dictionary – In One Easy Lesson
Gary Schlegelmilch
CC-32
In the wonderful world of programming, the Child Left Behind is usually documentation. The requirements may be thoroughly analyzed (usually from a combination of phoned-in notes, e-mails, draft documents, and the occasional
cocktail napkin). Design is often on the fly, due to various constraints, deadlines, and in-process modifications. And doing documentation after the fact, once the program is running, is, well, a great idea – but it often doesn’t happen.
Some software tools allow you to build flow diagrams and descriptions from existing code and/or comments embedded in the program. But in a recent situation, there was a lament that a system that had been running in the field for quite a while had no Data Dictionary – and one would be really handy for data standardization and data flow. SAS to the rescue!
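One simple way to start (not necessarily the paper's full approach) is to pull variable-level metadata from DICTIONARY.COLUMNS; the library name here is hypothetical:

  proc sql;
    create table work.data_dictionary as
    select libname, memname, name, varnum, type, length, format, informat, label
    from dictionary.columns
    where libname = 'PROD'
    order by memname, varnum;
  quit;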
Hands Free: Automating Variable Name Re-Naming Prior to Export
John Cohen
CC-61
Often production datasets come to us with data in the form of rolling 52 weeks, 12 or 24 months, or the like. For ease of use, the variable names may be generic (something like VAR01, VAR02, etc., through VAR52 or VAR01 through VAR12), with the actual dates corresponding to each column being maintained in some other fashion – often in the variable labels, a dataset label, or some other construct. Not having to re-write your program each week or month to properly use these data is a huge benefit.
That is, until you need to capture the date information in the variable names themselves (so far VAR01, VAR02, etc.) – prior to, say, exporting to MS/Excel® (where the new column names may instead need to be JAN2011, FEB2011, etc.). If the task of creating the correct corresponding variable names/column names each week or month were a manual one, the toll on efficiency and accuracy could be substantial.
As an alternative, we will use a “program-to-write-a-program” approach to capture date information in the incoming SAS® dataset (from two likely alternate sources) and have our program complete the rest of the task seamlessly, week after week (or month after month). By employing this approach we can continue to use incoming data with generic variable names.
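One hedged variant of the idea, assuming the month for each column is carried in the variable label as text that is itself a valid SAS name (e.g., JAN2011):

  /* build VAR01=JAN2011 VAR02=FEB2011 ... rename pairs from the labels */
  proc sql noprint;
    select catx('=', name, label)
      into :renames separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'MONTHLY'
      and upcase(name) like 'VAR%';
  quit;

  proc datasets lib=work nolist;
    modify monthly;
    rename &renames;
  quit;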
Simple Rules to Remember When Working with Indexes
Kirk Paul Lafler
CC-37
SAS® users are always interested in learning techniques related to improving data access. One way of improving information retrieval is by defining an index consisting of one or more columns that are used to uniquely identify each row within a table. Functioning as a SAS object, an index can be defined as numeric, character, or a combination of both. This presentation emphasizes the rules associated with creating effective indexes and using indexes to make information retrieval more efficient.
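The basic statements, with hypothetical library, table, and column names:

  proc datasets library=claimlib nolist;
    modify claims;
    index create member_id;                        /* simple index    */
    index create svc = (member_id service_date);   /* composite index */
  quit;

  /* an equality WHERE clause on the key can now use the index */
  data one_member;
    set claimlib.claims;
    where member_id = '00012345';
  run;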
Interacting with SAS using Windows PowerShell ISE
Mayank Nautiyal
CC-135
The most conventional method of using SAS in a Windows environment is via a GUI application. There are numerous SAS users who have a UNIX background and can definitely take advantage of the Windows PowerShell ISE to gain job efficiency. The Windows PowerShell Integrated Scripting Environment (ISE) is a host application for Windows PowerShell. One can run commands, write, test, and debug scripts in a single Windows-based graphic user interface with multiline editing. This paper will demonstrate how frequently used SAS procedures can be scripted and submitted at the PowerShell command prompt. Job scheduling and submission for batch processing will also be illustrated.
Let SAS® Do the Coding for You
Robert Williams
CC-104
Many times, we need to create the same reports going to different groups based on the group’s subset of queried data, or we have to develop much repetitive SAS code such as a series of IF THEN ELSE statements or a long list of different conditions in a WHERE statement. It is cumbersome and a chore to manually write and change these statements, especially if the reporting requirements change frequently. This paper will suggest methods to streamline and eliminate the process of writing and copying/pasting your SAS code to be modified for each requirement change. Two techniques will be reviewed along with a listing of key words in a SAS dataset or an Excel® file:
- Create code using the DATA _NULL_ and PUT statements to an external SAS code file to be executed with %INCLUDE statement.
- Create code using the DATA _NULL_ and CALL SYMPUT to write SAS codes to a macro variable.
You will be amazed how useful this process is for hundreds of routine reports, especially on a weekly or monthly basis. RoboCoding is not just limited to reports; this technique can be expanded to include other procedures and data steps. Let the RoboCoder do the repetitive SAS coding work for you!
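A hedged sketch of the first technique (the data set, variable, and report names are made up):

  filename gencode temp;

  data _null_;
    set work.report_groups;           /* one row per group to report on */
    file gencode;
    put 'proc print data=work.results;';
    put '  where group_id = "' group_id +(-1) '";';
    put 'run;';
  run;

  %include gencode / source2;         /* run the generated PROC PRINT steps */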
@/@@ This Text file. Importing Non-Standard Text Files using @,@@ and / Operators
Russell Woods
CC-45
SAS in recent years has done a fantastic job of releasing newer and more powerful tools to the analyst and developer tool boxes, but these tools are only as effective as the data that is available to them. Most of you would have no trouble dealing with a simple delimited or column formatted text file. However, data can often be provided in a non-standard format that must be parsed before analysis can be performed, or the final results can have very specific non-standard formatting rules that must be adhered to when delivering. In these cases we can use some simple SAS operators, ‘@’, ‘@@’ and ‘/’, in conjunction with conditional statements to interact with any flat file that has some kind of distinguishable pattern. In this presentation I will demonstrate the step-by-step process I used to analyze several non-standard text files and create import specifications capable of importing them into SAS. It is my hope that once you master these techniques, it will not matter whether you are preparing an audit report for the federal government or a side-effects analysis for the FDA; you will easily be able to accommodate any specifications they may have.
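Two small illustrations of the operators (the record layouts are invented for the example):

  /* trailing @@ holds the line across iterations: several observations per line */
  data pairs;
    input id $ score @@;
    datalines;
  A 10 B 20 C 30
  D 40 E 50
  ;

  /* trailing @ holds the line so a second INPUT can decide how to read it;
     / advances to the next line within one observation */
  data mixed;
    input rectype $ @;
    if rectype = 'H' then input batch_id $;
    else if rectype = 'D' then input amount :comma12. /
                                  followup_date :mmddyy10.;
    datalines;
  H BATCH001
  D 1,500.25
  01/20/2014
  D 2,000.00
  02/05/2014
  ;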
VBScript Driven Automation in SAS®: A Macro to Update the Text in a Microsoft® Word Document Template at Preset Bookmarks
Shayala Gibbs
CC-101
SAS® can harness the power of Visual Basic Scripting Edition (VBScript) to programmatically update Microsoft Office® documents. This paper presents a macro developed in SAS to automate updates to a Microsoft Word document. The useful macro invokes VBScript to pass text directly from a SAS data set into a predefined bookmark in an existing template Word document.
Searching for (and Finding) a Needle in a Haystack: A Base Macro-Based SAS Search Tool to Facilitate Text Mining and Content Analysis through the Production of Color-Coded HTML Reports
Troy Hughes
CC-94
Text mining describes the discovery and understanding of unstructured, semi-structured, or structured textual data. While SAS® Text Miner presents a comprehensive solution to text mining and content analysis, simpler business
questions may warrant a more straightforward solution. When first triaging a new data set or database of unknown content, a profile and categorization of the data may be a substantial undertaking. An initial analytic question may include a request to determine if a word or phrase is found “somewhere” within the data, with what frequency, and in what fields and data sets. This text describes an automated text parsing Base SAS tool that iteratively parses SAS libraries and data sets in search of a single word, phrase, or a list of words or phrases. Results are saved to an HTML file that displays the frequency, location, and search criteria highlighted in context.
Hands On Workshop
SAS Enterprise Guide for Institutional Research and Other Data Scientists
Claudia McCann
How-82
Data requests can range from on-the-fly, need it yesterday, to extended projects taking several weeks or months to complete. Often institutional researchers and other data scientists are juggling several of these analytic needs on a daily basis, i.e., taking a break from the longitudinal report on retention and graduation to work on responding to a USN&WR survey to answering the simple 5 minute data query question from an administrator. SAS Enterprise Guide is a terrific tool for handling multiple projects simultaneously. This Hands On Workshop is designed to walk the data analyst through the process of setting up a project, accessing data from several sources, merging the datasets, and running the analyses to generate the data needed for the particular project. Specific tasks covered are pulling SAS datasets and Excel files into the project, exploring several facets of the ever-so-powerful Query Builder, and utilizing several quick and easy descriptive statistical techniques in order to get the desired results.
A Tutorial on the SAS® Macro Language
John Cohen
How-60
The SAS Macro language is another language that rests on top of regular SAS code. If used properly, it can make programming easier and more fun. However, not every program is improved by using macros. Furthermore, it is another language syntax to learn, and can create problems in debugging programs that are even more entertaining than those offered by regular SAS.
We will discuss using macros as code generators, saving repetitive and tedious effort, for passing parameters through a program to avoid hard coding values, and to pass code fragments, thereby making certain tasks easier than using regular SAS alone. Macros facilitate conditional execution and can be used to create program modules that can be standardized and re-used throughout your
organization. Finally, macros can help us create interactive systems in the absence of SAS/AF® or SAS/Intrnet®.
When we are done, you will know the difference between a macro, a macro variable, a macro statement, and a macro function. We will introduce interaction between macros and regular SAS language, offer tips on debugging macros, and discuss SAS macro options.
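To preview one of those distinctions, a hedged sketch (the data set and variables are hypothetical):

  %let cutoff = '01JAN2014'd;          /* macro variable: simple text substitution */

  %macro freq_by(ds=, var=);           /* macro: a reusable code generator */
    proc freq data=&ds;
      tables &var;
      where admit_date >= &cutoff;
    run;
  %mend freq_by;

  %freq_by(ds=work.admissions, var=diagnosis)
  %freq_by(ds=work.admissions, var=discharge_status)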
Store and Recall Macros with SAS Macro Libraries
John Myers
How-71
When you store your macros in a SAS macro library, you can recall the macros with your SAS programs and share them with other SAS programmers. SAS macro libraries help you by reducing the time it takes to develop new programs by using code that has been previously tested and verified. Macro libraries help you to organize your work by saving sections of code that you can reuse in other programs. Macro libraries improve your macro writing skills by focusing on a specific task for each macro. Macro libraries are not complicated – they are just a way to store macros in a central location. This presentation will give examples of how you can build macro libraries using %INCLUDE files, AUTOCALL library, and STORED COMPILED MACRO library.
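For example, an autocall library can be wired up in a few lines (the directory and macro name are hypothetical; each file holds one macro and is named after it):

  filename mymacs '/shared/sas/macros';
  options mautosource sasautos=(mymacs sasautos);

  %clean_names(data=work.raw_import)   /* compiled from /shared/sas/macros/clean_names.sas */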
Application Development Techniques Using PROC SQL
Kirk Paul Lafler
How-35
Structured Query Language (SQL) is a database language found in the base-SAS software. It permits access to data stored in data sets or tables using an assortment of statements, clauses, options, functions, and other language constructs. This Hands On Workshop illustrates core concepts as well as SQL’s many applications, and is intended for SAS users who desire an overview of this exciting procedure’s capabilities. Attendees learn how to construct SQL queries, create complex queries including inner and outer joins, apply conditional logic with case expressions, create and use views, and construct simple and composite indexes.
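A flavor of the kind of query covered, with hypothetical tables:

  proc sql;
    create table work.order_summary as
    select a.order_id,
           a.amount,
           coalesce(b.region, 'Unknown') as region,     /* outer join fills gaps */
           case
             when a.amount >= 10000 then 'High'
             else 'Standard'
           end as tier
    from work.orders a
    left join work.customers b
      on a.customer_id = b.customer_id;
  quit;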
The DoW-Loop
Paul Dorfman and Lessia Shajenko
How-115
The DoW-loop is a nested, repetitive DATA step structure enabling you to isolate instructions related to a certain break event before, after, and during a DO-loop cycle in a naturally logical manner. Readily recognizable in its most ubiquitous form by the DO UNTIL(LAST.ID) construct, which readily lends itself to control-break processing of BY-group data, the DoW-loop's nature is more morphologically diverse and generic. In this workshop, the DoW-loop's logic is examined via the power of example to reveal its aesthetic beauty and pragmatic utility. In some industries like Pharma, where flagging BY-group observations based on in-group conditions is standard fare, the DoW-loop is an ideal vehicle greatly simplifying the alignment of business logic and SAS code. In this Hands On Workshop, the attendees will have an opportunity to investigate the program control of the DoW-loop step by step using the SAS DATA step debugger and learn of a range of nifty practical applications of the DoW-loop.
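The canonical form, with hypothetical data (one pass of the loop per BY group, one output row per ID):

  data totals;
    do until (last.id);
      set work.claims;
      by id;
      total = sum(total, amount);   /* accumulates within the group       */
    end;
    output;                         /* total resets at the next iteration */
  run;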
Reliably Robust: Best Practices for Automating Quality Assurance and Quality Control Methods into Software Design
Troy Hughes
How-95
A frequent objective of SAS development is the generation of autonomous, automated processes that can be scheduled for recurring execution or confidently run with push-button simplicity. While the adoption of SAS software development best practices most confidently predicts programmatic success, robust applications nevertheless require a quality management strategy that
incorporates both quality assurance (QA) and quality control (QC) methods. To the extent possible, these methods both ensure and demonstrate process success and product validation while minimizing the occurrence and impact of environmental and other exceptions that can cause process failure. QA methods include event handling that drives program control under normal functioning, exception handling (e.g., error trapping) that identifies process failure and may initiate a remedy or graceful termination, and post hoc analysis of program logs and performance metrics. QC methods conversely identify deficits in product (e.g. data set) availability, validity, completeness, and accuracy, and are implemented on input and output data sets as well as reports and other output. QC methods can include data constraints, data structure validation, statistical testing for outliers and aberrant data, and comparison of transactional data sets against established norms and historical data stores. The culmination of any quality management strategy prescribes the timely communication of failures (or successes) to stakeholders through alerts, report generation, or a real-time dashboard. This text describes the advantages and best practices of incorporating a comprehensive quality management strategy into SAS development, as well as the more challenging transformation of error-prone legacy code into a robust, reliable application.
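Two tiny hedged examples of the QA side (the data set, e-mail address, and thresholds are placeholders; the e-mail step assumes a configured EMAIL filename engine):

  /* stop early if a required input is missing */
  %macro require(ds=);
    %if not %sysfunc(exist(&ds)) %then %do;
      %put ERROR: required data set &ds was not found.;
      %abort cancel;
    %end;
  %mend require;
  %require(ds=prod.daily_extract)

  /* post hoc check: alert someone if anything went wrong */
  %macro notify_on_failure;
    %if &syscc > 4 %then %do;
      filename alert email to="oncall@example.com"
                           subject="Nightly job failed (SYSCC=&syscc)";
      data _null_;
        file alert;
        put "Check the SAS log for details.";
      run;
    %end;
  %mend notify_on_failure;
  %notify_on_failure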
Pharma & Healthcare
Time Series Mapping with SAS®: Visualizing Geographic Change over Time in the Health Insurance Industry
Barbara Okerson
PH-22
Changes in health insurance and other industries often have a spatial
component. Maps can be used to convey this type of information to the user more
quickly than tabular reports and other non-graphical formats. SAS® provides
programmers and analysts with the tools to not only create professional and
colorful maps, but also the ability to display spatial data on these maps in a
meaningful manner that aids in the understanding of the changes that have
transpired. This paper illustrates the creation of a number of different maps
for displaying change over time with examples from the health insurance arena.
You've used FREQ, but have you used SURVEYFREQ?
Charlotte Baker
PH-86
PROC FREQ is a well utilized procedure for descriptive statistics. If the data
being analyzed is from a complex survey sample, it is best to use PROC
SURVEYFREQ instead. Other than the SAS documentation on PROC
SURVEYFREQ, few user examples exist for how to perform analyses using this
procedure. This paper will demonstrate why PROC SURVEYFREQ should be used and
how to implement it in your research.
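A minimal hedged example, with design variables standing in for whatever the survey actually provides:

  proc surveyfreq data=work.survey;
    strata ststr;            /* design strata          */
    cluster psu;             /* primary sampling units */
    weight finalwt;          /* final survey weight    */
    tables diabetes / cl;    /* weighted percents with confidence limits */
  run;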
A Comprehensive Automated Data Management System for Clinical Trials
Heather Eng, Jason Lyons and Theresa Sax
PH-125
A successful data coordinating center for multicenter clinical trials and
registries must provide timely, individualized, and frequent feedback to
investigators and study coordinators over the course of data collection.
Investigators require up-to-date reports to help them monitor subject accrual
and retention, randomization balance, and patient safety markers. Study
coordinators need to know what data are expected or are delinquent, and they
need to know about errors to be corrected, such as missing data or data that
don’t pass validity and logical consistency tests. Frequent monitoring can
reveal systemic issues in procedures that require remedial adjustments to keep
the project from being at risk.
Data managers at the University of Pittsburgh’s Epidemiology Data Center in
the Graduate School of Public Health have developed an integrated system to
collect and import data from multiple and disparate sources into a central
relational database, subject it to comprehensive quality control procedures,
create reports accessible on the web, and email individualized reports to
investigators and study coordinators, all on an automated and scheduled basis.
Post-hoc routines monitor execution logs, so that unexpected errors and
warnings are automatically emailed to data managers for same-day review and
resolution.
The system is developed almost exclusively using SAS® software. While SAS®
is best known among clinical trialists as statistical software, its strength as
a data management tool should not be overlooked. With its strong and flexible
programming capabilities for data manipulation, reporting and graphics, web
interfacing, and emailing, it provides the necessary infrastructure to serve as
a single platform for the management of data collected in clinical trials and
registries.
This paper will describe the modules of the system and their component programs
as they were developed for the Computer-Based Cognitive-Behavioral Therapy
(CCBT) Trial currently underway at the University of Louisville and the
University of Pennsylvania, with data coordination at the University of
Pittsburgh.
Using SAS® to Analyze the Impact of the Affordable Care Act
John Cohen and Meenal (Mona) Sinha
PH-28
The Affordable Care Act being implemented in 2014 is expected to fundamentally
reshape the health care industry. All current participants--providers,
subscribers, and payers--will operate differently under a new set of key
performance indicators (KPIs). This paper uses public data and SAS® software
to illustrate an approach to creating a baseline for the health care industry
today so that structural changes can be measured in the future to assess the
impact of the new law.
Using SAS/STAT to Implement A Multivariate Adaptive Outlier Detection Approach to Distinguish Outliers From Extreme Values
Paulo Macedo
PH-89
A standard definition of outlier states that “an outlier is an observation
that deviates so much from other observations as to arouse the suspicion that
it was generated by a different mechanism” (Hawkins, 1980). To identify
outliers in the data a classic multivariate outlier detection approach
implements the Robust Mahalanobis Distance Method by splitting the distribution
of distance values in two subsets (within-the-norm and out-of-the-norm): the
threshold value is usually set to the 97.5% Quantile of the Chi-Square
distribution with p (number of variables) degrees of freedom and items whose
distance values are beyond it are labeled out-of-the-norm. This threshold value
is an arbitrary number, though, and it may flag as out-of-the-norm a number of
items that are indeed extreme values of the baseline distribution rather than
outliers coming from a “contaminating” distribution. Therefore, it is
desirable to identify an additional threshold, a cutoff point that divides the
set of out-of-norm points in two subsets - extreme values and outliers.
One way around the issue, in particular for large databases, is to increase the
threshold value to another arbitrary number but this approach requires taking
into consideration the size of the dataset as that size is expected to affect
the threshold separating outliers from extreme values. As an alternative, a
2003 article by D. Gervini (Journal of Multivariate Analysis) proposes “an
adaptive threshold that increases with the number of items N if the data is
clean but it remains bounded if there are outliers in the data”.
This paper implements Gervini’s adaptive threshold value estimator using PROC
ROBUSTREG and the SAS Chi-Square functions CINV and PROBCHI, available in the
SAS/STAT environment. It also provides data simulations to illustrate the
reliability and the flexibility of the method in distinguishing true outliers
from extreme values.
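For reference, the conventional (non-adaptive) cutoff mentioned above is a one-liner with CINV; p is the number of variables (the value 5 is just an example):

  data _null_;
    p = 5;
    cutoff_md2 = cinv(0.975, p);     /* threshold for squared robust distances */
    cutoff_md  = sqrt(cutoff_md2);   /* threshold on the distance scale        */
    put cutoff_md2= cutoff_md=;
  run;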
Impact of Affordable Care Act on Pharmaceutical and Biotech Industry
Salil Parab
PH-80
On March 23, 2010, President Obama signed The Patient Protection and Affordable
Care Act (PPACA), commonly called the Affordable Care Act (ACA). The law has
made historic changes to the health care system in terms of coverage, cost, and
quality of care. This paper will discuss the impact the law will have on the
pharmaceutical and biotech industry.
ACA imposes several costs on pharmaceutical, biotech and related
businesses. A fee will be imposed on each covered entity that manufactures or
imports branded prescription drugs with sales of over $5 million to specified
government programs. In addition to the fees on branded prescription drug
sales, ACA imposes a 2.3% excise tax on sale of medical devices. The tax is
levied on the manufacturer or importer before a medical device is sent to the
wholesaler or hospital. Prior to ACA, prescription drug manufacturers had to pay
a rebate under Medicaid coverage equal to the greater of 15.1% of average
manufacturer price (“AMP”) or the difference between AMP and the best price of
the drug. ACA increased the rebate percentage to 23.1%. It also modified the
definition of AMP, the calculation of the additional rebate for price increases
of line extension drugs, and expanded the rebate program to additional drug
sales. Under ACA, manufacturers that wish to sell drugs covered under
Medicare Part D must participate in the coverage gap discount program.
Along with additional costs on the industry, ACA will bring positive
changes. ACA is anticipated to add 35 million uninsured citizens as new
customers who will directly impact the industry’s bottom line by $115 billion
over the next 10 years or so. Under The Qualifying Therapeutic Discovery
Project program as part of ACA, a tax credit will be given to companies that
treat unmet medical needs or chronic diseases. This will significantly boost
innovation, particularly for small to mid-size enterprises and benefit the
overall industry. The Biologics Price Competition and Innovation Act (BPCIA),
which is part of ACA, includes guidelines for market approval of
“biosimilar” products, patent provisions, data and market exclusivity, and
incentives for innovation.
Evaluating and Mapping Stroke Hospitalization Costs in Florida
Shamarial Roberson and Charlotte Baker
PH-108
Stroke is the fourth leading cause of death and the leading cause of disability
in Florida. Hospitalization charges related to stroke events have increased
over the past ten years even while the number of hospitalizations has remained
steady. North Florida lies in the Stroke Belt, the region of the United States
with the highest stroke morbidity and mortality. This paper will demonstrate
the use of SAS to evaluate the influence of socio-economic status, sex, and
race on total hospitalization charges by payer type in North Florida using data
from the State of Florida Agency for Health Care Administration and the Florida
Department of Health Office of Vital Statistics.
SDTM What? ADaM Who? A Programmer's Introduction to CDISC
Venita DePuy
PH-90
Most programmers in the pharmaceutical industry have at least heard of CDISC,
but may not be familiar with the overall data structure, naming conventions,
and variable requirements for SDTM and ADaM datasets. This overview will
provide a general introduction to CDISC from a programming standpoint, including
the creation of the standard SDTM domains and supplemental datasets, and
subsequent creation of ADaM datasets. Time permitting, we will also discuss
when it might be preferable to do a “CDISC-like” dataset instead of a
dataset that fully conforms to CDISC standards.
Planning, Support, and Administration
Configurable SAS® Framework for managing SAS® OLAP Cube based Reporting System
Ahmed Al-Attar and Shadana Myers
PA-42
This paper presents a high-level infrastructure discussion, with some
explanation of the SAS code used to implement a configurable batch
framework for managing and updating SAS® OLAP Cubes. The framework contains
a collection of reusable, parameter-driven Base SAS macros, custom Base SAS
programs, and UNIX/Linux shell scripts.
This collection manages typical steps and processes used for manipulating SAS
files and executing SAS statements.
The Base SAS macro collection contains a group of Utility Macros that includes
- Concurrent /Parallel Processing Utility Macros
- SAS Metadata Repository Utility Macros
- SPDE Tables Utility Macros
- Table Lookup Utility Macros
- Table Manipulation Utility Macros
- Other Utility Macros
and a group of OLAP-related Macros that includes
- OLAP Utility Macros
- OLAP Permission Table Processing Macros
Case Studies in Preparing Hadoop Big Data for Analytics
Doug Liming
PA-143
Before you can analyze your big data, you need to prepare the data for
analysis. This paper discusses capabilities and techniques for using the power
of SAS® to prepare big data for analytics. It focuses on how a SAS user can
write code that will run in a Hadoop cluster and take advantage of the massive
parallel processing power of Hadoop.
SAS Metadata Querying and Reporting Made Easy: Using SAS Autocall Macros
Jiangtang Hu
PA-41
Metadata is the core of the modern SAS system (aka the SAS Business Analysis
Platform), and SAS offers various techniques to access it via SAS data step
functions, procedures, libname engines, and the Java interface. SAS also
provides a set of autocall macros, built on those techniques, that are packaged
for metadata querying and reporting.
In this paper, I will go through these metadata autocall macros to get quick
results against SAS metadata, such as users, libraries, datasets, jobs and,
most importantly, permissions. For better display of SAS metadata, the SAS ODS
Report Writing Interface is also used in this demo (again, it is not new; it
ships in the SAS sample code folders that most SAS programmers overlook). All
demo code (which can be submitted through SAS Display Manager, SAS Enterprise
Guide and SAS Data Integration Studio) can be found on GitHub:
https://github.com/Jiangtang/SESUG.
Metadata browsing configurations are also supplied for users of SAS Display
Manager, SAS Enterprise Guide and SAS Data Integration Studio respectively.
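As background (not taken from the paper), the autocall macros described above are built on the DATA step metadata functions; a minimal hand-rolled sketch of that layer, assuming an active metadata server connection, might look like this (the object query is illustrative):

   /* List the names of all libraries registered in metadata */
   data meta_libraries;
      length uri name $256;
      call missing(uri, name);
      nobj = metadata_getnobj("omsobj:SASLibrary?@Id contains '.'", 1, uri);
      do n = 1 to nobj;
         rc = metadata_getnobj("omsobj:SASLibrary?@Id contains '.'", n, uri);
         rc = metadata_getattr(uri, "Name", name);   /* library name attribute */
         output;
      end;
      keep name;
   run;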
A Review of "Free" Massive Open Online Content (MOOC) for SAS Learners
Kirk Paul Lafler
PA-54
Leading online providers now offer SAS users “free” access to
content for learning how to use and program in SAS. This content is available
to anyone in the form of massive open online content (or courses) (MOOCs). Not
only is all the content offered for “free”, but it is designed with the
distance learner in mind, empowering users to learn using a flexible and
self-directed approach. As noted on Wikipedia.org, “A MOOC is an online
course or content aimed at unlimited participation and made available in an
open access forum using the web.” This presentation illustrates how anyone
can access a wealth of learning technologies including comprehensive student
notes, instructor lesson plans, hands-on exercises, PowerPoints, audio,
webinars, and videos.
Google® Search Tips and Techniques for SAS® and JMP® Users
Kirk Paul Lafler and Charlie Shipp
PA-40
Google (www.google.com) is the world's most popular and widely used search
engine and the premier search tool on the Internet today. SAS® and JMP®
users frequently need to identify and locate SAS and JMP content wherever and
in whatever form it resides. This paper provides insights into how Google works
and illustrates numerous search tips and techniques for finding articles of
interest, reference works, information tools, directories, PDFs, images,
current news stories, user groups, and more to get search results quickly and
easily.
Stretching Data Training Methods: A Case Study in Expanding SDTM Skills
Richard Addy
PA-48
With CDISC moving towards becoming an explicit standard for new drug
submissions, it is important to expand the number of people who can implement
those standards efficiently and proficiently. However, the CDISC models are
complex, and increasing expertise with them across a large group of people is a
non-trivial task.
This paper describes a case study focusing on increasing the number of people
responsible for creating the SDTM portion of the submissions package (data set
specifications, annotated CRF, metadata, and define file). In moving these
tasks from a small dedicated group who handled all submission-related
activities to a larger pool of programmers, we encountered several challenges:
ensuring quality and compliance across studies; developing necessary skills
(often, non-programmatic skills); and managing a steep learning curve (even for
programmers with previous SDTM experience).
We developed several strategies to address these concerns, including developing
training focused on familiarizing newcomers with where to look
for details, a mentor system to help prevent people from getting stuck,
focusing extra attention on domains that consistently caused problems, and
creating flexible and robust internal tools to assist in the creation of the
submission.
Managing and Measuring the Value of Big Data and Analytics Focused Projects
Rob Phelps
PA-31
Big data and analytics-focused projects have undetermined scope and changing
requirements at their core. There is a high risk of losing business value if
the project is managed with an IT-centric waterfall approach and classical
project management methods. Simply deploying technology on time, to plan, and
within budget does not produce business value for big data projects. A
different approach to managing projects and stakeholders is required to
execute and deliver business value for big data and analytically focused
initiatives.
Introduction:
Projects that are designed to drive better decisions in an organization can
deploy technology on time, to plan, and within budget and still completely fail
to deliver business value. In the race to extract insights from the massive
amounts of data now available, many companies are spending heavily on IT tools
and hiring data scientists, yet most are struggling to achieve a worthwhile
return. Big data and analytically focused projects that are treated the same
way as IT projects for the most part fail to demonstrate the hoped-for business
value. This paper will discuss why big data and analytics projects must be
treated differently to achieve business-changing outcomes.
Discovery-driven Projects:
To obtain value from analysis projects, the focus must be on solving business
problems rather than on managing the risk of deploying technology. The desire
to move to more scientific, evidence-based management practices using analysis
of large and disparate data opens up the possibility of changing business
processes and the way information is used. This is in contrast to simply
optimizing technical processes, which is a historical IT strong suit.
Organizational learning and organizational change are the outcomes that show
value from analysis projects. Standard project management tools and measures
are not sufficient to track the delivery or ensure the value of analytical
efforts. Tools focused on mission- and vision-driven measures are better suited
and can be tied directly to business needs. These include concepts from formal
program evaluation and the creation of logic models, which are methods for
framing change measures.
Calculating the Most Expensive Printing Jobs
Roger Goodwin, PMP
PA-51
As the SAS manual states, the macro facility is a tool for extending and
customizing SAS and for reducing the amount of text that the programmer must
enter to do common tasks. Programmers use SAS macros for mundane, repetitive
tasks. In this application, we present a SAS macro that calculates the top
twenty most expensive printing jobs for each Federal agency.
Given the request for the top twenty most expensive printing jobs, it became
apparent that this request would become repetitive [H. Paulson 2008]. The US
Government Printing Office anticipated an increase in requests for financial
summaries from government agencies and developed the following SAS macro specifically for
Treasury's request. GPO has contracts with most Federal government agencies to
procure print. GPO can produce a report for the top twenty most expensive
printing jobs for each Federal agency. This, of course, assumes the agency does
business with GPO.
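As a point of reference (this sketch is not the GPO program itself), a top-N-per-group report of the kind described can be written as a parameter-driven macro; the data set and variable names below are illustrative:

   %macro topn(indata=, outdata=, byvar=agency, costvar=cost, n=20);
      proc sort data=&indata out=_srt;
         by &byvar descending &costvar;
      run;

      data &outdata;
         set _srt;
         by &byvar;
         if first.&byvar then _rank = 0;
         _rank + 1;
         if _rank <= &n;     /* keep the &n most expensive jobs per agency */
         drop _rank;
      run;
   %mend topn;

   /* Hypothetical call: top 20 most expensive jobs for each agency */
   %topn(indata=print_jobs, outdata=top20_jobs, byvar=agency, costvar=cost, n=20)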
Securing SAS OLAP Cubes with Authorization Permissions and Member-Level Security
Stephen Overton
PA-62
SAS OLAP technology is used to organize and present summarized data for
business intelligence applications. It features flexible options for creating
and storing aggregations to improve performance and brings a powerful
multi-dimensional approach to querying data. This paper focuses on managing
security features available to OLAP cubes through the combination of SAS
metadata and MDX logic.
Debugging and Tuning SAS Stored Processes
Tricia Aanderud
PA-116
You don't have to be with the CIA to discover why your SAS® stored process is
producing clandestine results. In this talk, you will learn how to use prompts
to get the results you want, work with the metadata to ensure correct results,
and even pick up simple coding tricks to improve performance. You will walk
away with a new decoder ring that allows you to discover the secrets of the SAS
logs!
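For orientation (not taken from the talk), a minimal stored process skeleton shows where prompt values and ODS routing come in; the prompt name region and the report step are illustrative:

   /* &region arrives as a prompt value from the client application;
      %STPBEGIN and %STPEND route ODS output back to the requester.  */
   *ProcessBody;
   %stpbegin;

   proc print data=sashelp.shoes noobs;
      where Region = "&region";
   run;

   %stpend;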
Teaching SAS Using SAS OnDemand Web Editor and Enterprise Guide
Charlotte Baker and Perry Brown
PA-85
The server-based SAS OnDemand offerings are excellent tools for teaching SAS
coding to graduate students. SAS OnDemand Enterprise Guide and SAS OnDemand Web
Editor can be used to accomplish similar educational objectives but the
resources required to use each program can be different. This paper will
discuss why one might use a SAS OnDemand program for education and the pros and
cons of using each program for instruction.
Posters
Overview of Analysis of Covariance (ANCOVA) Using GLM in SAS
Abbas Tavakoli
PO-15
Analysis of covariance (ANCOVA) is a more sophisticated form of analysis of
variance. Analysis of covariance is used to compare response means among two or
more groups (categorical variables) adjusted for a quantitative variable
(covariate) thought to influence the outcome (dependent variable). A covariate
is a continuous variable that can be used to reduce the sum of squares for
error (SSE) and subsequently increase the statistical power of an ANOVA design.
There may be more than one covariate. The purpose of this paper is to provide
an overview of analysis of covariance (ANCOVA) using PROC GLM, with two SAS
examples and interpretation suitable for publication.
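As a minimal sketch of the setup described above (not one of the paper's own examples), a one-factor ANCOVA in PROC GLM might look like this; the data set and variable names are illustrative:

   proc glm data=study;
      class group;                      /* categorical factor              */
      model posttest = group pretest;   /* ANCOVA: factor plus covariate   */
      lsmeans group / pdiff stderr;     /* covariate-adjusted group means  */
   run;
   quit;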
Trash to Treasures: Salvaging Variables of Extremely Low Coverage for Modeling
Alec Zhixiao Lin
PO-88
Variables with extremely low occurrence rates either exhibit very low information
values in scorecard development or fail to be selected by a regression model,
and hence are usually discarded at the data cleaning stage. However, some of
these variables could contain valuable information and are worth retaining. We
can aggregate different rare occurrences into a single predictor which can be
used in a subsequent regression or analysis. This paper introduces a SAS macro
that tries to discover and salvage these variables in hope of turning them into
potentially useful predictors.
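The macro presumably automates the discovery step; purely as an illustration of the aggregation idea (with hypothetical sparse 0/1 indicator variables), the roll-up can be as simple as:

   data model_ready;
      set raw_data;
      any_rare = (sum(of rare_flag1-rare_flag5) > 0);   /* ever flagged    */
      n_rare   =  sum(of rare_flag1-rare_flag5);        /* how many flags  */
   run;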
Design of Experiments (DOE) Using JMP®
Charlie Shipp
PO-47
JMP has provided some of the best design of experiment software for years. The
JMP team continues the tradition of providing state-of-the-art DOE support. In
addition to the full range of classical and modern design of experiment
approaches, JMP provides a template for Custom Design for specific
requirements. The other choices include: Screening Design; Response Surface
Design; Choice Design; Accelerated Life Test Design; Nonlinear Design; Space
Filling Design; Full Factorial Design; Taguchi Arrays; Mixture Design; and
Augmented Design. Further, sample size and power plots are available.
We show an interactive tool for n-Factor Optimization in a single plot.
Analysis of Zero Inflated Longitudinal Data Using PROC NLMIXED
Delia Voronca and Mulugeta Gebregziabher
PO-147
Background: Commonly used parametric models may lead to erroneous inference
when analyzing count or continuous data with an excess of zeroes. For
non-clustered data, the most common models used to address the issue for count
outcomes are the zero-inflated Poisson (ZIP), zero-inflated negative binomial
(ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB); the gamma
hurdle (HGamma), truncated normal hurdle (HTGauss), hurdle Weibull (HWeibull),
and zero-inflated Gaussian (ZIGauss) models are used for continuous outcomes.
Objective: Our goal is to expand these for modeling clustered data by
developing a unified SAS macro based on PROC NLMIXED.
Data and Methods: The motivating data set comes from a longitudinal study in an
African American population with poorly controlled type 2 diabetes, conducted
at VA and MUSC centers in South Carolina between 2008 and 2011. A total of 256
subjects were followed for one year, with measures taken at baseline and at
months 3, 6, and 12 post-baseline, after the subjects were randomly assigned to
four treatment groups: telephone-delivered diabetes knowledge/information;
telephone-delivered motivation/behavioral skills training; telephone-delivered
diabetes knowledge/information plus motivation/behavioral intervention; and
usual care. The main goal of the study was to determine the efficacy of the treatment
groups in relation to the usual care group in reducing the levels of hemoglobin
A1C at 12 months. We use these data to demonstrate the application of the
unified SAS macro.
Results: We show that using the unified SAS macro improves the efficiency of
analyzing multiple outcomes with zero-inflation and facilitates model
comparison.
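The macro itself is not reproduced here; as a hedged sketch of the general approach, a zero-inflated Poisson model with a subject-level random intercept can be coded directly in PROC NLMIXED (variable names y, x, and id and the starting values are illustrative):

   proc nlmixed data=long;
      parms b0=0 b1=0 g0=0 s2u=1;
      p0     = 1/(1+exp(-g0));          /* zero-inflation probability */
      lambda = exp(b0 + b1*x + u);      /* Poisson mean               */
      if y = 0 then ll = log(p0 + (1-p0)*exp(-lambda));
      else          ll = log(1-p0) + y*log(lambda) - lambda - lgamma(y+1);
      model y ~ general(ll);
      random u ~ normal(0, s2u) subject=id;
   run;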
Using Regression Model to Predict Earthquake Magnitude and Ground Acceleration at South Carolina Coastal Plain (SCCP)
Emad Gheibi, Sarah Gassman and Abbas Tavakoli
PO-65
Seismically-induced liquefaction is one of the most hazardous geotechnical
phenomena associated with earthquakes and can cause loss of life and devastating
damage to infrastructure. In 1964, a magnitude 7.5 earthquake in
Niigata, Japan destroyed numerous buildings and structures and initiated studies
to understand soil liquefaction. One major outcome of these studies has been
the development of correlations that are used to determine liquefaction
resistance of soil deposits from in-situ soil indices. These relations are
based on the Holocene soils (<10,000 years old) while the sand deposits
encountered in the South Carolina Coastal Plain (SCCP) are older than 100,000
years and thus the current empirical correlations are not valid for measuring
soil resistance against liquefaction. Researchers have developed methodology
that considers the effect of aging on the liquefaction potential of sands.
In-situ and geotechnical laboratory tests have been performed in the vicinity
of sand blows, dating back as far as 6,000 years, at the Fort Dorchester,
Sampit, Gapway, Hollywood, and Four Hole Swamp sites in the SCCP.
Paleoliquefaction studies have been performed to back-analyze the earthquake
magnitude and the maximum acceleration required to initiate liquefaction at the
time of the prehistoric earthquake at these five sites. In this paper,
descriptive statistics, including frequency distributions for categorical
variables and summary statistics for continuous variables, are presented.
Statistical analysis using regression models is performed for selected
variables on the calculated values of earthquake magnitude and maximum
acceleration (dependent variables). SAS 9.4 was used to analyze the data.
PROC MEANS for Disaggregating Statistics in SAS: One Input Data Set and One Output Data Set with Everything You Need
Imelda Go and Abbas Tavakoli
PO-133
The need to calculate statistics for various groups or classifications is ever
present. Calculating such statistics may involve different strategies with some
being less efficient than others. A common approach by new SAS programmers who
are not familiar with PROC MEANS is to create a SAS data set for each group of
interest and to execute PROC MEANS for each group. This strategy can be
resource-intensive when large data sets are involved. It requires multiple PROC
MEANS statements due to multiple input data sets and involves multiple output
data sets (one per group of interest). In lieu of this, an economy of
programming code can be achieved using a simple coding strategy in the DATA
step to take advantage of PROC MEANS capabilities. Variables that indicate
group membership (1 for group membership, blank for non-group membership) can
be created for each group of interest in a master data set. The master data set
with these blank/1 indicator variables can then be processed with PROC MEANS
and its different statements (i.e., CLASS and TYPES) to produce one data set
with all the statistics generated for each group of interest.
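A hedged sketch of the pattern described above (the authors' exact setup may differ; the indicator variables and analysis variable here are hypothetical):

   /* 1 when a record belongs to a group of interest, blank otherwise */
   data master;
      set scores;
      if grade   = 3   then grp_grade3  = 1;
      if lunch   = 'F' then grp_frl     = 1;
      if migrant = 'Y' then grp_migrant = 1;
   run;

   /* MISSING keeps the blank level as its own class level; the TYPES
      statement requests one one-way summary per indicator. Keep the
      output rows where the indicator equals 1.                        */
   proc means data=master noprint missing;
      class grp_grade3 grp_frl grp_migrant;
      types grp_grade3 grp_frl grp_migrant;
      var score;
      output out=group_stats n=n mean=mean std=std;
   run;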
Exploring the Use of Negative Binomial Regression Modeling for Pediatric Peripheral Intravenous Catheterization
Jennifer Mann, Jason Brinkley and Pamela Larsen
PO-109
A large study conducted at two southeastern US hospitals from October 2007
through October 2008 sought to identify predictive variables for successful
intravenous catheter (IV) insertion, a crucial procedure that is potentially
difficult and time consuming in young children. The data were collected on a
sample of 592 children who received a total of 1,195 attempts to start
peripheral IV catheters in the inpatient setting. The median age of children
was 2.25 years, with an age range of 2 days to 18 years. The outcome here is
number of attempts to successful IV placement for which the underlying data
appears to have a negative binomial structure. The goal here is to illustrate
the appropriateness of a negative binomial assumption using visuals obtained
from Proc SGPLOT and to determine the goodness of fit for a negative binomial
model.
Negative binomial regression output from Proc GENMOD will be contrasted
with traditional ordinary least squares output. Akaike’s Information
Criterion (AIC) illustrates that the negative binomial model has a better fit
and comparisons are made in the inferences of covariate impact. Many scenarios
of negative binomial regression follow from an application to overdispersed
Poisson data; however, this project demonstrates a dataset that fits well under
the traditional ideology and purpose of a negative binomial model.
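For readers unfamiliar with the syntax, a minimal negative binomial fit in PROC GENMOD looks like the following; the predictor names are illustrative, not the study's actual covariates:

   proc genmod data=iv_data;
      class inpatient_unit;
      model attempts = age_years inpatient_unit / dist=negbin link=log;
   run;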
To Foam or not to Foam: A Survival Analysis of the Foam Head that Forms when a Soda is Poured
Kate Phillips
PO-74
The goal of this study is to determine which factors influence the dissolve
time of the foam head that forms after a soda is poured. This study proposes a
hierarchical logistic model in order to estimate a particular soda’s
probability of being a “small fizzer” (the desired outcome) as opposed to a
“big fizzer,” with the median dissolve time of 12 seconds serving as the
cut point for the binary outcome. A standard procedure for testing foam head
dissolve time was developed in order to collect the study data. A sample of 80
Coke products was then tested during fall 2013; characteristics of each product
sampled were also recorded. All analyses were then conducted using Base SAS
9.3. After conducting a univariate analysis for each factor of interest, the
continuous response variable was then dichotomized into the binary outcome of
interest. A bivariate analysis was then conducted; odds ratios with their
confidence intervals were examined in order to determine a predictor’s
significance with respect to the binary outcome. Table row percentages were
examined for factors where odds ratios were not given by SAS. It was discovered
that the most significant factors were sweetener type and a previously
undiscovered (to the author’s best knowledge) interaction between test
container material and the presence/absence of caffeine (“test container
material” refers to the material that the beverage was poured into for
testing). According to the study results, this interaction was the most
influential factor with respect to foam head dissolve time. The odds ratio for
sweetener type was 2.25 (95% CI: 0.91, 5.54). With caffeine present, the odds
ratio for test container material was 0.76 (95% CI: 0.23, 2.53). With caffeine
absent, the odds ratio for test container material jumped to 11.70 (95% CI:
1.85, 74.19). The final hierarchical logistic model retains the factors
“sweetener type,” “test container material,” and the interaction
between test container material and the presence/absence of caffeine.
Connect with SAS® Professionals Around the World with LinkedIn and sasCommunity.org
Kirk Paul Lafler and Charles Edwin Shipp
PO-55
Accelerate your career and professional development with LinkedIn and
sasCommunity.org. Establish and manage a professional network of trusted
contacts, colleagues and experts. These exciting social networking and
collaborative online communities enable users to connect with millions of SAS
users worldwide, anytime and anywhere. This presentation explores how to
create a LinkedIn profile and social networking content, develop a professional
network of friends and colleagues, join special-interest groups, access a
Wiki-based web site where anyone can add or change content on any page on the
web site, share biographical information between both communities using a
built-in widget, exchange ideas in Bloggers Corner, view scheduled and
unscheduled events, use a built-in search facility to search for desired
wiki-content, collaborate on projects and file sharing, read and respond to
specific forum topics, and more.
Evaluating Additivity of Health Effects of Exposure to Multiple Air Pollutants Given Only Summary Data
Laura Williams, Elizabeth Oesterling Owens and Jean-Jacques Dubois
PO-139
A research team is interested in determining whether the health effects of
exposure to a mixture of air pollutants are additive, based on data provided by
toxicology studies. Additivity is defined as the effects of exposure to the
mixture being statistically equal to the sum of the effects of exposure to each
individual component of that mixture. The studies of interest typically did not
explicitly test for differences between the effects of the mixture and the sum
of effects of each component of that mixture; however, many did provide summary
data for the observed effects. The summary data from individual studies (e.g.,
number of subjects [n], mean response, standard deviation) were extracted. SAS
was used to reconstruct representative datasets for each study by randomly
generating n values, which were then normalized to the mean and standard
deviation given. The effect of the mixture of pollutants was tested against the
sum of the effects of each component of the mixture using proc glm. A relative
difference between the mixture and the sum was calculated so results could be
compared even if the endpoints were different. Confidence intervals were
calculated using proc iml. A forest plot of all the results that were not
simply additive was created using proc sgplot. The study details can also be
added to the plot as data points. The method described here allowed us to test
for the effect of interest, as if we had the primary data generated by the
original authors. The views expressed in this abstract are those of the authors
and do not necessarily represent the views or policies of the U.S.
Environmental Protection Agency.
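A hedged sketch of the data-reconstruction step described above (the reported n, mean, and SD shown are invented values for one hypothetical study arm): draw n normal values, standardize them, then rescale so the replicate reproduces the reported moments exactly.

   data draws;
      call streaminit(2014);
      do i = 1 to 12;               /* reported n */
         z = rand('normal');
         output;
      end;
   run;

   proc standard data=draws mean=0 std=1 out=draws_std;
      var z;
   run;

   data arm;
      set draws_std;
      response = 4.8 + 1.3*z;       /* rescale to the reported mean and SD */
   run;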
Build your Metadata with PROC CONTENTS and ODS OUTPUT
Louise Hadden
PO-29
Simply using an ODS destination to replay PROC CONTENTS output does not provide
the user with attractive, usable metadata. Harness the power of SAS® and ODS
output objects to create designer multi-tab metadata workbooks with the click
of a mouse!
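As a starting point only (the paper's workbooks are considerably more polished), the basic pattern is to capture a PROC CONTENTS output object with ODS OUTPUT and then send it to a workbook destination; the file name, sheet name, and example member are illustrative:

   ods output Variables=work.class_vars Attributes=work.class_attr;
   proc contents data=sashelp.class;
   run;

   ods tagsets.excelxp file='metadata_workbook.xml'
       options(sheet_name='SASHELP.CLASS variables');
   proc print data=work.class_vars noobs;
   run;
   ods tagsets.excelxp close;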
A National Study of Health Services Utilization and Cost of Care with SAS: Analyses from the 2011 Medical Expenditure Panel Survey
Seungyoung Hwang
PO-68
Objective: To show how to examine the health services utilization and cost of
care associated with mood disorders among the older population aged 65 or older
in the United States.
Research Design and Methods: A cross-sectional study design was used to
identify two groups of elders with mood disorders (n = 441) and without mood
disorders (n = 3,822) using the 2011 Medical Expenditure Panel Survey (MEPS). A
multivariate regression analysis using the SURVEYREG procedure in SAS was
conducted to estimate the incremental health services and direct medical costs
(inpatient, outpatient, emergency room, prescription drugs, and other)
attributable to mood disorders.
Measures: Clinical Classification code aggregating ICD-9-CM codes for
depression and bipolar disorders.
Results: The prevalence of mood disorders among individuals aged 65 or older
in 2011 was estimated at 11.38% (5.17 million persons) and their total direct
medical costs were estimated at approximately $81.82 billion in 2011 U.S.
dollars. After adjustment for demographic, socioeconomic, and clinical
characteristics, the additional incremental health services utilization
associated with mood disorders for hospital discharges, number of prescriptions
filled, and office-based visits were 0.14 ± 0.04, 4.76 ± 1.04, and 17.29 ±
2.07, respectively (all p<0.001). The annual adjusted mean incremental total
cost associated with mood disorders was $5,957 (SE: $1,294; p<0.0001) per
person. Inpatient, prescription medications, and office-based visits together
accounted for approximately 78% of the total incremental cost.
Conclusion: The presence of mood disorders for older adults has a substantial
influence on health services utilization and cost of care in the U.S.
Significant savings associated with mood disorders could be realized by cost
effective prescription medications which might reduce the need for subsequent
inpatient or office-based visits.
Key words: SAS; health services utilization; healthcare costs; mood disorders;
older adults.
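A minimal sketch of the survey-weighted regression described above; the variable names follow typical MEPS file conventions but are illustrative rather than the author's actual code:

   proc surveyreg data=meps2011;
      strata  varstr;         /* variance estimation stratum   */
      cluster varpsu;         /* primary sampling unit         */
      weight  perwt11f;       /* person-level survey weight    */
      class   mood_disorder sex race;
      model   totexp11 = mood_disorder age sex race / solution;
   run;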
Reporting and Information Visualization
Design of Experiments (DOE) Using JMP®
Charlie Shipp
RIV-47
JMP has provided some of the best design of experiment software for years. The
JMP team continues the tradition of providing state-of-the-art DOE support. In
addition to the full range of classical and modern design of experiment
approaches, JMP provides a template for Custom Design for specific
requirements. The other choices include: Screening Design; Response Surface
Design; Choice Design; Accelerated Life Test Design; Nonlinear Design; Space
Filling Design; Full Factorial Design; Taguchi Arrays; Mixture Design; and
Augmented Design. Further, sample size and power plots are available.
We give an introduction to these methods followed by a few examples with factors.
Secrets from a SAS® Technical Support Guy: Combining the Power of the Output Delivery System with Microsoft Excel Worksheets
Chevell Parker
RIV-144
Business analysts commonly use Microsoft Excel with the SAS® System to answer
difficult business questions. While you can use these applications
independently of each other to obtain the information you need, you can also
combine the power of those applications, using the SAS Output Delivery System
(ODS) tagsets, to completely automate the process. This combination delivers a
more efficient process that enables you to create fully functional and highly
customized Excel worksheets within SAS. This paper starts by discussing common
questions and problems that SAS Technical Support receives from users when they
try to generate Excel worksheets. The discussion continues with methods for
automating Excel worksheets using ODS tagsets and customizing your worksheets
using the CSS style engine and extended tagsets. In addition, the paper
discusses tips and techniques for moving from the current MSOffice2K and
ExcelXP tagsets to the new Excel destination, which generates output in the
native Excel 2010 format.
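For orientation (not drawn from the paper), a minimal ExcelXP example is shown below; the newer ODS EXCEL destination follows the same pattern, and the file name, style, and options are illustrative:

   ods tagsets.excelxp file='shoe_report.xml' style=statistical
       options(sheet_name='Shoes' autofilter='all' frozen_headers='yes');

   proc print data=sashelp.shoes noobs;
   run;

   ods tagsets.excelxp close;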
Tricks and Tips for Using the Bootstrap in JMP Pro 11
Jason Brinkley and Jennifer Mann
RIV-49
The bootstrap has become a very popular technique for assessing the variability
of many different unusual estimators. Starting in JMP Pro 10, the bootstrap
feature was added to a wide variety of output options; however, there has not
been much discussion of the possible uses of this somewhat hidden feature. This paper will discuss a handful of uses that can be added to routine
analyses. Examples include confidence interval estimates of the 5% trimmed
mean and median survival, validation of covariates in regression analysis,
comparing the differences in Spearman correlation estimates across two groups,
and eigenvalues in principal components analysis. The examples will show the
extra depth that can be easily added to routine analyses.
Build your Metadata with PROC CONTENTS and ODS OUTPUT
Louise Hadden
RIV-29
Simply using an ODS destination to replay PROC CONTENTS output does not provide
the user with attractive, usable metadata. Harness the power of SAS® and ODS
output objects to create designer multi-tab metadata workbooks with the click
of a mouse!
Where in the World Are SAS/GRAPH® Maps? An Exploration of the Old and New SAS® Mapping Capacities
Louise Hadden
RIV-30
SAS® has an amazing, but relatively unknown and underutilized, arsenal of tools
for using and displaying geographic information. This presentation highlights
both new and existing capacities for creating stunning, informative maps as
well as using geographic data in other ways. SAS-provided map data files,
functions, format libraries, and other geographic data files are explored in
detail. Custom mapping of geographic areas is discussed. Maps produced include use of both
the annotate facility (including some new functions) and PROC GREPLAY. Products
used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will
benefit from this presentation.
Integrating SAS with JMP to Build an Interactive Application
Merve Gurlu
RIV-50
This presentation will demonstrate how to bring various JMP visuals into one
platform to build an appealing, informative, and interactive dashboard using
JMP Application Builder and make the application more effective by adding data
filters to analyze subgroups of your population with a simple click. Even
though all the data visualizations are done in JMP, importing and merging large
data files, data manipulations, creating new variables and all other data
processing steps are performed by connecting JMP to SAS. This presentation will
demo connecting to SAS to create a data file ready for visualization using SAS
data manipulation capabilities and macros; building interactive visuals using
JMP; and, building an application using JMP application builder. For attendees
who would like to be able to print the visuals in the application, a few tips
and tricks for building PowerPoint presentations will be provided at the end of
the presentation.
Penalizing your Models: An Overview of the Generalized Regression Platform
Michael Crotty and Clay Barker
RIV-151
We will provide an overview of the Generalized Regression personality of the
Fit Model platform, added in JMP Pro version 11. The motivation for using
penalized regression will be discussed, and multiple examples will show how the
platform can be used for variable selection on continuous or count data.
Web Scraping with JMP for Fun and Profit
Michael Hecht
RIV-150
JMP includes powerful tools for importing data from web pages. This talk walks
through a case study that retrieves OS usage share data from the web, and
transforms it into a JMP graph showing usage changes over time. When combined
with JMP’s built-in formulas, value labels, and summarization methods, the
end result is a tool that can be used to quickly evaluate and make decisions
based on OS usage trends.
Enhancements to Basic Patient Profiles
Scott Burroughs
RIV-97
Patient Data Viewers are becoming more prevalent in the pharmaceutical
industry, but not all companies use them, nor are they needed in every situation. Old-fashioned patient profiles still have their place in today’s industry, but how can they be enhanced?
Missing data, bad data, and outliers can affect the output and/or the running
of the program. Also, relying on analysis data sets that need to be run first
by others can affect timing (vacations, out-of-office, busy, etc.). As always,
there are things you can do to make them look prettier in general. This paper
will show how to solve these issues and make the program more robust.
Creating Health Maps Using SAS
Shamarial Roberson and Charlotte Baker
RIV-107
There are many different programs that have been developed to map data. However, SAS users do not always need to go outside of their SAS installation
to map data. SAS has many built-in options for mapping that, with a bit of
knowledge, can be just as good as advanced external programs. This paper will
give an introduction to how to create health maps using PROC GMAP and compare
the results to maps created in ArcGIS.
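As a minimal illustration of the approach (not the paper's own example), a choropleth of state-level rates with PROC GMAP can be as short as this; the input data set is hypothetical and assumes the numeric state codes used by MAPS.US:

   proc gmap data=stroke_rates map=maps.us;
      id state;                            /* matches STATE in MAPS.US */
      choro rate / levels=5 coutline=black;
   run;
   quit;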
A Strip Plot Gets Jittered into a Beeswarm
Shane Rosanbalm
RIV-52
The beeswarm is a relatively new type of plot and one that SAS does not yet
produce automatically (as of version 9.4). For those unfamiliar with beeswarm
plots, they are related to strip plots and jittered strip plots. Strip plots
are scatter plots with a continuous variable on the vertical axis and a
categorical variable on the horizontal axis (e.g., systolic blood pressure vs.
treatment group). The strip plot is hamstrung by the fact that tightly packed
data points start overlaying one another, obscuring the story that the data are
trying to tell. A jittered strip plot seeks to remedy this problem by randomly
moving data points off of the categorical center line. Depending on the volume
of data and the particular sequence of random jitters, this technique does not
always eliminate all overlays. In order to guarantee no overlays we must adopt
a non-random approach. This is where the beeswarm comes in. The beeswarm
approach is to plot data points one at a time, testing candidate locations for
each new data point until one is found that does not conflict with any
previously plotted data points. The macro presented in this paper performs the
preliminary calculations necessary to avoid overlays and thereby produce a
beeswarm plot.
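For contrast with the macro (this sketch is not the beeswarm itself), a randomly jittered strip plot simply nudges each point a small random amount off its group's center line; variable names are illustrative, and the beeswarm replaces these random offsets with deterministic, collision-free positions:

   data jittered;
      set bp;                               /* sbp = systolic BP, grp = 1 or 2 */
      if _n_ = 1 then call streaminit(52);
      xpos = grp + 0.3*(rand('uniform') - 0.5);
   run;

   proc sgplot data=jittered;
      scatter x=xpos y=sbp;
      xaxis label='Treatment group';
   run;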
How To Make An Impressive Map of the United States with SAS/Graph® for Beginners
Sharon Avrunin-Becker
RIV-27
Have you ever been given a map downloaded from the internet and asked to
reproduce the same thing using SAS complete with labels and annotations? As you
stare at all the previously written, brilliant SAS/Graph conference papers, you
start feeling completely overwhelmed. The papers assume you already know how to
get started and you feel like a clueless chimpanzee not understanding what you
are missing. This paper will walk you through the steps to getting started with
your map and how to add colors and annotations that will not only impress your
manager, but most importantly yourself that you could do it too!
Dashboards with SAS Visual Analytics
Tricia Aanderud
RIV-118
Learn the simple steps for creating a dashboard for your company and then see
how SAS Visual Analytics makes it a simple process.
Statistics and Data Analysis
Using SAS to Examine Mediator, Direct and Indirect Effects of Isolation and Fear on Social Support Using Baron & Kenny Combined with Bootstrapping Methods
Abbas Tavakoli and Sue Heiney
SD-64
This presentation examines the mediator, direct, and indirect effects of
isolation and fear on social support using two methods: Baron & Kenny and
bootstrapping. The paper used cross-sectional data from a longitudinal
randomized trial in which 185 participants were assigned either to a
therapeutic group (n=93), delivered by teleconference with participants
interacting with each other in real time, or to a control group (n=92) that
received usual psychosocial care (any support used by the patient in the course
of cancer treatment). The Baron and Kenny (1986) steps and the Hayes (2004)
bootstrapping approach were used to examine direct and indirect effects.
Results of the Baron and Kenny approach indicated that the relationship between
fear and social support was significant (c = -1.151, total effect, p=.0001) and
that there was a significant relationship between isolation and fear (α = 1.22,
p=.0001). Also, the previously significant relationship between fear and social
support was no longer significant (c’ = -.40, direct effect, p=.1876) when both
fear and isolation were in the model. The indirect effect was -1.11, and the
Sobel test was significant (p=.0001). The results of the bootstrapping methods
indicated that the direct effect was -.41 (95% CI: -.42, -.40 for normal
theory; -.99, .14 for percentile) and the indirect effect was -1.06 (95% CI:
-1.09, -1.08 for normal theory; -1.09, -1.55 for percentile). The results
showed that both methods yielded a significant indirect effect.
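As background for readers new to the method (not the authors' code), the Baron & Kenny steps amount to three regressions, which fit naturally as labeled MODEL statements in a single PROC REG step; the variable names are illustrative:

   proc reg data=study;
      total:    model social_support = fear;             /* step 1: c (total)   */
      path_a:   model isolation      = fear;             /* step 2: a           */
      adjusted: model social_support = fear isolation;   /* step 3: c-prime, b  */
   run;
   quit;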
Don't be binary! Tales of Non-binary Categorical Regression
Charlotte Baker
SD-87
It is not always optimal to reorganize your data into two levels for
regression. To prevent the loss of information that occurs when categories are
collapsed, polytomous regression can be used. This paper will discuss
situations in which polytomous regression can be used and how you can write the code.
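One hedged example of the kind of code involved (not necessarily the paper's): a generalized logit model for a nominal outcome with more than two levels, with illustrative variable and level names:

   proc logistic data=analysis;
      class exposure / param=ref;
      model outcome(ref='None') = exposure age / link=glogit;
   run;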
Maximizing Confidence and Coverage for a Nonparametric Upper Tolerance Limit on the Second Largest Order Statistic for a Fixed Number of Samples
Dennis Beal
SD-93
A nonparametric upper tolerance limit (UTL) bounds a specified percentage of
the population distribution with specified confidence. The confidence and
coverage of a UTL based on the second largest order statistic is evaluated for
an infinitely large population. This relationship can be used to determine the
number of samples prior to sampling to achieve a given confidence and coverage. However, often statisticians are given a data set and asked to calculate a UTL
for the second largest order statistic for the number of samples provided. Since the number of samples usually cannot be increased to increase confidence
or coverage for the UTL, the maximum confidence and coverage for the given
number of samples is desired. This paper derives the maximum confidence and
coverage for the second largest order statistic for a fixed number of samples. This relationship is demonstrated both graphically and in tabular form. The
maximum confidence and coverage are calculated for several sample sizes using
results from the maximization. This paper is for intermediate SAS® users of
Base SAS® who understand statistical intervals.
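As background (stated here for orientation, not quoted from the paper), the standard order-statistic argument behind this setup gives, for coverage proportion p and n samples, the confidence attached to the second largest order statistic as

   \gamma = \Pr\!\big(X_{(n-1)} \ge x_{p}\big) = 1 - p^{\,n} - n\,p^{\,n-1}(1-p),

so for a fixed n the maximization described above trades confidence \gamma against coverage p.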
Strimmed_t: A SAS® Macro for the Symmetric Trimmed t Test
Diep Nguyen, Anh Kellermann, Patricia Rodríguez de Gil, Eun Sook Kim and Jeffrey Kromrey
SD-79
It is common to use the independent means t-test to test the equality of two
population means. However, this test is very sensitive to violations of the
population normality and homogeneity of variance assumptions. In such
situations, Yuen’s (1974) trimmed t-test is recommended as a robust
alternative. The aim of this paper is to provide a SAS macro that allows easy
computation of Yuen’s symmetric trimmed t-test. The macro output includes a
table with trimmed means for each of two groups, Winsorized variance estimates,
degrees of freedom, and obtained value of t (with two-tailed p-value).
In addition, the results of a simulation study are presented and provide empirical
comparisons of the Type I error rates and statistical power of the independent
samples t-test, Satterthwaite’s approximate t-test and the trimmed t-test
when the assumptions of normality and homogeneity of variance are violated.
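The macro is not reproduced here; as a hedged pointer to its building blocks, symmetric trimmed means and Winsorized moments by group can be obtained from PROC UNIVARIATE (20% trimming shown; data set and variable names are illustrative):

   proc univariate data=scores trimmed=0.2 winsorized=0.2;
      class group;
      var response;
      ods output TrimmedMeans=work.trim WinsorizedMeans=work.winsor;
   run;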
ANOVA_HOV: A SAS® Macro for Testing Homogeneity of Variance in One-Factor ANOVA Models
Diep Nguyen, Thanh Pham, Patricia Rodríguez de Gil, Tyler Hicks, Yan Wang, Isaac Li, Aarti Bellara, Jeanine Romano, Eun Sook Kim, Harold Holmes, Yi-Hsin Chen and Jeffrey Kromrey
SD-81
Variance homogeneity is one of the critical assumptions when conducting ANOVA
as violations may lead to perturbations in Type I error rates. Previous
empirical research suggests minimal consensus among studies as to which test is
appropriate for a particular analysis. This paper provides a SAS macro for
testing the homogeneity of variance assumption in one-way ANOVA models using
ten different approaches. In addition, this paper describes the rationale
associated with examining the variance assumption in ANOVA and whether the
results could inform decisions regarding the selection of a valid test for mean
differences. Using simulation methods, the ten tests evaluating the variance
homogeneity assumption were compared in terms of their Type I error rate and
statistical power.
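For readers who want a concrete starting point (this is not the ten-test macro itself), one of the classical approaches is Levene's test via the HOVTEST= option in PROC GLM; HOVTEST=BF, OBRIEN, and BARTLETT are also available, and the names below are illustrative:

   proc glm data=study;
      class group;
      model y = group;
      means group / hovtest=levene(type=abs) welch;   /* Welch ANOVA as a robust follow-up */
   run;
   quit;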
Text Analytics using High Performance SAS Text Miner
Edward Jones
SD-112
The latest release of SAS Enterprise Miner, version 12.3, contains
high-performance modules, including a new module for text mining. This paper
compares the new module to the existing SAS Text Miner modules. The advantages
and disadvantages of HP Text Miner are discussed and illustrated using
customer survey data.
How does Q-matrix Misspecification Affect the Linear Logistic Test Model’s Parameter Estimates?
George MacDonald and Jeffrey Kromrey
SD-99
Cognitive diagnostic assessment (CDA) is an important thrust in measurement
designed to assess students’ cognitive knowledge structures and processing
skills in relation to item difficulty (Leighton & Gierl, 2007). If the goal of
assessing students’ strengths and weaknesses is to be accomplished, it will
be important to develop standardized assessments that measure the psychological
processes involved in conceptual understanding. The field of CDA in general
and the linear logistic test model in particular can be thought of as a
response to these emerging educational needs (MacDonald, G., 2014). A simulation study was conducted to explore the performance of the linear
logistic test model (LLTM) when the relationships between items and cognitive
components were misspecified. Factors manipulated included percent of
misspecification (0%, 1%, 5%, 10%, and 15%), form of misspecification
(under-specification, balanced misspecification, and over-specification),
sample size (20, 40, 80, 160, 320, 640, and 1280), Q-matrix density (60% and
46%), number of items (20, 40, and 60 items), and skewness of person ability
distribution (-0.5, 0, and 0.5). Statistical bias, root mean squared error,
confidence interval coverage, and confidence interval width were computed to
interpret the impact of the design factors on the cognitive components, item
difficulty, and person ability parameter estimates. The simulation provided rich results and selected key conclusions include (a)
SAS works superbly when estimating LLTM using a marginal maximum likelihood
approach for cognitive components and an empirical Bayes estimation for person
ability, (b) parameter estimates are sensitive to misspecification, (c)
under-specification is preferred to over-specification of the Q-matrix, (d)
when properly specified the cognitive components parameter estimates often have
tolerable amounts of root mean squared error when the sample size is greater
than 80, (e) LLTM is robust to the density of Q-matrix specification, (f) the
LLTM works well when the number of items is 40 or greater, and (g) LLTM is
robust to a slight skewness of the person ability distribution. In sum, the
LLTM is capable of identifying conceptual knowledge when the Q-matrix is
properly specified, which is a rich area for applied empirical research
(MacDonald, 2014).
Modeling Cognitive Processes of Learning with SAS® Procedures
Isaac Li, Yi-Hsin Chen, Chunhua Cao and Yan Wang
SD-83
Traditionally, the primary goal of educational assessments has been to evaluate
students’ academic achievement or proficiency in comparison to their peers or
against promulgated standards. Both classical test theory (CTT) and item
response theory (IRT) modeling frameworks provide measurement in the form of a
summative estimate of the outcome variable. In recent years, understanding and
exploring the complex cognitive processes that contribute to the learning
outcome received growing interest in the field of psychometrics, where various
item response modeling approaches have been devised to describe the
relationship between the outcome and its componential attributes. Such
approaches include the linear logistic test model (LLTM), the crossed
random-effects linear logistic test model (CRELLTM), and the two-stage multiple
regression method (MR). This paper will not only introduce these statistical
models but also demonstrate how to obtain parameter estimates and model-data
fit indices for cognitive processes under each model by employing the GLM,
NLMIXED, and GLIMMIX procedures.
Power and Sample Size Computations
John Castellon
SD-145
Power determination and sample size computations are an important aspect of
study planning and help produce studies with useful results for minimum
resources. This tutorial reviews basic methodology for power and sample size
computations for a number of analyses including proportion tests, t tests,
confidence intervals, equivalence and noninferiority tests, survival analyses,
correlation, regression, ANOVA, and more complex linear models. The tutorial
illustrates these methods with numerous examples using the POWER and GLMPOWER
procedures in SAS/STAT® software as well as the Power and Sample Size
Application. Learn how to compute power and sample size, perform sensitivity
analyses for other factors such as variability and type I error rate, and
produce customized tables, graphs, and narratives. Special attention will be
given to the newer power and sample size analysis features in SAS/STAT software
for logistic regression and the Wilcoxon-Mann-Whitney (rank-sum) test.
Prior exposure to power and sample size computations is assumed.
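A minimal sketch of the kind of computation covered (illustrative effect size and variability values, not the tutorial's own examples): solving for the per-group sample size of a two-sample t test with PROC POWER.

   proc power;
      twosamplemeans test=diff
         meandiff  = 5
         stddev    = 12
         power     = 0.8
         npergroup = .;     /* solve for n per group */
   run;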
Gestational Diabetes Mellitus and changes in offspring’s weight during infancy: A longitudinal analysis
Marsha Samson, Olubunmi Orekoya, Dumbiri Onyeajam and Tushar Trivedi
SD-128
Background: Gestational Diabetes Mellitus (GDM) is the most common metabolic
disorder during pregnancy in the United States, with an incidence ranging
from 7% to 14% of all pregnancies. GDM has been associated with various adverse
effects including macrosomia and the heightened risk of type 2 diabetes
mellitus (T2DM) in mothers. Many studies have shown the association between GDM
and offspring birth weight but very few studies have assessed the longitudinal
relationship between GDM and offspring’s weight at different time points.
Objectives: The purpose of this study is to determine the association between
GDM and changes in infant’s weight at 0, 3, 5, 7 and 12 months.
Methods: We used data from the Infant Feeding Practices Survey II, a large
prospective study of pregnant women living in the United States. We examined
GDM and babies’ weight at 0, 3, 5, 7, and 12 months among 1,072 mothers. We used
general linear models to assess the impact of GDM on infant weight, adjusted
for socio-demographic variables and other potential confounders.
Results: The mean age of our sample was 30.5 years. Infants from mothers with
GDM had no significant difference in mean weight during infancy compared with
infants whose mothers did not have GDM (adjusted coefficient: 0.1870, 95 %
Confidence Interval: -0.0431, 0.4171). Mean weight during infancy for those
born to non-Hispanic Black mothers was significantly higher than those born to
non-Hispanic Whites (adjusted coefficient: -0.4787, 95% CI: -0.8533, -0.1040). During infancy, mean weight for boys was significantly higher than for girls
(adjusted coefficient 0.4698, 95% CI: 0.3491, 0.5906). With every one more
cigarette smoked per day during pregnancy, mean infantile weight decreased by
4% (95% CI: -0.06145, -0.0166).
Conclusion: Our study did not show a significant association between GDM and
mean weight during infancy. However, this result should be treated with caution
because of the small number of GDM cases in our sample, and also because of the
unique demographic composition of our study sample, which consisted mostly of
white women.
Multilevel Models for Categorical Data using SAS® PROC GLIMMIX: The Basics
Mihaela Ene, Elizabeth Leighton, Genine Blue and Bethany Bell
SD-134
Multilevel models (MLMs) are frequently used in social and health sciences
where data are typically hierarchical in nature. However, the commonly used
hierarchical linear models (HLMs) are only appropriate when the outcome of
interest is continuous; when dealing with categorical outcomes, a
transformation and an appropriate error distribution for the response variable
need to be incorporated into the model and therefore, hierarchical generalized
linear models (HGLMs) need to be used. This paper provides an introduction to
specifying hierarchical generalized linear models using PROC GLIMMIX, following
the structure of the primer for hierarchical linear models previously presented
by Bell, Ene, Smiley, and Schoeneberger (2013). A brief introduction into the
field of multilevel modeling and HGLMs with both dichotomous and polytomous
outcomes is followed by a discussion of the model building process and
appropriate ways to assess the fit of these models. Next, the paper provides a
discussion of PROC GLIMMIX statements and options as well as concrete examples
of how PROC GLIMMIX can be used to estimate (a) two-level organizational models
with dichotomous outcomes and (b) two-level organizational models with
polytomous outcomes. These examples use data from High School and Beyond
(HS&B), a nationally-representative longitudinal study of American youth. For
each example, narrative explanations accompany annotated examples of the
GLIMMIX code and corresponding output.
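As a taste of the syntax discussed above (a minimal sketch with hypothetical variable names, not the paper's annotated examples), a two-level model with a dichotomous outcome for students nested within schools can be specified as:

   proc glimmix data=hsb method=laplace;
      class school;
      model passed(event='1') = ses female / dist=binary link=logit solution;
      random intercept / subject=school;
   run;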
Analyzing Multilevel Models with the GLIMMIX Procedure
Min Zhu
SD-141
Hierarchical data are common in many fields, from pharmaceuticals to
agriculture to sociology. As data sizes and sources grow, information is likely
to be observed on nested units at multiple levels, calling for the multilevel
modeling approach. This paper describes how to use the GLIMMIX procedure in SAS/STAT® to analyze
hierarchical data that have a wide variety of distributions. Examples are
included to illustrate the flexibility that PROC GLIMMIX offers for modeling
within-unit correlation, disentangling explanatory variables at different
levels, and handling unbalanced data. Also discussed are enhanced weighting options, new in SAS/STAT 13.1, for both
the MODEL and RANDOM statements. These weighting options enable PROC GLIMMIX to
handle weights at different levels. PROC GLIMMIX uses a pseudolikelihood
approach to estimate parameters, and it computes robust standard error
estimators. This new feature is applied to an example of complex survey data
that are collected from multistage sampling and have unequal sampling probabilities.
%DISCIT Macro: Pre-screening continuous variables for subsequent binary logistic regression analysis through visualization
Mohamed Anany
SD-92
Prescreening variables to identify potential predictors to enter into the model
is an important stage in any modeling process. The goal is to select the
variables that will result in the “best” model. In binary logistic
regression, when there are many independent variables that could potentially be
included in the model, it is always a good practice to perform bivariate
analysis between the dichotomous variable (dependent) and the independent
variables. The independent variables come in many forms; binary, continuous,
nominal categorical, and/or ordinal categorical variables. This presentation is
concerned with identifying candidate continuous variables by performing a
bivariate analysis. The analysis is based on a two-sample t-test, a graphical
panel to visualize the relationship of the continuous variable with the
dichotomous dependent variable, recoding the continuous variable into two
different ordinal forms and adjusting their scale through odds and log-odds
transformations if needed, and collapsing similar groups of the recoded form of
the continuous variable to ensure (or improve) a linear relationship if one
exists. We also make use of the information value, which gives an indication of
the predictive power of the independent variable in capturing the dichotomous
variable; the information value is used mainly in the credit and financial
industries. The %DISCIT macro was developed to make this prescreening
process easier for analysts.
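One step of the prescreening described above, sketched with hypothetical variable names (the macro automates far more than this): bin a candidate continuous variable into deciles and inspect the event rate and log odds by bin.

   proc rank data=model_data groups=10 out=ranked;
      var income;
      ranks income_bin;
   run;

   proc sql;
      create table bin_summary as
      select income_bin,
             count(*)                         as n,
             mean(bad_flag)                   as event_rate,
             log(calculated event_rate /
                 (1 - calculated event_rate)) as log_odds
      from ranked
      group by income_bin;
   quit;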
Using SAS to Create a p-value Resampling Distribution for a Statistical Test
Peter Wludyka and Carmen Smotherman
SD-91
One starts with data to perform a statistical test of a hypothesis. A p-value
is associated with a particular test and this p-value can be used to decide
whether to reject a null hypothesis in favor of some alternative. Since the
data (the sample) is usually all a researcher knows factually regarding the
phenomenon under study, one can imagine that by sampling (resampling) with
replacement from that original data that additional information about the
hypothesis and phenomenon/study can be acquired. One way to acquire such
information is to repeatedly resample from the original data set (using, for
example, PROC SURVEYSELECT) and at each iteration (replication of the data
set) perform the statistical test of interest and calculate the corresponding
p-value. At the end of this stage one has r p-values (r is typically greater
than 1,000), one for each performance of the statistical test. Thinking of the
original p-value as a quantile of this distribution of p-values allows one to
assess the likelihood that the original hypothesis would have been rejected,
which helps put the actual decision in perspective. The resampling
distribution of p-values also allows one to retrospectively assess the power of
the test by finding the proportion of the p-values that are less than a
specified level of significance (alpha). By creating a p-value resampling
distribution for a selection of sample sizes one can create a power curve which
can be used prospectively to gather sample size information for follow up studies.
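A hedged sketch of the resampling loop described above, using an illustrative two-sample t test and hypothetical variable names:

   /* 1,000 with-replacement replicates of the original data */
   proc surveyselect data=original out=boot
        method=urs samprate=1 outhits reps=1000 seed=2014;
   run;

   /* Run the test once per replicate and capture the p-values */
   ods exclude all;
   ods output TTests=tt;
   proc ttest data=boot;
      by replicate;
      class group;
      var response;
   run;
   ods exclude none;

   /* Distribution of resampled p-values (pooled-variance rows) */
   proc univariate data=tt;
      where method = 'Pooled';
      var probt;
   run;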
Making Comparisons Fair: How LS-Means Unify the Analysis of Linear Models
Weijie Cai
SD-142
How do you compare group responses when the data are unbalanced or when
covariates come into play? Simple averages will not do, but LS-means are just
the ticket. Central to postfitting analysis in SAS/STAT® linear modeling
procedures, LS-means generalize the simple average for unbalanced data and
complicated models. They play a key role both in standard treatment comparisons
and Type III tests and in newer techniques such as sliced interaction effects
and diffograms. This paper reviews the definition of LS-means, focusing on
their interpretation as predicted population marginal means, and it illustrates
their broad range of use with numerous examples.
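A minimal sketch of the situation described above (illustrative names, not one of the paper's examples): an unbalanced two-way layout with a covariate, where the LSMEANS statement produces covariate-adjusted, balanced-weight treatment means and their pairwise comparisons.

   proc glm data=trial;
      class trt center;
      model response = trt center baseline;
      lsmeans trt / pdiff cl adjust=tukey;
   run;
   quit;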
Annotation utilizations in customized SAS/Graph Bar Charts
Yong Liu, Hua Lu, Liang Wei, Xingyou Zhang, Paul Eke and James Holt
SD-16
Bar charts are generated using SAS/GRAPH's GCHART procedure to present the
distribution of health behaviors or health outcomes among adults aged ≥18
years, by selected characteristics, for each of the 50 states using the 2011
Behavioral Risk Factor Surveillance System (BRFSS). Because of missing data or
unreliable parameter estimates, the annotate facility is used to make the
charts more presentable by adding data labels and footnotes. Further,
incorporating a SAS macro variable into the program makes the development of
50 charts for the 50 states more achievable and efficient.
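For orientation (a hedged sketch, not the authors' program), a minimal annotate data set that writes a note above one bar midpoint might look like this; the annotate variable names come from the facility itself, while the chart data set and variables are illustrative:

   data anno;
      length function $8 text $25;
      retain xsys '2' ysys '2' when 'a';
      function = 'label';
      midpoint = 'NC';                  /* bar to annotate           */
      y        = 36;                    /* vertical data position    */
      position = '2';                   /* centered above the point  */
      text     = 'Estimate suppressed';
      output;
   run;

   proc gchart data=brfss;
      vbar state / sumvar=prevalence annotate=anno;
   run;
   quit;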