SESUG 2015 Conference Abstracts
Application Development
Integrating Microsoft VBScript and SAS
Christopher Johnson
AD-9
VBScript and SAS are each powerful tools in their own right. These two
technologies can be combined so that SAS code can call a VBScript program or
vice versa. This gives a programmer the ability to automate SAS tasks,
traverse the file system, send emails programmatically, manipulate Microsoft®
Word, Excel, and PowerPoint files, get web data, and more. This paper will
present example code to demonstrate each of these capabilities.
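As a hedged illustration of one direction of this integration (not code from the paper), a SAS session on Windows can launch a VBScript file with the X statement; the script path and option settings below are assumptions, and the XCMD option must be allowed.

  options noxwait xsync;   /* wait for the external command to finish before SAS resumes */
  x 'cscript //nologo "C:\scripts\send_mail.vbs"';   /* hypothetical VBScript that sends an email */

The same command could be issued from a DATA step with CALL SYSTEM if the call needs to be data driven.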
One SAS To Rule Them All…
William Zupko
AD-86
Our audience preferred Excel charts and graphs to SAS charts and graphs. However,
producing the necessary 30 graphs in Excel took 2-3 hours of manual work, even
with chart templates already created, and the manual steps led to mistakes due to
human error. SAS graphs took much less time to create, but lacked key
functionality, available in Excel graphs, that the audience preferred. Thanks to
SAS, the answer came in X4ML programming. SAS can submit code to Excel in order
to create customized data reporting, create graphs or update templates’ data
series, and even populate Word documents for finalized reports. This paper
explores how SAS is used to create presentation-ready graphs in a proven process
that takes less than one minute, compared to the earlier process that took hours.
The following code will be utilized and/or discussed: %macro(macro_var), filename,
rc commands, ODS, X4ML, and VBA (Microsoft Visual Basic for Applications).
Using PROC SURVEYSELECT: Random Sampling
Raissa Kouadjo
AD-190
This paper will examine some of the capabilities of PROC SURVEYSELECT in SAS
Studio to demonstrate how to draw a random sample. Every SAS programmer needs
to know how to design a statistically efficient sample, and PROC SURVEYSELECT
allows the user the flexibility to customize the design parameters.
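As a minimal sketch of the kind of task the abstract refers to (not the author's example), PROC SURVEYSELECT draws a simple random sample in a few lines; the input data set and sample size here are illustrative.

  proc surveyselect data=sashelp.heart out=srs_sample
                    method=srs          /* simple random sampling without replacement */
                    sampsize=100 seed=20150927;
  run;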
Using SAS PROC SQL to Create a Build Combinations Tool to Support Modularity
Stephen Sloan
AD-32
With SAS PROC SQL we can use a combination of a manufacturing Bill of Materials
and a sales specification document to calculate the total number of
configurations of a product that are potentially available for sale. This will
allow the organization to increase modularity with maximum efficiency.
Since some options might require or preclude other options, the result is more
complex than a straight multiplication of the numbers of available options. Through judicious use of PROC SQL, we can maintain accuracy while reducing the time, space, and complexity involved in the calculations.
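A hedged sketch of the counting idea before the require/preclude rules are applied; the BOM table and column names are hypothetical. The per-feature option counts can be multiplied in PROC SQL with the exp-of-sum-of-logs trick.

  proc sql;
    /* number of distinct options available for each feature */
    create table option_counts as
      select feature, count(distinct option_code) as n_options
      from bom_options
      group by feature;

    /* product of the counts = configurations before exclusion rules (all counts > 0) */
    select exp(sum(log(n_options))) as total_configurations
      from option_counts;
  quit;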
SAS/GRAPH and Annotate Facility--More Than Just a Bunch of Labels and Lines
Mike Hunsucker
AD-48
SAS/GRAPH procedures enhanced with the Annotate facility are a cornerstone
capability, providing a flexible way to customize graphical displays that goes
well beyond the "standard" outputs of the SAS/GRAPH PROCs. This paper does not
attempt to describe unique or seldom-used capabilities in SAS/GRAPH; instead it
will expose the audience to several ways to exploit the Annotate facility that
enhance output far beyond an occasional label or line drawing. Products reviewed
provide situational awareness to military planners and decision makers daily.
INTRODUCTION:
14th Weather Squadron in Asheville, NC, is the Department of Defense’s
climatology organization supplying planning weather and climatological
statistics to military, intelligence, and research communities. The squadron
has exploited SAS capabilities for over 25 years but recently implemented
dynamically built SAS/GRAPH graphics-based capabilities ranging from simple
“cartoon” visualizations for deploying military members to complex
statistical extreme-values gradient maps for national laboratory researchers.
This paper will highlight SAS/GRAPH capabilities including GFONT, GMAP, G3GRID,
GINSIDE, GSLIDE, and more.
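For readers new to the facility, here is a minimal, hypothetical Annotate data set (not taken from the paper) that adds a label and a reference line to a GPLOT display of a SAS-supplied data set.

  data anno;
    length function color $8 text $25;
    retain xsys ysys '2';                /* '2' = data coordinate system */
    function='label'; x=62; y=120; text='Class average'; position='6'; output;
    function='move';  x=50; y=100; output;
    function='draw';  x=70; y=100; color='red'; line=1; output;
  run;

  proc gplot data=sashelp.class;
    plot weight*height / annotate=anno;
  run;
  quit;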
Five Little Known, But Highly Valuable and Widely Usable, PROC SQL Programming Techniques
Kirk Paul Lafler
AD-16
The SQL Procedure contains a number of powerful and elegant language features
for SQL users. This presentation highlights five little known, but highly
valuable and widely usable, topics that will help users harness the power of
the SQL procedure. Topics include using PROC SQL to identify FIRST.row,
LAST.row and Between.rows in BY-group processing; constructing and searching
the contents of a value-list macro variable for a specific value; data
validation operations; data summary operations to process down rows and across
columns; and using the MSGLEVEL= system option and _METHOD SQL option to
capture information about the processes during query evaluation, the algorithm
selected and used by the optimizer when processing a query, testing and
debugging operations, and other processes.
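As a small, hedged illustration of the last topic only, the MSGLEVEL= system option and the _METHOD option make the optimizer's chosen join plan visible in the log; the query itself is arbitrary.

  options msglevel=i;        /* surfaces index usage and other informational notes */

  proc sql _method;          /* writes the query plan codes to the log */
    create table merged as
      select a.name, a.age, b.weight as weight_fit
      from sashelp.class a
           inner join sashelp.classfit b
           on a.name = b.name;
  quit;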
Masking Data To Obscure Confidential Values: A Simple Approach
Bruce Gilsen
AD-38
When I help users design or debug their SAS® programs, they are sometimes
unable to provide relevant SAS data sets because they contain confidential
information. Sometimes, confidential data values are intrinsic to their
problem, but often the problem could still be identified or resolved with
innocuous data values that preserve some of the structure of the confidential
data. Or, the confidential values are in variables that are unrelated to the
problem.
While techniques for masking or disguising data exist, they are often complex
or proprietary. In this paper, I describe a very simple macro, REVALUE, that
can change the values in a SAS data set. REVALUE preserves some of the
structure of the original data by ensuring that for a given variable,
observations with the same real value have the same replacement value, and if
possible, observations with a different real value have a different replacement
value. REVALUE allows the user to specify the variables to change and whether
to order the replacement values for each variable by the sort order of the real
values or by observation order.
In this paper, I will discuss the REVALUE macro in detail, and provide a copy
of the macro.
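This is not the REVALUE macro itself, but a bare-bones sketch of the underlying idea: map each distinct real value to a generated replacement so that identical inputs receive identical outputs. The data set and variable names are hypothetical.

  proc sort data=have(keep=acct_id) out=distinct_ids nodupkey;
    by acct_id;                               /* sort order drives the replacement order */
  run;

  data id_map;
    set distinct_ids;
    masked_id = cats('ID', put(_n_, z6.));    /* same real value -> same replacement value */
  run;

  proc sql;
    create table masked(drop=acct_id) as
      select m.masked_id, h.*
      from have h left join id_map m
      on h.acct_id = m.acct_id;
  quit;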
Unlock SAS Code Automation with the Power of Macros
William Zupko
AD-87
SAS code, like any computer programming code, seems to go through a life cycle
depending on the needs of that code. Often, SAS programmers need to determine
where the code might be in that life cycle and, depending on what that code is
used for, choose to maintain, update, or reuse SAS code in current or future
projects. These SAS programmers need to decide what the best option for the
code is. Simple code that has few variables or options is easy to leave
hard-coded, as it is a quick fix for the programmer to maintain and update this
code. Complex code, which can have multiple variables and options, can be
difficult to maintain and update. This paper goes through the process a SAS
programmer might encounter and talks about times when it is useful and necessary
to automate SAS code. Then, it explores useful SAS code that helps in
maintenance and updating utilities, discussing when an option is appropriate
and when it is not. The following SAS code will be utilized: %macro, %let,
call symput(x), symget, %put, %if, %do, INTO :, ods output, %eval, option
statements, and %sysfunc.
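As a hedged example of the kind of automation being discussed (not the paper's code), a %DO loop with %SCAN can drive the same step across several hypothetical yearly data sets.

  %let years = 2013 2014 2015;

  %macro summarize_years;
    %local i yr;
    %do i = 1 %to %sysfunc(countw(&years));
      %let yr = %scan(&years, &i);
      proc means data=claims_&yr noprint;
        var paid_amount;
        output out=summary_&yr sum=total_paid;
      run;
    %end;
  %mend summarize_years;

  %summarize_years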
GreenSpace: A Macro to Improve a SAS Data Set Footprint
Brian Varney
AD-150
SAS programs can be very I/O intensive. SAS Data Sets with inappropriate
variable attributes can degrade the performance of SAS programs. Using SAS
compression offers some relief but does not eliminate the issue of
inappropriately defined SAS variables. This paper intends to examine the
problems that inappropriate SAS variable attributes cause, as well as present a
macro to minimize the footprint of a SAS Data Set.
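Not the GreenSpace macro itself, but a tiny sketch of the general idea for one character variable: measure the longest value actually stored, then re-declare the length. The data set and variable names are assumptions.

  proc sql noprint;
    select max(length(customer_name)) into :need trimmed
    from work.big_table;
  quit;

  data work.smaller;
    length customer_name $&need;   /* shrink the declared length to what the data requires */
    set work.big_table;            /* SAS notes the length change in the log; no values are truncated */
  run;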
Automating Simulation Studies with Base SAS(r) Macros
Vincent Hunter
AD-93
Simulations are common in methodological studies in the social sciences. Even
the most dedicated researchers have difficulty processing more than 100-200
repetitions, especially where different analysis programs must be processed in
sequence. However, studies requiring hundreds or thousands of processing
repetitions of data under the same set of study conditions are necessary for
robust results. Where different study conditions are to be compared, the
number of repetitions becomes even larger making the processing of an adequate
number of iterations all but impossible when done one at a time.
Base SAS offers two tools for automating the processing of simulations: (1)
Macros which divide the task into distinct jobs that may be easily modified for
different conditions and number of iterations; (2) the ability to invoke
non-SAS analysis programs (e.g., Mplus, R, and Bilog). Using these tools a
researcher can create and process an appropriate amount of simulated data to
obtain adequate power and control of Type I and II errors.
A recent simulation performed by the author is used as an example.
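A minimal, hypothetical sketch of the macro-driven repetition the abstract describes, using only Base SAS for both generation and analysis; calls to Mplus, R, or Bilog would slot in where the analysis step sits.

  %macro simulate(n_reps=);
    %local i;
    %do i = 1 %to &n_reps;
      data _rep;                               /* generate one replicate */
        call streaminit(&i);
        do subj = 1 to 200;
          y = rand('normal', 0, 1);
          output;
        end;
      run;

      proc means data=_rep noprint;            /* analyze it */
        var y;
        output out=_res mean=mean_y std=sd_y;
      run;

      proc append base=all_results data=_res;  /* accumulate results across repetitions */
      run;
    %end;
  %mend simulate;

  %simulate(n_reps=1000)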
Rapidly Assessing Data Completeness
David Abbott
AD-130
Data analysts are often asked to work with collections of data sets prepared by
others and with varying degrees of history/documentation. An important early
question is, “How complete are these data? What data completeness issues
might be present?” This paper presents an efficient technique for addressing
this question both in terms of characterizing the number and patterns of
missing values and, similarly, the omitted rows of data (i.e., primary
identifier values not occurring in a given data set and occurring in some other
dataset).
Several short macros and two key algorithms enable the technique presented. The
first algorithm produces a table of missing value patterns in the style of PROC
MI on a per dataset basis. The second performs the manipulations needed to
exhibit patterns of missing identifiers across a collection of datasets.
Following this technique, analysts will be able to rapidly assess data
completeness of inherited data set collections, provided a primary identifier
(e.g., a subject ID) is used consistently in the collection.
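One way (an assumption on my part, not necessarily the paper's macro) to obtain the PROC MI-style missing-pattern table for a single data set is to run PROC MI with zero imputations; the data set and variables are placeholders.

  proc mi data=work.visits nimpute=0;   /* no imputation performed; the missing data patterns are reported */
    var age bmi systolic_bp;
  run;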
Programming Compliance Made Easy with a Time Saving Toolbox
Patricia Guldin
AD-35
Programmers perform validation in accordance with established regulations,
guidelines, policies and procedures to ensure the integrity of analyses and
reporting, reduce the risk of delays in product approvals, fines, and legal
actions, and to safeguard reputations. We understand the importance, but the time
involved to produce and appropriately store the documentation and evidence
required to prove we followed process and SOPs can be labor intensive and
burdensome. Using SAS/AF®, SAS® Component Language and .NET we have developed
two versions of an automated tool that can be used with PC SAS® or Enterprise
Guide®. The toolbox is designed to make compliance with programming SOPs
easier, increase consistency, and save the programmer time. The toolbox
auto-populates some information and saves documentation in designated locations
as actions are performed. Functions include creating and verifying a standard
program header, updating program headers, revision history and version date,
creating validation environments including testing checklists, and promoting
programs. The toolbox is also used to view transaction logs, create and/or
generate batch jobs for remote execution in UNIX, and to select and include
macro calls from a macro library.
SAS Data Integration Studio – Take Control with Conditional & Looping Transformations
Harry Droogendyk
AD-167
SAS Data Integration Studio jobs are not always linear. While Loop
transformations have been part of DI Studio for ages, only more recently has
SAS Data Integration Studio included the Conditional Control transformations to
control logic flow within a job. This paper will demonstrate the use of both
the Loop and Conditional transformations in a real world example.
A Methodology for Truly Dynamic Prompting in SAS® Stored Processes
Haikuo Bian, Carlos Jimenez and David Maddox
AD-172
Dynamic prompts in SAS stored processes may be developed by selecting the
“dynamic list” option during prompt construction. The list is usually
sourced from a SAS dataset that is pre-defined in the metadata. However, the
process of refreshing the dataset is usually independent of the stored process
and must be included somewhere in the application. Using SAS views as a
source for dynamic prompts will ensure that the list is truly dynamic. This paper illustrates the process with a cascading prompt example.
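A hedged sketch of the core idea: point the prompt's data source at a view so the value list is rebuilt every time the stored process runs. The library and table names are hypothetical.

  proc sql;
    create view stpdata.region_list as
      select distinct region
      from warehouse.sales;
  quit;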
The Perfect Marriage: Using SAS Enterprise Guide, the SAS Add-In for Microsoft Office, and Excel to Support Enrollment Forecasting at A Large University
Andre Watts and Lisa Sklar
AD-141
The Office of Institutional Research at the University of Central Florida is
tasked with supporting the enrollment management process of the institution by
providing five-year enrollment forecasting of various enrollment measures. A
key component of the process is providing university stakeholders with a
self-service, secure, and flexible tool that enables them to quickly generate
different enrollment projections in Microsoft Excel using the most up-to-date
information possible. This presentation will show an example of how to
effectively integrate both SAS Enterprise Guide and the SAS Add-In for
Microsoft Office to support a critical process which has very specific
stakeholder requirements and expectations.
You’ve Got Mail®: Automating SAS® from an Email
Peter Davis and Mark Asiala
AD-149
The American Community Survey is an ongoing population and housing survey that
provides data every year – giving communities the current information they
need to plan investments and services. As such, repetitive processing is
necessary and must be completed in a timely manner. Automation, where
appropriate, is an essential component for operational efficiency.
As an example of where automation is implemented, we receive an email each
month that serves as notification that one operation has completed and the next
operation may begin. Instead of waiting for this email to manually submit our
SAS programs, what if the delivery of the email initiated our SAS programs?
This paper demonstrates a KornShell (ksh93) script which parses through an
email delivered to a user’s UNIX email account. The script “reads” the
email. As long as the sender and the subject of the email meet
certain requirements, the appropriate SAS programs are submitted. If not, an email is
sent to the user stating that an email was received but no further action
occurred.
Data Labs with SAS and Teradata: Value, Purpose and Best Practices
Tho Nguyen and William E. Benjamin Jr
AD-201
A data lab, also called a ‘play pen’ or ‘sandbox’, is an area to explore
and examine ideas and possibilities by combining new data with existing data to
create experimental designs and ad-hoc queries without interrupting the
production environment. A Teradata data lab with SAS provides SAS users
immediate access to critical data for exploration and discovery. It is an
environment that enables agile in-database analytics by simplifying the
provisioning and management of analytic workspace within the production data
warehouse. By allocating that space, it provides data lab users easy access to
all of the data without moving or duplicating the data. Come learn how SAS and
Teradata are integrated in the data lab, and hear some best practices and use
cases from our joint customers.
Banking and Finance
Migrating Databases from Oracle to Teradata
Phillip Julian
BKF-91
We carefully planned and provisioned our database migration from Oracle to
Teradata. We had timelines, weekly progress and planning meetings,
presentations comparing current state to the future state, detailed project
plans for each task, Oracle DBAs, consultants from Teradata, and support from
all departments. It was an ideal situation for moving the Enterprise to a new
system of record based upon Teradata.
Our team had delays and data issues that no one could anticipate. I had
researched every issue that might happen with SAS upgrades, Teradata, and our
particular environment. But the literature and support did not prepare us for
anticipating or solving our migration problems. Instead of 6 months, we only
had 6 weeks to finish the project.
We had no time to hire an army of experts, so we had to solve our own issues.
I will describe those issues, our solutions, and our tricks that facilitated
rapid development. I'm still surprised at our rapid progress, and I am thankful for
the team's efforts and ingenuity. Our experiences should help others who are
planning or performing database and SAS migrations.
Our industry is financial services, we are regulated by the federal government,
and we must keep records for every change. Our software environment is SAS on
UNIX, SAS on PC, Teradata, Oracle, and Data Integration Studio with the
framework of SAS Credit Scoring for Banking. The UNIX hosts are multi-tiered
with separate development and production platforms.
Getting Your SAS® Program to do Your Typing for You!
Nancy Wilson
BKF-55
Do you have a SAS® program that requires adding file names to the input every
time you run it? Aren't you tired of having to check for the files, check the
names and type them in? Check out how this SAS® Enterprise Guide Project
checks for files, figures out the file names, and eliminates the need to type
in the file names for the input data files!
Reducing Credit Union Member Attrition with Predictive Analytics
Nate Derby and Mark Keintz
BKF-118
As credit unions market themselves to increase their market share against the
big banks, they understandably focus on gaining new members. However, they must
also retain (and further engage) their existing members. Otherwise, the new
members they gain can easily be offset by existing members who leave. Happily,
by using predictive analytics as described in this paper, it can actually be
much easier and less expensive to keep (and further cultivate) existing members
than to enlist new ones.
This paper provides a step-by-step overview of a relatively simple but
comprehensive approach to reduce member attrition. We first prepare the data
for a statistical analysis. With some basic predictive analytics techniques, we
can then identify those members who have the highest chance of leaving and the
highest value. For each of these members, we can also identify why they would
leave, thus suggesting the best way to intervene to retain them. We then make
suggestions to improve the model for better accuracy. Finally, we provide
suggestions to extend this approach to cultivating existing members and thus
increasing their lifetime value.
Code snippets will be shown for any version of SAS but will require the
SAS/STAT package. This approach can also be applied to many other organizations
and industries.
Population Stability and Model Performance Metrics Replication for Business Model at SunTrust Bank
Bogdan Gadidov and Benjamin McBurnett
BKF-132
The Board of Governors of the Federal Reserve System has published Supervisory
Guidance on Model Risk Management (SR Letter 11-7) emphasizing that banks rely
heavily on quantitative analysis and models in most aspects of financial
decision making. Ongoing monitoring and maintenance (M&M) is essential for
timely evaluation of model performance to determine whether changes in business
strategies and market conditions require adjustment, redevelopment, or
replacement of the model. A typical M&M plan includes tracking of Population
Stability Index (PSI), Rank Ordering Test, and Kolmogorov-Smirnov Statistic
(KS). As part of an internship program at SunTrust Bank, I was able to track
these key metrics for one business critical model. The model uses a logistic
regression to predict the probability of default for a given customer.
To track the three metrics stated above, data from quarter 1 of 2014 is
compared with a baseline distribution, generally the dataset which is used to
create the model. PSI quantifies the shift in the distribution of the
population between the baseline and current time periods. Rank Ordering Testing
involves comparing the expected default rate, predicted by the model, to the
actual default rate in the current quarter. The KS statistic assesses model
performance by measuring the model's ability to discern defaults from
non-defaults. The npar1way procedure was used in SAS to calculate KS. Reports
and charts presented in this poster will be sanitized due to the confidential
nature of the data, but methodology and step-by-step procedures represent
actual research results.
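For context, the KS calculation mentioned can be reproduced (on hypothetical variable names) with the EDF option of PROC NPAR1WAY, which compares the score distributions of defaulters and non-defaulters.

  proc npar1way data=scored_accounts edf;
    class default_flag;      /* 1 = default, 0 = non-default */
    var predicted_pd;        /* model score being monitored */
  run;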
Analysis of the Impact of Federal Funds Rate Change on US Treasuries Returns using SAS
Svetlana Gavrilova and Maxim Terekhov
BKF-181
This paper analyzes the impact of federal funds rate changes on government bond
returns and return volatility and compares it with equities market reaction. The
purpose of this work is to construct a model estimating an expected risk
exposure at a hypothetical point of time in the future given a description of
current market conditions and historically observed events, which can be
helpful in explaining expected movements and predicting future bond prices and
volatility for use by portfolio managers in choosing asset allocation. We
identify to what extent rate changes have an impact and how long the effects
last. For our analysis we use data on major government bond prices and
major macroeconomic characteristics of the U.S. economy between February 1990
and June 2015, collected on a daily basis. We model and forecast expected
returns using ARIMA modeling based on different scenarios. Vector
Autoregression and Vector Error Correction modeling is applied to estimate the
impact of rate changes on government bond performance and volatility. Credit
market behavior is compared to the equities market reaction. Findings are
consistent with previously published papers. US Treasuries react positively to a
Federal Funds rate change, while the equities market demonstrates a negative
reaction. A long-term relationship between the US Treasuries market and the
Federal Funds rate is identified, and the finding that a change in the US
Treasuries market may be Granger-caused by a change in the Federal Funds target
rate is statistically demonstrated. All estimations are performed using SAS software.
Regulatory Stress Testing—A Manageable Process with SAS®
Wei Chen
BKF-195
As a consequence of the financial crisis, banks are required to stress test
their balance sheet and earnings based on prescribed macroeconomic scenarios. In the US,
this exercise is known as the Comprehensive Capital Analysis
and Review (CCAR) or Dodd-Frank Act Stress Testing (DFAST). In order to assess
capital adequacy under these stress scenarios, banks need a unified view of
their projected balance sheet, incomes, and losses. In addition, the bar for
these regulatory stress tests is very high regarding governance and overall
infrastructure. Regulators and auditors want to ensure that the granularity and
quality of data, model methodology, and assumptions reflect the complexity of
the banks. This calls for close internal collaboration and information sharing
across business lines, risk management, and finance. Currently, this process is
managed in an ad hoc, manual fashion. Results are aggregated from various lines
of business using spreadsheets and Microsoft SharePoint. Although the
spreadsheet option provides flexibility, it brings ambiguity into the process
and makes the process error prone and inefficient. This paper introduces a new
SAS® stress testing solution that can help banks define, orchestrate and
streamline the stress-testing process for easier traceability, auditability,
and reproducibility. The integrated platform provides greater control,
efficiency, and transparency to the CCAR process. This will enable banks to
focus on more value-added analysis such as scenario exploration, sensitivity
analysis, capital planning and management, and model dependencies. Lastly, the
solution was designed to leverage existing in-house platforms that banks may
already have in place.
Building Blocks
Point-and-Click Programming Using SAS® Enterprise Guide®
Kirk Paul Lafler and Mira Shapiro
BB-14
SAS® Enterprise Guide® (EG) empowers organizations with all the capabilities
that SAS has to offer. Programmers, business analysts, statisticians and
end-users have a powerful graphical user interface (GUI) with built-in wizards
to perform reporting and analytical tasks, access to multi-platform enterprise
data sources, deliver data and results to a variety of mediums and outlets,
construct data manipulations without the need to learn complex coding
constructs, and support data management and documentation requirements. Attendees
learn how to use the GUI to access tab-delimited and Excel input
files; subset and summarize data; join two or more tables together; flexibly
export results to HTML, PDF and Excel; and visually manage projects using
flowcharts and diagrams.
A Survey of Some Useful SAS Functions
Ron Cody
BB-193
SAS Functions provide amazing power to your DATA step programming. Some of
these functions are essential—some of them save you writing volumes of
unnecessary code. This talk covers some of the most useful SAS functions. Some of these functions
may be new to you and they will change the way you
program and approach common programming tasks. The majority of the functions
described in this talk work with character data. There are functions that search
for strings, others that can find and replace strings or join strings together,
and still others that can measure the spelling distance between two strings
(useful for "fuzzy" matching). Some of the newest and most amazing functions are
not functions at all, but call routines. Did you know that you can sort values
within an observation? Did you know that not only can you identify the largest
or smallest value in a list of variables, but you can also identify the second,
third, or nth largest or smallest value? A knowledge of the functions described
here will make you a much better SAS programmer.
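A few of the capabilities alluded to above, gathered into one illustrative DATA step with toy values of my own (not the author's examples).

  data demo;
    x1=7; x2=2; x3=9; x4=4; x5=1;
    second_largest = largest(2, of x1-x5);              /* 2nd largest of a variable list */
    call sortn(of x1-x5);                               /* sort values within the observation */
    distance = spedis('Johnson', 'Jonson');             /* spelling distance for fuzzy matching */
    fixed  = tranwrd('SAS is grate', 'grate', 'great'); /* find and replace a string */
    joined = catx('-', 'A', 'B', 'C');                  /* join strings with a delimiter */
    put _all_;
  run;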
Tales from the Help Desk 6: Solutions to common SAS ® tasks
Bruce Gilsen
BB-72
In 30 years as a SAS® consultant at the Federal Reserve Board, questions
about some common SAS tasks seem to surface again and again. This paper
collects some of these common questions, and provides code to resolve them. The following tasks are reviewed.
- Convert a variable from character to numeric or vice versa and keep the same
name.
- Convert multiple variables from character to numeric or vice versa and keep
the same names.
- Convert character or numeric values to SAS date values.
- Use a data set when the custom format assigned to a variable cannot be
found.
- Use an array definition in multiple DATA steps.
- Use values of a variable in a data set throughout a DATA step by copying the
values into a temporary array.
- Use values of multiple variables in a data set throughout a DATA step by
copying the values into a 2-dimensional temporary array.
In the context of discussing these tasks, the paper provides details about SAS
system processing that can help users employ the SAS system more effectively.
This paper is the sixth of its type.
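For the first task in the list, a commonly used pattern (a sketch, not necessarily the paper's exact code) is to rename the character variable on the way in and INPUT it back under the original name.

  data new;
    set old(rename=(amount=amount_c));   /* amount arrives as character */
    amount = input(amount_c, best32.);   /* re-created as numeric with the same name */
    drop amount_c;
  run;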
Using the SAS Hash Object with Duplicate Key Entries
Paul Dorfman
BB-94
By default, the SAS hash object permits only entries whose keys, defined in its
key portion, are unique. While in certain programming applications this is a
rather utile feature, there are also others where being able to insert and
manipulate entries with duplicate keys is imperative. Such an ability,
facilitated in SAS since Version 9.2, was a welcome development: it vastly
expanded the functionality of the hash object and eliminated the necessity to
work around the distinct-key limitation using custom code. However, nothing
comes without a price; and the ability of the hash object to store duplicate
key entries is no exception. In particular, additional hash object methods had
to be - and were - developed to handle specific entries sharing the same key. The extra price
is that using these methods is surely not quite as
straightforward as the simple corresponding operations on distinct-key tables,
and the documentation alone is a rather poor help for making them work in
practice. Rather extensive experimentation and investigative coding is
necessary to make that happen. This paper is a result of such endeavor, and
hopefully, it will save those who delve into it a good deal of time and
frustration.
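As a minimal sketch of the duplicate-key machinery discussed (hypothetical tables, not the paper's code), MULTIDATA:'Y' together with FIND and FIND_NEXT retrieves every entry that shares a key.

  data customer_orders;
    if _n_ = 1 then do;
      declare hash h(dataset:'work.orders', multidata:'y');
      h.definekey('cust_id');
      h.definedata('order_id', 'amount');
      h.definedone();
    end;
    if 0 then set work.orders(keep=order_id amount);  /* host variables for the data portion */
    set work.customers;
    rc = h.find();                /* first entry for this key, if any */
    do while (rc = 0);
      output;                     /* one row per matching order */
      rc = h.find_next();         /* next entry sharing the same key */
    end;
    drop rc;
  run;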
Introduction to SAS® Data Loader: The Power of Data Transformation in Hadoop
Keith Renison
BB-199
SAS Model Manager provides an easy way to deploy analytical models to
various types of relational databases and to a Hadoop Distributed File
System. There are two publishing methods that can be used: scoring functions
and the SAS® Embedded Process. This paper gives a brief introduction to both
the SAS® Model Manager publishing functionality and the SAS® Scoring
Accelerator. It describes the major differences between using the scoring
function and the SAS Embedded Process publish methods to publish a model. The
paper also explains how to use SAS applications as well as SQL code outside of
SAS® to perform in-database processing of a published model. Along with
Hadoop, the supported databases are Teradata, Oracle, Netezza, DB2, and SAP
HANA. Samples are provided for publishing a model in one of the supported
databases and Hadoop. After reading this paper, you should feel comfortable
using a published model in your business environment.
A Beginner’s Babblefish: Basic Skills for Translation Between R and SAS®
Sarah Woodruff
BB-90
SAS professionals invest time and energy in improving their fluency with the
broad range of capabilities SAS software has to offer. However, the computer
programming field is not limited to SAS alone and it behooves the professional
to be well rounded in his or her skill sets. One of the most interesting
contenders in the field of analytics is the open source R software. Due to its
range of applications and the fact that it is free, more organizations are
considering how to incorporate it into their operations and many people are
already seeing its use incorporated into project requirements. As such, it is
now common to need to move code between R and SAS, a process which is not
inherently seamless.
This paper serves as a basic tutorial on some of the most critical functions in
R and shows their parallel in SAS to aid in the translation process between the
two software packages. A brief history of R is covered followed by information
on the basic structure and syntax of the language. This is followed by the
foundational skill involved in importing data and establishing R data sets. Next,
some common reporting and graphing strategies are explored with
additional coverage on creating data sets that can be saved, as well as how to
export files in various formats. By having the R and SAS code together in the
same place, this tutorial serves as a reference that a beginner can follow to
gain confidence and familiarity when moving between the two.
Sampling in SAS using PROC SURVEYSELECT
Rachael Becker and Drew Doyle
BB-129
This paper examines various sampling options that are available in SAS through
PROC SURVEYSELECT. We will not be covering all of the possible sampling methods
or options that SURVEYSELECT features. Instead, we will look at Simple Random
Sampling, Stratified Random Sampling, Cluster Sampling, Systematic Sampling,
and Sequential Random Sampling.
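One of the designs listed, stratified random sampling, looks roughly like the following sketch, which samples 30% within each sex of a SAS-supplied data set.

  proc sort data=sashelp.class out=class_by_sex;
    by sex;                       /* strata must be sorted or grouped */
  run;

  proc surveyselect data=class_by_sex out=strat_sample
                    method=srs samprate=0.30 seed=2015;
    strata sex;
  run;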
Hash: Is it always the best solution?
David Izrael and Elizabeth Axelrod
BB-75
When you get a new hammer, everything looks like a nail. That’s how we felt
about the Hash object when we started to use it: Wow, this is fantastic - It
can solve everything! But… we soon learned that everything is not a nail, and
sometimes a hammer is not the best tool for the job. In SAS Version 9, direct
addressing with the Hash object was introduced, and this enabled users to
perform look-ups much faster than traditional methods of joining or merging.
Even beyond look-ups, we can now use the HASH object for summation, splitting
files, array sorting, and fuzzy matching - just to name a few. What an
all-purpose hammer! But… is it always the best tool to use?
After a brief
review of basic HASH syntax, we will pose some problems and provide several
solutions, using both the Hash object and more traditional methods. We will
compare real time and CPU time, as well as the programmer’s time to develop the
respective programs. Which is the better tool for the job? Our
recommendations will be revealed through our results.
Table Lookups: Getting Started With Proc Format®
John Cohen
BB-144
Table lookups are among the coolest tricks you can add to your SAS® toolkit. Unfortunately,
these techniques can be intimidating both conceptually and in
terms of the programming. We will introduce one of the simplest of these
techniques, employing Proc Format and the CNTLIN option as part of our
construct. With any luck, this will prove both easy enough to program and more
efficient to run.
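A compact sketch of the CNTLIN construct described (the lookup values are made up): build a data set with FMTNAME, START, and LABEL, turn it into a format, and use PUT for the lookup.

  data cntlin;
    length fmtname $8 type $1 start $2 label $12;
    fmtname='$region'; type='C';
    start='NC'; label='Southeast'; output;
    start='NY'; label='Northeast'; output;
    start='CA'; label='West';      output;
  run;

  proc format cntlin=cntlin;
  run;

  data customers_rgn;
    set customers;                    /* hypothetical input with a STATE variable */
    region = put(state, $region.);    /* the table lookup */
  run;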
No FREQ-in Way
Renee Canfield
BB-105
In the consumer credit industry, privacy is key and the scrutiny increases
every day. Files returned to a client must be depersonalized so the client
cannot match them back to any personally identifiable information (PII). This
means we must locate any values for a variable that occur on a limited number of
records and null them out (i.e., replace them with missing values). When working
with large files which have more than one million observations and
thousands of variables, locating variables with few unique values is a
difficult task. While PROC FREQ and DATA step merging can accomplish the task,
using first./last. by variable processing to locate the suspect values and hash
objects to merge the data set back together may offer increased efficiency.
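A stripped-down sketch of the first./last. counting step for a single variable; the production approach described above generalizes this across thousands of variables, and the names here are invented.

  proc sort data=big_file(keep=zip_code) out=zip_sorted;
    by zip_code;
  run;

  data rare_zips;
    set zip_sorted;
    by zip_code;
    if first.zip_code then count = 0;
    count + 1;
    if last.zip_code and count <= 5 then output;   /* values occurring on only a few records */
    keep zip_code count;
  run;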
Better Metadata Through SAS® II: %SYSFUNC, PROC DATASETS, and Dictionary Tables
Louise Hadden
BB-57
SAS® provides a wealth of resources for users to create useful, attractive
metadata tables, including PROC CONTENTS listing output (to ODS destinations),
the PROC CONTENTS OUT= SAS data set, and PROC CONTENTS ODS Output Objects. This
paper and presentation explore some less well-known resources to create
metadata such as %SYSFUNC, PROC DATASETS, Dictionary Tables, SASHELP views, and
SAS "V" functions. All these options will be explored with an eye towards
exploring, enhancing, and reporting on SAS metadata.
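Two of the resources named, in miniature (illustrative only): DICTIONARY.COLUMNS for variable-level metadata, and %SYSFUNC with OPEN/ATTRN for a quick observation count.

  proc sql;
    create table class_metadata as
      select name, type, length, label, format
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS';
  quit;

  %let dsid = %sysfunc(open(sashelp.class));
  %let nobs = %sysfunc(attrn(&dsid, nlobs));
  %let rc   = %sysfunc(close(&dsid));
  %put NOTE: sashelp.class has &nobs observations.;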
Don’t Forget About Small Data
Lisa Eckler
BB-168
Beginning in the world of data analytics and eventually flowing into mainstream
media, we are seeing a lot about Big Data and how it can influence our work and
our lives. Through examples, this paper will explore how Small Data -- which
is everything Big Data is not -- can and should influence our programming
efforts. The ease with which we can read and manipulate data from different
formats into usable tables in SAS® makes using data to manage data very simple
and supports healthy and efficient practices. This paper will explore how
using small or summarized data can help to organize and track program
development, simplify coding and optimize code.
To Macro or Not... that is the Question
Claudine Lougee
BB-176
Do you need a macro for your program? How do you know if it's worth the time
to create one for your program? This paper will give some guidelines, based on
user experience, on whether it is worth the time to create a macro, whether a
parameter-driven macro or just a simple macro variable. Extra tips and tricks
for using system macros will be provided. This paper is geared towards new
users and perhaps experienced users who do not use macros.
Arrays – Data Step Efficiency
Harry Droogendyk
BB-157
Arrays are a facility common to many programming languages, useful for
programming efficiency. SAS® data step arrays have a number of unique
characteristics that make them especially useful in enhancing your coding
productivity. This presentation will provide a useful tutorial on the rationale
for arrays and their definition and use.
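A small example of the productivity gain meant here (variable names assumed): one array plus a DO loop recodes ten survey items at once.

  data recoded;
    set survey_raw;
    array q{10} q1-q10;
    do i = 1 to dim(q);
      if q{i} = 99 then q{i} = .;   /* treat the 99 sentinel as missing in every item */
    end;
    drop i;
  run;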
PROC TRANSPOSE: Flip your Data 90° and Save Time
Rachel Straney
BB-135
The process of transforming data from a vertical to horizontal structure is
sometimes referred to as long-to-wide conversion, and is common in the
analytical world. Although there is always more than one way to accomplish a
task using SAS®, PROC TRANSPOSE is a staple procedure that should be in every
programmer’s tool box. This paper will guide the reader through some basic
examples of PROC TRANSPOSE and share situations where it is most appropriately
used.
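A basic long-to-wide call of the sort the paper walks through, shown on hypothetical lab data.

  proc sort data=labs_long out=labs_sorted;
    by patient_id;
  run;

  proc transpose data=labs_sorted out=labs_wide(drop=_name_) prefix=visit_;
    by patient_id;
    id visit_num;      /* contributes to the new column names: visit_1, visit_2, ... */
    var lab_value;
  run;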
Hash Objects for Everyone
Jack Hall
BB-83
The introduction of Hash Objects into the SAS toolbag gives programmers a
powerful way to improve performance, especially when JOINing a large data set
with a small one. This presentation will focus on the basics of creating and using a simple hash
object, using an example from the Healthcare Insurance sector.
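The basic pattern covered in the presentation looks roughly like this sketch (table and variable names are mine, not the presenter's): load the small table into a hash object and look it up while reading the large one.

  data claims_matched;
    if _n_ = 1 then do;
      declare hash lookup(dataset:'work.providers');    /* small table held in memory */
      lookup.definekey('provider_id');
      lookup.definedata('provider_name');
      lookup.definedone();
    end;
    if 0 then set work.providers(keep=provider_name);   /* defines the host variable */
    set work.claims;                                     /* large table */
    if lookup.find() = 0 then output;                    /* keep claims with a matching provider */
  run;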
Coder's Corner
Implementing a Bayesian Approach to Record Linkage
Lynn Imel and Thomas Mule
CC-41
The Census Coverage Measurement survey-based program estimated household
population coverage of the 2010 Decennial Census. Calculating coverage
estimates required linking survey person data to census enumerations. For
record linkage research, we applied a Bayesian Latent Class Models approach to
both 2010 coverage survey data and simulated household data. This paper
presents our use of Base SAS® to implement the Bayesian approach. It also
discusses coding adaptations to handle changes including removing hard-coded
variable names to allow for varying input parameters.
RUN; RUN; RUN; - Methods for Running Multiple Programs in a Series
Robert Matthews
CC-12
Anyone who has ever had to run a series of programs multiple times in a row has
probably thought about ways to automate the process. For example, if you have
25 programs that need to be run one after another, the normal method would be
to run the first program, wait for it to finish, then submit the next one, and
so on. If you have the ability to run multiple SAS sessions, then you can speed
up the process a bit by submitting programs in each session. However, it still
takes some time and effort to monitor the programs, wait for each one to
finish, and then submit the next program in the series. We encountered this
issue several years ago and have developed two methods for implementing a
“hands-off” approach for submitting a series of programs. Some additional
features we implemented include the ability to either stop or continue
processing the remaining programs if an individual program in the series
encounters an error as well as the ability to send email messages after
individual programs in the series have been run. These methods greatly reduce
the need for manual intervention when running a long series of programs and
help alleviate an otherwise laborious and sometimes error-prone process.
Running Parts of a SAS Program while Preserving the Entire Program
Stephen Sloan
CC-33
We often have the need to execute only parts of a SAS program, while at the
same time preserving the entire program for documentation or for future use. This occurs when
parts of the program have been run, when different
circumstances require different parts of a program, or when only subsets of the
output are required.
There are different ways in which parts of a program can be run while
preserving the entire program:
- %INCLUDE statements to call multiple programs from within a shell SAS program
- Using external shell programs in the operating system (like shell scripts in
Unix)
- Using macros to deactivate code
- Using %LET statements to indicate to macros which parts of the program should
be run
- Commenting out parts of the program
- Using SAS EG to only submit parts of a program interactively
- A combination of the above techniques.
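For the %LET-plus-macro option in the list above, a tiny hedged sketch of how a switch can activate or deactivate one section while the full program stays intact.

  %let run_summary = N;            /* flip to Y when this section should execute */

  %macro summary_section;
    %if %upcase(&run_summary) = Y %then %do;
      proc means data=analysis_file;
        var total_cost;
      run;
    %end;
    %else %put NOTE: Summary section skipped (run_summary=&run_summary).;
  %mend summary_section;

  %summary_section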
Merging and Analysis of Complex Survey Data Sets by using Proc Survey Procedures in SAS
Nushrat Alam
CC-116
This paper is focused on merging and analysis of complex survey data sets. The
sample design of any complex survey consists of stratification, clustering,
multi-stage sampling, and unequal probability of selection of observations. This
paper provides an outline of merging different complex survey data sets and the
use of multiple SAS procedures, such as PROC SURVEYMEANS, PROC SURVEYFREQ, and
PROC SURVEYREG, to analyze different variables.
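To give a flavor of the design-aware analysis referred to (all names hypothetical), PROC SURVEYMEANS takes the strata, cluster, and weight information directly.

  proc surveymeans data=merged_survey mean stderr;
    strata stratum_id;
    cluster psu_id;
    weight sample_weight;
    var bmi;
  run;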
A Macro To Copy, Rename, Update and Move Your SAS Programs- All In One Click
Julio Ruiz
CC-154
As SAS programmers we often have to copy, rename, update, and/or move SAS
programs from one directory to another. Performing these tasks can be
time-consuming, particularly when two or more of them need to be performed
manually. This paper presents a macro developed in SAS that gives the end-user
the ability to programmatically accomplish any of these tasks with one simple
click. The macro's main goal is to offer the end-user the ability
to save time, as it can perform any or all of these tasks in a matter of
seconds.
Count the Number Of Delimiters In the Input File
Kannan Deivasigamani
CC-26
Many of us are at times surprised by the presence (or absence) of an additional
(or missing) variable in the incoming file from the sender, due to the appearance
of an additional delimiter as part of the data. For example, if an address
variable requires parsing and a user unintentionally inputs a pipe ("|") as part
of the address, it can pose a problem if a pipe-delimited file is created. If 5
delimiters are expected on every record, the record with the special address will
have 6 delimiters unless it is cleansed before being written to the file. On the
receiving end, if the file is just read with the usual DLM='|' option, a shift in
values will be noticed in the variables after the address. In order to mitigate
this situation, a small snippet of code to interrogate each record and ensure
that all records have the same number of variables (delimiters) as expected on
the receiving end can help. If a record is received with an unexpected delimiter
count, the process is halted and the support personnel can be alerted. This will
give some peace of mind to the recipient, assuring a quality check from a
variable count perspective.
In addition to preventing any erroneous processing, timely alert might save
other EUC (End User Computing) related costs in the organization as well. The
file may be fixed and resent before another audit can screen through and
release the file for further processing by other jobs/programs/scripts. A rough
example (code) of the delimiter audit is included to show how it might be
applied to mitigate the issue. The subsequent processing may be handled by the
respective job schedulers used in different mainframe (or other) shops with
appropriate controls in place as needed.
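A rough sketch of such a delimiter audit (the file path and expected count are placeholders; the paper's own example may differ).

  data _null_;
    infile '/data/incoming/accounts.txt' lrecl=32767 truncover end=last;
    input rec $char32767.;
    if countc(rec, '|') ne 5 then do;       /* expecting 6 fields, i.e., 5 pipes per record */
      put 'ERROR: Unexpected delimiter count on record ' _n_;
      nbad + 1;
    end;
    if last and nbad > 0 then abort return; /* halt so support personnel can be alerted */
  run;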
Tips for Identifying Patient Characteristics Associated with Diagnostic Concordance between Two Measures Using SAS®
Seungyoung Hwang
CC-37
Sensitivity, specificity, and positive and negative predictive values are often
used in validation studies. However, few have examined what patient
characteristics are associated with diagnostic concordance between two measures
of interest. This paper provides an in-depth analysis, with some explanation
of the SAS® code, to identify sociodemographic and clinical characteristics
associated with diagnostic concordance between two measures of depression using
SAS®. Examples of using the GLIMMIX procedure are drawn from clinical data
that we recently published in the American Journal of Geriatric Psychiatry.
PROC CATALOG, the Wish Book SAS® Procedure
Louise Hadden
CC-58
SAS® data sets have PROC DATASETS, and SAS catalogs have PROC CATALOG. Michael
Raithel characterizes PROC DATASETS as the “Swiss Army Knife of SAS
Procedures” (Raithel, 2011). PROC DATASETS can do an amazing array of
tasks relating to SAS data sets; PROC CATALOG is a similar, utilitarian
procedure. It is handy (like a Leatherman® tool!) on its own, and in conjunction
with other SAS procedures it can be very helpful in managing the special SAS
files that are SAS catalogs. Find out what the little-known PROC CATALOG can do
for you!
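Two of the routine chores PROC CATALOG handles, shown against a hypothetical format catalog and backup library.

  proc catalog catalog=work.formats;
    contents out=fmt_entries;      /* inventory of every entry in the catalog */
    run;
    copy out=archive.formats;      /* copy the entries to a catalog in the ARCHIVE library */
    run;
  quit;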
Document and Enhance Your SAS(R) Code, Data Sets, and Catalogs with SAS Functions, Macros and SAS Metadata
Louise Hadden and Roberta Glassl
CC-59
Discover how to document your SAS programs, data sets and catalogs with a few
lines of code that include SAS functions, macro code and SAS metadata! Learn how to conditionally process data based on the existence of a file,
variable types, and more! If you have ever wondered who it was that last ran a
program that overwrote your data, SAS has the answer.
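One hedged example of the conditional processing mentioned: check for a (hypothetical) monthly feed before trying to import it.

  %macro refresh_if_present(path=);
    %if %sysfunc(fileexist(&path)) %then %do;
      proc import datafile="&path" out=work.monthly_feed dbms=csv replace;
      run;
    %end;
    %else %put NOTE: &path not found - refresh skipped.;
  %mend refresh_if_present;

  %refresh_if_present(path=/data/feeds/monthly.csv)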
Using PROC MEANS to Sum Duration Of Eligibility For Medicaid Beneficiaries
John Wedeles
CC-164
Background:
The Division of Analytics and Policy Research (DAPR) within the District of
Columbia Department of Health Care Finance (DHCF) produces the annual CMS-416
report to document the performance of the Early and Periodic Screening,
Diagnostic and Treatment (EPSDT) benefit for the District’s children under 21
who are enrolled in Medicaid. The report requires the calculation of the total
months of eligibility for all beneficiaries included in the report, as
beneficiaries can have multiple enrollment spans in a given year. Previously,
duration of eligibility was calculated in multiple steps using Microsoft Excel,
including IF functions and pivot tables. DAPR sought to streamline the
calculation of eligibility duration using SAS.
Methods: In SAS, binary variables were created for each month in the period of interest
as indicators for eligibility, based on monthly enrollment dates. The values
for each of these binary variables were then vertically summed by beneficiary
Medicaid number using PROC MEANS. This step created a new data set with a
de-duplicated list of Medicaid beneficiaries, and included a new variable
representing the count of the months of eligibility for each beneficiary. A new
variable measuring the total months of eligibility was then created, which
captured the sum of the variables created in the previous step.
Results: The use of PROC MEANS allowed DAPR to account for multiple enrollment spans for
Medicaid beneficiaries in the reporting year. DAPR was also able to use the
summary variables to determine 90-day continuous eligibility, which is a
requirement for inclusion in the denominator for several key measures of the
CMS-416 report.
Conclusion: The PROC MEANS procedure allowed for more accurate and efficient calculation of
beneficiary eligibility data, resulting in streamlined reporting capacities. DAPR has
continued to use the PROC MEANS procedure in several other reports
where calculation of beneficiary eligibility is required.
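The summation step described in the Methods section can be pictured with a hedged sketch like this; the variable names are illustrative, not DAPR's actual layout.

  proc means data=enrollment_spans noprint nway;
    class medicaid_id;                        /* one output row per beneficiary */
    var elig_m01-elig_m12;                    /* 0/1 eligibility indicator for each month */
    output out=person_level(drop=_type_ _freq_) sum=;
  run;

  data person_level;
    set person_level;
    total_elig_months = sum(of elig_m01-elig_m12);
  run;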
The COMPRESS Function: Hidden Superpowers
Pamela Reading
CC-138
Most SAS programmers rely on the COMPRESS function for cleaning up troublesome
string data. The many uses of the third ‘modifier’ argument, added in
Version 9, may not be as familiar. This paper will present a quick summary of
the options available and examples of their use. It will conclude with an
unusual application of the ‘keep’ option to reorder characters within a
string.
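A quick taste of the modifier argument in question, using toy values.

  data clean;
    phone_raw   = '(919) 555-0123';
    digits_only = compress(phone_raw, , 'kd');           /* k=keep, d=digits */
    no_punct    = compress('a-b, c.d!', , 'p');          /* strip punctuation */
    letters     = compress('Item #42: Widget', , 'kas'); /* keep alphabetic characters and spaces */
    put _all_;
  run;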
Past, Present and Future... who KNEW (Knows New Exciting Ways)?
Claudine Lougee
CC-178
Did you ever think, "Someone must have done this before"? If you've ever coded
anything that took some time or was challenging, someone probably has done it
another way. The other way could be different, the same, easier, or slightly
advanced. This paper will provide a list of past, present and future SAS users
and authors who are experienced in writing code and teaching methods through
SAS papers and BBU (Books by Users). This will be valuable for googling SAS
papers and finding the right code for your needs.
All Data Are (Most Likely) Not Created Equal: A SAS® Macro to Compare Structure and Data Across Multiple Datasets
Jason Salemi
CC-40
In nearly every discipline, from Accounting to Zoology, whether you are a
student-in-training or an established professional, a central tenet of
interacting with information is to “Know Thy Data”. Hasty compilation and
analysis of inadequately vetted data can lead to misleading if not erroneous
interpretation, which can have disastrous consequences ranging from business
downfalls to adopting health interventions that worsen rather than improve the
longevity and quality of people’s lives. In some situations, knowing thy data
involves only a single analytic dataset, in which case review of a data
dictionary to explore attributes of the dataset supplemented with univariate
and bivariate statistics will do the trick. This has been discussed extensively
in the literature and certainly in the SAS Global Forum and User’s Groups. In
other scenarios, there is a need for comparing the structure, variables, and
even values of variables across two datasets. Again, in this case, SAS offers a
powerful COMPARE procedure to compare pairs of datasets, and many papers have
offered macros to add additional functionality, refine the comparison, or
simplify the analytic output. However, imagine the following scenario: you are
provided with or download a myriad of datasets, perhaps which are produced
quarterly or annually. Each dataset has a corresponding data dictionary and you
might even be fortunate enough to have been provided with some code to
facilitate importation into SAS. Your initial goal, perhaps a “first date”
with your new datasets, is to understand whether variables exist in every
dataset, whether there are differences in the type or length of each variable,
the absolute and relative missingness of each variable, and whether the actual
values being input for each variable are consistent. This paper describes the
creation and use of a macro, “compareMultipleDS”, to make the first date
with your data a pleasant one. Macro parameters through which the user can
control which comparisons are performed/reported as well as the appearance of
the generated “comparison report” are discussed, and use of the macro is
demonstrated using two case studies that leverage publicly-available data.
The Mystery of Automatic Retain in a SAS Data Step
Huei-Ling Chen and Hong Zhang
CC-34
The data step is the most frequently used programming process in the SAS
System. As programmers, we should be very familiar with it. However,
sometimes we write a piece of code and the output is not what we expect. Is
our code incorrect, or are there mysteries inside the data step? This paper will
focus on one of those mysteries - automatic retain in a data step. We will
investigate how variables are automatically retained even though no RETAIN
statement is specified. Examples are provided to demonstrate the pitfalls one can
experience when constructing a data step; being cautious can avoid unexpected
results. This paper uses a PUT _ALL_ statement to demonstrate which variables are
automatically retained.
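A toy illustration of the behavior in question (not the authors' example): variables read with SET and accumulators created with a sum statement persist across iterations, while an ordinarily assigned variable is reset to missing.

  data demo;
    set sashelp.class(obs=3 keep=name age);
    if _n_ = 1 then first_age = age;   /* reset to missing on later iterations */
    running_total + age;               /* sum statement: implicitly retained */
    put _all_;                         /* PUT _ALL_ shows which values carry forward */
  run;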
Successful Ways to Add and Drop Data, While Also Reformatting Data
Adetosoye Oladokun
CC-192
For my project, my goal is to walk through the process of writing code in SAS
9.4. The main focus will be how to drop variables and how to reformat variables.
In addition, I will discuss how I uploaded my data set, created output for it,
and used various frequency tables. I have highlighted the areas that contain
code, procedure statements, log statements, and output statements. To
differentiate between the highlighted areas, I put each heading in bold letters
and provide a brief explanation to act as a guide.
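The two core operations named in the title boil down to a DATA step like this hedged sketch; the data set and variable names are invented.

  data analysis;
    set raw_import(drop=comments backup_flag);        /* drop variables that are not needed */
    format enroll_date mmddyy10. income dollar12.2;   /* reformat for reporting */
  run;

  proc freq data=analysis;
    tables enrollment_status;                         /* one of the frequency tables described */
  run;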
29 Shades of Missing
Darryl Putnam
CC-106
Missing values can have many flavors of missingness in your data and
understanding these flavors of missingness can shed light on your data and
analysis. SAS® can identify 29 flavors of missing data, and a variety of
functions, statements, procedures, and options can be used to whip your missing
data into submission. This paper will focus solely on how SAS can use missing
values in unique and insightful ways.
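A small demonstration of a few of those flavors, using toy data of my own: special missing values such as .R and .N can carry the reason a value is absent, and PROC FREQ reports them separately.

  data responses;
    do id = 1 to 4;
      select (id);
        when (1) answer = 4;
        when (2) answer = .R;    /* refused */
        when (3) answer = .N;    /* not applicable */
        otherwise answer = .;    /* ordinary missing */
      end;
      output;
    end;
  run;

  proc freq data=responses;
    tables answer / missing;     /* the different kinds of missing appear as separate levels */
  run;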
Date Dimension
Christopher Johnson
CC-10
Intuition would suggest that it is more efficient to perform simple
calculations as needed than to store calculations in a table for reference. However, in some circumstances, creating lookup tables can save both programmer and CPU time. Dates present a particular difficulty in any programming
language. This paper will present a data structure that can simplify date
manipulations while gaining efficiency.
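The kind of structure meant here can be generated once with a DO loop; this is a sketch, and the column choices are mine.

  data date_dim;
    do date = '01JAN2010'd to '31DEC2020'd;
      year     = year(date);
      quarter  = qtr(date);
      month    = month(date);
      month_nm = put(date, monname3.);
      weekday  = weekday(date);
      is_month_end = (date = intnx('month', date, 0, 'end'));
      output;
    end;
    format date date9.;
  run;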
Delivering Quarterly Reporting Every Month – A Departure From the Traditional Calendar Definition, Using Formats
Barbara Moss and Anna Flynn
CC-140
What’s a quarter to you? Feeling constrained by the typical calendar
quarters of JAN thru MAR, APR thru JUN, JUL thru SEP and OCT thru DEC?
Quarterly trending applications need each date grouping to contain three months
of data. Running quarterly does not provide data frequently enough. Running
monthly, using the traditional definition of quarters, leaves some quarters
containing only one or two months of data. This distorts the output of
trending patterns. The requirement is to construct a quarter such that all
data, up to the current month, is used. For example, running in April the
quarters would be defined as MAY thru JUL, AUG thru OCT, NOV thru JAN and FEB
thru APR. This allows the process to run with full quarters of data each and
every month, increasing data delivery beyond four times a year. Leveraging PROC
FORMAT, this presentation shows how to implement a rolling or shifting
definition of quarters, allowing for quarterly reporting every month!
Using Multilabel Formats in SAS to Analyze Data Over Moving Periods of Time
Christopher Aston
CC-84
The Food Safety and Inspection Service collects a plethora of data from all over
the country on a daily basis. Many of the Agency's performance measures used to
identify potential trends and to assess the effectiveness of its policies on the
meat and poultry industry are based on the most recent 12 months of data.
Furthermore, these performance measures are normally assessed on a monthly or
quarterly basis, so these data are used multiple times in overlapping windows
when we seek to analyze performance over time. The purpose of this paper is to
present the method I devised to analyze time-dependent data that is evaluated as
a "moving window," i.e., each data point is used multiple times in overlapping
windows, while the data are only processed one time. This is accomplished using
multilabel formats in SAS to assign specific dates to more than one "period."
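The core trick is a (MULTILABEL) format whose date ranges overlap, combined with the MLF option on a CLASS statement; the sketch below uses made-up windows and data set names.

  proc format;
    value mlwindow (multilabel)
      '01JAN2015'd - '31DEC2015'd = '12 months ending DEC2015'
      '01FEB2015'd - '31JAN2016'd = '12 months ending JAN2016'
      '01MAR2015'd - '29FEB2016'd = '12 months ending FEB2016';
  run;

  proc means data=inspections mean n;
    class inspection_date / mlf;     /* MLF lets one date fall into several windows */
    var noncompliance_rate;
    format inspection_date mlwindow.;
  run;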
Where do the titles or footnotes go when using PROC SGPLOT in ODS PDF?
Julie Liu
CC-124
Normally, people would think titles and/or footnotes are shown in PROC
SGPLOT graphs by default, provided that they are not turned off. However, when
using this procedure with the ODS PDF statement, this is surprisingly not the
case. Placing titles or footnotes before or after PROC SGPLOT also shows
different results. Finally, by using ODS LAYOUT, those titles or footnotes
magically pop out in the output. This presentation will use examples to
demonstrate the effects.
Beautiful PROC CONTENTS Output Using the ODS Excel Destination
Suzanne Dorinski
CC-76
A member of the Census Bureau’s in-house SAS® users group asked how to
export the output of PROC CONTENTS (variable name, type, length, and format)
from several Oracle database tables within the same database to separate
worksheets in an Excel file. You can use the _all_ keyword to work with all
the tables within a library. The ODS Excel destination, which is production in
SAS 9.4 maintenance release 3, displays the output beautifully.
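A hedged skeleton of the approach (the library name and path are invented; how output splits across worksheets is governed by the SHEET_INTERVAL= suboption).

  ods excel file='/reports/table_contents.xlsx'
            options(sheet_interval='table');
  proc contents data=oralib._all_ varnum;   /* every table in the ORALIB library */
  run;
  ods excel close;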
Having a Mean Headache on Summary Data? Using SAS to Compute Actual Data from Summary Data
William Zupko
CC-88
SAS programmers might not get to choose how their data is formatted. It is
very easy to take raw data and provide descriptive statistics on data that has
not been modified. Unfortunately, sometimes raw data is unavailable and only
the summary data can be provided. One of the trickiest problems occurs with
this summary data, as SAS has difficulty breaking data from one line into many.
Since many of the procedures SAS would run, such as PROC MEANS, need the actual
data, performing descriptive statistics on summary data can at best provide
misleading results or, at worst, completely incorrect results. This paper
describes how to expand summary data into simulated raw data sets, which allow
accurate descriptive statistics on each single variable. The following SAS code
will be utilized and/or discussed: proc means, DATA step, do loops, and the
output statement.
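The expansion described can be as small as this sketch: write COUNT copies of each summarized value, then run the usual statistics. The input layout is assumed.

  data simulated_raw;
    set summary_counts;      /* one row per value, with VALUE and COUNT variables */
    do i = 1 to count;
      output;                /* recreate COUNT individual records */
    end;
    drop i count;
  run;

  proc means data=simulated_raw n mean median std;
    var value;
  run;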
The %LetPut Macro, and Other Proactive Macro Debugging Techniques
Shane Rosanbalm
CC-121
Macro debugging can sometimes be difficult. Having ready access to the values
of local macro variables is often quite helpful. This paper will introduce a
simple macro, %LetPut, to assist in the displaying of the values of macro
variables to the log. Other minimally invasive techniques for generating
helpful messages in the log will also be presented.
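One possible implementation of a %LetPut-style macro (a sketch; the author's
version may differ) simply echoes a macro variable's name and value:
    %macro letput(mvar);
       %put NOTE: &mvar = [%superq(&mvar)];
    %mend letput;
    %let cutoff = 30JUN2015;
    %letput(cutoff)       %* log shows:  NOTE: cutoff = [30JUN2015];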
IA_CTT: A SAS® Macro for Conducting Item Analysis Using the Classical Test Theory
Yi-Hsin Chen
CC-184
Item analysis helps identify problems with test items when a bank of items that
will be used continually is being developed. These problems can be corrected,
resulting in a better test and better measurement. Item analysis is also a
useful tool whenever students complain about items. Even though more
advanced psychometric models, such as item response theory or cognitive
diagnostic models, have been widely applied, item analysis based on the
classical test theory is still very often employed by researchers and
practitioners because of its conceptual simplicity. This paper provides a SAS®
macro, called IA_CTT, for conducting item analysis using the classical test
theory. Item analysis from this macro will yield information including
test score statistics (e.g., mean, median, mode, Q1, Q3, standard deviation,
skewness, kurtosis, alpha, standard error of measurement), individual item
statistics (e.g., p-value, point-biserial correlation, corrected point-biserial
correlation, reliability when item deleted, two-top group item discrimination),
frequency distributions of individual options for each item based on overall
samples and two different groups from top 25% (above Q3) and bottom 25% (below
Q1) students (i.e., distractor analysis), and Mantel-Haenszel differential item
functioning statistics. The macro reads in the data file from Microsoft Excel
and exports the outputs as Excel files. In addition to the macro for item
analysis, this paper also provides the interpretations of all the relevant
statistics. Exemplary outputs are shown and interpreted at the end of the
paper.
Accessing and Extracting Unstructured XML Data using SAS and Python
Sai Mandagondi
CC-188
This paper discusses an approach to dynamically load unstructured XML data
using SAS and Python. When neither the SAS XML mapper nor a custom XML map can
parse the incoming data, using external programs (Shell Scripting and Python)
and integrating results from external programs into a SAS data set is an
efficient alternative. One of the methods to eventually load data into a
database to support upstream reporting and analytics is illustrated.
Because We Can: Using SAS® System Tools to Help Our Less Fortunate Brethren
John Cohen
CC-145
We may be called upon to provide data to developers -- frequently for
production support -- who work in other programming environments. Often
external recipients, they may require files in specific formats and
variable/column order, with prescribed delimiters, file-naming conventions, and
the like. Our goal should be to achieve this as simply as possible, both for
initial development and ease of maintainability. We will take advantage
of several SAS tricks to achieve this goal.
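A small sketch of the kind of DATA _NULL_ export involved (the variable names,
delimiter, and file-naming convention are assumptions):
    data _null_;
       set work.extract;
       /* date-stamped file name and pipe delimiter per the recipient's spec */
       file "/outbound/extract_%sysfunc(today(), yymmddn8.).txt"
            dlm='|' lrecl=32767;
       if _n_ = 1 then put 'id|visit_date|amount';    /* required header row */
       put id visit_date :yymmdd10. amount :8.2;
    run;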
Hands On Workshop
Quick Results with SAS® Enterprise Guide®
Kirk Paul Lafler
How-23
SAS® Enterprise Guide® empowers organizations, programmers, business
analysts, statisticians and end-users with all the capabilities that SAS has to
offer. This hands-on workshop presents the built-in wizards for performing
reporting and analytical tasks, access to multi-platform enterprise data
sources, the delivery of data and results to a variety of mediums and outlets,
data manipulation without the need to learn complex coding constructs, and
support for data management and documentation requirements. Attendees learn
how to use the graphical user interface (GUI) to access tab-delimited and Excel
input files; subset and summarize data; join (or merge) two tables together;
flexibly export results to HTML, PDF and Excel; and visually manage projects
using flowcharts and diagrams.
An Introduction to Perl Regular Expressions
Ron Cody
How-194
Perl regular expressions, implemented in SAS Version 9, provide a way to
perform pattern matching of text strings. This is a new capability in SAS and
is particularly useful for reading very unstructured data. You have the ability
to search for text patterns, extract the patterns, or substitute new patterns.
Perl regular expressions, along with dozens of new character functions, give
you enormous power to read and manipulate character data.
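For readers new to the syntax, a tiny sketch of PRXMATCH and PRXCHANGE (the
data lines are made up):
    data phones;
       length phone $20;
       infile datalines truncover;
       input line $char60.;
       if prxmatch('/\(\d{3}\) \d{3}-\d{4}/', line) then do;
          /* keep only the phone-number pattern found on the line */
          phone = prxchange('s/.*?(\(\d{3}\) \d{3}-\d{4}).*/$1/', 1, line);
          output;
       end;
    datalines;
    Call me at (919) 555-1212 tomorrow
    No phone number on this line
    ;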
Introduction to ODS Graphics
Chuck Kincaid
How-98
This presentation teaches the audience how to use ODS Graphics. Now part of
Base SAS®, ODS Graphics are a great way to easily create clear graphics that
enable any user to tell their story well. SGPLOT and SGPANEL are two of the
procedures that can be used to produce powerful graphics that used to require a
lot of work. The core of the procedures is explained, as well as some of the
many options available. Furthermore, we explore the ways to combine the
individual statements to make more complex graphics that tell the story better. Any user of Base
SAS on any platform will find great value in the SAS ODS
Graphics procedures.
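For example (using a SASHELP data set so the code runs as-is):
    title 'City MPG by Vehicle Weight';
    proc sgplot data=sashelp.cars;
       scatter x=weight y=mpg_city / group=type transparency=0.5;
       reg x=weight y=mpg_city;
    run;
    proc sgpanel data=sashelp.cars;
       panelby origin / columns=3;
       histogram mpg_city;
    run;
    title;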
Intermediate ODS Graphics
Chuck Kincaid
How-99
This paper will build on the knowledge gained in the Intro to SAS® ODS
Graphics. The capabilities in ODS Graphics grow with every release as both new
paradigms and smaller tweaks are introduced. After talking with the ODS
developers, we selected a subset of the many wonderful capabilities. This
paper will look at that selection of both types of capabilities and provide the
reader with more tools for their belt.
Visualization of data is an important part of telling the story seen in the
data. And while the standards and defaults in ODS Graphics are very well done,
sometimes the user has specific nuances for characters in the story or
additional plot lines they want to incorporate. Almost any possibility, from
drama to comedy to mystery, is available in ODS Graphics if you know how. We
will explore tables, annotation and changing attributes, as well as the BLOCK
and BUBBLE plots.
Any user of Base SAS on any platform will find great value from the SAS ODS
Graphics procedures. Some experience with these procedures is assumed, but not
required.
A Tutorial on the SAS® Macro Language
John Cohen
How-152
The SAS® Macro language is another language that rests on top of regular SAS
code. If used properly, it can make programming easier and more fun. However,
not every program is improved by using macros. Furthermore, it is another
language syntax to learn, and can create problems in debugging programs that
are even more entertaining than those offered by regular SAS.
We will discuss using macros as code generators to save repetitive and tedious
effort, to pass parameters through a program and avoid hard-coding values, and
to pass code fragments, thereby making certain tasks easier than using
regular SAS alone. Macros facilitate conditional execution and can be used to
create program modules that can be standardized and re-used throughout your
organization. Finally, macros can help us create interactive systems in the
absence of SAS/AF.
When we are done, you will know the difference between a macro, a macro
variable, a macro statement, and a macro function. We will introduce
interaction between macros and regular SAS language, offer tips on debugging
macros, and discuss SAS macro options.
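As a flavor of the code-generator idea (the report itself is illustrative),
one macro definition can produce a report for any year passed to it instead of
hard-coding the year in several places:
    %macro yearly_report(year=, dsn=sashelp.prdsale);
       title "Actual sales for &year";
       proc means data=&dsn sum maxdec=0;
          where year = &year;
          class country;
          var actual;
       run;
       title;
    %mend yearly_report;
    %yearly_report(year=1993)
    %yearly_report(year=1994)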
Applications Development, Theory or Practice
Ronald Fehd
How-96
This Hands-on Workshop is a case study of the proof-of-concept phase of the
list processing suite Summarize-Each-Variable. Topics covered include design
principles, development strategy, style guide, naming conventions, and
requirements and specifications. List processing consists of two tasks: making
a list, also called a control data set, where each row is a set of parameter
values, and processing the list, which means calling another program with the
parameters of each row. The control data set shown here is the list of names
of variables in a data set. Design principles are reminders to write programs
so that they are readable, reusable, robust, and easy to test. Two strategies
are shown, bottom-up and top-down. The style guide emphasizes naming
conventions that are used for the programs and, most important, the data
structure, which guarantees successful acceptance of the output described in
the specifications.
Purpose:
Students leave the course with a set of small programs which are a conceptual
template that can be modified to handle other lists, such as data sets or
files to process.
HOW to DoW
Paul Dorfman
How-30
The DoW-loop is a nested, repetitive DATA step structure enabling you to
isolate instructions related to a certain break event before, after, and during
a DO-loop cycle in a naturally logical manner. Readily recognizable in its most
ubiquitous form by the DO UNTIL(LAST.ID) construct, which readily lends itself
to control-break processing of BY-group data, the DoW-loop's nature is more
morphologically diverse and generic. In this workshop, the DoW-loop's logic is
examined via the power of example to reveal its aesthetic beauty and pragmatic
utility. In some industries like Pharma, where flagging BY-group observations
based on in-group conditions is standard fare, the DoW-loop is an ideal vehicle
greatly simplifying the alignment of business logic and SAS code. In this
workshop, the attendees will have an opportunity to investigate the program
control of the DoW-loop step by step using the SAS DATA step debugger and learn
of a range of nifty practical applications of the DoW-loop.
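A compact example of the double-DoW pattern described above (the data set and
variable names are assumptions): the first loop computes a per-ID total, and
the second rereads the BY group to score every record against that total:
    data flagged;
       do until (last.id);
          set claims;
          by id;
          total = sum(total, amount);
       end;
       do until (last.id);
          set claims;
          by id;
          share = amount / total;
          output;
       end;
    run;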
Application Development Techniques Using PROC SQL
Kirk Paul Lafler
How-24
Structured Query Language (SQL) is a database language found in the base-SAS
software. It permits access to data stored in data sets or tables using an
assortment of statements, clauses, options, functions, and other language
constructs. This hands-on workshop (HOW) demonstrates core concepts as well as
SQL’s many applications, and is intended for SAS users who desire an overview
of this exciting procedure’s capabilities. Attendees learn how to construct
SQL queries; create complex queries including inner and outer joins; apply
conditional logic with case expressions; identify FIRST.row, LAST.row, and
BETWEEN.rows in By-groups; create and use views; and construct simple and
composite indexes.
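A short sketch of a few of the constructs listed above, runnable against
SASHELP:
    proc sql;
       create table class_summary as
       select sex,
              count(*)          as n,
              avg(height)       as avg_height,
              case when avg(age) >= 13 then 'Mostly teens'
                   else 'Mostly pre-teens'
              end               as age_group
         from sashelp.class
        group by sex;
       select * from class_summary;
    quit;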
Pharma & Healthcare
How to Build Study Quality Surveillance for a Clinical Study?
Angela Teng
PH-147
Study Quality Surveillance (SQS) provides oversight of the quality of a study
by reviewing and monitoring study data in a blinded fashion. The purpose of SQS
is to determine the critical risks that could affect subject safety, data
quality, or compliance so that key issues can be identified early and prevented
from recurring, thereby ensuring that the study results are valid and credible.
Data errors may be found and noted during the SQS review. If data errors are
noted, the logic used to find these errors will be
communicated to Data Management so that Data Management can incorporate new
edit checks in their specifications as appropriate. Also, all outputs will be
blinded and no information will be included that might risk unblinding. This manuscript describes the process of generating SQS outputs and key
components of a SQS report. In addition, it provides detailed examples of SQS
figures that facilitate data review.
Multilevel Randomization
Lois Lynn and Marina Komaroff
PH-28
Randomization in clinical trials is essential for the success and validity of a
study. PROC PLAN is an important SAS® procedure that generates randomization
schedules for a variety of experimental designs. This procedure was developed
for the major types of randomization, such as simple, block, and stratified
randomization where the latter controls and balances the influence of
covariates. In addition to SAS® documentation, multiple papers were written to
explain how to adapt and enhance the procedure with DATA steps and/or PROC
FORMAT.
Clinical research in transdermal medicine introduces the situation where a
multilevel randomization is required for levels like treatment, location (arm,
thigh, back, etc.) and side (left, right, upper, center, etc.) of a patch
application while retaining balance at each level and combination of levels.
Schedules get especially complicated for cross-over studies where the location
and side of patch application need to be rotated by period and balanced as
well. To the authors’ knowledge, there are no published papers that address
these requirements.
This paper introduces a novel concept of multilevel randomization, provides SAS
code utilizing PROC PLAN, and a few examples with increasing complexity to
generate balanced multilevel randomization schedules. The authors are convinced
that this paper will be useful to SAS-friendly researchers conducting similar
studies that require multilevel randomization.
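While the authors' multilevel scheme is more involved, a baseline PROC PLAN
call for a simple blocked randomization looks roughly like this (the block
count, block size, and seed are placeholders):
    proc plan seed=20150927;
       factors block=10 ordered patient=4 / noprint;
       treatments treat=4 random;       /* random treatment order per block */
       output out=schedule;
    run;
    quit;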
A Descriptive Analysis of Reported Health Issues in Rural Jamaica
Verlin Joseph
PH-107
During spring break, I accompanied a medical missions team to two of the most
remote areas in Jamaica. While in Jamaica, my team and I established clinics
to treat a variety of health issues. This paper will illustrate how I used SAS
to produce a descriptive analysis report on various issues we treated.
Use of SAS and SAS Bridge for ESRI in the Study of Spatial Patterns of Children with Special Health Needs and Providers in Oklahoma
Ram Poudel, Maleeha Shahid, Mark Wolraich and Jennifer Lockhart
PH-126
It is speculated that service navigation could improve quality of care while
limiting costs of chronic conditions or diagnoses management for youth and
children with special health needs (CYSHCNs). While CYSHCNs in the state of
Oklahoma are in need of a well-functioning system to coordinate a wide variety
of health care services, dental care has been identified as a unique unmet need
in this population. From the 2014 Community Needs Assessment, we found “Not
available in my county at all” to be the top barrier to the most needed
services. Therefore, we picked dental care as an example and calculated the
distance and drive time required to travel to this particular service.
The aim of this study is to map the spatial distribution of CYSHCNs and
providers, to assess the type and distribution of diagnoses or chronic
conditions these children have and to identify the appropriate method to
calculate the distance and drive time from the zip-code of the family to the
providers. We used SAS and SAS Bridge for ESRI to find the spatial patterns of
CYSHCNs as well as to detect the clusters of dental needs. Compared to
national data (18.4%), Oklahoma has 2.5 times more children (44.9%) with
special health needs who have 4 or more conditions reported. Only a few
counties (14), including two metro counties, have Sooner Care pediatric
dentists. A greater discrepancy is found between urban and rural counties in
terms of Sooner Care pediatric dentists and children with special health
needs. The macro can be used to determine distance and driving time from
counties or zip codes to service providers. There are some clusters of
counties with dental needs. This cluster detection method can be applied to
other needs and in other states as well. More
analyses will be done to assess if some of these clusters may be partly
explained by socio-demographic and policy factors.
ANOVA_Robust: A SAS® Macro for Various Robust Approaches to Testing Mean Differences in One-Factor ANOVA Models
Thanh Pham, Eun Sook Kim, Diep Nguyen, Yan Wang, Jeffrey Kromrey and Yi-Hsin Chen
PH-134
Testing the equality of several independent group means is a common statistical
problem in social sciences. The traditional analysis of variance (ANOVA) is one
of the most popular methods. However, the ANOVA F test is sensitive to
violations of the homogeneity of variance assumption. Several alternative
tests have been developed in response to this problem, ranging from
modifications of the ANOVA F test to tests based on the Structured Means
Modeling technique. This paper provides a SAS macro for testing the equality
of group means using thirteen different methods, including the regular ANOVA F
test. In addition, this paper provides the results of a simulation study that
compares the performance of these tests in terms of their Type I error rate
and statistical power under different conditions, especially under violation
of the homogeneity of variance assumption.
Same Question, Different Name: How to Merge Responses
Michelle Dahnke and Tyra Dark
PH-163
To correctly use data from the Collaborative Psychiatric Epidemiology Surveys,
which joins three individual surveys, users may need to evaluate the
cross-survey linking and merge responses. While the codebook identifies the
variable name assigned to each question, in some instances the same questions
were assigned different names in the three surveys. For example, a question
about being diagnosed with high blood pressure was named V04052 and V06677
depending on the survey. This paper demonstrates how to merge response data in
circumstances such as this so the user can conduct analysis on the maximum
number of valid responses.
Creating Quartiles from Continuous Responses: Making Income Data Manageable
Michelle Dahnke and Tyra Dark
PH-162
It is customary to collect data at the most granular level; however, sometimes
that requires consolidating responses in categories before using them in
advanced analysis. For example, in the Collaborative Psychiatric Epidemiology
Surveys, the household income variable is a continuous variable with individual
responses ranging from 0 to $200,000. Working with the data may require
categorization so that the data are more manageable, for example in quartiles.
This paper walks through how to create household income quartiles from
free-response data, an important fundamental skill in working with large data
sets.
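One common way to form such quartiles is PROC RANK with GROUPS=4 (the data
set and variable names below are assumptions):
    proc rank data=cpes groups=4 out=cpes_q;
       var household_income;
       ranks income_quartile;       /* 0 = lowest quartile, 3 = highest */
    run;
    proc freq data=cpes_q;
       tables income_quartile;
    run;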
Using SAS Programming to Identify Super-utilizers and Improve Healthcare Services
An-Tsun Huang
PH-170
Introduction:
Improving public health, enhancing the quality of healthcare services, and
reducing unnecessary costs are important healthcare issues. Super-utilizers
are the small subset of population who account for the highest utilization of
healthcare services. The purpose of this study is to combine inpatient stays
(IS) and emergency-department visits (EV) to identify super-utilizers in the
Medicaid population, in order to enhance the quality of healthcare, decrease
Medicaid costs, and improve healthcare management systems.
Methods:
Medicaid claims data with dates of service in fiscal year 2014 were used to
create 16 scenarios of combined IS and EV. These scenarios represent 16
interactions between four IS groups and four EV groups. Among them, high
counts of IS and EV (IS ≥2 and EV ≥3) are considered as high utilization of
healthcare services. Super-utilizers are beneficiaries under the condition: IS
≥4 and EV ≥6. First, based on management/payment systems, Medicaid
beneficiaries were classified into two groups: managed care organization
(MCO)
enrollees and fee-for-service (FFS) beneficiaries. Second, PROC SQL was used
to count the number of IS and EV services for each beneficiary. Subsequently,
IF statements were used to create dummy variables to categorize IS and EV
counts into four groups, respectively, and then to categorize combined IS and
EV counts into 16 sub-groups. Afterwards, PROC SQL and PROC TABULATE were used
to obtain numbers of beneficiaries and Medicaid costs for each scenario. Lastly, PROC
FREQ was used to identify top three diseases in each scenario.
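A condensed sketch of the counting and flagging steps just described (the data
set and variable names are assumptions, not the study's code):
    proc sql;
       create table utilization as
       select beneficiary_id,
              sum(service_type = 'IS') as is_count,
              sum(service_type = 'EV') as ev_count
         from medicaid_claims
        group by beneficiary_id;
    quit;
    data utilization;
       set utilization;
       super_utilizer = (is_count >= 4 and ev_count >= 6);  /* study cutoff */
    run;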
Results:
MCO super-utilizers account for 0.1% of MCO enrollees and 4.0% of MCO
expenditures. FFS super-utilizers account for 0.8% of FFS beneficiaries and
9.8% of FFS expenditures.
Conclusion:
This method is timely, especially after the Affordable Care Act was launched
in 2010. It could help governments, healthcare industries, and researchers
evaluate costs, the performance of healthcare services, and improvements in
public health.
Prescription Opioid Use in the U.S. in 2012: Characterizing Sustained vs. Infrequent Use Using the Medical Expenditure Panel Survey
Monika Lemke
PH-182
Background/Objectives: Opioid use has been declared an epidemic as rates of
use, abuse, addiction, and overdose-related deaths have increased. This study
provides a detailed portrait of opioid exposure in the United States and
characterizes subpopulations with varying levels of exposure.
Methods: A secondary analysis of the nationally representative Medical
Expenditure Panel Survey examines self-reported prescription opioid exposure in
US adults 18 years and older in 2012. Opioid users are divided into categories
based on use duration and drug DEA Schedule: infrequent (< 30 day supply or one
prescription), sustained (Narcotic Analgesics (Schedule II, 30-89 days and
Non-Schedule II, >30 days), Narcotic Analgesic Combinations (Schedule II and
Non-Schedule II, >30 day)), and intensive (Narcotic Analgesics (Schedule II,
90-day supply or more)). Socio-demographic factors such as sex, age, race,
census region, family income, insurance coverage, education, and BMI were
investigated.
Results: According to our estimates, 14.5% of the US adult population reported
opioid prescriptions in 2012, or about 21.5% of the US adult population that
reported any medication. Among opioid users, 62.8% were infrequent users, 30.6%
sustained users, and 6.5% intensive users. The mean total day supply was 8 days
(Standard Error 0.2) among infrequent users, 176 days (SE 9) among sustained
users, and 353 days (SE 13) among intensive users. Adults 65-85 years old (Odds
Ratio 6.7, 95% CI 3.7-12.0, p-value < 0.0001), those at less than 100% of the
Federal Poverty Level (OR 2.6, 95% CI 1.9-3.7, p-value < 0.0001), and those
with public insurance coverage (OR 1.5, 95% CI 1.2-1.9, p-value = 0.0013) were
more likely to be in a higher use group.
Conclusions: A significant proportion of individuals who reported an opioid
prescription in 2012 received a supply of 30 days or less and have the lowest
risk of dependency. The subgroup of individuals who received a supply of 90
days or more of high risk opioids needs to be better understood in order to
avoid adverse outcomes in this risk group.
Planning, Support, and Administration
Handling Sticky Situations - The Paper
Stephanie Thompson
PA-137
Have you ever been asked to perform an analysis where you were presented with
the expected outcome? What about analyzing personnel data to help with salary
negotiations or promotions? Report on metrics that can’t be measured? These
and other scenarios will be discussed in an effort to help you, as an analyst,
perform your task but also to make it meaningful and ethical.
SAS In-Database Decision Management for Teradata: Making the Best Decisions Possible
Tho Nguyen and William E Benjamin Jr
PA-202
We all make tactical and strategic decisions every day. With the presence of
big data, are we making the right or the best decisions possible as data
volume, velocity and variety continue to grow? As businesses become more
targeted, personalized and public, it is imperative to make precise data-driven
decisions for regulatory compliance and risk management. Come learn how SAS
In-Database Decision Management for Teradata can help you make the best
decision possible by integrating SAS and Teradata.
Tips and Tricks for Organizing and Administering Metadata
Michael Sadof
PA-183
The SAS® Management Console was designed to control and monitor virtually all
of the parts and features of the SAS Intelligence Platform. However,
administering even a small SAS Business Intelligence system can be a daunting
task. This paper will present a few techniques that will help you simplify
your administrative tasks and enable you and your user community to get the
most out of your system. The SAS Metadata server stores most of the
information required to maintain and run the SAS Intelligence Platform which is
obviously the heart of SAS BI. It stores information about libraries,
users, database logons, passwords, stored processes, reports, OLAP cubes and a
myriad of other information. Organization of this metadata is an essential
part of an optimally performing system. This paper will discuss ways of
organizing the
metadata to serve your organization well. It will also discuss some of the key
features of the SMC and best practices that will assist the administrator in
defining roles, promoting, archiving, backing up, securing, and simply just
organizing the data so it can be found and accessed easily by administrators
and users alike.
UCF SAS® Visual Analytics: Implementation, Usage, and Performance
Scott Milbuta, Carlos Piemonti and Ulf Borjesson
PA-187
At the University of Central Florida (UCF) we recently invested in SAS Visual
Analytics (VA) along with the updated SAS® Business Intelligence (BI) platform
(from 9.2 to 9.4), a project that took over a year to be completed, in order to
give our users the best and most updated tools available.
This paper introduces our SAS VA environment at UCF, presents projects created
using this tool, and explains why we chose it for development over other
available SAS applications.
It also explains the technical environment for our non-distributed SAS VA:
RAM, servers, benchmarking, sizing and scaling, and why we chose this mode
instead of a distributed SAS VA environment.
Challenges in the design, implementation, usage, and performance are also
presented, including the reasons why Hadoop has not been adopted.
Tips and Tricks for Introductory Workshops in SAS for Health Professionals
Jason Brinkley
PA-62
It can sometimes be the case that general health professionals need some basic
SAS training in order to effectively create simple reports and manipulate
incoming data. The presenter will share his experiences in leading a SAS
Workshop Series in a university setting over the course of several years.
Heading a team of university faculty members, the presenter has designed,
implemented, and refined short-term SAS overview training for general health
professionals. While multiple topics have been discussed in these workshops,
some have fared better with a general health professional audience than others. Topics will include tips on introducing code based work to individuals with no
previous experience, workshop format, good practices on instruction and
delivery, and introducing SAS macros in an example based manner.
Predictive Modeling Using SAS® Visual Statistics: Beyond the Prediction
Xiangxiang Meng
PA-197
Predictions, including regressions and classifications, are the predominant
focus of many statistical and machine-learning models. However, in the era of
Big Data, a predictive modeling process contains more than just making the
final predictions. For instance, a large collection of data often represents a
set of small, heterogeneous populations. Identification of these subgroups is
therefore an important step in predictive modeling. Additionally, Big Data
sets are often complex, exhibiting high dimensionality. Consequently, variable
selection, transformation, and outlier detection are integral steps. This paper
provides working examples of these critical stages using SAS® Visual
Statistics, including data segmentation (supervised and unsupervised), variable
transformation, supervised variable selection, outlier detection, and
filtering, in addition to building the final predictive model using methodology
such as decision trees, logistic regressions, and random forests. The
illustration data were collected between 2010 and 2014 from vehicle emission
testing results.
Get cozy with your data sources using LSF queue technology: One LSF Cluster to Rule them All
Steve Garrison
PA-171
Do you have data sources located all over the world? Is your user data
travelling needlessly across long distances? Do you experience network lag
because your production SAS grid cluster is not housed in the same physical
location as each of your data sources? The answer is not to create multiple
production SAS environments, one in each data location. The solution is to
expand your grid cluster to span multiple data centers using dedicated LSF
queues for each 'remote' server.
In an attempt to save licensing and resource costs, many global corporations
are moving towards a single enterprise SAS grid versus multiple internal SAS
environments. Many of these legacy SAS environments were purposely co-located
beside critical data sources to provide the fastest possible response time when
pulling or pushing data to those sources of data. So, how does an organization
architect a single enterprise SAS grid utilizing data sources located all over
the world, without compromising performance?
One way to eliminate network latency is to co-locate compute nodes with your
critical data. LSF queue technology makes this possible. With a queue assigned
to each worker node in each location where critical data is housed, there is
no network lag because data does not have to traverse long distances. The server
performing the work is in the same location as the data source. A single
production LSF cluster can span the globe to submit a SAS job from Virginia to
be executed on a server in Tokyo (or wherever a critical data source may be
located).
This use of LSF queue technology creates a single production cluster to “rule
them all".
When Bad Things Happen to Good SAS Programmers
Jiangtang Hu
PA-102
These are not doom days for SAS programmers by any means, but we can’t call it
a Golden Age anymore. In their daily work, SAS programmers face much more
aggressive control from IT departments: they might not be able to use their
favorite text editors to write and edit SAS code, and they might even have no
choice of SAS interface to run the code, having to stick to the new games in
town such as SAS Enterprise Guide, SAS Studio, SAS Data Integration Studio,
and SAS Drug Development.
In the external world, the SAS language itself faces strong competition from R
and Python, which raises an even more profound question: will SAS programming
as a profession exist in the near future? The volume of traffic on the main
SAS mailing list is slowing down, SAS bloggers are not as active as before,
and, even worse, SAS user conferences at all levels are facing challenges of
funding and participation.
In this paper, I will not focus on “why” these bad things happened, but rather
launch an open discussion on how we as SAS programmers respond to these
challenges when bad things have happened or are happening.
Supporting SAS in the Workplace: A Corporate SAS User Community
Barbara Okerson and Jennifer Harney
PA-71
Many SAS users are not aware of an abundance of resources available to them
from a variety of sources. The available resources range from those internal
to their own organization to SAS itself. In order for these resources to be
utilized they need to be available to the users in an accessible way. This
paper shows how one large company with SAS users at many locations throughout
the United States has built a highly successful collaborative community for SAS
support. Modeled in the style of sasCommunity.org, the online corporate SAS
community includes discussion forums, surveys, interactive training, places to
upload code, tips, techniques, and links to documentation and other relevant
resources that help users get their jobs done.
Establishing a Health Analytics Framework
Krisa Tailor
PA-198
Medicaid programs are the second largest line item in each state’s budget. In
2012, they contributed $421.2 billion, or 15 percent of total national
healthcare expenditures. With US health care reform at full speed, state
Medicaid programs must establish new initiatives that will reduce the cost of
healthcare, while providing coordinated, quality care to the nation’s most
vulnerable populations. This paper discusses how states can implement
innovative reform through the use of data analytics. It explains how to
establish a statewide health analytics framework that can create novel analyses
of health data and improve the health of communities. With solutions such as
SAS® Claims Analytics, SAS® Episode Analytics, and SAS® Fraud Framework,
state Medicaid programs can transform the way they make business and clinical
decisions. Moreover, new payment structures and delivery models can be
successfully supported through the use of healthcare analytics. A statewide
health analytics framework can support initiatives such as bundled and episodic
payments, utilization studies, accountable care organizations, and all-payer
claims databases. Furthermore, integrating health data into a single analytics
framework can provide the flexibility to support a unique analysis that each
state can customize with multiple solutions and multiple sources of data. Establishing
a health analytics framework can significantly improve the
efficiency and effectiveness of state health programs and bend the healthcare
cost curve.
A Review of "Free" Massive Open Online Content (MOOC) for SAS® Learners
Kirk Paul Lafler
PA-13
Leading online providers are now offering SAS® users “free” access to
content for learning how to use and program in SAS. This content is available
to anyone in the form of massive open online content (or courses) (MOOC). Not
only is all the content offered for “free”, but it is designed with the
distance learner in mind, empowering users to learn using a flexible and
self-directed approach. As noted on Wikipedia.org, “A MOOC is an online
course or content aimed at unlimited participation and made available in an
open access forum using the web.” This presentation illustrates how anyone
can access a wealth of learning technologies including comprehensive student
notes, instructor lesson plans, hands-on exercises, PowerPoints, audio,
webinars, and videos.
Differentiate Yourself
Kirk Paul Lafler
PA-22
Today's employment and business marketplace is highly competitive. As a result,
it can be difficult for you and/or your business to stand out from the
competition. The success you're able to achieve depends on how you position
yourself and/or your business relative to your competitors. Topics include
learning how to cut through all the marketplace noise,
techniques on grabbing the attention of others, and strategies for attracting
the desired employer or client. This presentation emphasizes essential skills
that will help students, junior professionals, and seasoned professionals learn
how to differentiate themselves and/or their businesses from the competition.
Downloading, Configuring, and Using the Free SAS® University Edition Software
Charlie Shipp and Kirk Paul Lafler
PA-51
The announcement of SAS Institute’s free “SAS University Edition” is an
exciting development for SAS users and learners around the world! The software
bundle includes Base SAS, SAS/STAT, SAS/IML, Designer Studio (user interface),
and SAS/ACCESS for Windows, with all the popular features found in the licensed
SAS versions. This is an incredible opportunity for users, statisticians, data
analysts, scientists, programmers, students, and academics everywhere to use
(and learn) for career opportunities and advancement. Capabilities include
data manipulation, data management, comprehensive programming language,
powerful analytics, high quality graphics, world-renowned statistical analysis
capabilities, and many other exciting features.
This presentation discusses and illustrates the process of downloading and
configuring the SAS University Edition. Additional topics include the process
of downloading the required applications, “key” configuration strategies to
run the “SAS University Edition” on your computer, and the demonstration of
a few powerful features found in this exciting software bundle. We conclude
with a summary of tips for success in downloading, configuring and using the
SAS University Edition.
Posters
Maintaining a 'Look and Feel' throughout a Reporting Package Created with Diverse SAS® Products
Barbara Okerson
PO-44
SAS® provides a number of tools for creating customized professional
reports. While SAS provides point-and-click interfaces through products such as SAS®
Web Report Studio, SAS® Visual Analytics or even SAS® Enterprise Guide®,
unfortunately, many users do not have access to the high-end tools and require
customization beyond the SAS Enterprise Guide point-and-click interface. Fortunately,
Base SAS procedures such as the REPORT procedure, combined with
graphics procedures, macros, ODS, and Annotate can be used to create very
customized professional reports.
When combining different solutions such as SAS Statistical Graphics,
the REPORT procedure, ODS, and SAS/GRAPH®, different techniques need to be
used to keep the same look and feel throughout the report package. This
presentation looks at solutions that can be used to keep a consistent look and
feel in a report package created with different SAS products.
Assessing Health Behavior Indicators Among Florida Middle School Students
Elizabeth Stewart, Ivette A. Lopez and Charlotte Baker
PO-45
This pilot study sought to assess selected health behavior indicators of middle
school students attending a developmental research school at a Historically
Black College/University. Students in grades 6-8 completed a modified Youth
Risk Behavior Survey (YRBS). Study participants (n=48) answered questions
concerning unintentional injury and violence, sexual behaviors, alcohol and
drug use, tobacco use, dietary behaviors and physical activity. The majority of
students (89%) reported no use of tobacco, alcohol, or other drugs. Regarding
screen time, 50% of all students reported watching 3-5 hours of television on
an average school day and 68% of males reported playing video/computer games
for the same time span. Most males and females reported being about the right
weight, 69% and 60%, respectively, while 22% of girls reported being slightly
and very overweight, combined. Concerning physical activity, 81% of males
reported 4 hours or more of participation each week, compared to 50% of
females. The health behavior indicator with the greatest difference by sex among the
student population is physical activity, as males report greater activity
levels. This emphasizes the need for a segmented physical activity intervention
for the female student population and an intervention designed to increase
physical activity and decrease screen time among African American teens.
Using SAS to Examine Health-Promoting Lifestyle Activities of Upper Division Nursing Students at a Major University in the Southeastern United States
Abbas Tavakoli and Mary Boyd
PO-46
Health promotion is an important nursing intervention. Research has long
shown that one’s lifestyle affects health. A health-promoting lifestyle has been
described as a multi-dimensional pattern of self-initiated actions that
maintain or enhance one’s level of wellness, self-actualization, and
fulfillment. The purpose of this study was to evaluate the health
promotion/life style activities of upper division nursing students in a college
of nursing at a major university in the Southeastern United States. Specific aims of the
study were (1) to measure health-promoting life style activities of upper
division nursing students in a college of nursing in the Southeastern United
States; (2) to compare the health-promoting life styles of male and female
nursing students; (3) to compare the health-promoting life styles of Caucasian
students to students of other ethnic groups; and (4) to compare the
health-promoting life styles by marital status. This study used a
descriptive, comparative design to assess the health promotion/life style
activities of upper division nursing students in a college of nursing in the
Southeastern US. Women are often more involved in interpersonal relationships
than men and use and provide more social support than men. The results did not
reveal any significant differences in the total or subscale health-promoting
scores by gender. There were statistically significant differences between
white students and students of other races in terms of physical activity,
nutrition, and interpersonal relations. There was a significant difference in
physical activity by marital status. However, there were no statistically
significant differences for the other subscales by marital status.
Reporting Of Treatment Emergent Adverse Events Based On Pooled Data Analysis or Country Specific Submissions: A Case Study
Sheetal Shiralkar
PO-53
Often sponsors need to file regulatory submissions with different
country-specific regulatory authorities after they get approval from
the Food and Drug Administration. The key reporting aspect of country-specific
submissions pertaining to emerging markets involves accurate
reporting of adverse events from the clinical trials conducted for the
specific drug in those countries. For reporting of these adverse
events, we need to develop a robust algorithm and comprehensive system
architecture for efficient and accurate data representation.
Pooling of data from multiple studies is often the first step in
ensuring that adverse events from all the trials of the drug get
accurately reported. The data pooling specifications involve a lot of
conditioning and sub-setting of data based on reporting
specifications. This poster describes a case study of a typical data
pooling and reporting process of trial-level data available in the ADaM
model. The analysis also elaborates on details pertaining to
reporting requirements and on programming algorithms developed to meet
those requirements.
Automating Preliminary Data Cleaning in SAS
Alec Zhixiao Lin
PO-63
Preliminary data cleaning or scrubbing tries to delete the following types of
variables considered to be of little or no use: 1) variables with missing
values or a uniform value across all records; 2) variables with very low
coverage; 3) character variables for names, addresses, and IP addresses. These
variables are very commonly seen in big data. This paper introduces a SAS
process that will automatically and efficiently detect these variables with
minimal manual handling by users. The output also helps users to identify those character
variables that need to be converted to numeric values for downstream analytics.
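One way to automate part of this check is the NLEVELS output from PROC FREQ
(a sketch; the data set name and drop logic are illustrative and assume at
least one variable qualifies):
    proc freq data=indata nlevels;
       tables _all_ / noprint;
       ods output nlevels=var_levels;   /* one row per variable */
    run;
    proc sql noprint;
       select tablevar into :droplist separated by ' '
         from var_levels
        where nlevels <= 1;             /* all missing or a single value */
    quit;
    data cleaned;
       set indata(drop=&droplist);
    run;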
Using SAS to Examine Health Effects of Intimate Partner Violence among HIV+ Women
Abbas Tavakoli, Sabra Custer-Smith and Ni Yu-Min
PO-97
Intimate partner violence (IPV) is a recognized national public health issue
that includes physical abuse and unwanted or forced sexual contact by a
partner. Numerous studies have documented the negative health consequences of
IPV. There is evidence that IPV has a negative effect on the self-management
of HIV, which is now a chronic disease. The purpose of this study was to use
descriptive statistics and correlations to measure the prevalence of IPV and
the possible effects of IPV among HIV+ women. A convenience sample of 200 HIV+
women recruited at a Ryan White-funded clinic in Columbia, SC. The prevalence
of IPV was assessed using the Severity of Violence Against Women Scale (SVAWS). The SVAWS
is a 46-item Likert scale that assesses experiences with IPV over the
last 12 months. In addition to a summary score of total IPV, the SVAWS also
contains subcategories of types of IPV. Participants were also asked to report
their most recent HIV viral load in order to gauge the management of their HIV. Statistical
analysis included descriptive statistics and correlation
procedures. SAS 9.4 was used to analyze the data. The Spearman correlation was
used to examine the association between total levels of IPV, each subcategory
of IPV, and viral load. There were no significant positive linear relationships
between viral load and violence subscales. The Pearson correlation for
different subscales of violence and the HIV viral load ranged from -0.01 to 0.1.
Ouch, how did that get here? The pitfalls of merging ...
Nancy McGarry
PO-101
This poster is a short slide show presentation of the pitfalls of
merging. It highlights some common problems that trip up the unwary, looking at five common missteps
in merging data in hopes of preventing errors by making the audience more aware
of things that can go wrong during a DATA step merge.
A SAS Macro to Investigate Statistical Power in Meta-analysis
Jin Liu and Fan Pan
PO-109
Meta-analysis is a quantitative review method, which synthesizes the results of
individual studies on the same topic. Cohen’s d was selected as the effect
size index in the current study because it is widely used in practical
meta-analyses and there are few simulation studies investigating this index.
Statistical power is conceptually defined as the probability of detecting a
real, existing effect or difference. The current analytical power procedure
for meta-analysis involves approximations, and the accuracy of using the
procedure is uncertain.
Simulation can be used to calculate power in a more accurate way by avoiding
the approximations in the formulas. Simulation studies involve generating data from
computer programs to study the performance of the statistical estimates under
different conditions (Hutchinson and Bandalo, 1997). If there is no real
effect, researchers would hope to retain the null hypothesis. If there is a
real existing effect, researchers would hope to reject the null hypothesis to
increase statistical power. In each simulation step, a p-value is retained to
decide if the null hypothesis is rejected or retained. The proportion of the
rejected null hypotheses of all simulation steps is the simulated statistical
power when there is a non-zero treatment effect.
The purpose of the study is to inform meta-analysis practitioners of the degree
of discrepancy between the analytical power and the simulated power in the
meta-analysis framework (i.e., fixed and random effects models). A SAS macro
was developed to show researchers power under the following conditions: simulated power
in the fixed effects model, analytical power in the fixed effects model,
simulated power in the random effects model, and analytical power in the random
effects model. As long as researchers know the parameters that are needed in
their meta-analysis, they can run the SAS macro to receive the power values
they need in different conditions. Results indicate that the analytical power
was close to the simulated power, while some conditions in the random-effects
models had noticeable power discrepancies. This study will yield a better
understanding of statistical power in real meta-analyses.
Documentation as you go: aka Dropping Breadcrumbs
Elizabeth Axelrod
PO-114
Your project ended a year ago, and now you need to explain what you did, or
rerun some of your code. Can you retrace your steps? Find your data, programs,
printouts? Replicate the results? This poster presents tools and techniques
that follow best practices to help us, as
programmers, manage the flow of information from source data to a final
product.
Many Tables, Short Turnaround: Displaying Survey Results Quickly and Efficiently
Rebecca Fink, David Izrael, Sarah W. Ball and Sara M.A. Donahue
PO-120
When constructing multiple tables with a tight deadline, it is critical to have
a tool to quickly and efficiently create tables using a standard set of
parameters. We demonstrate a set of SAS® macros based on PROC SURVEYFREQ which
can be used to summarize survey data in tables that contain unweighted and
weighted counts and percentages, as well as a weighted treatment ratio, with
respective confidence intervals. These macros allow us to display both
single-select survey questions (i.e., survey questions with only one response
allowed, such as gender) and multiple choice survey questions that allow the
respondent to choose more than one response (for example, insurance status)
within the same table. Further, we can use these macros to sort the output by
the treatment ratio or by the column percent distribution in either ascending
or descending order. Finally, these macros have the ability to output the
results on a defined subset of the sample.
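The underlying call that such macros typically wrap looks something like the
following (the weight and variable names are assumptions):
    proc surveyfreq data=survey;
       weight samp_weight;
       tables treatment*insurance_status / row cl;
       ods output crosstabs=insurance_tab;  /* feeds the table-building macro */
    run;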
Using SAS® Macros to Split a Data Set into Several Output Files
Imelda Go
PO-127
Generating output files from various subsets of a master data set is a common
task. For example, a statewide master data set has records grouped by district
and the task requires splitting the master data into an output file per
district. In this example, PROC FREQ is used to generate a frequency
distribution of the district values in the master data set. A macro loop is
then executed for each district value in the frequency distribution. For each
loop, macro variables are assigned values based on the data per district as
specified in the frequency distribution. These values, which vary per district,
are used to give unique names to each output file. Another macro discussed
enables the programmer to assign the number of observations in a data set to a
macro variable. This macro variable is useful for determining how many loops
will be executed in the macro loop mentioned above, which eliminates hardcoding
the number of loops in the macro loop. The form of the output files may vary
depending on the programmer’s needs and is specified within the macro loop.
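A skeletal version of this driver-loop approach (the macro, data set, and
variable names are illustrative; it assumes a character BY variable whose
values form valid SAS names):
    %macro split_by(master=, byvar=);
       proc freq data=&master noprint;        /* distinct BY values */
          tables &byvar / out=_levels(keep=&byvar);
       run;
       data _null_;                           /* load values into macro vars */
          set _levels end=eof;
          call symputx(cats('val', _n_), &byvar);
          if eof then call symputx('nval', _n_);
       run;
       %do i = 1 %to &nval;                   /* one output data set per value */
          data out_&&val&i;
             set &master;
             where &byvar = "&&val&i";
          run;
       %end;
    %mend split_by;
    %split_by(master=statewide, byvar=district)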
SAS Macros for Constraining Arrays of Numbers
Charles Coleman
PO-131
Many applications require constraining arrays of numbers to controls in one or
two dimensions. Example applications include survey estimates, disclosure
avoidance, input-output tables, and population and other estimates and
projections. If the results are allowed to take on any nonnegative values,
raking (a.k.a. scaling) solves the problem in one dimension and two-way
iterative raking solves it in two dimensions. Each of these raking macros has
an option for the user to output a dataset containing the rakes. The problem is
more complicated in one dimension if the data can be of any sign, the so-called
“plus-minus” problem, as simple raking may produce unacceptable results. This problem is addressed
by generalized raking, which preserves the structure
of the data at the cost of a nonunique solution. Often, results are required
to be rounded so as to preserve the original totals. The Cox-Ernst algorithm
accomplishes an optimal controlled rounding in two dimensions. In
one dimension, the Greatest Mantissa algorithm is a simplified version of the
Cox-Ernst algorithm.
Each macro contains error control code. The macro variable &errorcode is made
available to the programmer to enable error trapping.
Time Series Analysis: U.S. Military Casualties in the Pacific Theater during World War Two
Rachael Becker
PO-133
This paper aims to show how statistical analysis can be used in the field of
History. The primary focus of this paper is to show how SAS® can be utilized to
obtain a Time Series Analysis of data regarding World War II. The hope of this
analysis is to test whether Truman's justification for the use of atomic
weapons was valid. Truman believed that by using the atomic weapons he would be
preventing unacceptable levels of U.S. casualties that would be incurred in the
course of a conventional invasion of the Japanese home islands.
Streamlining Medicaid Enrollment Reporting and Calculation of Enrollment Trends Using Macro and PROC Statements in SAS
Deniz Soyer
PO-153
Background: The Division of Analytics and Policy Research (DAPR) within the District of
Columbia Department of Health Care Finance produces a monthly Medicaid
enrollment report for the District’s Medical Care Advisory Committee (MCAC)
to document enrollment trends in the District’s various Fee-For-Service and
Managed Care programs. This retrospective report requires retrieving Medicaid
eligibility data for multiple monthly enrollment spans, organization of the
data by program type and month, and calculation of enrollment growth for
programs of interest. Previously, DAPR used Microsoft Excel to organize
multiple outputs of monthly enrollment data and perform calculations on trends.
To minimize the time spent manually producing recurring reports, DAPR sought to
develop a SAS program to streamline and automate its monthly MCAC report.
Methods: The use of %LET statements created macro variables to represent each monthly
enrollment span, which allowed for the formatting and transposing steps,
occurring later in the program, to automatically reference each month. Functions PUT and %SCAN were used to reference and redefine variable formats.
PROC FREQ was used to obtain total enrollment counts, by month, for each
Medicaid program. PROC TRANSPOSE was used to convert each month from an
observation into a variable, which allowed each row to display enrollment
for each Medicaid program type, while the columns served to classify enrollment
by month. DATA steps were used to perform calculations for enrollment trends,
such as growth.
Results: Using essential macro and PROC statements in SAS, DAPR was able to organize
data, calculate trends, and output a finalized report related to monthly
Medicaid program enrollment, thereby resulting in near-automation of its
reporting process.
Conclusion: Adopting the use of macro statements alongside data steps and PROC statements
in SAS enables greater automation and accuracy in periodic reporting. DAPR has
incorporated the combined use of macro and PROC statements in an effort to
streamline other recurring reports.
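A compressed sketch of the FREQ-then-TRANSPOSE steps described above (the data
set, variables, and month values are assumptions):
    proc freq data=enrollment noprint;
       tables program*month / out=counts(drop=percent);
    run;
    proc transpose data=counts out=wide(drop=_name_) prefix=m_;
       by program;
       id month;
       var count;
    run;
    data wide;
       set wide;
       growth_pct = 100 * (m_201506 - m_201505) / m_201505;  /* example months */
    run;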
PROC REG: A SAS Macro to Determine Cut Score for a Universal Pre-School Maladaptive Screener
Yin Burgess
PO-165
Linear Regression is a commonly and widely used statistical tool for
determining relationships (or lack thereof) of variables. In SAS, we use the
REG procedure, or PROC REG, which fits linear regression models using
least-squares, to carry out the analyses. We will focus on the very basics of
PROC REG and, needless to say, explanation of many features is beyond the scope
of this proposal. Using ordinary least-squares (OLS) estimates in statistical
analyses involves many assumptions, such as homoscedasticity, independence,
correct model, random sample (uncorrelated error terms), and normally
distributed error terms. We will avoid possible complications with these
assumptions for the purpose of this demonstration. We will use educational
research data from a preschool maladaptive behavior screener to demonstrate
how we use a SAS macro to yield a cut score that determines whether a child is
deemed to have maladaptive behavior.
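A bare-bones sketch of the regression-and-flag idea (the data set, predictors,
and the cut value of 60 are placeholders, not the study's values):
    proc reg data=screener plots=none;
       model total_score = age_months teacher_rating;
       output out=pred p=predicted;
    run;
    quit;
    data flagged;
       set pred;
       maladaptive_flag = (predicted >= 60);    /* placeholder cut score */
    run;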
Reporting and Information Visualization
How to Become the MacGyver of Data Visualizations
Tricia Aanderud
RIV-104
If you don't understand what makes a good data visualization, then chances are
you're doing it wrong. Many business people are given data to analyze and
present, yet they often don't understand how to present their ideas visually.
We are taught to think about data as numbers. We often fail to understand that
numbers show causes and help others reason through issues. In this paper, we
will review how data visualizations fail and what makes a good data
visualization work.
Bridging the Gap: Importing Health Indicators Warehouse data into SAS® Visual Analytics using Stored Processes and APIs
Li Hui Chen, Manuel Figallo and Josh McCullough
RIV-92
The National Center for Health Statistics’ Health Indicators Warehouse (HIW)
is part of the Department of Health and Human Services’ (DHHS) response to
the Open Government Initiative to make federal data more accessible to all
users. Through it, users can view and download data and metadata for over 1,200
indicators on health status, outcomes, and determinants from approximately 180
different federal and nonfederal sources. HIW also provides access to data
through the use of an Application Programming Interface (API). An API is a
communication interface that applications such as SAS® Visual Analytics
(SAS®VA) can use to access data from HIW and other data repositories. This
paper provides detailed information on how to access HIW data with SAS®VA
in order to produce easily understood health statistics visualizations with
minimal effort. It will guide readers through a process and methodology to
automate data access to the HIW and demonstrate the use of SAS®VA to present
health information via a web browser.
This paper also shows how to run SAS macros inside a stored process to generate
the API calls to the HIW in order to make the indicators and associated data
and metadata available in SAS®VA for exploration and reporting; the macro
codes are provided. Use cases are also explored in order to demonstrate the
value of using SAS®VA and stored processes to stream data directly from the
HIW API. Dashboards, for instance, are created to visually summarize results
gained from exploring the data.
Both IT professionals and population health analysts will benefit from
understanding how to import HIW data into SAS®VA using Stored Processes and
APIs. This paper ultimately provides a starting point for any organization
interested in using HIW data to augment their analysis related to population
health. Integrating HIW with SAS®VA can be very helpful to organizations that
want to streamline their data management processes and lower high maintenance
costs associated with data extraction and access while gaining insights into
health data. Analysts will also benefit from this paper through the use cases,
which demonstrate the value of population health data accessed through an API
with SAS®VA.
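As a rough sketch only (the endpoint URL and file references are placeholders, not the HIW macros described in the paper), a single API call from a stored process might be issued with PROC HTTP:
  filename resp temp;
  * Request indicator data from a (hypothetical) REST endpoint;
  proc http
     url="https://api.example.gov/hiw/indicators"
     method="GET"
     out=resp;
  run;
  * The returned JSON or XML would then be parsed and loaded for use in SAS Visual Analytics.;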
Take Your Data Analysis and Reporting to the Next Level by Combining SAS® Office Analytics, SAS® Visual Analytics, and SAS® Studio
Tim Beese
RIV-196
SAS® Office Analytics, SAS® Visual Analytics, and SAS® Studio provide
excellent data analysis and report generation. When these products are
combined, their deep interoperability enables you to take your analysis and
reporting to the next level. Build interactive reports in SAS® Visual
Analytics Designer, and then view, customize and comment on them from Microsoft
Office and SAS® Enterprise Guide®. Create stored processes in SAS Enterprise
Guide, and then run them in SAS Visual Analytics Designer, mobile tablets, or
SAS Studio. Run your SAS Studio tasks in SAS Enterprise Guide and Microsoft
Office using data provided by those applications. These interoperability
examples and more will enable you to combine and maximize the strength of each
of the applications. Learn more about this integration between these products
and what's coming in the future in this session.
How We Visualize Data and How to Apply Those Findings in SAS® Visual Analytics
Ryan Kumpfmiller
RIV-123
With data discovery tools becoming more useful and elaborate each year, the
capabilities of displaying data and designing reports have never been better. We have gotten
to the point where we can now create interfaces that end users
view and interact with. To get the most out of these capabilities, just like
with data, we need to know what is going on behind the scenes. Now that we are building interfaces with data discovery tools such as SAS®
Visual Analytics, it’s time to understand the way that we view data and
incorporate that research into how we build reports.
A Journey from data to dashboard: Visualizing the university instructional classroom utilization and diversity trends with SAS Visual Analytics
Shweta Doshi and Julie Davis
RIV-125
Transforming data into intelligence for effective decision-making support
depends critically on the Office of Institutional Research’s role and capacity
in managing the institution’s data. The presenters will share their journey
from providing spreadsheet data to developing SAS programs and dashboards using
SAS Visual Analytics. Experience gained and lessons learned will also be shared
in this session.
The presenters will:
- demonstrate two dashboards the IR office developed, one for classroom
utilization and one for the University’s diversity initiatives;
- describe the process the office took in getting the stakeholders
involved in determining the KPI and evaluating and providing feedback regarding
the dashboard; and
- share their experience gained and lessons learned in building the
dashboard.
Key Features in ODS Graphics for Efficient Clinical Graphing
Yuxin (Ellen) Jiang
RIV-174
High-quality effective graphs not only enhance understanding of the data but
also facilitate the regulators in the review and approval process. In recent
SAS releases, SAS has made significant progress toward more efficient graphing
in ODS Statistical Graphics (SG) procedures and Graph Template Language (GTL). A variety
of graphs can be quickly produced using convenient built-in options
in SG procedures. With graphical examples and comparison between SG procedures
and traditional SAS/GRAPH procedure in reporting clinical trial data, this
paper highlights several key features in ODS Graphics to efficiently produce
sophisticated statistical graphs with more flexible and dynamic control of
graphical presentation.
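For readers new to the SG procedures, a minimal example of the one-statement style the paper highlights (using a shipped sample data set rather than clinical data) is:
  proc sgplot data=sashelp.class;
     vbox height / category=sex;
     yaxis label="Height (in)";
  run;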
How to Make a Stunning State Map Using SAS/Graph® for Beginners
Sharon Avrunin-Becker
RIV-74
Making a map for the first time can be an overwhelming task if you are just
beginning to learn how to navigate your way through SAS/Graph. It can be
especially paralyzing when you are trying to narrow your map to a smaller scale
by identifying counties in a state. This paper will walk you through the steps
to getting started with your map and how to add ranges of colors and
annotations. It will also point out a few traps to avoid as you are designing
your programs and maps.
PROC RANK, PROC SQL, PROC FORMAT and PROC GMAP Team Up and a (Map) Legend is Born!
Christianna Williams and Louise Hadden
RIV-80
The task was to produce a figure legend that gave the quintile ranges of a
continuous measure corresponding to each color on a five-color choropleth US
map. Actually, we needed to produce the figures and associated legends for
several dozen maps for several dozen different continuous measures and time
periods as well as associated "alt-text" for compliance with Section 508…so,
the process needed to be automated. A method was devised using SAS(R) PROC RANK
to generate the quintiles, PROC SQL to get the data value ranges within each
quintile, and PROC FORMAT (with the CNTLIN= option) to generate and store the
legend labels. The resulting data files and format catalogs are then used to
generate both the maps (with legends) and associated "alt text". Then, these
processes were rolled into a macro to apply the method for the many different
maps and their legends. Each part of the method is quite simple – even
mundane – but together these techniques allowed us to standardize and
automate an otherwise very tedious process. The same basic strategy could be
used whenever one needs to dynamically generate data "buckets" but then keep
track of the bucket boundaries – whether for producing labels, map legends,
"alt-text", or so that future data can be benchmarked against the stored
categories.
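A condensed sketch of the idea, with hypothetical data set and variable names (mydata, measure), might look like the following; the authors' full macro also builds the maps and alt-text.
  * 1. Assign quintiles;
  proc rank data=mydata groups=5 out=ranked;
     var measure;
     ranks quintile;
  run;
  * 2. Get the data-value range within each quintile;
  proc sql;
     create table cutpts as
     select quintile, min(measure) as lo, max(measure) as hi
     from ranked
     group by quintile;
  quit;
  * 3. Turn the ranges into a format for the legend labels;
  data fmt;
     set cutpts;
     retain fmtname 'qlbl' type 'n';
     start = quintile;
     label = catx(' - ', put(lo, 8.1), put(hi, 8.1));
  run;
  proc format cntlin=fmt;
  run;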
Text Analytics Using JMP®
Melvin Alexander
RIV-31
JMP® version 11 introduced the Free Text Command in the Analyze > Consumer
Research > Categorical Platform under the “Multiple” tab. This utility
restricted users to just produce word frequency counts and create indicator
columns of the words that appeared in free-text comment columns. For more
extensive text mining, users must use other JMP® Scripting Language (JSL)
scripts, functions, and tools. This presentation will review different ways
that JMP® can parse and convert qualitative text data into quantified measures.
Text mining techniques covered in this presentation include forming
Term-Document Matrices (TDMs), applying singular value decomposition (SVD) to
identify the underlying dimensions that account for most of the information
found in documents and text, and clustering word groups that convey similar
topics or themes. Attendees should be able to use the methods for further
reporting and modeling.
The Last Axis Macro You'll Ever Need
Shane Rosanbalm
RIV-122
There are several good papers out there about automating the creation of SAS
axes (Dorothy E. Pugh 2000, Don Li 2003, Rick Edwards 2012). These papers are
written with classic SAS/GRAPH in mind. But, with the rise of ODS Graphics and
the corresponding much improved axes, one might reasonably ask the question,
"Do we even need axis macros anymore?" While I wholeheartedly agree that the
ODS Graphics axis defaults are much nicer than what we were used to getting out
of classic SAS/GRAPH, there are still situations in which we will not want to
leave control of axis ranges entirely up to SAS.
In this paper I take what I see as the best ideas from the above papers and
combine them into a bigger, faster, stronger axis macro. The default behavior
of the new macro is to mimic what ODS Graphics would give you. The new macro
also includes several optional parameters that allow the behavior to be
customized to fit a wide variety of specialty situations (multiple variables,
reference values, preferred number of tick marks, etc.). By the end of this
paper I hope you'll agree that this is indeed the last axis macro you'll ever
need!
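This is not the author's macro, but a bare-bones sketch of the underlying idea (compute a range, then hand it to an ODS Graphics axis) might be:
  %macro axisrange(data=, var=);
     %global axmin axmax;                    * make the range available after the macro ends;
     proc sql noprint;
        select min(&var), max(&var)
           into :axmin trimmed, :axmax trimmed
        from &data;
     quit;
  %mend axisrange;

  %axisrange(data=sashelp.class, var=height);
  proc sgplot data=sashelp.class;
     scatter x=age y=height;
     yaxis min=&axmin max=&axmax;            * override the default axis range;
  run;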
Design of Experiments (DOE) Using JMP® and SAS®
Charlie Shipp
RIV-115
JMP/SAS provides the best design of experiment software available. The DOE
team continues the tradition of providing state-of-the-art DOE support. In
addition to the full range of classical and modern design of experiment
approaches, JMP provides a template for Custom Design for specific
requirements. The other choices include: Screening Design; Response Surface
Design; Choice Design; Accelerated Life Test Design; Nonlinear Design; Space
Filling Design; Full Factorial Design; Taguchi Arrays; Mixture Design; and
Augmented Design. Further, sample size and power plots are available. We give an
introduction to these methods followed by important examples with factors.
Creating Geographic Rating Area Maps: How to Combine Counties, Split Counties, and use Zip Code Boundaries
Rick Andrews
RIV-204
SAS/GRAPH® will be used to create choropleth maps that identify the geographic
rating areas implemented by the Affordable Care Act (ACA). The default areas
for each state are Metropolitan Statistical Areas (MSAs) plus the remainder of
the State that is not included in a MSA. States may seek approval to base the
rating areas on counties or three-digit zip codes, which requires that counties
be combined in some states and split in two in others. For the states
that use
zip codes to identify the areas, ZIP code tabulation area (ZCTA) files from the
U.S. Census Bureau that are in ESRI shapefile format (.shp) are used. Also
demonstrated will be the utilization of the annotate facility to identify each
area and place major cities on the maps.
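As a hedged sketch (the file path, map ID variable, and response data set below are placeholders), the shapefile-to-map flow might look like:
  * Import the ZCTA boundaries from an ESRI shapefile;
  proc mapimport datafile="C:\shapefiles\cb_2014_us_zcta510_500k.shp" out=zcta;
  run;
  * Draw the rating areas as a choropleth;
  proc gmap data=rating_areas map=zcta;
     id zcta5ce10;
     choro rating_area / discrete;
  run;
  quit;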
Layout the Grid and You Control the Document: ODS Meets OOP and You Reap the Benefits
Daniel Ralyea and Karen Price
RIV-25
ODS is a journey not just a destination. You can exhibit a fine degree of
control over your output once you understand the basic structures involved. Most of us
are familiar with opening and closing an ODS destination (and it is good!). Exploring the
construction of a destination allows a greater understanding of
the power at your fingertips. ODS Layout provides a guiding structure for the
information canvas. ODS Region provides control of a subset of the canvas and
Object Oriented Programming allows cell by cell control of a custom built
table. These tools, combined with the flexibility inherent in the output
destination, allow a wide variety of production possibilities.
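A minimal sketch of the ODS LAYOUT / ODS REGION structure described here (SAS 9.4 syntax) is:
  ods pdf file="layout_demo.pdf";
  ods layout gridded columns=2;
  ods region;                                * first cell of the canvas;
  proc print data=sashelp.class(obs=5); run;
  ods region;                                * second cell of the canvas;
  proc sgplot data=sashelp.class;
     scatter x=age y=height;
  run;
  ods layout end;
  ods pdf close;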
From SAS Data to Interactive Web Graphics Built Through PROC JSON
Robert Seffrin
RIV-128
The National Agricultural Statistics Service (NASS) publishes extensive data
covering the breadth of agriculture in the United States. To make this data
more accessible to the public, NASS is exploring new and dynamic visualizations
through the web. JavaScript has become a standard for displaying and
interacting with this type of data. Developing charts from scratch has a steep
learning curve requiring skill in JavaScript, HTML, and cascading style sheets. Many
JavaScript visualization libraries assist with various aspects of
charting, but a library called Vega greatly reduces the need for programming by
defining chart parameters through a declarative grammar formatted as JSON
(JavaScript Object Notation). While this eliminates most, if not all, of the
JavaScript programming, the JSON declarations can be complex, with multiple
nested levels.
The new PROC JSON, accessed through the SAS University Edition,
greatly simplifies the creation of a JSON file defining an interactive
scatterplot matrix where a selection in one subplot will appear in all other
subplots. Charting parameters will be stored in an easy to edit Excel file
which SAS will read and use to build a JSON file with data set specific
variable names. Creating interactive web charts from SAS data is as simple as
updating some parameters and building the JSON file.
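A minimal PROC JSON call (the output file name is a placeholder, and the Vega chart specification that wraps the data is not shown) looks like:
  proc json out="chart_data.json" pretty;
     export sashelp.iris / nosastags;        * write the data portion of the spec;
  run;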
Statistics and Data Analysis
MIXED_RELIABILITY: A SAS Macro for Estimating Lambda and Assessing the Trustworthiness of Random Effects in Multilevel Models
Jason Schoeneberger and Bethany Bell
SD-189
When estimating multilevel models (also called hierarchical models, mixed
models, and random effect models), researchers are often interested not only in
the regression coefficients but also in the fit of the overall model to the
data (e.g., -2LL, AIC, BIC). Whereas both model fit and regression coefficient
estimates are important to examine when estimating multilevel models, the
reliability of the multilevel model random effects, lambda, should also be
examined. However, neither PROC MIXED nor PROC GLIMMIX produces estimates of
lambda, the statistic often used to represent reliability. As a result, this
important metric is often not examined by researchers who estimate their
multilevel models in SAS. The macro presented in this paper will provide
analysts estimating multilevel models with a readily-available method for
generating reliability estimates within SAS PROC MIXED.
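The macro itself is the authors'; as a hedged sketch, the raw ingredients that a lambda estimate is commonly built from (here with placeholder data set and variable names) can be obtained as follows:
  proc mixed data=twolevel covtest;
     class school;
     model score = / solution;
     random intercept / subject=school;
     ods output covparms=cp;                 * tau00 (intercept) and sigma2 (residual);
  run;
  * One common formulation of reliability for group j with n_j observations is
    lambda_j = tau00 / (tau00 + sigma2 / n_j).;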
Testing the Gateway Hypothesis from Waterpipe to Cigarette Smoking among Youth Using Dichotomous Grouped-Time Survival Analysis (DGTSA) with Shared frailty in SAS®
Rana Jaber
SD-56
Dichotomous grouped-time survival analysis is a combination of the grouped-Cox
model (D'Agostino et al., 1990), the discrete time-hazard model (Singer and
Willett, 1993), and the dichotomous approach (Hedeker et al., 2000). Items
measured from wave 1 through wave 4 were used as time-dependent covariates
linking the predictors to the risk of waterpipe smoking progression at the
subsequent student interview. This analysis allows for maximum data use,
inclusion of time-dependent covariates, and relaxation of the proportional
hazards assumption, and it takes into consideration the interval-censored
nature of the data (i.e., the event occurred during a certain known interval,
such as one year, but the exact time at which it occurred cannot be specified).
The aim of this paper is to provide a new method of analyzing panel data where
the outcome is binary, with some explanation of the SAS® code. Examples of
using the PROC PHREG procedure are drawn from data that were recently published
in the International Journal of Tuberculosis and Lung Disease (IJTLD).
Optimizing Pilot Connection Time Using PROC REG and PROC LOGISTIC
Andrew Hummel and Shevawn Christian
SD-60
As any airline traveler knows, connection time is a key element of the travel
experience. A tight connection time can cause angst and concern, while a
lengthy connection time can introduce boredom and a longer than desired travel
time. The same elements apply when constructing schedules for airline pilots. Like passenger itineraries, pilot schedules are built around connections. Delta Air Lines
operates a hub and spoke system that feeds both passengers and pilots from the
spoke stations and connects them through the hub stations. Pilot connection
times that are tight can result in operational disruptions whereas extended
pilot connection times are inefficient and unnecessarily costly. This paper
will demonstrate how Delta Air Lines utilized SAS® PROC REG to analyze
historical data in order to build operationally robust and financially
responsible pilot connections.
Regression Analysis of the Levels of Chlorine in the Public Water Supply in Orange County, FL
Drew Doyle
SD-185
Public water supplies contain disease-causing microorganisms in the water or
transport ducts. In order to kill off these pathogens, a disinfectant, such as
chlorine, is added to the water. Chlorine is the most widely used disinfectant
in all U.S. water treatment facilities. Chlorine is known to be one of the most
powerful disinfectants to restrict harmful pathogens from reaching the
consumer. In the interest of obtaining a better understanding of what variables
affect the levels of chlorine in the water, this thesis will analyze a
particular set of water samples randomly collected from locations in Orange
County, Florida. Thirty water samples will be collected and have their chlorine
level, temperature, and pH recorded. The chlorine levels will be read by a
LaMotte Model DC1100 Colorimeter and will output the amount of chlorine in
parts per million (ppm). This colorimeter will read the total chlorine of the
sample, including both free and combined chlorine levels. A linear regression
analysis will be performed on the data collected with several qualitative and
quantitative variables. Water age, temperature, time of day, location, pH, and
dissolved oxygen level will be the independent variables collected from each
water sample. All data collected will be analyzed through various Statistical
Analysis System (SAS) procedures. Partial residual plots will be used to
determine possible relationships between the chlorine level and the independent
variables, and stepwise selection will be used to eliminate possibly
insignificant predictors. From there, several possible models for the data will
be selected, and F tests will be conducted to determine which of the models appears to be the
most useful. All tests will include hypotheses, test statistics, p values, and
conclusions. There will also be an analysis of the residual plot, jackknife
residuals, leverage values, Cook’s D, PRESS statistic, and normal probability
plot of the residuals. Possible outliers will be investigated and the critical
values for flagged observations will be stated along with what problems the
flagged values indicate. A nonparametric regression analysis can be performed
for further research of the existing data.
Alternative Methods of Regression When OLS Is Not the Right Choice
Peter Flom
SD-27
Ordinary least squares (OLS) regression is one of the most widely used
statistical methods. However, it is a parametric model and relies on
assumptions that are often not met. Alternative methods of regression for
continuous dependent variables relax these assumptions in various ways. This
paper will explore PROCs such as QUANTREG, ADAPTIVEREG, and TRANSREG for these data.
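As a small illustration, median (quantile) regression, one of the alternatives named above, can be requested as follows:
  proc quantreg data=sashelp.class;
     model weight = height / quantile=0.5;   * median regression instead of OLS;
  run;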
An Intermediate Guide to Estimating Multilevel Models for Categorical Data using SAS® PROC GLIMMIX
Whitney Smiley, Zhaoxia Guo, Mihaela Ene, Genine Blue, Elizabeth Leighton and Bethany Bell
SD-173
This paper expands upon Ene et al.’s (2015) SAS Global Forum proceeding paper
“Multilevel Models for Categorical Data using SAS® PROC GLIMMIX: The
Basics” in which the authors presented an overview of estimating two-level
models with non-normal outcomes via PROC GLIMMIX. In their paper, the authors
focused on how to use GLIMMIX to estimate two-level organizational models;
however, they did not address more complex organizational models (e.g.,
three-level models) or models used to estimate longitudinal data. Hence the
need for the current paper: building from the examples in Ene et al. (2015),
the current paper presents users with detailed discussions and illustrations of
how to use GLIMMIX to estimate organizational models in situations with three
levels of data, as well as two-level longitudinal data. Consistent with Ene et
al.’s paper, we will present the syntax and interpretation of the estimates
using a model with a dichotomous outcome as well as a model with a polytomous
outcome. Concrete examples will be used to illustrate how PROC GLIMMIX can be
used to estimate these models and how key pieces of the output can be used to
answer corresponding research questions.
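As a hedged sketch only (the data set and variable names are placeholders, not the authors' examples), a three-level model with a dichotomous outcome might be specified as:
  proc glimmix data=threelevel method=laplace;
     class school classroom;
     model pass(event='1') = ses hours / dist=binary link=logit solution;
     random intercept / subject=school;                  * level-3 units;
     random intercept / subject=classroom(school);       * level-2 units nested in level 3;
  run;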
Adaptive Fractional Polynomial Modeling in SAS®
George Knafl
SD-65
Regression predictors are usually entered into a model without
transformation. However, it is not unusual for regression relationships to be distinctly
nonlinear. Fractional polynomials account for nonlinearity through real-valued
power transformations of primary predictors. Adaptive methods have been
developed for searching through alternative fractional polynomials based on one
or more primary predictors. A SAS macro called genreg (for general regression)
is available from the author for conducting such analyses. It supports adaptive
linear, logistic, and Poisson regression modeling of expected values and/or
variances/dispersions in terms of fractional polynomials. Fractional polynomial
models are compared using k-fold likelihood cross-validation scores and
adaptively selected through heuristic search. The genreg macro supports
adaptive modeling of both univariate and multivariate outcomes. It also
supports adaptive moderation analyses based on geometric combinations, that is,
products of transforms of primary predictors with possibly different powers,
generalizing power transforms of interactions. Example analyses and code for
conducting them are presented demonstrating adaptive fractional polynomial
modeling.
High-Performance Procedures in SAS 9.4: Comparing Performance of HP and Legacy Procedures
Jessica Montgomery, Sean Joo, Anh Kellerman, Jeffrey Kromrey, Diep Nguyen, Thanh Pham, Patricia Rodriguez de Gil and Yan Wang
SD-180
The growing popularity of big data coupled with increases in computing
capabilities has led to the development of new SAS procedures designed to more
effectively and efficiently complete such tasks. Although there is a great deal
of documentation regarding how to use these new high-performance (HP)
procedures, relatively little has been disseminated regarding the specific
conditions under which users can expect performance improvements. This paper
serves as a practical guide to getting started with HP procedures in SAS. The
paper will describe the differences that exist between key HP procedures
(HPGENSELECT, HPLMIXED, HPLOGISTIC, HPNLMOD, HPREG, HPCORR, HPIMPUTE, HPSAMPLE,
and HPSUMMARY) and their legacy counterparts, both in terms of capability and
performance, with a particular focus on discrepancies in the real time required to execute. Simulation
will be used to generate data sets that vary on number of
observations (10,000, 50,000, 100,000, 500,000, 1,000,000, and 10,000,000) and
number of variables (50, 100, 500, 1000) to create these comparisons.
Keywords: HPGENSELECT, HPLMIXED, HPLOGISTIC, HPNLMOD, HPREG, HPCORR, HPIMPUTE,
HPSAMPLE, HPSUMMARY, high-performance analytics procedures
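As a simple illustration of the kind of like-for-like comparison reported (the data set and variables are placeholders), the same model can be timed in a legacy procedure and its HP counterpart:
  proc logistic data=big;
     model y(event='1') = x1-x10;
  run;
  proc hplogistic data=big;
     model y(event='1') = x1-x10;
  run;
  * The NOTE lines in the log report the real time used by each step.;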
Probability Density for Repeated Events
Bruce Lund
SD-111
In customer relationship management (CRM) or consumer finance it is important
to predict the time of repeated events. These repeated events might be a
purchase, service visit, or late payment. Specifically, the goal is to find the
probability density for the time to first event, the probability density for
the time to second event, etc.
Two approaches are presented and contrasted. One approach uses discrete time hazard modeling. The second, a distinctly
different approach, uses multinomial logistic regression. The performances of
the two methods are evaluated using a simulation study.
A SAS Macro for Improved Correlation Coefficient Inference
Stephen Looney
SD-139
We present a SAS macro for improved statistical inference for measures of
association, including Pearson's correlation, Spearman's coefficient, and
Kendall's coefficient. While PROC CORR is a powerful tool for calculating and
testing these coefficients, some analyses are lacking. For example, PROC CORR
does not incorporate recent theoretical improvements in confidence interval
estimation for Spearman's rho, nor does it provide any confidence interval at
all for Kendall's tau-b. We have written a SAS macro that incorporates these
new developments; it produces confidence intervals, as well as p-values for
testing any null value of these coefficients. Improved sample-size calculations
for all three coefficients are also provided in the macro.
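For context, the options PROC CORR does offer (Fisher-based confidence limits for Pearson and Spearman, but none for Kendall's tau-b) can be requested as follows:
  proc corr data=sashelp.fish pearson spearman kendall fisher(biasadj=no);
     var height width;
  run;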
How Latent Structure Analyses Can Improve the Fit of a Regression Model
Deanna Schreiber-Gregory
SD-191
The current study looks at several ways to investigate latent variables in
longitudinal surveys and their use in regression models. Three different
analyses for latent variable discovery will be briefly reviewed and explored. The latent
analysis procedures explored in this paper are PROC LCA, PROC LTA,
PROC CATMOD, PROC FACTOR, and PROC TRAJ. The analyses defined through these
procedures are latent profile analyses, latent class analyses, and latent
transition analyses. The latent variables will then be included in separate
regression models. The effect of the latent variables on the fit and use of the
regression model compared to a similar model using observed data will be
briefly reviewed. The data used for this study was obtained via the National
Longitudinal Study of Adolescent Health, a study distributed and collected by
Add Health. Data was analyzed using SAS 9.3. This paper is intended for any
level of SAS user. This paper is also written to an audience with a background
in behavioral science and/or statistics.
A Demonstration of SAS Analytics in Teradata
Tho Nguyen and William E Benjamin Jr
SD-203
SAS analytics in Teradata refers to the integration of advanced analytics into
the data warehouse. With this capability, analytic processing is optimized to
run where the data reside, in parallel, without having to copy or move the data
for analysis. Many analytical computing solutions and large databases use this
technology because it provides significant performance improvements over more
traditional methods. Come see how SAS Analytics in Teradata works and learn
some of the best practices demonstrated in this session.
Confidence Intervals for Binomial Proportion Using SAS: The All You Need to Know and No More…
Jiangtang Hu
SD-103
Confidence intervals (CIs) are extremely important in presenting clinical
reports. Choosing the right CI algorithm is the statisticians’ responsibility,
but this paper is for SAS programmers: more than 15 methods to compute a CI for
a single proportion are presented with SAS code, using either SAS procedures or
customized code.
The code is currently hosted on my GitHub page:
https://raw.githubusercontent.com/Jiangtang/Programming-SAS/master/CI_Single_Proportion.sas
Some commentary from a SAS programmer’s point of view will also be
presented.
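Several of these intervals are already available directly in PROC FREQ; a hedged example with a placeholder data set and variable:
  proc freq data=trial;
     tables response / binomial(wilson exact jeffreys) alpha=0.05;
  run;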
Introducing Two-Way and Three-Way Interactions into the Cox Proportional Hazards Model Using SAS®
Seungyoung Hwang
SD-39
The Cox proportional hazards model is by far the most popular and powerful
statistical technique for exploring the effect of explanatory variables on
survival. It is used throughout a wide variety of clinical studies. However,
special techniques are required when multiple interaction terms are
introduced into the Cox model. This paper provides an in-depth analysis, with
some explanation of the SAS® code, of how two-way and three-way
interaction terms are introduced into the Cox proportional hazards model using SAS. Examples
of using the PHREG procedure are drawn from clinical data that we recently
submitted to the Journal of American Geriatrics Society (JAGS).
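A hedged sketch of the syntax involved (variable names are placeholders, not the clinical data), with two-way and three-way terms written directly in the MODEL statement:
  proc phreg data=cohort;
     class trt(ref='0') sex(ref='F') / param=ref;
     model time*death(0) = trt sex age trt*sex trt*age sex*age trt*sex*age;
     hazardratio 'Treatment' trt / at(sex=all age=75);   * effect of trt at specific covariate settings;
  run;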
A SAS Algorithm for Imputing Discrete Missing Outcomes Based on Minimum Distance
Macaulay Okwuokenye and Karl E. Peace
SD-113
Missing outcome data are encountered in many clinical trials and public health
studies and present challenges in imputation. We present a simple and easy to
use SAS based imputation method for missing discrete outcome data. The method
is based on minimum distance between baseline covariates of those with missing
data and those without missing data. The imputation algorithm, which may be
viewed as a variant of the hot-deck imputation method, imputes missing values
that are “close to” the observed values, the implication being that, had the
missing data been observed, they would have been similar to the non-missing
data. An illustrative example will be presented.
A Macro for Calculating Percentiles on Left Censored Environmental Data using the Kaplan-Meier Method
Dennis Beal
SD-160
Calculating percentiles such as the median and quartiles is straightforward
when the data values are known. However, environmental data often are reported
from the analytical laboratory as left censored, meaning the actual
concentration for a given contaminant was not detected above the method
detection limit. Therefore, the true concentration is known only to be between
0 and the reporting limit. The nonparametric Kaplan-Meier product limit
estimator has been widely used in survival analysis on right censored data, but
recently this method has also been applied to left censored data. Kaplan-Meier
can be used on censored data with multiple reporting limits with minimal
assumptions. This paper presents a SAS® macro that calculates percentiles
such as the median of a left censored environmental data set using the
nonparametric Kaplan-Meier method. Kaplan-Meier has been shown to provide more
robust estimates of the mean, standard deviation and percentiles of left
censored data than other methods such as simple substitution and maximum
likelihood estimates. This paper is for intermediate SAS users of SAS/BASE.
Using SAS to Create an Effect Size Resampling Distribution for a Statistical Test
Peter Wludyka
SD-159
One starts with data to perform a statistical test of a hypothesis. An effect
size is associated with a particular test/sample and this effect size can be
used to decide whether there is a clinically/operationally significant effect. Since
the data (the sample) is usually all a researcher knows factually
regarding the phenomenon under study, one can imagine that, by sampling
(resampling) with replacement from that original data, additional
information about the hypothesis and phenomenon under study can be acquired.
One way to acquire such information is to repeatedly resample from the original data set
(using, for example, PROC SURVEYSELECT) and at each iteration (replication of
the data set) perform the statistical test of interest and calculate the
corresponding effect size. At the end of this stage one has R effect sizes (R
is typically greater than 1,000), one for each performance of the statistical
test. This effect size distribution can be presented in a histogram. Uses for
this distribution and its relation to the p-value resampling distribution which
was presented at SESUG 2014 will be explored.
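A common way to build the R resampled data sets with PROC SURVEYSELECT (variable names are placeholders) is:
  proc surveyselect data=original out=boot seed=12345
        method=urs samprate=1 outhits reps=1000;    * 1000 with-replacement copies;
  run;
  * Each replicate can then be analyzed with BY processing, for example;
  proc ttest data=boot;
     by replicate;
     class group;
     var response;
     ods output ttests=tt;                           * one test result per replicate;
  run;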
How to be a Data Scientist with SAS(r)
Chuck Kincaid
SD-100
The role of the Data Scientist is the viral job description of the decade. And
like LOLcats, there are many types of Data Scientists. What is this new role? Who is hiring
them? What do they do? What skills are required to do their job? What does this mean for the SAS programmer and the statistician? Are they
obsolete? And finally, if I am a SAS user, how can I become a Data Scientist? Come learn about this “job of the future” and what you can do to be part of it.
Power and Sample Size Computations
John Castelloe
SD-200
Power determination and sample size computations are an important aspect of
study planning and help produce studies with useful results for minimum
resources. This tutorial reviews basic methodology for power and sample size
computations for a number of analyses including proportion tests, t tests,
confidence intervals, equivalence and noninferiority tests, survival analyses,
correlation, regression, ANOVA, and more complex linear models. The tutorial
illustrates these methods with numerous examples using the POWER and GLMPOWER
procedures in SAS/STAT® software as well as the Power and Sample Size
Application. Learn how to compute power and sample size, perform sensitivity
analyses for other factors such as variability and type I error rate, and
produce customized tables, graphs, and narratives. Special attention will be
given to the newer power and sample size analysis features in SAS/STAT software
for logistic regression and the Wilcoxon-Mann-Whitney (rank-sum) test.
Prior exposure to power and sample size computations is assumed.
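For example, the required group size for a two-sample t test can be solved for directly:
  proc power;
     twosamplemeans test=diff
        meandiff  = 5
        stddev    = 12
        power     = 0.9
        npergroup = .;                 * solve for the per-group sample size;
  run;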
Where Did My Students Go?
Stephanie Thompson
SD-136
Many freshmen leave their first college and go on to attend another
institution. Some of these students are even successful in earning degrees
elsewhere. As there is more focus on college graduation rates, this paper shows
how the power of SAS® can pull in data from many disparate sources, including
the National Student Clearinghouse, to answer questions on the minds of many
institutional researchers. How do we use the data to answer questions such as
“What would my graduation rate be if these students graduated at my
institution instead of at another one?", “What types of schools do students
leave to attend?”, and “Are there certain characteristics of students who
leave, and are they concentrated in certain programs?” The data-handling
capabilities of SAS are perfect for this type of analysis, and this
presentation walks you through the process.
Comparing Results from Cox Proportional Hazards Models using SUDAAN® and SAS® Survey Procedures to a Logistic Regression Model for Analysis of Influenza Vaccination Coverage
Yusheng Zhai, Katherine Kahn, Alissa O’Halloran and Tammy Santibanez
SD-42
The National Immunization Survey-Flu (NIS-Flu) is an ongoing, national
telephone survey of households with children in the United States used to
measure influenza vaccination coverage. The data collected by NIS-Flu has
similarities to data typically analyzed using survival analytic procedures. Estimates of vaccination coverage from the
NIS-Flu survey are calculated using
Kaplan-Meier survival analysis procedures to account for censoring of the data. However, multivariable models to examine socio-demographic characteristics associated with receipt of influenza vaccination using NIS-Flu data have
typically been done using logistic regression rather than using survival
analytic methods. The logistic regression approach ignores the time-to-event
and censoring characteristics of the influenza data and assumes that censoring
throughout the survey period occurs equally among the comparison groups of
interest. If this assumption is untrue, measures of association for receipt of
influenza vaccine could be biased. Another approach used to address the
censoring issues in NIS-Flu data is to restrict the logistic regression
analysis to interviews conducted at the end of the vaccination period (i.e.,
March-June) when it is unlikely that many respondents would be vaccinated after
the time of interview. However, this approach ignores a large amount of data,
results in a reduced precision of estimates, and potentially exacerbates recall
bias.
The project assessed the feasibility, methods, and advantages of using a Cox
proportional hazards model as opposed to a logistic regression model using full
NIS-Flu 2013-14 season data and a logistic regression model using end of
vaccination period data. This project also compared the results of Cox
proportional hazards model from SUDAAN SURVIVAL and from SAS SURVEYPHREG
procedures.
The results from the logistic model seem to slightly underestimate the
associations between vaccination status and demographic characteristics, yet
the logistic model remains a reasonable alternative to the Cox proportional
hazards model in analyzing the NIS-Flu data. The SAS SURVEYPHREG and SUDAAN
SURVIVAL procedures produced nearly identical Cox proportional hazards model results.
Conclusions drawn based on the results from logistic regression and Cox proportional
hazards models using full or post-vaccination period NIS-Flu data are
comparable.
An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies
Yan Wang, Seang-Hwane Joo, Patricia Rodriguez de Gil, Jeffrey Kromrey, Rheta E. Lanehart, Eun Sook Kim, Jessica Montgomery, Reginald Lee, Chunhua Cao and Shetay Ashford
SD-177
Missing data are a common and significant problem that researchers and data
analysts encounter in applied research. Because most statistical procedures
require complete data, missing data can substantially affect the analysis and
the interpretation of results if left untreated. Methods to treat missing data
have been developed so that missing values are imputed and analyses can be
conducted using standard statistical procedures. Among these missing data
methods, Multiple Imputation has received considerable attention and its
effectiveness has been explored, for example, in the context of survey and
longitudinal research. This paper compares four Multiple Imputation approaches
for treating missing continuous covariate data under MCAR, MAR, and MNAR
assumptions in the context of propensity score analysis and observational
studies. The comparison of four Multiple Imputation approaches in terms of bias
and variability in parameter estimates, Type I error rates, and statistical
power is presented. In addition, complete case analysis (listwise deletion) is
presented as the default analysis that would be conducted if missing data are
not treated. Issues are discussed, and conclusions and recommendations are
provided.
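As a generic sketch of the multiple-imputation workflow (not the specific approaches compared in the paper; data set and variable names are placeholders):
  proc mi data=obs_study nimpute=20 seed=2015 out=imputed;
     var x1 x2 x3 treat y;
  run;
  proc reg data=imputed outest=est covout noprint;
     model y = x1 x2 x3 treat;
     by _imputation_;
  run;
  quit;
  proc mianalyze data=est;
     modeleffects intercept x1 x2 x3 treat;
  run;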
Can Fit Indices Yielded from the SAS GLIMMIX Procedure Select the Accurate Q-matrix?
Yan Wang, Yi-Hsin Chen, Issac Y. Li and Chunhua Cao
SD-179
In educational diagnostic assessments, it is not uncommon to develop several
competing Q-matrices that specify item-and-attribute relations and select the
best fit Q-matrix among them to make valid inferences about students’
strengths and weaknesses of cognitive attributes. Thus, selecting an accurate
Q-matrix plays a crucial role in making valid inferences in diagnostic
analyses. This study is intended to examine the effectiveness of fit indices
yielded from the SAS GLIMMIX procedure for selecting the accurate Q-matrix
using the cross random effects linear logistic test model (CRE-LLTM). A
simulation study is designed and five fit indices (i.e., log likelihood, AIC,
AICs, BIC, and HQIC) are examined. Five design factors are manipulated,
including sample size (50, 250, and 500), population distribution of cognitive
attributes (normal, positively skewed, and negatively skewed), percentage
(2.4%, 4.8%, and 9.6%) and type (over, under, and balanced) of the Q-matrix
misspecification as well as the Q-matrix density (sparse and dense). The number
of items is fixed to be 21 with 8 attributes. Datasets are simulated using the
SAS/IML package. For each condition, 1000 replications are generated. The
accuracy of selection is computed as the proportion of replications that select
the true Q-matrix as indicated by smaller values of fit indices. In addition,
factorial ANOVA analyses with the generalized eta-squared effect size are
employed to examine the impact of the manipulated factors on selecting the true
Q-matrix. The results indicate that the overall performance of the five fit
indices is similar. When sample size increases (e.g., N=500) it is relatively
easier for all indices to select the true Q-matrix. Not surprisingly, when the
misspecification percentage is larger (e.g., 9.6%), fit indices can be more
accurate in selecting the true Q-matrix regardless of the Q-matrix density,
misspecification type, and sample size. These fit indices seem to be more
sensitive to misspecification of sparse Q-matrices than to the dense
Q-matrices. They are also more sensitive to type of the Q-matrix
misspecification than percentage of misspecification.
Integrating PROC REG and PROC LOGISTIC for Collinearity Examination, Sample Scoring and Model Evaluation
Alec Zhixiao Lin
SD-69
At the final stage of regression, a modeler needs to examine the
multicollinearity between model attributes, to score all sample files and to
evaluate model performance. Existing options in PROC LOGISTIC and PROC REG are
somewhat different for obtaining variance inflation factor (VIF), conditional
index as well as for scoring sample files. This paper provides an efficient
and foolproof process in SAS® that integrates those functionalities with
minimal manual handling. Multiple standardized summaries from the SAS
output also provide valuable insights that can be shared with business peers.
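A hedged sketch of the two pieces the paper ties together (data set and variable names are placeholders): collinearity diagnostics from PROC REG, and scoring from PROC LOGISTIC.
  * Variance inflation factors and condition indices;
  proc reg data=dev;
     model bad = x1-x8 / vif collin;
  run;
  quit;
  * Fit the logistic model, store it, and score another sample file;
  proc logistic data=dev outmodel=mod;
     model bad(event='1') = x1-x8;
  run;
  proc logistic inmodel=mod;
     score data=holdout out=scored;
  run;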
Behavioral Trajectories: An Analysis of Growth Patterns and Predictors of Changes from Preschool to First Grade
Jin Liu, Fan Pan, Yin Burgess and Christine DiStefano
SD-110
The purpose of this study is to investigate the behavioral trajectories of
children from preschool to First Grade and how the behavioral changes relate to
children’s demographic information. We track 233 children’s changes over three
years, from preschool to First Grade, from fall 2011 to spring 2014 (6 time
points). Participants are 233 children from Grades
One and Two. PROC MIXED in SAS® 9.4 is used for data analysis. Linear mixed
models are selected to investigate the personal variation in behavioral
changes. Results indicate that children’s externalizing problems are stable
over time, while children’s internalizing problems decrease over time. Children’s
adaptive skills increase at the beginning but decrease at the end
of First Grade. Boys and children with free/reduced-price lunch status show more
problems at the beginning of preschool. We also find significant gender
and English language status differences in adaptive skills over time. The
information can assist teachers, school psychologists, and others who are
concerned with children’s behavioral and emotional health.