SESUG 2015 Conference Abstracts

Application Development

Integrating Microsoft VBScript and SAS
Christopher Johnson
AD-9

VBScript and SAS are both powerful tools in their own right.  These two technologies can be combined so that SAS code can call a VBScript program or vice versa.  This gives a programmer the ability to automate SAS tasks, traverse the file system, send emails programmatically, manipulate Microsoft® Word, Excel, and PowerPoint files, get web data, and more.  This paper will present example code to demonstrate each of these capabilities.
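
As a minimal sketch of one direction of this integration (assuming a Windows SAS session with XCMD enabled; the script path is hypothetical), SAS can launch a VBScript program with the X statement:

   options noxwait xsync;   /* wait for the shell command to finish      */
   x 'cscript //nologo "C:\scripts\send_mail.vbs"';   /* hypothetical path */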


One SAS To Rule Them All…
William Zupko
AD-86

Our audience preferred Excel charts and graphs to SAS charts and graphs.  However, making the necessary 30 graphs in Excel took 2-3 hours of manual work, even with chart templates already created, and also led to mistakes due to human error.  SAS graphs took much less time to create, but lacked key functionality available in Excel graphs that the audience preferred.  Thanks to SAS, the answer came in X4ML programming.  SAS can actually submit code to Excel in order to create customized data reporting, create graphs or update templates’ data series, and even populate Word documents for finalized reports.  This paper explores how SAS is used to create presentation-ready graphs in a proven process that takes less than one minute, compared to the earlier process that took hours.  The following code will be utilized and/or discussed: %macro(macro_var), filename, rc commands, ODS, X4ML, and VBA (Microsoft Visual Basic for Applications).
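
A minimal sketch of the general approach (not the author's code): on Windows, with Excel already running, SAS can send X4ML commands to Excel over DDE.  The workbook paths are hypothetical.

   filename xlcmd dde 'excel|system';   /* DDE link to a running Excel session */

   data _null_;
      file xlcmd;
      put '[OPEN("C:\reports\template.xlsx")]';    /* X4ML macro commands */
      put '[SAVE.AS("C:\reports\report.xlsx")]';
      put '[QUIT()]';
   run;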


Using PROC SURVEYSELECT: Random Sampling
Raissa Kouadjo
AD-190

This paper will examine some of the capabilities of PROC SURVEYSELECT in SAS Studio by demonstrating the task of drawing a random sample.  Every SAS programmer needs to know how to design a statistically efficient sample.  PROC SURVEYSELECT allows the user the flexibility to customize the design parameters.
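
For instance, a simple random sample of 5 records can be drawn as follows (a minimal sketch using a shipped sample data set):

   proc surveyselect data=sashelp.class out=sample
                     method=srs n=5 seed=12345;
   run;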


Using SAS PROC SQL to Create a Build Combinations Tool to Support Modularity
Stephen Sloan
AD-32

With SAS PROC SQL we can use a combination of a manufacturing Bill of Materials and a sales specification document to calculate the total number of configurations of a product that are potentially available for sale.  This will allow the organization to increase modularity with maximum efficiency.

Since some options might require or preclude other options, the result is more complex than a straight multiplication of the numbers of available options.  Through judicious use of PROC SQL, we can maintain accuracy while reducing the time, space, and complexity involved in the calculations.
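
A bare-bones illustration of the counting idea (the option tables and the exclusion rule are hypothetical):

   proc sql;
      select count(*) as valid_configs
      from engines as e, colors as c, packages as p      /* cross join     */
      where not (e.engine = 'V8' and p.package = 'ECO'); /* exclusion rule */
   quit;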


SAS/GRAPH and Annotate Facility--More Than Just a Bunch of Labels and Lines
Mike Hunsucker
AD-48

SAS/GRAPH procedures enhanced with the Annotate facility are a cornerstone capability, providing the flexibility to customize graphical displays well beyond the "standard" outputs of SAS/GRAPH PROCs.  This paper does not attempt to describe unique or seldom-used capabilities in SAS/GRAPH; instead, it exposes the audience to several ways to exploit the Annotate facility that enhance output far beyond an occasional label or line drawing.  Products reviewed provide situational awareness to military planners and decision makers daily.

INTRODUCTION:   14th Weather Squadron in Asheville, NC, is the Department of Defense’s climatology organization supplying planning weather and climatological statistics to military, intelligence, and research communities.  The squadron has exploited SAS capabilities for over 25 years but recently implemented dynamically built SAS/GRAPH graphics-based capabilities ranging from simple “cartoon” visualizations for deploying military members to complex statistical extreme-values gradient maps for national laboratory researchers.

This paper will highlight SAS/GRAPH capabilities including GFONT, GMAP, G3GRID, GINSIDE, GSLIDE, and more.
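
A minimal, self-contained taste of the Annotate facility (a sketch, not drawn from the paper): build an annotate data set and display it with PROC GSLIDE.

   data anno;
      retain xsys ysys '3' position '5' when 'a';  /* percent-of-area coordinates */
      function = 'label'; x = 50; y = 60; size = 2;
      text = 'Text drawn by Annotate'; output;
      function = 'move';  x = 10; y = 40; output;  /* start of a line segment */
      function = 'draw';  x = 90; y = 40; size = 1; output;
   run;

   proc gslide annotate=anno;
   run;
   quit;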


Five Little Known, But Highly Valuable and Widely Usable, PROC SQL Programming Techniques
Kirk Paul Lafler
AD-16

The SQL Procedure contains a number of powerful and elegant language features for SQL users.  This presentation highlights five little known, but highly valuable and widely usable, topics that will help users harness the power of the SQL procedure.  Topics include using PROC SQL to identify FIRST.row, LAST.row and Between.rows in BY-group processing; constructing and searching the contents of a value-list macro variable for a specific value; data validation operations; data summary operations to process down rows and across columns; and using the MSGLEVEL= system option and _METHOD SQL option to capture information about the processes occurring during query evaluation, including the algorithm selected and used by the optimizer when processing a query, testing and debugging operations, and other processes.
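
The last topic can be previewed with two lines of setup (a sketch, not the presenter's examples):

   options msglevel=i;      /* surface index and merge notes in the log */

   proc sql _method;        /* print the optimizer's chosen execution plan */
      select name, age
      from sashelp.class
      where age > 13;
   quit;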


Masking Data To Obscure Confidential Values: A Simple Approach
Bruce Gilsen
AD-38

When I help users design or debug their SAS® programs, they are sometimes unable to provide relevant SAS data sets because they contain confidential information.  Sometimes, confidential data values are intrinsic to their problem, but often the problem could still be identified or resolved with innocuous data values that preserve some of the structure of the confidential data.  Or, the confidential values are in variables that are unrelated to the problem.

While techniques for masking or disguising data exist, they are often complex or proprietary.  In this paper, I describe a very simple macro, REVALUE, that can change the values in a SAS data set.  REVALUE preserves some of the structure of the original data by ensuring that for a given variable, observations with the same real value have the same replacement value, and if possible, observations with a different real value have a different replacement value.  REVALUE allows the user to specify the variables to change and whether to order the replacement values for each variable by the sort order of the real values or by observation order.

In this paper, I will discuss the REVALUE macro in detail, and provide a copy of the macro.
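
The flavor of the technique, reduced to a few steps (this is only a sketch of the idea, not the REVALUE macro itself; HAVE and NAME are placeholders):

   proc sort data=have(keep=name) out=distinct nodupkey;
      by name;                              /* one row per real value  */
   run;

   data xwalk;
      set distinct;
      masked = cats('VAL', put(_n_, z5.));  /* VAL00001, VAL00002, ... */
   run;

   proc sql;                                /* same real value always  */
      create table masked as                /* gets the same mask      */
      select a.*, b.masked as name_masked
      from have as a left join xwalk as b
        on a.name = b.name;
   quit;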


Unlock SAS Code Automation with the Power of Macros
William Zupko
AD-87

SAS code, like any computer programming code, seems to go through a life cycle depending on the needs of that code.  Often, SAS programmers need to determine where code might be in that life cycle and, depending on what that code is used for, choose to maintain, update, or reuse SAS code in current or future projects.  These SAS programmers need to decide what the best option for the code is.  Simple code that has few variables or options is easy to leave hard-coded, as it is a quick fix for the programmer to maintain and update this code.  Complex code, which can have multiple variables and options, can be difficult to maintain and update.  This paper goes through the process a SAS programmer might encounter and talks about times when it is useful and necessary to automate SAS code.  Then, it explores useful SAS code that helps in maintenance and updating utilities, talking about when an option is appropriate and when it is not.  The following SAS code will be utilized: %macro, %let, call symput(x), symget, %put, %if, %do, INTO :, ods output, %eval, option statements, and %sysfunc.
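
Two of the listed building blocks in miniature (a sketch using a shipped data set):

   data _null_;
      set sashelp.class(obs=1);
      call symputx('first_name', name);  /* DATA step -> macro variable */
   run;

   proc sql noprint;
      select count(*) into :nobs trimmed /* SELECT ... INTO :           */
      from sashelp.class;
   quit;

   %put NOTE: first name is &first_name, table has &nobs rows.;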


GreenSpace: A Macro to Improve a SAS Data Set Footprint
Brian Varney
AD-150

SAS programs can be very I/O intensive.  SAS Data Sets with inappropriate variable attributes can degrade the performance of SAS programs.  Using SAS compression offers some relief but does not eliminate the issue of inappropriately defined SAS variables.  This paper examines the problems that inappropriate SAS variable attributes cause and presents a macro to minimize the footprint of a SAS Data Set.


Automating Simulation Studies with Base SAS(r) Macros
Vincent Hunter
AD-93

Simulations are common in methodological studies in the social sciences.  Even the most dedicated researchers have difficulty processing more than 100-200 repetitions, especially where different analysis programs must be processed in sequence.  However, studies requiring hundreds or thousands of processing repetitions of data under the same set of study conditions are necessary for robust results.  Where different study conditions are to be compared, the number of repetitions becomes even larger making the processing of an adequate number of iterations all but impossible when done one at a time.

Base SAS offers two tools for automating the processing of simulations: (1) Macros which divide the task into distinct jobs that may be easily modified for different conditions and number of iterations; (2) the ability to invoke non-SAS analysis programs (e.g., Mplus, R, and Bilog).  Using these tools a researcher can create and process an appropriate amount of simulated data to obtain adequate power and control of Type I and II errors.

A recent simulation performed by the author is used as an example.
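
A skeletal version of such a driver (the parameters and the analysis step are placeholders):

   %macro simulate(reps=100, n=50, seed=0);
      %local i;
      %do i = 1 %to &reps;
         data rep;                       /* generate one replication   */
            call streaminit(&seed + &i);
            do id = 1 to &n;
               y = rand('normal', 0, 1);
               output;
            end;
         run;

         proc means data=rep noprint;    /* placeholder analysis step  */
            var y;
            output out=result&i mean=ybar;
         run;
      %end;
   %mend simulate;

   %simulate(reps=10)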


Rapidly Assessing Data Completeness
David Abbott
AD-130

Data analysts are often asked to work with collections of data sets prepared by others and with varying degrees of history/documentation.  An important early question is, “How complete are these data? What data completeness issues might be present?”  This paper presents an efficient technique for addressing this question both in terms of characterizing the number and patterns of missing values and, similarly, the omitted rows of data (i.e., primary identifier values not occurring in a given data set and occurring in some other dataset).

Several short macros and two key algorithms enable the technique presented.  The first algorithm produces a table of missing value patterns in the style of PROC MI on a per dataset basis.  The second performs the manipulations needed to exhibit patterns of missing identifiers across a collection of datasets.
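
For the first algorithm, the familiar reference point looks like this (a sketch; PROC MI requires SAS/STAT):

   proc mi data=sashelp.heart nimpute=0;   /* NIMPUTE=0 reports missing- */
      var ageatstart height weight         /* data patterns without      */
          cholesterol;                     /* imputing anything          */
   run;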

Following this technique, analysts will be able to rapidly assess data completeness of inherited data set collections, provided a primary identifier (e.g., a subject ID) is used consistently in the collection.


Programming Compliance Made Easy with a Time Saving Toolbox
Patricia Guldin
AD-35

Programmers perform validation in accordance with established regulations, guidelines, policies and procedures to ensure the integrity of analyses and reporting, reduce the risk of delays in product approvals, fines, and legal actions, and safeguard reputations.  We understand the importance, but the time involved to produce and appropriately store the documentation and evidence required to prove we followed process and SOPs can be labor intensive and burdensome.  Using SAS/AF®, SAS® Component Language and .NET we have developed two versions of an automated tool that can be used with PC SAS® or Enterprise Guide®.  The toolbox is designed to make compliance with programming SOPs easier, increase consistency, and save the programmer time.  The toolbox auto-populates some information and saves documentation in designated locations as actions are performed.  Functions include creating and verifying a standard program header, updating program headers, revision history and version date, creating validation environments including testing checklists, and promoting programs.  The toolbox is also used to view transaction logs, create and/or generate batch jobs for remote execution in UNIX, and to select and include macro calls from a macro library.


SAS Data Integration Studio – Take Control with Conditional & Looping Transformations
Harry Droogendyk
AD-167

SAS Data Integration Studio jobs are not always linear.  While Loop transformations have been part of DI Studio for ages, only more recently has SAS Data Integration Studio included the Conditional Control transformations to control logic flow within a job.  This paper will demonstrate the use of both the Loop and Conditional transformations in a real world example.


A Methodology for Truly Dynamic Prompting in SAS® Stored Processes
Haikuo Bian, Carlos Jimenez and David Maddox
AD-172

Dynamic prompts in SAS stored processes may be developed by selecting the “dynamic list” option during prompt construction.  The list is usually sourced from a SAS dataset that is pre-defined in the metadata.  However, the process of refreshing the dataset is usually independent of the stored process and must be included somewhere in the application.  Using SAS views as a source for dynamic prompts will ensure that the list is truly dynamic.  This paper illustrates the process with a cascading prompt example.


The Perfect Marriage: Using SAS Enterprise Guide, the SAS Add-In for Microsoft Office, and Excel to Support Enrollment Forecasting at A Large University
Andre Watts and Lisa Sklar
AD-141

The Office of Institutional Research at the University of Central Florida is tasked with supporting the enrollment management process of the institution by providing five year enrollment forecasting of various enrollment measures.  A key component of the process is providing university stakeholders with a self-service, secure, and flexible tool that enables them to quickly generate different enrollment projections using the most up-to-date information possible in Microsoft Excel.  This presentation will show an example of how to effectively integrate both SAS Enterprise Guide and the SAS Add-In for Microsoft Office to support a critical process which has very specific stakeholder requirements and expectations.


You’ve Got Mail®: Automating SAS® from an Email
Peter Davis and Mark Asiala
AD-149

The American Community Survey is an ongoing population and housing survey that provides data every year – giving communities the current information they need to plan investments and services.  As such, repetitive processing is necessary and must be completed in a timely manner.  Automation, where appropriate, is an essential component for operational efficiency.

As an example of where automation is implemented, we receive an email each month that serves as notification that one operation has completed and the next operation may begin.  Instead of waiting for this email to manually submit our SAS programs, what if the delivery of the email initiated our SAS programs?

This paper demonstrates a KornShell (ksh93) script which parses through an email delivered to a user’s UNIX email account.  The script “reads” the email.  As long as the sender and the subject of the email meet certain requirements, the appropriate SAS programs are submitted.  If not, an email is sent to the user stating that an email was received but no further action occurred.


Data Labs with SAS and Teradata: Value, Purpose and Best Practices
Tho Nguyen and William E. Benjamin Jr
AD-201

A data lab, also called a ‘play pen’ or ‘sand box’, is an area to explore and examine ideas and possibilities by combining new data with existing data to create experimental designs and ad-hoc queries without interrupting the production environment.  A Teradata data lab with SAS provides SAS users immediate access to critical data for exploration and discovery.  It is an environment that enables agile in-database analytics by simplifying the provisioning and management of analytic workspace within the production data warehouse.  By allocating that space, it provides data lab users easy access to all of the data without moving or duplicating the data.  Come learn how SAS and Teradata are integrated in the data lab, and hear some best practices and use cases from our joint customers.



Banking and Finance


Migrating Databases from Oracle to Teradata
Phillip Julian
BKF-91

We carefully planned and provisioned our database migration from Oracle to Teradata.  We had timelines, weekly progress and planning meetings, presentations comparing current state to the future state, detailed project plans for each task, Oracle DBAs, consultants from Teradata, and support from all departments.  It was an ideal situation for moving the Enterprise to a new system of record based upon Teradata.

Our team had delays and data issues that no one could anticipate.  I had researched every issue that might arise with SAS upgrades, Teradata, and our particular environment.  But the literature and support did not prepare us for anticipating or solving our migration problems.  Instead of 6 months, we only had 6 weeks to finish the project.

We had no time to hire an army of experts, so we had to solve our own issues.   I will describe those issues, our solutions, and our tricks that facilitated rapid development.  I'm still surprised at our rapid progress, and I am thankful for the team's efforts and ingenuity.  Our experiences should help others who are planning or performing database and SAS migrations.

Our industry is financial services, we are regulated by the federal government, and we must keep records for every change.  Our software environment is SAS on UNIX, SAS on PC, Teradata, Oracle, and Data Integration Studio with the framework of SAS Credit Scoring for Banking.  The UNIX hosts are multi-tiered with separate development and production platforms.


Getting Your SAS® Program to do Your Typing for You!
Nancy Wilson
BKF-55

Do you have a SAS® program that requires adding file names to the input every time you run it?  Aren't you tired of having to check for the files, check the names and type them in?  Check out how this SAS® Enterprise Guide Project checks for files, figures out the file names and eliminates the need for having to type in the file names for the input data files!
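
One common ingredient of this kind of project is a directory listing read through a pipe (a Windows sketch with a hypothetical path; XCMD must be enabled):

   filename dirlist pipe 'dir /b "C:\monthly\*.csv"';

   data files;
      infile dirlist truncover;
      input fname $256.;     /* one file name per record */
   run;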


Reducing Credit Union Member Attrition with Predictive Analytics
Nate Derby and Mark Keintz
BKF-118

As credit unions market themselves to increase their market share against the big banks, they understandably focus on gaining new members.  However, they must also retain (and further engage) their existing members.  Otherwise, the new members they gain can easily be offset by existing members who leave.  Happily, by using predictive analytics as described in this paper, it can actually be much easier and less expensive to keep (and further cultivate) existing members than to enlist new ones.

This paper provides a step-by-step overview of a relatively simple but comprehensive approach to reduce member attrition.  We first prepare the data for a statistical analysis.  With some basic predictive analytics techniques, we can then identify those members who have the highest chance of leaving and the highest value.  For each of these members, we can also identify why they would leave, thus suggesting the best way to intervene to retain them.  We then make suggestions to improve the model for better accuracy.  Finally, we provide suggestions to extend this approach to cultivating existing members and thus increasing their lifetime value.

Code snippets will be shown for any version of SAS but will require the SAS/STAT package.  This approach can also be applied to many other organizations and industries.


Population Stability and Model Performance Metrics Replication for Business Model at SunTrust Bank
Bogdan Gadidov and Benjamin McBurnett
BKF-132

The Board of Governors of the Federal Reserve System has published Supervisory Guidance on Model Risk Management (SR Letter 11-7) emphasizing that banks rely heavily on quantitative analysis and models in most aspects of financial decision making.  Ongoing monitoring and maintenance (M&M) is essential for timely evaluation of model performance to determine whether changes in business strategies and market conditions require adjustment, redevelopment, or replacement of the model.  A typical M&M plan includes tracking of the Population Stability Index (PSI), Rank Ordering Test, and Kolmogorov-Smirnov Statistic (KS).  As part of an internship program at SunTrust Bank, I was able to track these key metrics for one business-critical model.  The model uses a logistic regression to predict the probability of default for a given customer.

To track the three metrics stated above, data from quarter 1 of 2014 is compared with a baseline distribution, generally the dataset which is used to create the model.  PSI quantifies the shift in the distribution of the population between the baseline and current time periods.  Rank Ordering Testing involves comparing the expected default rate, predicted by the model, to the actual default rate in the current quarter.  The KS statistic assesses model performance by measuring the model's ability to discern defaults from non-defaults.  The npar1way procedure was used in SAS to calculate KS.  Reports and charts presented in this poster will be sanitized due to the confidential nature of the data, but methodology and step-by-step procedures represent actual research results.
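
The KS piece can be sketched in a few lines (data set and variable names are placeholders):

   proc npar1way data=scores edf;  /* EDF yields the Kolmogorov-Smirnov */
      class default_flag;          /* defaults vs. non-defaults         */
      var   model_score;
   run;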


Analysis of the Impact of Federal Funds Rate Change on US Treasuries Returns using SAS
Svetlana Gavrilova and Maxim Terekhov
BKF-181

This paper analyzes the impact of federal funds rate changes on government bond returns and return volatility and compares it with the equities market reaction.  The purpose of this work is to construct a model estimating expected risk exposure at a hypothetical point of time in the future given a description of current market conditions and historically observed events, which can be helpful in explaining expected movements and predicting future bond prices and volatility for use by portfolio managers in choosing asset allocation.  We identify to what extent rate changes have an impact and the length of their effects.  For our analysis we use data on major government bond prices and major macroeconomic characteristics of the U.S. economy between February 1990 and June 2015, collected on a daily basis.  We model and forecast expected returns using ARIMA modeling based on different scenarios.  Vector Autoregression and Vector Error Correction modeling is applied to estimate the impact of rate changes on government bond performance and volatility.  Credit market behavior is compared to the equities market reaction.  Findings are consistent with previously published papers.  US Treasuries react positively to a Federal Funds rate change, while the equities market demonstrates a negative reaction.  A long-term relationship between the US Treasuries market and the Federal Funds rate is identified.  That a change in the US Treasuries market may be Granger-caused by a change in the Federal Funds target rate is statistically demonstrated.  All estimations are performed using SAS software.


Regulatory Stress Testing—A Manageable Process with SAS®
Wei Chen
BKF-195

As a consequence of the financial crisis, banks are required to stress test their balance sheet and earnings based on prescribed macroeconomic scenarios.  In the US, this exercise is known as the Comprehensive Capital Analysis and Review (CCAR) or Dodd-Frank Act Stress Testing (DFAST).  In order to assess capital adequacy under these stress scenarios, banks need a unified view of their projected balance sheet, incomes, and losses.  In addition, the bar for these regulatory stress tests is very high regarding governance and overall infrastructure.  Regulators and auditors want to ensure that the granularity and quality of data, model methodology, and assumptions reflect the complexity of the banks.  This calls for close internal collaboration and information sharing across business lines, risk management, and finance.  Currently, this process is managed in an ad hoc, manual fashion.  Results are aggregated from various lines of business using spreadsheets and Microsoft SharePoint.  Although the spreadsheet option provides flexibility, it brings ambiguity into the process and makes the process error prone and inefficient.  This paper introduces a new SAS® stress testing solution that can help banks define, orchestrate and streamline the stress-testing process for easier traceability, auditability, and reproducibility.  The integrated platform provides greater control, efficiency, and transparency to the CCAR process.  This will enable banks to focus on more value-added analysis such as scenario exploration, sensitivity analysis, capital planning and management, and model dependencies.  Lastly, the solution was designed to leverage existing in-house platforms that banks may already have in place.



Building Blocks

Point-and-Click Programming Using SAS® Enterprise Guide®
Kirk Paul Lafler and Mira Shapiro
BB-14

SAS® Enterprise Guide® (EG) empowers organizations with all the capabilities that SAS has to offer.  Programmers, business analysts, statisticians and end-users have a powerful graphical user interface (GUI) with built-in wizards to perform reporting and analytical tasks, access to multi-platform enterprise data sources, deliver data and results to a variety of mediums and outlets, construct data manipulations without the need to learn complex coding constructs, and support data management and documentation requirements.  Attendees learn how to use the GUI to access tab-delimited and Excel input files; subset and summarize data; join two or more tables together; flexibly export results to HTML, PDF and Excel; and visually manage projects using flowcharts and diagrams.


A SURVEY OF SOME USEFUL SAS FUNCTIONS
Ron Cody
BB-193

SAS Functions provide amazing power to your DATA step programming.  Some of these functions are essential—some of them save you writing volumes of unnecessary code.  This talk covers some of the most useful SAS functions.  Some of these functions may be new to you and they will change the way you program and approach common programming tasks.  The majority of the functions described in this talk work with character data.  There are functions that search for strings, others that can find and replace strings or join strings together.  Still others can measure the spelling distance between two strings (useful for "fuzzy" matching).  Some of the newest and most amazing functions are not functions at all, but call routines.  Did you know that you can sort values within an observation?  Did you know that not only can you identify the largest or smallest value in a list of variables, but you can identify the second or third or nth largest or smallest value?  A knowledge of the functions described here will make you a much better SAS programmer.
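
A few of these in action (a small sketch):

   data demo;
      x1 = 7; x2 = 3; x3 = 9; x4 = 1; x5 = 5;
      second_largest  = largest(2, of x1-x5);   /* 7 */
      second_smallest = smallest(2, of x1-x5);  /* 3 */
      call sortn(of x1-x5);                     /* x1-x5 become 1 3 5 7 9 */
      put _all_;
   run;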


Tales from the Help Desk 6: Solutions to common SAS ® tasks
Bruce Gilsen
BB-72

In 30 years as a SAS® consultant at the Federal Reserve Board, questions about some common SAS tasks seem to surface again and again.  This paper collects some of these common questions, and provides code to resolve them.  The following tasks are reviewed.
  1. Convert a variable from character to numeric or vice versa and keep the same name.
  2. Convert multiple variables from character to numeric or vice versa and keep the same names.
  3. Convert character or numeric values to SAS date values.
  4. Use a data set when the custom format assigned to a variable cannot be found.
  5. Use an array definition in multiple DATA steps.
  6. Use values of a variable in a data set throughout a DATA step by copying the values into a temporary array.
  7. Use values of multiple variables in a data set throughout a DATA step by copying the values into a 2-dimensional temporary array.
In the context of discussing these tasks, the paper provides details about SAS system processing that can help users employ the SAS system more effectively.  This paper is the sixth of its type.
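
Task 1, reduced to its core (a sketch; OLD and AGE are placeholders):

   data new;
      set old(rename=(age=age_c));
      age = input(age_c, best32.);  /* numeric variable, original name */
      drop age_c;
   run;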


Using the SAS Hash Object with Duplicate Key Entries
Paul Dorfman
BB-94

By default, the SAS hash object permits only entries whose keys, defined in its data portion, are unique.  While in certain programming applications this is a rather utile feature, there are also others where being able to insert and manipulate entries with duplicate keys is imperative.  Such an ability, facilitated in SAS since Version 9.2, was a welcome development: it vastly expanded the functionality of the hash object and eliminated the necessity to work around the distinct-key limitation using custom code.  However, nothing comes without a price; and the ability of the hash object to store duplicate key entries is no exception.  In particular, additional hash object methods had to be - and were - developed to handle specific entries sharing the same key.  The extra price is that using these methods is surely not quite as straightforward as the simple corresponding operations on distinct-key tables, and the documentation alone is a rather poor help for making them work in practice.  Rather extensive experimentation and investigative coding is necessary to make that happen.  This paper is a result of such endeavor, and hopefully, it will save those who delve into it a good deal of time and frustration.
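
The basic duplicate-key retrieval pattern looks roughly like this (WORK.LOOKUP and WORK.KEYS, with variables K and V, are placeholders):

   data _null_;
      if _n_ = 1 then do;
         if 0 then set work.lookup;  /* establish K and V in the PDV */
         declare hash h(dataset:'work.lookup', multidata:'y');
         h.defineKey('k');
         h.defineData('v');
         h.defineDone();
      end;
      set work.keys;
      rc = h.find();                 /* first entry for this key     */
      do while (rc = 0);
         put k= v=;
         rc = h.find_next();         /* remaining same-key entries   */
      end;
   run;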


Introduction to SAS® Data Loader: The Power of Data Transformation in Hadoop
Keith Renison
BB-199

SAS Model Manager provides an easy way for deploying analytical models to various types of relational databases and to a Hadoop Distributed File System.  There are two publishing methods that can be used: scoring functions and the SAS® Embedded Process.  This paper gives a brief introduction of both the SAS® Model Manager publishing functionality and the SAS® Scoring Accelerator.  It describes the major differences between using the scoring function and the SAS Embedded Process publish methods to publish a model.  The paper also explains how to use SAS applications as well as SQL code outside of SAS® to perform in-database processing of a published model.  Along with Hadoop, the supported databases are Teradata, Oracle, Netezza, DB2, and SAP HANA.  Samples are provided for publishing a model in one of the supported databases and Hadoop.  After reading this paper, you should feel comfortable using a published model in your business environment.


A Beginner’s Babblefish: Basic Skills for Translation Between R and SAS®
Sarah Woodruff
BB-90

SAS professionals invest time and energy in improving their fluency with the broad range of capabilities SAS software has to offer.  However, the computer programming field is not limited to SAS alone and it behooves the professional to be well rounded in his or her skill sets.  One of the most interesting contenders in the field of analytics is the open source R software.  Due to its range of applications and the fact that it is free, more organizations are considering how to incorporate it into their operations and many people are already seeing its use incorporated into project requirements.  As such, it is now common to need to move code between R and SAS, a process which is not inherently seamless.

This paper serves as a basic tutorial on some of the most critical functions in R and shows their parallel in SAS to aid in the translation process between the two software packages.  A brief history of R is covered followed by information on the basic structure and syntax of the language.  This is followed by the foundational skill involved in importing data and establishing R data sets.  Next, some common reporting and graphing strategies are explored with additional coverage on creating data sets that can be saved, as well as how to export files in various formats.  By having the R and SAS code together in the same place, this tutorial serves as a reference that a beginner can follow to gain confidence and familiarity when moving between the two.
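
In the spirit of the tutorial, one such pairing (the CSV file is hypothetical; the R equivalents appear as comments):

   /* R:  df <- read.csv("scores.csv")   */
   /* R:  summary(df$score)              */
   proc import datafile='scores.csv' out=df dbms=csv replace;
   run;

   proc means data=df n mean std min max;
      var score;
   run;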


Sampling in SAS using PROC SURVEYSELECT
Rachael Becker and Drew Doyle
BB-129

This paper examines various sampling options that are available in SAS through PROC SURVEYSELECT.  We will not be covering all of the possible sampling methods or options that SURVEYSELECT features.  Instead, we will look at Simple Random Sampling, Stratified Random Sampling, Cluster Sampling, Systematic Sampling, and Sequential Random Sampling.


Hash: Is it always the best solution?
David Izrael and Elizabeth Axelrod
BB-75

When you get a new hammer, everything looks like a nail.  That’s how we felt about the Hash object when we started to use it: Wow, this is fantastic - It can solve everything!  But… we soon learned that everything is not a nail, and sometimes a hammer is not the best tool for the job.  In SAS Version 9, direct addressing with the Hash object was introduced, and this enabled users to perform look-ups much faster than traditional methods of joining or merging.  Even beyond look-ups, we can now use the HASH object for summation, splitting files, array sorting, and fuzzy matching - just to name a few.  What an all-purpose hammer!  But… is it always the best tool to use?

After a brief review of basic HASH syntax, we will pose some problems and provide several solutions, using both the Hash object and more traditional methods.  We will compare real and CPU time, as well as the programmer’s time to develop the respective programs.  Which is the better tool for the job?  Our recommendations will be revealed through our results.


Table Lookups: Getting Started With Proc Format®
John Cohen
BB-144

Table lookups are among the coolest tricks you can add to your SAS® toolkit.  Unfortunately, these techniques can be intimidating both conceptually and in terms of the programming.  We will introduce one of the simplest of these techniques, employing Proc Format and the CNTLIN option as part of our construct.  With any luck, this will prove both easy-enough to program and more efficient to run.
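
The core of the CNTLIN construct, in miniature (a sketch):

   data cntlin;                       /* FMTNAME, START, LABEL, TYPE */
      retain fmtname '$sexfmt' type 'C';
      length start $1 label $6;
      start = 'F'; label = 'Female'; output;
      start = 'M'; label = 'Male';   output;
   run;

   proc format cntlin=cntlin;
   run;

   proc freq data=sashelp.class;
      tables sex;
      format sex $sexfmt.;
   run;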


No FREQ-in Way
Renee Canfield
BB-105

In the consumer credit industry, privacy is key and the scrutiny increases every day.  When returning files to a client, they must be depersonalized so the client cannot match back to any personally identifiable information (PII).  This means we must locate any values for a variable that occur on a limited number of records and null them out (i.e. replace them with missing values).  Working with large files which have more than one million observations and thousands of variables, locating variables with few unique values is a difficult task.  While PROC FREQ and DATA step merging can accomplish the task, using first./last. by variable processing to locate the suspect values and hash objects to merge the data set back together may offer increased efficiency.
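
The first./last. piece of that strategy, in outline (HAVE and ACCT are placeholders):

   proc sort data=have;
      by acct;
   run;

   data rare;
      set have;
      by acct;
      if first.acct and last.acct;  /* value occurs on exactly one record */
   run;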


Better Metadata Through SAS® II: %SYSFUNC, PROC DATASETS, and Dictionary Tables
Louise Hadden
BB-57

SAS® provides a wealth of resources for users to create useful, attractive metadata tables, including PROC CONTENTS listing output (to ODS destinations), the PROC CONTENTS OUT= SAS data set, and PROC CONTENTS ODS Output Objects.  This paper and presentation explore some less well-known resources to create metadata such as %SYSFUNC, PROC DATASETS, Dictionary Tables, SASHELP views, and SAS "V" functions.  All these options will be explored with an eye towards exploring, enhancing, and reporting on SAS metadata.
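
A one-query taste of the Dictionary Tables approach:

   proc sql;
      select name, type, length, format
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS';
   quit;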


Don’t Forget About Small Data
Lisa Eckler
BB-168

Beginning in the world of data analytics and eventually flowing into mainstream media, we are seeing a lot about Big Data and how it can influence our work and our lives.  Through examples, this paper will explore how Small Data -- which is everything Big Data is not -- can and should influence our programming efforts.  The ease with which we can read and manipulate data from different formats into usable tables in SAS® makes using data to manage data very simple and supports healthy and efficient practices.  This paper will explore how using small or summarized data can help to organize and track program development, simplify coding and optimize code.


To Macro or Not... that is the Question
Claudine Lougee
BB-176

Do you need a macro for your program?  How do you know if it's worth the time to create one for your program?  This paper will give some guidelines, based on user experience, on whether it's worth the time to create a macro, whether that's a parameter-driven macro or just a simple macro variable.  Extra tips and tricks for using system macros will be provided.  This paper is geared towards new users and maybe experienced users who do not use macros.


Arrays – Data Step Efficiency
Harry Droogendyk
BB-157

Arrays are a facility common to many programming languages, useful for programming efficiency.  SAS® data step arrays have a number of unique characteristics that make them especially useful in enhancing your coding productivity.  This presentation will provide a useful tutorial on the rationale for arrays and their definition and use.
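
A small example of the pattern (the sentinel value is hypothetical):

   data recoded;
      set sashelp.heart;
      array nums {*} _numeric_;             /* all numeric variables      */
      do i = 1 to dim(nums);
         if nums{i} = -9 then nums{i} = .;  /* recode sentinel to missing */
      end;
      drop i;
   run;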


PROC TRANSPOSE: Flip your Data 90° and Save Time
Rachel Straney
BB-135

The process of transforming data from a vertical to horizontal structure is sometimes referred to as long-to-wide conversion, and is common in the analytical world.  Although there is always more than one way to accomplish a task using SAS®, PROC TRANSPOSE is a staple procedure that should be in every programmer’s tool box.  This paper will guide the reader through some basic examples of PROC TRANSPOSE and share situations where it is most appropriately used.
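
A minimal long-to-wide sketch (LONG, SUBJECT, VISITNUM, and SCORE are placeholders):

   proc sort data=long;
      by subject;
   run;

   proc transpose data=long out=wide prefix=visit_;
      by subject;
      id visitnum;       /* one column per visit: VISIT_1, VISIT_2, ... */
      var score;
   run;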


Hash Objects for Everyone
Jack Hall
BB-83

The introduction of Hash Objects into the SAS toolbag gives programmers a powerful way to improve performance, especially when JOINing a large data set with a small one.  This presentation will focus on the basics of creating and using a simple hash object, using an example from the Healthcare Insurance sector.



Coder's Corner

Implementing a Bayesian Approach to Record Linkage
Lynn Imel and Thomas Mule
CC-41

The Census Coverage Measurement survey-based program estimated household population coverage of the 2010 Decennial Census.  Calculating coverage estimates required linking survey person data to census enumerations.  For record linkage research, we applied a Bayesian Latent Class Models approach to both 2010 coverage survey data and simulated household data.  This paper presents our use of Base SAS® to implement the Bayesian approach.  It also discusses coding adaptations to handle changes including removing hard-coded variable names to allow for varying input parameters.


RUN; RUN; RUN; - Methods for Running Multiple Programs in a Series
Robert Matthews
CC-12

Anyone who has ever had to run a series of programs multiple times in a row has probably thought about ways to automate the process.  For example, if you have 25 programs that need to be run one after another, the normal method would be to run the first program, wait for it to finish, then submit the next one, and so on.  If you have the ability to run multiple SAS sessions, then you can speed up the process a bit by submitting programs in each session.  However, it still takes some time and effort to monitor the programs, wait for each one to finish, and then submit the next program in the series.  We encountered this issue several years ago and have developed two methods for implementing a “hands-off” approach for submitting a series of programs.  Some additional features we implemented include the ability to either stop or continue processing the remaining programs if an individual program in the series encounters an error as well as the ability to send email messages after individual programs in the series have been run.  These methods greatly reduce the need for manual intervention when running a long series of programs and help alleviate an otherwise laborious, and sometimes error-prone process.
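
One possible skeleton for such a driver (a sketch, not the authors' tool; the paths are hypothetical and the SYSERR test is a common heuristic):

   %macro run_series(progs);   /* space-separated list of program paths */
      %local i prog;
      %do i = 1 %to %sysfunc(countw(&progs, %str( )));
         %let prog = %scan(&progs, &i, %str( ));
         %include "&prog" / source2;
         %if &syserr > 4 %then %do;        /* stop on the first error */
            %put ERROR: &prog failed with SYSERR=&syserr..;
            %return;
         %end;
      %end;
   %mend run_series;

   %run_series(C:\jobs\step1.sas C:\jobs\step2.sas)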


Running Parts of a SAS Program while Preserving the Entire Program
Stephen Sloan
CC-33

We often have the need to execute only parts of a SAS program, while at the same time preserving the entire program for documentation or for future use.  This occurs when parts of the program have been run, when different circumstances require different parts of a program, or when only subsets of the output are required.

There are different ways in which parts of a program can be run while preserving the entire program.
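
One of the simplest: wrap optional sections in macros driven by switches (a sketch):

   %let run_report = N;          /* flip to Y to execute this section */

   %macro step_report;
      %if %upcase(&run_report) = Y %then %do;
         proc print data=sashelp.class;
         run;
      %end;
   %mend step_report;

   %step_report
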

Merging and Analysis of Complex Survey Data Sets by using Proc Survey Procedures in SAS
Nushrat Alam
CC-116

This paper is focused on merging and analysis of complex survey data sets.  The sample design of any complex survey consists of stratification, clustering, multi-stage sampling, and unequal probability of selection of observations.  This paper provides an outline of merging different complex survey datasets and the use of multiple SAS procedures like PROC SURVEYMEANS, PROC SURVEYFREQ, and PROC SURVEYREG to analyze different variables.
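
A representative call, with placeholder design variables standing in for the merged file's stratum, cluster, and weight fields:

   proc surveymeans data=merged mean;
      strata  stratum;
      cluster psu;
      weight  wt;
      var     bmi;
   run;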


A Macro To Copy, Rename, Update and Move Your SAS Programs - All In One Click
Julio Ruiz
CC-154

As SAS programmers we often have to copy, rename, update, and/or move SAS programs from one directory to another.  Performing these tasks can be time-consuming, particularly when two or more of them need to be performed manually.  This paper presents a macro developed in SAS that gives the end-user the ability to programmatically accomplish any of these tasks with one simple click.  The macro's main goal is to offer the end-user the ability to save time, as it can perform any or all of these tasks in a matter of seconds.


Count the Number Of Delimiters In the Input File
Kannan Deivasigamani
CC-26

Many of us are at times surprised by the presence (or even absence) of an additional (missing) variable in the incoming file (file from the sender) due to the appearance of an additional delimiter as part of the data.  For example, if an address variable requires parsing and a user unintentionally inputs a pipe ("|") as part of the address, it can pose a problem if a pipe-delimited file is created.  If 5 delimiters are expected on every record, the record with the special address will have 6 delimiters if not intervened and cleansed before being written to the file.  On the receiving end, if the file is just read with the usual DLM='|' option, a shift in values will be noticed in the variables after the address.  In order to mitigate this situation, a small snippet of code to interrogate each record and ensure that all records have the same number of variables (delimiters) as expected on the receiving end can help.  If a record is received with an unexpected delimiter count, then the process is halted and the support personnel can be alerted.  This will give some peace of mind to the recipient, assuring a quality check from a variable count perspective.

In addition to preventing any erroneous processing, timely alert might save other EUC (End User Computing) related costs in the organization as well.  The file may be fixed and resent before another audit can screen through and release the file for further processing by other jobs/programs/scripts.  A rough example (code) of the delimiter audit is included to show how it might be applied to mitigate the issue.  The subsequent processing may be handled by the respective job schedulers used in different mainframe (or other) shops with appropriate controls in place as needed.
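
The heart of such an audit can be as small as this (a sketch; the file name and the expected count of 5 are placeholders):

   data _null_;
      infile 'incoming.txt' truncover;
      input;
      if countc(_infile_, '|') ne 5 then do;
         put 'ERROR: unexpected delimiter count on record ' _n_ ': ' _infile_;
         abort return;          /* halt and set a condition code */
      end;
   run;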


Tips for Identifying Patient Characteristics Associated with Diagnostic Concordance between Two Measures Using SAS®
Seungyoung Hwang
CC-37

Sensitivity, specificity, and positive and negative predictive values are often used in validation studies.  However, few have examined what patient characteristics are associated with diagnostic concordance between two measures of interest.  This paper provides an in-depth analysis, with some explanation of the SAS® code, to identify sociodemographic and clinical characteristics associated with diagnostic concordance between two measures of depression using SAS®.  Examples of using the GLIMMIX procedure are drawn from clinical data that we recently published in the American Journal of Geriatric Psychiatry.


PROC CATALOG, the Wish Book SAS® Procedure
Louise Hadden
CC-58

SAS® data sets have PROC DATASETS, and SAS catalogs have PROC CATALOG.  Michael Raithel characterizes PROC DATASETS as the “Swiss Army Knife of SAS Procedures” (Raithel, 2011).  PROC DATASETS can do an amazing array of tasks relating to SAS data sets; PROC CATALOG is a similar, utilitarian procedure.  It is handy (like a Leatherman® tool!) itself, and in conjunction with other SAS procedures can be very helpful in managing the special SAS files that are SAS catalogs.  Find out what the little known PROC CATALOG can do for you!
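
A first taste of the procedure (a sketch; the format step creates the WORK.FORMATS catalog to list):

   proc format;
      value yn 1 = 'Yes' 0 = 'No';
   run;

   proc catalog catalog=work.formats;
      contents;                 /* list the catalog's entries */
   run;
   quit;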


Document and Enhance Your SAS(R) Code, Data Sets, and Catalogs with SAS Functions, Macros and SAS Metadata
Louise Hadden and Roberta Glassl
CC-59

Discover how to document your SAS programs, data sets and catalogs with a few lines of code that include SAS functions, macro code and SAS metadata!  Learn how to conditionally process data based on the existence of a file, variable types, and more!  If you have ever wondered who it was that last ran a program that overwrote your data, SAS has the answer.
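
A sketch of one such conditional check:

   %macro need(ds);
      %if %sysfunc(exist(&ds)) %then %put NOTE: &ds is present.;
      %else %put WARNING: &ds was not found.;
   %mend need;

   %need(sashelp.class)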


Using PROC MEANS to Sum Duration Of Eligibility For Medicaid Beneficiaries
John Wedeles
CC-164

Background: The Division of Analytics and Policy Research (DAPR) within the District of Columbia Department of Health Care Finance (DHCF) produces the annual CMS-416 report to document the performance of the Early and Periodic Screening, Diagnostic and Treatment (EPSDT) benefit for the District’s children under 21 who are enrolled in Medicaid.  The report requires the calculation of the total months of eligibility for all beneficiaries included in the report, as beneficiaries can have multiple enrollment spans in a given year.  Previously, duration of eligibility was calculated in multiple steps using Microsoft Excel, including IF functions and pivot tables.  DAPR sought to streamline the calculation of eligibility duration using SAS.

Methods: In SAS, binary variables were created for each month in the period of interest as indicators for eligibility, based on monthly enrollment dates.  The values for each of these binary variables were then vertically summed by beneficiary Medicaid number using PROC MEANS.  This step created a new data set with a de-duplicated list of Medicaid beneficiaries, and included a new variable representing the count of the months of eligibility for each beneficiary.  A new variable measuring the total months of eligibility was then created, which captured the sum of the variables created in the previous step.
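
A rough sketch of that step (the data set and variable names are placeholders for DAPR's actual fields):

   proc means data=elig nway noprint;
      class medicaid_id;                      /* de-duplicates the IDs */
      var el_m1-el_m12;                       /* monthly 0/1 flags     */
      output out=months (drop=_type_ _freq_) sum=;
   run;

   data months;
      set months;
      total_elig_months = sum(of el_m1-el_m12);
   run;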

Results: The use of PROC MEANS allowed DAPR to account for multiple enrollment spans for Medicaid beneficiaries in the reporting year.  DAPR was also able to use the summary variables to determine 90-day continuous eligibility, which is a requirement for inclusion in the denominator for several key measures of the CMS-416 report.

Conclusion: The PROC MEANS procedure allowed for more accurate and efficient calculation of beneficiary eligibility data, resulting in streamlined reporting capacities.  DAPR has continued to use the PROC MEANS procedure in several other reports where calculation of beneficiary eligibility is required.


The COMPRESS Function: Hidden Superpowers
Pamela Reading
CC-138

Most SAS programmers rely on the COMPRESS function for cleaning up troublesome string data.  The many uses of the third ‘modifier’ argument, added in Version 9, may not be as familiar.  This paper will present a quick summary of the options available and examples of their use.  It will conclude with an unusual application of the ‘keep’ option to reorder characters within a string.
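
Two of the modifier combinations in action:

   data _null_;
      phone = '(919) 555-0123 x42';
      digits_only = compress(phone, , 'kd');  /* Keep Digits only  */
      no_punct    = compress(phone, , 'p');   /* strip Punctuation */
      put digits_only= no_punct=;
   run;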


Past, Present and Future... who KNEW (Knows New Exciting Ways)?
Claudine Lougee
CC-178

Did you ever think, "Someone must have done this before"?  If you've ever coded anything that took some time or was challenging, someone probably has done it another way.  The other way could be different, the same, easier, or slightly advanced.  This paper will provide a list of past, present and future SAS users and authors who are experienced in writing code and teaching methods through SAS papers and BBU (Books by Users).  This will be valuable for googling SAS papers and finding the right code for your needs.


All Data Are (Most Likely) Not Created Equal: A SAS® Macro to Compare Structure and Data Across Multiple Datasets
Jason Salemi
CC-40

In nearly every discipline, from Accounting to Zoology, whether you are a student-in-training or an established professional, a central tenet of interacting with information is to “Know Thy Data”.  Hasty compilation and analysis of inadequately vetted data can lead to misleading if not erroneous interpretation, which can have disastrous consequences ranging from business downfalls to adopting health interventions that worsen rather than improve the longevity and quality of people’s lives.  In some situations, knowing thy data involves only a single analytic dataset, in which case review of a data dictionary to explore attributes of the dataset supplemented with univariate and bivariate statistics will do the trick.  This has been discussed extensively in the literature and certainly in the SAS Global Forum and User’s Groups.  In other scenarios, there is a need for comparing the structure, variables, and even values of variables across two datasets.  Again, in this case, SAS offers a powerful COMPARE procedure to compare pairs of datasets, and many papers have offered macros to add additional functionality, refine the comparison, or simplify the analytic output.  However, imagine the following scenario: you are provided with or download a myriad of datasets, perhaps which are produced quarterly or annually.  Each dataset has a corresponding data dictionary and you might even be fortunate enough to have been provided with some code to facilitate importation into SAS.  Your initial goal, perhaps a “first date” with your new datasets, is to understand whether variables exist in every dataset, whether there are differences in the type or length of each variable, the absolute and relative missingness of each variable, and whether the actual values being input for each variable are consistent.  This paper describes the creation and use of a macro, “compareMultipleDS”, to make the first date with your data a pleasant one.  Macro parameters through which the user can control which comparisons are performed/reported as well as the appearance of the generated “comparison report” are discussed, and use of the macro is demonstrated using two case studies that leverage publicly-available data.


The Mystery of Automatic Retain in a SAS Data Step
Huei-Ling Chen and Hong Zhang
CC-34

The data step is the most frequently used programming process in the SAS System.  As programmers we should be very familiar with it.  However, sometimes we write a piece of code and the output is not what we expect.  Is our code incorrect or are there mysteries inside the data step?  This paper will focus on one of the mysteries - automatic retain in a data step.  We will investigate how variables are automatically retained even though no RETAIN statement is specified.  Examples are provided to demonstrate the pitfalls one can experience when constructing a data step.  Being cautious can avoid unexpected results.  This paper uses a PUT _ALL_ statement to demonstrate how automatically retained variables behave.
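
A tiny demonstration of the idea (the sum statement implicitly retains its accumulator):

   data demo;
      set sashelp.class;
      total + weight;   /* TOTAL is retained across iterations */
      put _all_;        /* watch TOTAL carry over              */
   run;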


Successful Ways to Add and Drop Data, While Also Reformatting Data
Adetosoye Oladokun
CC-192

For my project, my goal is to walk through the process of writing code in SAS 9.4.  The main focus is on how to drop variables and how to reformat variables.  I also discuss the code I used to upload my data set, create output for it, and produce various frequency tables.  I have highlighted the areas that contain code, procedure statements, log statements, and output statements, put each heading in bold letters, and provided a brief explanation to serve as a guide.
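
In that spirit, the two headline operations look roughly like this (a sketch on a shipped data set, not the project's own code):

   data subset;
      set sashelp.heart(drop=smoking status);  /* drop two variables */
      format weight 8. ageatstart 3.;          /* reformat variables */
   run;

   proc freq data=subset;
      tables sex bp_status;                    /* frequency tables   */
   run;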


29 Shades of Missing
Darryl Putnam
CC-106

Missing values can have many flavors of missingness in your data and understanding these flavors of missingness can shed light on your data and analysis.  SAS® can identify 29 flavors of missing data, and a variety of functions, statements, procedures, and options can be used to whip your missing data into submission.  This paper will focus solely on how SAS can use missing values in unique and insightful ways.
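
A small taste of special missing values (a sketch):

   data responses;
      missing R N;           /* read R and N as special missing values */
      input score @@;
      datalines;
   85 R 72 N 90
   ;

   proc format;
      value scoremiss .R = 'Refused' .N = 'Not applicable';
   run;

   proc freq data=responses;
      tables score / missing;
      format score scoremiss.;
   run;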


Date Dimensioning
Christopher Johnson
CC-10

Intuition would suggest that it is more efficient to perform simple calculations as needed than to store calculations in a table for reference.  However, in some circumstances, creating lookup tables can save both programmer and CPU time.  Dates present a particular difficulty in any programming language.  This paper will present a data structure that can simplify date manipulations while gaining efficiency.
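
The structure in question is easy to generate once (a minimal sketch for one year):

   data date_dim;
      do date = '01JAN2015'd to '31DEC2015'd;
         year    = year(date);
         quarter = qtr(date);
         month   = month(date);
         dow     = weekday(date);
         output;
      end;
      format date date9.;
   run;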


Delivering Quarterly Reporting Every Month – A Departure From the Traditional Calendar Definition, Using Formats
Barbara Moss and Anna Flynn
CC-140

What’s a quarter to you?  Feeling constrained by the typical calendar quarters of JAN thru MAR, APR thru JUN, JUL thru SEP and OCT thru DEC?  Quarterly trending applications need each date grouping to contain three months of data.  Running quarterly does not provide data frequently enough.  Running monthly, using the traditional definition of quarters, leaves some quarters containing only one or two months of data.  This distorts the output of trending patterns.  The requirement is to construct a quarter such that all data, up to the current month, is used.  For example, running in April the quarters would be defined as MAY thru JUL, AUG thru OCT, NOV thru JAN and FEB thru APR.  This allows the process to run with full quarters of data each and every month, delivering data more than four times a year.  Leveraging PROC FORMAT, this presentation shows how to implement a rolling or shifting definition of quarters, allowing for quarterly reporting every month!
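
A hard-coded flavor of the format for an April run (the real implementation would build the ranges dynamically; CLAIMS is a placeholder):

   proc format;
      value rollqtr
         '01MAY2014'd - '31JUL2014'd = 'MAY-JUL'
         '01AUG2014'd - '31OCT2014'd = 'AUG-OCT'
         '01NOV2014'd - '31JAN2015'd = 'NOV-JAN'
         '01FEB2015'd - '30APR2015'd = 'FEB-APR';
   run;

   proc freq data=claims;
      tables svc_date;
      format svc_date rollqtr.;
   run;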


Using Multilabel Formats in SAS to Analyze Data Over Moving Periods of Time
Christopher Aston
CC-84

The Food Safety and Inspection Service collects a plethora of data from all over the country on a daily basis.  Many of the Agency's performance measures that it uses to identify potential trends and to assess the effectiveness of its policies on the meat and poultry industry are based on the most recent 12 months of data.  Furthermore, these performance measures are normally assessed on a monthly or quarterly basis, so the same data are used multiple times in overlapping windows when we seek to analyze performance over time.  The purpose of this paper is to present the method I devised to analyze time-dependent data evaluated as a "moving window," i.e., each data point contributes to multiple overlapping windows while the data are read and processed only once.  This is accomplished specifically using multilabel formats in SAS to assign specific dates to more than one "period."
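
The essential trick is the MULTILABEL format plus the MLF option (a sketch with placeholder names and windows):

   proc format;
      value window (multilabel)
         '01JAN2014'd - '31DEC2014'd = 'Window ending DEC 2014'
         '01APR2014'd - '31MAR2015'd = 'Window ending MAR 2015';
   run;

   proc means data=findings sum;
      class inspect_date / mlf;   /* a date can fall in both windows */
      format inspect_date window.;
      var noncompliances;
   run;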


Where do the titles or footnotes go when using PROC SGPLOT in ODS PDF?
Julie Liu
CC-124

Normally people would think titles and/or footnotes are shown in PROC SGPLOT graphs by default, provided that they are not turned off.  However, when using this procedure with the ODS PDF statement, this is surprisingly untrue.  Also, placing titles or footnotes before or after PROC SGPLOT produces different results.  Finally, by using ODS LAYOUT, those titles or footnotes magically appear in the output.  This presentation will use examples to demonstrate the effects.


Beautiful PROC CONTENTS Output Using the ODS Excel Destination
Suzanne Dorinski
CC-76

A member of the Census Bureau’s in-house SAS® users group asked how to export the output of PROC CONTENTS (variable name, type, length, and format) from several Oracle database tables within the same database to separate worksheets in an Excel file.  You can use the _all_ keyword to work with all the tables within a library.  The ODS Excel destination, which is production in SAS 9.4 maintenance release 3, displays the output beautifully.
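
The skeleton of the approach (a sketch; SHEET_INTERVAL controls how output is split across worksheets):

   ods excel file='contents.xlsx' options(sheet_interval='table');
   proc contents data=sashelp._all_;   /* _ALL_: every member in the library */
   run;
   ods excel close;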


Having a Mean Headache on Summary Data? Using SAS to Compute Actual Data from Summary Data
William Zupko
CC-88

SAS programmers might not get to choose how their data is formatted.  It is very easy to take raw data and provide descriptive statistics on data that has not been modified.  Unfortunately, sometimes raw data is unavailable and only the summary data can be provided.  One of the trickiest problems occurs with this summary data, as SAS has difficulty breaking data from one line into many.  Since many of the functions SAS would perform need the actual data, such as PROC MEANS, performing descriptive statistics on summary data can at best provide misleading results or, at worst, completely incorrect ones.  This paper describes how to expand summary data into simulated raw data sets, which allow accurate descriptive statistics on each single variable.  The following SAS code will be utilized and/or discussed: PROC MEANS, DATA step, DO loops, and the OUTPUT statement.
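
The core expansion step (SUMMARY, with one row per value and a COUNT column, is a placeholder):

   data simulated_raw;
      set summary;
      do i = 1 to count;
         output;            /* one row per original observation */
      end;
      drop i count;
   run;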


The %LetPut Macro, and Other Proactive Macro Debugging Techniques
Shane Rosanbalm
CC-121

Macro debugging can sometimes be difficult.  Having ready access to the values of local macro variables is often quite helpful.  This paper introduces a simple macro, %LetPut, to assist in displaying the values of macro variables in the log.  Other minimally invasive techniques for generating helpful messages in the log will also be presented.
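
One possible implementation along the lines the paper describes (not necessarily the author's exact macro):

   %macro letput(mvar);
      %put NOTE: &mvar = [%superq(&mvar)];   %* brackets expose leading/trailing blanks;
   %mend letput;

   %let cutoff = 2015-09-30;
   %letput(cutoff)        %* writes: NOTE: cutoff = [2015-09-30];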


IA_CTT: A SAS® Macro for Conducting Item Analysis Using the Classical Test Theory
Yi-Hsin Chen
CC-184

Item analysis helps identify problems with test items when a bank of items that will be used continually is being developed.  These problems can be corrected, resulting in a better test and better measurement.  Item analysis is also a useful tool any time students complain about items.  Even though more advanced psychometric models, such as item response theory or cognitive diagnostic models, have been widely applied, item analysis based on classical test theory is still very often employed by researchers and practitioners because of its conceptual simplicity.  This paper provides a SAS® macro, called IA_CTT, for conducting item analysis using classical test theory.  The macro yields test score statistics (e.g., mean, median, mode, Q1, Q3, standard deviation, skewness, kurtosis, alpha, standard error of measurement), individual item statistics (e.g., p-value, point-biserial correlation, corrected point-biserial correlation, reliability when an item is deleted, and item discrimination between the top and bottom groups), frequency distributions of individual options for each item based on the overall sample and on the two groups formed from the top 25% (above Q3) and bottom 25% (below Q1) of students (i.e., distractor analysis), and Mantel-Haenszel differential item functioning statistics.  The macro reads in the data file from Microsoft Excel and exports the outputs as Excel files.  In addition to the macro for item analysis, this paper also provides interpretations of all the relevant statistics.  Example outputs are shown and interpreted at the end of the paper.


Accessing and Extracting Unstructured XML Data using SAS and Python
Sai Mandagondi
CC-188

This paper discusses an approach to dynamically load unstructured XML data using SAS and Python.  When neither the SAS XML Mapper nor a custom XML map can parse the incoming data, using external programs (shell scripting and Python) and integrating their results into a SAS data set is an efficient alternative.  One method of eventually loading the data into a database to support upstream reporting and analytics is illustrated.
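
A minimal sketch of the hand-off, assuming a hypothetical flatten.py that parses the XML and writes comma-delimited rows to standard output (the XCMD system option must be in effect):

   filename xmlout pipe 'python flatten.py incoming.xml';

   data xml_flat;
      infile xmlout dsd dlm=',' truncover;
      length id $20 tag $32 value $200;
      input id tag value;
   run;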


Because We Can: Using SAS® System Tools to Help Our Less Fortunate Brethren
John Cohen
CC-145

We may be called upon to provide data to developers -- frequently for production support -- who work in other programming environments.  Often external recipients, they may require files in specific formats and variable/column order, with prescribed delimiters, file-naming conventions, and the like.  Our goal should be to achieve this as simply as possible, both for initial development and for ease of maintainability.  We will take advantage of several SAS tricks to achieve this goal.
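
A minimal sketch of one such hand-off, under a hypothetical layout (pipe-delimited, fixed column order, dated file name):

   data _null_;
      set work.final;
      file "extract_%sysfunc(today(), yymmddn8.).txt" dlm='|';
      if _n_ = 1 then put 'ID|NAME|AMOUNT';     /* header row in the prescribed order */
      put id name amount;
   run;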



Hands On Workshop

Quick Results with SAS® Enterprise Guide®
Kirk Paul Lafler
How-23

SAS® Enterprise Guide® empowers organizations, programmers, business analysts, statisticians and end-users with all the capabilities that SAS has to offer.  This hands-on workshop presents the built-in wizards for performing reporting and analytical tasks, access to multi-platform enterprise data sources, the delivery of data and results to a variety of mediums and outlets, data manipulation without the need to learn complex coding constructs, and support for data management and documentation requirements.  Attendees learn how to use the graphical user interface (GUI) to access tab-delimited and Excel input files; subset and summarize data; join (or merge) two tables together; flexibly export results to HTML, PDF and Excel; and visually manage projects using flowcharts and diagrams.


An Introduction to Perl Regular Expressions
Ron Cody
How-194

Perl regular expressions, implemented in SAS Version 9, provide a way to perform pattern matching of text strings.  This is a new capability in SAS and is particularly useful in reading very unstructured data.  You have the ability to search for text patterns, extract the patterns, or substitute new patterns.  Perl regular expressions, along with dozens of new character functions, give you enormous power to read and manipulate character data.
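
A small taste of the syntax (the pattern matches U.S.-style phone numbers; the data lines are illustrative):

   data phones;
      length area $3;
      retain re;
      if _n_ = 1 then re = prxparse('/\((\d{3})\) ?\d{3}-\d{4}/');  /* compile once */
      input text $char50.;
      if prxmatch(re, text) then area = prxposn(re, 1, text);       /* first capture group */
      datalines;
   Call (919) 555-1234 today
   No phone listed here
   ;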


Introduction to ODS Graphics
Chuck Kincaid
How-98

This presentation teaches the audience how to use ODS Graphics.  Now part of Base SAS®, ODS Graphics are a great way to easily create clear graphics that enable any user to tell their story well.  SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work.  The core of the procedures is explained, as well as some of the many options available.  Furthermore, we explore the ways to combine the individual statements to make more complex graphics that tell the story better.  Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures.
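
A flavor of the syntax, using the SASHELP.CLASS data set that ships with SAS:

   proc sgplot data=sashelp.class;
      scatter x=height y=weight / group=sex;   /* color points by sex */
      reg x=height y=weight;                   /* overlay a regression fit */
      xaxis label='Height (inches)';
      yaxis label='Weight (pounds)';
   run;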


Intermediate ODS Graphics
Chuck Kincaid
How-99

This paper builds on the knowledge gained in the Introduction to ODS Graphics.  The capabilities in ODS Graphics grow with every release as both new paradigms and smaller tweaks are introduced.  After talking with the ODS developers, the author chose a selection of these many wonderful capabilities.  This paper will look at both types of capabilities in that selection and provide the reader with more tools for their belt.

Visualization of data is an important part of telling the story seen in the data.  And while the standards and defaults in ODS Graphics are very well done, sometimes the user has specific nuances for characters in the story or additional plot lines they want to incorporate.  Almost any possibility, from drama to comedy to mystery, is available in ODS Graphics if you know how.  We will explore tables, annotation and changing attributes, as well as the BLOCK and BUBBLE plots.
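
For instance, the BUBBLE statement takes just a few lines (using SASHELP.CLASS again):

   proc sgplot data=sashelp.class;
      bubble x=height y=weight size=age;   /* bubble size maps to age */
   run;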

Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures.  Some experience with these procedures is assumed, but not required.


A Tutorial on the SAS® Macro Language
John Cohen
How-152

The SAS® Macro language is another language that rests on top of regular SAS code.  If used properly, it can make programming easier and more fun.  However, not every program is improved by using macros.  Furthermore, it is another language syntax to learn, and can create problems in debugging programs that are even more entertaining than those offered by regular SAS.

We will discuss using macros as code generators to save repetitive and tedious effort, to pass parameters through a program to avoid hard-coding values, and to pass code fragments, thereby making certain tasks easier than using regular SAS alone.  Macros facilitate conditional execution and can be used to create program modules that can be standardized and re-used throughout your organization.  Finally, macros can help us create interactive systems in the absence of SAS/AF.

When we are done, you will know the difference between a macro, a macro variable, a macro statement, and a macro function.  We will introduce interaction between macros and regular SAS language, offer tips on debugging macros, and discuss SAS macro options.
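
A minimal sketch of a macro used as a code generator; one definition serves any data set and variable list passed as parameters (names are illustrative):

   %macro freqs(ds, vars);
      proc freq data=&ds;
         tables &vars / missing;
      run;
   %mend freqs;

   %freqs(sashelp.class, sex)
   %freqs(sashelp.cars, origin type)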


Applications Development, Theory or Practice
Ronald Fehd
How-96

This Hands-on Workshop is a case study of the proof-of-concept phase of the list processing suite Summarize-Each-Variable.  Topics covered include design principles, development strategy, style guide, naming conventions, requirements and specifications.  List processing consists of two tasks: making a list, also called a control data set, in which each row is a set of parameter values; and processing the list, which means calling another program with the parameters of each row.  The control data set shown here is the list of names of variables in a data set.  Design principles are reminders to write programs so that they are readable, reusable, robust and easy to test.  Two strategies are shown: bottom-up and top-down.  The style guide emphasizes naming conventions that are used for the programs and, most important, the data structure, which guarantees successful acceptance of the output described in the specifications.

Purpose: Students leave the course with a set of small programs which form a conceptual template that can be modified to handle other lists, such as data sets or files to process.  A minimal sketch of the two tasks follows.
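
The sketch below assumes a hypothetical %summarize program that accepts a data set name and a variable name:

   /* Task 1: make the list (control data set) of variable names */
   proc contents data=sashelp.class out=control(keep=name) noprint;
   run;

   /* Task 2: process the list, calling the program once per row */
   data _null_;
      set control;
      call execute(cats('%nrstr(%summarize)(sashelp.class,', name, ')'));
   run;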


HOW to DoW
Paul Dorfman
How-30

The DoW-loop is a nested, repetitive DATA step structure enabling you to isolate instructions related to a certain break event before, after, and during a DO-loop cycle in a naturally logical manner.  Readily recognizable in its most ubiquitous form by the DO UNTIL(LAST.ID) construct, which lends itself to control-break processing of BY-group data, the DoW-loop is in fact more morphologically diverse and generic.  In this workshop, the DoW-loop's logic is examined via the power of example to reveal its aesthetic beauty and pragmatic utility.  In some industries, such as pharma, where flagging BY-group observations based on in-group conditions is standard fare, the DoW-loop is an ideal vehicle, greatly simplifying the alignment of business logic and SAS code.  Attendees will have an opportunity to investigate the program control of the DoW-loop step by step using the SAS DATA step debugger and learn a range of nifty practical applications of the DoW-loop.
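
A minimal sketch of that flagging pattern, assuming a hypothetical data set CLINIC sorted by ID:

   data flagged;
      /* pass 1: one trip through the BY group to compute the flag */
      do until (last.id);
         set clinic;
         by id;
         if event = 1 then anyevent = 1;
      end;
      /* pass 2: reread the group and write every row with the flag attached */
      do until (last.id);
         set clinic;
         by id;
         output;
      end;
   run;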


Application Development Techniques Using PROC SQL
Kirk Paul Lafler
How-24

Structured Query Language (SQL) is a database language found in Base SAS software.  It permits access to data stored in data sets or tables using an assortment of statements, clauses, options, functions, and other language constructs.  This hands-on workshop (HOW) demonstrates core concepts as well as SQL’s many applications, and is intended for SAS users who desire an overview of this exciting procedure’s capabilities.  Attendees learn how to construct SQL queries; create complex queries including inner and outer joins; apply conditional logic with CASE expressions; identify FIRST.row, LAST.row, and BETWEEN.rows in BY-groups; create and use views; and construct simple and composite indexes.
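
A small taste of the query syntax (table and column names are illustrative):

   proc sql;
      select a.id,
             a.amount,
             case when a.amount > 1000 then 'High'
                  else 'Standard'
             end as tier,
             b.region
        from orders as a
             left join customers as b
             on a.id = b.id;
   quit;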



Pharma & Healthcare

How to Build Study Quality Surveillance for a Clinical Study?
Angela Teng
PH-147

Study Quality Surveillance (SQS) provides oversight of the quality of a study by reviewing and monitoring study data in a blinded fashion.  The purpose of SQS is to identify the critical risks that could affect subject safety, data quality, or compliance, so that key issues can be identified early and prevented from recurring, thereby ensuring that the study results are valid and credible.  Data errors may be found and noted during the SQS review.  If data errors are noted, the logic used to find them is communicated to Data Management so that Data Management can incorporate new edit checks in their specifications as appropriate.  Also, all outputs are blinded, and no information is included that might risk unblinding.  This manuscript describes the process of generating SQS outputs and the key components of an SQS report.  In addition, it provides detailed examples of SQS figures that facilitate data review.


Multilevel Randomization
Lois Lynn and Marina Komaroff
PH-28

Randomization in clinical trials is essential for the success and validity of a study.  PROC PLAN is an important SAS® procedure that generates randomization schedules for a variety of experimental designs.  This procedure was developed for the major types of randomization, such as simple, block, and stratified randomization, where the latter controls and balances the influence of covariates.  In addition to the SAS® documentation, multiple papers have been written explaining how to adapt and enhance the procedure with DATA steps and/or PROC FORMAT.

Clinical research in transdermal medicine introduces a situation where multilevel randomization is required for levels like treatment, location (arm, thigh, back, etc.) and side (left, right, upper, center, etc.) of a patch application, while retaining balance at each level and combination of levels.  Schedules get especially complicated for cross-over studies, where the location and side of patch application need to be rotated by period and balanced as well.  To the authors’ knowledge, there are no published papers that accommodate these requirements.

This paper introduces a novel concept of multilevel randomization, provides SAS code utilizing PROC PLAN, and presents a few examples of increasing complexity that generate balanced multilevel randomization schedules.  The authors are convinced that this paper will be useful to SAS-friendly researchers conducting similar studies that require multilevel randomization.
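
A minimal sketch of the building blocks under illustrative parameters: PROC PLAN generates a blocked schedule, and a DATA step decodes the single treatment code into two balanced levels, treatment and location:

   proc plan seed=20151;
      factors block=6 ordered subject=4 random / noprint;
      treatments trt=4 cyclic;          /* each code appears once per block */
      output out=schedule;
   run;

   data schedule;
      set schedule;
      length treat $1 loc $5;
      treat = choosec(mod(trt-1, 2) + 1, 'A', 'B');  /* codes 1,3 -> A; 2,4 -> B */
      loc   = choosec(ceil(trt/2), 'Arm', 'Thigh');  /* codes 1,2 -> Arm; 3,4 -> Thigh */
   run;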


A Descriptive Analysis of Reported Health Issues in Rural Jamaica
Verlin Joseph
PH-107

During spring break, I accompanied a medical missions team to two of the most remote areas in Jamaica.  While in Jamaica, my team and I established clinics to treat a variety of health issues.  This paper will illustrate how I used SAS to produce a descriptive analysis report on various issues we treated.


Use of SAS and SAS Bridge for ESRI in the Study of Spatial Patterns of Children with Special Health Needs and Providers in Oklahoma
Ram Poudel, Maleeha Shahid, Mark Wolraich and Jennifer Lockhart
PH-126

It is speculated that service navigation could improve quality of care while limiting the costs of managing chronic conditions or diagnoses for youth and children with special health needs (CYSHCNs).  While CYSHCNs in the state of Oklahoma are in need of a well-functioning system to coordinate a wide variety of health care services, dental care has been identified as a unique unmet need in this population.  From the 2014 Community Needs Assessment, we found “Not available in my county at all” to be the top barrier to the most needed services.  We therefore picked dental care as an example to assess the distance and drive time required to travel to this particular service.  The aim of this study is to map the spatial distribution of CYSHCNs and providers, to assess the type and distribution of diagnoses or chronic conditions these children have, and to identify an appropriate method to calculate the distance and drive time from the family’s zip code to the providers.  We used SAS and SAS Bridge for ESRI to find the spatial patterns of CYSHCNs and to detect clusters of dental needs.  Compared to national data (18.4%), Oklahoma has 2.5 times more children (44.9%) with special health needs who have 4 or more reported conditions.  Only a few counties (14), including two metro counties, have Sooner Care pediatric dentists.  A greater discrepancy is found between urban and rural counties in terms of Sooner Care pediatric dentists and children with special health needs.  The macro can be used to determine distance and driving time from counties or zip codes to service providers.  There are some clusters of counties with dental needs.  This cluster detection method can be applied to other needs and in other states as well.  More analyses will be done to assess whether some of these clusters may be partly explained by socio-demographic and policy factors.


ANOVA_Robust: A SAS® Macro for Various Robust Approaches to Testing Mean Differences in One-Factor ANOVA Models
Thanh Pham, Eun Sook Kim, Diep Nguyen, Yan Wang, Jeffrey Kromrey and Yi-Hsin Chen
PH-134

Testing the equality of several independent group means is a common statistical problem in the social sciences.  The traditional analysis of variance (ANOVA) is one of the most popular methods.  However, the ANOVA F test is sensitive to violations of the homogeneity of variance assumption.  Several alternative tests have been developed in response to this problem, ranging from modifications of the ANOVA F test to tests based on the Structured Means Modeling technique.  This paper provides a SAS macro for testing the equality of group means using thirteen different methods, including the regular ANOVA F test.  In addition, this paper provides the results of a simulation study comparing the performance of these tests in terms of their Type I error rate and statistical power under different conditions, especially under violation of the homogeneity of variance assumption.


Same Question, Different Name: How to Merge Responses
Michelle Dahnke and Tyra Dark
PH-163

To correctly use data from the Collaborative Psychiatric Epidemiology Surveys, which join three individual surveys, users may need to evaluate the cross-survey linking and merge responses.  While the codebook identifies the variable name assigned to each question, in some instances the same questions were assigned different names in the three surveys.  For example, a question about being diagnosed with high blood pressure was named V04052 or V06677 depending on the survey.  This paper demonstrates how to merge response data in circumstances such as this so the user can conduct analysis on the maximum number of valid responses.
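
One way to unify such an item is to concatenate the component surveys while renaming each survey's version of the variable to a common name (the data set names are hypothetical; the variable names come from the abstract):

   data cpes_hbp;
      set survey_a(rename=(V04052=hibp))
          survey_b(rename=(V06677=hibp));
   run;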


Creating Quartiles from Continuous Responses: Making Income Data Manageable
Michelle Dahnke and Tyra Dark
PH-162

It is customary to collect data at the most granular level; however, sometimes analysis requires consolidating responses into categories.  For example, in the Collaborative Psychiatric Epidemiology Surveys, the household income variable is a continuous variable with individual responses ranging from 0 to $200,000.  Working with the data may require categorization, such as quartiles, so that the data are more manageable.  This paper walks through how to create household income quartiles from free-response data, an important fundamental skill in working with large data sets.
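
A minimal sketch using PROC RANK, with hypothetical data set and variable names; GROUPS=4 bins the continuous responses into quartiles:

   proc rank data=cpes groups=4 out=cpes_q;
      var hhincome;
      ranks income_q;        /* 0-3; add 1 if quartiles numbered 1-4 are preferred */
   run;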


Using SAS Programing to Identify Super-utilizers and Improve Healthcare Services
An-Tsun Huang
PH-170

Introduction: Improving public health, enhancing the quality of healthcare services, and reducing unnecessary costs are important healthcare issues.  Super-utilizers are the small subset of the population who account for the highest utilization of healthcare services.  The purpose of this study is to combine inpatient stays (IS) and emergency-department visits (EV) to identify super-utilizers in the Medicaid population, in order to enhance the quality of healthcare, decrease Medicaid costs, and improve healthcare management systems.

Methods: Medicaid claims data with dates of service in fiscal year 2014 were used to create 16 scenarios of combined IS and EV.  These scenarios represent 16 interactions between four IS groups and four EV groups.  Among them, high counts of IS and EV (IS ≥2 and EV ≥3) are considered high utilization of healthcare services.  Super-utilizers are beneficiaries meeting the condition IS ≥4 and EV ≥6.  First, based on management/payment systems, Medicaid beneficiaries were classified into two groups: managed care organization (MCO) enrollees and fee-for-service (FFS) beneficiaries.  Second, PROC SQL was used to count the number of IS and EV services for each beneficiary.  Subsequently, IF statements were used to create dummy variables to categorize IS and EV counts into four groups each, and then to categorize combined IS and EV counts into 16 sub-groups.  Afterwards, PROC SQL and PROC TABULATE were used to obtain the numbers of beneficiaries and Medicaid costs for each scenario.  Lastly, PROC FREQ was used to identify the top three diseases in each scenario.
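
A minimal sketch of the counting and flagging steps just described, with hypothetical data set and variable names:

   proc sql;
      create table counts as
      select bene_id,
             sum(claim_type = 'IS') as is_count,
             sum(claim_type = 'EV') as ev_count
        from claims
        group by bene_id;
   quit;

   data counts;
      set counts;
      super_utilizer = (is_count >= 4 and ev_count >= 6);
   run;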

Results: MCO super-utilizers account for 0.1% of MCO enrollees and 4.0% of MCO expenditures.  FFS super-utilizers account for 0.8% of FFS beneficiaries and 9.8% of FFS expenditures.

Conclusion: This method is timely, especially after the Affordable Care Act was launched in 2010.  It can help governments, the healthcare industry, and researchers evaluate costs, the performance of healthcare services, and improvements in public health.


Prescription Opioid Use in the U.S. in 2012: Characterizing Sustained vs. Infrequent Use Using the Medical Expenditure Panel Survey
Monika Lemke
PH-182

Background/Objectives: Opioid use has been declared an epidemic as rates of use, abuse, addiction, and overdose-related deaths have increased.  This study provides a detailed portrait of opioid exposure in the United States and characterizes subpopulations with varying levels of exposure.

Methods: A secondary analysis of the nationally representative Medical Expenditure Panel Survey examines self-reported prescription opioid exposure among US adults 18 years and older in 2012.  Opioid users are divided into categories based on use duration and drug DEA Schedule: infrequent (< 30 day supply or one prescription), sustained (Narcotic Analgesics (Schedule II, 30-89 days and Non-Schedule II, >30 days), Narcotic Analgesic Combinations (Schedule II and Non-Schedule II, >30 days)), and intensive (Narcotic Analgesics (Schedule II, 90-day supply or more)).  Socio-demographic factors such as sex, age, race, census region, family income, insurance coverage, education, and BMI were investigated.

Results: According to our estimates, 14.5% of the US adult population reported opioid prescriptions in 2012, or about 21.5% of US adults who reported any medication.  Among opioid users, 62.8% were infrequent users, 30.6% sustained users, and 6.5% intensive users.  The mean total day supply was 8 days (Standard Error 0.2) among infrequent users, 176 days (SE 9) among sustained users, and 353 days (SE 13) among intensive users.  Adults 65-85 years old (Odds Ratio 6.7, 95% CI 3.7-12.0, p-value < 0.0001), those at less than 100% of the Federal Poverty Level (OR 2.6, 95% CI 1.9-3.7, p-value < 0.0001), and those with public insurance coverage (OR 1.5, 95% CI 1.2-1.9, p-value = 0.0013) were more likely to be in a higher use group.

Conclusions: A significant proportion of individuals who reported an opioid prescription in 2012 received a supply of 30 days or less and have the lowest risk of dependency.  The subgroup of individuals who received a supply of 90 days or more of high risk opioids needs to be better understood in order to avoid adverse outcomes in this risk group.



Planning, Support, and Administration

Handling Sticky Situations - The Paper
Stephanie Thompson
PA-137

Have you ever been asked to perform an analysis where you were presented with the expected outcome?  What about analyzing personnel data to help with salary negotiations or promotions?  Report on metrics that can’t be measured?  These and other scenarios will be discussed in an effort to help you, as an analyst, perform your task but also to make it meaningful and ethical.


SAS In-Database Decision Management for Teradata: Making the Best Decisions Possible
Tho Nguyen and William E Benjamin Jr
PA-202

We all make tactical and strategic decisions every day.  With the presence of big data, are we making the right or the best decisions possible as data volume, velocity and variety continue to grow?  As businesses become more targeted, personalized and public, it is imperative to make precise data-driven decisions for regulatory compliance and risk management.  Come learn how SAS In-Database Decision Management for Teradata can help you make the best decision possible by integrating SAS and Teradata.


Tips and Tricks for Organizing and Administering Metadata
Michael Sadof
PA-183

The SAS® Management Console was designed to control and monitor virtually all of the parts and features of the SAS Intelligence Platform.  However, administering even a small SAS Business Intelligence system can be a daunting task.  This paper will present a few techniques that will help you simplify your administrative tasks and enable you and your user community to get the most out of your system.  The SAS Metadata Server stores most of the information required to maintain and run the SAS Intelligence Platform, which is obviously the heart of SAS BI.  It stores information about libraries, users, database logons, passwords, stored processes, reports, OLAP cubes, and a myriad of other information.  Organization of this metadata is an essential part of an optimally performing system.  This paper will discuss ways of organizing the metadata to serve your organization well.  It will also discuss some of the key features of the SMC and best practices that will assist the administrator in defining roles, promoting, archiving, backing up, securing, and simply organizing the data so it can be found and accessed easily by administrators and users alike.


UCF SAS® Visual Analytics: Implementation, Usage, and Performance
Scott Milbuta, Carlos Piemonti and Ulf Borjesson
PA-187

At the University of Central Florida (UCF) we recently invested in SAS Visual Analytics (VA) along with an upgrade of the SAS® Business Intelligence (BI) platform (from 9.2 to 9.4), a project that took over a year to complete, in order to give our users the best and most up-to-date tools available.

This paper introduces our SAS VA environment at UCF, presents projects created using this tool, and explains why we chose it for development over other available SAS applications.

It also explains the technical environment for our non-distributed SAS VA deployment: RAM, servers, benchmarking, sizing and scaling, and why we chose this mode instead of a distributed SAS VA environment.

Challenges in the design, implementation, usage, and performance are also presented, including the reasons why Hadoop has not been adopted.


Tips and Tricks for Introductory Workshops in SAS for Health Professionals
Jason Brinkley
PA-62

It can sometimes be the case that general health professionals need basic SAS training in order to effectively create simple reports and manipulate incoming data.  The presenter will share his experiences leading a SAS workshop series in a university setting over the course of several years.  Heading a team of university faculty members, the presenter has designed, implemented, and refined short-term SAS overview training for general health professionals.  While multiple topics have been discussed in these workshops, some have fared better with a general health professional audience than others.  Topics will include tips on introducing code-based work to individuals with no previous experience, workshop format, good practices for instruction and delivery, and introducing SAS macros in an example-based manner.


Predictive Modeling Using SAS® Visual Statistics: Beyond the Prediction
Xiangxiang Meng
PA-197

Predictions, including regressions and classifications, are the predominant focus of many statistical and machine-learning models.  However, in the era of Big Data, a predictive modeling process involves more than just making the final predictions.  For instance, a large collection of data often represents a set of small, heterogeneous populations, so identification of these subgroups is an important step in predictive modeling.  Additionally, big data sets are often complex, exhibiting high dimensionality; consequently, variable selection, transformation, and outlier detection are integral steps.  This paper provides working examples of these critical stages using SAS® Visual Statistics, including data segmentation (supervised and unsupervised), variable transformation, supervised variable selection, outlier detection, and filtering, in addition to building the final predictive model using methodologies such as decision trees, logistic regression, and random forests.  The illustration data were collected from 2010 to 2014 from vehicle emission testing results.


Get cozy with your data sources using LSF queue technology: One LSF Cluster to Rule them All
Steve Garrison
PA-171

Do you have data sources located all over the world?  Is your user data travelling needlessly across long distances?  Do you experience network lag because your production SAS grid cluster is not housed in the same physical location as each of your data sources?  The answer is not to create multiple production SAS environments, one in each data location.  The solution is to expand your grid cluster to span multiple data centers, using dedicated LSF queues for each set of 'remote' servers.

In an attempt to save licensing and resource costs, many global corporations are moving towards a single enterprise SAS grid versus multiple internal SAS environments.  Many of these legacy SAS environments were purposely co-located beside critical data sources to provide the fastest possible response time when pulling or pushing data to those sources.  So, how does an organization architect a single enterprise SAS grid utilizing data sources located all over the world without compromising performance?

One way to eliminate network latency is by co-locating compute nodes with your critical data.  LSF queue technology makes this possible.  With a queue assigned to each worker node in each location where critical data are housed, there is no network lag because no distance is traversed; the server performing the work is in the same location as the data source.  A single production LSF cluster can span the globe, submitting a SAS job from Virginia to be executed on a server in Tokyo (or wherever a critical data source may be located).

This use of LSF queue technology creates a single production cluster to “rule them all.”


When Bad Things Happen to Good SAS Programmers
Jiangtang Hu
PA-102

These are not doom days for SAS programmers by any means, but we can’t call it a Golden Age anymore.  In their daily work, SAS programmers face much more aggressive control from IT departments: they might not be able to use their favorite text editors to write and edit SAS code, and they might even have no choice of SAS interface in which to run that code, having to stick to the new games in town like SAS Enterprise Guide, SAS Studio, SAS Data Integration Studio, and SAS Drug Development.

In the external world, the SAS language itself faces strong competition from R and Python, which raises an even more profound question: will SAS programming as a profession exist in the near future?  The volume of traffic on the main SAS mailing list is slowing down, SAS bloggers are not as active as before and, even worse, SAS user conferences at all levels are facing challenges in funding and participation.

In this paper, I will not focus on “why” these bad things happened, but rather launch an open discussion on how we as SAS programmers respond to these challenges as they happen.


Supporting SAS in the Workplace: A Corporate SAS User Community
Barbara Okerson and Jennifer Harney
PA-71

Many SAS users are not aware of the abundance of resources available to them from a variety of sources, ranging from those internal to their own organization to SAS itself.  In order for these resources to be utilized, they need to be available to users in an accessible way.  This paper shows how one large company with SAS users at many locations throughout the United States has built a highly successful collaborative community for SAS support.  Modeled in the style of sasCommunity.org, the online corporate SAS community includes discussion forums, surveys, interactive training, places to upload code, tips, techniques, and links to documentation and other relevant resources that help users get their jobs done.


Establishing a Health Analytics Framework
Krisa Tailor
PA-198

Medicaid programs are the second largest line item in each state’s budget.  In 2012, they contributed $421.2 billion, or 15 percent of total national healthcare expenditures.  With US health care reform at full speed, state Medicaid programs must establish new initiatives that will reduce the cost of healthcare, while providing coordinated, quality care to the nation’s most vulnerable populations.  This paper discusses how states can implement innovative reform through the use of data analytics.  It explains how to establish a statewide health analytics framework that can create novel analyses of health data and improve the health of communities.  With solutions such as SAS® Claims Analytics, SAS® Episode Analytics, and SAS® Fraud Framework, state Medicaid programs can transform the way they make business and clinical decisions.  Moreover, new payment structures and delivery models can be successfully supported through the use of healthcare analytics.  A statewide health analytics framework can support initiatives such as bundled and episodic payments, utilization studies, accountable care organizations, and all-payer claims databases.  Furthermore, integrating health data into a single analytics framework can provide the flexibility to support a unique analysis that each state can customize with multiple solutions and multiple sources of data.  Establishing a health analytics framework can significantly improve the efficiency and effectiveness of state health programs and bend the healthcare cost curve.


A Review of "Free" Massive Open Online Content (MOOC) for SAS® Learners
Kirk Paul Lafler
PA-13

Leading online providers are now offering SAS® users “free” access to content for learning how to use and program in SAS.  This content is available to anyone in the form of massive open online content (or courses) (MOOC).  Not only is all the content offered for “free,” but it is designed with the distance learner in mind, empowering users to learn using a flexible and self-directed approach.  As noted on Wikipedia.org, “A MOOC is an online course or content aimed at unlimited participation and made available in an open access forum using the web.”  This presentation illustrates how anyone can access a wealth of learning technologies including comprehensive student notes, instructor lesson plans, hands-on exercises, PowerPoints, audio, webinars, and videos.


Differentiate Yourself
Kirk Paul Lafler
PA-22

Today's employment and business marketplace is highly competitive.  As a result, it can be difficult for you and/or your business to stand out from the competition.  The success you're able to achieve depends on how you position yourself and/or your business relative to your competitors.  Topics include learning how to cut through all the marketplace noise, techniques for grabbing the attention of others, and strategies for attracting the desired employer or client.  This presentation emphasizes essential skills that will help students, junior professionals, and seasoned professionals learn how to differentiate themselves and/or their business from the competition.


Downloading, Configuring, and Using the Free SAS® University Edition Software
Charlie Shipp and Kirk Paul Lafler
PA-51

The announcement of SAS Institute’s free “SAS University Edition” is an exciting development for SAS users and learners around the world!  The software bundle includes Base SAS, SAS/STAT, SAS/IML, Designer Studio (user interface), and SAS/ACCESS for Windows, with all the popular features found in the licensed SAS versions.  This is an incredible opportunity for users, statisticians, data analysts, scientists, programmers, students, and academics everywhere to use (and learn) SAS for career opportunities and advancement.  Capabilities include data manipulation, data management, a comprehensive programming language, powerful analytics, high quality graphics, world-renowned statistical analysis capabilities, and many other exciting features.

This presentation discusses and illustrates the process of downloading and configuring the SAS University Edition.  Additional topics include the process of downloading the required applications, “key” configuration strategies to run the “SAS University Edition” on your computer, and the demonstration of a few powerful features found in this exciting software bundle.  We conclude with a summary of tips for success in downloading, configuring and using the SAS University Edition.



Posters

Maintaining a 'Look and Feel' throughout a Reporting Package Created with Diverse SAS® Products
Barbara Okerson
PO-44

SAS® provides a number of tools for creating customized professional reports.  While SAS provides point-and-click interfaces through products such as SAS® Web Report Studio, SAS® Visual Analytics, and even SAS® Enterprise Guide®, many users do not have access to these high-end tools and require customization beyond the SAS Enterprise Guide point-and-click interface.  Fortunately, Base SAS procedures such as the REPORT procedure, combined with graphics procedures, macros, ODS, and Annotate, can be used to create very customized professional reports.

When piecing together different solutions such as SAS Statistical Graphics, the REPORT procedure, ODS, and SAS/GRAPH®, different techniques need to be used to keep the same look and feel throughout the report package.  This presentation looks at solutions that can be used to keep a consistent look and feel in a report package created with different SAS products.


Assessing Health Behavior Indicators Among Florida Middle School Students
Elizabeth Stewart, Ivette A. Lopez and Charlotte Baker
PO-45

This pilot study sought to assess selected health behavior indicators of middle school students attending a developmental research school at a Historically Black College/University.  Students in grades 6-8 completed a modified Youth Risk Behavior Survey (YRBS).  Study participants (n=48) answered questions concerning unintentional injury and violence, sexual behaviors, alcohol and drug use, tobacco use, dietary behaviors, and physical activity.  The majority of students (89%) reported no use of tobacco, alcohol, or other drugs.  Regarding screen time, 50% of all students reported watching 3-5 hours of television on an average school day, and 68% of males reported playing video/computer games for the same time span.  Most males and females reported being about the right weight (69% and 60%, respectively), while 22% of girls reported being slightly or very overweight, combined.  Concerning physical activity, 81% of males reported 4 hours or more of participation each week, compared to 50% of females.  The health behavior indicator with the greatest difference by sex among the student population is physical activity, as males report greater activity levels.  This emphasizes the need for a segmented physical activity intervention for the female student population and an intervention designed to increase physical activity and decrease screen time among African American teens.


Using SAS to Examine Health-Promoting Lifestyle Activities of Upper Division Nursing Students at One of the Major Universities in the Southeastern United States
Abbas Tavakoli and Mary Boyd
PO-46

Health promotion is an important nursing intervention.  Research has long shown that one’s lifestyle affects health.  A health-promoting lifestyle has been described as a multi-dimensional pattern of self-initiated actions that maintain or enhance one’s level of wellness, self-actualization, and fulfillment.  The purpose of this study was to evaluate the health promotion/lifestyle activities of upper division nursing students in a college of nursing at one of the major universities in the Southeastern United States.  Specific aims of the study were (1) to measure health-promoting lifestyle activities of upper division nursing students in a college of nursing in the Southeastern United States; (2) to compare the health-promoting lifestyles of male and female nursing students; (3) to compare the health-promoting lifestyles of Caucasian students to those of students of other ethnic groups; and (4) to compare health-promoting lifestyles by marital status.  This study used a descriptive, comparative design.  Women are often more involved in interpersonal relationships than men and use and provide more social support than men.  The results did not reveal any significant difference by gender for the total scale or subscales of health promotion.  There were statistically significant differences between white students and students of other races in terms of physical activity, nutrition, and interpersonal relations.  There was a significant difference in physical activity by marital status; however, there were no statistically significant differences in the other subscales by marital status.


Reporting Of Treatment Emergent Adverse Events Based On Pooled Data Analysis or Country Specific Submissions: A Case Study
Sheetal Shiralkar
PO-53

Sponsors often need to file regulatory submissions with country-specific regulatory authorities after they get approval from the Food and Drug Administration.  The key reporting aspect of country-specific submissions pertaining to emerging markets involves accurate reporting of adverse events from the clinical trials conducted for the specific drug in those countries.  For reporting of these adverse events, we need to develop a robust algorithm and comprehensive system architecture for efficient and accurate data representation.

Pooling of data from multiple studies is often the first step in ensuring that adverse events from all the trials of the drug get accurately reported.  The data pooling specifications involve a lot of conditioning and sub-setting of data based on reporting specifications.  This poster describes a case study of a typical data pooling and reporting process for trial-level data available in the ADaM model.  The analysis also elaborates on reporting requirements and on the programming algorithms developed to meet those requirements.


Automating Preliminary Data Cleaning in SAS
Alec Zhixiao Lin
PO-63

Preliminary data cleaning or scrubbing tries to delete the following types of variables considered to be of little or no use: 1) variables with missing values or a uniform value across all records; 2) variables with very low coverage; 3) character variables for names, addresses, and IP addresses.  These variables are very commonly seen in big data.  This paper introduces a SAS process that automatically and efficiently detects these variables with minimal manual handling from users.  The output also helps users identify character variables that need to be converted to numeric values for downstream analytics.
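
A minimal sketch of one way to detect the first two types, using the NLEVELS option of PROC FREQ (the data set name is hypothetical, and the NNonMissLevels column appears only when some variables contain missing values):

   ods output nlevels=levelinfo;
   proc freq data=bigdata nlevels;
      tables _all_ / noprint;
   run;

   /* variables with at most one distinct nonmissing value are candidates to drop */
   proc print data=levelinfo;
      where nnonmisslevels <= 1;
   run;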


Using SAS to Examine Health Effects of Intimate Partner Violence among HIV+ Women
Abbas Tavakoli, Sabra Custer-Smith and Ni Yu-Min
PO-97

Intimate partner violence (IPV) is a recognized national public health issue that includes physical abuse and unwanted or forced sexual contact by a partner.  Numerous studies have documented the negative health consequences of IPV.  There is evidence that IPV has a negative effect on the self-management of HIV, which is now a chronic disease.  The purpose of this study was to use descriptive statistics and correlations to measure the prevalence of IPV and its possible effects among HIV+ women.  A convenience sample of 200 HIV+ women was recruited at a Ryan White-funded clinic in Columbia, SC.  The prevalence of IPV was assessed using the Severity of Violence Against Women Scale (SVAWS), a 46-item Likert scale that assesses experiences with IPV over the last 12 months.  In addition to a summary score of total IPV, the SVAWS also contains subcategories of types of IPV.  Participants were also asked to report their most recent HIV viral load in order to gauge the management of their HIV.  Statistical analysis included descriptive statistics and correlation procedures, and SAS 9.4 was used to analyze the data.  The Spearman correlation was used to examine the association between total levels of IPV, each subcategory of IPV, and viral load.  There were no significant positive linear relationships between viral load and the violence subscales.  The Pearson correlations between the violence subscales and HIV viral load ranged from -0.01 to 0.1.


Ouch, how did that get here? The pitfalls of merging ...
Nancy McGarry
PO-101

This poster is a short slide show presentation on the pitfalls of merging.  It highlights some common problems that trap the unwary, looking at 5 common missteps in merging data, in hopes of preventing errors by making the audience more aware of things that can go wrong during a DATA step merge.


A SAS Macro to Investigate Statistical Power in Meta-analysis
Jin Liu and Fan Pan
PO-109

Meta-analysis is a quantitative review method which synthesizes the results of individual studies on the same topic.  Cohen’s d was selected as the effect size index in the current study because it is widely used in practical meta-analysis and there are few simulation studies investigating this index.  Statistical power is conceptually defined as the probability of detecting a real, existing effect or difference.  The current power analytical procedure for meta-analysis involves approximations, and the accuracy of using the procedure is uncertain.

Simulation can be used to calculate power in a more accurate way by addressing the approximations in the formulas.  Simulation studies involve generating data from computer programs to study the performance of statistical estimates under different conditions (Hutchinson and Bandalo, 1997).  If there is no real effect, researchers would hope to retain the null hypothesis.  If there is a real, existing effect, researchers would hope to reject the null hypothesis, increasing statistical power.  In each simulation step, a p-value is retained to decide if the null hypothesis is rejected or retained.  The proportion of rejected null hypotheses across all simulation steps is the simulated statistical power when there is a non-zero treatment effect.

The purpose of the study is to inform meta-analysis practitioners of the degree of discrepancy between analytical power and simulated power in the meta-analysis framework (i.e., fixed and random effects models).  A SAS macro was developed to show researchers power under the following conditions: simulated power in the fixed effects model, analytical power in the fixed effects model, simulated power in the random effects model, and analytical power in the random effects model.  As long as researchers know the parameters needed in their meta-analysis, they can run the SAS macro to obtain the power values they need under different conditions.  Results indicate that the analytical power was close to the simulated power, while some conditions in the random effects models had noticeable power discrepancies.  This study will yield a better understanding of statistical power in real meta-analyses.


Documentation as you go: aka Dropping Breadcrumbs
Elizabeth Axelrod
PO-114

Your project ended a year ago, and now you need to explain what you did, or rerun some of your code.  Can you retrace your steps?  Find your data, programs, printouts? Replicate the results?  This poster presents tools and techniques that follow best practices to help us, as programmers, manage the flow of information from source data to a final product.


Many Tables, Short Turnaround: Displaying Survey Results Quickly and Efficiently
Rebecca Fink, David Izrael, Sarah W. Ball and Sara M.A. Donahue
PO-120

When constructing multiple tables with a tight deadline, it is critical to have a tool to quickly and efficiently create tables using a standard set of parameters.  We demonstrate a set of SAS® macros based on PROC SURVEYFREQ which can be used to summarize survey data in tables that contain unweighted and weighted counts and percentages, as well as a weighted treatment ratio, with respective confidence intervals.  These macros allow us to display both single-select survey questions (i.e., survey questions with only one response allowed, such as gender) and multiple choice survey questions that allow the respondent to choose more than one response (for example, insurance status) within the same table.  Further, we can use these macros to sort the output by the treatment ratio or by the column percent distribution in either ascending or descending order.  Finally, these macros have the ability to output the results on a defined subset of the sample.
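
The heart of such a macro is a PROC SURVEYFREQ call whose output is captured with ODS for table assembly; a minimal sketch with hypothetical data set, weight, and variable names:

   proc surveyfreq data=survey;
      weight wt;
      tables gender / cl;               /* weighted percents with confidence limits */
      ods output oneway=gender_tab;     /* capture the table for report assembly */
   run;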


Using SAS® Macros to Split a Data Set into Several Output Files
Imelda Go
PO-127

Generating output files from various subsets of a master data set is a common task.  For example, a statewide master data set has records grouped by district and the task requires splitting the master data into an output file per district.  In this example, PROC FREQ is used to generate a frequency distribution of the district values in the master data set.  A macro loop is then executed for each district value in the frequency distribution.  For each loop, macro variables are assigned values based on the data per district as specified in the frequency distribution.  These values, which vary per district, were used to give unique names to each output file.  Another macro discussed enables the programmer to assign the number of observations in a data set to a macro variable.  This macro variable is useful for determining how many loops will be executed in the macro loop mentioned above, which eliminates hardcoding the number of loops in the macro loop.  The form of the output files may vary depending on the programmer’s needs and is specified within the macro loop.
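
A minimal sketch of the approach, assuming DISTRICT is a character variable whose values are usable in data set names:

   proc freq data=master noprint;
      tables district / out=dlist(keep=district);
   run;

   data _null_;
      set dlist end=eof;
      call symputx(cats('dist', _n_), district);    /* one macro variable per district */
      if eof then call symputx('ndist', _n_);       /* loop count, not hardcoded */
   run;

   %macro split;
      %do i = 1 %to &ndist;
         data out_&&dist&i;
            set master;
            if district = "&&dist&i";
         run;
      %end;
   %mend split;
   %split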


SAS Macros for Constraining Arrays of Numbers
Charles Coleman
PO-131

Many applications require constraining arrays of numbers to controls in one or two dimensions.  Example applications include survey estimates, disclosure avoidance, input-output tables, and population and other estimates and projections.  If the results are allowed to take on any nonnegative values, raking (a.k.a. scaling) solves the problem in one dimension and two-way iterative raking solves it in two dimensions.  Each of these raking macros has an option for the user to output a data set containing the rakes.  The problem is more complicated in one dimension if the data can be of any sign, the so-called “plus-minus” problem, as simple raking may produce unacceptable results.  This problem is addressed by generalized raking, which preserves the structure of the data at the cost of a nonunique solution.  Often, results are required to be rounded so as to preserve the original totals.  The Cox-Ernst algorithm accomplishes an optimal controlled rounding in two dimensions.  In one dimension, the Greatest Mantissa algorithm is a simplified version of the Cox-Ernst algorithm.

Each macro contains error control code.  The macro variable &errorcode is made available to the programmer to enable error trapping.
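
For the simplest case, one-dimensional raking of nonnegative values to a control total amounts to a single scaling; a sketch with hypothetical data set, variable, and macro variable names:

   proc sql noprint;
      select &control / sum(x) into :rake trimmed
        from cells;
   quit;

   data cells_raked;
      set cells;
      x_raked = x * &rake;    /* scaled values now sum to the control */
   run;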


Time Series Analysis: U.S. Military Casualties in the Pacific Theater during World War Two
Rachael Becker
PO-133

This paper aims to show how statistical analysis can be used in the field of history.  The primary focus of this paper is to show how SAS® can be utilized to perform a time series analysis of data regarding World War II.  The hope of this analysis is to test whether Truman’s justification for the use of atomic weapons was valid: Truman believed that by using the atomic weapons he would prevent the unacceptable levels of U.S. casualties that would be incurred in the course of a conventional invasion of the Japanese home islands.


Streamlining Medicaid Enrollment Reporting and Calculation of Enrollment Trends Using Macro and PROC Statements in SAS
Deniz Soyer
PO-153

Background: The Division of Analytics and Policy Research (DAPR) within the District of Columbia Department of Health Care Finance produces a monthly Medicaid enrollment report for the District’s Medical Care Advisory Committee (MCAC) to document enrollment trends in the District’s various Fee-For-Service and Managed Care programs.  This retrospective report requires retrieving Medicaid eligibility data for multiple monthly enrollment spans, organizing the data by program type and month, and calculating enrollment growth for programs of interest.  Previously, DAPR used Microsoft Excel to organize multiple outputs of monthly enrollment data and perform calculations on trends.  To minimize the time spent manually producing recurring reports, DAPR sought to develop a SAS program to streamline and automate its monthly MCAC report.

Methods: %LET statements created macro variables to represent each monthly enrollment span, which allowed the formatting and transposing steps occurring later in the program to automatically reference each month.  The PUT function and %SCAN were used to reference and redefine variable formats.  PROC FREQ was used to obtain total enrollment counts, by month, for each Medicaid program.  PROC TRANSPOSE was used to convert each month from an observation into a variable, which allowed each row to display enrollment for each Medicaid program type while the columns classified enrollment by month.  DATA steps were used to perform calculations for enrollment trends, such as growth.
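
A minimal sketch of the count-and-transpose core of this process, with hypothetical data set and variable names:

   proc freq data=enroll noprint;
      tables program*month / out=counts(drop=percent);
   run;

   proc transpose data=counts out=wide prefix=m_;
      by program;             /* one row per program */
      id month;               /* one column per month */
      var count;
   run;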

Results: Using essential macro and PROC statements in SAS, DAPR was able to organize data, calculate trends, and output a finalized report related to monthly Medicaid program enrollment, thereby resulting in near-automation of its reporting process.

Conclusion: Adopting the use of macro statements alongside data steps and PROC statements in SAS enables greater automation and accuracy in periodic reporting.  DAPR has incorporated the combined use of macro and PROC statements in an effort to streamline other recurring reports.


PROC REG: A SAS Macro to Determine Cut Score for a Universal Pre-School Maladaptive Screener
Yin Burgess
PO-165

Linear regression is a widely used statistical tool for determining relationships (or lack thereof) among variables.  In SAS, we use the REG procedure, or PROC REG, which fits linear regression models using least squares, to carry out the analyses.  We will focus on the very basics of PROC REG; needless to say, explanation of many features is beyond the scope of this proposal.  Using ordinary least-squares (OLS) estimates in statistical analyses involves many assumptions, such as homoscedasticity, independence, a correct model, a random sample (uncorrelated error terms), and normally distributed error terms.  We will avoid possible complications with these assumptions for the purpose of this demonstration.  We will use educational research data on a pre-school children maladaptive screener to demonstrate how we use a SAS macro to yield a score that determines whether a child is deemed to have maladaptive behavior.
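
The basic call underlying the analysis is short (data set and variable names are hypothetical):

   proc reg data=screener;
      model rating = risk_score;
      output out=pred p=predicted;   /* predicted scores used to set the cut score */
   run;
   quit;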



Reporting and Information Visualization

How to Become the MacGyver of Data Visualizations
Tricia Aanderud
RIV-104

If you don't understand what makes a good data visualization, then chances are you're doing it wrong.  Many business people are given data to analyze and present when they often don't understand how to present their ideas visually.  We are taught to think about data as numbers, and we often fail to understand that numbers show causes and help others reason through issues.  In this paper, we will review how data visualizations fail and what makes a good data visualization work.


Bridging the Gap: Importing Health Indicators Warehouse data into SAS® Visual Analytics using Stored Processes and APIs
Li Hui Chen, Manuel Figallo and Josh McCullough
RIV-92

The National Center for Health Statistics’ Health Indicators Warehouse (HIW) is part of the Department of Health and Human Services’ (DHHS) response to the Open Government Initiative to make federal data more accessible to all users.  Through it, users can view and download data and metadata for over 1,200 indicators on health status, outcomes, and determinants from approximately 180 different federal and nonfederal sources.  HIW also provides access to data through the use of an Application Programming Interface (API).  An API is a communication interface that applications such as SAS® Visual Analytics (SAS®VA) can use to access data from HIW and other data repositories.  This paper provides detailed information on how to access HIW data with SAS®VA in order to produce easily understood health statistics visualizations with minimal effort.  It will guide readers through a process and methodology to automate data access to the HIW and demonstrate the use of SAS®VA to present health information via a web browser.

This paper also shows how to run SAS macros inside a stored process to generate the API calls to the HIW in order to make the indicators and associated data and metadata available in SAS®VA for exploration and reporting; the macro code is provided.  Use cases are also explored in order to demonstrate the value of using SAS®VA and stored processes to stream data directly from the HIW API.  Dashboards, for instance, are created to visually summarize results gained from exploring the data.
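
A rough sketch of the pattern (the macro name, fileref, and endpoint URL below are illustrative assumptions, not the authors' code): a macro wraps PROC HTTP so each indicator can be retrieved with one call, and the saved JSON response is then parsed and surfaced through the stored process.

    %macro get_indicator(id);
       filename resp "indicator_&id..json";
       proc http
          url="https://services.example.gov/hiw/api/indicators/&id."  /* hypothetical endpoint */
          method="GET"
          out=resp;
       run;
       /* the JSON response is then parsed and registered for SAS VA */
    %mend get_indicator;

    %get_indicator(1234);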

Both IT professionals and population health analysts will benefit from understanding how to import HIW data into SAS®VA using Stored Processes and APIs.  This paper ultimately provides a starting point for any organization interested in using HIW data to augment their analysis related to population health.  Integrating HIW with SAS®VA can be very helpful to organizations that want to streamline their data management processes and lower high maintenance costs associated with data extraction and access while gaining insights into health data.  Analysts will also benefit from this paper through the use cases, which demonstrate the value of population health data accessed through an API with SAS®VA. 


Take Your Data Analysis and Reporting to the Next Level by Combining SAS® Office Analytics, SAS® Visual Analytics, and SAS® Studio
Tim Beese
RIV-196

SAS® Office Analytics, SAS® Visual Analytics, and SAS® Studio provide excellent data analysis and report generation.  When these products are combined, their deep interoperability enables you to take your analysis and reporting to the next level.  Build interactive reports in SAS® Visual Analytics Designer, and then view, customize and comment on them from Microsoft Office and SAS® Enterprise Guide®.  Create stored processes in SAS Enterprise Guide, and then run them in SAS Visual Analytics Designer, mobile tablets, or SAS Studio.  Run your SAS Studio tasks in SAS Enterprise Guide and Microsoft Office using data provided by those applications.  These interoperability examples and more will enable you to combine and maximize the strength of each of the applications.  Learn more about this integration between these products and what's coming in the future in this session.


How We Visualize Data and How to Apply Those Findings in SAS® Visual Analytics
Ryan Kumpfmiller
RIV-123

With data discovery tools becoming more useful and elaborate each year, the capabilities of displaying data and designing reports have never been better.  We have gotten to the point where we can now create interfaces that end users view and interact with.  To get the most out of these capabilities, just like with data, we need to know what is going on behind the scenes.  Now that we are building interfaces with data discovery tools such as SAS® Visual Analytics, it’s time to understand the way that we view data and incorporate that research into how we build reports.


A Journey from Data to Dashboard: Visualizing University Instructional Classroom Utilization and Diversity Trends with SAS Visual Analytics
Shweta Doshi and Julie Davis
RIV-125

Transforming data into intelligence for effective decision-making support depends critically on the Office of Institutional Research’s role and capacity in managing the institution’s data.  The presenters will share their journey from providing spreadsheet data to developing SAS programs and dashboards using SAS Visual Analytics.  Experience gained and lessons learned will also be shared in this session.

The presenters will:
  1. demonstrate two dashboards the IR office developed, one for classroom utilization and one for the University’s diversity initiatives;
  2. describe the process the office took in getting the stakeholders involved in determining the KPI and evaluating and providing feedback regarding the dashboard; and
  3. share their experience gained and lessons learned in building the dashboard.


Key Features in ODS Graphics for Efficient Clinical Graphing
Yuxin (Ellen) Jiang
RIV-174

High-quality, effective graphs not only enhance understanding of the data but also assist regulators in the review and approval process.  In recent releases, SAS has made significant progress toward more efficient graphing in ODS Statistical Graphics (SG) procedures and the Graph Template Language (GTL).  A variety of graphs can be quickly produced using convenient built-in options in SG procedures.  With graphical examples and a comparison between SG procedures and traditional SAS/GRAPH procedures in reporting clinical trial data, this paper highlights several key features in ODS Graphics that efficiently produce sophisticated statistical graphs with more flexible and dynamic control of graphical presentation.


How to Make a Stunning State Map Using SAS/Graph® for Beginners
Sharon Avrunin-Becker
RIV-74

Making a map for the first time can be an overwhelming task if you are just beginning to learn how to navigate your way through SAS/Graph.  It can be especially paralyzing when you are trying to narrow your map to a smaller scale by identifying counties in a state.  This paper will walk you through the steps to getting started with your map and how to add ranges of colors and annotations.  It will also point out a few traps to avoid as you are designing your programs and maps.


PROC RANK, PROC SQL, PROC FORMAT and PROC GMAP Team Up and a (Map) Legend is Born!
Christianna Williams and Louise Hadden
RIV-80

The task was to produce a figure legend that gave the quintile ranges of a continuous measure corresponding to each color on a five-color choropleth US map.  Actually, we needed to produce the figures and associated legends for several dozen maps for several dozen different continuous measures and time periods as well as associated "alt-text" for compliance with Section 508…so, the process needed to be automated.  A method was devised using SAS(R) PROC RANK to generate the quintiles, PROC SQL to get the data value ranges within each quintile, and PROC FORMAT (with the CNTLIN= option) to generate and store the legend labels.  The resulting data files and format catalogs are then used to generate both the maps (with legends) and associated "alt text".  Then, these processes were rolled into a macro to apply the method for the many different maps and their legends.  Each part of the method is quite simple – even mundane – but together these techniques allowed us to standardize and automate an otherwise very tedious process.  The same basic strategy could be used whenever one needs to dynamically generate data "buckets" but then keep track of the bucket boundaries – whether for producing labels, map legends, "alt-text", or so that future data can be benchmarked against the stored categories.
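
A minimal sketch of the three-step pipeline (data set and variable names hypothetical):

    /* 1. Assign each state's value of the measure to a quintile */
    proc rank data=measures out=ranked groups=5;
       var rate;
       ranks quintile;
    run;

    /* 2. Data value range within each quintile */
    proc sql;
       create table legend as
       select quintile, min(rate) as lo, max(rate) as hi
       from ranked
       group by quintile;
    quit;

    /* 3. Build a format whose labels are the quintile ranges */
    data fmt;
       set legend;
       retain fmtname 'qlbl';
       start = quintile;
       label = catx(' - ', put(lo, 8.1), put(hi, 8.1));
    run;

    proc format cntlin=fmt;
    run;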


Text Analytics Using JMP®
Melvin Alexander
RIV-31

JMP® version 11 introduced the Free Text command in the Analyze > Consumer Research > Categorical platform under the “Multiple” tab.  This utility restricted users to producing word frequency counts and creating indicator columns for the words that appeared in free-text comment columns.  For more extensive text mining, users must use other JMP® Scripting Language (JSL) scripts, functions, and tools.  This presentation will review different ways JMP® can parse and convert qualitative text data into quantified measures.  Text mining techniques covered in this presentation include forming term-document matrices (TDMs); applying singular value decomposition (SVD) to identify the underlying dimensions that account for most of the information found in documents and text; and clustering word groups that convey similar topics or themes.  Attendees should be able to use the methods for further reporting and modelling.


The Last Axis Macro You'll Ever Need
Shane Rosanbalm
RIV-122

There are several good papers out there about automating the creation of SAS axes (Dorothy E. Pugh 2000, Don Li 2003, Rick Edwards 2012).  These papers are written with classic SAS/GRAPH in mind.  But, with the rise of ODS Graphics and the corresponding much improved axes, one might reasonably ask the question, "Do we even need axis macros anymore?"  While I wholeheartedly agree that the ODS Graphics axis defaults are much nicer than what we were used to getting out of classic SAS/GRAPH, there are still situations in which we will not want to leave control of axis ranges entirely up to SAS.

In this paper I take what I see as the best ideas from the above papers and combine them into a bigger, faster, stronger axis macro.  The default behavior of the new macro is to mimic what ODS Graphics would give you.  The new macro also includes several optional parameters that allow the behavior to be customized to fit a wide variety of specialty situations (multiple variables, reference values, preferred number of tick marks, etc.).  By the end of this paper I hope you'll agree that this is indeed the last axis macro you'll ever need!


Design of Experiments (DOE) Using JMP® and SAS®
Charlie Shipp
RIV-115

JMP/SAS provides the best design-of-experiments software available.  The DOE team continues the tradition of providing state-of-the-art DOE support.  In addition to the full range of classical and modern design-of-experiments approaches, JMP provides a template for Custom Design for specific requirements.  The other choices include: Screening Design; Response Surface Design; Choice Design; Accelerated Life Test Design; Nonlinear Design; Space Filling Design; Full Factorial Design; Taguchi Arrays; Mixture Design; and Augmented Design.  Further, sample size and power plots are available.  We give an introduction to these methods, followed by important examples involving multiple factors.


Creating Geographic Rating Area Maps: How to Combine Counties, Split Counties, and Use Zip Code Boundaries
Rick Andrews
RIV-204

SAS/GRAPH® will be used to create choropleth maps that identify the geographic rating areas implemented under the Affordable Care Act (ACA).  The default areas for each state are Metropolitan Statistical Areas (MSAs) plus the remainder of the state that is not included in an MSA.  States may seek approval to base the rating areas on counties or three-digit zip codes, which requires that counties be combined in some states and split in two in others.  For the states that use zip codes to identify the areas, ZIP Code Tabulation Area (ZCTA) files from the U.S. Census Bureau, provided in ESRI shapefile format (.shp), are used.  Also demonstrated will be the use of the Annotate facility to identify each area and place major cities on the maps.
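
For flavor, a hedged sketch of reading a ZCTA shapefile and drawing the map (the path, data set names, and id variable are assumptions):

    /* Read the Census ZCTA boundaries into a map data set */
    proc mapimport datafile="C:\shapefiles\zcta510.shp" out=work.zcta;
    run;

    /* areas must carry the same id variable plus the rating area */
    proc gmap data=areas map=work.zcta;
       id zcta5ce10;
       choro rating_area / discrete;
    run;
    quit;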


Layout the Grid and You Control the Document: ODS Meets OOP and You Reap the Benefits
Daniel Ralyea and Karen Price
RIV-25

ODS is a journey, not just a destination.  You can exhibit a fine degree of control over your output once you understand the basic structures involved.  Most of us are familiar with opening and closing an ODS destination (and it is good!).  Exploring the construction of a destination allows a greater understanding of the power at your fingertips.  ODS Layout provides a guiding structure for the information canvas.  ODS Region provides control of a subset of the canvas, and object-oriented programming allows cell-by-cell control of a custom-built table.  These tools, combined with the flexibility inherent in the output destination, allow a wide variety of production possibilities.


From SAS Data to Interactive Web Graphics Built Through PROC JSON
Robert Seffrin
RIV-128

The National Agricultural Statistics Service (NASS) publishes extensive data covering the breadth of agriculture in the United States.  To make this data more accessible to the public, NASS is exploring new and dynamic visualizations through the web.  JavaScript has become a standard for displaying and interacting with this type of data.  Developing charts from scratch has a steep learning curve, requiring skill in JavaScript, HTML, and cascading style sheets.  Many JavaScript visualization libraries assist with various aspects of charting, but a library called Vega greatly reduces the need for programming by defining chart parameters through a declarative grammar formatted as JSON (JavaScript Object Notation).  While this eliminates most, if not all, of the JavaScript programming, the JSON declarations can be complex, with multiple nested levels.

The new PROC JSON, accessible through the SAS University Edition, greatly simplifies the creation of the JSON file needed for an interactive scatterplot matrix where a selection in one subplot will appear in all other subplots.  Charting parameters will be stored in an easy-to-edit Excel file, which SAS will read and use to build a JSON file with data-set-specific variable names.  Creating interactive web charts from SAS data is as simple as updating some parameters and building the JSON file.
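
As a flavor of the approach (a real Vega specification also needs scales, axes, and marks, omitted here), the sketch below shows PROC JSON writing a skeleton spec whose data section is exported directly from a SAS data set:

    proc json out="scatter_spec.json" pretty;
       write open object;                  /* top level of the spec  */
       write values "width"  400;
       write values "height" 400;
       write values "data";                /* "data" holds an array  */
       write open array;
       write open object;
       write values "name" "points";
       write values "values";
       export sashelp.class / nosastags;   /* rows as JSON objects   */
       write close;                        /* object in data array   */
       write close;                        /* data array             */
       write close;                        /* top-level object       */
    run;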



Statistics and Data Analysis

MIXED_RELIABILITY: A SAS Macro for Estimating Lambda and Assessing the Trustworthiness of Random Effects in Multilevel Models
Jason Schoeneberger and Bethany Bell
SD-189

When estimating multilevel models (also called hierarchical models, mixed models, and random effects models), researchers are often interested not only in the regression coefficients but also in the fit of the overall model to the data (e.g., -2LL, AIC, BIC).  Whereas both model fit and regression coefficient estimates are important to examine when estimating multilevel models, the reliability of multilevel model random effects, lambda, should also be examined.  However, neither PROC MIXED nor PROC GLIMMIX produces estimates of lambda, the statistic often used to represent reliability.  As a result, this important metric is often not examined by researchers who estimate their multilevel models in SAS.  The macro presented in this paper will provide analysts estimating multilevel models with a readily available method for generating reliability estimates within SAS PROC MIXED.
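
For a two-level model, a common definition of this reliability (stated here as background, not necessarily the macro's exact formula) is lambda_j = tau00 / (tau00 + sigma^2 / n_j) for a group of size n_j, where tau00 is the intercept variance and sigma^2 the residual variance.  A minimal sketch of pulling those pieces from PROC MIXED:

    ods output CovParms=cp;
    proc mixed data=twolevel;       /* hypothetical data set     */
       class school;
       model score = / solution;    /* unconditional means model */
       random intercept / subject=school;
    run;

    data lambda;
       set cp end=last;
       retain tau00 sigma2;
       if covparm = 'Intercept' then tau00  = estimate;
       if covparm = 'Residual'  then sigma2 = estimate;
       if last then do;
          n_j = 30;                 /* hypothetical common group size */
          lambda = tau00 / (tau00 + sigma2 / n_j);
          output;
       end;
    run;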


Testing the Gateway Hypothesis from Waterpipe to Cigarette Smoking among Youth Using Dichotomous Grouped-Time Survival Analysis (DGTSA) with Shared frailty in SAS®
Rana Jaber
SD-56

Dichotomous grouped-time survival analysis is a combination of the grouped Cox model (D'Agostino et al., 1990), the discrete-time hazard model (Singer and Willett, 1993), and the dichotomous approach (Hedeker et al., 2000).  Items measured from wave 1 through wave 4 were used as time-dependent covariates linking the predictors to the risk of waterpipe smoking progression at the subsequent student interview.  This analysis allows for maximum data use, inclusion of time-dependent covariates, and relaxing of the proportional hazards assumption, and takes into consideration the interval-censored nature of the data (i.e., the event occurred during a certain known interval, such as one year, but the exact time at which it occurred cannot be specified).  The aim of this paper is to provide a new method of analyzing panel data where the outcome is binary, with some explanation of the SAS® code.  Examples of using the PROC PHREG procedure are drawn from data that were recently published in the International Journal of Tuberculosis and Lung Disease (IJTLD).


Optimizing Pilot Connection Time Using PROC REG and PROC LOGISTIC
Andrew Hummel and Shevawn Christian
SD-60

As any airline traveler knows, connection time is a key element of the travel experience.  A tight connection time can cause angst and concern, while a lengthy connection time can introduce boredom and a longer-than-desired travel time.  The same elements apply when constructing schedules for airline pilots.  Like passenger itineraries, pilot schedules are built around connections.  Delta Air Lines operates a hub-and-spoke system that feeds both passengers and pilots from the spoke stations and connects them through the hub stations.  Pilot connection times that are tight can result in operational disruptions, whereas extended pilot connection times are inefficient and unnecessarily costly.  This paper will demonstrate how Delta Air Lines utilized SAS® PROC REG to analyze historical data in order to build operationally robust and financially responsible pilot connections.


Regression Analysis of the Levels of Chlorine in the Public Water Supply in Orange County, FL
Drew Doyle
SD-185

Public water supplies can contain disease-causing microorganisms in the water or transport ducts.  In order to kill off these pathogens, a disinfectant, such as chlorine, is added to the water.  Chlorine is the most widely used disinfectant in U.S. water treatment facilities and is known to be one of the most powerful disinfectants for keeping harmful pathogens from reaching the consumer.  In the interest of obtaining a better understanding of what variables affect the levels of chlorine in the water, this thesis will analyze a set of water samples randomly collected from locations in Orange County, Florida.  Thirty water samples will be collected and have their chlorine level, temperature, and pH recorded.  The chlorine levels will be read by a LaMotte Model DC1100 colorimeter, which outputs the amount of chlorine in parts per million (ppm).  This colorimeter reads the total chlorine of the sample, including both free and combined chlorine.  A linear regression analysis will be performed on the data collected with several qualitative and quantitative variables.  Water age, temperature, time of day, location, pH, and dissolved oxygen level will be the independent variables collected from each water sample.  All data collected will be analyzed through various Statistical Analysis System (SAS) procedures.  Partial residual plots will be used to identify possible relationships between the chlorine level and the independent variables, and stepwise selection will be used to eliminate insignificant predictors.  From there, several possible models for the data will be selected.  F tests will be conducted to determine which of the models appears to be the most useful.  All tests will include hypotheses, test statistics, p-values, and conclusions.  There will also be an analysis of the residual plot, jackknife residuals, leverage values, Cook’s D, the PRESS statistic, and the normal probability plot of the residuals.  Possible outliers will be investigated, and the critical values for flagged observations will be stated along with what problems the flagged values indicate.  A nonparametric regression analysis can be performed for further research of the existing data.


Alternative Methods of Regression When OLS Is Not the Right Choice
Peter Flom
SD-27

Ordinary least squares (OLS) regression is one of the most widely used statistical methods.  However, it is a parametric model and relies on assumptions that are often not met.  Alternative methods of regression for continuous dependent variables relax these assumptions in various ways.  This paper will explore PROCs such as QUANTREG, ADAPTIVEREG, and TRANSREG for these data.
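
For example, hedged one-liners for two of these procedures (data set and variable names hypothetical):

    /* Median (quantile = 0.5) regression, robust to outliers */
    proc quantreg data=mydata;
       model y = x1 x2 / quantile=0.5;
    run;

    /* Spline-based regression that relaxes linearity */
    proc adaptivereg data=mydata;
       model y = x1 x2;
    run;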


An Intermediate Guide to Estimating Multilevel Models for Categorical Data using SAS® PROC GLIMMIX
Whitney Smiley, Zhaoxia Guo, Mihaela Ene, Genine Blue, Elizabeth Leighton and Bethany Bell
SD-173

This paper expands upon Ene et al.’s (2015) SAS Global Forum proceedings paper “Multilevel Models for Categorical Data using SAS® PROC GLIMMIX: The Basics,” in which the authors presented an overview of estimating two-level models with non-normal outcomes via PROC GLIMMIX.  In their paper, the authors focused on how to use GLIMMIX to estimate two-level organizational models; however, they did not address more complex organizational models (e.g., three-level models) or models used to analyze longitudinal data.  Hence the need for the current paper; building from the examples in Ene et al. (2015), the current paper presents detailed discussions and illustrations of how to use GLIMMIX to estimate organizational models in situations with three levels of data, as well as two-level longitudinal data.  Consistent with Ene et al.’s paper, we will present the syntax and interpretation of the estimates using a model with a dichotomous outcome as well as a model with a polytomous outcome.  Concrete examples will be used to illustrate how PROC GLIMMIX can be used to estimate these models and how key pieces of the output can be used to answer corresponding research questions.


Adaptive Fractional Polynomial Modeling in SAS®
George Knafl
SD-65

Regression predictors are usually entered into a model without transformation.  However, it is not unusual for regression relationships to be distinctly nonlinear.  Fractional polynomials account for nonlinearity through real-valued power transformations of primary predictors.  Adaptive methods have been developed for searching through alternative fractional polynomials based on one or more primary predictors.  A SAS macro called genreg (for general regression) is available from the author for conducting such analyses.  It supports adaptive linear, logistic, and Poisson regression modeling of expected values and/or variances/dispersions in terms of fractional polynomials.  Fractional polynomial models are compared using k-fold likelihood cross-validation scores and adaptively selected through heuristic search.  The genreg macro supports adaptive modeling of both univariate and multivariate outcomes.  It also supports adaptive moderation analyses based on geometric combinations, that is, products of transforms of primary predictors with possibly different powers, generalizing power transforms of interactions.  Example analyses and code for conducting them are presented demonstrating adaptive fractional polynomial modeling.


High-Performance Procedures in SAS 9.4: Comparing Performance of HP and Legacy Procedures
Jessica Montgomery, Sean Joo, Anh Kellerman, Jeffrey Kromrey, Diep Nguyen, Thanh Pham, Patricia Rodriguez de Gil and Yan Wang
SD-180

The growing popularity of big data, coupled with increases in computing capabilities, has led to the development of new SAS procedures designed to complete such tasks more effectively and efficiently.  Although there is a great deal of documentation regarding how to use these new high-performance (HP) procedures, relatively little has been disseminated regarding the specific conditions under which users can expect performance improvements.  This paper serves as a practical guide to getting started with HP procedures in SAS.  The paper will describe the differences that exist between key HP procedures (HPGENSELECT, HPLMIXED, HPLOGISTIC, HPNLMOD, HPREG, HPCORR, HPIMPUTE, HPSAMPLE, and HPSUMMARY) and their legacy counterparts, both in terms of capability and performance, with a particular focus on differences in the real time required to execute.  Simulation will be used to generate data sets that vary in number of observations (10,000; 50,000; 100,000; 500,000; 1,000,000; and 10,000,000) and number of variables (50, 100, 500, and 1,000) to create these comparisons.

Keywords: HPGENSELECT, HPLMIXED, HPLOGISTIC, HPNLMOD, HPREG, HPCORR, HPIMPUTE, HPSAMPLE, HPSUMMARY, high-performance analytics procedures
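
A sketch of the kind of head-to-head timing comparison described (data set and model hypothetical):

    options fullstimer;          /* log real and CPU time */

    proc logistic data=big;      /* legacy procedure      */
       model y(event='1') = x1-x50;
    run;

    proc hplogistic data=big;    /* HP counterpart        */
       model y(event='1') = x1-x50;
    run;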


Probability Density for Repeated Events
Bruce Lund
SD-111

In customer relationship management (CRM) or consumer finance it is important to predict the time of repeated events.  These repeated events might be a purchase, service visit, or late payment.  Specifically, the goal is to find the probability density for the time to first event, the probability density for the time to second event, etc.

Two approaches are presented and contrasted.  One approach uses discrete time hazard modeling.  The second, a distinctly different approach, uses multinomial logistic regression.  The performances of the two methods are evaluated using a simulation study.


A SAS Macro for Improved Correlation Coefficient Inference
Stephen Looney
SD-139

We present a SAS macro for improved statistical inference for measures of association, including Pearson's correlation, Spearman's coefficient, and Kendall's coefficient.  While PROC CORR is a powerful tool for calculating and testing these coefficients, some analyses are lacking.  For example, PROC CORR does not incorporate recent theoretical improvements in confidence interval estimation for Spearman's rho, nor does it provide any confidence interval at all for Kendall's tau-b.  We have written a SAS macro that incorporates these new developments; it produces confidence intervals, as well as p-values for testing any null value of these coefficients.  Improved sample-size calculations for all three coefficients are also provided in the macro.
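
For context, the familiar large-sample CI for Pearson's r uses the Fisher z transformation; the macro's improved methods go beyond this baseline.  A minimal sketch with hypothetical inputs:

    data fisher_ci;
       r = 0.45;  n = 60;                 /* hypothetical inputs */
       z  = 0.5 * log((1 + r) / (1 - r)); /* Fisher transform    */
       se = 1 / sqrt(n - 3);
       lo = tanh(z - probit(0.975) * se); /* back-transform      */
       hi = tanh(z + probit(0.975) * se);
    run;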


How Latent Structure Analyses Can Improve the Fit of a Regression Model
Deanna Schreiber-Gregory
SD-191

The current study looks at several ways to investigate latent variables in longitudinal surveys and their use in regression models.  Three different analyses for latent variable discovery will be briefly reviewed and explored.  The latent analysis procedures explored in this paper are PROC LCA, PROC LTA, PROC CATMOD, PROC FACTOR, and PROC TRAJ.  The analyses defined through these procedures are latent profile analysis, latent class analysis, and latent transition analysis.  The latent variables will then be included in separate regression models.  The effect of the latent variables on the fit and use of the regression model, compared to a similar model using observed data, will be briefly reviewed.  The data used for this study were obtained from the National Longitudinal Study of Adolescent Health (Add Health).  Data were analyzed using SAS 9.3.  This paper is intended for any level of SAS user, and is written for an audience with a background in behavioral science and/or statistics.


A Demonstration of SAS Analytics in Teradata
Tho Nguyen and William E Benjamin Jr
SD-203

SAS analytics in Teradata refers to the integration of advanced analytics into the data warehouse.  With this capability, analytic processing is optimized to run where the data reside, in parallel, without having to copy or move the data for analysis.  Many analytical computing solutions and large databases use this technology because it provides significant performance improvements over more traditional methods.  Come see how SAS Analytics in Teradata works and learn some of the best practices demonstrated in this session.


Confidence Intervals for Binomial Proportion Using SAS: The All You Need to Know and No More…
Jiangtang Hu
SD-103

Confidence intervals (CIs) are extremely important in presenting clinical reports.  Choosing the right CI algorithm is the province of statisticians, but this paper is written for SAS programmers: more than 15 methods for computing a CI for a single proportion are presented with SAS code, using either SAS procedures or customized code.

The code is currently hosted on my GitHub page: https://raw.githubusercontent.com/Jiangtang/Programming-SAS/master/CI_Single_Proportion.sas

Some commentary from a SAS programmer's point of view will also be presented.
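
As a starting point, PROC FREQ alone covers several of the methods; for example (data set and variable hypothetical):

    proc freq data=trial;
       tables response / binomial(cl=wilson cl=exact);
    run;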


Introducing Two-Way and Three-Way Interactions into the Cox Proportional Hazards Model Using SAS®
Seungyoung Hwang
SD-39

The Cox proportional hazards model is by far the most popular and powerful statistical technique for exploring the effect of explanatory variables on survival.  It is used throughout a wide variety of clinical studies.  However, special techniques are required when multiple interaction terms are introduced into the Cox model.  This paper provides an in-depth analysis, with some explanation of the SAS® code, of how to introduce two-way and three-way interaction terms into the Cox proportional hazards model using SAS.  Examples of using the PHREG procedure are drawn from clinical data that we recently submitted to the Journal of the American Geriatrics Society (JAGS).


A SAS Algorithm for Imputing Discrete Missing Outcomes Based on Minimum Distance
Macaulay Okwuokenye and Karl E. Peace
SD-113

Missing outcome data are encountered in many clinical trials and public health studies and present challenges in imputation.  We present a simple and easy-to-use SAS-based imputation method for missing discrete outcome data.  The method is based on the minimum distance between the baseline covariates of those with missing data and those without missing data.  The imputation algorithm, which may be viewed as a variant of the hot-deck imputation method, imputes missing values that are “close to” the observed values, implying that had there been data on those missing, they would have been similar to those non-missing.  An illustrative example will be presented.


A Macro for Calculating Percentiles on Left Censored Environmental Data using the Kaplan-Meier Method
Dennis Beal
SD-160

Calculating percentiles such as the median and quartiles is straightforward when the data values are known.  However, environmental data often are reported from the analytical laboratory as left censored, meaning the actual concentration for a given contaminant was not detected above the method detection limit.  Therefore, the true concentration is known only to be between 0 and the reporting limit.  The nonparametric Kaplan-Meier product limit estimator has been widely used in survival analysis on right-censored data, but recently this method has also been applied to left-censored data.  Kaplan-Meier can be used on censored data with multiple reporting limits with minimal assumptions.  This paper presents a SAS® macro that calculates percentiles such as the median of a left-censored environmental data set using the nonparametric Kaplan-Meier method.  Kaplan-Meier has been shown to provide more robust estimates of the mean, standard deviation, and percentiles of left-censored data than other methods such as simple substitution and maximum likelihood estimation.  This paper is for intermediate users of Base SAS.
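
The macro's internals are the author's; a standard device for this problem (an assumption about the approach, shown for orientation) is to flip left-censored values into right-censored form so that PROC LIFETEST applies:

    /* flip = M - conc for a constant M above the largest value, */
    /* so a nondetect (left censored) becomes right censored     */
    data flipped;
       set env;                /* detect = 0 marks a nondetect   */
       flip = 100 - conc;      /* hypothetical constant M = 100  */
    run;

    proc lifetest data=flipped;
       time flip*detect(0);
    run;

    /* percentiles of conc are M minus the flipped percentiles,  */
    /* with quartiles swapping roles (Q1 of flip -> Q3 of conc)  */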


Using SAS to Create an Effect Size Resampling Distribution for a Statistical Test
Peter Wludyka
SD-159

One starts with data to perform a statistical test of a hypothesis.  An effect size is associated with a particular test/sample, and this effect size can be used to decide whether there is a clinically/operationally significant effect.  Since the data (the sample) are usually all a researcher knows factually regarding the phenomenon under study, one can imagine that, by sampling (resampling) with replacement from the original data, additional information about the hypothesis and phenomenon/study can be acquired.

One way to acquire such information is to repeatedly resample from the original data set (using, for example, PROC SURVEYSELECT) and at each iteration (replication of the data set) perform the statistical test of interest and calculate the corresponding effect size.  At the end of this stage one has R effect sizes (R is typically greater than 1,000), one for each performance of the statistical test.  This effect size distribution can be presented in a histogram.  Uses for this distribution, and its relation to the p-value resampling distribution presented at SESUG 2014, will be explored.
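
A sketch of the resampling stage (names hypothetical; the effect size, e.g., Cohen's d, is then computed from the per-replicate statistics):

    /* R = 1000 resamples drawn with replacement */
    proc surveyselect data=orig out=boot seed=20150927
                      method=urs samprate=1 reps=1000 outhits;
    run;

    /* one t test per replicate; means and SDs feed the effect size */
    proc ttest data=boot;
       by replicate;
       class group;
       var y;
       ods output Statistics=stats;
    run;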


How to be a Data Scientist with SAS®
Chuck Kincaid
SD-100

The role of the Data Scientist is the viral job description of the decade.  And like LOLcats, there are many types of Data Scientists.  What is this new role?  Who is hiring them?  What do they do?  What skills are required to do their job?  What does this mean for the SAS programmer and the statistician?  Are they obsolete?  And finally, if I am a SAS user, how can I become a Data Scientist?  Come learn about this “job of the future” and what you can do to be part of it.


Power and Sample Size Computations
John Castelloe
SD-200

Power determination and sample size computations are an important aspect of study planning and help produce studies with useful results for minimum resources.  This tutorial reviews basic methodology for power and sample size computations for a number of analyses including proportion tests, t tests, confidence intervals, equivalence and noninferiority tests, survival analyses, correlation, regression, ANOVA, and more complex linear models.  The tutorial illustrates these methods with numerous examples using the POWER and GLMPOWER procedures in SAS/STAT® software as well as the Power and Sample Size Application.  Learn how to compute power and sample size, perform sensitivity analyses for other factors such as variability and type I error rate, and produce customized tables, graphs, and narratives.  Special attention will be given to the newer power and sample size analysis features in SAS/STAT software for logistic regression and the Wilcoxon-Mann-Whitney (rank-sum) test.

Prior exposure to power and sample size computations is assumed.
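
For example, a minimal PROC POWER run solving for the per-group sample size of a two-sample t test:

    proc power;
       twosamplemeans test=diff
          meandiff  = 5
          stddev    = 10
          power     = 0.9
          npergroup = .;    /* solve for this */
    run;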


Where Did My Students Go?
Stephanie Thompson
SD-136

Many freshmen leave their first college and go on to attend another institution.  Some of these students are even successful in earning degrees elsewhere.  As there is more focus on college graduation rates, this paper shows how the power of SAS® can pull in data from many disparate sources, including the National Student Clearinghouse, to answer questions on the minds of many institutional researchers.  How do we use the data to answer questions such as “What would my graduation rate be if these students graduated at my institution instead of at another one?", “What types of schools do students leave to attend?”, and “Are there certain characteristics of students who leave, and are they concentrated in certain programs?”  The data-handling capabilities of SAS are perfect for this type of analysis, and this presentation walks you through the process.


Comparing Results from Cox Proportional Hazards Models using SUDAAN® and SAS® Survey Procedures to a Logistic Regression Model for Analysis of Influenza Vaccination Coverage
Yusheng Zhai, Katherine Kahn, Alissa O’Halloran and Tammy Santibanez
SD-42

The National Immunization Survey-Flu (NIS-Flu) is an ongoing, national telephone survey of households with children in the United States used to measure influenza vaccination coverage.  The data collected by NIS-Flu has similarities to data typically analyzed using survival analytic procedures.  Estimates of vaccination coverage from the NIS-Flu survey are calculated using Kaplan-Meier survival analysis procedures to account for censoring of the data.  However, multivariable models to examine socio-demographic characteristics associated with receipt of influenza vaccination using NIS-Flu data have typically been done using logistic regression rather than using survival analytic methods.  The logistic regression approach ignores the time-to-event and censoring characteristics of the influenza data and assumes that censoring throughout the survey period occurs equally among the comparison groups of interest.  If this assumption is untrue, measures of association for receipt of influenza vaccine could be biased.  Another approach used to address the censoring issues in NIS-Flu data is to restrict the logistic regression analysis to interviews conducted at the end of the vaccination period (i.e., March-June) when it is unlikely that many respondents would be vaccinated after the time of interview.  However, this approach ignores a large amount of data, results in a reduced precision of estimates, and potentially exacerbates recall bias.

The project assessed the feasibility, methods, and advantages of using a Cox proportional hazards model as opposed to a logistic regression model using full NIS-Flu 2013-14 season data and a logistic regression model using end of vaccination period data.  This project also compared the results of Cox proportional hazards model from SUDAAN SURVIVAL and from SAS SURVEYPHREG procedures.

The results from the logistic model seem to slightly underestimate the associations between vaccination status and demographic characteristics, yet the logistic model remains a reasonable alternative to the Cox proportional hazards model in analyzing the NIS-Flu data.  The SAS SURVEYPHREG and SUDAAN SURVIVAL procedures produced nearly identical Cox proportional hazards model results.

Conclusions drawn based on the results from logistic regression and Cox proportional hazards models using full or post-vaccination period NIS-Flu data are comparable.
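
For reference, the general shape of a complex-survey Cox model in SAS (the design and analysis variables below are hypothetical, not the NIS-Flu variable names):

    proc surveyphreg data=nisflu;
       strata  stratum;          /* design strata          */
       cluster psu;              /* primary sampling units */
       weight  svywt;            /* survey weight          */
       class   age_grp race_eth;
       model   time_to_vax*vaccinated(0) = age_grp race_eth;
    run;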


An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies
Yan Wang, Seang-Hwane Joo, Patricia Rodriguez de Gil, Jeffrey Kromrey, Rheta E. Lanehart, Eun Sook Kim, Jessica Montgomery, Reginald Lee, Chunhua Cao and Shetay Ashford
SD-177

Missing data are a common and significant problem that researchers and data analysts encounter in applied research.  Because most statistical procedures require complete data, missing data can substantially affect the analysis and the interpretation of results if left untreated.  Methods to treat missing data have been developed so that missing values are imputed and analyses can be conducted using standard statistical procedures.  Among these missing data methods, Multiple Imputation has received considerable attention and its effectiveness has been explored, for example, in the context of survey and longitudinal research.  This paper compares four Multiple Imputation approaches for treating missing continuous covariate data under MCAR, MAR, and MNAR assumptions in the context of propensity score analysis and observational studies.  The comparison of four Multiple Imputation approaches in terms of bias and variability in parameter estimates, Type I error rates, and statistical power is presented.  In addition, complete case analysis (listwise deletion) is presented as the default analysis that would be conducted if missing data are not treated.  Issues are discussed, and conclusions and recommendations are provided.


Can Fit Indices Yielded from the SAS GLIMMIX Procedure Select the Accurate Q-matrix?
Yan Wang, Yi-Hsin Chen, Issac Y. Li and Chunhua Cao
SD-179

In educational diagnostic assessments, it is not uncommon to develop several competing Q-matrices that specify item-and-attribute relations and to select the best-fitting Q-matrix among them to make valid inferences about students’ strengths and weaknesses in cognitive attributes.  Thus, selecting an accurate Q-matrix plays a crucial role in making valid inferences in diagnostic analyses.  This study examines the effectiveness of fit indices yielded from the SAS GLIMMIX procedure for selecting the accurate Q-matrix using the cross random effects linear logistic test model (CRE-LLTM).  A simulation study is designed and five fit indices (i.e., log likelihood, AIC, AICs, BIC, and HQIC) are examined.  Five design factors are manipulated, including sample size (50, 250, and 500), population distribution of cognitive attributes (normal, positively skewed, and negatively skewed), percentage (2.4%, 4.8%, and 9.6%) and type (over, under, and balanced) of Q-matrix misspecification, as well as Q-matrix density (sparse and dense).  The number of items is fixed at 21, with 8 attributes.  Data sets are simulated using the SAS/IML package.  For each condition, 1,000 replications are generated.  The accuracy of selection is computed as the proportion of replications that select the true Q-matrix, as indicated by smaller values of the fit indices.  In addition, factorial ANOVA analyses with the generalized eta-squared effect size are employed to examine the impact of the manipulated factors on selecting the true Q-matrix.  The results indicate that the overall performance of the five fit indices is similar.  When sample size increases (e.g., N=500), it is relatively easier for all indices to select the true Q-matrix.  Not surprisingly, when the misspecification percentage is larger (e.g., 9.6%), fit indices select the true Q-matrix more accurately regardless of the Q-matrix density, misspecification type, and sample size.  These fit indices seem to be more sensitive to misspecification of sparse Q-matrices than of dense Q-matrices.  They are also more sensitive to the type of Q-matrix misspecification than to the percentage of misspecification.


Integrating PROC REG and PROC LOGISTIC for Collinearity Examination, Sample Scoring and Model Evaluation
Alec Zhixiao Lin
SD-69

At the final stage of regression modeling, a modeler needs to examine the multicollinearity between model attributes, to score all sample files, and to evaluate model performance.  Existing options in PROC LOGISTIC and PROC REG are somewhat different for obtaining the variance inflation factor (VIF) and condition indices, as well as for scoring sample files.  This paper provides an efficient and foolproof process in SAS® that integrates those functionalities with minimal manual handling.  Multiple standardized summaries from the SAS output also provide valuable insights that can be shared with business peers.
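
A hedged sketch of the two pieces being integrated (variable names hypothetical): PROC REG is fit to the binary outcome solely to obtain collinearity diagnostics, and PROC LOGISTIC fits and scores the model.

    /* collinearity diagnostics only: VIF and condition indices */
    proc reg data=modelds;
       model y = x1-x10 / vif collin;
    run;
    quit;

    /* fit and score with the same predictor set */
    proc logistic data=modelds;
       model y(event='1') = x1-x10;
       score data=holdout out=scored;
    run;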


Behavioral Trajectories: An Analysis of Growth Patterns and Predictors of Changes from Preschool to First Grade
Jin Liu, Fan Pan, Yin Burgess and Christine DiStefano
SD-110

The purpose of this study is to investigate the behavioral trajectories of children from preschool to First Grade and how the behavioral changes relate to children’s demographic information.  We track the changes of 233 children over three years, from preschool to First Grade, from fall 2011 to spring 2014 (6 time points).  Participants are 233 children from Grades One and Two.  PROC MIXED in SAS® 9.4 is used for data analysis.  Linear mixed models are selected to investigate personal variation in behavioral changes.  Results indicate that children’s externalizing problems are stable over time, while children’s internalizing problems decrease over time.  Children’s adaptive skills increase at the beginning but decrease at the end of First Grade.  Boys and children with free/reduced-lunch status show more problems at the beginning of preschool.  We also find significant differences in adaptive skills over time by gender and English language status.  This information can assist teachers, school psychologists, and others who are concerned with children’s behavioral and emotional health.
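
For flavor, the general shape of such a growth model in PROC MIXED (variable names hypothetical, not the authors' exact specification):

    /* linear growth with random intercepts and slopes per child */
    proc mixed data=behavior covtest;
       class child gender lunch;
       model adaptive = time gender lunch time*gender / solution;
       random intercept time / subject=child type=un;
    run;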