All Together Now: Strategies for Combining Data from Multiple Sources
Scheduled Time: Sunday, October 20th, 8:00 am - 12:00 pm
Intended Audience: Beginning/Intermediate programmer; moderate pace
Instructor: Christianna Williams
Abstract: Problem 1: you have data and "metadata" that need to be combined to produce a
user-friendly report. Problem 2: you have data in several different data sources, each at
a different level of aggregation (such as person-level, site-level, and event-level), and
you need to combine them into a single data set for analysis or for generating a report.
Problem 3: you need to join data from two sources based on a range of values rather than
an exact match. What is the best SAS strategy to solve
each of these problems? When should you use a MERGE (or UPDATE)? When should you use an
INNER, OUTER, or LEFT join? When should you use DATA Step SET or SQL UNION or PROC APPEND?
Or when would thoughtful use of SAS Formats allow you to combine the data in an efficient
way? This workshop will begin by presenting basic methods for combining data sets
(both concatenation and joins) to set the stage for a series of examples addressing each
of these problems and more. We will discuss strategies and solutions for each in order
to help you choose the best approach for the data combination challenges you face, and
emphasis will be placed on making a plan for your target data set before you start to
code. We will use DATA Step, PROC SQL, PROC FORMAT and other strategies to get our data
act together! I encourage you to bring examples of the types of data combination problems
you have struggled with.
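As a rough illustration of Problem 3 (joining on a range of values rather than an exact match), here is a minimal, language-neutral sketch in Python. It is not the workshop's SAS solution. The lookup table `AGE_BANDS`, the helper names `band_for` and `range_join`, and the sample records are all hypothetical and exist only for this example.

```python
from bisect import bisect_right

# Hypothetical lookup table (an assumption for illustration): each band
# starts at `lower` and runs up to, but not including, the next lower bound.
AGE_BANDS = [
    (0, "child"),
    (18, "adult"),
    (65, "senior"),
]


def band_for(value, bands):
    """Return the label of the band whose range contains value, or None."""
    lowers = [lower for lower, _ in bands]
    # Index of the last band whose lower bound is <= value.
    idx = bisect_right(lowers, value) - 1
    return bands[idx][1] if idx >= 0 else None


def range_join(records, key, bands, new_field):
    """Attach a band label to each record by range, not by exact match."""
    return [{**rec, new_field: band_for(rec[key], bands)} for rec in records]


# Hypothetical person-level records.
people = [
    {"id": 1, "age": 7},
    {"id": 2, "age": 18},
    {"id": 3, "age": 70},
    {"id": 4, "age": -1},  # below every band, so no match
]

joined = range_join(people, "age", AGE_BANDS, "age_group")
for row in joined:
    print(row)
```

Keeping the bands in a sorted lookup table, instead of hard-coding IF/ELSE thresholds, is the same idea the abstract hints at when it mentions using formats to combine data.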
Instructor Bio:
Christianna Williams, PhD has been a Senior Associate at Abt Associates Inc. since 2008. Although Abt is based in
Cambridge, MA, Christianna primarily telecommutes from her home in Chapel Hill, North Carolina. An epidemiologist
by training and disposition, she has worked in a variety of subject areas from the association of birth trauma with
left-handedness to the quality of end-of-life care in nursing homes. Arguably, she spent way too much time in school
and holds degrees from Duke University, Yale University and the University of California at Berkeley. Christianna
started using SAS as a graduate student in population biology in 1985 and is still learning! She has been a frequent
presenter at local and regional user group conferences as well as SAS Global Forum, and has been sharing her geeky
love for SAS programming through teaching for more than 20 years. She also devotes as much time as possible to her other
passions: running, vegetarian cooking and reading novels.
Advanced PROC SQL Concepts and Programming Techniques Using SAS®
Scheduled Time: Sunday, October 20th, 8:00 am - 12:00 pm
Intended Audience: Intermediate and Advanced SAS users. This is an Instructor-led Workshop with many examples.
Instructor: Kirk Paul Lafler
Abstract:
Structured Query Language (SQL) is a universal language used in data science, data analytics, statistics,
data management, and other disciplines to access, transform, manipulate and output data stored in SAS data sets,
relational databases and tables. Based on the new Third Edition of Kirk's book PROC SQL: Beyond the Basics Using SAS®,
this half-day course presents core concepts and programming techniques to help leverage PROC SQL as a programming
and database language.
Attendees learn how to construct powerful and scalable queries; construct real-world queries including nearest
neighbor and first, last and between BY-group processing; apply rule-based and cost-based optimization strategies -
influencing the SQL optimizer to choose from the available join algorithms; apply effective "fuzzy" matching
programming techniques when a table's keys are inconsistent or unreliable; use the SQL-macro
interface to create single-value (or aggregate) and value-list macro variables; construct effective simple and
composite indexes to dynamically access a table's data; construct table validation rules using table integrity
constraints; and explore "select" query performance tuning techniques for big data environments.
Instructor Bio:
Kirk Paul Lafler is an entrepreneur and founder at Software Intelligence Corporation, and has used SAS
software since 1979 as a consultant, application developer, programmer, SAS solutions provider, data analyst,
data manager, infrastructure specialist, performance tuner, educator and author. As a SAS Certified professional,
mentor, and educator at Software Intelligence Corporation, and an advisor and adjunct professor at the University of
California San Diego Extension, Kirk has taught SAS courses, seminars, workshops, and webinars to thousands of users
around the world.
Kirk is also the author of several books, including PROC SQL: Beyond the Basics Using SAS, Third Edition
(SAS Press, 2019), along with hundreds of papers and articles on a variety of SAS topics; has been selected as an
invited speaker, educator, keynote and section leader at SAS conferences and meetings worldwide; and is the
recipient of 25 "Best" contributed paper, hands-on workshop (HOW), and poster awards.
Data-Driven Design in SAS® and Python: Developing More Dynamic, Flexible, Configurable, Reusable Software
Scheduled Time: Sunday, October 20th, 8:00 am - 12:00 pm
Intended Audience: Data-driven design and development are relevant
to all levels of expertise across all industries.
Instructor: Troy Martin Hughes
Abstract: Students will receive a complimentary copy of the
author's 2019 book SAS® Data-Driven Development: From Abstract Design to Dynamic
Functionality, a $40 value! The course follows the book's outline and teaches data-driven
techniques in which software customization, configuration, business rules, data models,
data cleaning/validation, report style, and other dynamic elements are maintained in
external data structures - NOT in the underlying code. Data-driven development techniques
allow software to adapt flexibly to various organizations, environments, and objectives.
This design facilitates highly configurable (i.e., "codeless") software whose functionality
can be modified by changing only the underlying control data - the control tables,
configuration files, parameters, and user-specified options rather than the code itself.
All examples are demonstrated in both Base SAS 9.4 and Python 3.7, so the course is ideal for either SAS or Python
developers seeking to expand their skills. All students will walk away with an understanding of how data-driven
design minimizes software maintenance and modification, as well as proven data-driven development techniques that
can be immediately implemented.
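To make the idea of control data concrete, here is a minimal sketch in Python, assuming a small CSV control table of validation rules. It is not taken from the book or the course materials. The column names (`field`, `op`, `threshold`, `flag`), the rule values, and the helpers `load_rules` and `check_record` are hypothetical.

```python
import csv
import io
import operator

# Hypothetical control data (an assumption for illustration). In practice it
# would live in an external file that can be edited without touching code.
CONTROL_TABLE = """field,op,threshold,flag
age,gt,120,age_out_of_range
age,lt,0,age_out_of_range
weight_kg,le,0,weight_not_positive
"""

# Map the operator names used in the control table to comparison functions.
OPS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
    "eq": operator.eq,
}


def load_rules(text):
    """Parse the control table into a list of rule dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def check_record(record, rules):
    """Return the flags raised by a record, in control-table order."""
    flags = []
    for rule in rules:
        value = record.get(rule["field"])
        if value is None:
            continue  # the rule does not apply to this record
        if OPS[rule["op"]](value, float(rule["threshold"])):
            if rule["flag"] not in flags:
                flags.append(rule["flag"])
    return flags


rules = load_rules(CONTROL_TABLE)
flags_bad = check_record({"age": 130, "weight_kg": 0}, rules)
flags_ok = check_record({"age": 40, "weight_kg": 72.5}, rules)
print(flags_bad)
print(flags_ok)
```

Changing a threshold or adding a rule here means editing a row of the control table. The `check_record` logic stays the same, which is the separation of control data from code that the abstract describes.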
In the first half, students will learn the basics of data-driven design and data structures (i.e., control data):
- Compare data-driven software design with functionally equivalent code-driven design.
- Identify dynamic elements within software and learn the benefits of controlling them remotely.
- Create and read various file types that contain dynamic data elements, including batch files, configuration files, control files/tables, decision tables, business rule repositories, hierarchical taxonomies, and other data models.
- Create and read various control data file formats, such as Excel spreadsheets, SAS data sets, XML files, CSS files, custom-formatted text files, and directory/folder contents.
In the second half, students will use data-driven methods to solve real-world problems:
- Learn SAS-specific components that support data-driven development, such as the CALL EXECUTE routine, CNTLIN= option in PROC FORMAT, SYSPARM option, SAS dictionary tables, CSSSTYLE option in PROC REPORT, and the SYMGET function and CALL SYMPUT routine.
- Learn Python-specific components that support data-driven development, including NumPy and Pandas.
- Write batch files that parameterize dynamic elements to initiate and execute software using customized user specifications.
- Clean, standardize, and categorize data using dynamic data formats and dynamic data models.
- Create quality control exception reports that use dynamic data dictionaries to identify erroneous data.
- Transform data using dynamic business rules and conditional logic maintained outside of software.
- Create "checkpoint" control tables that validate program/process success or indicate program/process failure.
- Customize the style (e.g., format, font, color scheme, graphics) and content of data products.
Instructor Bio:
Troy has more than 20 years of experience leading SAS teams and projects in support of federal, state, and local government initiatives. Since 2013, he has given more than 90 presentations, trainings, and hands-on workshops at SAS conferences, including at SAS Global Forum, SAS Analytics Experience, WUSS, SCSUG, SESUG, MWSUG, and PharmaSUG. Additionally, he has authored two books that model design and development best practices:
- SAS Data Analytic Development: Dimensions of Software Quality (2016)
- SAS Data-Driven Development: From Abstract Design to Dynamic Functionality (2019)
Troy has an MBA in information systems management and numerous certifications including SAS Base, SAS Advanced, SAS Clinical Trials, PMP, PMI-RMP, PMI-PBA, PMI-ACP, CISSP, CSSLP, ITIL, CSM, CSD, CSPO, CSP-SM, and CSP-PO. He is a US Navy veteran with two tours of duty in Afghanistan.
A Variety of Mixed Models
Scheduled Time: Sunday, October 20th, 1:00 pm - 5:00 pm
Intended Audience: Statisticians, analysts, banking and medical statistics researchers
Instructor: David A. Dickey
Abstract:
Mixed models are those with fixed and random effects. In ordinary mixed models, one estimates the fixed effects
using estimated generalized least squares, where the variance-covariance matrix of the data is estimated as part of a
maximum likelihood or REML (Restricted, or Residual, Maximum Likelihood) algorithm. After reviewing how to distinguish
random from fixed effects, this course will describe the overall methodology and show several examples of its
application including random coefficient models, repeated measures and hierarchical models. A review of nonlinear
models is included, and the additional complexities arising from the inclusion of random effects are illustrated.
A third type of model, the generalized linear mixed model, is discussed with examples. Such a model arises when the
response is not normally distributed but rather is in the exponential family of distributions. Outstanding examples
of the exponential family are the binomial and Poisson distributions. Emphasis is on concepts, examples, when to
apply each type of model, and how to interpret each.
Instructor Bio:
David A. Dickey is W. N. Reynolds Professor (emeritus) of Statistics at NC State University. He is known for the
Dickey-Fuller test for unit roots in time series. He is a Fellow of the American Statistical Association. He has
spoken at the ASA's JSM, ASQ, and CSP meetings and many times at SAS Global Forum and regional SAS Users' Group
meetings. Dickey has co-authored several books and dozens of papers. He was major advisor to 16 PhD students at
NCSU and served on hundreds of graduate student committees across campus. Dickey is a member of NCSU's Academy of
Outstanding Teachers and Academy of Outstanding Faculty Engaged in Extension. He received the D.D. Mason Faculty
Award in 1986 and 2018 and the Outstanding Extension Service Award in 2007. Dickey was a founding faculty member of
NCSU's Institute for Advanced Analytics, holds an associate appointment in Economics, and is a member of the Financial
Math faculty. He taught at Randolph-Macon College and The College of William and Mary in Virginia for 3 years before
earning his PhD in 1976 under Wayne Fuller and spending the next 43 years at NC State.
ODS Graphics I: Creating Quick and Easy Graphs with the Statistical Graphics (SG) Procedures
Scheduled Time: Sunday, October 20th, 1:00 pm - 5:00 pm
Intended Audience: Novice to intermediate SAS programmers. Pace will be moderate, allowing plenty of time for Q&A as we go. I like to keep my classes as interactive as possible.
Instructor: Josh Horstman
Abstract:
The ODS Statistical Graphics (SG) Procedures represent a complete paradigm shift for the creation of high-quality
graphics using the SAS system. Legacy SAS/GRAPH functions produce crude graphics that frequently do not meet today's
standards of presentation. While customization is possible, it can require extensive coding and several tricks to
achieve desirable results. With the introduction of the SG procedures, all of that changed. This course will
provide an overview of the major procedures such as SGPLOT, SGPANEL, and SGSCATTER as well as related statements
and common options using numerous examples. Upon completion of the course, students will have the tools they need
to start producing high-quality graphics and performing basic customization using the options available.
Instructor Bio:
Josh Horstman is an independent statistical programmer based in Indianapolis with over 20 years' experience using
SAS in the life sciences industry. He specializes in analyzing clinical trial data, and his clients have included
major pharmaceutical corporations, biotech companies, and research organizations. A SAS certified programmer,
Josh loves coding and is a frequent presenter at SAS Global Forum and various regional and local SAS users groups.
Josh holds a bachelor's degree in mathematics and computer science, and a master's degree in statistics from
Colorado State University.
Top 10 SAS® Best Programming Practices They Didn't Teach You in School
Scheduled Time: Sunday, October 20th, 1:00 pm - 5:00 pm
Intended Audience: To be determined
Instructor: Charu Shankar
Abstract: This practical session will discuss the Top 10
SAS best programming practices culled from years of experience in working with SAS to
help SAS customers resolve their efficiency issues. The audience will be guided on what
worked with benchmarking statistics and why a certain practice is a best practice. This
session will provide answers to the following questions: "What are 3 questions I need to
answer before I jump into working with data?", "What is the data worker's rule #1?", and "What
is the only answer to the question - what's the best way to do this task?" In this session
participants will learn the top 10 SAS best programming practices to improve performance.
Participants will learn data access techniques, data manipulation techniques and data
output techniques to help conserve valuable resources such as I/O, CPU, memory and, last
but not least, the programmer's time. Best practice #10 offers several tips to reduce
the time you spend on typing or programming. For each best practice the presenter will
demonstrate several ways of performing a task and then, using benchmarking statistics,
show why a certain technique is more efficient. The session will also compare the data
step with the PROC step to showcase where the data step has its strength, which PROC to
use, etc. Participants will also come away with an excellent understanding of a fundamental
law of nature and how it applies to SAS programming.
Instructor Bio:
A SAS Senior Technical Trainer, Charu teaches by engaging with logic, visuals and analogies to spark critical thinking.
She interviews users to recommend the right SAS training.
A SAS training post blogger, yoga teacher and chef, Charu also helps support users looking to land work using SAS
through LinkedIn.
Charu has presented at over 100 international SAS user group conferences on SAS programming, SAS Enterprise Guide,
PROC SQL, DS2 programming, tips and tricks, and efficiencies.
Advanced SAS Macro Language Techniques for Building Dynamic Programs
Scheduled Time: Wednesday, October 23rd, 2019, 8:00 am - 12:00 pm
Intended Audience: Early intermediate to advanced SAS programmers who are familiar with the basics of the SAS macro language. Pace will be moderate, allowing plenty of time for Q&A as we go. I like to keep my classes as interactive as possible.
Instructor: Josh Horstman
Abstract:
This seminar shows you how to take advantage of SAS Macro Language capabilities that enable you to write dynamic
programs and applications. By mastering the concepts and techniques presented in this class your programs will
become free of hard-coded data dependencies, thus eliminating the need to re-write the code every time a data
set name, variable name, or other data attribute changes. Topics will include how to build and process macro variable
lists, using the macro language to control the data environment, using control files, working with datasets and
libraries in the macro language, accessing the SAS data dictionaries, and other miscellaneous macro topics that
will help you create dynamic code. (course licensed from Art Carpenter)
Instructor Bio:
Josh Horstman is an independent statistical programmer based in Indianapolis with over 20 years' experience using
SAS in the life sciences industry. He specializes in analyzing clinical trial data, and his clients have included
major pharmaceutical corporations, biotech companies, and research organizations. A SAS certified programmer,
Josh loves coding and is a frequent presenter at SAS Global Forum and various regional and local SAS users groups.
Josh holds a bachelor's degree in mathematics and computer science, and a master's degree in statistics from
Colorado State University.