One Click to Create a List of MEPS-HC ZIP Files' Download URLs: Web Scraping with SAS® and Python
September 24, 2024: 8:15 AM - 8:30 AM
Intermediate & Advanced SAS Skills, Salon A

Authors
Pradip K. Muhuri, Charles Z.X. Han, John Vickery

Abstract
This case study demonstrates the use of SAS® and Python to automate data extraction and processing from the Medical Expenditure Panel Survey (MEPS) webpages. Both programs streamline the tasks of fetching webpages, parsing the HTML content, and generating identical outputs: a comprehensive list of 1,299 data file download URLs and associated information for 431 MEPS-Household Component (HC) public-use ZIP files in various formats (e.g., ASCII, Excel, SAS transport, SAS V9, and Stata), covering survey years from 1996 to 2022. This list is saved in an Excel spreadsheet, allowing users to download files of interest directly to their devices with a single click on each corresponding URL, offering a faster alternative to manually navigating multiple webpages. Because the URLs are constructed dynamically, the SAS or Python program can be rerun to keep the list of download URLs current, making MEPS data file downloads from the web more efficient.
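
The parse-and-list step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' program: the sample HTML, the base URL (`https://example.org/data_files/`), the file names, and the helper names `ZipLinkParser` and `extract_zip_urls` are all hypothetical, and the page is an inline string rather than a fetched MEPS webpage. It only shows one way to collect links ending in `.zip` and resolve them into absolute download URLs using the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkParser(HTMLParser):
    """Collect absolute URLs for <a> tags whose href ends in .zip."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.zip_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".zip"):
            # Resolve relative links against the page's base URL.
            self.zip_urls.append(urljoin(self.base_url, href))


def extract_zip_urls(html, base_url):
    """Return the ZIP download URLs found in an HTML string, in page order."""
    parser = ZipLinkParser(base_url)
    parser.feed(html)
    return parser.zip_urls


# Hypothetical page fragment; a real run would fetch the page content first.
SAMPLE_HTML = """
<ul>
  <li><a href="pufs/sample_ascii.zip">ASCII</a></li>
  <li><a href="pufs/sample_xpt.ZIP">SAS transport</a></li>
  <li><a href="docs/sample_doc.pdf">Documentation</a></li>
</ul>
"""
BASE_URL = "https://example.org/data_files/"  # placeholder, not a MEPS address

urls = extract_zip_urls(SAMPLE_HTML, BASE_URL)
for url in urls:
    print(url)
```

In a full workflow, the resulting list would be written to a spreadsheet alongside the file descriptions; that step is omitted here because it depends on third-party libraries not named in the abstract.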

Paper