Hands-on Python PDFs: Using the pypdf Library To Programmatically Design, Complete, Read, and Extract Data from PDF Forms Having Digital Signatures
September 23, 2024: 8:00 AM - 9:30 AM
Hands-On Workshops, Glen Echo

Authors
Troy Martin Hughes

Abstract
The pypdf Python library (https://pypdf.readthedocs.io/en/stable/index.html) facilitates the programmatic creation, completion, cropping, and merging of PDF forms. Form data, including both dynamic text and field values, can be programmatically written to a PDF using pypdf, and data manually entered into PDF form fields by users can be programmatically extracted. With this combination of functionality, pypdf can build dynamically generated PDF forms that simplify both user completion of forms and subsequent form validation. This text introduces users to the pypdf library and demonstrates a single use case in which copyright grant forms (CGFs, aka copyright-release or permission-to-publish forms) were automatically generated for the Western Users of SAS Software (WUSS) 2024 conference proceedings. This automation eliminated confusing language and components of the CGF; for example, language specific to US government employees is now omitted unless the author is a US government employee. By contrast, in 2023, an author submitting a paper to WUSS had to navigate more than 50 unutilized fields in the CGF. Moreover, an author completing the WUSS 2023 CGF had to enter their name, paper number, paper title, job title, and organization, information that the conference had already collected and should have been using to prepopulate forms for authors. The revised form and process now require each author only to digitally sign the CGF, which removes these manual-entry errors and reduces author effort. This solution was developed and run on Python 3.11, and the full code is included in Appendix A.
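
The write-then-extract workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the code from Appendix A: the field names (`author_name`, `paper_number`, and so on), the author-record keys, and the file paths are all hypothetical, and the template PDF is assumed to already contain text fields with those names.

```python
from pathlib import Path


def build_field_values(author: dict) -> dict:
    """Map an author record to PDF field names (both are hypothetical)."""
    return {
        "author_name": author["name"],
        "paper_number": str(author["paper_number"]),
        "paper_title": author["paper_title"],
        "job_title": author.get("job_title", ""),
        "organization": author.get("organization", ""),
    }


def prefill_form(template: Path, output: Path, values: dict) -> None:
    """Copy a fillable PDF and write `values` into its form fields."""
    # Imported here so the pure helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    reader = PdfReader(template)
    writer = PdfWriter()
    writer.append(reader)  # copies the pages along with the form definition
    for page in writer.pages:
        # Only pages that carry annotations can hold form widgets.
        if "/Annots" in page:
            writer.update_page_form_field_values(
                page, values, auto_regenerate=False
            )
    writer.write(output)


def read_text_fields(path: Path) -> dict:
    """Return the text-field names and current values stored in a PDF form."""
    from pypdf import PdfReader

    return PdfReader(path).get_form_text_fields()


if __name__ == "__main__":
    # Hypothetical author record; a real run also needs a template PDF.
    record = {"name": "A. Author", "paper_number": 101, "paper_title": "Demo"}
    print(build_field_values(record))
```

With a suitable template in place, `prefill_form(Path("cgf_template.pdf"), Path("cgf_101.pdf"), build_field_values(record))` would produce a prepopulated copy, and `read_text_fields(Path("cgf_101.pdf"))` would read the values back for validation. Digital signing is not covered by this sketch.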

Paper