Mike Gifford Introduces pdf-crawler for Automated PDF Accessibility Audits
Accessibility specialist Mike Gifford has introduced an open-source tool, pdf-crawler, designed to discover PDF files on websites and assess their accessibility via an automated GitHub-based workflow. The announcement was shared in a LinkedIn post describing the project as a portable adaptation of Luxembourg’s simplA11yPDFCrawler.
The tool enables users to submit a website URL, which triggers a GitHub Actions workflow that crawls the site for PDF files and evaluates them against accessibility criteria. Results are posted back to the originating GitHub issue, creating a traceable audit record without requiring dedicated server infrastructure.
According to the project documentation, pdf-crawler checks for common accessibility requirements aligned with WCAG-related standards. These include whether PDFs are tagged, contain selectable text, define document language and title, allow assistive technology access, and include bookmarks for longer documents.
The system uses a YAML manifest to track scanned files, including file hashes, allowing unchanged documents to be skipped in subsequent scans. PDF files are deleted after analysis to reduce storage overhead, while structured reports are retained.
The project builds on accessibility tooling developed by the Government of Luxembourg for public-sector monitoring. In the LinkedIn post, Gifford notes that artificial intelligence was used to help refactor and adapt the original codebase into a more portable implementation.
The tool reflects an approach that combines public-sector open-source work with automation workflows to support accessibility auditing at scale. However, the announcement does not include performance benchmarks or comparisons with other accessibility testing tools.
Further details, documentation, and access to the scanning interface are available on the official project website.


