Mike Gifford Introduces pdf-crawler for Automated PDF Accessibility Audits

Staff Reporter

Mike Gifford Launches pdf-crawler for Large-Scale PDF Accessibility Audits

Accessibility specialist Mike Gifford has introduced an open-source tool, pdf-crawler, designed to discover PDF files on websites and assess their accessibility via an automated GitHub-based workflow. The announcement was shared in a LinkedIn post describing the project as a portable adaptation of Luxembourg’s simplA11yPDFCrawler.

The tool enables users to submit a website URL, which triggers a GitHub Actions workflow that crawls the site for PDF files and evaluates them against accessibility criteria. Results are posted back to the originating GitHub issue, creating a traceable audit record without requiring dedicated server infrastructure.

According to the project documentation, pdf-crawler checks for common accessibility requirements aligned with WCAG-related standards. These include whether PDFs are tagged, contain selectable text, define document language and title, allow assistive technology access, and include bookmarks for longer documents.

The system uses a YAML manifest to track scanned files, including file hashes, allowing unchanged documents to be skipped in subsequent scans. PDF files are deleted after analysis to reduce storage overhead, while structured reports are retained.

The project builds on accessibility tooling developed by the Government of Luxembourg for public-sector monitoring. In the LinkedIn post, Gifford notes that artificial intelligence was used to help refactor and adapt the original codebase into a more portable implementation.

The tool reflects an approach that combines public-sector open-source work with automation workflows to support accessibility auditing at scale. However, the announcement does not include performance benchmarks or comparisons with other accessibility testing tools.

Further details, documentation, and access to the scanning interface are available on the official project website.

Mike Gifford

PDF

Accessibility

Digital Accessibility

WCAG

GovTech

Automation

Disclosure: This content is produced with the assistance of AI.

Note: The vision of this web portal is to help promote news and stories around the Drupal community and promote and celebrate the people and organizations in the community. We strive to create and distribute our content based on these content policy. If you see any omission/variation on this please reach out to us at #thedroptimes channel on Drupal Slack and we will try to address the issue as best we can.