Drupal GovCon 2024: Batch-It Crazy
At this year’s Drupal GovCon in College Park, Maryland, Steve Wirt and I presented on the many, sometimes daunting, considerations involved in making programmatic content changes on large sites.
Examples of scripted changes
On the VA.gov website, we have had to perform many of these programmatic changes. In one case, the number for the National Suicide Hotline changed, and references to the hotline appeared across multiple content types and paragraphs. Because of that, we wrote three separate scripts, each of which built its batches with array_slice() and removed the processed items from the total items to process. We chose a logging approach and called on a custom script library to handle revisions.
Another script populated an 'updated by a human' date on every node; there, we used array_chunk() to create the batches. Yet another script, one that replaced external links, used a different methodology for architecting its functions, reflecting the judgment of the developer doing the work. As a result, these various scripts did not share a consistent set of steps or functions.
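The two batching approaches above differ mainly in how the batches are carved out of the full item list. A minimal sketch of both, using placeholder node IDs and a placeholder batch size rather than the actual VA.gov values:

```php
<?php

// Placeholder IDs and batch size, for illustration only.
$nids = range(1, 10);
$batchSize = 3;

// Approach 1: array_chunk() pre-computes every batch up front.
$batches = array_chunk($nids, $batchSize);
// Four batches: [1,2,3], [4,5,6], [7,8,9], [10].

// Approach 2: array_slice() peels one batch off the front while the
// remaining items shrink, removing processed items from the total.
$remaining = $nids;
$processed = [];
while ($remaining) {
  $batch = array_slice($remaining, 0, $batchSize);
  $remaining = array_slice($remaining, $batchSize);
  // The real content change would happen here, one batch at a time.
  $processed = array_merge($processed, $batch);
}
```

Either way, the same items get processed; array_chunk() is simpler when the whole list fits in memory, while the array_slice() pattern makes it easy to persist the shrinking remainder between runs.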
Questions related to scripting content changes
When writing scripts to change content on a site programmatically, we have a lot of questions to answer.
- What needs to change?
- Which revisions should change?
- How do you handle errors?
- When should the script run?
- What should be logged and where?
The process may involve only nodes, but it might also include terms or users, and the order of operations matters. Regarding revisions: if you update only the default revision, what happens when someone later publishes a draft revision that predates the scripted change? Are old revisions that precede the default revision ever restored and published? If so, maybe you need to update all the revisions.
What do you log, and where? Do you log every item processed? Logging to watchdog, where the changes appear in the Recent Logs, can be sufficient for a smaller site or a smaller change. But if you are trying to log 10,000 or more changes, you will in most cases completely fill up the Recent Logs, and some entries may become inaccessible almost immediately because they dropped off once the threshold was reached. Perhaps it makes sense to log directly to the terminal instead, though that puts the onus of knowing what the logging relates to, and whether it looks right, on whoever has access to the command line, which may not be the developer themselves.
You may want the change to happen on update or after deploy, or you may want to run it manually or on a schedule. Lastly, if you hit an error, perhaps you want to stop the script, or maybe it makes more sense to log the error and move on.
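That last choice, stop on error versus log and continue, can come down to a single flag in the processing loop. A hypothetical sketch, in which the process() function and the item list are invented for illustration (process() deliberately fails on one item so the error path is exercised):

```php
<?php

// Invented stand-in for the real content change; it fails on one item
// so the error-handling path is exercised.
function process(string $item): void {
  if ($item === 'node/2') {
    throw new \RuntimeException('simulated failure');
  }
}

$items = ['node/1', 'node/2', 'node/3'];
$stopOnError = false; // Flip to true to abort on the first failure.
$errors = [];

foreach ($items as $item) {
  try {
    process($item);
  }
  catch (\Throwable $e) {
    if ($stopOnError) {
      // Fail fast: surface the error and halt the whole run.
      throw $e;
    }
    // Otherwise record the failure and move on; follow up after the run.
    $errors[] = sprintf('%s failed: %s', $item, $e->getMessage());
  }
}
```

With $stopOnError set to false, the run completes and $errors holds one entry for node/2, ready to be logged for follow-up.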
As in so many things in life, “it depends” is the right, though not the most satisfying, answer.
Drupal contributed module: codit_batch_operations
Given the complex nature of scripting large content changes, it makes sense to have a flexible tool that allows for repeatable steps in an organized way while logging as much as you can. That’s where the codit_batch_operations Drupal contributed module comes in. Built directly out of work done for the Department of Veterans Affairs (VA) and the Centers for Medicare and Medicaid Services (CMS), codit_batch_operations provides a framework for writing scripted changes in a consistently structured way, with the central functionality coming in by way of dependency injection. It also gives developers a host of helper functions to handle:
- Getting content by type
- Getting the latest revision
- Getting the default revision and drafts made afterward
- Saving new revisions and more...
Key to the whole module is the log entity it creates. Instead of logs being transient messages that disappear when watchdog rolls over or a terminal session is closed, the logs persist in the site’s admin section for as long as they are needed. The module lets you configure a run either to fail and stop script execution on an error or to proceed, logging errors so you can follow up afterward. While the list of runs and their detailed logs are available via the primary module, a sub-module provides a user interface for running scripts. Should processing time exceed the environment’s limits, or should we lose connection while running in a terminal, the module will pick up where a previous run left off. Plus, its permission structure assures us that the right people can take the actions appropriate to their role.
The Operation List
The log entity
Lastly, the module lets us specify how a script will be run. We can schedule it to run “on the 2nd of February after 8:00,” along with additional yearly, monthly, or daily frequencies. These can be stacked so that a script runs on both Groundhog Day and Independence Day, for instance. If we need it to run only once, we can trigger it manually with a drush command or have it fire during an update or deployment. We can also write custom code to run it in response to an event or a form submission.
Reflections on Drupal GovCon
Drupal GovCon was a wonderful time to reconnect with people I had only met virtually, both at CivicActions and at other companies and government agencies. Because sessions were presented by first-time speakers as well as long-time Drupal presenters and contributors, I could see the wide range of Drupal knowledge and experience, even within my own company. It was most heartening to see the representation from those in civil service, who make time to keep abreast of the latest developments in open source and Drupal and present their own sessions, too. In short, it means that the ecosystem for quality and transparency in digital services provided to and by government at multiple levels remains healthy, which ultimately means serving citizens better.