Drupal.org Outage Resolved After Brief Downtime
The Drupal.org website experienced a brief outage recently, taking the site down for about an hour. During the downtime, Fran Garcia-Linares posted an update for users on Slack, stating,
"We’ve identified the issue and we’re working on bringing the services back up."
The Drupal team addressed the issue promptly, and the site was soon restored. Although the primary server was still down when this report was first filed, the load was shifted manually to the secondary server, and the website is fully operational again. The cause of the primary server's failure is still under investigation, and we have not yet received an update on it.
In addition to Fran, two others helped resolve the issue: Neil Drumm, Drupal Architect at the Drupal Association (DA), and Narayan Newton, Lead Performance Engineer at Tag1 Consulting.
Tim Lehnen, CTO of the Drupal Association, responded to The Drop Times in a conversation with Ben Peter, our Community Manager:
"I'm very grateful to the DA staff who responded to the alerts, Fran Garcia-Linares and Neil Drumm, as well as Narayan Newton from Tag1 Consulting, who resolved this issue and who helps us manage, maintain, and upgrade our infrastructure in general.
We currently run a hybrid infrastructure of virtual machines hosted at the Oregon State University Open Source Lab and a fully modern, Kubernetes (k8s) based infrastructure at AWS, a transition that Tag1 Consulting has been managing on our behalf."
The site and the Packagist repository were down for almost an hour.
Earlier, Tim had commented on a LinkedIn post about the outage by Grzegorz Pietrzak, briefly explaining what happened:
"Speaking with my Drupal Association CTO hat on, a quick update on what happened:
From ~09:23 UTC / 2:23 am Pacific to ~10:29 UTC / 3:29 am Pacific, Drupal.org experienced an outage. The failure was related to Drupal.org’s high-availability media server pair, which hosts files for our sites. Unfortunately, the primary server in the pair experienced a partial failure, which took down services, but the secondary was not able to take over. To resolve the issue, we were able to manually fail over to the secondary.
We are currently still running on the secondary and will perform a post-mortem on the primary."
Users are encouraged to follow the Drupal community channel on Slack for any further updates or information.
PS: This is a developing story, and we will add more details as the investigation brings in new information. Stay tuned. /Editor
Disclosure: This content is produced with the assistance of AI.