My First Postmortem Report

Debugging project: 500 server error

Sun Feb 23 2020 - 2 min read

WordPress website running on a LAMP stack incident report

Sunday, February 23, 2020

On February 19, 2020, we experienced a website outage that affected all users.

Today we’re providing an incident report detailing the nature of the outage and our response to resolve it.

Issue summary

From 12:15 PM to 2:45 PM (GMT-5), a span of two and a half hours, all visitors to our website received a 500 error response. The issue affected 100% of traffic to the site. The root cause was a spelling error in a route file in the WordPress configuration.

Timeline

Root cause and resolution

A configuration change was released to production without first being tested. The change included an invalid route for a PHP file that the WordPress configuration requires. Normally, every change is first deployed to a testing environment that replicates production. This time the change was neither tested nor reviewed and approved by one of our senior engineers.

We traced the error with the strace command, identified the configuration file that contained the incorrect route, and corrected it.
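As a rough illustration of this kind of debugging, the sketch below filters strace output for file lookups that failed with ENOENT ("No such file or directory"), which is how a misspelled file route typically shows up. It assumes the trace was saved to a file first, for example with `strace -f -p <pid> -o trace.txt` attached to the web server process. The function name `missing_paths` and the regular expression are illustrative assumptions, not the exact steps used during this incident.

```python
import re

# Matches failed file-access syscalls in strace output, for example:
#   open("/some/file.php", O_RDONLY) = -1 ENOENT (No such file or directory)
# Only a handful of common syscalls are covered; this is a sketch, not a full parser.
ENOENT_LINE = re.compile(
    r'\b(?:open|openat|stat|lstat|access)'   # syscall name
    r'\((?:[^,"]*,\s*)?'                     # optional first argument such as AT_FDCWD
    r'"(?P<path>[^"]+)"'                     # the quoted path being looked up
    r'[^)]*\)\s*=\s*-1 ENOENT'               # remaining arguments and the failure result
)


def missing_paths(strace_output: str) -> list[str]:
    """Return the paths of failed lookups, in first-seen order, without duplicates."""
    seen: list[str] = []
    for line in strace_output.splitlines():
        match = ENOENT_LINE.search(line)
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen
```

Reading the saved trace and passing its text to `missing_paths` yields a short list of candidate paths to compare against the files that actually exist on disk.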

Corrective and preventive measures

It is crucial that every commit pushed to production is first tested in a testing environment. Once everything is confirmed to be correct, it can then be pushed to production.
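One way to enforce this check is a small smoke test that runs against the testing environment before each release. The sketch below is a minimal example under assumptions: the helper names `fetch_status` and `is_safe_to_release` are hypothetical, and the URLs to check would be those of your own testing environment.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url (redirects are followed).

    Connection failures (urllib.error.URLError) are not caught here, so an
    unreachable testing environment stops the check with an exception.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses; keep the code.
        return err.code


def is_safe_to_release(statuses: dict[str, int]) -> bool:
    """A release is allowed only if every checked page answered with 2xx or 3xx."""
    return all(200 <= code < 400 for code in statuses.values())


# Example usage (the URL list is a placeholder for your own testing environment):
#   statuses = {url: fetch_status(url) for url in urls_to_check}
#   if not is_safe_to_release(statuses):
#       raise SystemExit(f"Release blocked: {statuses}")
```

A check like this would have flagged the 500 response in the testing environment before the change reached production.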

Actions to prevent future issues:

We are committed to continually and quickly improving our technology and operational processes to prevent this kind of outage in the future.
