Building a blameless postmortem culture is one of the most effective ways to improve quality in an engineering organization.
One of the best managers I had once told us: "The only sure way not to have an outage is not to deploy to production, and I fully expect you to ship and deploy a lot of features. The only thing I care about once we have an outage is that we write a postmortem and learn from it."
A good outline for a postmortem outage report:
Root Cause Analysis
Writing a good report, sharing it broadly and following up on next steps will unlock a culture of continuous improvement in your organization.
Mastering Software Engineering Course on Maven
If you liked this article, I will be teaching a “Mastering Software Engineering” course on Maven, where I will share hard-won lessons from developing large-scale products at companies such as Uber, Airbnb, and Microsoft.