Our CTO, Josh Stephens, was quoted in Enterprise Security Tech with thoughts about the recent Microsoft network outage.
One of my favorite news taglines from the outage came from Richi Jennings at DevOps.com:
Analysis: It’s DNS. It’s always DNS (unless it’s BGP)
Microsoft has promised to share more details, and until it does, it’s hard to know exactly what happened. So, I won’t talk about the outage itself.
I will say that if an outage can happen to Microsoft, it can happen to anyone.
The rumor is that it happened as the result of a change.
Josh mentioned one thing that benefits an organization of any size making any potentially destructive change.
Do a backup before and after the change. The backup needs to be automated so that it’s super easy to do. Otherwise, under the pressure of the task, it will be skipped (or done incompletely).
Here’s the thing most people don’t realize.
The use of the backup is obvious. You have the before and the after, and if the change messes things up, you restore the backup from before the change.
However, in that moment, all you can think about is getting the network back. And once you do, you’re left combing through log files trying to see what happened.
With BackBox, it’s so much easier. The backup analysis tools that are critical to performing reliable backups also assist with post-event analysis.
Did the network fail because the original plan was wrong? Or because a change was misapplied? Was there something else? Was the production configuration the same as the boot configuration? What commands were actually applied during the upgrade? And you get these answers not by crawling through and diffing log files, but with a nice interface in the context of your network or firewall devices.
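To make the "same as the boot configuration?" question concrete, here is a minimal sketch of comparing two configuration snapshots with Python's standard `difflib`. It is not how BackBox works internally; the function name `config_diff` and the sample configuration text are made up for illustration.

```python
import difflib


def config_diff(before: str, after: str,
                before_name: str = "before",
                after_name: str = "after") -> list[str]:
    """Return a unified diff between two configuration snapshots."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=before_name,
        tofile=after_name,
        lineterm="",  # snapshot lines carry no trailing newline
    ))


# Hypothetical snapshots: what the device boots with vs. what it is running.
startup = "hostname edge-1\ninterface eth0\n mtu 1500\n"
running = "hostname edge-1\ninterface eth0\n mtu 9000\n"

changes = config_diff(startup, running, "startup-config", "running-config")
print("\n".join(changes))
```

An empty result means the two snapshots match. Any `-` or `+` line is a setting that would change on the next reboot.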
Things we can learn
This is, of course, a biased list because BackBox delivers a network automation platform. Just because I’m biased doesn’t mean I’m not right.
- All companies make mistakes.
- Have a plan, and a backup plan to the plan.
- Automate backups before and after any potentially destructive changes, no matter how minor.
- Use automation tools that help you easily understand what went wrong.
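The "backup before and after" habit from the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a BackBox API: `backed_up_change`, `fetch_config`, and `restore_config` are hypothetical names, and the "device" is an in-memory dictionary standing in for real hardware.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def backed_up_change(fetch_config: Callable[[], str],
                     restore_config: Callable[[str], None],
                     snapshots: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Snapshot the config before and after a change; roll back on failure."""
    snapshots["before"] = fetch_config()
    try:
        yield snapshots
    except Exception:
        # The change failed: put the pre-change configuration back.
        restore_config(snapshots["before"])
        raise
    finally:
        # Always record the final state, whether the change succeeded
        # or was rolled back, so it is available for later analysis.
        snapshots["after"] = fetch_config()


# Hypothetical in-memory device used only to exercise the sketch.
device = {"config": "mtu 1500"}
snapshots: Dict[str, str] = {}


def fetch() -> str:
    return device["config"]


def restore(cfg: str) -> None:
    device["config"] = cfg


try:
    with backed_up_change(fetch, restore, snapshots):
        device["config"] = "mtu 9000"  # the risky change
        raise RuntimeError("link down after change")
except RuntimeError:
    pass  # the rollback has already happened at this point
```

Because the snapshots are taken by the wrapper rather than by the person making the change, they cannot be skipped under pressure.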