Yesterday’s double dose of outages should remind businesses that planning for outages, and for continuing to operate through them, is not optional.
The first outage was at Microsoft, where its Active Directory service had problems. Active Directory is used to “authenticate” users and services, so if it doesn’t work, not much else does.
The good news is that it happened toward the end of the workday (starting around 5:30 PM Eastern time and lasting about 3 hours or so), so some of the pain was avoided. This particular type of outage is hard to build redundancy for because it affected behind-the-scenes infrastructure.
The second problem came when 911 services in many communities across 14 states went down around 4:30 PM Mountain time. There was some question about whether the two outages were related, but based on what we are hearing, that is not the case. Losing 911 services is slightly more important than, say, losing access to Twitter, even though the current occupant of the White House might disagree with that.
Like many companies, Public Safety Answering Points or PSAPs, which is the technical name for 911 call centers, have outsourced some or all of their tech. Both companies involved in yesterday’s 911 outage have recently changed their names, likely to shed the reputations they had before. The company that the PSAPs contract with is Intrado, formerly known as West Safety Communications. Intrado says its outage was the fault of one of its vendors, Lumen. Many of you know Lumen as the company formerly known as CenturyLink (actually, it is a piece of CenturyLink).
The bottom line here is that whether you are a business selling or servicing widgets or a 911 operator, you are dependent on tech and, more and more, on the cloud. You are also dependent on third parties.
You need to decide how long you are willing to be down and how often. In general, cloud services are reliable, some more than others. But by moving to the cloud and using third-party vendors, you have lost some insight into the tradeoffs being made on your behalf. These vendors are trying to save money. You might agree with their decisions, but you are never consulted and likely never informed.
You may be okay with this, but it should be a conscious decision, not something that happens accidentally.
Do you have a disaster recovery plan? Or a business continuity plan? When was it last tested? Are you happy with the results?
These outages were relatively short-lived. The Microsoft outage affected most people for around 3-4 hours, and the 911 outage lasted around 1-2 hours. But many outages of this kind have lasted much longer than that.
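To put durations like these in perspective, it can help to translate an availability target into hours of downtime per year. The following is only a rough sketch of that arithmetic: the function names and the example targets (99%, 99.9%, 99.99%) are illustrative assumptions, not figures from Microsoft, Intrado, or Lumen.

```python
# Rough availability arithmetic. All names and targets here are
# illustrative assumptions, not any vendor's actual commitments.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year that an availability target permits."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def availability_from_downtime(downtime_hours: float) -> float:
    """Yearly availability (in percent) implied by total downtime hours."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common targets.
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows {hours:.2f} hours down per year")
```

By this math, a single 3-hour outage uses up roughly a third of the yearly downtime budget of a service that aims for 99.9% availability.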
Have you asked your vendors (cloud or otherwise) about their plans? Do you believe them? Are there meaningful penalties in the contract to cover your losses and your customers’ losses? Are you okay with the inevitable outages?
Consider this outage an opportunity.
Credit: Brian Krebs