Then let’s say you have some form of IT failure. How do you communicate with your customers?
At London’s Gatwick airport, apparently the DR plan consists of trotting out a small whiteboard and giving a customer service agent a dry-erase marker and a walkie-talkie.
On the bright side, they are using black markers for on time flights and red markers for others.
Gatwick is blaming Vodafone for the outage. Vodafone does contract with Gatwick for certain IT services.
You would think that an organization as large as Gatwick would have a well-planned and tested disaster recovery strategy, but it would appear that they don’t.
Things, they say, will get back to normal as soon as possible.
Vodafone is saying:
"We have identified a damaged fibre cable which is used by Gatwick Airport to display flight information. Our engineers are working hard to fix the cable as quickly as possible. This is a top priority for us and we are very sorry for any problems caused by this issue."
But who is being blasted on social media with phrases like “absolute shambles”, “utter carnage” and “huge delays”? Not Vodafone.
Passengers are snapping cell phone pictures and posting to social media with snarky comments.
Are you prepared for an IT outage?
First of all, there are a lot of possible failures that could happen. In this case, it was a fiber cut that somehow took everything out. Your mission, should you decide to accept it, is to identify all the possible failures. Warning: if you do a good job of brainstorming, there will be a LOT.
Next you want to triage those failure modes. Some of them will have a common root cause or a common possible fix. For others, you won’t really know what the fix is.
You also want to identify the impact of each failure. In Gatwick’s case, the failure of all of the sign boards throughout the airport, while extremely embarrassing and sure to generate a lot of ridicule on social media, is probably less critical than a failure of the gate management software, which would basically stop planes from landing because there would be no way to assign those planes to gates. A failure of the baggage automation system would stop them from loading and unloading bags, which represents a big problem.
Once you have done all that, you can decide which failures you are willing to live with and which ones are a problem.
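One way to keep track of the brainstorming, triage and impact steps is a simple failure register. What follows is only a minimal sketch in Python: the `FailureMode` class, the 1 to 5 impact scale, the root causes and the scores are all made up for illustration and are not an assessment of Gatwick’s real systems.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    root_cause: str  # lets you group failures that share a cause
    impact: int      # hypothetical scale: 1 (annoying) to 5 (operations stop)


# Illustrative entries only; causes and scores are assumptions.
REGISTER = [
    FailureMode("Flight information boards dark", "fibre cut", 2),
    FailureMode("Gate management software down", "server failure", 5),
    FailureMode("Baggage automation down", "server failure", 4),
]


def group_by_root_cause(modes):
    """Map each root cause to the names of the failures it produces."""
    groups = defaultdict(list)
    for mode in modes:
        groups[mode.root_cause].append(mode.name)
    return dict(groups)


def triage(modes, tolerable_impact=2):
    """Return (must_mitigate, can_live_with), each sorted worst first."""
    ranked = sorted(modes, key=lambda m: m.impact, reverse=True)
    must_mitigate = [m for m in ranked if m.impact > tolerable_impact]
    can_live_with = [m for m in ranked if m.impact <= tolerable_impact]
    return must_mitigate, can_live_with
```

Calling `triage(REGISTER)` splits the list into the failures you need a plan for and the ones you have decided to live with. The threshold is a business decision, not a technical one, so expect to argue about it.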
Then you can brainstorm ways to mitigate each failure. Apparently, in Gatwick’s case, rounding up a few whiteboards, felt-tip markers and walkie-talkies was considered acceptable.
After the beating they took today on social media, they may be reconsidering that decision.
In some cases you may want an automated disaster recovery solution; in other cases, a manual one may be acceptable; and in still others, living with the outage until it is fixed may be OK.
Time may also factor into the answer. For example, if the payroll system goes down but the next payroll isn’t for a week, it MAY not be a problem at all, but if payroll has to be produced today or tomorrow, it could be a big problem.
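The timing question can be written down as a rough rule of thumb. The sketch below is an assumption-laden simplification, not a standard formula: the function name, the three strategy labels and the hour estimates are all hypothetical, and a real decision would also weigh cost and risk.

```python
from enum import Enum


class Strategy(Enum):
    WAIT = "wait for the fix"
    MANUAL = "manual workaround"
    AUTOMATED = "automated failover"


def choose_strategy(hours_until_needed, expected_repair_hours, manual_setup_hours):
    """Pick the cheapest approach that still meets the deadline.

    All three inputs are estimates you supply yourself.
    """
    # The repair finishes before anyone needs the system: just wait.
    if expected_repair_hours <= hours_until_needed:
        return Strategy.WAIT
    # A manual workaround can be stood up in time.
    if manual_setup_hours <= hours_until_needed:
        return Strategy.MANUAL
    # Nothing manual is fast enough, so failover has to be automatic.
    return Strategy.AUTOMATED


# Payroll is due in a week and the repair should take a day.
print(choose_strategy(168, 24, 4).value)
# Payroll is due in 8 hours and the repair should take a day.
print(choose_strategy(8, 24, 4).value)
```

With these made-up numbers, the first case comes out as waiting for the fix and the second as a manual workaround, which matches the payroll example above.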
All of this will be part of your business continuity and disaster recovery program.
Once you have this disaster recovery and business continuity program written down, you need to create a team to run it, train them and test it. And test it. And test it. When I was a kid there was a big power failure in the northeast. There was a large teaching hospital in town that lost power, but, unfortunately, no one had trained people on how to start the generators. That meant that for several hours, until they found the only guy who knew how to start the generators, nurses were running heart-lung machines and other critical patient equipment by hand. They fixed that problem immediately after the blackout, so the next time it happened, all people saw was a blink of the lights. Test. Test. Test!
If this seems overwhelming, please contact us and we will be pleased to assist you.
Information for this post came from Sky News.