Watch Those SLAs When You Move To The Cloud

Network World wrote about a company that experienced an outage with Microsoft Office 365 cloud email.  Users could not get to their email from Outlook or on their phones for 24 hours, and the outage affected users in the U.S. and overseas (see article).

The company filed a claim with Microsoft for breaching the SLA, but Microsoft said that since webmail was still working, the system was not down.  The fact that users could not access mail the way they normally do was apparently not important.

This is not news – vendors have often twisted reality to suit their financial needs.  But as more companies move more services to the cloud, you need to understand, as part of your enterprise risk assessment, what the impact of an outage is and what your recourse is.  If your cloud vendor goes down for two days and you lose your biggest customer, it is not much consolation that they will give you a 25% credit on your next bill.  You lost a customer that generates $100k a month and they give you a $2,000 credit.  Woop-ti-do.

The article goes into more detail and this should be part of your enterprise risk assessment, but here is a list of some things to consider when migrating to the cloud:

  1. Read everything the vendor sends – contract, attachments, addendums – everything.
  2. SLA breaches often have to be reported in order to get a credit and have to be reported quickly.
  3. An SLA of 99.9% uptime still allows for almost 9 hours of downtime a year (about 8.76 hours).
  4. Usually, each service has its own SLA, so if you have a virtual machine in the cloud and it also uses a cloud database, each of those could be down for almost 9 hours a year and still be within its SLA, even though each outage takes your users down.
  5. Sometimes you have to run virtual machines in more than one region or availability set before the SLA applies at all.  Two instances means twice the monthly cost, probably.
  6. Switching from one region to another might cause your application to fail – that is not covered by the SLA.
  7. Maybe the problem is your application or the network or some component that the vendor doesn’t cover under the SLA.  No refund in that case.
  8. The terms often change in real time.  Unless the contract says that the terms cannot change except in writing signed by both parties, you are standing on a floating dock.  Unless you are really big, good luck getting them to agree to that.
  9. Planned downtime often does not count against SLAs, so for example, when Verizon took their cloud down for a planned 48-hour outage, that probably didn’t count against the SLA.
  10. Finally, preview or beta versions usually are not covered by the SLA.  Of course, you should not be using them for production anyway.
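
To put numbers on items 3 and 4, here is a minimal Python sketch of the downtime arithmetic.  The function names are made up for illustration, and the combined figure assumes the two services fail independently and that your application needs both of them to be up.  Real SLAs are often measured per month rather than per year, so check your own contract before relying on figures like these.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(sla_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours a service can be down in the period and still meet its SLA."""
    return period_hours * (1 - sla_percent / 100)


def composite_availability(*sla_percents: float) -> float:
    """Availability (in percent) of an application that needs every listed
    service to be up, assuming the services fail independently."""
    availability = 1.0
    for pct in sla_percents:
        availability *= pct / 100
    return availability * 100


single = allowed_downtime_hours(99.9)
combined = composite_availability(99.9, 99.9)  # virtual machine plus database

print(f"One service at 99.9%: {single:.2f} hours of downtime a year")
print(f"VM + database: {combined:.4f}% available, "
      f"{allowed_downtime_hours(combined):.2f} hours of downtime a year")
```

Under those assumptions, one service at 99.9% can be down about 8.76 hours a year, and an application that depends on two such services can be down roughly 17.5 hours a year without either vendor SLA being breached.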

The moral of this story is that if your systems are important to your users and your customers, an enterprise risk assessment should be conducted every year or maybe more often.