Tag Archives: disaster recovery

What if Your Payment Processor Shuts Down?

What would happen to your business if your credit card processor shut down?  If you do online bill pay, what would happen if it shut down?

Millions of people and businesses got to figure that one out this month when Paypal’s TIO Networks unit suddenly shut down.  TIO does payment processing, both for merchants and for consumers who use it to pay bills at kiosks in malls, at grocery stores and other locations.

Paypal paid over $230 million for the company earlier this year.

The shutdown followed the discovery of a data breach at TIO. Whether Paypal was aware of the breach at the time it bought the company is not clear.

In fact, all that is clear is that over a million and a half users had their information compromised.

On November 10th, Paypal decided to shut the unit down until it could fix the problems.

The impact of this shutdown varied from group to group.

If you use the bill pay service at the grocery store, you will likely just go to another location.  Unfortunately for TIO Networks, many of those customers won't come back.  While this may be annoying for consumers, the annoyance is likely manageable.

For merchants who use the vendor as their payment processing service, having the service suddenly shut down with no notice could be a big problem.

This is especially a problem for organizations that depend on credit cards such as retail or healthcare or many other consumer services.

We often talk about business continuity and disaster recovery plans, but if you operate a business and credit cards are important to you, then your plan needs to deal with how you would handle an outage of your credit card processing service.

In the case of TIO, after about a week they started bringing the service back online for a few people who were most dependent on it.

Things get a bit complicated here.  Most of the time merchant payment processors require businesses to sign a contract for some number of years.  Since the contract was written by lawyers who work for the credit card processor, it likely says that they aren’t responsible if they shut down for a week or two without notice.  It probably even says that they aren’t liable for your losses and you are still required to pay on your contract.

If you switch to a new processor, you may be paying on two contracts.  Now what do you do?

To make things more complicated, if your payment processor is integrated with other office systems or point of sale systems, switching to a new provider is even more difficult.

I don’t have a magic answer for you – unfortunately – but the problem is solvable.  It just requires some work.  Don’t wait until you have an outage – figure it out NOW!

This is why you need to have a written and tested business continuity and disaster recovery program.

Information for this post came from USAToday.


Do You Have a Disaster Recovery Plan for Your Front Door?

The Internet of Things never fails to amaze me.  And make us think outside of the box.

As the British publication The Register said, your smart lock may be knackered.  Google says that to knacker something means to damage it severely, and I think they are right.

Here is the story.

For AirBnB hosts, one security challenge is how to get keys to their one-night renters in a secure manner, and how to stop those renters from making a copy of the key and robbing the place later.

There is an answer.  AirBnB has actually partnered with a company that makes smart locks (hence the Internet of Things tie-in).  These smart locks have a keypad on the front, so you can set a code five minutes before your overnight guest arrives and tell them what it is; when they leave, you can change it.

Ignoring for the moment all the security holes in many of these smart locks, in concept it makes perfect sense.

So much sense that AirBnB recommends these $469 locks (and, maybe, gets a cut of the action;  I don’t know).

For AirBnB homeowners, this makes life easier.  The lock connects to WiFi, which allows the owner to reset the code remotely.

It also allows the manufacturer to download new firmware to the lock automatically (because, after all, patching your door – err, door lock – is not high on your priority list).

Again, in concept, I think this automatic patching is THE WAY TO GO.  People are, in general, horrible about patching software.  Whether we are talking about their computer or their phone, they just don't do it.  So when it comes to the Internet of Things – your dishwasher, refrigerator or front door – it is pretty unlikely that you are going to patch it with any regularity, so automatic patching is good.

EXCEPT … when the manufacturer screws it up.

In this case Lockstate, maker of this formerly smart and now knackered lock, sent the wrong firmware update to some of its locks.  The company claims it was only 500 locks, but it certainly makes a point when you are standing on the front step of a home that you rented for hundreds of dollars a night and you can't get in.

Apparently, they sent the firmware for their 7000i model lock to some of their 6000i model locks and, not surprisingly, it knackered the lock (I like that word).

Lockstate sent an email to the owners of these formerly smart locks and told them that they had two choices.

Option 1 was to take the back of the lock off (where I assume the smart part is) and send it back to the factory, where they would either replace it or put the right software in it, making it UNknackered.  This option, they say, would take 5-7 business days.

Option 2 was for the homeowner to ask Lockstate to send them a new lock and then, once it arrives, send back the old one.  These replacement locks will take 14-18 days to ship.

In the mean time, you get to camp out on your front doorstep, I guess.

For AirBnB home owners who may have new guests every night, this could be a problem.  Especially if the owner does not live in the same town in which the home is located.

Ultimately, the AirBnB home owners (and, apparently, they are the only ones affected, because this lock was made specifically for AirBnB) will deal with it, and in a week or three they will all be laughing about it.

Now to circle around to the title of the post.

As we integrate more so-called smart devices into our lives, we are going to have to create disaster recovery plans and business continuity plans for what happens when these smart devices are not so smart.

For example, let’s assume this was your house and not a rental.  The lock does have a physical key, but since you go in and out all the time using the buttons on the front (or maybe, with different locks, your smart phone), the key is in a junk drawer somewhere inside the house.  And you are standing on the front step.  What do you do?  What is your disaster recovery plan?  How do you get in and out of your house until you can get your lock repaired or replaced?

How long are you willing to be locked out of your house?

Of course, this is only a placeholder for the 20 billion smart Internet of Things devices that we, supposedly, will be using in the next few years.

What happens if they update the software in all of your smart light bulbs and they won’t turn on any more?  Or, maybe, they won’t turn off.  What if a hacker updates your light bulbs and each one of them starts calling 911 continuously (a variant of this actually happened already, so don’t call it far fetched)?

These are maybe simplistic things, but it can get more real.  Your smart car has millions of lines of software in it and it also can update itself.  The possibilities of what an errant or malicious update might do are endless.

Right now we don’t even know what these 20 billion smart devices that we are going to be using ARE, never mind how to deal with all of the potential failure modes.

I can see it now.  You buy your smart light bulb and you open the manual.  In it, in addition to the 40 safety warnings, is included, at no extra charge, a 20 page disaster recovery plan for dealing with all of the possible disasters that could happen to you and this light bulb.

The possibilities boggle the mind.

Let's assume that, in a few years, you might have a hundred smart devices in your home or apartment.  Along with, of course, a hundred disaster recovery plans.  OMG!

Unfortunately, since cost is the driver in IoT devices, the manufacturers will not put in manual controls to be used in case of emergency.  And, if current IoT security is any harbinger of the future, we know security will be terrible.

So here is one scenario.  A hacker or nation state actor decides to wreak havoc and hacks into some major vendor’s IoT devices and knackers them.  Maybe, all of the smart light bulbs in the country turn off. And won’t turn on.

OK everybody.  Where is your light bulb disaster recovery manual?  Have you practiced your light bulb disaster recovery plan?  Have you implemented your light bulb business continuity plan?

While I am doing this partly tongue in cheek, maybe it isn’t as far fetched as we would like to think.

As hundreds of AirBnB home owners discovered recently, it isn’t that far fetched.

By the way, Lockstate says that they have fixed 60 percent of the dead locks.  I guess the other 40 percent of the home owners are still standing on their front porch.

Information for this post came from The Register.


The Fallout From a Ransomware Attack

We have heard from two big-name firms that succumbed to the recent Petya/NotPetya ransomware attack, and they provide interesting insights into dealing with the attack.

First, a quick background.  A week ago the world was coming to grips with a new ransomware attack.  It was initially called Petya because it looked like a strain of the Petya ransomware, but was later dubbed NotPetya when it became clear that it was designed to look like Petya but really was not the same malware.

One major difference is that it appears that this malware was just designed to inflict as much pain as possible.  And it did.

While we have no idea of all the pain it inflicted, we do have a couple of very high profile pain points.

The first case study is DLA Piper.  DLA Piper is a global law firm with offices in 40 countries and over 4,000 lawyers.

However, last week, what employees saw on their screens was the attackers' ransom demand instead of their work.

When employees came to work in the London office, they were greeted with a sign in the lobby warning them not to turn on their computers.

Suffice it to say, this is not what attorneys in the firm needed when they had trials to attend to, motions to file and clients to talk to.

To further their embarrassment, DLA Piper had jumped on the WannaCry bandwagon, telling everyone how wonderful their cyber security practice was and that people should hire them.  Now they were on the other side of the problem.

In today’s world of social media, that sign in the lobby of DLA Piper’s London office went viral instantly and DLA Piper was not really ready to respond.  Their response said that client data was not hacked.  No one said that it was.

As of last Thursday, 3+ days into the attack, DLA Piper was not back online. Email was still out, for example.

If client documents were DESTROYED in the attack because they were sitting on staff workstations that were attacked, the firm would need to go back to clients, tell them that their data wasn't as safe as they might have thought, and ask them to please send another copy.

If there were court pleadings due, they would have to beg the mercy of the court – and their adversaries – and ask for extensions.  The court likely would grant them, but it certainly wouldn’t help their case.

The second very public case is the Danish mega-shipping company A.P. Moller-Maersk.

They also were taken out by the NotPetya malware but in their case they had two problems.

Number one was the computer systems that controlled their huge container ships were down, making it impossible to load or unload ships.

The second problem was that another division of the company runs many of the big ports around the world, and those port operations were down as well.  That means that even container ships of competing shipping companies could not unload at those ports.  Ports affected were located in the United States, India, Spain and The Netherlands.  The South Florida Container Terminal, for example, said that it could not deliver dry cargo and no containers would be received.  At the JNPT port near Mumbai, India, officials said that they did not know when the terminal would be running smoothly.

Well now we do have more information.  As of Monday (yesterday), Maersk said it had restored its major applications.  Maersk said on Friday that it expected client facing systems to return to normal by Monday and was resuming deliveries at its major ports.

You may ask why I am spilling so much virtual ink on this story (I already wrote about it once).  The answer is that if these mega companies were not prepared for a major outage, then smaller companies are likely not prepared either.

While we have not seen financial numbers from either of these firms as to the cost of recovering from these attacks, it is likely in the multiple millions of dollars, if not more, for each of them.

And, they were effectively out of business for a week or more.  Notice that Maersk said that major customer facing applications were back online after a week.  What about the rest of their application suite?

Since ransomware – or in this case destructoware since there was no way to reverse the encryption even if you paid the ransom – is a huge problem around the world, the likelihood of your firm being hit is much higher than anyone would like.

Now is the time to create your INCIDENT RESPONSE PLAN, your DISASTER RECOVERY PLAN and your BUSINESS CONTINUITY PLAN.

If you get hit with an attack and you don’t have these plans in place, trained and tested, it is not going to be a fun couple of weeks.  Assuming you are still in business.  When Sony got attacked it took them three months to get basic systems back online.  Sony had a plan – it just had not been updated in six years.

Will you be able to survive the effects of this kind of attack?

Information for this post came from Fortune, Reuters and another Reuters article.


How to Spend $100 Million Without Even Trying

UPDATE: The Sun, not always the most reliable information source, is saying the outage and its trickle-down effects hit 300,000 passengers and may cost the airline $300+ million.  The CEO, Alex Cruz, allegedly said, when warned earlier about the new system installed last fall, that it was the staff's fault, not the system's, that things were not working as desired.   Cruz, trying to rein in the damage, said in an email to staff to stop talking about what happened.  Others have said that the people at Tata did not have the skills to start up and run the backup system – certainly not the first time you wind up with a bumpy situation when you replace on-shore resources with much lower paid off-shore resources – resources who have zero history in the care and feeding of that particular very complex system.  Even if the folks at Tata were experienced at operating some complex computer system, no two systems are the same, and there is so much chewing gum and baling wire holding systems together in the airline industry that, without legacy knowledge of that particular system, likely no one could make it work right.

Of all of the weekends for an airline to have a computer systems meltdown, Memorial Day weekend is probably not the one that you would pick.

Unfortunately for British Airways, they didn’t get to “pick” when the event happened.

 

Early Saturday British Airways had a systems meltdown.  This really is a meltdown since the web site and mobile apps stopped working, passengers could not check in and employees could not manage flights, among other things.

Passengers at London’s two largest airports – Heathrow and Gatwick – were not getting any information from the staff.  Likely this was due to the fact that the systems that the staff normally used to get information were not working.

Initially, BA cancelled all flights out of London until 6 PM on Saturday, but later cancelled all flights out of London all day.

Estimates are that 1,000 flights were cancelled.

Given that this was a holiday weekend, likely every flight was full.  If you conservatively assume 100 passengers per flight, cancelling 1,000 flights affected 100,000 passengers.  Since the flights were all full, even if BA wanted to rebook people, there probably weren't available seats during the next couple of days.  That means that a lot of these passengers had to cancel their trips.  And given that the airline couldn't blame the weather or another natural disaster, it will likely have to refund passengers their money.  That doesn't mean giving people credit toward a future trip, but rather writing them a check.

In Britain, airlines are required to pay penalties of up to 600 Euros per passenger, depending on the length of the delay and the length of the flight.

In addition they are required to pay for food and drinks and pay for accommodations if the delay is overnight – and potentially multiple nights.
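To put rough numbers on that, here is a back-of-envelope sketch of the passenger-compensation exposure.  Every input is an assumption for illustration (the average payout, meal and hotel costs, and the share of passengers stranded overnight), not a figure from BA:

```python
# Back-of-envelope estimate of passenger-compensation exposure from a mass cancellation.
# All inputs are illustrative assumptions, not actual British Airways figures.

CANCELLED_FLIGHTS = 1_000      # estimate from above
PASSENGERS_PER_FLIGHT = 100    # conservative assumption from above
AVG_PENALTY_EUR = 400          # assumed average payout; the cap is 600 Euros per passenger
MEALS_EUR = 30                 # assumed food and drink cost per stranded passenger
HOTEL_EUR = 120                # assumed cost of one night's accommodation
HOTEL_SHARE = 0.5              # assume half the passengers need an overnight stay

passengers = CANCELLED_FLIGHTS * PASSENGERS_PER_FLIGHT
penalties = passengers * AVG_PENALTY_EUR
care = passengers * MEALS_EUR + passengers * HOTEL_SHARE * HOTEL_EUR

print(f"Affected passengers:        {passengers:,}")
print(f"Estimated penalties:        EUR {penalties:,.0f}")
print(f"Estimated meals and hotels: EUR {care:,.0f}")
print(f"Rough total:                EUR {penalties + care:,.0f}")  # roughly EUR 49 million
```

Even with conservative assumptions, the direct passenger costs alone land in the tens of millions before you add ticket refunds, baggage handling, lost future bookings and lawsuits, which is how you get to estimates in the $100 to $300 million range.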

Of course there are IT people working around the clock trying to apply enough Band-Aids to get traffic moving again.

Estimates are, so far, that this could cost the airline $100 million or more.  Another estimate says close to $200 million.  Hopefully they have insurance for this, but carrying $200 million in business interruption insurance is unlikely and many BI policies have a waiting period – say 12 hours – before the policy kicks in.

But besides this being an interesting story – assuming you were not travelling in, out or through London this weekend – there is another side of the story.

First, one of the unions blamed BA’s decision to outsource IT to a firm in India (Tata).  BA said that was not the problem.  It is true that BA has been trying to reduce costs in order to compete with low cost carriers, so who knows.  In any case, when you outsource, you really do need to make sure that you understand the risks and that doesn’t matter whether the outsourcer is local or across the globe.  We may hear in the future what happened, but, due to lawsuits, we may only hear about what happened inside of a courtroom.

Apparently, the disaster recovery systems didn't come online after the failure as they should have.  Whether that was due to cost reduction and its associated secondary effects, we may never know.

More importantly, it is certainly clear that British Airways' disaster recovery and business continuity plans were not prepared for an event like this.

At one point the CEO of BA was forced to say, in the public media, that people should stay away from the airport.  Don't come.  Stay home.  From a branding standpoint, it doesn't get much worse than that.  Fly BA – please stay home.

As part of the disaster recovery plan, you need to consider contingencies.  In the case of an airline, that includes how you get bags back to your customers when you cancel flights.  Today, two days later, people are saying that they still don't have their luggage and they can't get BA to answer its phones.  BA is now saying that it could be "quite a while" before people get their luggage back, and if they don't, that is more cost for BA to cover.

One has to assume that the outcome of all of this will be a lot of lawsuits.

From a branding standpoint this has got to be pretty ugly.  You know that there has been a lot of social media chatter on the horror stories.  In one article that I read, a passenger on a trip from London to New York talked about all the money they were going to lose on things they had planned to do once they got to New York.  Whether BA is going to have to pay for all of that is unclear, but likely at least some of it.

You also have to assume that at least some passengers will book their next flight on “any airline, as long as it is not BA”.

To be fair to BA, there have been other, large, airline IT systems failures in the last year, but this one, it’s a biggie.   Likely these failures are, at least in part, due to the complex web of automation that the airlines have cobbled together after years of cost cutting and mergers.  Many of these systems are so old that the people who wrote them are long dead and the computer languages – notably COBOL – are considered dead languages.

The fact that there were no plans (at least none that worked) for how to deal with this – how to manage tens of thousands of tired, hungry, grumpy passengers – is an indication of work for them to do.

But bringing this home: what would happen to your company if the computers stopped working and it took you a couple of days to recover?  I know that in retail, where all the cash registers are computerized and nothing has a price on it any more, businesses are forced to close the store.  We saw a bigger version of that at the Colorado Mills Mall in Golden earlier this month.  In that case, likely a number of businesses will fail and people will lose their jobs and their livelihoods.

My suggestion is to get people together, think about likely and not so likely events and see how well prepared your company is to deal with each of them.  Food for thought.

Information for this post came from The Guardian (two articles), The Next Web and Reuters.


The Cloud is not a Miracle – Do Your Homework

As more people and more businesses embrace the cloud, the opportunity for disaster goes up.

For example, we have seen companies move to the Amazon cloud and then be surprised when their web sites go dark.

There are no silver bullets when it comes to data center availability and the cloud is not one.

The cloud can both help you and hurt you; good design and architecture still “rules”.

Here is a recent example.

MJ Freeway makes marijuana grow and dispensary software that helps businesses comply with the law and manage their businesses.  They claim to have processed $5 billion in transactions for the MJ industry.

Their solution is cloud based, making it easy for businesses to use their software.  Until they have a problem.

MJ Freeway's cloud-based solution was hacked, blinding a thousand dispensaries, which were left unable to track sales or manage inventory.  For many of these stores, that means closing the doors until they can get the problem resolved.

But the attack was interesting.  All the data was encrypted, so the hacker could not use the data.  Using it, however, does not appear to have been the hacker's objective.  The attackers targeted the live production servers and the backup servers at the same time.

Because it took MJ Freeway several hours to discover the attack, the attackers had a head start and because they attacked the primary and backup sites, clients had an outage.

Some customers maintained their own personal, offline backups of their data.  Those customers were able to restore their data as soon as MJ Freeway had a stable web site.  While it was wonderful that these users did not lose any data, they were still down until their vendor could create a stable operating environment.

Users that depended on their cloud service provider to back up their data had a bigger problem.  Since the primary and backup web sites were attacked at the same time, no online copies of the data were usable.
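Below is a minimal sketch of the kind of independent, offline copy those first customers kept.  It assumes the cloud vendor exposes some sort of data export API; the endpoint, token and file format here are hypothetical placeholders, not MJ Freeway's actual interface.  The point is simply that a copy of your critical data lands somewhere that neither the provider nor an attacker who compromises the provider can reach:

```python
# Sketch: pull a daily export of your own data out of a cloud service and keep it
# on storage you control. The API endpoint, token and format are hypothetical placeholders.
import datetime
import pathlib
import urllib.request

EXPORT_URL = "https://api.example-cloud-vendor.com/v1/export"  # hypothetical export endpoint
API_TOKEN = "REPLACE_WITH_YOUR_TOKEN"
BACKUP_DIR = pathlib.Path("/mnt/offline-backup")  # e.g. a drive that is detached after the copy

def pull_daily_export() -> pathlib.Path:
    """Download today's export and write it to local, provider-independent storage."""
    request = urllib.request.Request(EXPORT_URL, headers={"Authorization": f"Bearer {API_TOKEN}"})
    with urllib.request.urlopen(request, timeout=300) as response:
        data = response.read()

    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    target = BACKUP_DIR / f"export-{datetime.date.today().isoformat()}.json"
    target.write_bytes(data)
    return target

if __name__ == "__main__":
    path = pull_daily_export()
    print(f"Wrote {path.stat().st_size:,} bytes to {path}")
    # After the copy, rotate the media or sync to write-once storage so a compromise
    # of the live environment cannot also destroy the backup.
```

Run something like this daily, and occasionally test a restore from it, and an attack that takes out your provider's primary and backup sites becomes an outage instead of a permanent data loss.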

The “seed to sale” data was, apparently, corrupted and may not ever be recoverable.  What that means to those dispensaries from a legal standpoint is not clear, but can’t be good.

If the hacker’s objective was to ruin these companies – to bankrupt them – to run them out of business – that may be a great way to do that.

If their objective is just to cause the dispensaries pain – including lost sales, lost customers forever (to competitors), lost business to MJ Freeway, fines for regulatory failures and a host of other costs – the hackers may well have succeeded.

However, this is a great lesson for all businesses – whether you are in a semi-legal business like marijuana or a totally mainstream business like retail or services – the cloud is a wonderful tool.  It is not, however, a silver bullet.

Cloud services go down.  They lose data.  Sometimes they go out of business unexpectedly.  Who is liable typically depends on the terms in the contract.  If the contract was written by the online service provider, you can count on the contract saying that the provider is not responsible for anything.

Plan for a disaster.  Plan for a cyber incident.  WHEN something unexpected happens (notice I said when and not if), you will be in a much better position to deal with it.

Two terms from the disaster recovery business should be in the lexicon of every business that uses cloud services (and every other business, too):

RTO – Recovery Time Objective – How long are you willing to be down?  If the answer is a day or a week, how you prepare for a disaster is different than if the answer is 5 minutes or an hour.

RPO – Recovery Point Objective – How much data are you willing to lose (or how far back in time are you willing to restart)?  If you can lose (and, I assume, recreate) a day's worth of data, it is easier and cheaper to build a disaster recovery plan than if the answer is 15 minutes.
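To make those two terms concrete, here is a small sketch that measures reality against them.  The targets, the backup timestamp and the restore time are made-up example values; in practice they would come from your backup system's logs and from an actual restore test:

```python
# Sketch: check the age of your newest backup and your tested restore time
# against RPO/RTO targets. All values below are illustrative examples.
from datetime import datetime, timedelta

RPO = timedelta(hours=1)   # maximum data loss you can tolerate
RTO = timedelta(hours=4)   # maximum downtime you can tolerate

last_backup_at = datetime(2017, 7, 3, 9, 15)   # from your backup system's logs
estimated_restore = timedelta(hours=6)         # measured during your last restore test
now = datetime(2017, 7, 3, 11, 0)

backup_age = now - last_backup_at
if backup_age > RPO:
    print(f"RPO MISS: newest backup is {backup_age} old, target is {RPO}")
else:
    print(f"RPO ok: newest backup is {backup_age} old")

if estimated_restore > RTO:
    print(f"RTO MISS: last tested restore took {estimated_restore}, target is {RTO}")
else:
    print(f"RTO ok: last tested restore took {estimated_restore}")
```

RTO and RPO only mean something if you regularly compare them against your real backup age and your real, tested restore time, which is exactly what a tested disaster recovery plan forces you to do.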

So everyone who signs up for a cloud solution, keep in mind that sometimes, where it is cloudy, it rains.  When it does, if you have an umbrella (aka a disaster recovery plan), then you are likely OK; if you don't have that disaster umbrella, you are going to get wet, possibly very wet.

As those dispensaries discovered, your profit can go up in smoke, and not in a good way.

Information for this post came from Network World.


Internet of Things – The New Hacker Attack Vector

Recently, Brian Krebs (KrebsOnSecurity.com) was hit with a massive denial of service attack.  The site went down – hard – and was down for days.  His DDoS protection provider dropped him, permanently.  The attack threw over 600 gigabits per second of traffic at the site.  There are very few web sites that could withstand such an attack.

The week after that, there was another denial of service attack – this time against French web hosting provider OVH – that was over 1 terabit per second.  Apparently, OVH was able to deal with it, but these two attacks should be a warning to everyone.

These attacks were both executed using the Mirai botnet.  Mirai used hundreds of thousands of compromised Internet of Things devices to launch these attacks.  The originator has since released the source code for the malware because, he says, he wants to get out of the business.

While Mirai used to control around 380,000 devices every day, some ISPs have started to take action and the number is now down to about 300,000 a day.

There are a couple of reasons why the Internet of Things presents a new problem.

The first problem is patching.  When was the last time that you patched your refrigerator?  Or TV?  I thought so!  After 10 years of berating users, desktops and laptops are being patched regularly. Phones are being patched less regularly.  Internet of Things devices are patched almost never.

The second problem is numbers.  Depending on who you believe, there will be billions of new IoT devices brought online over the next few years.  These range from light bulbs to baby monitors to refrigerators.  The manufacturers are in such a hurry to get products to market, and since there is almost no liability for crappy security, they are not motivated to worry about it.

Brian Krebs, in a recent post, examined the Mirai malware and identified 68 usernames and passwords hardcoded into this “first generation” IoT malware.  For about 30 of them, he has tied the credentials to specific manufacturers.

This means that with a handful of hardcoded userids and passwords, Mirai was able to control at least hundreds of thousands of IoT devices.

How many IoT devices could a second- or third-generation version of that malware control?
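Mirai found its victims by scanning the Internet for devices answering on Telnet and then trying that short list of factory credentials.  You can check for the first half of that equation on a network you own without attempting any logins.  Here is a minimal sketch; the subnet is an example, and you should only scan networks you are responsible for:

```python
# Sketch: find devices on your own network that still answer on Telnet (ports 23 and 2323),
# the service the original Mirai scanner abused. Scan only networks you are responsible for.
import socket

SUBNET = "192.168.1"       # example home/office subnet, adjust to your own
TELNET_PORTS = (23, 2323)  # ports the original Mirai scanner targeted

def telnet_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if the host accepts a TCP connection on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for last_octet in range(1, 255):
        host = f"{SUBNET}.{last_octet}"
        for port in TELNET_PORTS:
            if telnet_open(host, port):
                print(f"{host}:{port} answers on Telnet - change any default password "
                      f"or disable the service")
```

Any device that shows up should have its default password changed if the firmware allows it, or be blocked from reaching the Internet if it does not.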

The third problem is the magnitude of these attacks.  While DDoS attack prevention services like Cloudflare and Akamai have been able to handle attacks in the 500 gigabit per second range, if the growth of DDoS attacks continues and we are talking about multi-terabit attacks, how much bandwidth will these providers need to purchase to keep up with the DDoS arms race?  While the cost of bandwidth is coming down, the size of attacks may be going up faster.

Lastly, ISPs – the Internet providers that enable the Internet connection to your home or office are not stepping up to the plate quickly enough to stomp out these attacks.

The ISPs may become more motivated as soon as these rogue IoT devices that are sending out DDoS traffic force the ISPs to buy more bandwidth to keep their customers happy.

Of course, like Brian Krebs, if your company winds up being the target of one of these attacks, your ISP is likely to drop you like a hot potato.  And equally likely, they will not let you back on after the attack is over.

If being connected to the Internet is important to your business – and it is for most companies – you should have a disaster plan.

The good news is that if your servers are running out of a data center, that data center probably has a number of Internet Service Providers available and you should be able to buy services from a different provider in the same data center within a few days to a week.  Of course, your servers will be dark – down – offline – in the mean time.  Think about what that means to your business.

For your office, things are a lot more dicey.  Many office buildings only have a single service provider – often the local phone company.  Some also have cable TV providers in the building and some of those offer Internet services, but my experience says that switching to a new Internet provider in your office could take several weeks and that may be optimistic.

Having a good, tested, disaster recovery plan in place sounds like a really good idea just about now.

 

Information for this post came from PC World.

The Brian Krebs post can be read at KrebsOnSecurity.com.
