Category Archives: Business Continuity

How to Spend $100 Million Without Even Trying

UPDATE: The Sun, not always the most reliable information source, is saying the outage and its knock-on effects hit 300,000 passengers and may cost the airline $300+ million.  The CEO, Alex Cruz, allegedly said, when warned earlier about the new system installed last fall, that it was the staff's fault, not the system's, that things were not working as desired.  Cruz, trying to rein in the damage, told staff in an email to stop talking about what happened.  Others have said that the people at Tata did not have the skills to start up and run the backup system – certainly not the first time things have gotten bumpy when on-shore resources were replaced with much lower paid off-shore resources who have zero history in the care and feeding of that particular, very complex system.  Even if the folks at Tata were experienced at operating some complex computer system, no two systems are the same, and there is so much chewing gum and baling wire holding systems together in the airline industry that, without legacy knowledge of that particular system, likely no one could make it work right.

Of all of the weekends for an airline to have a computer systems meltdown, Memorial Day weekend is probably not the one that you would pick.

Unfortunately for British Airways, they didn’t get to “pick” when the event happened.

 

Early Saturday British Airways had a systems meltdown.  This really was a meltdown: the web site and mobile apps stopped working, passengers could not check in and employees could not manage flights, among other things.

Passengers at London's two largest airports – Heathrow and Gatwick – were not getting any information from the staff.  Likely this was because the systems the staff normally used to get information were not working.

Initially, BA cancelled all flights out of London until 6 PM on Saturday, but later extended that to all flights out of London for the entire day.

Estimates are that 1,000 flights were cancelled.

Given that this was a holiday weekend, likely every flight was full.  If you conservatively assume 100 passengers per flight, cancelling 1,000 flights affected 100,000 passengers.  With the flights already full, even if BA wanted to rebook people, there probably weren't available seats during the next couple of days.  That means that a lot of these passengers will have to cancel their trips.  And since the airline couldn't blame the weather or another natural disaster, it will likely have to refund passengers their money.  That doesn't mean giving people credit towards a future trip, but rather writing them a check.

In Britain, airlines are required to pay penalties of up to 600 Euros per passenger, depending on the length of the delay and the length of the flight.

In addition they are required to pay for food and drinks and pay for accommodations if the delay is overnight – and potentially multiple nights.
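To put some very rough numbers on that, here is a back-of-the-envelope sketch in Python.  The per-passenger amounts follow the EU261 distance tiers (250, 400 and 600 euros); the flight count and the 100-passengers-per-flight figure are the guesses above, and the split of flights by distance is purely an assumption for illustration – not BA data.

```python
# Back-of-the-envelope estimate of the passenger-compensation exposure.
# Per-passenger amounts follow the EU261 distance tiers; the flight count,
# load factor and distance split are assumptions, not BA figures.

CANCELLED_FLIGHTS = 1_000
PASSENGERS_PER_FLIGHT = 100   # conservative - holiday-weekend flights run full

# Assumed share of cancelled flights in each EU261 distance band.
distance_bands = {
    "short  (< 1,500 km)":     (0.50, 250),   # (share of flights, euros each)
    "medium (1,500-3,500 km)": (0.30, 400),
    "long   (> 3,500 km)":     (0.20, 600),
}

total_passengers = CANCELLED_FLIGHTS * PASSENGERS_PER_FLIGHT
compensation = sum(total_passengers * share * rate
                   for share, rate in distance_bands.values())

print(f"Affected passengers:        {total_passengers:,}")
print(f"EU261 compensation alone: ~ {compensation:,.0f} euros")
```

Even before refunds, hotels, meals and lost luggage, the compensation line alone lands in the tens of millions of euros, which is how you get to $100+ million without even trying.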

Of course there are IT people working around the clock trying to apply enough Band-Aids to get traffic moving again.

Estimates are, so far, that this could cost the airline $100 million or more.  Another estimate says close to $200 million.  Hopefully they have insurance for this, but carrying $200 million in business interruption insurance is unlikely and many BI policies have a waiting period – say 12 hours – before the policy kicks in.

But besides this being an interesting story – assuming you were not travelling in, out or through London this weekend – there is another side of the story.

First, one of the unions blamed BA’s decision to outsource IT to a firm in India (Tata).  BA said that was not the problem.  It is true that BA has been trying to reduce costs in order to compete with low cost carriers, so who knows.  In any case, when you outsource, you really do need to make sure that you understand the risks and that doesn’t matter whether the outsourcer is local or across the globe.  We may hear in the future what happened, but, due to lawsuits, we may only hear about what happened inside of a courtroom.

Apparently, the disaster recovery systems didn't come online after the failure as they should have.  Whether that was due to cost reduction and its associated secondary effects, we may never know.

More importantly, it is certainly clear that British Airways' disaster recovery and business continuity plans were not prepared for an event like this.

At one point the CEO of BA was forced to say, in the public media, that people should stay away from the airport.  Don't come.  Stay home.  From a branding standpoint, it doesn't get much worse than that.  Fly BA – please stay home.

As part of the disaster recovery plan, you need to consider contingencies.  In the case of an airline, that includes how you get bags back to your customers when you cancel flights.  Today, two days later, people are saying that they still don't have their luggage and they can't get BA to answer the phone.  BA is now saying that it could be "quite a while" before people get their luggage back, and if they don't, that is more cost for BA to cover.

One has to assume that the outcome of all of this will be a lot of lawsuits.

From a branding standpoint this has got to be pretty ugly.  You know that there has been a lot of social media chatter about the horror stories.  In one article that I read, a passenger on a trip from London to New York talked about all the money they were going to lose on things they had planned to do once they got to New York.  Whether BA is going to have to pay for all of that is unclear, but likely at least some of it.

You also have to assume that at least some passengers will book their next flight on “any airline, as long as it is not BA”.

To be fair to BA, there have been other, large, airline IT systems failures in the last year, but this one, it’s a biggie.   Likely these failures are, at least in part, due to the complex web of automation that the airlines have cobbled together after years of cost cutting and mergers.  Many of these systems are so old that the people who wrote them are long dead and the computer languages – notably COBOL – are considered dead languages.

The fact that there were no plans (at least none that worked) for how to deal with this – how to manage tens of thousands of tired, hungry, grumpy passengers – is an indication of work for them to do.

But bringing this home, what would happen to your company if the computers stopped working and it took you a couple of days to recover?  I know in retail, where all the cash registers are computerized and nothing has a price tag on it any more, businesses are forced to close the store.  We saw a bigger version of that at the Colorado Mills Mall in Golden earlier this month.  In that case, a number of businesses will likely fail, and people will lose their jobs and their livelihoods.

My suggestion is to get people together, think about likely and not so likely events and see how well prepared your company is to deal with each of them.  Food for thought.

Information for this post came from the Guardian (here and here), The Next Web and Reuters.


Cisco, Juniper Hardware Flaw May “Brick” Firewalls in 18-36 Months

First it was Cisco; now it is Juniper and apparently there are a number of other vendors who will be affected by this flaw.

While no one is saying who the vendor of the flawed hardware inside Cisco and Juniper products is, it is believed to be Intel's Atom C2000 chip.  Intel has acknowledged problems with that chip which seem to match the description Cisco and Juniper have given of the flaw in their hardware.  Stay tuned.

Cisco has set aside $125 million to pay for repairs for faulty equipment.

So what, exactly, is the problem?

Juniper and Cisco are saying that there is a flaw in a hardware clock component used in their switches, routers and security devices that may cause a device to crash and die starting at about 18 months of service.  The device is not rebootable and not recoverable.  It is, as we geeks like to say, "bricked".

Cisco says certain models of its series 4000 Integrated Service Routers, ASA security devices, Nexus 9000 switches and other devices are affected.

Juniper said that 13 models of switches, routers and other products are affected.

Juniper says it is not possible to fix the devices in the field.  They also said that they started using this component in January 2016, so the 18 month lifetime is rapidly approaching.  They say they are working with affected customers.

HP has announced that some of their products use the Intel C2000 and may be affected as well.   Expect more manufacturers to make announcements as they analyze their product lines.

For users, it seems that if your product is under warranty or a service contract as of November 16, 2016, Cisco will replace the device proactively.  Cisco says it expects limited failures at 18 months, but a more significant failure rate as devices reach the three-year mark.
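If you want to get ahead of this, the math is simple enough to script.  Here is a minimal sketch, assuming you track when each device went into service; the device names and dates are made up, and the 18-month and three-year thresholds are just the failure timeline described above.

```python
from datetime import date

# Flag devices containing the suspect clock component by time in service.
# The 18- and 36-month thresholds mirror the failure timeline described above;
# the inventory is made up for illustration.

inventory = [
    ("core-fw-01",    date(2016, 2, 15)),
    ("branch-rtr-07", date(2016, 11, 1)),
]

def months_in_service(installed, today=None):
    today = today or date.today()
    return (today.year - installed.year) * 12 + (today.month - installed.month)

for name, installed in inventory:
    age = months_in_service(installed)
    if age >= 36:
        risk = "HIGH - failure rate climbs sharply around three years"
    elif age >= 18:
        risk = "ELEVATED - inside the 18-month failure window"
    else:
        risk = "LOW - but schedule a proactive replacement anyway"
    print(f"{name}: {age} months in service -> {risk}")
```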

For customers that are not under warranty or a service contract, well ……… I think you may be on your own.

If you have products that use this component, you should work with your suppliers to understand the risk and figure out how to mitigate it.

 

Information for this post came from Network World and CIO.



The Day The Internet Died

Well, not exactly, but close.  And it was not due to pictures of Kim Kardashian.

Here is what happened.

When you type in the name of a website to visit, say Facebook.com, the Internet needs to translate that name into an address.  That address might look like 157.240.2.35.

The system that translates those names to numbers is called DNS, the Domain Name System.  DNS services are provided by many different companies, but, typically, any given web site uses one of these providers.  The big providers work hard to provide a robust and speedy service, because loading a single web page may require many DNS lookups.
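If you want to see that translation happen, you can ask your computer's resolver directly.  Here is a minimal Python sketch; the addresses it prints will vary depending on where and when you run it.

```python
import socket

# Ask the operating system's resolver (which in turn queries DNS) to translate
# a host name into the numeric addresses a browser actually connects to.
for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
    "facebook.com", 443, proto=socket.IPPROTO_TCP
):
    print(sockaddr[0])   # e.g. 157.240.2.35 - the exact answer varies by location
```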

One provider that a lot of big websites use is called Dyn (pronounced dine).  Today Dyn was attacked by hackers.  The attack technique is called a Distributed Denial of Service Attack or DDoS.  DDoS is a fancy term for drowning a web site in far more traffic than it can handle until it cannot perform the tasks that customers expect it to do.

In this case, customers included sites like Amazon, Paypal, Twitter, Spotify and many others.  These sites were not down, it was just that customers could not get to them.

The attacks started on the east coast, but spread to the west coast later.  Here is a map showing where the worst of the attack was.  In this picture from Downdetector.com, red is bad.

[Image: Downdetector outage map – red shows the hardest-hit areas]

There were multiple attacks, both yesterday and today.  The attackers would attack the site for a few hours, the attack would let up and then start over again.  For the moment, the attack seems to be over, but that doesn’t mean that it won’t start back up again tomorrow, Monday or in two weeks.

You may remember I wrote about the DDoS attacks against Brian Krebs' web site and the hosting provider OVH.  Those two attacks were massive – 600 gigabits per second in the Krebs attack and over 1 terabit per second in the OVH attack.  The attackers used zombie security cameras and DVRs and the Mirai attack software to launch these two attacks.

After these attacks, the attacker posted the Mirai software online for free and other attackers have downloaded it and modified it, but it still uses cameras and other Internet of Things devices that have the default factory passwords in place.

As of now, we don’t know how big this attack was, but we do know that at least part of it was based on the Mirai software.  And that it was large.  No, HUGE.

It is estimated that the network of compromised Internet of Things devices, just in the Mirai network, includes at least a half million devices.  Earlier reports said that the number of devices participating in this attack was only a fraction of that 500,000 – which means that the attack could get much bigger and badder.

The problem with "fixing" this problem is that it means one of two things: fixing the likely millions of compromised Internet of Things devices that are part of some attack network, or shutting those devices down – disconnecting them from the Internet.

The first option is almost impossible.  It would require a massive effort to find the owners of all these devices, contact them, remove the malware and install patches if required.  ISPs don’t want to do this because it would be very expensive and they don’t have the margin to do that.

The second option has potential legal problems – can the ISP disconnect those users?  Some people would say that the actions of the infected devices, intentional or not, likely violate the ISP's terms of service, so they could shut them down.  However, remember that for most users, if the camera is at their home or business, shutting down the camera would likely mean kicking everyone at the home or business off the Internet.  ISPs don't want to do that because it will tick off customers, who might leave.

Since there is no requirement for users to change the default password in order to get their cameras to work, many users don’t change them.  Vendors COULD force the users to create a unique strong password when they install their IoT devices, but users forget them and that causes tech support calls, the cost of which comes out of profit.
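For what it's worth, forcing a password change is not hard to build.  Here is a minimal sketch of what a first-boot setup step could look like – the default list and strength rules are made up for illustration, not taken from any real device.

```python
import getpass
import re

# Hypothetical first-boot setup step for an IoT device: refuse to finish
# provisioning until the factory default password has been replaced with
# something reasonably strong. The default list and the rules are illustrative.

FACTORY_DEFAULTS = {"admin", "password", "12345", "default"}

def is_acceptable(pw):
    return (len(pw) >= 12
            and pw.lower() not in FACTORY_DEFAULTS
            and bool(re.search(r"[A-Za-z]", pw))
            and bool(re.search(r"[0-9]", pw)))

def first_boot_setup():
    while True:
        pw = getpass.getpass("Choose a new administrator password: ")
        if is_acceptable(pw):
            break
        print("Too weak or a known factory default - try again.")
    # A real device would hash and store the password here and disable the
    # default telnet account entirely.
    print("Device provisioned with a unique administrator password.")

if __name__ == "__main__":
    first_boot_setup()
```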

As a result of all these unpalatable choices, the problem is likely to continue into the future for quite a while.

Next time, instead of Twitter going down, maybe they will attack the banking infrastructure or the power grid.  The good news is that most election systems are stuck way back in the stone age and they are more likely to suffer from hanging chads than hackers.

Until IoT manufacturers and owners decide to take security seriously – and I am not counting on that happening any time soon – these attacks will only get worse.

So, get ready for more attacks.

One thing to consider.  If your firm is attacked, how does that impact your business and do you have a plan to deal with it?

The thousands of web sites that were down yesterday and today were, for the most part, irrelevant collateral damage to the attacks.  Next time your site could be part of the collateral damage.  Are you ready?

Information for this post came from Motherboard and Wired.

 


Hackers Attack France’s TV5, Almost Destroying It

All 12 channels of France's TV5 Monde were taken off the air one night in April 2015.  The company had just launched a new channel that day, and staff were out celebrating when a flood of text messages told the director-general that all 12 stations had gone dark.

The attackers claimed to be from the Cyber Caliphate.  Since this occurred only a few months after the Charlie Hebdo attack, it certainly could have been a follow-on attack by Daesh (aka ISIS).

However, as investigations continued, another possible attacker appeared.

In this particular case, as we saw in the Sony attack, the Sands Casino attack, Saudi Aramco and others, the purpose was destruction, not theft of information.  They did a pretty good job of it.

What was not clear was why TV5 Monde was selected for this special treatment.  The attackers didn't indicate that TV5 Monde had done anything wrong.

The good news was that since they had just brought a new channel online that day, technicians were still at the company offices.  They were able to figure out what server was in charge of the attack and unplug it.

While unplugging this server stopped the attack, it didn't bring the TV feeds back on line.  The attackers' goal was destruction, not subtlety: they destroyed software and damaged equipment.

From 8:40PM that evening until 5:25 AM the next day, those 12 channels were dark.  At 5:25 AM they were able to get one channel back on the air.

The director-general of TV5 Monde said that had they not gotten those feeds back online, the satellite distribution customers, who account for most of the company's revenue, might have cancelled their contracts, putting the existence of the company in jeopardy.  The rest of the channels did not come back until later that day.

Much later French investigators linked the attack to the Russian hacker group APT28.

To this day, no one knows why TV5 Monde was targeted.

One theory is that it was a test run to see how much damage they could do to an organization and TV5 Monde just happened to be the crash test dummy.

The attackers had been inside TV5 Monde’s network for more than 90 days doing reconnaissance.

Once they had collected enough information, they were able to construct a bespoke (custom) attack to do as much damage as possible.

Certainly we have seen destructive attacks before, such as the ones mentioned above, but we also have seen more cyber-physical attacks such as the power blackout in Ukraine last year, the German steel mill which sustained millions of dollars of damage and the recent incursions into nuclear plants in the United States.

This company survived, even though they had to spend $5 million to repair things and incur additional costs of $3 million a year forever due to new security measures put in place.

The attack route, not surprisingly, was the Internet.  As more and more stuff gets connected – the remote-control TV cameras were controlled out of the Netherlands, for example – this kind of attack becomes much more of a known art.  As hackers conduct test runs, such as the attack on TV5 Monde is thought to have been, they become more confident of their ability to do damage going forward.

The real question is, as your company becomes more and more intertwined with the Internet, whether your organization is vulnerable to an attack – even if all you are is a distraction or collateral damage.  And if you are vulnerable, will you be able to recover and survive?  While the Sony attack was done as a revenge attack, we are seeing other attacks which are just targets of opportunity.

The good news is that TV5 Monde survived, but they were completely disconnected from the Internet for months.  Could your company survive for months without being connected to the Internet?  In their case, once they were reconnected, the conversation that many companies have – security versus convenience – became much clearer.  Now it was convenience versus survival, and survival won.  Every employee has had to permanently change the way they operate.  Forever!

Information for this post came from BBC.


Internet of Things – The New Hacker Attack Vector

Recently, Brian Krebs (KrebsOnSecurity.com) was hit with a massive denial of service attack.  The site went down – hard – and was down for days.  His Internet Service Provider kicked him off, permanently.  The attack threw over 600 gigabits per second of traffic at the site.  There are very few web sites that could withstand such an attack.

The week after that, there was another denial of service attack – this time against French web hosting provider OVH – that was over 1 terabit per second.  Apparently, OVH was able to deal with it, but these two attacks should be a warning to everyone.

These attacks were both executed using the Mirai botnet, which used hundreds of thousands to millions of Internet of Things devices to launch them.  The originator then released the source code because, he says, he wants to get out of the business.

While Mirai used to control around 380,000 devices every day, some ISPs have started to take action and the number is now down to about 300,000 a day.

There are a couple of reasons why the Internet of Things presents a new problem.

The first problem is patching.  When was the last time that you patched your refrigerator?  Or TV?  I thought so!  After 10 years of berating users, desktops and laptops are being patched regularly. Phones are being patched less regularly.  Internet of Things devices are patched almost never.

The second problem is numbers.  Depending on who you believe, there will be billions of new IoT devices brought online over the next few years.  These range from light bulbs to baby monitors to refrigerators.  Manufacturers are in such a hurry to get products to market, and since there is almost no liability for crappy security, they are not motivated to worry about it.

Brian Krebs, in a recent post, examined the Mirai malware and identified 68 usernames and passwords hardcoded into this “first generation” IoT malware.  For about 30 of them, he has tied the credentials to specific manufacturers.

This means that with a handful of hardcoded userids and passwords, Mirai was able to control at least hundreds of thousands of IoT devices.

How many IoT devices could a second- or third- generation version of that malware control?
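That observation cuts both ways: if a short list of factory defaults is enough to hijack hundreds of thousands of devices, it is also enough to audit your own.  Here is a minimal sketch, using a small illustrative subset of the published defaults and a made-up device inventory – and, obviously, only for devices you own.

```python
# Audit sketch: compare each device's configured credentials against a small
# illustrative subset of the published Mirai defaults. A real audit would use
# the full published list and your actual device inventory.

KNOWN_DEFAULTS = {
    ("admin", "admin"),
    ("root",  "root"),
    ("root",  "xc3511"),
    ("admin", "1234"),
}

devices = [
    {"host": "192.168.1.50", "user": "admin", "password": "admin"},
    {"host": "192.168.1.51", "user": "admin", "password": "k7#Long-random-pw"},
]

for d in devices:
    if (d["user"], d["password"]) in KNOWN_DEFAULTS:
        print(f"{d['host']}: still on a factory default - change it now")
    else:
        print(f"{d['host']}: not on the known-default list")
```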

The third problem is the magnitude of these attacks.  While DDoS attack prevention services like Cloudflare and Akamai have been able to handle attacks in the 500 gigabit per second range, if the growth of DDoS attacks continues and we are talking about multi-terabit attacks, how much bandwidth will these providers need to purchase to keep up with the DDoS arms race?  While the cost of bandwidth is coming down, the size of attacks may be going up faster.

Lastly, ISPs – the companies that provide the Internet connection to your home or office – are not stepping up to the plate quickly enough to stomp out these attacks.

The ISPs may become more motivated as soon as these rogue IoT devices that are sending out DDoS traffic force the ISPs to buy more bandwidth to keep their customers happy.

Of course, like Brian Krebs, if your company winds up being the target of one of these attacks, your ISP is likely to drop you like a hot potato.  And equally likely, they will not let you back on after the attack is over.

If being able to be connected to the Internet is important to your business – and it is for most companies – you should  have a disaster plan.

The good news is that if your servers are running out of a data center, that data center probably has a number of Internet Service Providers available and you should be able to buy services from a different provider in the same data center within a few days to a week.  Of course, your servers will be dark – down – offline – in the mean time.  Think about what that means to your business.

For your office, things are a lot more dicey.  Many office buildings only have a single service provider – often the local phone company.  Some also have cable TV providers in the building and some of those offer Internet services, but my experience says that switching to a new Internet provider in your office could take several weeks and that may be optimistic.

Having a good, tested, disaster recovery plan in place sounds like a really good idea just about now.

 

Information for this post came from PC World.

The Brian Krebs post can be found here.


Learning About Ransomware – The Hard Way

A small New England retailer learned about ransomware the hard way.  After an employee clicked on a link, that system was infected with Cryptowall.

The malware encrypted, among other files, the company’s accounting software.

The accounting software did not live on that user's computer; it lived on the network.  But because that user had access to the network drive, the malware was able to encrypt the accounting files.  This is a very common situation with ransomware: it will attempt to encrypt any files that it can get write access to.
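A quick way to gauge that blast radius for a single account is to test which mapped drives and shares that account can actually write to.  Here is a minimal sketch – the paths are placeholders for your own mapped drives, and on Windows os.access is only a coarse check, not a full permissions evaluation.

```python
import os

# Gauge what ransomware running as the current user could reach: test write
# access on each mapped or mounted location this account sees. Paths are
# placeholders; on Windows, os.access is a coarse check, not a full ACL test.

candidate_paths = [
    "H:\\",            # example mapped home drive (Windows)
    "S:\\Accounting",  # example mapped departmental share (Windows)
    "/mnt/shared",     # example mounted share (Linux/macOS)
]

for path in candidate_paths:
    if not os.path.exists(path):
        print(f"{path}: not present on this machine")
    elif os.access(path, os.W_OK):
        print(f"{path}: WRITABLE - ransomware under this account could encrypt it")
    else:
        print(f"{path}: read-only for this account")
```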

The attackers asked for $500 in bitcoin, which is pretty typical.  It is a number which is low enough that many people will decide it is easier to pay up than to deal with it.

The best protection against ransomware is good backups.  More than one copy, and not directly accessible from the system under attack; otherwise the ransomware could encrypt the backups too.

Unfortunately for this company, their backup software had not worked for over two years – and they did not know it.

Believe it or not, we see this a lot.  Either backups don’t work, they do not back up all of the critical data or they are out of date.  In many cases, no one has EVER tried to restore from the backup, so how they find out that the backups don’t work is when they try to restore from them.  If systems are backed up individually, then each and every backup needs to be tested.
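Testing does not have to be elaborate.  Here is a minimal spot-check sketch: restore one known file into a scratch location with whatever backup tool you use, then confirm it matches the live copy.  The paths are placeholders, and the comparison only means something if the file hasn't changed since the backup ran.

```python
import hashlib
from pathlib import Path

# Spot-check a backup: restore one known file into a scratch location using
# your backup tool, then confirm it matches the live copy byte for byte.
# Paths are placeholders; only meaningful if the live file hasn't changed
# since the backup ran.

LIVE_FILE     = Path("/data/accounting/ledger.db")
RESTORED_FILE = Path("/tmp/restore-test/ledger.db")   # produced by the restore

def sha256(path):
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if not RESTORED_FILE.exists():
    print("Restore produced nothing - the backup may not be working at all")
elif sha256(LIVE_FILE) == sha256(RESTORED_FILE):
    print("Spot check passed - restored copy matches the live file")
else:
    print("Restored copy differs from the live file - investigate the backup job")
```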

So in this case, the business owner paid the ransom.

Unfortunately, ransomware, like most software, has bugs in it, so when the attackers attempted to decrypt the files after the ransom was paid, the decryption did not work.

The hackers, concerned that their business model would fail if the victims paid the ransom and did not get their data back, even offered to try and decrypt the files – if the business owner sent the files to the hacker.  The owner declined.

At this point the business owner doesn't think he can trust his systems, but he doesn't want to spend $10,000 to rebuild them.

And all because an employee clicked on the wrong link.

Information for this post came from True Viral News.
