The Disaster That Hit British Airways


A big IT disaster hit British Airways last weekend and caused misery for tens of thousands of the airline's customers. Here are some of the resources collected by the guy who runs the WServerNews newsletter. It's quite a long list of material to read:

To begin with, here are a few background news articles which may or may not be relevant to the happenings of last weekend…

British Airways to replace IT workers with Indian recruits flown in on temporary visas (This is Money, October 2015)

http://www.wservernews.com/go/rvpz937g/

Have you experienced IT job losses because of outsourcing? How has this affected your organization and your own job?

GMB takes concerns over British Airways IT outsourcing to MPs (Computer Weekly, January 2016)

http://www.wservernews.com/go/8vc0hki7/

Do you think that unionizing the IT profession can help prevent these kinds of disasters from happening?

Home Office ignores plight of BA techies as job offshoring looms (The Register, June 2016)

http://www.wservernews.com/go/mz5p7nzd/

It's interesting that Theresa May was Home Secretary at the time BA outsourced its IT operations. I wonder whether she would have been voted in as Prime Minister if this disaster had happened earlier, on her Home Office watch.

BA faces IT jobs protest over offshoring (Contractor UK, February 2016)

http://www.wservernews.com/go/wd3mlm4p/

BA defended its decision to outsource IT operations by saying it was a "very common practice." Is that really the case? I thought I read somewhere that IT outsourcing is on the decline, not on the upswing.

Now let’s look at the news as events unfolded last weekend…

British Airways cancels all flights from Gatwick and Heathrow due to IT failure (The Guardian)

http://www.wservernews.com/go/j63azckg/

British Airways faces huge compensation bill following IT crash as stranded passengers claim (The Mirror)

http://www.wservernews.com/go/hss7s6a5/

I wonder what kind of hoops customers in the UK have to jump through to obtain the compensation they're legally entitled to when this sort of event happens. Have any readers done this in the past?

BA’s ‘global IT system failure’ was due to ‘power surge’ (The Register)

http://www.wservernews.com/go/wa3pz7wm/

Five questions for BA over IT crash (BBC)

http://www.wservernews.com/go/akhg7wpp/

The BBC reports that BA says "The root cause was a power supply issue which affected our IT systems." It seems to me that the real cause of the disaster wasn't the power surge itself but inadequate systems and/or procedures for handling the possibility of such a surge, right?

British Airways could face £100m compensation bill over IT meltdown (The Guardian)

http://www.wservernews.com/go/kfp35sku/

The Guardian quotes James Walker, chief executive of the free flight-compensation claim site Resolver, as saying "This is not like an ash cloud or traffic controllers' strike that can't be predicted. The computer system breaking down is within its control." Do you think that's a fair statement, given the complexity of the IT systems needed to support the operation of a large airline like BA?

British Airways flights are facing chaos for days after computer meltdown leaves more than 100k stranded (Independent.IE)

http://www.wservernews.com/go/mebifaew/

This article quotes Captain Stephen Wearing, who has flown for BA for 29 years, as saying that the previous night was "the worst chaos I've ever seen". Do you think our overreliance on IT systems in the modern world is setting us up for even greater chaos?

British Airways boss ‘tries to gag staff’ over IT failure which hit 300,000 passengers after ‘inexperienced (The Sun)

http://www.wservernews.com/go/afjmdjmi/

I’m not sure how reliable The Sun is for news, but I think it highly likely that IT staff are being pressured by BA’s management to keep their mouths shut over all this.

Whistle-blower claims BA travel chaos was down to dodgy computer system – but ‘bosses refused to fix it’ (The Sun)

http://www.wservernews.com/go/38intnng/

Another article from The Sun, but since the quoted source is anonymous, I'm not sure we should trust it.

BA boss ‘won’t resign’ over flight chaos (BBC)

http://www.wservernews.com/go/akvpal9u/

He won't have to resign; he'll surely get booted out by shareholders pressuring BA's board of directors.

BA flights returning to normal after damaging IT collapse (Reuters via The Daily Star)

http://www.wservernews.com/go/lxke9cee/

Did outsourcing cause the British Airways IT meltdown? (TNW)

http://www.wservernews.com/go/dxk1bamd/

The 64 million dollar question.

Commentary: British Airways has no excuse for the chaos at Heathrow airport (The Financial Times via Channel NewsAsia)

http://www.wservernews.com/go/7vsonbi3/

Anatomy of a very British Airways IT cockup (Ars Technica UK)

http://www.wservernews.com/go/o8va73mm/

Ars Technica tries to go deep but rarely gets there IMO.

What went wrong in British Airways datacenter in May 2017? (UP2V)

http://www.wservernews.com/go/n7txcq9n/

This is a much better analysis than the Ars Technica piece, and it's worth reading from start to finish. Here's a part that grabbed our attention:

Scottish and Southern Electricity Networks (SSEN), which manages the electricity distribution network in the area north of Heathrow where British Airways' headquarters are located, said its services were running as normal on Saturday morning. "The power surge that BA are referring to could have taken place at the customer side of the meter. SSEN wouldn't have visibility of that," a spokesman said.

Also check out this part:

From the IT rumour mill. Allegedly, the staff at the Indian data centre were told to apply some security fixes to the computers in the data centre. The BA IT systems have two parallel systems to cope with updates. What was supposed to happen was that they apply the fixes to the computers of the secondary system, and when all is working, apply them to the computers of the primary system. In this way, the programs all keep running without any interruption. What they actually did was apply the patches to _all_ the computers. Then they shut down and restarted the entire data centre. Unfortunately, computers in these data centres are used to being up and running for lengthy periods of time. That means, when you restart them, components like memory chips and network cards fail. Compounding this, if you start all the systems at once, the power drain is immense and you may end up with not enough power going to the computers; this can also cause components to fail. It takes quite a long time to identify all the hardware that failed and replace it.
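The procedure the rumour says was *supposed* to happen, patching the secondary system first and touching the primary only once the secondary is confirmed working, is a standard rolling (staged) update. Here's a minimal Python sketch of that ordering under stated assumptions: a hypothetical two-cluster setup where `Server`, `Cluster`, `rolling_patch` and `boot_batch_size` are illustrative names, not anything from BA's real systems. It also restarts machines in small batches rather than all at once, since the same passage points out that powering everything up simultaneously causes a huge power drain.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical model of one machine; not based on any real BA system.
    name: str
    patched: bool = False
    running: bool = True


@dataclass
class Cluster:
    name: str
    servers: list = field(default_factory=list)

    def healthy(self) -> bool:
        # Stand-in health check: a real one would probe the applications.
        return all(s.running for s in self.servers)


def patch_cluster(cluster: Cluster, boot_batch_size: int) -> list:
    """Patch one cluster, taking down at most boot_batch_size servers at a time."""
    log = []
    for i in range(0, len(cluster.servers), boot_batch_size):
        batch = cluster.servers[i:i + boot_batch_size]
        for s in batch:          # stop and patch only this small batch
            s.running = False
            s.patched = True
        for s in batch:          # bring the batch back before touching the next one
            s.running = True
        log.append((cluster.name, [s.name for s in batch]))
    return log


def rolling_patch(primary: Cluster, secondary: Cluster, boot_batch_size: int = 2) -> list:
    """Patch the secondary first while the primary carries the load, then the primary."""
    if not primary.healthy():
        raise RuntimeError("primary must be serving before the secondary is patched")
    log = patch_cluster(secondary, boot_batch_size)
    if not secondary.healthy():
        # Stop here: the unpatched primary keeps everything running.
        raise RuntimeError("secondary unhealthy after patching; primary left untouched")
    log += patch_cluster(primary, boot_batch_size)
    return log


if __name__ == "__main__":
    primary = Cluster("primary", [Server(f"p{i}") for i in range(5)])
    secondary = Cluster("secondary", [Server(f"s{i}") for i in range(5)])
    for cluster_name, batch in rolling_patch(primary, secondary, boot_batch_size=2):
        print(cluster_name, batch)
```

The point of the ordering is that at every step one of the two systems is fully up, and the batch size caps how many machines are powering on at the same moment. Patching *all* machines and restarting the whole data centre in one go, as the rumour describes, gives up both protections.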
