The Candid Voice in Retail Technology: Objective Insights, Pragmatic Advice

The Perils Of Outsourcing Risk Management: Lessons From The CrowdStrike Outage

Midnight, Friday July 19th, 2024

An ordinary day was starting around the world. A minute later it was anything but ordinary. Businesses were coming to life, and their employees were logging on to their computers to start the day. Instead, they were met with “The Blue Screen Of Death”. It was the 90s again.

Millions of employees in countless companies couldn’t use their Windows desktops. They were brought to a standstill by what was soon revealed to be a faulty software release by CrowdStrike, a major cybersecurity firm.

What Happened

The update that crashed millions of Windows computers came from CrowdStrike, a company that promises to protect against hacking. Their error, which stemmed from a flaw in their testing system, had an impact beyond a hacker’s wildest dreams. Retailers such as Starbucks and Macy’s were affected at some locations and had to go back to a cash-only model. More broadly, the CrowdStrike outage hit whole industries:

  • Globally, major airlines grounded flights as airport operations were affected.
  • Banking systems fell over.
  • In the UK major supermarkets Tesco, Asda, Sainsbury’s, Waitrose, Morrisons, and Lidl were all hit.
  • The London Stock Exchange News Platform was out of action.
  • Huge amounts of money were lost: Upwards of $5 billion, experts say.

CrowdStrike put their hand up early, admitted responsibility for the accident, and swore that ‘lessons will be learned’, but their stock still closed 11% down on the day.

What remains a serious concern is that in today’s society, many third-party companies handle some portion of a business’s critical operations. Companies outsource specialized functions to a large degree, and that’s dangerous.

The Allure Of The Cloud

Cloud computing offers flexibility and cost savings: businesses can outsource not only the work but also the specialist practices that go with it. And when it comes to paying the tab, few models can beat the cloud’s pay-per-unit pricing.

However, “few” does not equate to “none,” and in an age where more and more of the business relies on cloud service providers (CSPs), threats await just over the horizon.

That is a serious concern because it is a matter of trust. Trust is earned by taking good care of customers in ways big and small. But what happens when that trust is breached?

When companies migrate their operations to the cloud, they also shift a portion of their risk management responsibilities to CSPs. This isn’t only about technology; it also touches on vital operational functions, as well as data security and compliance. The workloads moving to the cloud are increasingly sensitive. Companies like CrowdStrike promise that ‘next-generation’ security tools and relatively new approaches like 24/7 ‘active monitoring’ can keep the burgeoning cloud safe. But relying on that promise is exactly the risk that companies write into their operational policies when they shift to CSPs.

What We Can Learn

The CrowdStrike outage offers valuable insight into how more mature practices can improve cloud security and resilience. Although much of the story has yet to be told in public, the event underscores the need for sound due diligence when choosing providers and services.

The Cloud Security Alliance, or CSA, refers to cloud maturation in four areas:

  • Cloud Security Architecture;
  • Thorough adherence to industry standards;
  • Operational maturity, with updates and not just adherence to standard best practices;
  • Incident response, with the appearance of a sufficiently robust incident response plan as a sign of “good faith” to cloud customers.

Among the first lessons of the CrowdStrike outage is the value of doing one’s due diligence. Would you really entrust your brand’s reputation to some ‘third party’ and let them conduct your business-critical operations in the middle of your most important business cycle? Wouldn’t you want to do a deep dive into the guts of their operations? Or at least let some delegation from your business do it in your stead?

This is not just about “assessing superficial aspects of an operation’s appearance,” either; it’s about really understanding the inner workings of the provider’s operation.

The Role Of Service Level Agreements

Service level agreements (SLAs) set out the expectations and responsibilities held between enterprises and their cloud service providers (CSPs). The first thing to say is that SLAs are usually fixed. It is unlikely that you will be able to change the terms of the agreement, unlike an outsourcing contract.

Unlike most contracts, an SLA should not be signed and put into a drawer. Especially from a business perspective, it should be thought of as an online policy of sorts. The better SLAs have many of the same elements as an actual contract.

It is crucial to carry out regular audits and reviews of the CSP’s processes and performance, and the emerging issues that could be affecting them, to maintain the required standards.
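
As a rough illustration of what a regular review might check, the sketch below converts an availability target into a downtime budget and compares it with the downtime actually observed. The 99.9% target, the 30-day period, and the function names are assumptions made for the example, not terms taken from any real SLA.

```python
# Hypothetical SLA review helper: all names and figures are illustrative assumptions.

def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, implied by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def measured_availability(downtime_minutes: float, period_days: int = 30) -> float:
    """Availability percentage actually delivered over the period."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


def sla_met(sla_percent: float, downtime_minutes: float, period_days: int = 30) -> bool:
    """True when observed downtime stays within the budget."""
    return downtime_minutes <= allowed_downtime_minutes(sla_percent, period_days)


if __name__ == "__main__":
    # Assumed example: a 99.9% target and a two-hour outage in a 30-day month.
    print(round(allowed_downtime_minutes(99.9), 1), "minutes of downtime allowed")
    print(round(measured_availability(120), 3), "% availability delivered")
    print("SLA met:", sla_met(99.9, 120))
```

A check like this only tells you whether the provider kept to the letter of the agreement. It says nothing about what the outage cost the business, which is why the review also needs to look at the provider’s processes.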

A lot can and does change, and it usually seems to happen right after you complete your due diligence and sign the contract. Building in true resilience and redundancy is the only strategy shared with us that genuinely mitigates these risks.

Just as in the physical world, companies should be aware of single points of failure.

It is unlikely that you would allow one in your own infrastructure, but who is checking the contracts of your service providers? Are they themselves single points of failure?
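
One simple way to start answering that question is to list each business function alongside the providers able to deliver it, then flag any function that depends on a single provider. The sketch below does exactly that; the function names and vendor names are hypothetical placeholders, and a real review would also need to look at the providers’ own dependencies.

```python
# Hypothetical vendor map: function names and vendor names are illustrative only.
from collections import defaultdict


def single_points_of_failure(functions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each provider to the functions that would stop if that provider alone failed."""
    exposed = defaultdict(list)
    for function, providers in functions.items():
        unique_providers = set(providers)
        # A function with exactly one distinct provider has no fallback.
        if len(unique_providers) == 1:
            exposed[next(iter(unique_providers))].append(function)
    return dict(exposed)


if __name__ == "__main__":
    vendor_map = {
        "point_of_sale": ["VendorA"],
        "endpoint_security": ["VendorB"],
        "payments": ["VendorC", "VendorD"],
        "inventory": ["VendorA"],
    }
    for provider, affected in single_points_of_failure(vendor_map).items():
        print(provider, "is the only provider for:", ", ".join(affected))
```

In this made-up map, payments has a fallback while every other function rests on one vendor, which is the kind of exposure the questions above are meant to surface.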

… And Finally

The CrowdStrike outage reminds businesses that, frankly, they can never completely outsource their risk management. They can’t simply pay someone else, such as a cloud service provider (CSP) or managed service provider (MSP), to take it off their hands.

If a company is going to engage in cloud computing, it is solely responsible for its own risk management in relation to its cloud environment.

Editor’s Note:

Nick Goss is a Premier 100 IT Leader with more than 25 years of experience in B2B SaaS. He specializes in customer-centric digital transformations and driving operational excellence. He has run services supporting over 50 million users, and now works with SaaS companies to align teams and technologies to enhance service quality and reduce costs, while also identifying technical and operational risk. Nick holds a Mini-MBA from Henley Management College and has worked in Silicon Valley, the UK, and Europe. 

 

Newsletter Articles August 1, 2024
Authors
  • Guest Contributors: Nick Goss