Is cloud infrastructure critical? Prepare now for supplier outages
Organizations of all types and sizes rely on cloud services. With this increased usage comes a dark side: a critical reliance on cloud applications and services that can harm business functions in the event of a cloud failure.
As more organizations move to the cloud, learn about notable provider failures and strategies that can help prevent downtime.
Examples of Cloud Provider Outages
Notable outages from four of the major cloud providers include the following:
- AWS failure. Three outages at AWS in November and December 2021 resulted in extended downtime for many well-known sites and services, including Slack and Epic Games. One outage lasted more than five hours. Amazon said automated systems caused “unexpected issues” that led to system downtime.
- Google outage. In February 2021, Google Assistant for home devices, including smart security technology and thermostats, stopped working due to a “limited experience” that was rolled out to a select group of users. In November 2021, Google Cloud Platform suffered a two-hour outage due to a network configuration error, causing downtime on sites including Home Depot, Snapchat, Spotify, and Etsy.
- Meta breakdown. Facebook, Instagram, Messenger, and WhatsApp were down for about six hours in October 2021. Facebook said routing configuration changes were to blame, and many believed that larger-scale changes to Facebook’s Border Gateway Protocol (BGP) configuration resulted in a series of failures.
- Microsoft outages. Azure experienced a six-hour outage in October 2021 that interrupted VM workload services and more. The outage was attributed to an outage condition encountered by VM queries for an artifact. Microsoft 365 has also experienced a number of outages over the past few years, including an Exchange Online outage in April 2021 that affected email delivery and a near-complete outage of all Microsoft 365 services, including Exchange, SharePoint, Teams and OneDrive, in September 2020.
Why do these cloud provider outages matter?
Government agencies have shown no indication that cloud providers will be designated as critical infrastructure any time soon, but the topic remains hotly debated as more organizations move their traditionally in-house apps, services and infrastructure to third-party cloud environments.
Marking cloud providers as critical infrastructure is probably unwarranted if it’s just a matter of losing access to email, collaboration services, or file shares for a relatively short period of time. However, the largest vendors now host IoT platforms, payment processing for global financial organizations, and patient data processing and application integration.
Take, for example, Azure Health Data Services, used by large organizations such as Humana, SAS and others to process patient and healthcare research data. Similarly, AWS is increasingly targeting the energy sector with products that include oil exploration and drilling models and oil production monitoring. The automotive industry can now leverage Google Cloud’s Connected Car Telemetry Platform to collect and coordinate data from self-driving vehicles and those with telemetry reports for speed, location, camera footage, etc.
How to Avoid Downtime Interruptions
The processing power of the cloud will continue to attract new technology models and use cases. Critical infrastructure industries will inevitably determine that the risk of using third-party cloud services is lower than the risk of building and maintaining in-house workloads and applications.
For now, organizations of all types need to double down on disaster recovery (DR) and business continuity planning. Some strategic considerations include the following:
- Instead of creating a replica of the cloud infrastructure in the same vendor’s environment, consider a backup cloud infrastructure from a second vendor. However, this model increases complexity and cost.
- Invest in backup products or DR-as-a-service providers that can replicate and store cloud workloads and application data externally to the main cloud services used.
- Push SaaS vendors to offer more flexible and accessible backup options through API integration, where possible.
- Perform an in-depth business impact analysis for all major cloud applications, especially SaaS, which is difficult to replicate, to align organizational risk tolerance with cloud usage.
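As a rough illustration of the last point, a business impact analysis can start as simple arithmetic: estimate what an outage of each application would cost per year and compare it with the loss the organization is willing to accept. The sketch below is only a starting point under stated assumptions. The application names, cost figures, outage hours and tolerance threshold are all hypothetical, and a real analysis would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class CloudApp:
    name: str
    hourly_downtime_cost: float   # assumed business cost per hour offline
    expected_outage_hours: float  # assumed provider outage hours per year
    has_external_backup: bool     # True if replicated outside the main provider


def expected_annual_loss(app: CloudApp) -> float:
    """Expected yearly loss: hourly downtime cost times expected outage hours."""
    return app.hourly_downtime_cost * app.expected_outage_hours


def apps_over_tolerance(apps, tolerance):
    """Return apps whose expected loss exceeds the tolerance and that have
    no external backup, ordered from highest expected loss to lowest."""
    flagged = [
        a for a in apps
        if expected_annual_loss(a) > tolerance and not a.has_external_backup
    ]
    return sorted(flagged, key=expected_annual_loss, reverse=True)


# Hypothetical portfolio; every figure here is illustrative only.
apps = [
    CloudApp("email", 500.0, 6.0, True),
    CloudApp("payments", 20000.0, 5.0, False),
    CloudApp("crm-saas", 3000.0, 8.0, False),
    CloudApp("wiki", 100.0, 10.0, False),
]

for app in apps_over_tolerance(apps, tolerance=10000.0):
    print(app.name, expected_annual_loss(app))
```

With these made-up inputs, the applications flagged for DR investment are the ones that are both costly to lose and not yet replicated outside the main provider, which is where a second vendor or a DR-as-a-service product is most likely to pay off.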