Lessons from the recent AWS & Cloudflare Outages

Stefano Sordini · CEO | Server G33k

Nov 19, 2025 · Connectivity

Over the past few weeks, the internet experienced two major disruptions: first an AWS outage, and then yesterday (Tuesday, 18th November 2025) a global Cloudflare incident that took down or severely affected some of the world’s largest online platforms.

Social networks, streaming services, and even AI platforms were unreachable for hours. For most everyday users, it felt like the whole internet simply switched off.

From where I stand as the CEO of a hosting and infrastructure company, I want to share a different perspective. Not a blaming one – because outages, disruptions, and technical issues are a part of life for every provider, small or large. Rather, I want to highlight what these outages should teach all of us about how we build and rely on digital infrastructure today.

Outages Happen — Even to the Biggest Names

To me, the biggest surprise wasn’t that Cloudflare went down. It was how many massive companies went down with it.

Whether it’s AWS, Cloudflare, Google, Microsoft, or any other major provider, no company can guarantee 100% uptime forever. Systems are complex, configurations evolve, and automation can fail like any other human-written code.

As someone who has run a hosting provider for the past twenty years, I know this reality very well.

So I want to be very clear: the goal of this article is not to blame AWS or Cloudflare. Both are industry leaders. Both run incredibly sophisticated global infrastructures. Both responded quickly and transparently to their incidents.

What truly matters is not preventing every outage (that is not realistic), but how fast the provider acts, how it communicates, and how it ensures the same problem does not happen again.

The Real Problem: Relying on a Single Provider

When Cloudflare went down yesterday, a large portion of the internet collapsed with it. Exactly the same thing had happened a few weeks earlier, during the AWS outage.

Many of the world’s largest companies had no fallback. No alternative path. No secondary CDN. No load-balancing across providers.

This raises an important question that every CTO, CIO, and business owner should be asking: Where were the Disaster Recovery (DR) plans and Business Continuity Plans (BCP)?

We have been talking about DR and BCP strategies for decades. Yet the last two outages (AWS, Cloudflare) showed that even organizations with budgets in the tens or hundreds of millions still put all their eggs in one basket.

Relying on a single edge provider, single DNS provider, or single cloud platform is like building a skyscraper with only one elevator and hoping it never breaks!

Many companies today believe that if they use AWS or Cloudflare, they are automatically protected from failure. This is a naive and dangerous mindset.

It’s a fact that hyperscale providers offer massive reliability, capacity, redundancy, and security.
But they do not replace your own responsibility to build resilience around your application or service!

Let’s go back to basics and recall what true business continuity means:

Having a multi-provider DNS cluster
Using more than one CDN or edge security provider
Distributing workloads/servers across more than one cloud/provider
Ensuring monitoring systems are independent of your primary provider
Testing failover scenarios regularly
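
To make the list above a little more concrete, here is a minimal sketch of the failover idea: keep an ordered list of providers and route to the first one that passes a health check. Everything in it is an assumption for illustration. The endpoint URLs, the `pick_healthy_endpoint` function, and the simulated check are hypothetical and do not describe how any particular company or provider implements failover. In a real setup this decision usually lives in DNS or a load balancer rather than in application code.

```python
from typing import Callable, Optional, Sequence


def pick_healthy_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose health check passes, or None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that crashes counts as an unhealthy endpoint.
            continue
    return None


# Hypothetical endpoints on two different providers, in order of preference.
ENDPOINTS = [
    "https://cdn-primary.example.com",
    "https://cdn-secondary.example.net",
]


def simulated_check(endpoint: str) -> bool:
    # Simulates an outage of the primary provider (illustration only).
    return "primary" not in endpoint


chosen = pick_healthy_endpoint(ENDPOINTS, simulated_check)
print(chosen)
```

Passing the health check in as a function is deliberate: it lets you rehearse a failover scenario with a simulated outage, which is exactly the kind of regular testing the last item on the list calls for.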

These practices are not just for large enterprises. They are essential for any organization that relies on online platforms to operate, whether it’s a small e-commerce store, a fintech company, a forex broker, or a SaaS platform.

Such Outages Are More Than Technical Failures

When a global outage happens, the technical explanation is often the simplest part. The real impact is felt at the business level:

Support tickets explode
Customers panic
Payments stop
Users lose access
Deadlines slip
Trust is shaken

Even the biggest companies cannot afford hours of downtime. For smaller businesses, one major outage can cost months of hard-earned revenue, SEO ranking, or customer confidence.

Lessons from the AWS and Cloudflare Incidents

Here are the key lessons I believe every company should take from the recent AWS and Cloudflare outages:

No provider is perfect, so don’t plan as if they are! Your infrastructure and network architecture must assume your provider will eventually fail.
Whether you are fully in the cloud or on a hybrid setup, if your entire business goes down because one vendor/provider is offline, your strategy is obviously faulty and needs to be revisited.
Always place your monitoring devices outside your primary vendor. Otherwise, you won’t see the outage until your customers complain.
Communication matters! During the recent outages, both AWS and Cloudflare provided frequent updates through their status pages. The least you can do for your customers during an emergency is to be transparent and show that you are working on resolving the issue. Nobody likes being ghosted.
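
The monitoring lesson above can be sketched in a few lines. The idea is a small probe that runs somewhere other than your primary vendor and sends its alerts through a channel that does not depend on that vendor either. This is only a sketch under stated assumptions: the site URLs, `http_probe`, `check_sites`, and the simulated probe are hypothetical names, and the `alert` callback stands in for whatever out-of-band channel you actually use (SMS, a second email provider, and so on).

```python
import urllib.request
from typing import Callable, Iterable, List


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


def check_sites(
    urls: Iterable[str],
    probe: Callable[[str], bool],
    alert: Callable[[str], None],
) -> List[str]:
    """Probe every URL, alert on each failure, and return the failing URLs."""
    down = [url for url in urls if not probe(url)]
    for url in down:
        alert(f"{url} is unreachable from the external probe")
    return down


# Hypothetical sites to watch; replace with your own.
SITES = ["https://shop.example.com", "https://api.example.com"]


def simulated_probe(url: str) -> bool:
    # Pretend the API is down so the example runs without network access.
    return "api" not in url


alerts: List[str] = []
failed = check_sites(SITES, simulated_probe, alerts.append)
print(failed)
print(alerts)
```

In production you would pass `http_probe` instead of the simulated probe and run the script on a schedule from a machine hosted with a different provider than the sites it watches.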

Moving Forward as an Industry

As an infrastructure provider celebrating 20 years in this industry, we have seen the internet evolve from simple hosting to cloud, to multi-cloud, and now to highly distributed architectures.

The last three weeks reminded us that despite all the innovation, many organizations are still building on fragile foundations – or rather, designing infrastructures with the wrong mindset.

I encourage companies, large and small, to revisit their network architectures, re-evaluate their business continuity plans, and ensure they are not over-centralizing their critical paths.

This article is not a sales call. It’s a reminder of things all IT architects should already know.

It’s never too late, though.