Students attempting to submit assignments to Schoology on Tuesday, Oct. 21st were faced with long wait times and error messages as cloud provider Amazon Web Services (AWS) experienced an outage. This 15-hour outage was caused by a single faulty line of code and disrupted service to roughly 50 million websites. AWS hosts a number of the internet’s most visited websites, from streaming platforms like Netflix and Twitch to major consumer brands like McDonald’s and Apple, and even government agencies like NASA and the Census Bureau. When AWS coughs, the entire internet catches a cold.
As the world becomes more and more digital, an online presence is all but required for any modern business. That’s where cloud providers step in. Cloud providers run the behind-the-scenes machinery that keeps a website online. Instead of buying and maintaining their own expensive hardware, companies can rent it from providers and let someone else deal with the heavy lifting, until someone drops the ball.
Some of our school’s digital resources depend on AWS, so when the outage hit, several key platforms, like PowerSchool’s Schoology and Khan Academy, went down. Teachers and students suddenly lost access to assignments, calendars, and important study materials.
With the internet dominated by a handful of cloud providers, a single hiccup can cascade into worldwide chaos. AWS alone hosts roughly 30% of the internet. This concentration means that one failure can leave millions of websites unresponsive.
When a handful of companies control the majority of the internet’s resources, the overall resilience of our systems is weakened. For many users, the immediate effects, like streaming services being down or grades being unavailable, are frustrating but not critical. However, when more important services like supply chain management and telemedicine go offline, the consequences can become far more dire.
Major supply chain management companies, such as Amazon and DHL, rely on AWS to coordinate deliveries and track inventory. When these systems go down, delays and shortages begin to drag down our economy. While the loss of supply chain management affects millions, other services, such as telemedicine, have a greater impact on smaller populations. Telemedicine is often the only readily available medical consultation for those in rural areas, and some of the largest telemedicine providers rely on AWS. When it goes down, those people lose access to medical help. Other systems, like emergency alert systems, local government operations, and remote learning programs, also experienced disruptions during the outage.
In response to Tuesday’s outage, AWS shut down the affected automation tools and is implementing safeguards to prevent recurring large-scale outages. However, the outage on Tuesday wasn’t an isolated event. In 2024, AWS experienced a 7-hour outage affecting one region in North Carolina. The year prior, in 2023, issues with AWS’s Lambda system led to service disruptions in Northern Virginia. These recurring outages emphasize the inherent vulnerabilities in centralized cloud infrastructure.
This outage serves as a warning sign and a reminder of the need for resilience in our digital infrastructure. As businesses become more reliant on cloud providers, it is crucial to consider the potential risks and implement strategies to mitigate them. Diversifying cloud providers, implementing robust fail-safe systems, and keeping some tools offline are all key steps in building a more resilient digital ecosystem.
