AWS went down hard, yet again - here's what happened

AWS suffered yet another major outage

AWS
(Image: © Shutterstock / tanuha2001)

Cloud computing service AWS has now recovered from a third major outage in as many weeks.

The latest AWS outage began around 4am PT/12pm GMT on December 22, with more than a thousand incident reports flagged on tracker site DownDetector.

As mentioned, thousands of complaints have landed on DownDetector, with users across the US, Europe and Asia all reporting AWS issues.

Downdetector services hit by AWS outage

(Image credit: Future / DownDetector)

The official AWS service status dashboard isn't showing any major issues as yet, but the site itself is very slow to load, possibly indicating something is going wrong.

The only issues currently displayed concern "AWS Internet Connectivity" in Northern California (the AWS US-WEST-1 region) and Oregon (US-WEST-2).

AWS says it is "investigating Internet connectivity issues to the US-WEST-1 Region."

Not exactly the "happiest place on Earth" at the moment, it seems....

It seems the issues are affecting both the US-WEST-1 and US-WEST-2 AWS regions - two major regions for the company, and home to a large number of customers.

AWS says it may have the issue in hand - the latest update on the AWS Status Dashboard notes:

"We have identified the root cause of the Internet connectivity to the US-WEST-1 Region and have taken steps to restore connectivity. We have seen some improvement to Internet connectivity in the last few minutes but continue to work towards full recovery."

Downdetector outage reports on AWS services

(Image credit: DownDetector)

Big update - AWS says the issue with the US-WEST-1 region in Northern California is now fixed!

"We have resolved the issue affecting Internet connectivity to the US-WEST-1 Region," the AWS status page reports. "Connectivity within the region was not affected by this event. The issue has been resolved and the service is operating normally."

And there you have it - the Oregon region is resolved too.

"We have resolved the issue affecting Internet connectivity to the US-WEST-2 Region," says AWS. "Connectivity within the region was not affected by this event. The issue has been resolved and the service is operating normally."

Well, that was a wild ride, wasn't it?

In case you're just joining us - two major AWS regions, US-WEST-1 and US-WEST-2, both suffered "internet connectivity" issues.

AWS says that the issues have now been fixed, so fingers crossed that's the end of the updates from us - thanks for reading TechRadar Pro!

With all systems now green, at least according to the AWS dashboard, AWS added a bit of context to the second major outage in as many weeks. The US-WEST-1 and US-WEST-2 regions were impacted by identical issues. We'll let them explain it:

"Between 7:14 AM PST and 7:59 AM PST, customers experienced elevated network packet loss that impacted connectivity to a subset of Internet destinations. Traffic within AWS Regions, between AWS Regions, and to other destinations on the Internet was not impacted. 

"The issue was caused by network congestion between parts of the AWS Backbone and a subset of Internet Service Providers, which was triggered by AWS traffic engineering, executed in response to congestion outside of our network. 

"This traffic engineering incorrectly moved more traffic than expected to parts of the AWS Backbone that affected connectivity to a subset of Internet destinations. The issue has been resolved, and we do not expect a recurrence."

It sounds like the trouble started with AWS traffic engineering, which reacted to congestion outside the AWS network but shifted more traffic than expected onto parts of the AWS Backbone. That congested the links to some Internet Service Providers and got in the way of connectivity to some of your favorite destinations.

By now, things should be working smoothly in most of your AWS-backed systems, but we've still seen a handful of reports on Twitter of intermittent, extended outages (Oculus VR Headset connectivity, anyone?). Maybe all will be fully resolved by the morning.

If you can believe it, AWS is down yet again. Judging by the Status Dashboard, the problem has to do with a single data center facility in the US-EAST-1 Region.

Here's the latest from Amazon:

"We continue to make progress in restoring power to the affected data center within the affected Availability Zone (USE1-AZ4) in the US-EAST-1 Region. We have now restored power to the majority of instances and networking devices within the affected data center and are starting to see some early signs of recovery."

In comparison to the previous two outages, the issue appears to be relatively minor.

"Customers experiencing connectivity or instance availability issues within the affected Availability Zone, should start to see some recovery as power is restored to the affected data center," writes Amazon.

Yup.

But even if this outage is comparatively minor, it's clearly affecting a number of major services - especially in the US. Customers are reporting issues with Slack, Hulu, the Epic Games Store and more.

Here's a snapshot of the DownDetector homepage:

Downdetector

(Image credit: Downdetector)

The volume of reports on DownDetector appears to be tailing off slightly from a peak around an hour ago, which is consistent with the messaging coming out of AWS.

In the meantime, we're in touch with AWS to see if we can find out anything more.

Word from Asana suggests its collaboration platform was also caught up in the outage, but only briefly.

"This incident has now been resolved, and all customers should once again be able to access Asana. Once again, our apologies for the inconvenience," wrote the firm, in a status post.

Bad news, GIF fans - image-sharing service Imgur is also down.

Here's a screen-capture of the Imgur homepage right now:

Imgur

(Image credit: Imgur)

The latest from the AWS Status Dashboard is that the issue has now been resolved, which means affected services should begin to come back online shortly.

"We have now restored power to all instances and network devices within the affected data center and are seeing recovery for the majority of EC2 instances and EBS volumes within the affected Availability Zone," writes AWS.

The company goes on to say that "all services are starting to see meaningful recovery".

Separately, we've had a contact at AWS confirm the problem has now been addressed and affected services are beginning to recover accordingly.

In a post to its own status page, Slack has confirmed that most features affected by the AWS outage are now fully functional once again. However, users are still encountering errors when uploading files to chats and channels.

Although Amazon has now restored power to the affected facility, the company says it is experiencing slower than usual recovery times as a result of network connectivity issues.

"We believe we understand why this is the case and are working on a resolution. Once resolved, we expect to see faster recovery."

It appears AWS is still struggling to remedy the connectivity issues diagnosed earlier, but the company predicts services will recover soon.