The S3 outage: Why a multi-cloud strategy makes sense for critical deployments

How do you ensure another S3 outage does not take down your app?

Last week was another rude shock for a lot of people who take public cloud infrastructure for granted. With Amazon’s popular cloud storage service, S3 (and a few other services), unavailable for several hours in one of the most heavily used AWS regions in the US, the internet erupted in disbelief, anger, and some mockery as well. Hidden in the barrage of ‘I-told-you-so’s in the aftermath of the incident was an indicator of how the cloud, or rather its adoption, has evolved.

Internet technologies have come a long way in a short span of time because each new technology stood on the shoulders of giants. The Facebook app on your phone does not have to worry about the particular vagaries of your internet connection; it simply assumes that a network connection is available and knows what to do if it isn’t. The application has abstracted out the internals of networking technologies and simply uses them. We have reached a similar stage with applications deployed on the cloud, i.e. completely commoditized or abstracted infrastructure.

Products should now be agnostic to the underlying infrastructure. An application should be multi-cloud by default. If you can’t just throw more infrastructure at the problem to ‘instantly’ handle twice the traffic your application normally sees, you are doing this wrong. If your provider’s highly available service is suddenly no longer available, your application should be resilient enough to continue running on redundant infrastructure or to handle the failure gracefully. It does not matter where that infrastructure is sourced from, whether it is yet another of your provider’s highly available data centres or a rival provider; your application should simply not have to care. Incidentally, the application design ‘maturity’ needed to achieve this is the same as that needed to solve the problems commonly plaguing developer teams striving for agility.
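The kind of resilience described above can be sketched in a few lines. The classes and provider names below are hypothetical stand-ins for real SDK clients (say, an S3 client and a rival provider’s client); the point is that the application talks to an abstract store, and failover between providers happens behind that interface:

```python
# A minimal sketch of provider-agnostic storage with failover.
# MemoryStore is a toy stand-in for a real cloud storage client.

class UnavailableError(Exception):
    """Raised when a storage provider cannot serve the request."""

class MemoryStore:
    """Toy provider: a dict-backed object store that can be taken offline."""
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self._objects = {}

    def put(self, key, value):
        if not self.available:
            raise UnavailableError(self.name)
        self._objects[key] = value

    def get(self, key):
        if not self.available:
            raise UnavailableError(self.name)
        return self._objects[key]

class FailoverStore:
    """Writes go to every reachable provider; reads fall back in order."""
    def __init__(self, providers):
        self.providers = providers

    def put(self, key, value):
        wrote = False
        for provider in self.providers:
            try:
                provider.put(key, value)
                wrote = True
            except UnavailableError:
                continue
        if not wrote:
            raise UnavailableError("all providers down")

    def get(self, key):
        for provider in self.providers:
            try:
                return provider.get(key)
            except UnavailableError:
                continue
        raise UnavailableError("all providers down")

primary = MemoryStore("primary")
secondary = MemoryStore("secondary")
store = FailoverStore([primary, secondary])

store.put("avatar.png", b"...bytes...")
primary.available = False  # simulate the outage
assert store.get("avatar.png") == b"...bytes..."  # served by the fallback
```

The application code above never names a provider; swapping or adding one is a change to the list passed to `FailoverStore`, not to the business logic.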

Well, if all this is so well known, why are modern applications still susceptible to outages at the infrastructure layer? Firstly, until fairly recently it was hard to design or engineer applications to be truly independent of infrastructure. In the absence of good tooling or technologies that could enforce this design, reliance on convention meant that this was one of the first long-term benefits to be sacrificed at the altar of speed. This is rapidly changing with the emergence of technologies like Docker and Kubernetes that are making a multi-cloud strategy viable; it is no longer that difficult to engineer applications that can be migrated across providers with little effort.
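To make that concrete, here is a minimal Kubernetes Deployment manifest (the image name and port are made-up placeholders). The same manifest applies unchanged whether the cluster runs on AWS, Google Cloud, Azure, or your own hardware; migrating providers means pointing `kubectl` at a different cluster, not rewriting the application:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2            # run two copies for redundancy
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example.com/web:1.0   # hypothetical container image
        ports:
        - containerPort: 8080
```

Because the manifest describes only the desired state of the application, none of the provider-specific details (machine types, zones, load balancer wiring) leak into it.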

Secondly, it’s hard for a fledgling software company to ignore economically priced managed services offered by infrastructure providers and instead try to build these services on its own. Why would anyone want to hire and manage database administrators (and infrastructure) when you could outsource this to a service like Amazon RDS for as little as $30 per month? The flip side to this convenience is deep coupling with the infrastructure provider (and it’s not just small organizations that are affected: Netflix, Slack, and Trello all reported varying degrees of service degradation last week). Fortunately for all concerned, such incidents have proved to be a shot in the arm for vendor-agnostic products and services. There are now compelling alternatives to most proprietary services offered by cloud providers.
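One cheap way to limit that coupling is to confine all provider-specific database details to a single connection URL in the environment, so that moving from RDS to a self-hosted Postgres (or a rival’s managed offering) is a config change rather than a code change. A small sketch, assuming a conventional `DATABASE_URL` variable; the URL below is a made-up example, and only the parsing is shown, not an actual connection:

```python
# Keep the database vendor swappable: all provider details live in one
# DATABASE_URL environment variable, parsed here with only the stdlib.
import os
from urllib.parse import urlparse

def connection_params(url=None):
    """Split a Postgres URL into the parameters a driver such as
    psycopg2 typically expects (host, port, dbname, user, password)."""
    url = url or os.environ["DATABASE_URL"]
    parsed = urlparse(url)
    return {
        "host": parsed.hostname,
        "port": parsed.port or 5432,   # default Postgres port
        "dbname": parsed.path.lstrip("/"),
        "user": parsed.username,
        "password": parsed.password,
    }

# Hypothetical URL: switching providers means changing this string only.
params = connection_params("postgres://app:s3cret@db.example.com:5432/appdb")
```

Since no code outside this function knows which vendor hosts the database, pointing the application at different infrastructure is a one-line deployment change.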

For example, most popular AWS services have open-source or vendor-neutral alternatives: Minio is an S3-compatible object store you can run anywhere; a self-hosted PostgreSQL or MySQL can replace RDS; RabbitMQ or Apache Kafka can stand in for SQS and Kinesis; and Apache Cassandra is a common substitute for DynamoDB.

In conclusion, cloud infrastructure adoption is bound to evolve in the direction of applications going ‘cloud-native’, with the underlying infrastructure mattering less and less in architecture patterns.

Keeping applications infrastructure-agnostic has been one of the core guiding principles of the Hasura platform, making it a fast and convenient way to build, deploy, and manage cloud-native applications on any infrastructure. (Update: We have written about a sample deployment that we migrated effortlessly from Microsoft Azure to Google Cloud, thanks to Kubernetes. Check it out here.)

Hasura is an open-source engine that gives you realtime GraphQL APIs on new or existing Postgres databases, with built-in support for stitching custom GraphQL APIs and triggering webhooks on database changes.