Jeff Barr of AWS just published a blog post about Amazon’s July 11 Prime Day and the AWS resources it used during that day.
Barr doesn’t spend much time on Prime Day itself, other than to note it’s a big day for Amazon. However, if you want to know more about Prime Day, here are some articles: SPS Commerce, Business Insider, and CNBC. My favorite stat: the 12 minutes it took one order to go from mouse click to customer delivery. I can’t even put on my shoes and get in the car in 12 minutes!
Barr spends most of the post discussing how AWS prepares for Prime Day and the AWS resources it consumes.
In the preparation section of the post, he notes that AWS conducts meticulous planning to analyze its readiness for the day. Each team goes through a questionnaire/checklist to ensure its services are up to scratch for the day.
After planning and preparation, AWS runs what it calls “Game Day”: a set of drills to ensure operational practices are ready to respond to Prime Day challenges such as resource failures and application outages. Similar to Netflix’s Chaos Monkey, AWS deliberately fails resources to see whether the system is resilient and can bring new resources online quickly.
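To make the idea of a failure-injection drill concrete, here is a toy model in Python. It is not AWS’s Game Day tooling or Netflix’s Chaos Monkey; `Fleet`, `reconcile`, and `chaos_drill` are hypothetical names invented for this sketch. The drill randomly terminates instances, lets a reconciler launch replacements, and then checks that the fleet is back at its desired capacity.

```python
import random


class Fleet:
    """Hypothetical fleet that tries to keep `desired` instances running."""

    def __init__(self, desired: int):
        self.desired = desired
        self.next_id = 0
        self.instances: set[int] = set()
        self.reconcile()  # bring the fleet up to desired capacity

    def launch(self) -> int:
        """Start one new instance and return its (made-up) ID."""
        instance_id = self.next_id
        self.next_id += 1
        self.instances.add(instance_id)
        return instance_id

    def reconcile(self) -> list[int]:
        """Launch replacements until the fleet is back at desired capacity."""
        launched = []
        while len(self.instances) < self.desired:
            launched.append(self.launch())
        return launched


def chaos_drill(fleet: Fleet, kills: int, rng: random.Random) -> bool:
    """Deliberately terminate up to `kills` random instances, then verify recovery."""
    for _ in range(kills):
        if not fleet.instances:
            break  # nothing left to kill
        victim = rng.choice(sorted(fleet.instances))
        fleet.instances.discard(victim)
    fleet.reconcile()
    return len(fleet.instances) == fleet.desired


if __name__ == "__main__":
    rng = random.Random(42)
    fleet = Fleet(desired=10)
    print(chaos_drill(fleet, kills=3, rng=rng))  # prints True
```

A real drill would also measure how long recovery takes and whether customer-facing requests kept succeeding in the meantime; this sketch only checks that capacity is restored.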
So far, so good. The planning and testing make sense and, presumably, came together well. One topic this section of the post doesn’t address is how much Amazon and AWS collaborated in preparing for Prime Day. Given how important Prime Day is, I wouldn’t expect Amazon to have treated the project as hands-off.
The really interesting part of the blog post, though, is Barr’s rundown of the resources Prime Day consumed.
It’s difficult to wrap one’s head around some of these numbers. 3.34 trillion DynamoDB requests? 419 billion API calls? 52 PB of EBS traffic? That is a gigantic amount of resource use. Simply astonishing.
The other astonishing thing? The effect Prime Day had on every other AWS customer: None.
During Prime Day, I saw not a single complaint from any AWS user about resources being unavailable, AWS running slowly, API calls backing up — nothing.
That something like Prime Day can occur without affecting any other AWS customer should give everyone an idea of the size of AWS’s resource pool. Clearly, AWS has huge amounts of computing capacity within its data centers, and, of course, that capacity is always growing. In his 2016 “state of the infrastructure” talk, James Hamilton mentioned in passing that several (i.e., at least three) of AWS’s availability zones have over 300,000 servers (!) in them.
And that tells us something critical about AWS and cloud computing. I would include Microsoft and Google in this assessment as well (in fact, I refer to the three as AMG to highlight their uniqueness in the cloud space).
Simply stated, AMG are the only entities spending enough to deal with today’s — and tomorrow’s — workloads.
Let me contrast those Prime Day stats with a recent experience of mine. I was evaluating a cloud provider (I won’t name the provider to avoid embarrassing it) — one that proclaims itself a significant presence in the market.
My testing went just fine. But because I had an account, I received the provider’s service notifications, and several of them described API capacity problems that might have caused slow responses or even timeouts.
This is 2017. API capacity problems? API capacity is a fundamental capability. How is this provider going to compete with a company that can swallow Prime Day without a hiccup?
Not only is (are?) AMG the big three, but over time the distance between AMG and the rest of the cloud service provider (CSP) industry is going to widen. Only these three have the deep pockets to keep up the punishing level of investment necessary to be a major player in this market.
You may recall the spate of “we’re going to invest $1 billion in our cloud offering” announcements a couple of years ago. Over time, it became clear that, while $1 billion is a lot of money in traditional vendor businesses, in the CSP business it isn’t even table stakes.
This is a market where the rich get richer. Prime Day is a huge undertaking, but it also points the way toward an entire range of business offerings that depend on, and indeed require, scale. Machine learning is an obvious example, but social, video, robotics, medicine, and autonomous vehicles are other examples of compute-intensive offerings.
This is all part of the shift our economy and society are undergoing from atoms to bits, aka digital transformation. AMG are smart enough, and rich enough, to recognize this trend and are aggressively preparing to be the dominant vendors in the new digital world.