There is a story that has been going around about a physicist, a chemist, and an economist who were stranded on a desert island with no implements and a can of food. The physicist and the chemist each devised an ingenious mechanism for getting the can open; the economist merely said, “Assume we have a can opener”!
This has come to be known in the canon as “assume a can opener” — a catchphrase used to mock economists and other theorists who base their conclusions on impractical or unlikely assumptions.
Another place impractical assumptions are used to justify unwarranted conclusions is the field of cloud computing — specifically, the widely proclaimed notion that, while public cloud computing is all well and good, it’s really inferior to what can be accomplished with a company’s own infrastructure.
The only problem is, once one digs into the proclamation, it’s invariably based on unspoken and specious assumptions — and once one examines the assumptions, the entire notion crumbles, shown up for what it is: reasoning designed to rationalize a technology choice that restricts future flexibility and business opportunity.
Here are the five most common private infrastructure “can opener” assumptions:
I’ve heard countless people maintain that they can implement a private cloud that is cheaper than a public alternative. Many claim they have case studies and TCO evaluations that prove their point.
I happen to think the issue of cloud economics is incredibly important. In fact, for all the talk about how public cloud computing provides agility, I believe its real benefit is economic: agility at a reasonable price.
Agility was always within reach of IT organizations; it just would have taken an army of sys admins and a ton of pre-provisioned infrastructure, an approach so expensive that no one ever seriously considered it. Public cloud computing turned previously unaffordable agility into something available to everyone.
So if someone can run a private cloud cheaper than AWS, obviously it’s a more desirable option.
The only problem is this: whenever I dig into the business cases that “prove” private cost advantage, inevitably they are based on a couple of racks of hyperconverged servers on which the private cloud is installed, while everything required to make those servers work is left out of the economic analysis.
This approach assumes that everything else required to make those servers work, from data center space, power, and cooling to network capacity and operations staff, is available at no charge.
And that doesn’t even begin to address required capital, the cost of capital, and the opportunity cost of deploying the capital to a private cloud and not, say, to geographic expansion, increased marketing, or product/service innovation.
Any economic analysis that leaves out a significant part of the solution’s cost structure is hopelessly flawed. Of course, this shouldn’t be surprising, as IT economics are typically horrendously managed, with costs commonly assigned to different groups with no shared pooling to allow a “single view of truth.”
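To make the omission pattern concrete, here is a toy back-of-the-envelope comparison. Every figure is hypothetical, invented purely for illustration; the point is the shape of the analysis, not the numbers.

```python
# Toy TCO sketch (all figures hypothetical) showing how a "private cloud is
# cheaper" case can flip once commonly omitted costs are added back in.

HARDWARE = 250_000              # two racks of hyperconverged servers
YEARS = 3                       # planning horizon
COST_OF_CAPITAL = 0.08          # opportunity cost of tying up the cash

# Annual items the naive analysis typically assumes are free (hypothetical)
OMITTED_ANNUAL = {
    "datacenter_space_power_cooling": 30_000,
    "network_and_bandwidth": 15_000,
    "ops_staff_allocation": 120_000,
    "software_licenses_and_support": 25_000,
}

PUBLIC_CLOUD_ANNUAL = 200_000   # hypothetical equivalent public cloud spend

def naive_private_tco():
    """The analysis that 'proves' private is cheaper: hardware only."""
    return HARDWARE

def fuller_private_tco():
    """Hardware, grossed up for cost of capital, plus the omitted items."""
    capital = HARDWARE * (1 + COST_OF_CAPITAL) ** YEARS
    return capital + sum(OMITTED_ANNUAL.values()) * YEARS

public_tco = PUBLIC_CLOUD_ANNUAL * YEARS

print(f"naive private:  ${naive_private_tco():>9,.0f}")   # looks like a win
print(f"fuller private: ${fuller_private_tco():>9,.0f}")  # no longer does
print(f"public cloud:   ${public_tco:>9,.0f}")
```

With these invented inputs, the hardware-only number beats public cloud handily, while the fuller number is well above it. The specific dollar amounts don’t matter; what matters is that the conclusion reverses depending on whether the “free” items are counted.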
Five years ago, Zynga was the poster child for private cloud advocates. While it got started using AWS, it eventually decided that it could run its own cloud more cheaply. I can’t tell you how many presentations I sat through from legacy vendors who cited Zynga as proof of why enterprises should build their own clouds, using, of course, technology from the legacy vendor.
Today, Zynga no longer serves that role (it kind of fell off the private cloud pedestal). Instead, the shining example of a company that built its own private cloud and saved a ton of money is Dropbox.
In its recent IPO filing, Dropbox stated that it had saved $75 million by building its own storage infrastructure. This led to a frenzy of articles about how companies could save money with DIY infrastructure.
A huge sigh of relief. A new champion of running one’s own infrastructure has come to the fore, assuaging the anxiety of companies questioning their ongoing commitment to on-prem tin.
The only problem with this scenario? All those quick to praise Dropbox’s decision failed to drill down into what pulling it off required. A Wired article covering Dropbox’s migration noted that the engineer hired to run the project had turned down a job on Google’s Spanner database team. I used to run part of a database company’s engineering team, and I am in awe of Spanner. It’s amazing technology, and a wonder in its own right. And someone of that caliber built Dropbox’s infrastructure.
Running a cloud environment is not trivial. Creating a control plane that can operate services and keep them robust in the face of infrastructure failure is incredibly difficult. Managing infrastructure redundancy to ensure availability requires world-class operations — not to mention effective cost tracking to ensure efficiency (see assumption one, above). Deploying and updating individual services like managed databases and reliable queues means the organization needs sophisticated software chops.
In other words, world-class talent. If you can’t effectively recruit that kind of talent, afford to pay it, and motivate it with difficult engineering challenges, you aren’t going to attract it.
Another reason put forward as to why IT organizations should run their own infrastructure is that “most applications don’t need cloud capabilities.”
This usually takes the form of acknowledging that cloud computing offers great scalability, supports elasticity, and allows immediate access to resources, but goes on to state that, really, most applications don’t fit that profile.
Just this week, Michael Dell offered the perfect illustration of this attitude, when he said that for “predictable workloads – which are for most companies 85 percent to 90 percent of their workloads – an on-premises solution is much more cost effective.”
In other words, the world of IT consists of two types of applications: a minority of unruly ones that experience erratic workloads and thereby need that public cloud stuff, and a majority of well-behaved ones that chug along with consistent operation and can be run more cheaply on-prem (and yes, the article goes on to cite Dropbox as proof of the thesis).
I’ve heard this argument time and again. I will make one observation: I’ve never seen the putative perfect application in the real world.
Instead, I’ve seen applications that consume far more resources than predicted. I’ve seen operations teams struggle to find spare capacity for QA and staging. I’ve seen application downtime because a server breaks and it takes three days to get a part out to the location.
Simply put: the perfect application doesn’t exist. We live in a messy world littered with unpredictability, failure, and limited budget. Rather than wishing for a perfect world where nothing ever goes wrong and then treating inevitable problems as out-of-band disruptions, a much better approach is to use an operational environment resilient to these problems, i.e., a public cloud.
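A toy simulation makes the point. Every number below is invented for illustration: a workload forecast as “predictable” with a peak of 100 capacity units, which, like every real workload, misbehaves.

```python
import random

# Toy illustration (all numbers hypothetical): a "predictable" workload with
# a forecast peak of 100 units meets real-world spikes, so on-prem gear must
# be sized to the actual peak and paid for around the clock.

random.seed(42)

HOURS = 24 * 365
FORECAST_PEAK = 100                      # what the capacity plan assumed

# Baseline demand hovers around 60 units with normal jitter...
hourly = [max(0.0, random.gauss(60, 30)) for _ in range(HOURS)]
# ...plus a few unplanned spikes (launch day, retry storm, viral traffic)...
for h in random.sample(range(HOURS), 20):
    hourly[h] = random.uniform(150, 400)
# ...including one worst-case hour.
hourly[0] = 400.0

ON_PREM_RATE = 0.04   # $/unit-hour: cheaper, but paid 24/7 for peak capacity
CLOUD_RATE = 0.10     # $/unit-hour: pricier, but paid only for actual usage

actual_peak = max(hourly)
on_prem_cost = actual_peak * ON_PREM_RATE * HOURS   # sized to the real peak
cloud_cost = sum(hourly) * CLOUD_RATE

print(f"forecast peak: {FORECAST_PEAK}, actual peak: {actual_peak:.0f}")
print(f"on-prem, peak-sized: ${on_prem_cost:,.0f}")
print(f"pay-per-use cloud:   ${cloud_cost:,.0f}")
```

The hypothetical on-prem rate is less than half the cloud rate per unit-hour, yet the on-prem bill comes out far higher, because capacity has to be bought for the peak that actually happens, not the one that was forecast. Undersize it instead, and the spikes become outages.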
There’s an even worse aspect to this “perfect application” worldview: it inevitably buttresses a continuation of traditional application design and operational approaches, i.e., monolithic applications with poor code management and integration and extended deployment periods, accompanied by frequent manual rollbacks and finger-pointing.
This forces groups that want to build modern, digital apps to create their own application pipeline and development processes — so you end up with two different sets of IT practices. If this sounds to you like Gartner’s bi-modal IT model, you’d be right.
The very same people who proclaim that most applications don’t need that cloud thing express horror at a bifurcated IT world, but fail to recognize this world is a direct outgrowth of maintaining traditional application assumptions and practices.
This “can opener” is the logical complement to the last assumption. Many people talk about applications as if all they contain is homegrown code.
While self-developed code is critical to the business value of an application, in most modern applications homegrown software makes up a small percentage of all the code in an application. The rest is components, utilities, or integrated services.
Cloud providers often offer this kind of supporting functionality as services. Examples include managed databases (like Spanner), push notifications, message queues, key management, and the like. And, of course, these services are integrated with the rest of the cloud’s offerings, reducing “glue” work.
IT organizations that want to run an application on their own infrastructure take on responsibility for the supporting pieces as well. Obviously that’s a lot of work, and it can be a much larger burden than it might seem at first glance.
Many of the pieces are open source, requiring installation, configuration, patching, and integration with other application components. This forces the IT organization to devote expensive talent to low-value engineering. In effect, the organization needs to act as its own distro creator for every piece of open source in the application stack, and as its own system integrator to wire them all together.
The typical application of today is assembled by linking together a ton of code components, with business logic inserted where it adds value. The question confronting IT organizations is how much responsibility to take on for the supporting componentry. The larger question is how best to apply limited talent (see assumption two, above) to applications and allow it to focus on the most critical parts of an application — the business logic. Everything else is plumbing, and one must question how much value there is in applying scarce talent to plumbing.
Underlying all of these assumptions is a vision of technology as a world in stasis: write some code for an application that doesn’t change much, runs in a satisfyingly even load pattern, doesn’t require many other components or much integration, leverages Olympian talent, and operates in a world where costs are never fully allocated.
Unfortunately, the reality is that technology is undergoing an explosion of change. I’m a huge fan of Ray Kurzweil, whose theory of accelerating returns posits we are in the early stages of exponential change, with Moore’s Law levels of improvement bleeding into every aspect of our world.
The exemplar of this is machine learning — applying analytical techniques to large bodies of data to winkle out patterns and provide predictions. Machine learning gets better as more data is available for analysis. And in the future, combining data sets to increase predictive power will improve machine learning even further.
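The data effect is easy to demonstrate even with a toy model. The sketch below is plain Python, not a real machine learning system: it estimates the slope of a hidden linear relationship by ordinary least squares and shows the average estimation error shrinking as the sample grows.

```python
import random

# Toy illustration of predictions improving with more data: fit the slope of
# a hidden relationship y = 2x + 5 (plus noise) from small vs. large samples.

random.seed(0)

TRUE_SLOPE, TRUE_INTERCEPT, NOISE = 2.0, 5.0, 3.0

def make_data(n):
    """Generate n noisy observations of the hidden relationship."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + random.gauss(0, NOISE) for x in xs]
    return xs, ys

def fit_slope(xs, ys):
    """Closed-form ordinary least-squares slope estimate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def mean_abs_error(n, trials=50):
    """Average slope-estimate error across repeated trials at sample size n."""
    return sum(abs(fit_slope(*make_data(n)) - TRUE_SLOPE)
               for _ in range(trials)) / trials

small, large = mean_abs_error(20), mean_abs_error(2000)
print(f"avg slope error with   20 points: {small:.3f}")
print(f"avg slope error with 2000 points: {large:.3f}")
```

A hundred-fold increase in data shrinks the average error by roughly an order of magnitude here. Real machine learning systems are vastly more complex, but the underlying dynamic is the same: more data, better estimates.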
The already-mentioned Spanner is another example of changing software technology. It uses dedicated fiber optic lines and atomic clocks placed in data centers to provide hitherto unimaginable relational database scaling and distribution.
For an IT leader to operate as if the right path forward is running one’s own infrastructure isn’t just wrong, it’s irresponsible. That approach will consign his or her organization to the field of IT also-rans. Still worse, it will leave the overall company incapable of competing in the emerging elements of its industry, where innovation and margin are located.
So, the next time someone confidently proclaims that all of that cloud stuff is fine, but mostly the organization should run its own infrastructure because it’s the best place for most applications, remember the humble can opener and the benighted economist.