The evolution of cloud adoption
When I first started consulting in the cloud computing space, most client requests were for either a TCO (total cost of ownership) or an ROI analysis for a cloud initiative or an overall cloud strategy. Many leaders had a hard sell convincing their CEO and board that cloud computing was the way forward. The cloud, like virtualization before it, was viewed as a cost-saving measure achieved through better utilization of resources. At that time, about 80% of the requests focused on private cloud, while only 20% were for the public cloud, almost exclusively AWS. In November 2013, at the annual re:Invent conference, AWS announced a wide variety of enterprise-grade security features. Almost immediately, our phones rang off the hook with clients asking for public cloud implementations. A year later, our work requests had completely flipped: over 80% for public cloud and 20% for private cloud.
Why the private cloud isn’t
From 2005 to 2012, many large enterprises focused their cloud efforts on building a private cloud. Security and regulatory uncertainty made them believe they needed to retain complete control of their computing environments. The traditional hardware vendors were more than happy to condone this point of view: “Yes, buy all of our latest stuff and you will be able to gain all the advantages you would get by going to a public cloud!” Quite a few Fortune 500 companies invested hundreds of millions of dollars to build world-class data centers they promised would, through the power of virtualization and automation, deliver the same benefits as going to the cloud. While such efforts were often declared successes (they did tend to save money on hardware), they fell well short of turning the company into the next Uber or Amazon.
We have seen some companies jump all in to the public cloud with positive results. As adoption of the public cloud increased, companies moved or built new workloads in the cloud at rates much faster than they had traditionally deployed software. However, two common antipatterns emerged.
Developers, business units, and product teams now had access to on-demand infrastructure and leveraged the cloud to get product out the door faster than ever. Since the cloud was new to the organization, there was no set of guidelines or best practices. Development teams were taking on many responsibilities they had never had before. They were delivering value to their customers faster than ever, but often exposing the organization to more security and governance risk and delivering less resilient products. Another issue was that each business unit or product team was reinventing the wheel: buying its favorite third-party logging, monitoring, and security tools. Each took a different approach to designing and securing its environment, and each often implemented its own CI/CD toolchain, with very different processes.
Management, infrastructure, security, and/or GRC teams put the brakes on access to the public cloud. They built heavily locked-down cloud services and processes that made developing software in the cloud cumbersome, destroying one of the key value propositions of the cloud—agility. We have seen companies take 3-6 months to provision a virtual machine in the cloud, something that should take only 5-10 minutes. The command-and-control cops would force cloud developers to go through the same ticketing and approval processes required in the datacenter. These processes were often decades old, designed when deployments occurred 2-3 times a year and all infrastructure consisted of physical machines owned by a separate team. Ken relates this story from his very early experiences with cloud adoption:
The company had decided on an aggressive cloud adoption plan. Being a large SAP shop, they were very excited about the elastic nature of the cloud. SAP environments across testing, staging, development, etc. can be very expensive, so there are rarely as many as teams would like. Shortly after the non-production environments moved to the cloud, I had the cloud infrastructure team show me how they had automated the provisioning of an entire SAP environment. What had previously taken months could now be done in a few hours! With excitement in my stride, I strolled over to the testing team.
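The kind of automation the infrastructure team demonstrated can be sketched in miniature. The following is a hypothetical, illustrative Python model (all names are made up; a real pipeline would call the cloud provider's APIs or an infrastructure-as-code tool where these stubs simply record results): a whole environment is described declaratively and stood up in one automated pass, with no per-component tickets.

```python
# Hypothetical sketch of declarative environment provisioning.
# A real implementation would invoke CSP APIs or an IaC tool;
# here each "provision" step just records what would be created.
from dataclasses import dataclass, field


@dataclass
class EnvironmentSpec:
    """Everything needed to stand up one non-production environment."""
    name: str
    components: list = field(default_factory=lambda: ["app", "db", "lb"])


def provision(spec: EnvironmentSpec) -> dict:
    """Stand up every component of the environment in a single run."""
    resources = {}
    for component in spec.components:
        # In a real system this line would be an API call to the CSP.
        resources[component] = f"{spec.name}-{component}"
    return resources


env = provision(EnvironmentSpec(name="sap-staging"))
print(env)
```

The point of the sketch is the shape, not the stubs: because every component is software defined, the months-long sequence of tickets collapses into one repeatable function call.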
Shared Responsibility: The Datacenter Mindset versus the Cloud Mindset
A common mistake that companies make is treating the cloud just like a datacenter. They think in terms of physical infrastructure instead of leveraging cloud infrastructure as a utility, like electricity or water. There are two major capabilities that are different in the cloud when compared to most on-premises infrastructures. First, in the public cloud, everything is software defined and software addressable (that is, it has an API). This creates an incredible opportunity to automate, streamline, and secure the systems. While software-defined everything has made significant strides in the datacenter in the last decade, most of our clients still have major components that must be configured and cared for manually. The second major difference in the public cloud is the inherent design for multi-tenancy. This “from the ground up” view of multi-tenancy has driven a high degree of configuration isolation in the cloud.
Here’s an example. In most companies, there are one or two engineers who are allowed to make DNS changes. Why is that? Because the tooling we often use on-premises does not isolate workloads (or teams) from each other. This means that if we let Joe manage his own DNS, he might accidentally change Sue’s DNS, causing disruption. So we have made sure that only David and Enrique are allowed to change DNS for everyone in the whole company. In contrast, in the cloud, everyone’s accounts are naturally isolated from each other. Joe can have full authority over his DNS entries while he might not even be able to browse, let alone change, Sue’s entries. This core difference is often overlooked and is one of the key facets that allows for self-service capability in the public cloud.
Enterprises who have been building and running datacenters for many years often have a challenge shifting their mindset from procuring, installing, maintaining, and operating physical infrastructure to a cloud mindset where infrastructure is consumed as a service. (Randy Bias has memorably described the difference between physical and cloud servers as being like the difference between pets and cattle; one is named and cared for personally, the other is numbered and replaceable.)
An analogy we like to use is buying a house versus renting one. It really boils down to assets that are purchased versus assets that are rented, and the responsibilities that go along with each. When you buy a house, you are investing in both the property and the physical structure(s) on that property. You are responsible for maintaining the house, landscaping, cleaning, and everything else that comes with home ownership. When you rent, you are paying for the time that you inhabit the rental property. It is the landlord’s responsibility to maintain it. The biggest difference between renting and buying is what you, as the occupant of the house, have control over. (And just as people get more emotionally attached to their owned homes than to their rented apartments, plenty of infrastructure engineers have true emotional attachments to their servers and storage arrays.)
When you leverage the cloud, you are renting time in the cloud provider’s “house.” What you control is very different from what you control in your own datacenter. For people who have spent much of their careers defining, designing, and implementing processes and technologies for the controls they are responsible for in their datacenter, shifting some of those controls to a third party can be extremely challenging.
The two groups who probably struggle the most to grasp the cloud shared-responsibility model are auditors and GRC teams. These teams have processes and controls for physically auditing datacenters. When you pause to think about it, physically evaluating a datacenter is a bit of a vestigial process. Sure, 50 years ago nearly all IT processes (including application development) probably happened in the datacenter building, but today, many datacenters run with skeleton crews, and IT processes are distributed across many locations, often globally. Yet the auditors expect to be able to apply these exact processes and controls in the cloud. The problem is, they can’t. Why? Because these datacenters belong to the cloud service providers (CSPs), who have a duty to keep your data isolated from their other clients’ data. Would you want your competitor walking on the raised floor at Google where your software is running? Of course not. That’s just one simple example.
At one meeting we attended, a representative of one of the CSPs was explaining how they handle live migrations of servers, which they can run at any time during the day with no impact on customers. The client was adamant about getting all of the CSP’s logs to feed into their company’s central logging solution. Under the shared-responsibility model, the CSP is responsible for logging and auditing the infrastructure layer, not the client. The client was so used to being required to store this type of information for audits that they simply would not budge. We finally had to explain that under the shared-responsibility model, that data would no longer be available to them. We asked where they stored the logs for failed sectors in their disk arrays and how they logged CRC (cyclic redundancy check) error events in their CPUs. Of course, they didn’t.
We explained to the client that they would have to educate their audit team and adjust their processes. To be clear, the policy that required the client to store those logs is still valid. How you satisfy that policy in the cloud is completely different. If the auditors or GRC teams cannot change their mindset and come up with new ways to satisfy their policy requirements, they might as well not go to the public cloud. But does an auditor or a GRC team really want to hold an entire company back from leveraging cloud computing? Should the auditor be making technology decisions at all? A key task in the cloud modernization journey is the education of these third-party groups that have great influence in the enterprise. As technology becomes more capable and automated, the things that we have to monitor will change—because the risk profile has changed fundamentally.
In the datacenter world, teams are traditionally organized around skill domains as they relate to infrastructure. It is common to find teams responsible for storage, for network, for servers, for operating systems, for security, and so forth. In the cloud, much of this infrastructure is abstracted and available to developers as an API call. The need to create tickets for another team to perform a variety of tasks to stand up physical infrastructure like a SAN (storage area network) simply does not exist in the public cloud. Developers have access to storage as a service and can simply write code to provision the necessary storage. This self-service ability is crucial to enabling one of the prizes of cloud transformation: higher-velocity IT.
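To make “storage as an API call” concrete, here is a hedged sketch using AWS S3 via boto3 (the bucket name and region are illustrative; the actual call requires an AWS account and credentials, so it is shown in a comment, while the parameter-building helper below is pure logic and runs anywhere). One quirk worth noting: `us-east-1` rejects an explicit location constraint, so the helper omits it there.

```python
# Sketch: provisioning storage is a single API call, not a ticket.
# The helper builds the parameters for an S3 CreateBucket request;
# names and region here are illustrative assumptions.
def bucket_request(name: str, region: str) -> dict:
    """Build parameters for an S3 CreateBucket call."""
    params = {"Bucket": name}
    if region != "us-east-1":  # us-east-1 must not set a LocationConstraint
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


# With credentials configured, the actual provisioning is one line:
#   import boto3
#   boto3.client("s3", region_name="eu-west-1").create_bucket(
#       **bucket_request("team-artifacts", "eu-west-1"))

print(bucket_request("team-artifacts", "eu-west-1"))
```

Compare that single call with the datacenter equivalent: a ticket to the storage team, LUN carving on the SAN, zoning, and days or weeks of waiting.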
Networking teams in the datacenter leverage third-party vendors who provide appliances, routers, gateways, and many other important tools required to build a secure, compliant, and resilient network. Many of these features are available as a service in the cloud. For areas where the cloud providers don’t provide the necessary network security functionality, there are many third-party SaaS or pay-as-you-go solutions available, either directly from the vendor or from the CSP’s marketplace. Procuring these solutions in the cloud, when they are consumed as SaaS, PaaS, or IaaS, is different from how similar tools are procured in the datacenter. In the public cloud, there are usually no physical assets being purchased. Gone are the days of buying software and paying 20-25% of the purchase price for annual maintenance. In the cloud you pay for what you use, and pricing is usually consumption based.
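The cost difference between the two models is easy to work through. The figures below are assumptions chosen purely for illustration (not any vendor's actual pricing), using the 20-25% maintenance range mentioned above.

```python
# Illustrative arithmetic only -- prices and hours are assumptions,
# not real vendor pricing.
def license_cost(price: float, maintenance_rate: float, years: int) -> float:
    """Perpetual license: pay up front, then annual maintenance."""
    return price + price * maintenance_rate * years


def consumption_cost(rate_per_hour: float, hours_per_year: float, years: int) -> float:
    """Pay-as-you-go: pay only for the hours actually used."""
    return rate_per_hour * hours_per_year * years


# A $100k appliance with 25% annual maintenance over 3 years,
# versus a metered service used 2,000 hours/year at $9/hour.
on_prem = license_cost(100_000, 0.25, 3)
cloud = consumption_cost(9, 2_000, 3)
print(on_prem, cloud)
```

The crossover point depends entirely on utilization, which is the real lesson: a consumption model rewards you for using only what you need, while a purchased asset costs the same whether it runs flat out or sits idle.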
Use what you have vs. Use what you need
Before cloud computing was an option, almost all of the development we were involved in was deployed within the datacenters that our employers and clients owned. Each piece of the technology stack was owned by specialists for that given technology. For databases, there was a team of DBAs (database administrators) who installed and managed software from vendors like Oracle, Microsoft, Netezza, and others. For middleware, there were system administrators who installed and managed software like IBM’s WebSphere, Oracle’s WebLogic, Apache Tomcat, and others. The security team owned various third-party software solutions and appliances. The network team owned a number of both physical and software solutions, and so forth. Whenever development wanted to leverage a different solution from what was offered in the standard stack, it took a significant amount of justification, for the following reasons:
The solution had to be purchased up front.
The appropriate hardware had to be procured and implemented.
Contractual terms had to be agreed upon with the vendor.
Annual maintenance fees had to be budgeted for.
Employees and/or consultants needed to be trained or hired to implement and manage the new stack component.
Adopting new stack components in the cloud, if not constrained by legacy thinking or processes, can be accomplished much quicker, especially when these stack components are native to the CSP. Here are some reasons why:
No procurement is necessary if the solution is available as a service.
No hardware purchase and implementation are necessary if the service is managed by the CSP.
No additional contract terms should be required if the proper master agreement is set up with the CSP.
There are no annual maintenance fees in the pay-as-you-go model.
The underlying technology is abstracted and managed by the CSP, so the new skills are only needed at the software level (how to consume the API, for example).