As cloud computing becomes the go-to platform for hosting modern software workloads, organizations are confronting a new wrinkle in their cloud strategy options: being cloud agnostic. Thinking about your existing cloud workloads, how easy would it be to move them to a different platform or provider? Would it be possible without excessive engineering labor and manual refactoring? Even if it's possible, is it a good idea to aim for cloud agnosticism?
There are some subtle differences between being cloud agnostic and being multicloud.
Multicloud means an organization simply utilizes multiple cloud providers, regardless of the context. For example, a company might run their primary compute workloads on Google's GKE while storing analytics reports in AWS S3. That company can now say with certainty that they are a multicloud organization, but it doesn't mean they can shift their workloads from one provider to another.
Being cloud agnostic refers to an organization’s ability to migrate their primary production workloads to another cloud provider. This is more complicated than it sounds.
For instance, a high-throughput data processing application that's deployed to AWS and depends on Amazon Kinesis Data Streams (KDS) for its streaming data can't easily be migrated to another cloud provider, as that service is proprietary and unique to AWS. Significant portions of the architecture would have to be redesigned to become cloud agnostic. Conversely, an application that exists primarily in Docker containers could be deployed to multiple platforms as a cloud-agnostic solution without excessive boilerplate configuration, as most providers have multiple standardized options for container orchestration and deployment.
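As a minimal sketch of that portability (the service, base image, and port here are hypothetical), a container image defined like this carries no provider-specific dependencies and can be built and run on any platform with an OCI-compatible runtime:

```dockerfile
# Hypothetical Python web service; nothing here is AWS- or GCP-specific.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "app:server"]
```

The resulting image can be pushed to ECR, Artifact Registry, or a private registry and deployed to any container service without modification.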
Now that you understand a bit about what cloud agnostic means, let’s look at why companies want to be cloud agnostic.
Although a cloud-agnostic infrastructure requires a larger upfront investment in terms of design and planning, it yields several benefits that generally fall into two categories: choice and risk.
The benefit of choice empowers engineering organizations to make infrastructure decisions based on the functionality and offerings that best suit their needs. In the context of vendors, cloud-agnostic workloads give an organization vendor independence: it can migrate its workloads elsewhere in the face of price increases or other issues. The primary workloads can run on any platform with minimal overhead.
The benefit of risk refers to the way that cloud-agnostic architectures act as a kind of safety valve, allowing organizations to make a quick pivot to another platform if the need arises. If any one vendor experiences an outage, security compromise, or an overall degradation of performance, application workloads can be moved seamlessly.
Despite the obvious benefits, there are also challenges to consider before implementing a cloud-agnostic design.
Some of the challenges discussed below might not be clear until you’ve started to implement a cloud-agnostic design. Of course, at that point, you’ll have to switch to firefighting mode. Considering some of the primary challenges early on will allow for the development of a cloud-agnostic strategy that minimizes the impact of any issues that do arise.
Identity and Access Management (IAM) serves as both a general term and the name of specific services from providers like AWS and Google Cloud. At a high level, IAM is a system meant to control access to cloud resources. It utilizes concepts like roles and policies to define specific access and authorization configurations, thus ensuring that users can only access the appropriate resources in the appropriate contexts.
Unfortunately, each cloud provider implements IAM differently. For example, AWS IAM defines permissions in JSON policy documents attached to users, groups, and roles, while GCP IAM grants predefined or custom roles to members such as Google accounts and service accounts. Even if an application is deployed using purely cloud-agnostic tools and concepts, it will still have some lock-in dependency on the underlying IAM system of the given provider. This means that migrating workloads will require either refactoring IAM configurations or designing some kind of shim or abstraction layer.
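To make the divergence concrete, here is a minimal AWS IAM policy document (the bucket name is hypothetical) granting read access to objects in an S3 bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```

GCP has no direct equivalent of this document; the closest analog is a role binding, such as granting `roles/storage.objectViewer` to a service account via `gcloud projects add-iam-policy-binding`. Because the two models don't map one-to-one, IAM configurations generally have to be rewritten, not translated, during a migration.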
Going cloud agnostic could also mean your organization misses out on the operational economies of scale offered by cloud providers. There are still a multitude of advantages to be gained by going with a cloud-native application versus sticking with your traditional, legacy infrastructure. But if you decide against a managed service, the provider won't be there to do the operational heavy lifting that would otherwise lift the burden of SLAs and uptime from your in-house operations staff.
There are countless hours of engineering design and development poured into every managed cloud service, which allows them to offer an economy of scale and level of performance that most enterprise organizations could not match with their own infrastructure.
Engineering acumen is a "people problem" that might not present itself until your engineering team has to scale their deployment, and their headcount to match. If a cloud-agnostic design is a pivot from an existing single-provider architecture, your engineering staff suddenly has, at best, half the knowledge needed to effectively operate and develop your production software infrastructure. This can lead to additional costs in hiring, training, and potential turnover, as some engineers simply prefer to focus on one knowledge domain. In the meantime, operational excellence metrics like SLA adherence could suffer.
If you decide to adopt a cloud-agnostic strategy, you will need to be selective about the technologies and services you adopt for a given architecture or application stack. Even choosing a managed service doesn't necessarily rule out cloud agnosticism: several managed services expose the same APIs and interfaces as the FOSS tools they were built on.
Although it's not an iron-clad requirement, one of the primary ways to empower cloud agnosticism is to package the primary application code in some kind of container, such as Docker. The most popular container runtimes are now widely accepted as industry standards and can be packaged and deployed to a variety of services and orchestration platforms. The Kubernetes platform highlights the flexibility of this strategy.
Assuming an application is developed and run in containers, it can be deployed to a Kubernetes cluster just about anywhere. Taking AWS and GCP as the primary examples, both offer ways to run self-managed Kubernetes clusters on top of vanilla compute resources, as well as managed hosting options in EKS and GKE, respectively. Further expanding on the managed theme, Google now offers Anthos, a managed platform for running Kubernetes clusters both in the cloud and on-premises, providing a bridge for enterprise organizations that are still in the early stages of their cloud adoption path.
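As a sketch of that portability (the image reference and replica count are hypothetical), a standard Deployment manifest like the one below can be applied unchanged, via `kubectl apply -f`, to a self-managed cluster, EKS, or GKE:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0  # hypothetical image
          ports:
            - containerPort: 8080
```

Only cluster provisioning and add-ons (load balancers, storage classes, IAM integration) differ per provider; the workload definition itself stays provider-neutral.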
For overall deployment of infrastructure, choosing an infrastructure as code (IaC) tool like Terraform not only adheres to the principles of cloud-native architecture, it can also provide a centralized abstraction layer for creating cloud-agnostic architecture. While cloud providers offer their own IaC tools, like GCP Deployment Manager and AWS CloudFormation, these can only provision resources on their host platforms. Terraform is open source and has provider plugins for a broad variety of cloud providers and platforms. Once an engineering team skills up on Terraform's HCL syntax, they can write automation for any provider that Terraform supports.
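As a minimal sketch (the regions, project ID, and bucket names are assumptions), a single Terraform configuration can declare resources on both clouds using the same HCL syntax and the same `plan`/`apply` workflow:

```hcl
provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "example-project" # hypothetical project ID
  region  = "us-central1"
}

# The syntax and workflow are identical across clouds;
# only the resource types are provider-specific.
resource "aws_s3_bucket" "reports" {
  bucket = "example-reports-bucket"
}

resource "google_storage_bucket" "reports" {
  name     = "example-reports-bucket"
  location = "US"
}
```

Note that Terraform abstracts the tooling, not the services themselves: equivalent resources must still be declared per provider, but the team works in one language and one workflow.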
If your organization is willing to deal with the added complexity and upfront investment of developing a cloud-agnostic strategy, you’re likely to see benefits over a longer time horizon. You will be uniquely positioned to quickly mitigate the operational impact of any single provider outage or security issue, weathering the storm while non-agnostic competitors lose valuable traffic and trust.
On the other hand, your team might be better served taking advantage of provider-specific services, eschewing long-term benefits for fast iteration and a quicker time-to-value cycle. If you have a smaller development team without a dedicated operations staff, you can still immediately capitalize on the reduced operational overhead offered by managed services.
Ultimately, the decision to go cloud agnostic depends on a balance of short-term and long-term goals, as well as your engineering resources and plans. Smaller teams would do well to utilize managed services, allowing them to go to market as quickly as possible. Conversely, larger enterprise organizations can bring their considerable resources to bear, crafting a cloud-agnostic design that will provide longer-term benefits.