As container adoption continues to grow, we thought it’d be interesting to take a look at the hosted Kubernetes pricing options from each of the big three cloud providers. The Kubernetes services across the cloud providers are Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), and Google Kubernetes Engine (GKE). In this blog, we’ll take a closer look at each offering, their similarities and differences, and their pricing. Note: all pricing data points are as of this writing in April 2020.
AWS Cloud Container Services (EKS)
Amazon Elastic Kubernetes Service (Amazon EKS) is AWS's managed service for deploying and running containers via the Kubernetes container orchestration system. Pricing is $0.10 per hour for each EKS cluster you create – you can use a single cluster to run multiple applications using Kubernetes namespaces and IAM security policies. You can run EKS on AWS using EC2 or Fargate, and you can also run it on-prem with AWS Outposts.
If you use EC2, you would pay for the resources you created to run your Kubernetes worker nodes. This is on demand: you only pay for what you use, as you use it. You pay per cluster and underlying resource. If you choose to run your EKS clusters on Fargate, it will remove the need to provision and manage servers. With Fargate you can specify and pay for resources per application – pricing is based on the vCPU and memory resources used from the time you start to download your container image until the Amazon EKS pod terminates (minimum 1-minute charge).
EKS worker nodes are standard Amazon EC2 instances – you are billed for them based on normal EC2 prices.
With the $0.10 price per hour, you'd be spending $72 per month for a cluster running a full 30-day month. It's important to note that this is just the cost to operate a cluster – you still have to pay for compute costs on top of this (e.g. EC2 instance hours or Fargate compute resources).
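The fee arithmetic above can be sketched as a quick check (a minimal sketch, assuming a 30-day, 720-hour month):

```python
# Quick check of the EKS cluster management fee, assuming a 30-day (720-hour) month.
EKS_FEE_PER_HOUR = 0.10  # USD per cluster per hour (as of April 2020)

def monthly_cluster_fee(hours_in_month: int = 720) -> float:
    """Management fee for one cluster, excluding EC2/Fargate compute costs."""
    return EKS_FEE_PER_HOUR * hours_in_month

print(round(monthly_cluster_fee()))  # 72
```

A 730-hour month (365 × 24 / 12) would come to about $73 instead, which is why you'll see both figures quoted.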
Azure Cloud Container Services (AKS)
Azure Kubernetes Service (AKS) is Azure's fully managed solution for deploying and running containers via the Kubernetes container orchestration system, and its cluster management is free. You pay only for the VM instances, storage, and networking resources used by the Kubernetes cluster, billed per second, with no additional charge.
How can you save on VM nodes using AKS? Use Reserved VM Instances. (Check out the 10 things you should know before purchasing Azure Reserved VM Instances.) You can pay up front for either a 1- or 3-year term. For example, for a D4 v3 node instance, the pay-as-you-go price is $0.192 per hour, the 1-year commitment price is $0.1145, and the 3-year commitment price is $0.0738. So with a 1-year commitment, you would see savings of about 40%, and with a 3-year commitment, about 62%.
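Those savings percentages follow directly from the hourly rates quoted above:

```python
# Reproducing the Azure Reserved VM Instance savings figures for a D4 v3 node
# (hourly USD rates as quoted in the post, April 2020).
PAYG = 0.192         # pay-as-you-go rate
ONE_YEAR = 0.1145    # effective rate with a 1-year reservation
THREE_YEAR = 0.0738  # effective rate with a 3-year reservation

def savings_pct(reserved: float, on_demand: float = PAYG) -> float:
    """Percentage saved versus the on-demand rate."""
    return (on_demand - reserved) / on_demand * 100

print(round(savings_pct(ONE_YEAR)))    # 40
print(round(savings_pct(THREE_YEAR)))  # 62
```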
Unlike AWS and GCP, Azure charges nothing for cluster management – users pay only for the nodes the containers run on.
Google Cloud Container Services (GKE)
Google Kubernetes Engine (GKE) is Google Cloud's fully managed solution for deploying and running containers via the Kubernetes container orchestration system. A GKE environment is made up of multiple machines grouped together to form a cluster – the foundation of GKE. A cluster consists of at least one cluster master and multiple worker machines called nodes.
Starting in June 2020, GKE will charge $0.10 per cluster per hour as a cluster management fee. This fee applies to clusters of any size (one zonal cluster per billing account is free). The fee will not apply to Anthos GKE clusters.
How will this fee affect your bill? At 730 hours per month, the management fee alone comes to about $73 per cluster per month – on top of node costs, even if you run the smallest instance size (1 vCPU, 3.75 GB RAM).
Google will also be introducing a service level agreement for GKE that guarantees 99.95% availability for regional clusters and 99.5% for zonal clusters.
Cloud Pricing Comparison Chart
For the example below, assume you need 80 CPUs and 320 GB RAM for one year to run your cluster. You'd need 20 instances from each provider, giving you 175,200 compute hours per year. Here's what that would look like across the cloud providers.
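The sizing in this example can be sanity-checked with quick arithmetic (the per-instance shape, 4 vCPU / 16 GB, is inferred from the totals rather than stated in the original chart):

```python
# Back-of-the-envelope check of the comparison scenario:
# 80 CPUs / 320 GB RAM served by 20 identical instances for a full year.
instances = 20
cpus_per_instance = 80 / instances   # 4.0 vCPU each
ram_per_instance = 320 / instances   # 16.0 GB each
compute_hours = instances * 24 * 365

print(cpus_per_instance, ram_per_instance, compute_hours)  # 4.0 16.0 175200
```

Multiplying those 175,200 hours by each provider's hourly rate (plus any cluster management fee) produces the annual figures compared in the chart.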
So, EKS is the most expensive, but only by about 5% per year from the least expensive option, GKE.
Overall, AWS is the most popular cloud to run containers and Kubernetes. According to a survey from Forbes, 29% of the respondents use AWS EKS, 28% use Google GKE and 25% use Azure AKS.
Don’t forget about your free options! In Azure and Google Cloud, you have the option to start a free account on the platform and with that, you get access to AKS and GKE for 12 months for free.
Do you know the difference between Azure “deallocate VM” and “stop VM” states? They are similar enough that in conversation, I’ve noticed some confusion around this distinction.
If your VM is not running, it will have one of two states – Stopped, or Stopped (deallocated). Essentially, if something is “allocated” – you’re still paying for it. So while deallocating a virtual machine sounds like a harsh action that may be permanently deleting data, it’s the way you can save money on your infrastructure costs and eliminate wasted Azure spend with no data loss.
Azure’s Stopped State
When you are logged in to the operating system of an Azure VM, you can issue a command to shut down the server. This will kick you out of the OS and stop all processes, but will maintain the allocated hardware (including the IP addresses currently assigned). If you find the VM in the Azure console, you’ll see the state listed as “Stopped”. The biggest thing you need to know about this state is that you are still being charged by the hour for this instance.
Azure’s Deallocated State
The other way to stop your virtual machine is through Azure itself, whether that's through the console, PowerShell, or the Azure CLI. When you stop a VM through Azure, rather than through the OS, it goes into a "Stopped (deallocated)" state. This means that any non-static public IPs will be released, but you'll also stop paying for the VM's compute costs. This is a great way to save money on your Azure costs when you don't need those VMs running, and is the state that ParkMyCloud puts your VMs in when they are parked.
Which State to Choose?
The only scenario in which you should ever choose the stopped state instead of the deallocated state for a VM in Azure is if you are only briefly stopping the server and would like to keep the dynamic IP address for your testing. If that doesn’t perfectly describe your use case, or you don’t have an opinion one way or the other, then you’ll want to deallocate instead so you aren’t being charged for the VM.
If you’re looking to automate scheduling when you deallocate VMs in Azure, ParkMyCloud can help with that. ParkMyCloud makes it easy to identify idle resources using Azure Metrics and to automatically schedule your non-production servers to turn off when they are idle, such as overnight or on weekends. Try it for free today to save money on your Azure bill!
Google Cloud has always had a knack for non-standard virtual machines, and their option of creating Google preemptible VMs is no different. Traditional virtual machines are long-running servers with standard operating systems that are only shut down when you say they can be shut down. On the other hand, preemptible VMs last no longer than 24 hours and can be stopped on a moment’s notice (and may not be available at all). So why use them?
Google Cloud Preemptible VM Overview
Preemptible VMs are designed to be a low-cost, short-duration option for batch jobs and fault-tolerant workloads. Essentially, Google is offering up extra capacity at a huge discount – with the tradeoff that if that capacity is needed for other (full-priced) resources, your instances can be terminated or “preempted”. Of course, if you’re using them for batch processing, being preempted will slow down your job without completely stopping it.
You can create your preemptible VMs in a managed instance group in order to easily manage a collection of VMs as a single entity – and, if a VM is preempted, the VM will be automatically recreated. Alternatively, you can use Kubernetes Engine container clusters to automatically recreate preempted VMs.
Preemptible VM Pricing
Pricing is fixed, not variable, and you can view the preemptible price alongside the on-demand prices in Google's compute pricing list and/or pricing calculator. Prices are 70-80% off on-demand rates, and upward of 50% off even compared to a 3-year committed use discount.
Google does not charge you for instances if they are preempted in the first minute after they start running.
Note: Google Cloud Free Tier credits for Compute Engine do not apply to preemptible instances.
Use Cases for Google Preemptible VMs
As with most trade-offs, the biggest reason to use a preemptible VM is cost. Preemptible VMs can save you up to 80% compared to a normal on-demand virtual machine. (By the way – AWS users will want to use Spot Instances for the same reason, and Azure users can check out Low Priority VMs). This is a huge savings if the workload you’re trying to run consists of short-lived processes or things that are not urgent and can be done any time. This can include things like financial modeling, rendering and encoding, and even some parts of your CI/CD pipeline or code testing framework.
How to Create a Google Preemptible VM
To create a preemptible VM, you can use the Google Cloud Platform console, the ‘gcloud’ command line tool, or the Google Cloud API. The process is the same as creating a standard VM: you select your instance size, networking options, disk setup, and SSH keys, with the one minor change that you enable the ‘preemptible’ flag during setup. The other change you’ll want to make is to create a shutdown script to decide what happens to your processes and data if the instance is stopped without your knowledge. This script can even perform different actions if the instance was preempted as opposed to shut down from something you did.
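At the API level, the difference comes down to one flag. The sketch below shows a minimal Compute Engine instance request body with the preemptible option set; the instance and zone names are hypothetical, and a real request would also need disks and network interfaces:

```python
# Minimal sketch of a Compute Engine API request body for a preemptible VM.
# Names here are hypothetical placeholders; the key detail is the "scheduling"
# block - preemptible instances must also have automatic restart disabled.
def preemptible_instance_body(name: str, zone: str) -> dict:
    return {
        "name": name,
        "machineType": f"zones/{zone}/machineTypes/n1-standard-1",
        "scheduling": {
            "preemptible": True,        # the one flag that differs from a standard VM
            "automaticRestart": False,  # required for preemptible instances
        },
        "metadata": {
            "items": [
                # A shutdown script runs during the graceful-shutdown window.
                {"key": "shutdown-script",
                 "value": "#!/bin/bash\n# save state / clean up here"},
            ]
        },
    }

body = preemptible_instance_body("example-preemptible-vm", "us-central1-a")
```

The equivalent with the `gcloud` CLI is simply adding the `--preemptible` flag to `gcloud compute instances create`.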
One nice benefit of Google preemptible VMs is the ability to attach local SSD drives and GPUs to the instances. This means you can get added extensibility and performance for the workload that you are running, while still saving money. You can also have preemptible instances in a managed instance group for high scalability when the instances are available. This can help you process more of your jobs at once when the preemptible virtual machines are able to run.
FAQs About Google Preemptible Instances
How long do GCP preemptible VMs last?
These instances can last up to 24 hours. If you stop or start an instance, the 24-hour counter is reset because the instance transitions into a terminated state. If you reset an instance, or take other actions that keep it in a running state, the 24-hour clock is not reset.
Is pricing variable?
No, pricing for preemptible VMs is fixed, so you know in advance what you will pay.
What happens when my instance is preempted?
When your instance is preempted, you get a 30-second graceful shutdown period. The instance receives a preemption notice in the form of an ACPI G2 Soft Off signal, and you can use a shutdown script to complete cleanup actions before the instance stops. If the instance has not stopped after 30 seconds, Compute Engine sends an ACPI G3 Mechanical Off signal to the operating system and terminates it. You can simulate what this looks like by stopping the instance yourself.
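A shutdown script can tell preemption apart from a user-initiated stop by reading the instance metadata value `preempted`, which Compute Engine sets to `TRUE` only when the instance is being preempted. Here's a sketch; the HTTP call only works from inside a GCE instance, so the parsing logic is kept separate:

```python
# Sketch of a preemption-aware shutdown handler. The "preempted" instance
# metadata value reads TRUE only when Compute Engine is preempting the VM.
import urllib.request

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/preempted")

def was_preempted(metadata_value: str) -> bool:
    """Interpret the metadata server's TRUE/FALSE response."""
    return metadata_value.strip().upper() == "TRUE"

def fetch_preempted_flag() -> str:
    """Query the metadata server; only reachable from inside a GCE instance."""
    req = urllib.request.Request(METADATA_URL,
                                 headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()

if __name__ == "__main__":
    try:
        if was_preempted(fetch_preempted_flag()):
            print("Preempted: flush work to durable storage.")
        else:
            print("Normal shutdown: no special handling needed.")
    except OSError:
        print("Not running on a GCE instance; skipping metadata check.")
```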
By using managed instance groups, you can automatically recreate your instances if capacity is available.
How often are you actually preempted?
Google reports an average preemption rate of 5-15% per day per project, with occasional spikes depending on time and zone. This is not a guarantee, though – you can be preempted at any time.
How does Google choose which instances to preempt?
Google avoids preempting too many instances from a single customer, and preempts new instances over older instances whenever possible – this is to avoid losing work across your cluster.
How to Use Google Preemptible VMs to Optimize Costs
Our customers who have the most cost-effective use of Google resources often mix Google preemptible VMs with other instance types based on the workloads. For instance, production systems that need to be up 24/7 can buy committed-use discounts for up to 57% savings on those servers. Non-production systems, like dev, test, QA, and staging, can use on-demand resources with schedules managed by ParkMyCloud to save 65%. Then, any batch workloads or non-urgent jobs can use Google preemptible instances to run whenever available for up to 80% savings. Questions about optimizing cloud costs? We’re happy to help – email us or use the chat client on this page (staffed by real people, including me!).
More than 90% of organizations will use public cloud services this year, fueled by record cloud computing growth. In fact, public cloud customers will spend more than $50 billion on Infrastructure as a Service (IaaS) from providers like AWS, Azure, and Google. While this growth is due in large part to wider adoption of public cloud services, much of it is also due to growth of infrastructure within existing customers’ accounts. Unfortunately, the growth in spending often exceeds the growth in business. That’s because a huge portion of what companies are spending on cloud is wasted.
Cloud Computing Growth in 2020
Before we get to the waste, let’s look a little closer at that growth in the cloud market. Gartner recently predicted that cloud services spending will grow 17% in 2020, to reach $266.4 billion.
While Software as a Service (SaaS) makes up the largest market segment at $116 billion, the fastest growing portion of cloud spend will continue to be Infrastructure as a Service (IaaS), growing 24% year-over-year to reach $50 billion in 2020.
Typically, we find that about two-thirds of an enterprise's average public cloud bill is spent on compute, which means about $33.3 billion will be spent on compute resources this year.
Unfortunately, this portion of a cloud bill is particularly vulnerable to wasted spend.
Growth of Cloud Waste
As cloud computing growth continues and cloud users mature, you might hope that this $50 billion is being put to optimal use. While we do find that cloud customers are more aware of the potential for wasted spending than they were just a few years ago, this does not seem to be correlated with cost optimized infrastructure from the beginning – it’s simply not a default human behavior. We frequently run potential savings reports for companies interested in using ParkMyCloud, to find out whether or not they will benefit from using the product. Invariably, we find wasted spend in these customers’ accounts. For example, one healthcare IT provider was found to be wasting up to $5.24 million annually on their cloud spend, an average of more than $1,000 per resource per year.
Here’s where the total waste is coming from:
Idle resources are VMs and instances paid for by the hour, minute, or second that are not actually in use 24/7. Typically, these are non-production resources used for development, staging, testing, and QA. Based on data collected from our users, about 44% of their compute spend is on non-production resources. Most non-production resources are only needed during a 40-hour work week and do not need to run 24/7. That means that for the other 128 hours of the week (76%), the resources sit idle – but are still paid for.
So, we find the following wasted spend from idle resources:
$33.3 billion in compute spend * 0.44 non-production * 0.76 of week idle = $11 billion wasted on idle cloud resources in 2020.
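Stepping through that estimate (using the exact 128/168 idle fraction rather than the rounded 0.76):

```python
# The idle-resource waste estimate, step by step (figures from the post).
compute_spend_billions = 33.3   # ~2/3 of the $50B IaaS spend
non_production_share = 0.44     # share of compute spend on non-production
idle_share_of_week = 128 / 168  # ~0.76: hours outside a 40-hour work week

idle_waste = compute_spend_billions * non_production_share * idle_share_of_week
print(round(idle_waste, 1))  # ~11.2, which the post rounds to $11 billion
```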
Another source of wasted cloud spend is overprovisioned infrastructure – that is, paying for resources that are larger in capacity than needed. That means you're paying for resource capacity you're rarely, or never, using.
About 40% of instances are sized at least one size larger than needed for their workloads. Just by reducing an instance by one size, the cost is reduced by 50%. Downsizing by two sizes saves 75%.
The data we see in ParkMyCloud users' infrastructure confirms this, and the problem may be even larger. Infrastructure managed in our platform has an average CPU utilization of 4.9%. Of course, this could be skewed by the fact that resources managed in ParkMyCloud are more commonly non-production. Still, it paints a picture of gross underutilization, ripe for rightsizing and optimization.
If we take a conservative estimate of 40% of resources oversized by just one size, we find the following:
$33 billion in compute spend * 0.4 oversized * 0.5 overspend per oversized resource = $6.6 billion wasted on oversized resources in 2020.
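The same arithmetic, written out:

```python
# The oversizing waste estimate from the post: 40% of instances oversized by
# one size, and one size down halves the cost of each oversized resource.
compute_spend_billions = 33.0
oversized_share = 0.4
overspend_per_oversized = 0.5  # one size down => 50% cheaper

oversized_waste = (compute_spend_billions
                   * oversized_share
                   * overspend_per_oversized)
print(round(oversized_waste, 1))  # 6.6
```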
The Extent of Wasted Cloud Spend
Between idle and overprovisioned resources alone, that’s $17.6 billion in cloud spend that will be completely wasted this year. And the potential is even higher. Other sources of waste include orphaned volumes, inefficient containerization, underutilized databases, instances running on legacy resource types, unused reserved instances, and more. Some of these result in significant one-off savings (such as deleting unattached volumes and old snapshots) whereas others can deliver regular monthly savings.
That’s a minimum of about $5 million wasted per day, every day this year, that could be reallocated toward other areas of the business.
It’s time to end wasted cloud spend. Join ParkMyCloud in taking a stand against it today.
As you accelerate your organization’s containerization in the cloud, key stakeholders may worry about putting all your eggs in one cloud provider’s basket. This combination of fears – both a fear of converting your existing (or new) workloads into containers, plus a fear of being too dependent on a single cloud provider like Amazon AWS, Microsoft Azure, or Google Cloud – can lead to hasty decisions to use less-than-best-fit technologies. But what if using more of your chosen cloud provider’s features meant you were less reliant on that cloud provider?
The Core Benefit of Containers
Something that can get lost in the debate about whether containerization is good or worthwhile is the feature of portability. When Docker containers were first being discussed, one of the main use cases was the ability to run the container on any hardware in any datacenter without worrying if it would be compatible. This seemed to be a logical progression from virtual machines, which had provided the ability to run a machine image on different hardware, or even multiple machines on the same hardware. Most container advocates seem to latch on to this from the perspective of container density and maximizing hardware resources, which makes much more sense in the on-prem datacenter world.
In the cloud, however, hardware resource utilization is now someone else’s problem. You choose your VM or container size and pay just for that size, instead of having to buy a whole physical server and pay for the entirety of it up-front. Workload density still matters, but is much more flexible than on-prem datacenters and hardware. With a shift to containers as the base unit instead of Virtual Machines, your deployment options in the cloud are numerous. This is where container portability comes into play.
The Dreaded “Vendor Lock-in”
Picking a cloud provider is a daunting task, and choosing one and later migrating away from it can have enormous impacts of lost money and lost time. But do you need to worry about vendor lock-in? What if, in fact, you can pivot to another provider down the road with minimal disruption and no application refactoring?
Implementing containerization in the cloud means that if you ever choose to move your workloads to a different cloud provider, you’ll only need to focus on pointing your tooling to the new provider’s APIs, instead of having to test and tinker with the packaged application container. You also have the option of running the same workload on-prem, so you could choose to move out of the cloud as well. That’s not to say that there would be no effort involved, but the major challenge of “will my application work in this environment” is already solved for you. This can help your Operations team and your Finance team to worry less about the initial choice of cloud, since your containers should work anywhere. Your environment will be more agile, and you can focus on other factors (like cost) when considering your infrastructure options.
There is a growing job function among companies using public cloud: the Enterprise Cloud Manager. We did a study on ParkMyCloud users which showed that a growing proportion of them have "cloud" or the name of their cloud provider, such as "AWS", in their job title. This indicates a growing degree of specialization for individuals who manage cloud infrastructure, as demonstrated by their cloud computing job titles. And, in some companies, there is a dedicated role for cloud management – such as an Enterprise Cloud Manager.
Why would you need an Enterprise Cloud Manager?
The world of cloud management is constantly changing and becoming increasingly complex, even for the best cloud manager. Recently, the increased adoption of hybrid and multi-cloud environments by organizations seeking best-of-breed solutions has made cloud management more confusing, expensive, and harder to control. If someone is not fully versed in this field, they may not always know how to handle problems related to governance, security, and cost control. It is important to dedicate resources in your organization to cloud management and related cloud job roles. This chart from Gartner gives us a look at everything involved in cloud management, so we can better understand how many parts need to come together for it to run smoothly.
Having a role in your organization that is dedicated to cloud management allows others, who are not specialized in that field, to focus on their jobs, while also centralizing responsibility. With the help of an Enterprise Cloud Manager, responsibilities are delegated appropriately to ensure cloud environments are handled according to best practices in governance, security, and cost control.
After all, adopting public cloud infrastructure does not by itself address governance or cost issues – which seems rather obvious when you consider that entire sub-industries have been created around these problems. Yet you'd be surprised how often eager adopters assume the technology will do the work, forgetting that cloud management is not a technological problem but a human behavior problem.
Similarly, businesses with a presence in the cloud, regardless of their size, should consider adopting the functions of a Cloud Center of Excellence (CCoE) – which, if the resources are available, can be like an entire department of Enterprise Cloud Managers. Essentially, a CCoE brings together cross-functional teams to manage cloud strategy, governance, and best practices, and to serve as cloud leaders for the entire organization.
The role of an Enterprise Cloud Manager or Cloud Center of Excellence (or cloud operations center, or cloud enablement team – whatever you want to call it) is to oversee cloud operations. They know the ins and outs of cloud management, so they are able to create processes for resource provisioning and services. Their focus is on optimizing infrastructure, which helps streamline cloud operations, improve productivity, and optimize cloud costs.
Moreover, the Enterprise Cloud Manager can systematize the foundation that creates a CCoE with some key guiding principles like the ones outlined by AWS Cloud Center of Excellence here.
With the Enterprise Cloud Manager's leadership, DevOps, CloudOps, Infrastructure, and Finance teams within the CCoE can ensure that the organization's diverse business units use a common set of best practices to spearhead their cloud efforts, while maintaining the balanced working relationships, operational efficiency, and innovative thinking needed to achieve organizational goals.
A Note on Job Titles
It’s worth noting that while descriptive, the “Enterprise Cloud Manager” title isn’t necessarily something widely adopted. We’ve run across folks with titles in Cloud Manager, Cloud Operations Manager, Cloud Project Manager, Cloud Infrastructure Manager, Cloud Delivery Manager, etc.
If you’re on the job hunt, we have a few other ideas for cloud and AWS jobs for you to check out.
Automation Tools are Essential
With so much going on in this space, it isn’t possible to expect just one person or a team to manage all of this by hand – you need automation tools. The great thing is that these tools deliver tangible results that make automation a key component for successful enterprise cloud operations and work for companies of any size. Primary users can be people dedicated to this full time, such as an Enterprise Cloud Manager, as well as people managing cloud infrastructure on top of other responsibilities.
Why are these tools important? They provide two main things: visibility into your infrastructure and the ability to act on the resulting recommendations. (That is, unless you're willing to let go of the steering wheel and let the platform make the decisions – but most folks aren't, yet.) Customers that were once managing resources manually are now saving time and money by implementing automation tools. Take a look at the automation tools offered by your cloud vendor, as well as the third-party tools available for cost optimization and beyond. Setting up these tools will lessen the need for routine check-ins and maintenance while ensuring your infrastructure is optimized.
Do we really need this role?
To put it simply, if you have more than a handful of cloud instances: yes. If you’re small, it may be part of someone’s job description. If you’re large, it may be a center of excellence.
But if you want your organization to be well informed and up to date, then it is important that you have the organizational roles in place to oversee your cloud operations – an Enterprise Cloud Manager, CCoE and automation tools.