As container technology moves from novelty into the mainstream, users are turning to the next step: container optimization. Containers have been a consistent topic in our conversations with customers and potential customers for the last few years, typically focused on production environments. Recently, however, those conversations have narrowed to a specific question: how to optimize container spending.
Kubernetes, which appears to be the most popular container platform among our customer base, offers a number of ways to optimize for cost and maximize performance. We have identified five specific opportunities for container optimization. Take a look at each of them within your own environments.
1) Rightsize Your Pods
Kubernetes Pods are the smallest deployable computing units in a Kubernetes environment. It is common practice to provision pods from a standard template of requests and limits. Requests describe the minimum CPU and memory a pod needs in order to be scheduled on a node, while limits describe the maximum CPU and memory the pod can consume on that node. Engineers typically set initial values using a rule of thumb, such as doubling their estimate just to be on the safe side, and plan to adjust them later once they have real usage data to look at. As with many things in life, “later” rarely happens. As a result, the cluster’s footprint inflates over time, exceeding the actual demand of the services running inside it.
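The “later” step, revisiting requests once usage data exists, can be sketched as follows. This is a minimal illustration, assuming you already collect CPU usage samples per pod; the 95th-percentile-plus-15%-headroom policy and the functions `percentile` and `recommend_request` are hypothetical choices, not a Kubernetes feature.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]

def recommend_request(usage_samples, pct=95, headroom=0.15):
    """Suggested request: pct-th percentile of observed usage plus headroom.

    Both the percentile and the headroom are illustrative policy choices.
    """
    return percentile(usage_samples, pct) * (1 + headroom)

# Hypothetical CPU usage samples for one pod, in millicores.
cpu_millicores = [120, 135, 150, 110, 160, 140, 180, 125, 130, 145]
current_request = 500  # e.g., a template value set by doubling a guess

suggested = recommend_request(cpu_millicores)
print(f"current request: {current_request}m, suggested: {suggested:.0f}m")
```

The suggested value would then be written back into the pod template’s `resources.requests`; limits can be revisited the same way using peak rather than percentile usage.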
Just think about it: if every pod requests twice the resources it actually uses (so half of each allocation sits idle) and the cluster is always 80% full, then 40% of the cluster’s capacity is allocated but not used, or, simply put, wasted.
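The waste estimate is the allocated share of the cluster multiplied by the idle share of each request. A small sketch, assuming every pod over-requests by the same ratio, also shows why the way over-provisioning is expressed matters: requesting 2x actual usage leaves half of each request idle, while requesting 1.5x leaves one third idle.

```python
def unused_share(request_to_usage_ratio):
    """Share of a request left idle when request = ratio * actual usage."""
    return 1 - 1 / request_to_usage_ratio

def wasted_fraction(cluster_allocated, request_to_usage_ratio):
    """Share of total cluster capacity that is allocated but never used."""
    return cluster_allocated * unused_share(request_to_usage_ratio)

print(f"{wasted_fraction(0.80, 2.0):.1%}")  # requests are double actual usage
print(f"{wasted_fraction(0.80, 1.5):.1%}")  # requests are 1.5x actual usage
```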
2) Turn Off Idle Pods
Many standard instances/VMs and databases in non-production environments sit idle outside working hours and can be turned off, or “parked”. The same is true of Pods: in non-production environments, they can and should be scheduled to shut down in the same way.
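A parking schedule for pods boils down to deciding, at any moment, how many replicas a non-production Deployment should have. The sketch below assumes a hypothetical Monday to Friday, 08:00 to 19:00 schedule; the function name and schedule constants are illustrative only.

```python
from datetime import datetime

# Hypothetical parking schedule: run Monday-Friday, 08:00-19:00 local time.
WORK_DAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday == 0
START_HOUR, END_HOUR = 8, 19  # END_HOUR is exclusive

def desired_replicas(now: datetime, daytime_replicas: int) -> int:
    """Replica count to apply to a non-production Deployment at time `now`."""
    in_hours = now.weekday() in WORK_DAYS and START_HOUR <= now.hour < END_HOUR
    return daytime_replicas if in_hours else 0

print(desired_replicas(datetime(2024, 5, 15, 10, 0), 3))  # a Wednesday morning
print(desired_replicas(datetime(2024, 5, 18, 10, 0), 3))  # a Saturday
```

Applying the result could be as simple as a scheduled job running `kubectl scale deployment <name> --replicas=N`; scaling to zero stops the pods, and the freed node capacity can then be reclaimed by the cluster autoscaler, if one is configured.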
3) Rightsize Your Nodes
Many worker nodes are the wrong size or type. Kubernetes lets you co-locate applications on the same nodes, which can dramatically reduce the cloud bill, yet incorrectly sized instances and volumes can inflate the cost of Kubernetes clusters. Rightsizing could save up to 50%, particularly if no previous action has been taken to rightsize your nodes.
Another thing to consider is that smaller nodes carry a higher relative OS footprint and increase management overhead. The smaller the node, the larger the share of stranded resources: CPU or memory that is idle yet cannot be allocated to any pod, because the pods waiting to be scheduled are too big to fit in what remains. The closer a pod’s size is to the size of the node (server), the higher the percentage of stranded resources tends to be.
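Stranded capacity is easy to quantify for the simple case of identical pods. This sketch assumes pods that each request 1.5 vCPU and ignores OS/kubelet reservations and other dimensions such as memory, so it is an illustration rather than a capacity planner.

```python
def stranded_fraction(node_capacity: float, pod_size: float) -> float:
    """Share of a node's capacity left unusable when packing identical pods.

    Ignores OS/kubelet reservations and non-CPU dimensions such as memory.
    """
    pods_that_fit = int(node_capacity // pod_size)
    leftover = node_capacity - pods_that_fit * pod_size
    return leftover / node_capacity

# Hypothetical pods requesting 1.5 vCPU each, on nodes of increasing size.
for vcpus in (2, 4, 16, 64):
    print(f"{vcpus:>2} vCPU node: {stranded_fraction(vcpus, 1.5):.1%} stranded")
```

On a 4 vCPU node, a quarter of the CPU is stranded; on a 64 vCPU node, the same leftover vCPU is under 2% of capacity.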
4) Consider Storage Opportunities
Out of the box, a container loses the data written to its filesystem when it restarts. This is fine for stateless components but becomes an issue when a persistent data store is required. One place to look for additional container optimization opportunities is overprovisioned persistent storage (EBS, Azure Storage Disks, etc.) attached to your containers. There are a number of options for optimizing container storage, particularly virtualized storage that can be shared by multiple containers and that persists over time rather than being destroyed along with individual containers. Several persistent-storage plugins and plugin-driven storage solutions are available from third-party vendors.
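Finding overprovisioned persistent storage starts with comparing provisioned size to actual usage. A minimal sketch, assuming the usage figures come from whatever monitoring you already run; the volume names, sizes, and the 30% utilization threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Volume:
    name: str
    provisioned_gib: int
    used_gib: float

def overprovisioned(volumes, max_utilization=0.3):
    """Return (name, utilization) for volumes below the utilization threshold."""
    flagged = []
    for v in volumes:
        utilization = v.used_gib / v.provisioned_gib
        if utilization < max_utilization:
            flagged.append((v.name, round(utilization, 2)))
    return flagged

# Hypothetical persistent volumes backing container workloads.
volumes = [
    Volume("postgres-data", 500, 42.0),
    Volume("redis-aof", 50, 31.5),
    Volume("build-cache", 200, 12.0),
]
print(overprovisioned(volumes))
```

Flagged volumes are candidates for review, not automatic shrinking: many block storage services can grow a volume in place but not shrink it, so reclaiming space may require migrating data to a smaller volume.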
5) Review Purchasing Options
All of the preceding options relate to the actual configuration of your container infrastructure. Just as important is making sure your purchasing options closely align with your needs. Choosing the correct instance/VM purchase type for your containerized infrastructure is critical to maintaining flexibility and maximizing ROI. Carefully analyze your purchasing options (e.g., on-demand, reservations, and spot) to select the right one for each workload, in terms of both size and usage schedule. Note that reserved instances are not always the best option for resources that can be scheduled to be turned off, because a reservation is paid for whether or not the instance is running. Leverage cost optimization tools to support the earlier options for instance scheduling and rightsizing; such tools can often change the equation and help avoid lock-ins and upfront commitments.
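The trade-off between parking and reserving can be checked with simple arithmetic. The rates below are hypothetical placeholders, not real prices, and the 730-hour month is the usual billing approximation; the point is the breakeven: a reservation only wins once the instance runs more hours per month than the reserved-to-on-demand price ratio times 730.

```python
HOURS_PER_MONTH = 730  # common billing approximation

def on_demand_monthly(rate_per_hour, hours_running):
    """On-demand cost: pay only for the hours the instance actually runs."""
    return rate_per_hour * hours_running

def reserved_monthly(effective_rate_per_hour):
    """A reservation is paid for every hour, whether or not the instance runs."""
    return effective_rate_per_hour * HOURS_PER_MONTH

def breakeven_hours(on_demand_rate, reserved_rate):
    """Running hours per month above which the reservation becomes cheaper."""
    return reserved_rate * HOURS_PER_MONTH / on_demand_rate

# Hypothetical rates; real prices vary by provider, region, type, and term.
od_rate, ri_rate = 0.10, 0.06
parked_hours = 12 * 5 * 52 / 12  # 12 h/day on weekdays, averaged per month

print(f"on-demand, parked: ${on_demand_monthly(od_rate, parked_hours):.2f}")
print(f"reserved:          ${reserved_monthly(ri_rate):.2f}")
print(f"breakeven:         {breakeven_hours(od_rate, ri_rate):.0f} h/month")
```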
Container Optimization is Just Another Kind of Resource Optimization
The opportunities to save money through container optimization are, in essence, no different from those for your non-containerized resources. Native tools, from either the cloud provider or open source, can help, but their capabilities are limited. For a fully optimized environment, you’ll want to take advantage of the growing ecosystem of dedicated cost optimization tools.
Stay tuned for news from ParkMyCloud on this front coming soon!
The next step on the cost optimization frontier for ParkMyCloud is cloud sizing. We have been working on product features around resource sizing that will deliver greater automation in the management of cloud infrastructure. A key part of this effort has been analyzing cloud usage patterns across our entire user base, and we’ve identified some interesting patterns and correlations in cloud sizing and usage.
vCPU Utilization Patterns: Lower than Expected
One data point that caught our attention was vCPU utilization, specifically the very low average (and peak) utilization we see in our users’ infrastructure. We know anecdotally that a large proportion of what users manage in our platform consists of non-production instances used for development, staging, testing, and data analytics workloads, many of which do not need to run 24/7/365. But even bearing this in mind, vCPU utilization is surprisingly low. In our most recent analysis of instances across the four public cloud providers we support, the median instance (50th percentile) had an average vCPU utilization of only 2% and a peak of 55%. Even at the 75th percentile, average utilization was only 7%, albeit with a peak of 98%.
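One plausible way to produce percentile figures like these is to reduce each instance’s utilization samples to an average and a peak, then take percentiles across the fleet. This is a sketch under that assumption, with hypothetical instances and samples; note that the average and peak percentiles are computed independently, so the p75 average and p75 peak may come from different instances.

```python
import math
from statistics import mean

def nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def utilization_summary(samples_by_instance, pct):
    """pct-th percentile of per-instance average and peak vCPU utilization."""
    averages = [mean(s) for s in samples_by_instance.values()]
    peaks = [max(s) for s in samples_by_instance.values()]
    return nearest_rank(averages, pct), nearest_rank(peaks, pct)

# Hypothetical hourly vCPU utilization samples (%) for four instances.
fleet = {
    "dev-api":   [1, 2, 1, 3, 40],
    "test-db":   [5, 6, 4, 7, 60],
    "staging":   [2, 2, 3, 2, 25],
    "analytics": [20, 35, 50, 10, 98],
}
for pct in (50, 75):
    avg, peak = utilization_summary(fleet, pct)
    print(f"p{pct}: average {avg:.1f}%, peak {peak}%")
```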
What leads to these cloud sizing decisions?
Of course, vCPU is not the only consideration when selecting instance sizes and types. An accurate assessment of the match between workload and instance type has to weigh several other data points, such as memory, network, and disk. We have no visibility into the specific workloads on these instances or why they were chosen, but we can make some educated guesses about why this systematic overprovisioning is occurring.
A few potential reasons include:
- A need to provision instances with more vCPUs in order to get the required memory
- A need to provision larger storage-optimized instances where the focus is on high storage IOPS
- Using some other ‘rule of thumb’ when provisioning, such as the not-so-tried-and-tested ‘determine what I think I need, then double it’ rule
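The first reason is easy to see in miniature: when an instance family ties vCPUs to memory at a fixed ratio, a memory-hungry workload drags in vCPUs it will never use. The catalog, names, and prices below are hypothetical; real providers also offer memory-optimized families with more memory per vCPU, which is one way out of this trap.

```python
from typing import NamedTuple, Optional

class InstanceType(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int
    hourly_price: float

# Hypothetical catalog with a fixed 1 vCPU : 4 GiB ratio.
CATALOG = [
    InstanceType("general-small", 2, 8, 0.10),
    InstanceType("general-medium", 4, 16, 0.20),
    InstanceType("general-large", 8, 32, 0.40),
    InstanceType("general-xlarge", 16, 64, 0.80),
]

def cheapest_fit(vcpus_needed, memory_needed_gib, catalog=CATALOG) -> Optional[InstanceType]:
    """Cheapest type that satisfies both the vCPU and the memory requirement."""
    candidates = [t for t in catalog
                  if t.vcpus >= vcpus_needed and t.memory_gib >= memory_needed_gib]
    return min(candidates, key=lambda t: t.hourly_price) if candidates else None

# A workload that needs about 1 vCPU but 30 GiB of RAM.
choice = cheapest_fit(1, 30)
print(choice.name, f"- expected vCPU utilization about {1 / choice.vcpus:.1%}")
```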
Clearly, a number of factors drive the performance and cost of cloud instances (VMs), including the number of processor cores, the amount of RAM, and storage capacity and performance. Focusing on just one of these factors may not be very useful on its own, except that the underutilization we observe in this key component is so extreme.
How much do cloud sizing choices matter?
Given the sheer volume of workloads moving to public cloud (some 80% of enterprises reported moving workloads to cloud in 2017), it is critical to accurately determine, monitor, and then optimize your compute resources. If you think improper cloud sizing is a problem in your environment, you may want to check out our recently published cloud waste checklist to identify other problem areas and take action to reduce costs.
There are many reasons why this “supersize me” approach to cloud sizing is occurring. We would be interested to get your take. How does your team determine compute requirements for cloud workloads? Are there other reasons why you might deliberately choose to oversize a resource? Comment below to let us know.
Given that spring is very much in the air (at least it is here in Northern Virginia), our attention has turned to tidying up the yard and getting things in good shape for summer. While things are not so seasonally focused in the world of cloud, the metaphor of taking time out to clean things up applies to unused cloud resources as well. We have even seen some people call this ‘cloud pruning’ (not to be confused with the Japanese gardening method).
Cloud pruning is important for improving both the cost and the performance of your infrastructure. So what are some of the ways you can go about cleaning up, optimizing, and ensuring that your cloud environments are in great shape?
Delete Old Snapshots
Let’s start by focusing on items we no longer need. One of the most common types of unused cloud resources is old snapshots: point-in-time copies of your EBS volumes on AWS, your storage disks (blobs) on Azure, and your persistent disks on GCP. If you have a backup strategy in place, you likely already understand the need to manage the number of snapshots you keep for a particular volume and to delete older, unneeded ones. Cleaning these up immediately saves on storage costs, and there are a number of documented best practices for streamlining this process, as well as free and paid tools to support it.
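A snapshot retention policy can be expressed as a small selection function. This sketch assumes a hypothetical policy (keep the newest N snapshots per volume, and never delete anything younger than 30 days) and works on plain tuples; listing and deleting the snapshots themselves would go through your provider’s snapshot APIs, which are left out here. The snapshot and volume IDs are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def snapshots_to_delete(snapshots, keep_latest=3, max_age_days=30, now=None):
    """Select snapshots that fall outside a simple retention policy.

    snapshots: iterable of (snapshot_id, volume_id, created_at) tuples.
    A snapshot is kept if it is among the `keep_latest` newest for its volume
    or younger than `max_age_days`; the rest are returned for deletion.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    by_volume = defaultdict(list)
    for snapshot_id, volume_id, created_at in snapshots:
        by_volume[volume_id].append((created_at, snapshot_id))

    to_delete = []
    for entries in by_volume.values():
        entries.sort(reverse=True)  # newest first
        for created_at, snapshot_id in entries[keep_latest:]:
            if created_at < cutoff:  # beyond the newest N and also old
                to_delete.append(snapshot_id)
    return to_delete

# Hypothetical snapshots for two volumes.
now = datetime(2024, 4, 1)
snaps = [
    ("snap-a", "vol-1", datetime(2024, 3, 30)),
    ("snap-b", "vol-1", datetime(2024, 3, 1)),
    ("snap-c", "vol-1", datetime(2024, 1, 15)),
    ("snap-d", "vol-1", datetime(2023, 12, 1)),
    ("snap-e", "vol-2", datetime(2023, 6, 1)),
]
print(snapshots_to_delete(snaps, keep_latest=2, now=now))
```

The lone old snapshot of `vol-2` survives because it is still that volume’s most recent backup, which is usually the behavior you want from a cleanup job.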
Delete Old Machine Images
A machine image provides the information required to launch an instance, which is a virtual server in the cloud. In AWS these are called AMIs, in Azure they’re called Managed Images, and in GCP, Custom Images. When an image is no longer needed, you can deregister it. However, depending on your configuration, you are likely to keep incurring costs, because the snapshots created along with the image typically remain and continue to incur storage charges. Therefore, when you are finished with an AMI, be sure to also delete its accompanying snapshots. Managing old AMIs does take work, but both the cloud providers and third-party vendors offer a number of methods for streamlining the management of this type of unused cloud resource.
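The follow-up check after deregistering images is to find snapshots that belonged to images which no longer exist but are still being stored. A minimal sketch, assuming you have recorded which snapshots each image was built from; the data model and IDs are hypothetical, and snapshots still used by a registered image are deliberately never flagged.

```python
def orphaned_image_snapshots(image_snapshots, registered_images, existing_snapshots):
    """Snapshots left behind by images that have been deregistered.

    image_snapshots: {image_id: [snapshot_id, ...]} recorded when each image was made
    registered_images: set of image ids that are still registered
    existing_snapshots: set of snapshot ids that still exist (and still cost money)
    """
    # Never flag a snapshot that a still-registered image depends on.
    in_use = {snap
              for image, snaps in image_snapshots.items() if image in registered_images
              for snap in snaps}
    return sorted({snap
                   for image, snaps in image_snapshots.items() if image not in registered_images
                   for snap in snaps
                   if snap in existing_snapshots and snap not in in_use})

# Hypothetical inventory.
image_snapshots = {
    "ami-old": ["snap-1", "snap-2"],
    "ami-live": ["snap-3"],
    "ami-gone": ["snap-4"],  # its snapshot was already deleted
}
print(orphaned_image_snapshots(image_snapshots, {"ami-live"},
                               {"snap-1", "snap-2", "snap-3"}))
```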
With the widespread adoption of containers in the last few years, and much of the focus on their specific benefits, few have paid attention to ensuring these containers are optimized for performance and cost. One of the most effective ways to maximize the benefits of containers is to host multiple containerized application workloads on a single larger instance (typically a large or x-large VM) rather than on a number of smaller, separate VMs. In particular, this is something you could use in your dev and test environments rather than in production, where you may have just one machine available to deploy to. As containerization continues to evolve, services such as AWS Fargate give you much more control over the resources required to run your containers than is available today with traditional VMs. In particular, you can specify the exact CPU and memory your code requires, so the amount you pay scales with the number of containers you are running.
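Consolidating workloads onto fewer, larger VMs is a bin-packing problem. The sketch below uses first-fit decreasing, a classic heuristic that is not necessarily optimal and is not how the Kubernetes scheduler places pods; the workload sizes and the 2 vCPU and 8 vCPU VM sizes are hypothetical, and only CPU is considered.

```python
def first_fit_decreasing(container_cpus, node_capacity):
    """Pack container CPU requests onto equal-size nodes, largest first.

    Returns one list of container sizes per node that gets used.
    """
    nodes = []  # each entry: [remaining_capacity, [container sizes]]
    for size in sorted(container_cpus, reverse=True):
        if size > node_capacity:
            raise ValueError(f"container of {size} vCPU cannot fit on any node")
        for node in nodes:
            if node[0] >= size:  # first node with enough room wins
                node[0] -= size
                node[1].append(size)
                break
        else:  # no existing node had room, so open a new one
            nodes.append([node_capacity - size, [size]])
    return [contents for _, contents in nodes]

# Hypothetical dev/test workloads, sized in vCPU.
workloads = [0.5, 1.5, 0.25, 1.0, 0.75, 2.0, 0.5]
print("one 2 vCPU VM per workload:", len(workloads), "VMs")
print("packed onto 8 vCPU VMs:", len(first_fit_decreasing(workloads, 8)), "VM(s)")
```

Here seven separate 2 vCPU VMs (14 vCPU in total) collapse onto a single 8 vCPU VM, because the workloads only need 6.5 vCPU between them.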
So alongside pruning your trees or sweeping your deck and taking care of your outside spaces this spring, remember to take a look around your cloud environment and look for opportunities to remove unused cloud resources to optimize not only for cost, but also performance.
I have recently spent an increasing amount of time discussing (arguing about) whether the cost per instance in cloud computing is going up or down. The reason is that while objective analysis by reputable third parties shows computing costs falling, what we observe from our own standpoint is that the average cost per instance our customers manage in the ParkMyCloud platform is actually increasing. Following on from a recent blog by our CTO (The Cost of Cloud Computing Is, in Fact, Dropping Dramatically), we decided to undertake a more detailed analysis of this phenomenon.
We identified a cohort of customers who had been with ParkMyCloud for at least one full year and looked at what happened to their average cost per instance over a one-year period. We discovered that the average cost per instance, as charged by the cloud provider, had indeed increased from $214 to $329 per instance per month for our customers using Amazon, Microsoft, and Google clouds, an increase of more than 50%. Set against the backdrop of the reported falling costs of cloud computing, this clearly seems to be an anomaly. Or is it?
Digging a little deeper, we discovered that two-thirds of our customers were spending more per instance per month than they had 12 months earlier, while only one-third were paying the same or less. Interestingly, of those who saw an increase, one-third saw their average cost per instance rise by more than 25%.
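Cohort figures like these come down to a percentage change per customer and a couple of shares. A minimal sketch: the first calculation uses the fleet-wide averages of $214 and $329 quoted in this post, while the per-customer cohort is entirely hypothetical and exists only to show the classification.

```python
def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100

def summarize_cohort(costs):
    """costs: {customer: (avg_cost_before, avg_cost_after)} per instance-month."""
    changes = [pct_change(b, a) for b, a in costs.values()]
    increased = [c for c in changes if c > 0]
    return {
        "share_increased": len(increased) / len(changes),
        "share_of_increases_over_25pct": (
            sum(c > 25 for c in increased) / len(increased) if increased else 0.0
        ),
    }

print(f"{pct_change(214, 329):.1f}%")  # 53.7%

# Hypothetical per-customer averages (dollars per instance per month).
cohort = {"a": (200, 260), "b": (150, 150), "c": (300, 390),
          "d": (100, 110), "e": (250, 240), "f": (180, 190)}
print(summarize_cohort(cohort))
```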
So what do we think is happening? One possible explanation is something we will refer to as The Apple Upgrade Syndrome. Each time there is an iPhone upgrade cycle, Apple’s product marketing gurus carefully price the new products, and they also adjust the pricing on their older ones. When we walk into the Apple Store to peruse the new offerings, we have a clear choice: purchase the previous flagship model at a discounted price, or the new, sexy upgraded model at a premium. A rational actor would buy the discounted model, which just the day before cost hundreds of dollars more. But that’s not what most of us do. We want the new model with the additional bells and whistles (e.g., face tracking technology and studio lighting settings for the camera), and we are willing to pay extra for it. As a result, even though the overall cost of mobile computing is falling, your monthly phone bill keeps increasing.
We believe the same phenomenon is at work in cloud computing: when new generations of instances are released, buyers trade up to the new, more powerful instances (e.g., more cores, more memory), even though the prices of previous generations may actually be reduced. So while Amazon, Microsoft, or Google might announce a “25 percent improvement in price-performance” for a new generation of instances, the reality is that the new instances cost more per instance and therefore drive up monthly spend.
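The two claims are compatible because price-performance and per-instance price measure different things. A worked example with hypothetical numbers: a new generation that costs 20% more per instance but does 60% more work delivers a 25% better price per unit of performance, yet a buyer who swaps instance-for-instance still sees monthly spend rise by 20%.

```python
def price_performance_gain(old_price, old_perf, new_price, new_perf):
    """Fractional reduction in price per unit of performance."""
    return 1 - (new_price / new_perf) / (old_price / old_perf)

def spend_change(old_price, new_price):
    """Fractional change in monthly spend when trading up instance-for-instance."""
    return new_price / old_price - 1

# Hypothetical instance generations (monthly price, relative performance).
old_price, old_perf = 100.0, 100.0
new_price, new_perf = 120.0, 160.0

gain = price_performance_gain(old_price, old_perf, new_price, new_perf)
print(f"price-performance improvement: {gain:.0%}")
print(f"monthly spend change: {spend_change(old_price, new_price):+.0%}")
```

Spend would only fall if buyers used the extra performance to run proportionally fewer instances, which is exactly what the upgrade behavior described above does not do.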
Next, we’ll share a more in-depth analysis of the instance types driving these increases. At the end of the day, we are all likely correct: the cost of cloud computing is indeed going down, but the average cost per instance is actually going up.