Today we’re going to look at an interesting trend we are seeing toward the use of custom machine types in Google Cloud Platform. One of the interesting byproducts of managing the ParkMyCloud platform is that we get to see changes and trends in cloud usage in real time. Since we’re directly at the customer level, we can often see these changes before they are spotted by official cloud industry commentators. They start off as small signals in the noise, but practice has allowed us to see when something is shifting and a trend is emerging – as is the case with these custom machine types.
Over the last year, the shift toward greater use of Custom Machine Types (launched on Google Compute Engine in 2016) and, to a lesser extent, CPU-optimized EC2 instances (launched on AWS in 2018) is just such a signal, and one we have observed growing in strength. Interestingly, Microsoft has yet to offer an equivalent on Azure.
What do GCE custom machine types let you do?
Custom machine types let you build a bespoke instance to match the specific needs of your workload. Many workloads map well to off-the-shelf instance types, but for many others it is now possible to build a better-matched machine at a more cost-effective price. The growth in adoption of these custom types suggests that users are finding real benefits in their availability.
So what are the benefits of these customized machines? First, they provide a granular level of control to match the needs of your specific application workloads. Without that control, you have to compromise by selecting the closest available instance type to your optimal configuration. Such compromises typically lead to over-provisioning, a situation we see across the board among our customer base. When we analyzed usage of the instances in our platform this summer, we found that across all of them, the average vCPU utilization was less than 10%!
Second, they allow you to finely tune your machine to maximize the cost effectiveness of your infrastructure. Google claims savings of up to 50% when using its customized options compared to traditional predefined instances, which we believe is a reasonable assessment given how often we see standard instance types massively overprovisioned.
On GCE, the variables that you can configure include:
- Number and type of vCPUs;
- Number and type of GPUs;
- Memory size (though there are limits on the maximum memory per vCPU).
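To make the pricing model concrete, here is a minimal sketch of how a custom machine type's price is built up from per-resource rates rather than from a fixed instance-type price list. The rates and the memory-per-vCPU bounds below are illustrative assumptions, not current GCE pricing; real rates vary by region and change over time.

```python
# Sketch: custom machine pricing = per-vCPU rate + per-GB-of-memory rate.
# NOTE: these rates are illustrative placeholders, not actual GCE pricing.
VCPU_HOURLY_RATE = 0.033174   # assumed $/vCPU-hour
MEM_HOURLY_RATE = 0.004446    # assumed $/GB-hour

# GCE enforces a memory-per-vCPU window for custom machine types
# (roughly 0.9 GB to 6.5 GB per vCPU without extended memory).
MIN_GB_PER_VCPU = 0.9
MAX_GB_PER_VCPU = 6.5

def custom_machine_hourly_price(vcpus: int, memory_gb: float) -> float:
    """Estimate the hourly on-demand price of a custom machine shape."""
    gb_per_vcpu = memory_gb / vcpus
    if not MIN_GB_PER_VCPU <= gb_per_vcpu <= MAX_GB_PER_VCPU:
        raise ValueError(f"{gb_per_vcpu:.2f} GB/vCPU is outside the allowed range")
    return vcpus * VCPU_HOURLY_RATE + memory_gb * MEM_HOURLY_RATE
```

For example, a 6 vCPU / 20 GB shape (which has no predefined n1 equivalent) can be priced directly, instead of rounding up to the nearest standard type.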
Sustained Use Discounts and Preemptible VM Discounts are also available for custom machine types on GCE, making this an even more attractive option.
On AWS, customization options are currently more limited, covering only the number of vCPUs, and they are aimed at per-core software licensing constraints rather than cost optimization. It will be interesting to see whether AWS follows Google in the coming months by opening up cost-based customization and effectively unbundling fixed off-the-shelf instance types.
Should you use custom machine types?
So just because customization is an option, is it something you should actually pursue? You will pay a small premium compared to similarly sized standard instances/VMs, but because you can optimize for specific workloads, the overall cost will often be lower. Making that assessment requires examining your application's resource use and performance requirements, which in turn means carefully analyzing your infrastructure utilization data. This quickly gets complex, although a number of platforms support thorough analytics and data visualization. Ideally, such analytics would be combined with the ability to recommend specific cost-effective custom configurations and to automate their provisioning.
Watch this space for more news on custom machine types!
The next frontier in cost optimization for ParkMyCloud is cloud sizing. We have been working on product features around resource sizing that will deliver greater automation in the management of cloud infrastructure. A key part of this effort has involved analyzing cloud usage patterns across our entire user base, and we have identified some interesting patterns and correlations in cloud sizing and usage.
vCPU Utilization Patterns: Lower than Expected
One data point that caught our attention was vCPU metric data, specifically the very low average (and peak) utilization we see in our users’ infrastructure. We know anecdotally that a large proportion of what users manage in our platform consists of non-production instances used for development, staging, testing, and data analytics workloads, many of which do not need to run 24/7/365. But even bearing this in mind, we see surprisingly low vCPU utilization. Based on our most recent analysis of instances from across the four public cloud providers we support, some 50% of instances had an average vCPU utilization of only 2% and a peak of 55%. Even at the 75th percentile, average utilization was only 7%, albeit with a peak of 98%.
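The percentile figures above come down to simple order statistics over per-instance utilization samples. The sketch below shows the kind of calculation involved, run over a small hypothetical sample; a real analysis would pull these numbers from CloudWatch, Azure Monitor, or Stackdriver metrics.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value covering p percent of the sample."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

# Hypothetical per-instance average vCPU utilization figures (percent).
avg_utilization = [1, 2, 2, 3, 4, 5, 7, 9, 12, 35]
median = percentile(avg_utilization, 50)  # half of instances sit at or below this
p75 = percentile(avg_utilization, 75)     # 75th percentile of the sample
```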
What leads to these cloud sizing decisions?
Of course, when selecting instance sizes and types, vCPU is not the only consideration. To make an accurate assessment of the match between workload and instance type, there are several data points to consider, including memory, network, disk, etc. We have no visibility into the specific workloads on these instances and why they were chosen, but we can make some educated guesses about why this systematic overprovisioning of instances is occurring.
A few potential reasons include:
- A need to provision instances with more vCPUs than necessary in order to get the required memory
- A need to provision larger storage-optimized instances where the focus is high IOPS
- Using some other ‘rule of thumb’ when provisioning, such as the not-so-tried-and-tested ‘determine what I think I need, then double it’ rule.
Clearly, a number of factors drive the performance and cost of cloud instances (VMs), including the number of processor cores, the amount of RAM, and storage capacity and performance. Focusing on any one of these factors in isolation is of limited use, but the extreme underutilization we observe in such a key component is hard to ignore.
How much do cloud sizing choices matter?
Given the sheer volume of workloads moving to public cloud — some 80% of enterprises reported moving workloads to cloud in 2017 — accurately determining, monitoring, and then optimizing your compute resources is critical. If you think there’s a problem with improper cloud sizing in your environment, you may want to check out our recently published cloud waste checklist to identify other problem areas and take action to reduce costs.
There are many reasons why this “supersize me” approach to cloud sizing is occurring. We would be interested to get your take. How does your team determine compute requirements for cloud workloads? Are there other reasons why you might deliberately choose to oversize a resource? Comment below to let us know.
Given that spring is very much in the air – at least it is here in Northern Virginia – our attention has turned to tidying up the yard and getting things in good shape for summer. While things are not so seasonally-focused in the world of cloud, the metaphor of taking time out to clean things up applies to unused cloud resources as well. We have even seen some call this ‘cloud pruning’ (not to be confused with the Japanese gardening method).
Cloud pruning is important for improving both the cost and performance of your infrastructure. So what are some of the ways you can go about cleaning up, optimizing, and ensuring that your cloud environments are in great shape?
Delete Old Snapshots
Let’s start with items we no longer need. One of the most common types of unused cloud resources is old snapshots: snapshots of your EBS volumes on AWS, your storage disks (blobs) on Azure, and your persistent disks on GCP. If you have a backup strategy in place, you likely already understand the need to manage the number of snapshots you keep for a particular volume and to delete older, unneeded ones. Cleaning these up immediately saves on storage costs, and there are a number of best practices documenting how to streamline this process, as well as free and paid tools to support it.
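The retention logic behind this kind of cleanup is straightforward, as the sketch below shows: given snapshot records, select the ones older than a retention window. The record shape here is a simplified assumption; in practice the records would come from an API call such as boto3's `ec2.describe_snapshots()`, and the selected ids would be passed to `ec2.delete_snapshot()` after whatever safety checks your backup policy requires.

```python
from datetime import datetime, timedelta, timezone

def snapshots_to_prune(snapshots, retention_days=30, now=None):
    """Return ids of snapshots whose start_time is past the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [s["id"] for s in snapshots if s["start_time"] < cutoff]

# Hypothetical snapshot records with a 30-day retention policy.
snaps = [
    {"id": "snap-old", "start_time": datetime(2018, 1, 1, tzinfo=timezone.utc)},
    {"id": "snap-new", "start_time": datetime(2018, 5, 25, tzinfo=timezone.utc)},
]
stale = snapshots_to_prune(snaps, 30, now=datetime(2018, 6, 1, tzinfo=timezone.utc))
```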
Delete Old Machine Images
A machine image provides the information required to launch an instance (a virtual server in the cloud). In AWS these are called AMIs, in Azure Managed Images, and in GCP Custom Images. When an image is no longer needed, you can deregister it. However, you are likely to continue to incur costs, because the snapshot created alongside the image will typically continue to incur storage charges. So when you are finished with an AMI, be sure to also delete its accompanying snapshot. Managing old images does take work, but both the cloud providers and third-party vendors offer methods to streamline cleaning up this type of unused cloud resource.
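The key point above is that removing an AMI is a two-step job. The sketch below builds the full cleanup plan for one image, using a simplified record shape (an assumption on our part); with boto3 the real calls would be `ec2.deregister_image()` followed by `ec2.delete_snapshot()` for each referenced snapshot id.

```python
# Sketch: an AMI's block device mappings reference the EBS snapshot(s)
# created with the image. Deregistering the image alone leaves those
# snapshots, and their storage cost, behind.
def cleanup_plan(image):
    """Return the ordered deregister/delete steps to fully remove one AMI."""
    steps = [("deregister_image", image["id"])]
    for mapping in image.get("block_device_mappings", []):
        snap_id = mapping.get("ebs", {}).get("snapshot_id")
        if snap_id:  # instance-store mappings have no snapshot to delete
            steps.append(("delete_snapshot", snap_id))
    return steps
```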
Consolidate Containerized Workloads
With the widespread adoption of containers in the last few years, and much of the focus on their specific benefits, few have paid attention to ensuring these containers are optimized for performance and cost. One of the most effective ways to maximize the benefits of containers is to host multiple containerized application workloads on a single larger instance (typically a large or x-large VM) rather than on a number of smaller, separate VMs. This is particularly useful in dev and test environments, where you may have only one machine available to deploy to, rather than in production. As containerization continues to evolve, services such as AWS Fargate are enabling much finer control of the resources required to run your containers than is available with traditional VMs. In particular, the ability to specify the exact CPU and memory your code requires means that the amount you pay scales exactly with how many containers you are running.
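With Fargate-style pricing, right-sizing becomes a matter of picking the smallest task size that covers a container's needs. The sketch below illustrates the idea; the (CPU units, memory MB) combinations listed are a truncated subset reflecting Fargate's published options at the time of writing, so check the current AWS documentation before relying on them.

```python
# Subset of Fargate task-size combinations (CPU units, memory MB),
# sorted smallest-first. Larger 2048- and 4096-CPU tiers omitted.
FARGATE_SIZES = [
    (256, 512), (256, 1024), (256, 2048),
    (512, 1024), (512, 2048), (512, 3072), (512, 4096),
    (1024, 2048), (1024, 3072), (1024, 4096), (1024, 5120),
    (1024, 6144), (1024, 7168), (1024, 8192),
]

def smallest_task_size(cpu_needed, mem_needed_mb):
    """Return the first listed (cpu, memory) combo that satisfies both needs."""
    for cpu, mem in FARGATE_SIZES:
        if cpu >= cpu_needed and mem >= mem_needed_mb:
            return (cpu, mem)
    raise ValueError("workload exceeds the listed task sizes")
```

A workload needing roughly half a vCPU (500 units) and 1 GB of memory would map to the (512, 1024) combination rather than a whole VM.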
So alongside pruning your trees or sweeping your deck and taking care of your outside spaces this spring, remember to take a look around your cloud environment and look for opportunities to remove unused cloud resources to optimize not only for cost, but also performance.
I have recently spent an increasing amount of time discussing (and arguing) whether the cost per instance in cloud computing is going up or down. The reason: while objective analysis by reputable third parties shows that computing costs are falling, what we observe from our own standpoint is that the average cost per instance that customers manage in the ParkMyCloud platform is actually increasing. Following a recent blog by our CTO (The Cost of Cloud Computing Is, in Fact, Dropping Dramatically), we decided to undertake some more detailed analysis of this phenomenon.
We identified a cohort of our customers who had been with ParkMyCloud for at least one full year and looked at what happened to their average cost per instance over that period. What we discovered was that the average cost per instance, as charged by the cloud provider, had indeed increased from $214 to $329 per instance per month for our customers using Amazon, Microsoft and Google clouds – roughly a 54% increase. Set against the backdrop of the reported falling costs of cloud computing, this clearly seems to be an anomaly. Or is it?
Digging a little deeper, we discovered that two-thirds of our customers were spending an increased amount per instance per month over the last 12 months and only one third were paying the same amount or less than before. Interestingly, of those who saw a price increase, one third saw their average cost per instance increase by more than 25%.
So what do we think is happening? One possible explanation is something we will refer to as The Apple Upgrade Syndrome. Each time there is an iPhone upgrade cycle, Apple’s product marketing gurus carefully price the new products — and they also adjust the pricing on their older products. When we walk into the Apple Store to peruse the new offerings, we have a clear choice of either purchasing the previous flagship model at a discounted price, or the new, sexy upgraded model at a price premium. A rational actor should buy the discounted model, which just the day before was hundreds of dollars more. But that’s not what most of us do. What we want is the new model with the additional bells and whistles (e.g. face tracking technology and studio lighting settings for the camera) and are willing to pay the extra. As a result, despite the overall cost of mobile computing falling, your monthly phone bill keeps increasing.
We believe the same phenomenon is at work in cloud computing: when new generations of instances are released, buyers decide to trade up to these newer, more powerful instances (more cores, more memory, etc.), despite the fact that previous generations may actually have their prices reduced. So while Amazon, Microsoft, or Google might pronounce a “25 percent improvement in price-performance” for a new generation of instances, the reality is that the new instances cost more and therefore drive up monthly spend.
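A small worked example makes the apparent paradox concrete: price-performance can improve by 25% even while the absolute monthly bill rises. All of the numbers below are hypothetical.

```python
def price_performance_gain(old_price, old_perf, new_price, new_perf):
    """Fractional improvement in performance-per-dollar across generations."""
    return (new_perf / new_price) / (old_perf / old_price) - 1

# Hypothetical generations: the new instance is 25% more expensive per
# month but delivers 56.25% more performance.
old_price, old_perf = 100.0, 1.0
new_price, new_perf = 125.0, 1.5625

gain = price_performance_gain(old_price, old_perf, new_price, new_perf)
# gain == 0.25: price-performance is up 25%, yet monthly spend rose 25% too.
```

Both the vendor's "25% better price-performance" claim and a 25% higher monthly bill are simultaneously true here, which is exactly the pattern we see in our cohort data.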
Next, we’ll share a more in-depth analysis that will review the instance types driving these increases. At the end of the day, we are all likely correct. The cost of cloud computing is indeed going down, but the average cost per instance is actually going up.