The next plain on the cost optimization frontier for ParkMyCloud is cloud sizing. We have been working on product features around resource sizing that will deliver greater automation in the management of cloud infrastructure. A key part of this effort has involved analysis of cloud usage patterns across our entire user base. We’ve identified some interesting patterns and correlations in cloud sizing and usage.
vCPU Utilization Patterns: Lower than Expected
One data point that caught our attention was vCPU metric data, specifically the very low average (and peak) utilization we see in our users’ infrastructure. We know anecdotally that a large proportion of what users manage in our platform consists of non-production instances used for development, staging, testing, and data analytics workloads, many of which do not need to run 24/7/365. But even bearing this in mind, we see a surprisingly low vCPU utilization. Based on our most recent analysis of instances from across the four public cloud providers we support, some 50% of instances had an average vCPU of only 2% and a peak of 55%. Even at the 75th percentile, average utilization was only 7%, albeit with a peak of 98%.
What leads to these cloud sizing decisions?
Of course, when selecting instance sizes and types, vCPU is not the only consideration. To make an accurate assessment of the match between workload and instance type, there are several data points to consider, including memory, network, disk, etc. We have no visibility into the specific workloads on these instances and why they were chosen, but we can make some educated guesses about why this systematic overprovisioning of instances is occurring.
A few potential reasons include:
- A need to provision instances with larger vCPUs in order to access instances with the required memory
- A need to provision larger storage-optimized instances where the focus is is high data IOPS
- Using some other ‘rule of thumb’ when provisioning such as the not-so-tried-and-tested ‘determine what I think I need then double it’ rule.
Clearly, there are a number of options which drive the performance and cost of cloud instances (VMs) including: the number of processor cores; the amount of RAM, storage capacity and storage performance, etc. Just focusing on one of these factors might not be overly useful, other than that we observe such extreme underutilization of one of these key components.
How much do cloud sizing choices matter?
Given the sheer volume of workloads moving to public cloud — some 80% of enterprises reported moving workloads to cloud in 2017 — it is critical to accurately determine, monitor and then optimize your compute resources. If you think there’s a problem with improper cloud sizing in your environment, you may want to check out our recently published cloud waste checklist to identify other problem areas and take action to reduce costs.
There are many reasons why this “supersize me” approach to cloud sizing is occurring. We would be interested to get your take. How does your team determine compute requirements for cloud workloads? Are there other reasons why you might deliberately choose to oversize a resource? Comment below to let us know.