Spot instances and similar “spare capacity” models are frequently cited as one of the top ways to save money on public cloud. However, we’ve noticed that fewer cloud customers are taking advantage of this discounted capacity than you might expect.
We say “spot instances” in this article for simplicity, but each cloud provider has its own name for the sale of discounted spare capacity: AWS calls them Spot Instances, Azure calls them Spot VMs, and Google Cloud calls them preemptible VMs.
Spot instances are a purchasing option that lets users take advantage of spare capacity at a low price, with the caveat that the capacity can be reclaimed for other workloads on short notice. In AWS, for example, the customer makes a Spot Request that includes a maximum price (often called a “bid”) they are willing to pay for a spot instance. If the current Spot Price is at or below this maximum, the spot instance is started. When demand for cloud resources increases, the Spot Price rises, and shortly after it exceeds the customer’s maximum price, the instance is terminated. This lets cloud vendors sell otherwise unused resources and lets customers run workloads at a significantly lower cost, but it requires that workloads be designed to be resilient to interruptions. Could this requirement be driving users away?
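To make the start/terminate rule concrete, here is a minimal sketch of the bid-style mechanism described above for a one-time request. It is a simplified model under stated assumptions: hourly billing, made-up prices, and interruption only when the price rises above the maximum (real spot markets can also reclaim capacity for other reasons, which this ignores). The names `SpotRun`, `simulate_one_time_request` and `savings_vs_on_demand` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SpotRun:
    start_hour: Optional[int]  # hour the request was fulfilled, or None if never
    hours_run: int             # whole hours the instance ran before stopping
    cost: float                # total paid, at the spot price in effect each hour
    interrupted: bool          # True if the price rose above the maximum


def simulate_one_time_request(hourly_prices: Sequence[float], max_price: float) -> SpotRun:
    """Simplified model: start when price <= max_price, stop at the first hour it exceeds it."""
    start: Optional[int] = None
    hours = 0
    cost = 0.0
    for hour, price in enumerate(hourly_prices):
        if start is None:
            if price > max_price:
                continue  # request stays open until the price drops
            start = hour
        if price > max_price:
            return SpotRun(start, hours, cost, interrupted=True)
        hours += 1
        cost += price  # you pay the current spot price, not your maximum
    return SpotRun(start, hours, cost, interrupted=False)


def savings_vs_on_demand(run: SpotRun, on_demand_price: float) -> float:
    """Fraction saved compared with running the same hours on demand."""
    if run.hours_run == 0:
        return 0.0
    return 1.0 - run.cost / (run.hours_run * on_demand_price)


if __name__ == "__main__":
    # Illustrative prices only; the instance starts at hour 1 and is stopped at hour 4.
    run = simulate_one_time_request([0.12, 0.05, 0.06, 0.07, 0.11, 0.04], max_price=0.08)
    saved = savings_vs_on_demand(run, on_demand_price=0.10)
    print(f"ran {run.hours_run} h from hour {run.start_hour}, "
          f"interrupted={run.interrupted}, saved {saved:.0%}")
    # prints: ran 3 h from hour 1, interrupted=True, saved 40%
```

The example also shows why the choice of maximum matters: set it too low and the request never starts; set it near the on-demand price and the savings shrink toward zero.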
Spot Instances in Each Cloud
Variations of spot instances are offered across the major cloud providers. AWS has Spot Instances, Google Cloud offers preemptible VMs, and in March of this year Microsoft Azure announced an even more direct equivalent to Spot Instances, called Azure Spot Virtual Machines.
Spot VMs have replaced the preview of Azure’s low-priority VMs on scale sets – all eligible low-priority VMs on scale sets have automatically been transitioned to Spot VMs. Azure Spot VMs provide access to unused Azure compute capacity at deep discounts. Spot VMs can be evicted at any time if Azure needs capacity.
AWS spot instances have variable pricing. Azure Spot VMs have the same characteristics as pay-as-you-go virtual machines; the only differences are pricing and evictions. Google’s preemptible VMs follow a fixed discount structure, and Google’s offering is a bit more flexible, with no limitations on instance types. Preemptible VMs are designed to be a low-cost, short-duration option for batch jobs and fault-tolerant workloads.
Adoption of Spot Instances
Our research indicates that fewer than 20% of cloud users use spot instances on a regular basis, despite spot appearing on nearly every list of ways to reduce costs (including our own).
While applications can be built to withstand interruption, specific concerns remain, such as loss of log data, exhausted capacity, and fluctuations in the spot market price.
In AWS, a market issue arises when the price of a spot instance rises above its typical historical price. This can make it difficult for a customer to judge the best bid price to use, and if the spot price climbs to the on-demand price, it defeats the purpose of using Spot Instances at all. AWS addresses this problem with Spot Fleet, in which you specify a target capacity of instances you want to maintain, optionally including a baseline of On-Demand Instances. If Spot Instances are interrupted, the fleet launches replacements to restore that capacity, allowing you to take advantage of whatever discounts you can while maintaining your operations.
Another potential issue is that, in any given zone, capacity for an instance type can be completely exhausted, which prevents applications from running if they depend on that specific instance type or zone. Not to turn into a commercial for Spot Fleet, but it addresses this as well, by letting you specify a range of instance types (and subnets in different Availability Zones) that would be acceptable for your workload.
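As a sketch of what “a range of instance types” looks like in practice, the function below builds a request body in the shape of the EC2 `SpotFleetRequestConfig` structure, diversified across instance types and subnets, with an optional On-Demand baseline and optional per-type maximum prices. The function name, the `capacityOptimized` allocation strategy and the `maintain` fleet type are choices made for this illustration; the role ARN, launch template ID and subnet IDs in the usage note are hypothetical placeholders.

```python
from typing import Dict, List, Mapping, Optional, Sequence


def build_spot_fleet_config(
    fleet_role_arn: str,
    launch_template_id: str,
    instance_types: Sequence[str],
    subnet_ids: Sequence[str],
    target_capacity: int,
    on_demand_capacity: int = 0,
    max_prices: Optional[Mapping[str, str]] = None,
) -> Dict:
    """Build a SpotFleetRequestConfig dict that diversifies across types and subnets."""
    if on_demand_capacity > target_capacity:
        raise ValueError("on_demand_capacity cannot exceed target_capacity")
    overrides: List[Dict[str, str]] = []
    for itype in instance_types:
        for subnet in subnet_ids:  # one subnet per Availability Zone you want to use
            override = {"InstanceType": itype, "SubnetId": subnet}
            if max_prices and itype in max_prices:
                override["SpotPrice"] = max_prices[itype]  # optional per-type maximum
            overrides.append(override)
    return {
        "IamFleetRole": fleet_role_arn,
        "AllocationStrategy": "capacityOptimized",  # prefer pools with spare capacity
        "TargetCapacity": target_capacity,
        "OnDemandTargetCapacity": on_demand_capacity,  # On-Demand part of the target
        "Type": "maintain",  # replace interrupted instances to hold target capacity
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": launch_template_id,
                    "Version": "$Latest",
                },
                "Overrides": overrides,
            }
        ],
    }
```

With boto3, the result would be passed as `boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)`, where `config` comes from something like `build_spot_fleet_config("arn:aws:iam::123456789012:role/example-fleet-role", "lt-0example", ["m5.large", "m5a.large", "m4.large"], ["subnet-aaa", "subnet-bbb"], target_capacity=10, on_demand_capacity=2)`. Three types across two subnets gives six capacity pools, so exhausting any single one no longer stops the workload.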
Is “Eviction” Driving People Away?
There is one main caveat when it comes to spot instances – they are interruptible. All three major cloud providers have mechanisms in place for these spare capacity resources to be interrupted, related to changes in capacity availability and/or changes in pricing.
This means workloads can be “evicted” from a spot instance or VM: if the cloud provider needs the resource back, your workloads can be kicked off at any time. You do get some warning. AWS emits an interruption notice two minutes before the actual interruption. In Azure, you can opt to receive notifications that tell you when your VM is going to be evicted, but you will have only 30 seconds to finish any jobs and perform shutdown tasks before the eviction, which leaves very little time to react. Google Cloud also gives you 30 seconds to shut down a preempted instance so you can save your work for later, and it always terminates preemptible instances after 24 hours of running.

All of this means your application must be designed to be interruptible and should expect interruptions to happen regularly. That is difficult for some applications, but much less so for applications that are largely stateless or that normally process work in small chunks.
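On AWS, the two-minute notice is also exposed on the instance itself through the instance metadata service, at the `spot/instance-action` path, which returns 404 until an interruption is scheduled. The sketch below polls that path using an IMDSv2 session token and hands the parsed notice to a callback, which is where you would flush logs or checkpoint work. The helper names, the 5-second polling interval and the injectable `fetch`/`sleep` parameters (added so the loop can be tested off-instance) are assumptions of this sketch; Azure (Scheduled Events) and Google Cloud (the instance metadata server) expose equivalent signals that it does not cover.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

IMDS_BASE = "http://169.254.169.254"  # EC2 instance metadata service (IMDS)


def _imds_token(ttl_seconds: int = 300) -> str:
    """Fetch a short-lived IMDSv2 session token."""
    req = urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()


def fetch_aws_instance_action() -> Optional[str]:
    """Return the raw spot/instance-action document, or None if no notice exists yet."""
    req = urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": _imds_token()},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption has been scheduled
            return None
        raise


def watch_for_interruption(
    on_interrupt: Callable[[dict], None],
    fetch: Callable[[], Optional[str]] = fetch_aws_instance_action,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> Optional[dict]:
    """Poll until a notice appears, pass it to on_interrupt, and return it.

    Returns None if max_polls is reached without a notice.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        body = fetch()
        if body is not None:
            notice = json.loads(body)  # e.g. {"action": "terminate", "time": "..."}
            on_interrupt(notice)  # flush logs, checkpoint, deregister, etc.
            return notice
        polls += 1
        sleep(poll_seconds)
    return None
```

In practice you might run `watch_for_interruption` in a background thread on each spot instance and pass a hypothetical `checkpoint_and_drain` function as `on_interrupt`. The shorter the unit of work between checkpoints, the less is lost when the notice arrives.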
Companies such as Spot – recently acquired by NetApp (congrats!) – help in this regard by safely moving the workload to another available spot instance automatically.
Our research has indicated that fewer than one-quarter of users agree that their spot eviction rate was too low to be a concern, which suggests that for most users, eviction rate is a concern. Of course, it’s certainly possible to build applications to be resilient to eviction. For instance, applications can use many instance types in order to tolerate market fluctuations, with an appropriate bid for each type.
AWS also offers automatic scaling for Spot Fleet, which can increase or decrease the fleet’s target capacity based on demand. It is designed to scale in conservatively in order to protect your application’s availability.
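Spot Fleet scaling is configured through Application Auto Scaling. The sketch below only builds the keyword arguments for a CPU-based target tracking policy on a fleet’s target capacity; it does not call AWS. The function name, the 50% CPU target and the cooldown values are illustrative assumptions, and making the scale-in cooldown longer than the scale-out cooldown is this sketch’s own way of leaning further toward conservative scale-in, not an AWS default. The fleet request ID in the usage note is a placeholder.

```python
from typing import Dict, Tuple


def spot_fleet_target_tracking(
    spot_fleet_request_id: str,
    min_capacity: int,
    max_capacity: int,
    target_cpu_percent: float = 50.0,
    scale_in_cooldown_s: int = 300,
    scale_out_cooldown_s: int = 60,
) -> Tuple[Dict, Dict]:
    """Return (register_scalable_target kwargs, put_scaling_policy kwargs)."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity cannot exceed max_capacity")
    resource_id = f"spot-fleet-request/{spot_fleet_request_id}"
    dimension = "ec2:spot-fleet-request:TargetCapacity"
    target = {
        "ServiceNamespace": "ec2",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    policy = {
        "PolicyName": f"{spot_fleet_request_id}-cpu-target-tracking",
        "ServiceNamespace": "ec2",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "EC2SpotFleetRequestAverageCPUUtilization"
            },
            # Wait longer before removing capacity than before adding it.
            "ScaleInCooldown": scale_in_cooldown_s,
            "ScaleOutCooldown": scale_out_cooldown_s,
        },
    }
    return target, policy
```

With boto3, the two dictionaries would be applied as `client = boto3.client("application-autoscaling")`, then `client.register_scalable_target(**target)` followed by `client.put_scaling_policy(**policy)`, for example with `target, policy = spot_fleet_target_tracking("sfr-example", 2, 10)`.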
Early Adopters of Spot and Other Innovations May Be One and the Same
People who are hesitant to build for spot are more likely to use regular VMs, perhaps with Reserved Instances for savings. It’s also likely that the people open to the idea of spot instances are the same people who would be early adopters of other technologies, such as serverless, and who no longer have a need for spot.
For the right architecture, spot instances can provide significant savings. It’s a matter of whether you want to bother.