Spot instances and similar “spare capacity” models are frequently cited as one of the top ways to save money on public cloud. However, we’ve noticed that fewer cloud customers are taking advantage of this discounted capacity than you might expect.
We say “spot instances” in this article for simplicity, but each cloud provider has their own name for the sale of discounted spare capacity – AWS’s spot instances, Azure’s spot VMs and Google Cloud’s preemptible VMs.
Spot instances are a purchasing option that lets users take advantage of spare capacity at a low price, with the possibility that it could be reclaimed for other workloads on only brief notice.
In the past, AWS’s model required users to bid on Spot capacity. However, the model has since been simplified so users don’t actually have to bid for Spot Instances anymore. Instead, they pay the Spot price that’s in effect for the current hour for the instances that they launch. The prices are now more predictable with much less volatility. Customers still have the option to control costs by providing a maximum price that they’re willing to pay in the console when they request Spot Instances.
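As a rough illustration of the current model, the sketch below assumes a series of hypothetical hourly Spot prices and an optional maximum price. The function name `hourly_charge` and every number are invented for this example, and real billing granularity is ignored.

```python
from typing import List, Optional


def hourly_charge(spot_price: float, max_price: Optional[float] = None) -> Optional[float]:
    """Return what you would pay for one hour, or None if the instance would not run.

    Assumption: you pay the Spot price in effect, and a maximum price (if set)
    only acts as a ceiling above which you get no capacity.
    """
    if max_price is not None and spot_price > max_price:
        return None  # Spot price exceeds your ceiling: no capacity at this price
    return spot_price


# Hypothetical Spot prices (USD per hour) over four consecutive hours.
prices: List[float] = [0.031, 0.034, 0.052, 0.033]

charges = [hourly_charge(p, max_price=0.04) for p in prices]
hours_running = sum(1 for c in charges if c is not None)
total_paid = sum(c for c in charges if c is not None)

print(f"ran {hours_running} of {len(prices)} hours, paid about ${total_paid:.3f}")
```

Note that the maximum price never changes what you pay while running; it only decides whether you run at all.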
Spot Instances in Each Cloud
Variations of spot instances are offered across different cloud providers. AWS has Spot Instances, Google Cloud offers preemptible VMs, and in March of this year Microsoft Azure announced an even more direct equivalent to Spot Instances, called Azure Spot Virtual Machines.
Spot VMs have replaced the preview of Azure’s low-priority VMs on scale sets – all eligible low-priority VMs on scale sets have automatically been transitioned to Spot VMs. Azure Spot VMs provide access to unused Azure compute capacity at deep discounts. Spot VMs can be evicted at any time if Azure needs capacity.
AWS spot instances have variable pricing. Azure Spot VMs offer the same characteristics as a pay-as-you-go virtual machine, the differences being pricing and evictions. Google Preemptible VMs offer a fixed discounting structure. Google’s offering is a bit more flexible, with no limitations on the instance types. Preemptible VMs are designed to be a low-cost, short-duration option for batch jobs and fault-tolerant workloads.
Adoption of Spot Instances
Our research indicates that fewer than 20% of cloud users use spot instances on a regular basis, despite spot being on nearly every list of ways to reduce costs (including our own).
While applications can be built to withstand interruption, specific concerns remain, such as loss of log data, exhausting capacity and fluctuation in the spot market price.
In AWS, it’s important to note that spot prices can reach the on-demand price, but because they are driven by long-term supply and demand, they normally don’t.
A Spot Fleet is a collection of Spot Instances, optionally combined with On-Demand Instances, for which you specify a target capacity to maintain. AWS attempts to meet that target by launching the number of Spot Instances and On-Demand Instances specified in the Spot Fleet request.
To help reduce the impact of interruptions, you can set up Spot Fleets to respond to interruption notices by hibernating or stopping instances instead of terminating when capacity is no longer available. Spot Fleets will not launch on-demand capacity if Spot capacity is not available on all the capacity pools specified.
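To make the target-capacity and interruption-behavior settings concrete, here is a minimal sketch that assembles part of a Spot Fleet request as a plain dictionary. The key names follow the EC2 `RequestSpotFleet` API as we understand it; the helper `build_spot_fleet_config` is hypothetical, and the result is deliberately incomplete (no IAM fleet role, no launch templates or specifications), so treat it as a starting point rather than a working request.

```python
from typing import Any, Dict

VALID_BEHAVIORS = ("hibernate", "stop", "terminate")


def build_spot_fleet_config(
    target_capacity: int,
    on_demand_capacity: int = 0,
    interruption_behavior: str = "terminate",
) -> Dict[str, Any]:
    """Build a partial Spot Fleet request config (hypothetical helper).

    target_capacity is the total to maintain; on_demand_capacity is the
    portion of that total to fill with On-Demand Instances.
    """
    if interruption_behavior not in VALID_BEHAVIORS:
        raise ValueError(f"interruption_behavior must be one of {VALID_BEHAVIORS}")
    if not 0 <= on_demand_capacity <= target_capacity:
        raise ValueError("on_demand_capacity must be between 0 and target_capacity")
    return {
        "Type": "maintain",  # keep replacing interrupted instances
        "TargetCapacity": target_capacity,
        "OnDemandTargetCapacity": on_demand_capacity,
        "InstanceInterruptionBehavior": interruption_behavior,
        # Still required before sending: IamFleetRole and a launch
        # template or launch specification for each instance type.
    }


config = build_spot_fleet_config(
    target_capacity=10, on_demand_capacity=2, interruption_behavior="stop"
)
print(config)
```

Once completed, a dictionary like this would be passed as `SpotFleetRequestConfig` to boto3's `request_spot_fleet` call on an EC2 client.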
AWS also has a capability that allows you to use Amazon EC2 Auto Scaling to scale Spot Instances – this feature also combines different EC2 instance types and pricing models. You are in control of the instance types used to build your group – groups are always looking for the lowest cost while meeting other requirements you’ve set. This option may be a popular choice for some, as Auto Scaling groups (ASGs) are more familiar to customers than Spot Fleet and suit many different workload types. If you switch part or all of your ASGs over to Spot Instances, you may be able to save up to 90% when compared to On-Demand Instances.
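The savings from a mixed group depend on how much of it runs on Spot and on the discount you actually get. The arithmetic can be sketched as follows; the function `blended_hourly_cost`, the prices, and the 70% discount are all assumptions chosen for illustration, not quoted AWS figures.

```python
def blended_hourly_cost(
    instances: int,
    on_demand_price: float,
    spot_discount: float,
    spot_fraction: float,
) -> float:
    """Hourly cost of a group where spot_fraction of instances run on Spot.

    spot_discount is the fractional discount versus On-Demand (0.70 = 70% off).
    """
    spot_count = round(instances * spot_fraction)
    on_demand_count = instances - spot_count
    spot_price = on_demand_price * (1.0 - spot_discount)
    return on_demand_count * on_demand_price + spot_count * spot_price


# Hypothetical group: 10 instances at $0.10/hour On-Demand,
# 80% of them moved to Spot at an assumed 70% discount.
all_on_demand = blended_hourly_cost(10, 0.10, 0.70, spot_fraction=0.0)
mixed = blended_hourly_cost(10, 0.10, 0.70, spot_fraction=0.8)
savings = 1.0 - mixed / all_on_demand

print(f"all On-Demand: ${all_on_demand:.2f}/h, mixed: ${mixed:.2f}/h, savings: {savings:.0%}")
```

Under these assumptions the group saves a little over half its cost, well short of the "up to 90%" headline, which only applies when everything runs on Spot at the deepest discount.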
Another interesting feature worth noting is Amazon’s capacity-optimized spot instance allocation strategy. When customers diversify their Fleet or Auto Scaling group, the system will launch capacity from the most available capacity pools, effectively decreasing interruptions. In fact, by switching to capacity-optimized allocation users are able to reduce their overall interruption rate by about 75%.
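The difference between picking the cheapest pool and picking the most available one can be shown with a toy model. AWS does not publish spare-capacity numbers per pool, so the `spare_capacity` values, pool names, and prices below are invented, and the two functions only mimic the idea behind each strategy.

```python
from typing import Dict, NamedTuple


class Pool(NamedTuple):
    price: float         # hypothetical Spot price, USD per hour
    spare_capacity: int  # hypothetical count of spare instances


def lowest_price(pools: Dict[str, Pool]) -> str:
    """Pick the pool with the lowest current price."""
    return min(pools, key=lambda name: pools[name].price)


def capacity_optimized(pools: Dict[str, Pool]) -> str:
    """Pick the pool with the most spare capacity, ignoring price."""
    return max(pools, key=lambda name: pools[name].spare_capacity)


# A capacity pool here is an instance type in an Availability Zone.
pools = {
    "m5.large/us-east-1a": Pool(price=0.031, spare_capacity=40),
    "m5a.large/us-east-1b": Pool(price=0.034, spare_capacity=900),
    "m4.large/us-east-1c": Pool(price=0.029, spare_capacity=15),
}

print("lowest price:      ", lowest_price(pools))
print("capacity optimized:", capacity_optimized(pools))
```

In this made-up data the cheapest pool is also the scarcest, which is the situation where a capacity-first choice trades a slightly higher price for a lower chance of interruption.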
Is “Eviction” Driving People Away?
There is one main caveat when it comes to spot instances – they are interruptible. All three major cloud providers have mechanisms in place for these spare capacity resources to be interrupted, related to changes in capacity availability and/or changes in pricing.
This means workloads can be “evicted” from a spot instance or VM: if a cloud provider needs the resource at any given time, your workloads can be kicked off. You are notified when an AWS spot instance is going to be evicted, as AWS emits an event two minutes prior to the actual interruption. In Azure, you can opt to receive notifications that tell you when your VM is going to be evicted. However, you will have only 30 seconds to finish any jobs and perform shutdown tasks prior to the eviction, which makes it almost impossible to manage. Google Cloud also gives you 30 seconds to shut down your instances when you’re preempted so you can save your work for later. Google also always terminates preemptible instances after 24 hours of running.
All of this means your application must be designed to be interruptible and should expect interruptions to happen regularly – difficult for some applications, but not so much for others that are largely stateless or normally process work in small chunks.
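One way to act on an interruption notice is to check for it between small units of work and stop cleanly at a chunk boundary. The sketch below assumes the AWS case, where a pending interruption appears at the instance metadata path `spot/instance-action`; `interruption_pending` and `process_in_chunks` are hypothetical helpers, and instances that enforce IMDSv2 would also need a session token, which is not shown.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Sequence

INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def interruption_pending(url: str = INSTANCE_ACTION_URL, timeout: float = 1.0) -> bool:
    """Return True if the metadata service reports a scheduled interruption.

    A 404, a timeout, or an unparseable body is treated as "no notice".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            notice = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return False
    return isinstance(notice, dict) and "action" in notice


def process_in_chunks(
    items: Sequence[int],
    handle: Callable[[int], None],
    should_stop: Callable[[], bool],
    chunk_size: int = 100,
) -> int:
    """Process items chunk by chunk; return the index to resume from later."""
    done = 0
    while done < len(items):
        if should_stop():  # checked once before each chunk
            break
        for item in items[done:done + chunk_size]:
            handle(item)
        done = min(done + chunk_size, len(items))
    return done


# Demo with a fake notice that arrives before the third chunk,
# so the example runs anywhere without touching the network.
checks = {"count": 0}


def fake_notice() -> bool:
    checks["count"] += 1
    return checks["count"] > 2


processed: List[int] = []
resume_at = process_in_chunks(list(range(10)), processed.append, fake_notice, chunk_size=3)
print(f"processed {len(processed)} items, resume at index {resume_at}")
```

On a real Spot Instance you would pass `interruption_pending` as `should_stop` and persist the returned index somewhere durable, so a replacement instance can pick up where this one left off. With only 30 seconds of warning on Azure and Google Cloud, chunks need to be short enough to finish well inside that window.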
Companies such as Spot – recently acquired by NetApp (congrats!) – help in this regard by safely moving the workload to another available spot instance automatically.
Our research has indicated that fewer than one-quarter of users agree that their spot eviction rate is too low to be a concern – which means that for most, eviction rate is a concern. Of course, it’s certainly possible to build applications to be resilient to eviction. For instance, applications can make use of many instance types in order to tolerate market fluctuations, and set an appropriate maximum price for each type.
AWS also offers an automatic scaling feature that can increase or decrease the target capacity of your Spot Fleet automatically based on demand. The goal is to let you scale in conservatively in order to protect your application’s availability.
Early Adopters of Spot and Other Innovations May be One and the Same
People who are hesitant to build for spot are more likely to use regular VMs, perhaps with Reserved Instances for savings. It’s likely that the people open to the idea of spot instances are the same people who would be early adopters of other tech, like serverless, and so no longer have a need for Spot.
For the right architecture, spot instances can provide significant savings. It’s a matter of whether you want to bother.