Large companies have traditionally had an impressive list of batch workloads, which run at night, when people have gone home for the day. These include such things as application and database backup jobs; extraction, transform, and load (ETL) jobs; disaster recovery (DR) environment checks and updates; online analytical processing (OLAP) jobs; and monthly/quarterly billing updates or financial “close”, to name a few.
Traditionally, with on-premise data centers, these workloads have run at night to allow the same hardware infrastructure that supports daytime interactive workloads to be repurposed, if you will, to run these batch workloads at night. This served a couple of purposes:
- It avoided network contention between the two workloads (as both are important), allowing the interactive workloads to remain responsive.
- It avoided data center sprawl by using the same infrastructure to run both, rather than having dedicated infrastructure for interactive and batch.
Things Are Different with Public Cloud
As companies move to the public cloud, they are no longer constrained by having to repurpose the same infrastructure. In fact, they can spin up and spin down new resources on demand in AWS, Azure or Google Cloud Platform (GCP), running both interactive and batch workloads whenever they want.
Network contention is also less of a concern, since the public cloud providers typically have plenty of bandwidth. The exception, of course, is where batch workloads use the same application interfaces or APIs to read/write data.
So, moving to public cloud offers a spectrum of possibilities, and you can use one or any combination of them:
- You can run batch nightly using similar processes as you do in your online data centers, but on separately provisioned instances/virtual machines. This probably results in the least effort in moving batch to the public cloud, the least change to your DevOps processes, and perhaps saves you some money by having instances sized specifically for the workloads and being able to leverage cloud cost savings options (e.g., reserved instances);
- You can run batch on separately provisioned instances/virtual machines, but concurrently with existing interactive workloads. This will likely result in some additional work to change your DevOps processes, but offers more freedom and similar benefits mentioned above. You will still need to pay attention to application interfaces/APIs the workloads may have in common; or
- At the extreme end of the cloud adoption spectrum, you could use cloud provider platform as a service (PaaS) offerings, such as AWS Batch, Microsoft Azure Batch or GCP Cloud Dataflow, where batch is essentially treated as a “black box”. A detailed comparison of these services is beyond the scope of this blog. However, in summary, these are fully managed services, where you queue up input data in an S3 bucket, object blob or volume along with a job definition, appropriate environment variables and a schedule and you’re off to the races. These services employ containers and autoscaling/resource groups/instance groups where appropriate, with options to use less expensive compute in some cases. (For example, with AWS Batch, you have the option of using spot instances.)
The advantage of this approach is potentially faster time to implement and (maybe) less expensive monthly cloud costs, because the compute services run only at the times you specify. The disadvantages of this approach may be the degree of operational/configuration control you have; the fact that these services may be totally foreign to your existing DevOps folks/processes (i.e., there is a steep learning curve); and the possibility that it ties you to that specific cloud provider.
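To make the “black box” a bit more concrete, here is a minimal sketch of what submitting a job to AWS Batch can look like using boto3, the AWS SDK for Python. The job queue, job definition, bucket and environment variable names are hypothetical placeholders; you would register the queue and job definition in your own account first.

```python
# Minimal sketch of queuing work with AWS Batch via boto3.
# All names below are hypothetical placeholders for illustration.
import boto3

batch = boto3.client("batch", region_name="us-east-1")

response = batch.submit_job(
    jobName="nightly-etl",                 # hypothetical job name
    jobQueue="batch-processing-queue",     # queue you provisioned beforehand
    jobDefinition="etl-job-def:1",         # registered job definition
    containerOverrides={
        "environment": [
            # Point the job at its input data, staged in S3 beforehand
            {"name": "INPUT_BUCKET", "value": "my-etl-input-bucket"},
            {"name": "RUN_DATE", "value": "2017-06-01"},
        ]
    },
)
print("Submitted job:", response["jobId"])
```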
A Simple Alternative
If you are looking to minimize impact to your DevOps processes (that is, the first two approaches mentioned above), but still save money, then ParkMyCloud can help.
Normally, with the first two options, there are cron jobs scheduled to kick off batch jobs at the appropriate times throughout the day, but the underlying instances must be running for cron to do its thing. You could use ParkMyCloud to put parking schedules on these resources, such that they are turned OFF for most of the day, but are turned ON just-in-time to still allow the cron jobs to execute.
We have been successfully using this approach in our own infrastructure for some time now, to control a batch server used to do database backups. This, in fact, provides more savings than AWS Reserved Instances would.
Let’s look at a specific example in AWS. Suppose you have an m4.large server you use to run batch jobs. Assuming Linux pricing in us-east-1, this server costs $0.10 per hour, or about $73 per month. Suppose you have configured cron to start batch jobs at midnight UTC and that they normally complete 1 to 1-½ hours later.
You could purchase a Reserved Instance for that server, where you either pay nothing upfront or all upfront and your savings would be 38%-42%.
Or, you could put a ParkMyCloud schedule on the instance so that it is only ON from 11 p.m. to 1 a.m. UTC, allowing enough time for the cron jobs to start and run. The savings in that case would be 87.6% (including the cost of ParkMyCloud) without the need for a one-year commitment. Depending on how many batch servers you run in your environment and their sizes, that could be some hefty savings.
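If you want to sanity-check that math yourself, here is a quick back-of-the-envelope sketch, using the on-demand rate cited above and an assumed ~40% Reserved Instance discount; the 87.6% figure in the text additionally accounts for the cost of ParkMyCloud itself.

```python
# Back-of-the-envelope math for the m4.large example above. The hourly rate is
# the us-east-1 Linux on-demand price cited in the text; the RI discount is an
# assumed midpoint of the 38%-42% range mentioned above.
HOURLY_RATE = 0.10           # $/hour for m4.large, Linux, us-east-1
HOURS_PER_MONTH = 730        # nominal month

always_on = HOURLY_RATE * HOURS_PER_MONTH        # ~$73.00/month
reserved = always_on * (1 - 0.40)                # ~40% RI discount -> ~$43.80
# Parked except for a 2-hour window (11 p.m. to 1 a.m. UTC) each day:
parked = HOURLY_RATE * 2 * 365 / 12              # ~$6.08/month

print(f"Always on:        ${always_on:.2f}/month")
print(f"Reserved (~40%):  ${reserved:.2f}/month")
print(f"Parked 2 hr/day:  ${parked:.2f}/month "
      f"({(1 - parked / always_on):.1%} raw savings before ParkMyCloud fees)")
```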
Public cloud will offer you a lot of freedom and some potentially attractive cost savings as you move batch workloads from on premise. You are no longer constrained by having the same infrastructure serve two vastly different types of workloads — interactive and batch. The savings you can achieve by moving to public cloud can vary, depending on the approach you take and the provider/service you use.
The approach you take depends on the amount of process change you’re willing to absorb in your DevOps processes. If you are willing to throw caution to the wind, the cloud provider PaaS offerings for batch can be quite compelling.
If you wish to take a more cautious approach, then we engineered ParkMyCloud to park servers without the need for scripting, or the need for you to be a DevOps expert. This approach allows you to achieve decent savings, with minimal change to your DevOps batch processes and without the need for Reserved Instances.
Back in May 2017 I wrote a very popular blog about Cutting through the AWS and Azure Cloud Pricing Confusion.
Since ParkMyCloud also provides cost control for Google Cloud Platform (GCP) resources, I thought it might be useful to compare AWS vs Google Cloud pricing. In addition, I will take a look at the terminology and billing differences. NOTE: There are other “services” involved, such as networking, storage and load balancing, when looking at your overall bill. I am going to be focused mainly on compute charges in this article.
AWS and GCP Terminology Differences
As mentioned before, in AWS, their compute service is called “Elastic Compute Cloud” (EC2). The virtual servers are called “Instances”.
In GCP, the service is referred to as “Google Compute Engine” (GCE). The servers are also called “instances”. However, in GCP there are “preemptible” and non-preemptible instances. Non-preemptible instances are the same as AWS “on demand” instances.
Preemptible instances are similar to AWS “spot” instances, in that they are a lot less expensive, but can be preempted with little or no notice. The difference is that GCP preemptible instances can actually be stopped without being terminated. That is not true for AWS spot instances.
Flocks of these instances spun up from a snapshot according to scaling rules are called “auto scaling groups” in AWS.
A similar concept can be created within GCP using “instance groups”. However, instance groups are really more of a “stack”, created using an “instance group template”. As such, they are more closely related to AWS CloudFormation stacks.
AWS and GCP Compute Sizing
Both AWS and GCP have a dizzying array of instance sizes to choose from, and doing an apples-to-apples comparison between them can be quite challenging. These predefined instance sizes are based upon number of virtual cores, amount of virtual memory and amount of virtual disk.
AWS offers the following predefined categories:
- Free tier – inexpensive, burst performance (t2 family)
- General purpose (m3/m4 family)
- Compute optimized (c4 family)
- GPU instances (p2 family)
- FPGA instances (f1 family)
- Memory optimized (x1, r3/r4 family)
- Storage optimized (i3, d2 family)
GCP offers the following predefined types:
- Free tier – inexpensive, burst performance (f1/g1 family)
- Standard, shared core (n1-standard family)
- High memory (n1-highmem family)
- High CPU (n1-highcpu family)
However, GCP also allows you to make your own custom machine types, if none of the predefined ones fit your workload. You pay for uplifts in CPU/Hr and memory GiB/Hr. You can also add GPUs and premium processors as uplifts.
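To illustrate how that composes, here is a small sketch of the custom machine type math. The per-vCPU and per-GiB rates below are placeholders for illustration only, not GCP’s published prices; check their pricing page for the real numbers.

```python
# Illustrative sketch of GCP custom machine type pricing: a per-vCPU-hour rate
# plus a per-GiB-hour memory rate. The rates are assumed placeholders.
CPU_RATE_PER_HR = 0.033   # $/vCPU-hour (assumed, for illustration)
MEM_RATE_PER_HR = 0.0045  # $/GiB-hour (assumed, for illustration)

def custom_machine_hourly(vcpus: int, memory_gib: float) -> float:
    """Hourly price of a custom machine type, before GPU or other uplifts."""
    return vcpus * CPU_RATE_PER_HR + memory_gib * MEM_RATE_PER_HR

# e.g., a 2-vCPU, 7.5 GiB machine (roughly n1-standard-2 shaped):
print(f"${custom_machine_hourly(2, 7.5):.4f}/hour")
```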
Both providers take marketing liberties with things like memory and disk sizes. For example, AWS lists its memory size in GiB (base2) and disk size in GB (base10).
GCP reports its memory size and disk size as GB. However, to make things really confusing this is what they say on their pricing page: “Disk size, machine type memory, and network usage are calculated in gigabytes (GB), where 1 GB is 2^30 bytes. This unit of measurement is also known as a gibibyte (GiB).”
This, of course, is pure nonsense. A gigabyte (GB) is 10^9 bytes. A gibibyte (GiB) is 2^30 bytes. The two are definitely NOT equal. It was probably just a typo.
If you look at what is actually delivered, neither seems to match what is shown on their pricing pages. For example, an AWS t2.micro is advertised as having 1 GiB of memory. In reality, it is 0.969 GiB (using “top”).
For GCP, their f1-micro is advertised as “0.6 GB”. Assuming they simply have their units mixed up and “GB” should really be “GiB”, they actually deliver 0.580 GiB. So, both round up, as marketing/sales people are apt to do.
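For the skeptics, the arithmetic behind those unit games is easy to check:

```python
# The GB-vs-GiB gap in concrete numbers: a gigabyte is 10^9 bytes, a gibibyte
# is 2^30 bytes, about 7.4% larger. Applied to the f1-micro example above:
GB = 10**9
GiB = 2**30

print(f"1 GiB = {GiB / GB:.3f} GB")              # 1.074 GB
print(f"0.6 GB = {0.6 * GB / GiB:.3f} GiB")      # what the advertised 0.6 GB would mean
print(f"0.580 GiB = {0.580 * GiB / GB:.3f} GB")  # what is actually delivered
```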
With respect to pricing, this is how the two seem to compare, by looking at some of the most common “work horses” and focusing on CPU, memory and cost. (One would have to run actual benchmarks to more accurately compare):
The bottom line:
In general, for most workloads, AWS is less expensive on a CPU/Hr basis. For compute-intensive workloads, GCP instances are less expensive.
Also, as you can see from the table, both providers charge uplifts for different operating systems, and those uplifts can be substantial! You really need to pay attention to the fine print. For example, GCP charges a 4-core minimum for all their SQL uplifts (yikes!). And, in the case of Red Hat Enterprise Linux (RHEL) in GCP, they charge you a 1-hour minimum for the uplifts and in 1-hour increments after that. (We’ll talk more about how the providers charge you in the next section.)
AWS vs. Google Cloud Pricing – Examining the Differences
Cost/Hr is only one aspect of the equation, though. To better understand your monthly bill, you must also understand how the cloud providers actually charge you. AWS prices their compute time by the hour, rounding any partial hour up to a full hour. If you start an instance and run it for 61 minutes then shut it down, you get charged for 2 hours of compute time.
Google Compute Engine pricing is also listed by the hour for each instance, but they charge you by the minute, rounded up to the nearest minute, with a 10 minute minimum charge. So, if you run for 1 minute, you get charged for 10 minutes. However, if you run for 61 minutes, you get charged for 61 minutes. On the surface, this sounds very appealing (and makes me want to wag my finger at AWS and say, “shame on you, AWS”).
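Here is a minimal sketch of the two billing rules as just described, so you can see how the same runtime turns into different billed time:

```python
import math

def aws_billed_minutes(minutes_run: int) -> int:
    """AWS EC2 (at the time of writing): each partial hour rounds up to a full hour."""
    return math.ceil(minutes_run / 60) * 60

def gcp_billed_minutes(minutes_run: int) -> int:
    """GCE: per-minute billing, rounded up, with a 10-minute minimum."""
    return max(10, minutes_run)

print(aws_billed_minutes(61))  # 120 minutes billed by AWS
print(gcp_billed_minutes(61))  # 61 minutes billed by GCP
print(gcp_billed_minutes(1))   # 10 minutes billed by GCP (the minimum)
```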
You also really need to pay attention to the use case and the comparable instance prices. Let me give you a concrete example. So, here is a graph of 6 months’ worth of data from an m4.large instance. Remember that our goal at ParkMyCloud is to help you “park” non-production instances automatically, when they are not being used, to save you money.
This instance is on a ParkMyCloud parking schedule, where it is RUNNING from 8:00 a.m. to 7:00 p.m. on weekdays and PARKED evenings and weekends. This instance, assuming Linux pricing, costs $0.10 per hour in AWS. From November 6, 2016 until May 9, 2017, this instance ran for 111,690 minutes. This is actually about 1,862 hours, but AWS charged for 1,922 hours and it cost $192.20 in compute time.
Why the difference? ParkMyCloud has a very fast and accurate orchestration engine, but when you start and stop instances, the cloud provider and network response can vary from hour-to-hour and day-to-day, depending on their load, so occasionally things will run that extra minute. And, even though this instance is on a parking schedule, when you look at the graph, you can see that the user took manual control a few times, perhaps to do maintenance. Stuff happens!
What would it have cost to run a similar instance in GCP? If you look at the comparable GCP instance (the n1-standard-2), it costs $0.1070/hour. So, this workload running in GCP would have cost $199.18 (not including Sustained Use Discounts). Since this instance really only ran 42.6% of the time (111,690 minutes out of 262,140 minutes), it would qualify for a partial Sustained Use Discount. With those discounts, the actual cost would have been about $182.72. This is about $10 cheaper than AWS, even though the per-hour cost for AWS was lower. That may not seem like much, but if you have hundreds or thousands of instances, it adds up.
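For those who want to check my math, here is a sketch of the Sustained Use Discount calculation as I applied it above. It assumes the tier structure GCP published at the time of writing (each successive quarter of the period billed at 100%, 80%, 60% and 40% of the base rate) and treats the whole six-month window as a single period, as I did above:

```python
# Sketch of the GCP Sustained Use Discount math, reproducing the worked
# example above. Tier multipliers are GCP's published structure at the time
# of writing; applying them across a six-month window is a simplification.
RATE = 0.1070                     # $/hour, n1-standard-2
TIERS = [1.00, 0.80, 0.60, 0.40]  # multiplier per quarter of the period

def sustained_use_cost(hours_run: float, hours_in_period: float) -> float:
    quarter = hours_in_period / 4
    cost, remaining = 0.0, hours_run
    for multiplier in TIERS:
        chunk = min(remaining, quarter)   # hours billed in this tier
        cost += chunk * RATE * multiplier
        remaining -= chunk
    return cost

hours_run = 111_690 / 60   # 1,861.5 hours actually run
period = 262_140 / 60      # 4,369 hours in the measured window
print(f"GCP with SUD: ${sustained_use_cost(hours_run, period):.2f}")  # ~$182.72
print(f"AWS billed:   ${1_922 * 0.10:.2f}")                           # $192.20
```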
AWS Reserved Instances vs GCP Committed Use
Both providers offer deeper discounts off their normal pricing, for “predictable” workloads that need to run for sustained periods of time, if you are willing to commit to capacity consumption upfront. AWS offers Reserved Instances. Google offers Committed Use Discounts (currently in beta). An in-depth comparison of these is beyond the intent of this blog (and you have already been very patient, if you made it this far). Therefore, I’ll reserve that discussion for a future blog.
If you are new to public cloud, once you get past all the confusing jargon, the creative approaches to pricing and the different ways providers charge for usage, the actual cloud services themselves are much easier to use than legacy on-premise services.
The public cloud services do provide much better flexibility and faster time-to-value. The cloud providers simply need to get out of their own way. Pricing is but one example where AWS and GCP could stand to make things a lot simpler, so that newcomers can make informed decisions.
When comparing AWS vs. Google Cloud pricing, AWS EC2 on-demand pricing may on the surface appear to be more competitive than GCP pricing for comparable compute engines. However, when you examine specific workloads and factor in Google’s more enlightened approach to charging for CPU/Hr time and their use of Sustained Use Discounts, GCP may actually be less expensive. AWS really needs to get in line with both Azure and Google, who charge by the minute and have much smaller minimums. Nobody likes being charged extra for something they don’t use.
In the meantime, ParkMyCloud will continue to help you turn off non-production cloud resources, when you don’t need them and help save you a lot of money on your monthly cloud bills, regardless of which public cloud provider you use.
Amazon Web Services shared today that users can now start and stop RDS instances – check out the full announcement on their blog.
This is good news for cost-conscious engineering teams. Until now, databases were generally left running 24×7, even if they were only used during working hours for testing and staging purposes. Now, they can be turned off, so you’re not charged for time you’re not using. Nice!
Keep in mind that stopping the RDS instances will not bring the cost to zero – you will still be charged for provisioned storage, manual snapshots and automated backup storage.
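For the scripting-inclined, the new capability is exposed through the AWS APIs and SDKs. Here is a minimal sketch using boto3 (the instance identifier is a hypothetical placeholder, and you will need an SDK version recent enough to include the new operations):

```python
# Minimal sketch of the newly announced RDS start/stop capability via boto3.
# "staging-db" is a hypothetical placeholder identifier.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Stop a staging database for the night...
rds.stop_db_instance(DBInstanceIdentifier="staging-db")

# ...and start it back up in the morning.
rds.start_db_instance(DBInstanceIdentifier="staging-db")
```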
Now, what if you want to start and stop RDS instances on an automated schedule to ensure they’re not left running when they’re not needed? Coming soon, you’ll be able to with ParkMyCloud!
Start and Stop RDS Instances on a Schedule with ParkMyCloud
Since ParkMyCloud was first released, customers have been asking us for the ability to park their RDS instances in the same way that they can park EC2 instances and auto scaling groups.
The logic to start/stop RDS instances using schedules is already in the production code for ParkMyCloud. We have been patiently waiting for AWS to officially announce this capability, so that we could turn the feature ON and release it to the public. That day is finally here!
Our development team has some final end-to-end testing to complete, just to make sure everything works as expected. Expect RDS parking to be released within a couple of weeks! Let us know if you’d like to be notified when this is released, or if you’re interested in beta testing the new functionality.
We’re excited about this opportunity to give ParkMyCloud users what they’re asking for. What else would you like to see for optimal cost control? Comment below to let us know.
Before I try to break down the AWS and Azure cloud pricing jargon, let me give you some context. I am a crusty, old CTO who has been working in advanced technology since the 1980s. (That’s more than 18 Moore’s Law cycles for processor and chipset fans, and I have lost count of how many technology hype cycles that has been.)
I have grown accustomed to the “deal of a lifetime” on the “technology of the decade” coming around about once every week. So, you can believe me when I tell you I have a very low BS threshold for dishonest sales folks and bogus technology claims. Yes, I am jaded.
My latest venture is a platform, ParkMyCloud, that brings together multiple public cloud providers. And I can tell you firsthand that it is not for the faint-of-heart. It’s like being dropped off in the middle of the jungle in Papua New Guinea. Each cloud provider has its own culture, its own philosophy, its own language and customs, its own maturity level and, worst of all — its own pricing strategy — which makes it tough for buyers to manage costs. I am convinced that the lowest circles of hell are reserved for people who develop cloud service pricing models. AWS and Azure cloud pricing gurus, beware. And reader, to you: caveat emptor.
AWS and Azure Terminology Differences
Case in point: You have probably read the comparisons of various services across the top cloud providers, as people try to wrap their minds around all the varying jargon used to describe pretty much the same thing. For example, let’s just look at one service: Cloud Computing.
In AWS, servers are called Elastic Compute Cloud (EC2) “Instances”. In Azure they are called “Virtual Machines” or “VMs”. Flocks of these spun up from a snapshot according to scaling rules are called “auto scaling groups” in AWS. The same things are called “scale sets” in Azure.
Of course cloud providers had to start somewhere, then they learned from their mistakes and improved. When AWS started with EC2, they had not yet released virtual private clouds (VPCs), so their instances ran outside of VPCs. Now all the latest stuff runs inside of VPCs. The older ones are called “classic” and have a number of limitations.
The same thing is true of Azure. When they first released, their VMs were not set up to use what is now their Resource Manager or be managed in Resource Groups (the moral equivalent of CloudFormation Stacks in AWS). Now, all of their latest VMs are compatible with Resource Manager. The older ones are called, you guessed it … “classic”.
(What genius came up with the idea to call the older versions of these, the ones you’re probably stranded with and no longer want, “classic”?)
Both AWS and Azure have a dizzying array of instances/VMs to choose from, and doing an apples-to-apples comparison between them can be quite daunting. They have different categories: General purpose, compute optimized, storage optimized, disk optimized, etc.
Then within each one of those, there are types or sizes. For example, in AWS the tiny, cheap ones are currently the “t2” family. In Azure, they are the “A” series. On top of that there are different generations of processors. In AWS, they use an integer after the family type, like t2, m3, m4 and there are sizes, t2.small, m3.medium, m4.large, r16.ginormus (OK, I made that one up).
In Azure, they use a number after the family letter to connote size, like A0, A1, A2, D1, etc. and “v1”, “v2” after that to tell what generation it is, like D1v1, D2v2.
The bottom line: this is very confusing for folks moving their workloads to public cloud from on-premise data centers (yet another Wonderland of jargon and confusion in its own right). How does one decide which cloud provider to use? How does one even begin to compare prices with all of this mess? Cheer up … it gets worse!
AWS and Azure Cloud Pricing – Examining Differences in Charging
To add to that confusion, they charge you differently for the compute time you use. What do I mean? AWS prices their compute time by the hour. And by hour, they mean any fraction of an hour gets rounded up to a full hour: If you start an instance and run it for 61 minutes then shut it down, you get charged for 2 hours of compute time.
Microsoft Azure cloud pricing is listed by the hour for each VM, but they charge you by the minute. So, if you run for 61 minutes, you get charged for 61 minutes. On the surface, this sounds very appealing (and makes me want to wag my finger at AWS and say, “shame on you, AWS”).
However, you really have to pay attention to the use case and the comparable instance prices. Let me give you a concrete example. I mentioned my latest venture, ParkMyCloud, earlier. We park (schedule on/off times for) cloud computing resources in non-production environments (without scripting, by the way). So, here is a graph of 6 months’ worth of data from an m4.large instance somewhere in Asia Pac. The m4 processor family is based on the Xeon Broadwell or Haswell processor and it is one of the most commonly used instance types.
This instance is on a ParkMyCloud parking schedule, where it is RUNNING from 8:00 a.m. to 7:00 p.m. on weekdays and PARKED evenings and weekends. This instance, assuming Linux pricing, costs $0.125 per hour in AWS. From November 6, 2016 until May 9, 2017, this instance ran for 111,690 minutes. This is actually about 1,862 hours, but AWS charged for 1,922 hours and it cost $240.25 in compute time.
Why the difference? ParkMyCloud has a very fast and accurate orchestration engine, but when you start and stop instances, the cloud provider and network response can vary from hour-to-hour and day-to-day, depending on their load, so occasionally things will run that extra minute. And, even though this instance is on a parking schedule, when you look at the graph, you can see that the user took manual control a few times. Stuff happens!
What would the cost have been if AWS charged the same way as Azure? It would have only cost $232.69. Well, that’s not too bad over the course of six months, unless you have 1,000 of these. Then it becomes material.
However, I wouldn’t rush to judgment on AWS. If you look at the comparable Azure VM, the standard pricing DS2 V2, also running Linux, costs $0.152/hour. So, this same instance running in Azure would have cost $290.39. Yikes!
Therefore, in my particular use case, unless Azure cloud pricing drops to make their CPU pricing more competitive, their per-minute pricing really doesn’t save money.
The ironic thing about all of this, is that once you get past all the confusing jargon and the ridiculous approaches to pricing and charging for usage, the actual cloud services themselves are much easier to use than legacy on-premise services. The public cloud services do provide much better flexibility and faster time-to-value. The cloud providers simply need to get out of their own way. Pricing is but one example where AWS and Azure need to make things a lot simpler, so that newcomers can make informed decisions.
From a pricing standpoint, AWS on-demand pricing is still more competitive than Azure cloud pricing for comparable compute engines, despite Azure’s more enlightened approach to charging for CPU/Hr time. That said, AWS really needs to get in line with both Azure and Google, who charge by the minute. Nobody likes being charged extra for something they don’t use.
In the meantime, ParkMyCloud will continue to help you turn off non-production cloud resources, when you don’t need them and help save you a lot of money on your monthly cloud bills. If we make anything sound more complex than it needs to, call us out. No hiding behind jargon here.
Waste not, want not. That was one of the well-worn quips of one of the United States’ Founding Fathers, Benjamin Franklin. It couldn’t be more timely advice in today’s cloud computing world – the world of cloud waste. (When he was experimenting with static electricity and lightning, I wonder if he saw the future of Cloud? :^) )
Organizations are moving to the Cloud in droves. And why not? The shift from CapEx to monthly OpEx, the elasticity, the reduced deployment times and faster time-to-market: what’s not to love?
The good news: the public cloud providers have made it easy to deploy their services. The bad news: the public cloud providers have made it easy to deploy their services…really easy.
And, experience over the past decade has shown that this ease leads to cloud waste. What is “cloud waste” and where does it come from? What are the consequences? What can you do to reduce it?
What is Cloud Waste?
“Cloud waste” occurs when you consume more cloud resources than you actually need to run your business.
It takes several forms:
- Resources left running 24×7 in development, test, demo, and training environments where they don’t need to be running 24×7. (Thoughts of parents yelling at children to “turn the lights out” if they are the last one in a room.) I believe this is a bad habit that was reinforced by the previous era of on-premise data centers. The thinking: It’s a sunk cost anyway, why bother turning it off? Of course, it’s not a sunk cost anymore.
This manifests itself in various ways (a sketch for spotting some of them follows the list):
- Instances or VMs which are left running, chewing up $/CPU-Hr costs and network charges
- Orphaned volumes (volumes not attached to any servers), which are not being used and incurring monthly $/GB charges
- Old snapshots of those or other volumes
- Old, out-of-date machine images
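Here is a hedged sketch of hunting for two of those waste forms with boto3: unattached (“available”) volumes and snapshots older than an arbitrary cutoff. The region and 90-day threshold are assumptions to adapt to your own environment.

```python
# Sketch: find orphaned EBS volumes and stale snapshots with boto3.
# The region and the 90-day cutoff are assumptions for illustration.
from datetime import datetime, timedelta, timezone
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Orphaned volumes: status "available" means not attached to any instance.
orphans = ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]
for vol in orphans:
    print(f"Orphaned volume {vol['VolumeId']}: {vol['Size']} GiB sitting idle")

# Snapshots owned by this account, created more than 90 days ago.
cutoff = datetime.now(timezone.utc) - timedelta(days=90)
snapshots = ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
for snap in snapshots:
    if snap["StartTime"] < cutoff:
        print(f"Stale snapshot {snap['SnapshotId']} from {snap['StartTime']:%Y-%m-%d}")
```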
However, cloud consumers are not the only ones to blame. The public cloud providers are also responsible when it comes to their PaaS (platform as a service) offerings for which there is no OFF switch (e.g., AWS’ RDS, Redshift, DynamoDB and others). If you deliver a PaaS offering, make sure it has an OFF switch.
- Resources that are larger than needed to do the job. Many developers don’t know what size instance to spin up to do their development work, so they will often spin up larger ones. (Hey, if 1 core and 4 GB of RAM is good, then 16 cores and 64 GB of RAM must be even better, right?) I think this habit also arose in the previous era of on-premise data centers: “We already paid for all this capacity anyway, so why not use it?” (Wrong again.)
This, too, rears its ugly head in several ways:
- Instances or VMs which are much larger than they need to be
- Block volumes which are larger than they need to be
- Databases which are way over-provisioned compared to their actual IOPS or sequential throughput requirements.
Who is Affected by Cloud Waste?
The consequences of cloud waste are quite apparent. It is killing everyone’s business bottom line. For consumers, it erodes their return on assets, return on equity and net revenue. All of these ultimately impact earnings per share for their investors as well.
Believe it or not, it also hurts the public cloud providers and their bottom line. Public cloud providers are most profitable when they can oversubscribe their data centers. Cloud waste forces them to build more very expensive data centers than they need to, killing their oversubscription rates and hurting their profitability as well. This is why you see cloud providers offering certain types of cost-cutting solutions. For example, AWS offers Reserved Instances, where you pay upfront for a break in on-demand pricing. They also offer Spot Instances, Auto-Scaling Groups and Lambda. Azure also offers price breaks to their ELA customers, along with Scale Sets (the equivalent of ASGs).
How to Prevent Cloud Waste
So, what can you do to address this? Ultimately, the solution to this problem exists between your ears. Most of it is common sense: It requires rethinking… rewiring your brain to look at cloud computing in a different way. We all need to become honorary Scotsmen (short arms and deep pockets… with apologies to my Scottish friends).
- When you turn on resources in non-production environments, turn on the minimum size needed to get the job done and only grudgingly move up to the next size.
- Turn stuff off in non-production environments, when you are not using it. And for Pete’s sake, when it comes to compute time, don’t waste your time and money writing your own scripts…that just exacerbates the waste. Those DevOps people should spend that time on your bread and butter applications. Use ParkMyCloud instead! (Okay, yes, that was a shameless plug, but it is true.)
- Clean up old volumes, snapshots and machine images.
- Buy Reserved Instances for your production environments, but make sure you manage them closely, so that they actually match what your users are provisioning, otherwise you could be double paying.
- Investigate Spot fleets for your production batch workloads that run at night. It could save you a bundle.
These good habits, over time, can benefit everyone economically: Cloud consumers and cloud producers alike.