Our CTO, Dale, presented the other day on 5 Ways to Control Your AWS Spend (or, How to Make Your CFO Happy). Check out the recording below, and comment if you have any questions we didn’t already address!
0:29 Dale’s intro to the webinar
1:11 AWS Elastic Compute Cloud (EC2)
- AWS has grown immensely and cracked a $10 billion/year run rate, growing 70% year-over-year, which is staggering when you think about it for a company of their size. They recently doubled their compute power, and now boast 10x more compute power than their nearest 14 competitors combined.
- EC2 makes up almost 70% of AWS’s revenue, or almost $7 billion, and is growing at 88% year-over-year. A little over half of that compute power supports non-production workloads, such as development, testing, QA, staging, training and sandbox environments, which comprise about $3.5 billion, or roughly 1/3 of AWS’s revenue.
- Many of these environments run 24 x 7, even though they are not being used. It would be like you leaving your car running all the time, even when it’s at home in your garage, which is insane. Are there services which AWS provides to help you reduce your spend? Are there other ways outside of AWS?
- That’s the focus of this webinar: how can we unlock savings from these non-production environments?
2:34 AWS Has Reduced Prices to Drive Demand
- EC2 has come a long way since it was first introduced in 2006. Back then, there was one Purchasing Option (Pay as you go, or On-Demand), and there were only a couple of Instance Types and 1 Region. Interestingly, most of AWS’ customers were developers and most of the workloads were non-production.
- Today, looking at the current generation of EC2 instances, AWS has 40 Instance Types running in 13 Regions. They actually have 4 additional regions that are about to become generally available. And, as you saw, about half of their workloads are production environments, which is pretty amazing in that short amount of time.
- AWS has done a great job of making their services, especially EC2, easy to adopt. Perhaps a little too good! As customer adoption increased, so did their “monthly sticker shock”.
- AWS realized early on that it’s important to keep their customers happy and their services “sticky”. So, besides a number of price cuts on their On-Demand instances, AWS introduced Reserved Instances, Spot Instances and Auto Scaling Groups to help their customers save money. These moves, along with the wealth of new services every year, have rocketed EC2 to its current levels.
- AWS also has a vested interest in having their customers save money. How? By reducing waste they [AWS] avoid building new data centers, allowing them to oversubscribe the infrastructure they currently have in place, making them more profitable.
- Reserved Instances, Spot Instances and Auto Scaling Groups will definitely save money in both production and non-production environments, but like many things there are tradeoffs to consider.
- Each of these Instance Purchasing Options and Auto Scaling Groups could easily fill their own series of seminars. Time will only permit me to focus in on these options at a high level, as they relate to non-production cost savings.
- So, with that backdrop, let’s look at our first way to save money: EC2 Reserved Instances.
4:36 How Reserved Instances Work
- A Reserved Instance is a contract or a commitment – where you agree to pay now to reserve capacity for a set period of time: 1 year or 3 years.
- With that commitment, and your agreement to pay that contract upfront, partially upfront or monthly, AWS agrees to give you a discount. In general, that discount, as you will see on the next slide, increases the more you pay upfront and the longer the commitment time period.
- There are also some caveats you must be aware of, if you are not already:
- It is very much a use it or lose it proposition. It’s like those annoying gift cards that have a hidden monthly fee and have a zero balance by the time you get around to using them. If you’re not careful, Reserved Instances can be that way if you don’t manage them properly.
- Managing these contracts can be very complex. In fact, a whole industry of analytics applications has grown up around helping you track Reserved Instances.
- These contracts are specific to a Region, Availability Zone, Instance Type (e.g., m4.large), Platform Type (e.g., Linux or Windows) and Tenancy. As you launch instances, AWS automatically (and randomly) attempts to match what is launched to the contracts you have in place.
- If there is a match, they apply the benefit. If there is no match, AWS decrements the contract amount and your ROI decreases.
- The nightmare scenario is when your users are launching the types of instances for which you DON’T have contracts. You end up essentially paying twice: once for the RI’s you paid for and then for the new instances.
- That said, how much can you save with Reserved Instances?
6:42 Reserved Instance Savings
- I have two graphs here. These are for an m4.large Instance Type, running Linux, in the US-East-1 Region. The graph on the left is for a 1-year commitment; the graph on the right is for a 3-year commitment. Notice that there is no green bar for the 3-year term, because AWS only allows the No Upfront option for the 1-year term.
- The purple bar shows the On-Demand pricing in both.
- Notice that for the 1-year commitment, you’ll save between 31% and 43% in this particular case, and that the savings improves as you pay more upfront.
- For the longer 3-year commitment, the savings improve to between 60% and 64%, which is not bad.
- However, what happens if AWS cuts their On-Demand price as they have done in the past? For example, in 2014, AWS dropped their price by 30%.
- If that happens, then the savings you hoped to achieve evaporates. The longer the commitment, the greater the chance that this will happen, which is why more companies make the shorter 1 year commitment and settle for the lower savings.
- That said, as long as you are aware of these caveats, the best use for Reserved Instances is in Production.
- However, we can do better than that for non-production!
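The two Reserved Instance risks described above (use it or lose it, and On-Demand price cuts) can be put into numbers. The following is a minimal sketch, not AWS's billing logic: it assumes the reservation costs the same whether or not a matching instance runs, and that the reserved price stays fixed while the On-Demand price drops. The function names are hypothetical.

```python
def effective_ri_savings(ri_discount: float, utilization: float) -> float:
    """Savings versus On-Demand when a reservation is only partly used.

    ri_discount: discount at full utilization (0.43 means 43%).
    utilization: fraction of reserved hours matched by a running instance.
    Assumption: the reservation is paid for every hour, used or not.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Cost per hour actually used grows as utilization drops.
    return 1 - (1 - ri_discount) / utilization


def savings_after_price_cut(ri_discount: float, price_cut: float) -> float:
    """RI savings measured against an On-Demand price reduced by price_cut.

    Assumption: the reserved price does not change when On-Demand is cut.
    """
    return 1 - (1 - ri_discount) / (1 - price_cut)


if __name__ == "__main__":
    # Discounts quoted in the webinar, re-measured after a 30% On-Demand cut.
    for d in (0.31, 0.43, 0.64):
        after = savings_after_price_cut(d, 0.30)
        print(f"{d:.0%} discount -> {after:.1%} after a 30% cut")
```

Under these assumptions, a 43% discount shrinks to roughly 19% after a 30% On-Demand cut, and a reservation with a 43% discount that is matched only half the time costs more than simply running On-Demand.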
8:00 2. How Spot Instances Work
- Here’s how they work:
- AWS runs a “spot market” on spare EC2 capacity. Anyone familiar with derivatives markets or energy trading should be familiar with the concept.
- This spare capacity is bundled in Spot Pools, based on Instance Type, Platform and Availability Zone.
- You place a bid. If there is spare capacity and your bid price is above the current spot price, your request is fulfilled and your instances start. They will keep running as long as there is capacity and your bid price stays above the market price.
- As we will see on the next slide, you can reap some great savings. However, there are risks involved.
- If there is no spare capacity for the instance type you want, you may have to wait a very long time before your request is filled.
- As soon as the market price rises above your bid price, or if they run out of capacity, then your instances terminate abruptly after a 2-minute warning.
- Of course, building applications that can withstand instance termination is the best way to mitigate this. However, the spot market has led to all sorts of creative mitigation strategies, including:
- Using a mixture of on-demand and spot instances with persistent requests
- Avoiding the latest and greatest EC2 types and settling for older types, with more stable pricing
- Being flexible in the Instance Types you use. In fact, AWS has come out with Spot Fleets: the ability to launch a mix of different Spot Instances in one request
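The bidding rules described in this section can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: prices are sampled once per hour, capacity is always available, and the price series is made up for illustration. The function name is hypothetical.

```python
from typing import List, Tuple


def simulate_spot_run(bid: float, hourly_prices: List[float]) -> Tuple[int, float]:
    """Return (hours_run, total_cost) for one Spot request.

    The instance runs while the Spot price stays at or below the bid.
    Each hour is billed at that hour's Spot price, not at the bid.
    Spare-capacity shortages are not modeled here.
    """
    hours_run = 0
    total_cost = 0.0
    for price in hourly_prices:
        if price > bid:
            break  # market price rose above the bid: instance terminates
        hours_run += 1
        total_cost += price
    return hours_run, total_cost


if __name__ == "__main__":
    # Hypothetical price series in dollars per hour.
    prices = [0.014, 0.014, 0.020, 0.045, 0.014]
    print(simulate_spot_run(0.03, prices))
```

In this made-up series the instance is terminated in the fourth hour, when the price exceeds the $0.03 bid, even though the price falls again afterwards. That is the behavior an application on Spot has to tolerate.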
9:46 Spot Instance Savings
- What are the potential savings with Spot?
- Here is an example of an m4.large Instance Type running in Northern Virginia, running Linux.
- You are looking at a 3-month price history, where the price has been quite low, about $0.014 per hour, for months.
- If you had been willing to pay $0.03 per hour, your instance would have run for over 3 months.
- You are billed on the Spot Price, not the Bid Price, so, mitigation strategies aside, you would have saved about 89%.
- In fact, that’s pretty typical. Savings in the 70% to 90% range are not uncommon for Spot Instances.
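The savings figure follows from comparing the Spot price with the On-Demand price. The On-Demand price is not quoted in this summary, so the sketch below back-solves the value implied by the two numbers that are given ($0.014 per hour and about 89%). Treat the result as an inference, not a published AWS price; the function names are hypothetical.

```python
def savings_vs_on_demand(on_demand: float, spot: float) -> float:
    """Fraction saved by paying the Spot price instead of On-Demand."""
    return 1 - spot / on_demand


def implied_on_demand(spot: float, savings: float) -> float:
    """On-Demand price implied by a Spot price and a quoted savings fraction."""
    return spot / (1 - savings)


if __name__ == "__main__":
    on_demand = implied_on_demand(spot=0.014, savings=0.89)
    print(f"Implied On-Demand price: ${on_demand:.3f} per hour")
    print(f"Savings at $0.014 Spot: {savings_vs_on_demand(on_demand, 0.014):.0%}")
```

Note that the bid of $0.03 does not appear in the calculation, because billing is based on the Spot price rather than the bid.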
10:27 Where Are Spot Instances Being Used?
- So, even with the risks mentioned, Spot instances are used in both production and non-production.
- They are NOT generally used in interactive production workloads, such as web applications, nor are they used in real-time production workloads.
- However, they are used a lot in scientific research such as high-performance scientific computing, analytics batch jobs and in batch video processing.
- In non-production, Spot Instances are used for performance and scalability testing.
11:04 How Auto Scaling Groups Work
- The third way you can save money in AWS is by leveraging Auto Scaling Groups.
- An auto scaling group allows you to scale the number of instances up or down automatically:
- To the right is a simple web application, leveraging several web servers, sitting behind an elastic load balancer
- You provide a launch configuration (one or more AMIs)
- You set the minimum, desired and maximum number of instances. In my example, I set the minimum to 2 nodes, the desired to 4 and the max to 10.
- You provide the CloudWatch metrics to use (e.g., CPU utilization, disk I/O, etc.) and the thresholds you want, and the system uses those to scale up and down automatically, by either the number of nodes or the percent of capacity you want
- The cool thing about Auto Scaling Groups is that they can leverage any or all of the Purchasing Options discussed, making them quite flexible – On Demand, Reserved Instances, and Spot.
- They can quickly scale-up to meet demand, for example if you’re a web commerce company and Christmas hits, you can scale up to hit the Christmas rush, then when it’s over, quickly scale down again to save money.
- They can be used to provide fault tolerance for applications.
- While they can help save money in both production and non-production, the amount of savings is rather hard to pin down, as it depends heavily on the Instance Types used, whether they are On-Demand, Reserved Instances or Spot, and what the scaling policies/rules are.
- That said, for Production, Auto Scaling Groups + On-Demand (backed by Reserved Instances) is probably the best bet.
- For non-production, particularly if you’re doing performance and scalability testing, Auto-Scaling Groups and spot instances are the way to go.
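The scaling behavior described in this section (a minimum of 2 nodes, a maximum of 10, and metric thresholds that move the group up or down) can be sketched as a single decision function. This illustrates the idea only and is not the Auto Scaling service's algorithm: the CPU thresholds of 70% and 30% and the step size of one node are hypothetical values chosen for the example.

```python
def next_desired_capacity(
    current: int,
    cpu_percent: float,
    *,
    minimum: int = 2,               # from the webinar example
    maximum: int = 10,              # from the webinar example
    scale_out_above: float = 70.0,  # hypothetical threshold
    scale_in_below: float = 30.0,   # hypothetical threshold
    step: int = 1,                  # hypothetical adjustment size
) -> int:
    """Return the node count after evaluating one metric reading."""
    if cpu_percent > scale_out_above:
        target = current + step
    elif cpu_percent < scale_in_below:
        target = current - step
    else:
        target = current
    # Never leave the configured [minimum, maximum] range.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    capacity = 4  # the "desired" value from the webinar example
    for reading in (85.0, 90.0, 50.0, 10.0):
        capacity = next_desired_capacity(capacity, reading)
        print(f"CPU {reading:.0f}% -> {capacity} nodes")
```

The clamp at the end is what keeps a runaway metric from scaling the group past its maximum, and what keeps a quiet period from shrinking it below the minimum needed for fault tolerance.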
12:58 Scheduling “On/Off” Times with Scripting
- We talked about the fact that people often leave non-production environments running, even when they are not being used.
- The best way to save money is to simply turn this stuff off when not in use. That is our fourth approach, and it is easier said than done.
- Why? AWS does not offer a “parked” state that’s off by default, and from our discussions with them, they have no plans to do so.
- Despite the variety of Cloud Analytics platforms out there telling people to stop doing that, those platforms don’t actually do anything to help them act on those recommendations, and they can be costly if you don’t already own one.
- So, what do people do when there is a lack of viable options?
- When the going gets tough, the tough start scripting!
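To give a sense of what such a script involves, here is a minimal sketch of just the scheduling decision. The working hours (7 am to 7 pm on weekdays) are a hypothetical choice for the example. A real script would run on a timer and then call the EC2 API to act on the decision (for example boto3's `start_instances` and `stop_instances`); that part is left out so the sketch runs without credentials.

```python
from datetime import datetime, time

# Hypothetical working hours for a non-production environment.
WORK_START = time(7, 0)
WORK_END = time(19, 0)


def should_be_running(now: datetime) -> bool:
    """Return True if instances should be on at the given local time."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WORK_START <= now.time() < WORK_END


if __name__ == "__main__":
    state = "running" if should_be_running(datetime.now()) else "stopped"
    print(f"Instances should currently be {state}")
```

Even this small piece hides decisions about time zones, holidays and which instances are in scope, which is where the maintenance burden comes from.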
13:56 The Problem with Scripting
- I used to be one of those people – as a recovering command line interface guy who has done his fair share of scripting, I get the appeal:
- You’re in control of your own destiny
- It’s an opportunity to get your hands dirty
- At the end of the process you have the satisfaction of actually building something from start to finish that actually can provide some cost-savings.
- However, scripting is just NOT cost-effective. Why?
- Building it is only half the battle. You now have to maintain those scripts as the environment changes. Even with the help of something like Chef or Puppet, there’s added cost.
- If you don’t keep up with your environment, then you miss stuff you could have turned off. You then miss out on a lot of savings.
- Then, your boss taps you on the shoulder and wants to know why you are wasting your time on scripting up this stuff, when you are supposed to be working on more mission critical work, like the applications that actually earn money for your company. So, you have to come up with a way to quantify the savings, which takes time.
- Heaven forbid that he likes your report, because now he is going to want to see a report every week. Who knows how long that will take.
- Are you really going to have time to keep up with the changes in AWS, or add other service providers, like if your company decides to expand beyond AWS to something like Azure or Google?
- And, what is the opportunity cost of not having you work on the company’s main mission? That probably makes these other costs pale in comparison!
15:38 ParkMyCloud: Purpose-Built to Save Money
- So, let’s talk about the fifth and better way to save money in non-production environments, let’s talk about my company and our application, ParkMyCloud.
- ParkMyCloud is purpose-built to do one thing really well: schedule on/off times for EC2 instances WITHOUT SCRIPTING and without being a DevOps expert. We call that “Instance Parking”. It’s like NEST for the Cloud, which is what we see ourselves becoming for public cloud.
- Think of “Parked” as a new instance “state” between Running and Stopped, and it’s under scheduled control.
- Depending on the instance and schedule used, ParkMyCloud can achieve savings of between 50% and 73%, making it better than Reserved Instances for non-production, without an annual commitment or an upfront payment
- It provides almost the savings of Spot instances without the risk of abrupt instance termination
- Let’s look at this in comparison with Reserved Instances a little more closely, to show you why I think ParkMyCloud is better for non-production environments.
16:34 Reserved Instance Savings
- Here is that graph I showed you before, except now we have added the ParkMyCloud savings.
- Here we used a ParkMyCloud schedule where instances were ON 12 hours & OFF 12 hours on weekdays and OFF on weekends. When you do the math, that results in a downtime of 64%.
- To achieve that level of savings with Reserved Instances, you would have had to commit to a 3-year contract and pay the whole thing upfront.
- Also, remember what happens when AWS cuts their On-Demand prices? Your Reserved Instance savings is decimated.
- Not so with our application: Since there is no annual commitment nor upfront payment, you would just ride the new price curve, which would be 64% below the new On-Demand price.
- And unlike Reserved Instance management, ParkMyCloud is simple to use.
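The 64% figure for the schedule above comes from simple arithmetic over a 168-hour week. This sketch assumes that On-Demand compute is billed only while an instance runs, so the parked fraction approximates the compute savings; storage and other charges are not considered. The function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def parked_fraction(
    on_hours_per_weekday: float,
    weekdays: int = 5,
    on_hours_per_weekend_day: float = 0.0,
) -> float:
    """Fraction of the week an instance is off under a weekly schedule."""
    weekend_days = 7 - weekdays
    on_hours = (
        on_hours_per_weekday * weekdays
        + on_hours_per_weekend_day * weekend_days
    )
    return 1 - on_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    # ON 12 hours on weekdays, OFF on weekends: 60 of 168 hours running.
    print(f"Parked {parked_fraction(12):.0%} of the week")
```

With 12 hours on for each of 5 weekdays, the instance runs 60 hours and is parked for the remaining 108, which is about 64% of the week.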
17:25 Create Schedules
- You create a parking schedule – once again I mentioned you can do this without scripting, you just click on the on/off times, set the proper timezone, and give it a name and description, save it …
17:41 Apply them to Non-Production Instances
- Then attach it to one or more of your non-production instances in your dashboard.
- In fact, we even recommend instances to park, based on criteria you provide.
17:53 Reap the Savings
- Once the parking schedules are applied, we predict your savings for the next 30 days.
- Leave the schedules in place and we’ll also show you your actual savings month-to-date.
18:08 Without Breaking the Bank
- We do this for about $3 per instance per month.
- For the folks who script, do you think you can maintain your scripts that inexpensively? From everyone we’ve talked to, from our large customers, the answer is no.
18:23 ParkMyCloud Product Demo
- So I come into my environment here. The first thing, if you’ve already parked something, you’ll see what your savings is, projected over the next 30 days.
- This is an environment where I’ve got 126 instances. There are a couple of things I want you to notice right away: what you’re looking at is an environment that’s not just one AWS account and not just one region. You’re actually looking at 126 instances spread out over 4 AWS accounts, spread out over all the regions, and you get all that in one view. That’s one thing that’s different between ParkMyCloud and the AWS console, that single-pane-of-glass view.
- The other thing you’ll notice is that these things are organized in teams. I have 4 teams in here. You have an unlimited number of teams you can add to the platform. We use teams to organize users and instances.
19:54 Demo: Keyword Recommendations
- The other thing I want you to notice is that we’re showing you, live, the 30-day projected savings – that’s based on a schedule configuration we currently have in place. Here’s an example with the demo team. I’ve got some keywords here recommending that I can park some things, so let me go ahead and show you this. We give you a series of keywords to help you determine candidate instances when you first log in. We give you 6-7 keywords like dev, test, QA, staging, training, demo, things like that. I’ve added a “parkmycloud-yes” because I know my instances have that, and you can change these, delete our keywords if you don’t want them, and add your own.
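Keyword-based recommendations of this kind can be approximated with a case-insensitive substring match. This is only a sketch of the idea and not how ParkMyCloud implements it: matching against instance names is an assumption, and the instance names in the example are made up.

```python
from typing import Iterable, List, Sequence

# Keywords mentioned in the demo; users can add or remove their own.
DEFAULT_KEYWORDS = ("dev", "test", "qa", "staging", "training", "demo")


def recommend_for_parking(
    instance_names: Iterable[str],
    keywords: Sequence[str] = DEFAULT_KEYWORDS,
) -> List[str]:
    """Return the names that contain any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [
        name
        for name in instance_names
        if any(k in name.lower() for k in lowered)
    ]


if __name__ == "__main__":
    # Hypothetical instance names.
    names = ["web-prod-01", "api-dev-02", "QA-runner", "billing"]
    print(recommend_for_parking(names))
```

A plain substring match is easy to reason about but can over-match (for example, "test" also matches "latest"), which is one reason to let users edit the keyword list.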
20:44 Demo: Creating and Attaching Parking Schedules
- In my particular case, I’ve got 96 instances here recommended for parking, and I know right away there’s a bunch of demo instances. So I’m going to go ahead and select the demo team, and take all these instances, and put them on a schedule using a bulk action. We give you a few schedules in the platform, and we allow you to add your own. In this particular case, suppose I don’t like any of these schedules. I’ll call this the acme webinar. I’ll select the time zone, and let’s say I want it on 7am – 7 pm on weekdays. I’ll come down here and select what I want off and what I want on, in this case I want it to start at 7 am and go off at 7 pm, and off on weekends. I can go ahead and create and attach that.
- Immediately, you can see my forward prediction of savings has gone up commensurate with that.
22:02 Demo: Snoozing Schedules to Work Outside Normal Hours
- Now suppose you have an instance that’s parked in here, and I come in on the weekend and I need to do work. I can log in to the application here, click on the toggle button. It will warn me that there’s a schedule attached and give me the option to do something, which is snooze it. I can snooze the schedule, and pick a set amount of time until this time and date. In this case, I want to run it for an hour, select that, hit okay. It’s snoozed the schedule to move it out of the way, and now it’s going out and starting the instance so I can do work. The cool thing about it is, if you then are done with your work, you can just walk away if you want to. The snooze will expire, and the schedule will kick in and park it again.
- One of the unintended consequences is that some of our large customers have decided to use an “off” calendar – something that turns their non-production environments from the “on” default state to the “off” default state 24×7. They’ve told their developers to just log in to the platform and snooze the schedule for the amount of time they’re going to work. As a result, they’re maximizing their savings. We thought that was so cool that we actually added that “always off” schedule as one of the default ones.
- You can see here in the schedule menu, or when you look in the tool tip for the schedule, you’ll see if it actually is snoozed, it will tell you when it will expire.
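The snooze behavior shown in this part of the demo amounts to a temporary override with an expiry time. The sketch below models that idea in a few lines; it is an illustration under assumptions, not ParkMyCloud's implementation, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def desired_state(
    now: datetime,
    scheduled_on: bool,
    snooze_until: Optional[datetime] = None,
    snoozed_state_on: bool = True,
) -> bool:
    """Return True if the instance should be running at `now`.

    While a snooze is active, the instance keeps the state chosen when
    the snooze was set. Once it expires, the schedule takes over again.
    """
    if snooze_until is not None and now < snooze_until:
        return snoozed_state_on
    return scheduled_on


if __name__ == "__main__":
    start = datetime(2024, 1, 6, 10, 0)  # a Saturday; schedule says off
    snooze_until = start + timedelta(hours=1)
    for minutes in (0, 30, 90):
        t = start + timedelta(minutes=minutes)
        print(t.time(), desired_state(t, False, snooze_until))
```

Because the override carries its own expiry, a developer can walk away after weekend work and the instance returns to its parked state without any further action.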
24:07 Multiple Teams Demo
- We handle multiple teams but also multiple users and multiple accounts. That’s a big benefit, because we can use that construct to hide instances from people so they only see what they need to see. We can also add multiple accounts – here I have 4 AWS accounts – and at any time, do a manual ingest. We do an automatic ingest once every 6 hours.
- Here’s an example. I can take this instance, select it, and move it to any of the other teams if I want to. I’ll move it from the dev team to the demo team. Now any users on the demo team can see that amongst the other instances they have.
25:40 Summary – Unlocking EC2 Non-Production Savings
- In summary, we have talked about 5 ways to reduce costs in AWS, focusing on non-production environments:
- Reserved Instances, as compared to On-Demand. They’re a little bit more risky: if AWS cuts its On-Demand price, the savings go away, so there’s no protection against price drops. But the savings are pretty good. Even with a 1-year contract, the savings are 31-43%.
- Spot Instances can save a lot, routinely 70-90%. However, because of potentially long delays in request fulfillment, termination of instances on short notice and the need for complex mitigation strategies, these are much higher risk, but there are definite use cases in both production and non-production.
- Auto-Scaling Groups allow you to leverage all of the other instance options, allowing you to drive up availability and scalability in both production and non-production. However, the cost savings are difficult to pin down, as they are very configuration specific.
- We talked about scheduling on/off times with scripting, but suggested that approach is not cost-effective, so it is not shown here.
- We talked about ParkMyCloud, which provides better savings than Reserved Instances in non-production environments, without the need for annual commitments or upfront payments, like Reserved Instances; and without risks of Spot. The downside is that it is limited to non-production, on-demand instances. It cannot park auto scaling groups (yet).
- I hope you found the information presented here to be of use.
- If you haven’t tried ParkMyCloud, we offer a no-strings-attached, 30-day free trial.
28:23 Estimate Your Savings Beforehand
- We also offer a Parking Savings Calculator, which you can use to estimate the savings for your environment.
- Question: How easy is the configuration for ParkMyCloud? Is this something I need to install in our AWS server?
- Answer: ParkMyCloud is a SaaS application that runs inside of AWS, and you don’t install anything. When you start the 30-day trial, you just enter contact information, enter an AWS credential – either IAM user or IAM role – and you’re up and running. Customers can be up and running and parking within 7 minutes.
- Question: If I were using ParkMyCloud, how would I make sure other people on my team don’t park instances that I don’t want them to park?
- Answer: Here’s an example. I have 4 teams here. If I looked at the environment on this Sandbox team, I would see that when Jon logs in, he would see just a few instances that he’s been allowed to see. You can use teams to hide instances from people.
- Question: Can I override a schedule? For example, if ParkMyCloud shut down a server but my team is working on a release over the weekend, can I override the schedule?
- Answer: Yes, if there is a parking schedule on the system right now and it’s running, and you wanted to prevent the schedule from shutting down the server, you can use the snooze button to delay the schedule action for a certain period of time. The instance stays in whatever state it was in for that period of time.
- Question: What reporting do you offer? For example, if I want to show a proof of savings we’ve achieved with ParkMyCloud and I want to make sure I know which of my team members are parking which instances, what do you provide?
- Answer: We allow you to download a few Excel spreadsheets that show reports of the savings. We have 4 reports in the system with customizable start and end dates. You can get a detailed cost by resource, cost summary by team, cost summary by AWS account/credential, and also a roster of your team members.
- Question: You mentioned that you’re adding parking for Auto Scaling Groups. What will you do to Auto Scaling Groups?
- Answer: We’re rolling out the ability to park Auto Scaling groups. You’ll be able to click on each group and see the instances running within the group. You can look at tags on the group and information about the group. There will be controls at the group level that you can control for individual instances – parking, toggle on and off, and snooze the schedule, all at the group level.