Should You Use the Cloud-Native Instance Scheduler Tools?

When adopting or optimizing your public cloud use, it’s important to eliminate wasted spend from idle resources – which is why you need to include an instance scheduler in your plan. An instance scheduler ensures that non-production resources – those used for development, staging, testing, and QA – are stopped when they’re not being used, so you aren’t charged for compute time you’re not actually using.

AWS, Azure, and Google Cloud each offer an instance scheduler option. Will these fit your needs – or will you need something more robust? Let’s take a look at the offerings and see the benefits and drawbacks of each.

AWS Instance Scheduler

AWS has a solution called the AWS Instance Scheduler. AWS provides a CloudFormation template that deploys all the infrastructure needed to schedule EC2 and RDS instances. This infrastructure includes DynamoDB tables, Lambda functions, and CloudWatch alarms and metrics, and relies on tagging of instances to shut down and turn on the resources.

The AWS Instance Scheduler is fairly robust in that it allows you to have multiple schedules, override those schedules, connect to other AWS accounts, temporarily resize instances, and manage both EC2 instances and RDS databases. However, that management is done exclusively by editing DynamoDB table entries, which is not the most user-friendly experience. The settings in DynamoDB are applied via instance tags, which works well if your organization is tag-savvy, but can be a problem if not all users have access to change tags.
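The tag-and-period mechanics can be pictured with a small sketch. The period format below only loosely mimics an Instance Scheduler period entry (begintime/endtime/weekdays stored in DynamoDB); it is illustrative, not the solution's actual code:

```python
from datetime import datetime, time

def desired_state(period, now):
    """Return the state an instance tagged with this schedule should be in.

    `period` loosely mimics an Instance Scheduler period entry:
    begintime/endtime as "HH:MM" strings and weekdays as a set of
    ints (Monday=0). Illustrative only, not the real solution code.
    """
    begin = time.fromisoformat(period["begintime"])
    end = time.fromisoformat(period["endtime"])
    if now.weekday() in period["weekdays"] and begin <= now.time() < end:
        return "running"
    return "stopped"

office_hours = {"begintime": "09:00", "endtime": "17:30",
                "weekdays": {0, 1, 2, 3, 4}}
print(desired_state(office_hours, datetime(2019, 1, 7, 10, 0)))  # Monday 10:00 -> running
print(desired_state(office_hours, datetime(2019, 1, 5, 10, 0)))  # Saturday -> stopped
```

In the real solution, a scheduled Lambda function evaluates logic like this for every tagged instance and issues the corresponding start/stop API calls.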

If you will have multiple users adding and updating schedules, the Instance Scheduler does not provide good auditing or multi-user capabilities. You’ll want to strongly consider an alternative.

Microsoft Azure Automation

Microsoft has a feature called Azure Automation, which includes multiple solutions for VM management. One of those solutions is “Start/Stop VMs during off-hours”, which deploys runbooks, schedules, and log analytics in your Azure subscription for managing instances. Configuration is done in the runbook parameters and variables, and email notifications can be sent for each schedule.

This solution steps you through the setup of start and stop times, along with email configuration and the target VMs. However, multiple schedules require multiple deployments of the solution, and connecting to additional Azure subscriptions requires even more deployments. It does include the ability to order or sequence your start/stop actions, which can be very helpful for multi-component applications, but there's no option for temporary overrides and no UI for self-service management. One really nice feature is the ability to recognize when instances are idle and automatically stop them after a set time period, which the other tools don't provide.

Google Cloud Scheduler

Google has also packaged some of its cloud components together into the Google Cloud Scheduler solution. This includes Google Cloud Functions for running the scripts, Google Cloud Pub/Sub messages for driving the actions, and Google Cloud Scheduler jobs to actually kick off the start and stop of the VMs. Unlike AWS and Azure, this requires setting up each component individually (instead of a packaged deployment), but the documentation takes you step-by-step through the process.

Google Cloud Scheduler relies on instance names instead of tags by default, though the functions are all made available for you to modify as needed. The settings are all built into those functions, which makes updating or modifying them much more complicated than with the other services. There's also no real UI available, and the out-of-the-box experience is fairly limited in scope.
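The flow is roughly: a Cloud Scheduler job publishes a Pub/Sub message on a cron schedule, and a Cloud Function decodes that message and calls the Compute Engine API. Here is a minimal sketch of just the decoding step; the payload fields (`zone`, `instance`) are assumptions modeled on Google's tutorial, not guaranteed field names:

```python
import base64
import json

def parse_schedule_event(event):
    """Decode a Pub/Sub message of the kind a Cloud Scheduler job
    publishes to the start/stop Cloud Functions. The real functions
    would then call the Compute Engine API with these values."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return payload["zone"], payload["instance"]

# A Cloud Scheduler job would publish something like this on a schedule:
msg = {"data": base64.b64encode(
    json.dumps({"zone": "us-east1-b", "instance": "dev-web-1"}).encode())}
print(parse_schedule_event(msg))  # ('us-east1-b', 'dev-web-1')
```

Because the selection logic lives inside function code like this rather than in tags or a UI, any change to which VMs are scheduled means editing and redeploying the functions.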

Cloud Native or Third Party?

Each of the instance scheduler tools provided by the cloud providers has a few limitations. One possible dealbreaker is that none of these tools is multi-cloud capable, so if your organization uses multiple public clouds then you may need a third-party tool. They also don't provide a self-service UI, built-in RBAC, Single Sign-On, or reporting capabilities. When it comes to cost, all of these tools are “free”, but you end up paying for the deployed infrastructure and services that are used, so the cost can be very hard to pin down.

We built ParkMyCloud to solve the instance scheduler problem (now with rightsizing too). Here’s how the functionality stacks up against the cloud-native options:

 

The comparison covers AWS Instance Scheduler, Microsoft Azure Automation, Google Cloud Scheduler, and ParkMyCloud across these capabilities:

  • Virtual machine scheduling
  • Database scheduling
  • Scale set scheduling
  • Tag-based scheduling
  • Usage-based recommendations
  • Simple UI
  • Resize instances
  • Override schedules
  • Reporting
  • Start/stop notifications
  • Multi-account support
  • Multi-cloud support

Overall, the cloud-native instance scheduler tools can help you get started on your cost-saving journey, but may not fulfill your longer-term requirements due to their limitations.

Try ParkMyCloud with a free trial — we think you’ll find that it meets your needs in the long run.  

The One Thing You Need More than Cloud Visibility

If you ask a group of CIOs or analysts for a list of priorities for companies adopting cloud infrastructure, there’s no doubt that cloud visibility would be named near the top. Insight is important for everything from security to cost management. But cloud visibility on its own is not enough, particularly as widespread cloud usage continues to mature.

Don’t Get Us Wrong: Cloud Visibility is Important

Cloud visibility is a broad term, encompassing resource consumption and spend, security and regulatory compliance, and monitoring. In fact, cloud “monitoring” is a term that typically encompasses performance monitoring and security. This is certainly important: some projections show the cloud monitoring market reaching $3.9 billion in 2026, so there is obviously demand for these tools.

Another aspect is cost. Cloud cost visibility is a hot topic right now, and with good reason. Public cloud providers’ bills are confusing, and you need to be able to understand what you’re being charged for. It’s also important to see where your spend is going, ideally with slice-and-dice reporting so you can analyze by user, team, project, and resource type, and ensure internal chargeback based on consumption.

However, in terms of resource and cost management, cloud visibility alone is not enough to make change.

Cloud Visibility is Useless without Action

There’s a reason that this time of year, self-help gurus encourage resolution makers to make their goals actionable. Aspirations are great. Knowledge is great. But without practical application, aspirations and knowledge won’t lead to change.

When it comes to cloud cost management, there are several capabilities that you need in order to capitalize on the insights gained through visibility. Three important ones to keep in mind are:

  1. The ability to allocate costs to teams.
  2. The ability to automate remediation.
  3. The ability to optimize spending.

The popular cloud cost management tools tend to be strong on some combination of analytics, reporting dashboards, chargeback/showback, budget allocation, governance, and recommendations (which can get quite granular in areas such as reserved instances and orphaned resources). However, they require external tools or people to act upon these recommendations and lack automation.

Actionable is Good. Optimization is Better.

As you research cloud visibility and monitoring solutions to address knowledge gaps in your organization, be sure to include a requirement to address cloud waste. Cloud optimization should require little to no manual work on your part by integrating into your cloud operations, allowing you to automatically reap the benefits and savings.

Here’s a first step on your optimization journey: pick a cloud account, plug it into ParkMyCloud, and get immediate recommendations for cost reduction. Click to apply the recommendations – or set a policy to do it automatically – and see the savings start to add up.

$14.1 Billion in Cloud Spending to be Wasted in 2019

It’s that time of year: new gym memberships, fresh diet goals, and plans to reform… cloud spending?

If you’re at all involved in your organization’s public cloud infrastructure, that last one should definitely be on your to-do list. Chances are, if you’re spending money on cloud, some of that money is being wasted. For some, a lot of that money is being wasted. Here are the numbers.

Predicted Cloud Spending 2019

The latest predictions from Gartner estimate that overall IT spending will reach $3.8 trillion this year, a growth of 3.2% over IT spending in 2018.

Of this spend, public cloud spending is expected to reach $206.2 billion — of which, the fastest growing segment is Infrastructure as a Service (IaaS) which Gartner says will grow 27.6 percent in 2019 to reach $39.5 billion, up from $31 billion in 2018.

Now we can subdivide the public cloud spend number further to look just at compute resources — typically ⅔ of cloud spend is on compute, or about $26.3 billion. This segment of spend is especially vulnerable to waste, particularly from idle resources and oversized resources.

Wasted Cloud Spending from Idle Resources

Let’s first take a look at idle resources — resources that are being paid for by the hour or minute, but are not actually being used. Typically, this kind of waste occurs in non-production environments – that is, those used for development, testing, staging, and QA. About 44% of compute spend is on non-production resources (that’s our number).

Most non-production resources are only used during a 40-hour work week, and do not need to run 24/7. That means that for the other 128 hours of the week (76%), the resources sit idle, but are still paid for.

So what we get is:

$26.3 billion in compute spend * 0.44 non-production * 0.76 of week idle = $8.8 billion wasted on idle cloud resources
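The arithmetic above can be checked in a couple of lines (figures in billions of dollars; the 44% non-production share is ParkMyCloud's own estimate, as noted):

```python
compute_spend = 26.3                  # $B, ~2/3 of the $39.5B IaaS forecast
non_prod_share = 0.44                 # share of compute spend on non-production
idle_share = (168 - 40) / 168         # hours outside a 40-hour work week (~76%)
idle_waste = compute_spend * non_prod_share * idle_share
print(round(idle_waste, 1))           # 8.8
```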

Wasted Cloud Spending from Oversized Resources

The other source of wasted cloud spend is oversized infrastructure — that is, paying for resources at a larger capacity than needed.

RightScale found that 40% of instances were sized at least one size larger than needed for their workloads. Just by reducing an instance by one size, the cost is reduced by 50%. Downsizing by two sizes saves 75%.

The data we see in our users’ infrastructure in the ParkMyCloud platform confirms this, and in fact we find that it may even be a conservative estimate. Infrastructure managed in our platform has an average CPU utilization of 4.9%. Of course, this doesn’t take memory into account, and could be skewed by the fact that resources managed in ParkMyCloud are more commonly non-production resources. However, it still paints a picture of gross underutilization, ripe for rightsizing and optimization.

If we take a conservative estimate of 40% of resources oversized by just one size, we find the following:

$26.3 billion in compute spend * 0.4 oversized * 0.5 overspend per oversized resource = $5.3 billion wasted on oversized resources
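The oversizing math checks out the same way, and combining it with the $8.8 billion idle-resource figure above gives the headline total:

```python
compute_spend = 26.3                          # $B of compute spend
oversized_waste = compute_spend * 0.40 * 0.50  # 40% oversized, 50% overspend each
total_waste = 8.8 + oversized_waste            # plus the idle-resource figure
print(round(oversized_waste, 1))               # 5.3
print(round(total_waste, 1))                   # 14.1
```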

Total Cloud Spending to be Wasted in 2019

Between idle resources and overprovisioning, wasted cloud spend will exceed $14.1 billion in 2019.


In fact, this estimation of wasted cloud spend is probably low. This calculation doesn’t even account for waste accumulated through orphaned resources, suboptimal pricing options, misuse of reserved instances, and more.

End the Waste

It’s time to fight this cloud waste. That’s what we’re all about at ParkMyCloud — eliminating wasted cloud spending through scheduling, rightsizing, and optimization.

Ready to join us and become a cloud waste killer? Let’s do it.

10,000 Years of Data Says Your Server Sizing is Wrong.

Server sizing in the cloud can be tricky. Unless you are about to do some massive high-performance computing project, super-sizing your cloud virtual machines/instances is probably not what you are thinking about when you log in to your favorite cloud service provider.  But from looking at customer data within our system, it certainly does look like a lot of folks are walking up to their neighborhood cloud provider and saying exactly that: Super Size Me!

Like at a fast-food place, buying the super size means paying extra costs…and when you are looking for ways to save money on cloud costs, whether for production or non-production resources, the first place to look is at idle and underutilized resources.

Within the ParkMyCloud SaaS platform, we have collected bazillions (scientific term) of samples of performance data for tens of thousands of virtual machines, across hundreds of customers, and the average of all “Average CPU” readings is an amazing (even to us) 4.9%.  When you consider that many of our customers are already addressing underutilization by stopping or “parking” their instances when they are not being used, one can easily conclude that server sizing is out of control and instances are tremendously overbuilt. In other words, they are much more powerful than they need to be…and thus much more expensive than they need to be.  As cool as “super sizing” sounds, the real solution is in rightsizing, and ensuring the instance size and type are better tailored to the actual load.

Size, Damned Size, and Statistics

Before we start talking about what is involved in rightsizing, let’s look at a few more statistics, just because the numbers are pretty cool.  Looking at utilization data from about 88.9 million instance-hours on AWS – that’s 10,148 years –  we find the following:

So, what is this telling us about server sizing?  The percentiles alone tell us that more than 95% of our samples are operating at less than 50% Average CPU – which means if we cut the number of CPUs in half for most of our instances, we would probably still be able to carry our workload.  The 95th percentile for Peak CPU is 58%, so if we cut all of those CPUs in half we would either have to be OK with a small degradation in performance, or maybe we select an instance to avoid exceeding 99% peak CPU (which happens around the 93rd percentile – still a pretty massive number).

Looking down at the 75th and 50th percentiles we see instances that could possibly benefit from multiple steps down!  As shown in the next section, one step down can save you 50% of the cost for an instance. Two steps down can save you 75%!

Before making an actual server sizing change, this data would need to be further analyzed on an instance-by-instance basis – it may be that many of these instances have bursty behavior, where their CPUs are highly utilized for short periods of time and idle all the rest of the time.  Such an instance would be better off being parked or stopped for most of the time, and only started up when needed. Or…depending on the duration and magnitude of the burst, it might be better off moving to the AWS T instance family, which accumulates credits for bursts of CPU, and is less expensive than the M family, which is built for a more continuous performance duty cycle.  Also – as discussed below – complete rightsizing entails looking at some other utilization stats as well, like memory, network, etc.

Hungry Size

On every cloud provider there is a clear progression of server sizing and prices within any given instance family.  The next size up from where you are is usually twice the CPUs and twice the memory, and as might be expected, twice the price.

Here is a small sample of AWS prices in us-east-1 (N. Virginia) to show you what I mean:

Double the memory and/or double the CPU…and double the price.
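That doubling pattern means savings compound as you step down. The sketch below uses real us-east-1 on-demand Linux prices for the m5 family at the time of writing as an illustration:

```python
# On-demand Linux prices in us-east-1 ($/hour, at time of writing),
# shown to illustrate the "each size up doubles the price" pattern.
m5_prices = {"m5.large": 0.096, "m5.xlarge": 0.192,
             "m5.2xlarge": 0.384, "m5.4xlarge": 0.768}

def downsize_savings(steps):
    """Each size step down within a family roughly halves the price,
    so stepping down n sizes keeps about (1/2)**n of the cost."""
    return 1 - 0.5 ** steps

print(downsize_savings(1))  # 0.5  -> one step down saves 50%
print(downsize_savings(2))  # 0.75 -> two steps down save 75%
```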

Pulling the Wool Over Your Size?

It is important to note that there is more to instance utilization than just the CPU stats.  There are a number of applications with low CPU but high network, memory, or disk utilization, or high database IOPS, and so a complete set of stats is needed before making a rightsizing decision.

This can be where rightsizing across instance families makes sense.  

On AWS, some of the most commonly used instance types are the T and M general purpose families.  Many production applications start out on the M family, as it has a good balance of CPU and memory.  Let’s look at the m5.4xlarge as a specific example, shown in the middle row below.

  • If you find that such an instance shows good utilization of its CPU, maybe with an Average CPU of 75% and a Peak CPU of 95%, but the memory is extremely underutilized, maybe with only 20% consumed, you may want to move to a compute-optimized instance family.  From the table below, we can see we could move over to a c5.4xlarge, keeping the same number of CPUs but cutting the RAM in half, saving about 11% of our costs.
  • On the other hand, if you find the CPU is significantly underutilized, for example showing an Average CPU of 30% and a Peak of 45%, but memory is 85% utilized, you may be better off on a memory-optimized instance family.  From the table below, we can move to an r5.2xlarge instance, cutting the vCPUs in half and keeping the same amount of RAM, saving about 34% of the costs.
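The two bullet cases above amount to a simple decision rule. The thresholds in this sketch are illustrative picks based on the examples, not ParkMyCloud's actual recommendation algorithm:

```python
def suggest_family(avg_cpu, peak_cpu, mem_used):
    """Toy heuristic mirroring the two cases above. Inputs are
    utilization fractions (0.0-1.0); thresholds are illustrative."""
    if avg_cpu >= 0.6 and mem_used <= 0.3:
        # CPU busy, memory mostly empty: trade RAM for cost.
        return "compute-optimized (e.g. C family)"
    if peak_cpu <= 0.5 and mem_used >= 0.7:
        # CPU mostly idle, memory full: trade vCPUs for cost.
        return "memory-optimized (e.g. R family)"
    return "stay general-purpose (M family)"

print(suggest_family(0.75, 0.95, 0.20))  # compute-optimized (e.g. C family)
print(suggest_family(0.30, 0.45, 0.85))  # memory-optimized (e.g. R family)
```

A real rightsizing engine would also weigh network, disk, and IOPS stats, as discussed below.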

Within AWS there are additional considerations on the network side.  As shown here, available network performance follows the instance size and type.  You may find yourself in a situation where memory and CPU are non-issues, but high network bandwidth is critical, and deliberately super-size an instance.  Even in this case, though, you should think about whether there is a way to split your workload into multiple smaller instances (and thus multiple network streams) that are less expensive than a beastly machine selected solely on the basis of network performance.

You may also need to consider availability when determining your server sizing. For example, if you need to run in a high-availability mode using an autoscaling group you may be running two instances, either one of which can handle your full load, but both are only 50% active at any given time.  As long as they are only 50% active that is fine – but you may want to consider if maybe two instances at half the size would be OK, and then address a surge in load by scaling-up the autoscaling group.

Keep Your Size on the Prize

For full cost optimization for your virtual machines, you need to consider appropriate resource scheduling, server sizing, and sustained usage.

  • Rightsize instances wherever possible.  You can easily save 50% just by going down one size tier – and this applies to production resources as well as development and test systems!
  • Modernize your instance types.  This is similar to rightsizing, in that you are changing to the same instance type in a newer generation of the same family, where cloud provider efficiency improvements mean lower costs.  For example, moving an application from an m3.xlarge to an m5.xlarge can save 28%!
  • Park/stop instances when they are not in use.  You can save 65% of the cost of a development or test virtual machine by just having it on 12 hours per day on weekdays!
  • For systems that must be up continually (and once you have settled on the correct instance size), consider purchasing reserved instances, which can save 54-75% off the regular cost.  If you would like a review of your resource usage to see where you can best utilize reserved instances, please let us know.
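The parking and modernization figures in the list above are easy to sanity-check (the m3/m5 numbers are us-east-1 on-demand Linux prices at the time of writing):

```python
hours_per_week = 24 * 7                 # 168
on_hours = 12 * 5                       # on 12 hours/day, weekdays only
parking_savings = 1 - on_hours / hours_per_week
print(round(parking_savings * 100))     # 64 -> roughly the 65% quoted

m3_xlarge, m5_xlarge = 0.266, 0.192     # us-east-1 on-demand $/hr
modernize_savings = 1 - m5_xlarge / m3_xlarge
print(round(modernize_savings * 100))   # 28
```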

A Site for Sore Size

Last week, ParkMyCloud released the ability to rightsize and modernize instances. This release helps you identify the virtual machine and database instances that are not fully utilized or are on an older family, makes smart recommendations for better server sizing and/or family selection for the level of utilization, and then lets you execute the rightsizing action. We will also be adding a feature for scheduled rightsizing, allowing you to maintain instance continuity while reducing its size during periods of lower utilization.

Get started today!

Predictive Scaling for EC2 & More AWS Announcements to Be Thankful For

Amazon Web Services (AWS) has been pumping out announcements in the lead up to their AWS re:Invent conference next week – which is predicted to exceed 50,000 attendees this year. (See you there?) We’re excited to see what big news the cloud giant has for us next week!

In the meantime, here are three AWS announcements from the last few days that will interest anyone who’s concerned with cloud costs.

Predictive Scaling for EC2

AWS’s predictive scaling for EC2 is a new and improved way to use Auto Scaling to optimize costs. Typically, when you set up an Auto Scaling Group, you need to set scaling policies, such as rules for launching instances based on changes in capacity. Given the complexity of these requirements, some users we’ve talked to forgo them altogether, instead using Auto Scaling simply for instance health checks and replacements.

With predictive scaling for EC2, there is very little the user needs to set up. You simply set up the group, and machine learning models will analyze daily and weekly scaling patterns to scale predictively. You’ll have choices to optimize for availability or for cost – making it easy to use Auto Scaling to save money.

Of course, sometimes you’ll know better than the machine – for example, development and test instances may require on/off or scale-up/scale-down schedules based on when users need them, which won’t always be consistent. For that, use ParkMyCloud to schedule auto scaling groups to turn off or change scaling when you know they will have little or no utilization.

AWS Cost Explorer Forecasting

AWS has announced an improved forecasting engine for the AWS Cost Explorer. It now breaks down historical data based on charge type – distinguishing between On Demand and Reserved Instance charges – and applies machine learning to predict future spend.

They have extended the prediction range from three months to twelve months, which will certainly be of use for budget forecasting. It’s also accessible via the API – we see this being used to show budget predictions on team dashboards in your office, among other applications.

CloudWatch Automatic Dashboards

The third announcement from this week that we’re looking forward to using ourselves here at ParkMyCloud is the new series of CloudWatch Automatic Dashboards. This will make it remarkably easier to navigate through your CloudWatch metrics and monitor costs and performance, and help potential issues break through the noise.

Thanks, AWS!

Now, play around with AWS’s new predictive scaling for EC2, then take some time to relax.

Happy Thanksgiving! (And to our non-U.S. readers, enjoy your Thursday!)

Interview: How Dealer-FX Saves Sysadmins’ Sanity with Automated AWS Management

We chatted with Steve Scott, Cloud Infrastructure Manager at Dealer-FX about how they use ParkMyCloud’s automated AWS management to save significant amounts of time and sanity.

Tell us about what Dealer-FX does, and what your team does within the company. 

Dealer-FX provides software solutions to dealerships. Our software is used at the service advisor level – the people that you see when you take your car in. They’re usually behind a monitor that you never get to see and they’re typing away all things associated with your car information, VIN, scheduling information, recall information, etc. Our software controls all of that across many different OEMs, which are the manufacturers, and thousands of dealerships across Canada and the US.

I am the manager of cloud operations here and my team is strictly at the cloud management level, fully invested in AWS. We started using AWS through one of the OEMs we work with and that’s how we got into the cloud a few years ago.

Can you describe more about how you’re using AWS?

We use AWS for all of our testing, development, staging, and production environments. We use it all, from the API level to the functional level with virtual servers and virtual environments – everything we have that’s customer facing resides with AWS today.

Before you started using ParkMyCloud, what challenges did you face in your use of AWS?

One of the biggest things is that we use a lot of servers. When we had somewhere around 400 servers, we started to look into scheduling, both for server maintenance and for things that were only required to be online during certain periods of time. There was no inherent AWS service that was easily configurable for the same function that ParkMyCloud offered.

We’ve been using ParkMyCloud for a few years for automated AWS management to schedule resources on and off. Our code is in a period of transition from legacy to more cloud native, so we don’t have the resources to use some of the more cost-effective offerings from AWS like reserved instances, but we’re getting there. ParkMyCloud is certainly helping us, as we rely on it for scheduling server maintenance, staging, testing, and development environments.

How did you find ParkMyCloud?

I was bugging our AWS rep for some type of scheduling functionality. They could do it, but it would have taken a lot of work, and it was kind of iffy whether or not it would work for us. He directed me to ParkMyCloud.

Do you see yourselves using more cost efficient resources like Reserved Instances in the future?

I wouldn’t say that exactly. One thing we will look into is more autoscaling functionality. We do all of that manually, except ParkMyCloud sets up the scheduling and does that beautifully. We currently use ParkMyCloud scheduling because we have a predictable workload. For example, we might have 8 servers online between a certain number of hours, and after a period of time bring it down to 7, then 6, and so on depending on the environment, and then bring them back up again the next day.

In the future, as we build new apps, we’ll still be utilizing ParkMyCloud as we always have. We have RDS functionality on the horizon, which we know we can also schedule with ParkMyCloud’s automated AWS management.

We also use ParkMyCloud for planning on/off times for our staging environments which are on-demand. We haven’t taken advantage of all the features yet, but we use ParkMyCloud for very strategic reasons, in very strategic places, and it works phenomenally.

How would you describe the benefits that Dealer-FX has gotten from ParkMyCloud?

From the sysadmin perspective, the main reason we wanted ParkMyCloud was the sheer ease of turning servers on and off. Before, we needed to wake up at certain times and do it ourselves, manually turning off and on hundreds of servers. Having to do those things is no one’s cup of tea!

Who was responsible for doing that previously?

It was 2-3 people on my team.

It sounds like that took a lot of time.

It was a significant amount of time, and due to the high volume of deployments and growth over time, it became more and more terrible to administrate. ParkMyCloud is saving us time and sanity all over the place, and it just works. We’ve never had an issue with it. The design is ultimately “set it and forget it.”

Any other feedback? 

I know there’s lots of things on the horizon that we’ll be using as needed, and I’d be happy to receive updates of new features. Any new tools, extensions, or anything you add I would love to hear about.

We’ll be sharing rightsizing shortly, so look forward to that next! We appreciate your time and feedback.

Sounds great! Thanks!