How to Use 9 Cloud DevOps Best Practices For Cost Control

Any organization with a functioning cloud DevOps practice will share some common core tenets. While those tenets are frequently applied to things like code delivery and security, a company that fails to apply them to cost control is destined to have a runaway cloud bill (or at least a series of upcoming meetings with the CFO). Here are some of those tenets, and how they apply to cost control:

1. Leadership

One common excuse for wasted cloud spend is “well that other group has cloud waste too!” By aggressively targeting and eliminating cloud waste, you can set the tone for cost control within your team, which will spread throughout the rest of the organization. This also helps to get everyone thinking about the business, even if it doesn’t seem like wasting a few bucks here or there really matters (hint: it does).

2. Collaborative Culture

By tearing down silos and sharing ideas and services, cost control can become a normal part of DevOps cloud practice instead of a forced decree that no one wants to take part in. Writing a script that is more generally applicable, or finding a tool that others can be invited to, helps other teams save money and join in. You may also pick up ideas you never thought of, without having to waste time or duplicate work.

3. Design for DevOps

Having cost control as a central priority within your team means that you end up building it into your processes and software as you go. Attempting to control costs after the fact can be tough and can force rewrites or rollbacks instead of pressing forward. Tacked-on cost control is also often less effective, and saves less money, than designing for it from the start.

4. Continuous Integration

Integrating ideas and code from multiple teams with multiple codebases and processes can be daunting, which is why continually integrating as new commits happen is such a big step forward. Along the same lines, continually controlling costs during the integration phase means you can optimize your cloud spend by sharing resources, slimming down those resources, and shutting down resources until they are needed by the integration.

5. Continuous Testing

Continuous testing of software helps find bugs quickly and while developers are still working on those systems. Cost control during the testing phase can take multiple forms, including controlling the costs of those test servers, or doing continuous testing of the cost models and cost reduction strategies. New scripts and tools that are being used for cost control can also be tested during this phase.

6. Continuous Monitoring

Monitoring and reporting, like cost control, are often haphazardly tacked on to a software project instead of being a core component. For a lot of organizations, this means that costs aren’t actively being monitored and reported, which is what causes yelling from the Finance team when that cloud bill comes. By making everyone aware of how costs are trending and noting when huge spikes occur, you can keep those bills in check and help save yourself from those dreaded finance meetings.
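As a minimal sketch of that kind of spike detection, here is one way to flag a sudden jump over the trailing average. The 1.5x threshold and the trailing-average approach are illustrative assumptions, not any particular tool's algorithm:

```python
def detect_cost_spike(daily_costs, threshold=1.5):
    """Flag the most recent day if it exceeds the trailing average by
    `threshold`x. daily_costs is a list of daily spend figures, oldest first."""
    if len(daily_costs) < 2:
        return False  # not enough history to establish a baseline
    *history, latest = daily_costs
    baseline = sum(history) / len(history)
    return latest > baseline * threshold

# A steady ~$100/day bill followed by a $250 day trips the alarm:
detect_cost_spike([100, 98, 103, 101, 250])  # True
```

In practice you would feed this from your cloud billing export and wire the result to a chat or email notification so the whole team sees the trend.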

7. Continuous Security

Cloud cost control can contribute to better security practices. For example, shutting down Virtual Machines when they aren’t in use decreases the number of entry points for would-be hackers, and helps mitigate various attack strategies. Reducing your total number of virtual machines also makes it easier for your security teams to harden and monitor the machines that exist.

8. Elastic Infrastructure

Auto-scaling resources are usually implemented by making services scale up automatically, while the "scaling down" part is an afterthought. It can admittedly be tricky to drain existing users and processes from under-utilized resources, but having lots of systems with low load is the leading cause of cloud waste. Different scale patterns based on time of day, day of the week, and business need can also be implemented, but this type of cost control requires deliberate thought and effort.
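As an illustration of such a time-based scale pattern, here is a sketch in Python; the capacities and the business-hours window are hypothetical placeholders for your own workload profile:

```python
from datetime import datetime

def desired_capacity(now: datetime, peak: int = 10, off_peak: int = 2) -> int:
    """Return a target instance count from a simple business-hours pattern:
    full capacity on weekdays 8:00-18:00, scaled down nights and weekends."""
    is_weekday = now.weekday() < 5          # Monday=0 .. Friday=4
    is_business_hours = 8 <= now.hour < 18
    return peak if (is_weekday and is_business_hours) else off_peak

desired_capacity(datetime(2019, 1, 15, 10, 0))  # Tuesday morning -> 10
desired_capacity(datetime(2019, 1, 19, 10, 0))  # Saturday -> 2
```

A periodic job could evaluate this and set the target size of an auto-scaling group accordingly.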

9. Continuous Delivery/Deployment

Deploying your completed code to production can be exciting and terrifying at the same time. One factor you need to consider is the size and cost of those production resources. Cost savings for production usually looks different than for dev/test/QA resources, as production typically needs to be on 24/7 and can't tolerate high latency or long spin-up times. However, there are some cost control measures, like pre-paying for instances or having accurate usage patterns for your elastic environments, that should be considered by your production teams.

Full Cloud DevOps Cost Control

As you can see, there are a lot of paths to lowering your cloud bill by applying some common cloud DevOps tenets. By working these ideas into your teams and weaving them throughout your processes, you can save money and help lead others to do the same. Controlling these costs can lead to fewer headaches, more time, and more money for future projects, which is what we’re all aiming to achieve with DevOps.

5 Things to Look For in an IaaS Cost Management Tool

With $39.5 billion projected to be spent on Infrastructure as a Service (IaaS) this year, many cloud users will find it’s time to optimize spend with an IaaS cost management tool. With so many different options to choose from, picking the right one can be overwhelming. While evaluating your options, you should have an idea of what would be most compatible with you and your organization. In order to cut cloud costs and waste, make sure you look for these 5 things while picking an IaaS cost management tool.

1. UI is Easy to Understand

When adopting a new piece of software, you should not be stressed out trying to figure out how it works. It should be designed around the end user, giving them an easy experience so they can accomplish tasks quickly. Many of the cloud providers’ native tools require specialized coding knowledge that the IaaS users in your organization may not have. A tool is only useful if it is simple and easy to follow, so that every cloud user can contribute to the task of managing IaaS cost.

2. Improved Visibility

It is essential that you have all of your information available in one place – this helps make sure you didn’t overlook anything. Seeing all your resources on one screen, all at once, allows you to pinpoint the strengths and weaknesses you need to focus on to manage your IaaS cost. Of course, cost management includes more than visibility, which leads to the next points.

3. Provides Reporting

You want your organization to be well informed, so it is important that any IaaS cost management tool you adopt includes the ability to generate cost and savings reports. You can’t change what you don’t understand: the data gathered – past and present – will help you understand the past and forecast the future. These reports give you the information you need to make quick, informed decisions. Preferably, they also contain automated recommendations based on your resource utilization history and patterns. Additionally, it’s important for any cost optimization tool to report on the amount of money you have saved using it, so you can justify the cost of the tool as needed to your management or Finance department.

4. Implements Actions

After gathering the data and making suggestions, the next step in cost optimization is to actually make these changes. Using the reports and data gathered, the tool should be able to manage your resources and implement any necessary changes without you having to do anything.  

5. Automation and APIs

Even though they operate in the background, APIs are necessary because they allow your tool to work in conjunction with your other operations. With support for inbound actions and outbound notifications, this automation allows you to streamline all of your data, making things faster and more efficient – cutting down on both time and IaaS cost. Highlights to look for include Single Sign-On, ChatOps integrations, and a well-documented API.

Keep Your Organization’s IaaS Cost Needs in Mind

These are just a few of the things you should be looking for when searching for an IaaS cost optimization tool – but you have to find the platform that works best for you!

ParkMyCloud automatically optimizes your IaaS costs with these principles in mind – try it out with a 14-day free trial and see if it’s the right fit for you.

 

Should You Use the Cloud-Native Instance Scheduler Tools?

When adopting or optimizing your public cloud use, it’s important to eliminate wasted spend from idle resources – which is why you need to include an instance scheduler in your plan. An instance scheduler ensures that non-production resources – those used for development, staging, testing, and QA – are stopped when they’re not being used, so you aren’t charged for compute time you’re not actually using.

AWS, Azure, and Google Cloud each offer an instance scheduler option. Will these fit your needs – or will you need something more robust? Let’s take a look at the offerings and see the benefits and drawbacks of each.

AWS Instance Scheduler

AWS has a solution called the AWS Instance Scheduler. AWS provides a CloudFormation template that deploys all the infrastructure needed to schedule EC2 and RDS instances. This infrastructure includes DynamoDB tables, Lambda functions, and CloudWatch alarms and metrics, and relies on tagging of instances to shut down and turn on the resources.

The AWS Instance Scheduler is fairly robust in that it allows you to have multiple schedules, override those schedules, connect to other AWS accounts, temporarily resize instances, and manage both EC2 instances and RDS databases. However, that management is done exclusively through editing DynamoDB table entries, which is not the most user-friendly experience. All of those settings in DynamoDB are applied via instance tags, which is good if your organization is tag-savvy, but can be a problem if not all users have access to change tags.
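The core idea – a tag names a schedule, and a periodic function stops anything whose schedule window has closed – can be sketched as follows. This is a simplified illustration, not the actual solution's DynamoDB schema; the "office-hours" schedule and the instance data are made up:

```python
def instances_to_stop(instances, hour, weekday):
    """Return the IDs of tagged instances that should be stopped right now.
    `instances` is a list of dicts with an 'InstanceId' and a 'Tags' mapping."""
    # In the real solution, schedule definitions live in a DynamoDB config table.
    schedules = {"office-hours": {"start": 8, "stop": 18, "weekdays_only": True}}
    to_stop = []
    for inst in instances:
        sched = schedules.get(inst.get("Tags", {}).get("Schedule"))
        if sched is None:
            continue  # untagged instances are left alone
        outside_hours = not (sched["start"] <= hour < sched["stop"])
        on_weekend = sched["weekdays_only"] and weekday >= 5
        if outside_hours or on_weekend:
            to_stop.append(inst["InstanceId"])
    return to_stop
```

A Lambda function running on a CloudWatch schedule evaluates logic along these lines and calls the EC2/RDS stop and start APIs on the results.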

If you will have multiple users adding and updating schedules, the Instance Scheduler does not provide good auditing or multi-user capabilities. You’ll want to strongly consider an alternative.

Microsoft Azure Automation

Microsoft has a feature called Azure Automation, which includes multiple solutions for VM management. One of those solutions is “Start/Stop VMs during off-hours”, which deploys runbooks, schedules, and log analytics in your Azure subscription for managing instances. Configuration is done in the runbook parameters and variables, and email notifications can be sent for each schedule.

This solution steps you through the setup for timing of start and stop, along with email configuration and the target VMs. However, multiple schedules require multiple deployments of the solution, and connecting to additional Azure subscriptions requires even more deployments. The solution does include the ability to order or sequence your start/stop, which can be very helpful for multi-component applications, but there’s no option for temporary overrides and no UI for self-service management. One really nice feature is the ability to recognize when instances are idle and automatically stop them after a set time period, which the other tools don’t provide.

Google Cloud Scheduler

Google also has packaged some of their Cloud components together into a Google Cloud Scheduler. This includes usage of Google Cloud Functions for running the scripts, Google Cloud Pub/Sub messages for driving the actions, and Google Cloud Scheduler Jobs to actually kick-off the start and stop for the VMs. Unlike AWS and Azure, this requires individual setup (instead of being packaged into a deployment), but the documentation takes you step-by-step through the process.

Google Cloud Scheduler relies on instance names instead of tags by default, though the functions are all made available for you to modify as needed. The settings are all built into those functions, which makes updating or modifying much more complicated than in the other services. There’s also no real UI available, and the out-of-the-box experience is fairly limited in scope.

Cloud Native or Third Party?

Each of the instance scheduler tools provided by the cloud providers has a few limitations. One possible dealbreaker is that none of these tools are multi-cloud capable, so if your organization uses multiple public clouds then you may need to go for a third-party tool. They also don’t provide a self-service UI, built-in RBAC capabilities, Single Sign-On, or reporting capabilities. When it comes to cost, all of these tools are “free”, but you end up paying for the deployed infrastructure and services that are used, so the cost can be very hard to pin down.

We built ParkMyCloud to solve the instance scheduler problem (now with rightsizing too). Here’s how the functionality stacks up against the cloud-native options:

 

The comparison covers the following capabilities across the AWS Instance Scheduler, Microsoft Azure Automation, Google Cloud Scheduler, and ParkMyCloud:

  • Virtual machine scheduling
  • Database scheduling
  • Scale set scheduling
  • Tag-based scheduling
  • Usage-based recommendations
  • Simple UI
  • Resize instances
  • Override schedules
  • Reporting
  • Start/stop notifications
  • Multi-account
  • Multi-cloud

Overall, the cloud-native instance scheduler tools can help you get started on your cost-saving journey, but may not fulfill your longer-term requirements due to their limitations.

Try ParkMyCloud with a free trial — we think you’ll find that it meets your needs in the long run.  

The One Thing You Need More than Cloud Visibility

If you ask a group of CIOs or analysts for a list of priorities for companies adopting cloud infrastructure, there’s no doubt that cloud visibility would be named near the top. Insight is important for everything from security to cost management. But cloud visibility on its own is not enough, particularly as widespread cloud usage continues to mature.

Don’t Get Us Wrong: Cloud Visibility is Important

Cloud visibility is a broad term, encompassing resource consumption and spend, security and regulatory compliance, and monitoring. In fact, cloud “monitoring” is a term that typically encompasses both performance monitoring and security. This is certainly important: some projections show the cloud monitoring market reaching $3.9 billion in 2026, so there is obviously demand for these tools.

Another aspect is cost. Cloud cost visibility is a hot topic right now, and with good reason. Public cloud providers’ bills are confusing, and you need to be able to understand what you’re being charged for. It’s also important to see where your spend is going, ideally with slice-and-dice reporting so you can analyze by user, team, project, and resource type, and ensure internal chargeback based on consumption.

However, in terms of resource and cost management, cloud visibility alone is not enough to make change.

Cloud Visibility is Useless without Action

There’s a reason that this time of year, self-help gurus encourage resolution makers to make their goals actionable. Aspirations are great. Knowledge is great. But without practical application, aspirations and knowledge won’t lead to change.

When it comes to cloud cost management, there are several capabilities that you need in order to capitalize on the insights gained through visibility. Three important ones to keep in mind are:

  1. The ability to allocate costs to teams.
  2. The ability to automate remediation.
  3. The ability to optimize spending.

The popular cloud cost management tools tend to be strong on some combination of analytics, reporting dashboards, chargeback/showback, budget allocation, governance, and recommendations (which can get quite granular in areas such as reserved instances and orphaned resources). However, they require external tools or people to act upon these recommendations and lack automation.

Actionable is Good. Optimization is Better.

As you research cloud visibility and monitoring solutions to address knowledge gaps in your organization, be sure to include a requirement to address cloud waste. Cloud optimization should require little to no manual work on your part by integrating into your cloud operations, allowing you to automatically reap the benefits and savings.

Here’s a first step on your optimization journey: pick a cloud account, plug it into ParkMyCloud, and get immediate recommendations for cost reduction. Click to apply the recommendations – or set a policy to do it automatically – and see the savings start to add up.

$14.1 Billion in Cloud Spending to be Wasted in 2019

It’s that time of year: new gym memberships, fresh diet goals, and plans to reform… cloud spending?

If you’re at all involved in your organization’s public cloud infrastructure, that last one should definitely be on your to-do list. Chances are, if you’re spending money on cloud, some of that money is being wasted. For some, a lot of that money is being wasted. Here are the numbers.

Predicted Cloud Spending 2019

The latest predictions from Gartner estimate that overall IT spending will reach $3.8 trillion this year, a growth of 3.2% over IT spending in 2018.

Of this spend, public cloud spending is expected to reach $206.2 billion — of which, the fastest growing segment is Infrastructure as a Service (IaaS) which Gartner says will grow 27.6 percent in 2019 to reach $39.5 billion, up from $31 billion in 2018.

Now we can subdivide the public cloud spend number further to look just at compute resources — typically ⅔ of cloud spend is on compute, or about $26.3 billion. This segment of spend is especially vulnerable to waste, particularly from idle resources and oversized resources.

Wasted Cloud Spending from Idle Resources

Let’s first take a look at idle resources — resources that are being paid for by the hour or minute, but are not actually being used. Typically, this kind of waste occurs in non-production environments – that is, those used for development, testing, staging, and QA. About 44% of compute spend is on non-production resources (that’s our number).

Most non-production resources are only used during a 40-hour work week, and do not need to run 24/7. That means that for the other 128 hours of the week (76%), the resources sit idle, but are still paid for.

So what we get is:

$26.3 billion in compute spend * 0.44 non-production * 0.76 of week idle = $8.8 billion wasted on idle cloud resources

Wasted Cloud Spending from Oversized Resources

The other source of wasted cloud spend is oversized infrastructure — that is, paying for resources at a larger capacity than needed.

RightScale found that 40% of instances were sized at least one size larger than needed for their workloads. Just by reducing an instance by one size, the cost is reduced by 50%. Downsizing by two sizes saves 75%.

The data we see in our users’ infrastructure in the ParkMyCloud platform confirms this, and in fact we find that it may even be a conservative estimate. Infrastructure managed in our platform has an average CPU utilization of 4.9%. Of course, this doesn’t take memory into account, and could be skewed by the fact that resources managed in ParkMyCloud are more commonly non-production. However, it still paints a picture of gross underutilization, ripe for rightsizing and optimization.

If we take a conservative estimate of 40% of resources oversized by just one size, we find the following:

$26.3 billion in compute spend * 0.4 oversized * 0.5 overspend per oversized resource = $5.3 billion wasted on oversized resources
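The arithmetic behind both estimates is simple enough to sketch; the ⅔ compute share, 44% non-production share, and 40%/50% oversizing figures are the assumptions stated above:

```python
# All figures in billions of dollars, per the 2019 Gartner IaaS forecast
iaas_spend = 39.5
compute_spend = iaas_spend * 2 / 3            # ~2/3 of IaaS spend is compute

# Idle waste: 44% of compute is non-production, idle 128 of 168 weekly hours
idle_waste = compute_spend * 0.44 * (128 / 168)

# Oversizing waste: 40% of instances one size too big, each overspending 50%
oversize_waste = compute_spend * 0.40 * 0.50

total_waste = idle_waste + oversize_waste
print(f"${idle_waste:.1f}B idle + ${oversize_waste:.1f}B oversized = ${total_waste:.1f}B")
# -> $8.8B idle + $5.3B oversized = $14.1B
```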

Total Cloud Spending to be Wasted in 2019

Between idle resources and overprovisioning, wasted cloud spend will exceed $14.1 billion in 2019.


In fact, this estimation of wasted cloud spend is probably low. This calculation doesn’t even account for waste accumulated through orphaned resources, suboptimal pricing options, misuse of reserved instances, and more.

End the Waste

It’s time to fight this cloud waste. That’s what we’re all about at ParkMyCloud — eliminating wasted cloud spending through scheduling, rightsizing, and optimization.

Ready to join us and become a cloud waste killer? Let’s do it.

10,000 Years of Data Says Your Server Sizing is Wrong.

Server sizing in the cloud can be tricky. Unless you are about to take on a massive high-performance computing project, super-sizing your cloud virtual machines/instances is probably not what you are thinking about when you log in to your favorite cloud service provider. But from looking at customer data within our system, it certainly does look like a lot of folks are walking up to their neighborhood cloud provider and saying exactly that: Super Size Me!

Like at a fast-food place, buying the super size means paying extra…and when you are looking for ways to save money on cloud costs, whether for production or non-production resources, the first place to look is at idle and underutilized resources.

Within the ParkMyCloud SaaS platform, we have collected bazillions (scientific term) of samples of performance data for tens of thousands of virtual machines across hundreds of customers, and the average of all “Average CPU” readings is an amazing (even to us) 4.9%. When you consider that many of our customers are already addressing underutilization by stopping or “parking” their instances when they are not being used, one can easily conclude that server sizing is out of control and instances are tremendously overbuilt. In other words, they are much more powerful than they need to be…and thus much more expensive than they need to be. As cool as “super sizing” sounds, the real solution is rightsizing: ensuring the instance size and type are better tailored to the actual load.

Size, Damned Size, and Statistics

Before we start talking about what is involved in rightsizing, let’s look at a few more statistics, just because the numbers are pretty cool. Looking at utilization data from about 88.9 million instance-hours on AWS – that’s 10,148 years – we find the following:

So, what is this telling us about server sizing?  The percentiles alone tell us that more than 95% of our samples are operating at less than 50% Average CPU – which means if we cut the number of CPUs in half for most of our instances, we would probably still be able to carry our workload.  The 95th percentile for Peak CPU is 58%, so if we cut all of those CPUs in half we would either have to be OK with a small degradation in performance, or maybe we select an instance to avoid exceeding 99% peak CPU (which happens around the 93rd percentile – still a pretty massive number).

Looking down at the 75th and 50th percentiles we see instances that could possibly benefit from multiple steps down!  As shown in the next section, one step down can save you 50% of the cost for an instance. Two steps down can save you 75%!

Before making an actual server sizing change, this data would need to be further analyzed on an instance-by-instance basis – it may be that many of these instances have bursty behavior, where their CPUs are highly utilized for short periods of time and idle the rest of the time. Such an instance would be better off being parked or stopped most of the time, and only started up when needed. Or, depending on the duration and magnitude of the burst, it might be better off moving to the AWS T instance family, which accumulates credits for bursts of CPU and is less expensive than the M family, which is built for a more continuous performance duty cycle. Also – as discussed below – complete rightsizing would entail looking at other utilization stats as well, like memory, network, etc.

Hungry Size

On every cloud provider there is a clear progression of server sizing and prices within any given instance family.  The next size up from where you are is usually twice the CPUs and twice the memory, and as might be expected, twice the price.

Here is a small sample of AWS prices in us-east-1 (N. Virginia) to show you what I mean:

Double the memory and/or double the CPU…and double the price.
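Because each size tier doubles the price, downsizing savings compound by powers of two:

```python
def downsize_savings(steps: int) -> float:
    """Fraction saved by moving `steps` sizes down a family in which each
    size tier costs twice the one below it."""
    return 1 - 0.5 ** steps

downsize_savings(1)  # 0.5  -> one size down saves 50%
downsize_savings(2)  # 0.75 -> two sizes down saves 75%
```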

Pulling the Wool Over Your Size?

It is important to note that there is more to instance utilization than just the CPU stats. There are a number of applications with low CPU but high network, memory, disk utilization, or database IOPS, so a complete set of stats is needed before making a rightsizing decision.

This can be where rightsizing across instance families makes sense.  

On AWS, some of the most commonly used instance types are the T and M general purpose families.  Many production applications start out on the M family, as it has a good balance of CPU and memory.  Let’s look at the m5.4xlarge as a specific example, shown in the middle row below.

  • If you find that such an instance was showing good utilization of its CPU, maybe with an Average CPU of 75% and Peak CPU of 95%, but the memory was extremely underutilized, maybe only consuming 20%, we may want to move to more of a compute-optimized instance family.  From the table below, we can see we could move over to a c5.4xlarge, keeping the same number of CPUs, but cutting the RAM in half, saving about 11% of our costs.
  • On the other hand, if you find the CPU was significantly underutilized, for example showing an Average CPU of 30% and Peak of 45%, but memory was 85% utilized, we may be better off on a memory-optimized instance family.  From the table below, we can move to an r5.2xlarge instance cutting the vCPUs in half, and keeping the same amount of RAM, and saving about 34% of the costs.
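These percentages can be checked against list prices. The figures below are approximate Linux on-demand rates in us-east-1 at the time of writing, and will drift as AWS updates pricing:

```python
# Approximate AWS us-east-1 Linux on-demand prices, $/hour
prices = {
    "m5.4xlarge": 0.768,  # 16 vCPU, 64 GiB - general purpose
    "c5.4xlarge": 0.680,  # 16 vCPU, 32 GiB - compute optimized
    "r5.2xlarge": 0.504,  #  8 vCPU, 64 GiB - memory optimized
}

def switch_savings(from_type: str, to_type: str) -> float:
    """Fractional cost reduction from switching instance types."""
    return 1 - prices[to_type] / prices[from_type]

print(f"{switch_savings('m5.4xlarge', 'c5.4xlarge'):.0%}")  # -> 11%
print(f"{switch_savings('m5.4xlarge', 'r5.2xlarge'):.0%}")  # -> 34%
```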

Within AWS there are additional considerations on the network side. As shown here, available network performance follows the instance size and type. You may find yourself in a situation where memory and CPU are non-issues, but high network bandwidth is critical, and deliberately super-size an instance. Even in this case, though, you should think about whether there is a way to split your workload into multiple smaller instances (and thus multiple network streams) that are less expensive than a beastly machine selected solely on the basis of network performance.

You may also need to consider availability when determining your server sizing. For example, if you need to run in a high-availability mode using an autoscaling group, you may be running two instances, either one of which can handle your full load, but both of which are only 50% active at any given time. As long as they are only 50% active, that is fine – but you may want to consider whether two instances at half the size would be OK, and then address a surge in load by scaling up the autoscaling group.

Keep Your Size on the Prize

For full cost optimization for your virtual machines, you need to consider appropriate resource scheduling, server sizing, and sustained usage.

  • Rightsize instances wherever possible.  You can easily save 50% just by going down one size tier – and this applies to production resources as well as development and test systems!
  • Modernize your instance types.  This is similar to rightsizing, in that you are changing to the same instance type in a newer generation of the same family, where cloud provider efficiency improvements mean lower costs.  For example, moving an application from an m3.xlarge to an m5.xlarge can save 28%!
  • Park/stop instances when they are not in use.  You can save 65% of the cost of a development or test virtual machine by just having it on 12 hours per day on weekdays!
  • For systems that must be up continually (and once you have settled on the correct size instance), consider purchasing reserved instances, which can save 54-75% off the regular cost. If you would like a review of your resource usage to see where you can best utilize reserved instances, please let us know.
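The parking figure in the list above is easy to verify: a weekday 12-hours-a-day schedule keeps an instance on for only 60 of the week's 168 hours.

```python
hours_on = 12 * 5                        # 12 hours/day, weekdays only
parked_savings = 1 - hours_on / (24 * 7) # fraction of the week the instance is off
print(f"{parked_savings:.1%}")           # -> 64.3%, i.e. roughly 65% saved
```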

A Site for Sore Size

Last week, ParkMyCloud released the ability to rightsize and modernize instances. This release helps you identify the virtual machine and database instances that are not fully utilized or are on an older family, makes smart recommendations for better server sizing and/or family selection for the level of utilization, and then lets you execute the rightsize action. We will also be adding a feature for scheduled rightsizing, allowing you to maintain instance continuity while reducing its size during periods of lower utilization.

Get started today!