DevOps cloud cost control: an oxymoron?
If you’re in DevOps, you may not think that cloud cost is your concern. When asked what your primary concern is, you might say speed of delivery, or integrations, or automation. However, if you’re using public cloud, cost should be on your list of problems to control.
The Cloud Waste Problem
If DevOps is the biggest change in IT process in decades, then renting infrastructure on demand is the most disruptive change in IT operations. With the switch from traditional datacenters to public cloud, infrastructure is now used like a utility. Like any utility, there is waste. (Think: leaving the lights on or your air conditioner running when you’re not home.)
How big is the problem? In 2016, enterprises spent $23B on public cloud IaaS services. We estimate that about $6B of that was wasted on unneeded resources. The excess expense known as “cloud waste” comprises several interrelated problems: services running when they don’t need to be, improperly sized infrastructure, orphaned resources, and shadow IT.
Everyone who uses AWS, Azure, or Google Cloud Platform is either already feeling the pressure to reel in this waste or soon will be. Because DevOps teams are the primary cloud users in many companies, DevOps cloud cost control processes become a priority.
4 Principles of DevOps Cloud Cost Control
Let’s put this idea of cloud waste in the framework of some of the core principles of DevOps. Here are four key DevOps principles, applied to cloud cost control:
1. Holistic Thinking
In DevOps, you cannot simply focus on your own favorite corner of the world, or any one piece of a project in a vacuum. You must think about your environment as a whole.
For one thing, this means that, as mentioned above, cost does become your concern. Businesses have budgets. Technology teams have budgets. And, whether you care or not, that means DevOps has a budget it needs to stay within. Whether it’s a concern upfront or doesn’t become one until you’re approached by your CTO or CFO, at some point, infrastructure cost is going to be under scrutiny – and if you go too far out of budget, under direct mandates for reduction.
Solving problems not only speedily and elegantly, but also cost-efficiently, becomes a necessity. You can’t just be concerned about Dev and Ops; you need to think about BizDevOps.
Holistic thinking also means that you need to think about ways to solve problems outside of code… more on this below.
2. No Silos
The principle of “no silos” means not only no communication silos, but also, no silos of access. This applies to the problem of cloud cost control when it comes to issues like leaving compute instances running when they’re not needed. If only one person in your organization has the ability to turn instances on and off, then all responsibility to turn those instances off falls on his or her shoulders.
It also means that if you want to use an instance that is scheduled to be turned off… well, too bad. You either call the person with the keys to log in and turn your instance on, or you wait until it’s scheduled to come on. Or if you really need a test environment now, you spin up new instances – completely defeating the purpose of turning the original instances off.
The solution is to eliminate the control silo by allowing users to access their own instances, turning them on when they need them and off when they don’t. Governance through user roles and policies ensures that this access does not undermine your cost control tactics.
(In this case, we’re thinking of providing access to outside management tools like the one we provide, but this can apply to your public cloud accounts and other development infrastructure management portals as well.)
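As a rough illustration of governance through user roles, the sketch below lets users start and stop only the instances that belong to their own teams. The role names, the `ROLE_ACTIONS` table, and the `User` and `Instance` types are hypothetical and stand in for whatever your management tool or cloud provider actually offers.

```python
from dataclasses import dataclass, field

# Hypothetical roles and the actions each one permits; adapt to your own policy.
ROLE_ACTIONS = {
    "viewer": {"view"},
    "operator": {"view", "start", "stop"},
    "admin": {"view", "start", "stop", "edit_schedule"},
}


@dataclass
class User:
    name: str
    role: str
    teams: set = field(default_factory=set)


@dataclass
class Instance:
    instance_id: str
    team: str
    running: bool = False


def can(user, action, instance):
    """Return True if the user's role permits the action on an instance owned by one of their teams."""
    allowed = ROLE_ACTIONS.get(user.role, set())
    return action in allowed and instance.team in user.teams


def set_power(user, instance, action):
    """Start or stop an instance on behalf of a user, enforcing the role policy."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    if not can(user, action, instance):
        raise PermissionError(f"{user.name} may not {action} {instance.instance_id}")
    instance.running = action == "start"
    return instance
```

With a policy like this, a developer with the operator role can turn on a parked test environment without waiting for an administrator, while instances that belong to other teams stay out of reach.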
3. Rapid, Useful Feedback
In the case of eliminating cloud waste, the feedback you need is where, in fact, waste is occurring. Are your instances sized properly? Are they running when they don’t need to be? Are there orphaned resources chugging away, eating at your budget?
Useful feedback can also come in the form of total cost savings, percentages of time your instances were shut down over the past month, and overall coverage of your cost optimization efforts. Reporting on what is working for your environment helps you decide which problem to address next.
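The savings figures described above come down to simple arithmetic. The sketch below assumes a fixed hourly price and a fixed weekly on-schedule, and averages a month as 52/12 weeks. The function names and the example price are illustrative, not taken from any particular tool.

```python
HOURS_PER_WEEK = 24 * 7


def parked_fraction(on_hours_per_week):
    """Fraction of the week an instance is shut down, given its scheduled on-hours."""
    if not 0 <= on_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("on_hours_per_week must be between 0 and 168")
    return 1 - on_hours_per_week / HOURS_PER_WEEK


def monthly_savings(hourly_cost, on_hours_per_week, weeks_per_month=52 / 12):
    """Estimated monthly savings compared with running the instance around the clock."""
    off_hours_per_week = HOURS_PER_WEEK * parked_fraction(on_hours_per_week)
    return hourly_cost * off_hours_per_week * weeks_per_month


# Example: an instance that runs 12 hours a day on weekdays (60 hours per week)
# at an assumed price of $0.10 per hour.
print(f"parked {parked_fraction(60):.0%} of the time")
print(f"saves about ${monthly_savings(0.10, 60):.2f} per month")
```

An instance that only needs to run during a 12-hour weekday window is off for 108 of the 168 hours in a week, which is roughly 64% of the time.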
You need monitoring tools in place in order to discover the answers to these questions. Preferably, you should be able to see all of your resources in a single dashboard, to ensure that none of these budget-eaters slip through the cracks. Multi-cloud and multi-region environments make this even more important.
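The questions in this section can be expressed as checks over a resource inventory. The following is a minimal sketch, assuming you have already collected an inventory into `Resource` records; the field names and the 10% CPU threshold are assumptions for illustration, and a real monitoring tool would look at more signals.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    resource_id: str
    kind: str                      # "instance" or "volume" (hypothetical categories)
    running: bool = False
    attached: bool = True          # only meaningful for volumes
    avg_cpu_percent: float = 0.0   # only meaningful for instances


def find_waste(resources, after_hours, cpu_threshold=10.0):
    """Return (resource_id, reason) pairs for resources that look wasteful."""
    findings = []
    for r in resources:
        if r.kind == "volume" and not r.attached:
            findings.append((r.resource_id, "orphaned volume"))
        elif r.kind == "instance" and r.running:
            if after_hours:
                findings.append((r.resource_id, "running after hours"))
            elif r.avg_cpu_percent < cpu_threshold:
                findings.append((r.resource_id, "possibly oversized"))
    return findings
```

Running a check like this across every account and region at once is what keeps budget-eaters from slipping through the cracks.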
4. Automation
The principle of automation means that you should not waste time building solutions when you don’t have to. This relates back to the idea, mentioned above, of solving problems outside of code.
So when automating, keep your eyes open and do your research. If there’s already an existing tool that does what you’re trying to code, it could be a potential time-saver and process-simplifier.
So take a look at your DevOps processes today, and see how you can incorporate a DevOps cloud cost control – or perhaps, “continuous cost control” – mindset to help with your continuous integration and continuous delivery pipelines. Automate cost control to reduce your cloud expenses and make your life easier.
When you start looking for an instance management tool to help manage your cloud infrastructure costs, you’ll realize there are a lot of options. While evaluating such tools, keep a list of requirements so you can confirm that the software fits your needs and will help you reduce cloud waste. Here are a few items you might want to have on your checklist:
1. High Visibility
One factor that contributes to cloud waste is the inability to track cloud instances. In today’s world, cross-cloud and cross-region deployments are must-haves for high availability and true redundancy. Any modern instance management tool must show all of your instances in one place, or you’re sure to have some fall through the cracks.
2. Reporting
You might hate making reports, but solid reporting can be the difference between a well-informed organization and a proverbial dumpster fire. With the help of a good tool, you can generate reports that show the data you need for decision-making, without wasting time.
3. Takes Action
Sure, reports and pretty graphs are nice, but something needs to actually be acted upon in order to make any real difference to your monthly AWS or Azure bill! A lot of tools will gather up that data for you, but you really need something that can actually turn off the lights, so to speak — not just tell you which lights haven’t been turned off.
4. Simple-to-Use UI
The user experience of an application can sometimes go unnoticed, but it’s often the difference between a useful tool and shelfware. One of the main difficulties in determining how easy an interface is to use is that you need to understand who the actual end user will be. The IT administrator who is evaluating products may be able to figure out the interface, but if other team members will need to use it, then their needs must be taken into account.
5. APIs and Automation
With the rise of DevOps practices and automated infrastructures, API access is a must. By enabling inbound actions and outbound notifications, new tools can work seamlessly with existing operations to eliminate wasted resources. Automation should also take into account your naming conventions and tagging standards for optimal integration.
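One way automation can take tagging standards into account is to read each instance's schedule from a tag. The sketch below assumes a hypothetical `schedule` tag with values such as `weekdays:08-18` or `daily:06-22`. This format is invented for illustration and is not a convention of any cloud provider or tool.

```python
from datetime import datetime

SCHEDULE_TAG = "schedule"  # hypothetical tag key


def parse_schedule(value):
    """Parse a tag value like 'weekdays:08-18' into (weekdays, start_hour, end_hour)."""
    days, hours = value.split(":")
    start, end = (int(h) for h in hours.split("-"))
    if days == "weekdays":
        weekdays = {0, 1, 2, 3, 4}   # Monday through Friday
    elif days == "daily":
        weekdays = set(range(7))
    else:
        raise ValueError(f"unknown day spec: {days}")
    if not 0 <= start < end <= 24:
        raise ValueError(f"invalid hours: {hours}")
    return weekdays, start, end


def should_be_running(tags, now):
    """Decide whether an instance should be on at the given time, based on its tags."""
    value = tags.get(SCHEDULE_TAG)
    if value is None:
        return True  # untagged instances are left alone
    weekdays, start, end = parse_schedule(value)
    return now.weekday() in weekdays and start <= now.hour < end
```

An automation job could call `should_be_running` for each instance on a timer and then start or stop the instance through whatever API your tooling exposes.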
6. Schedule Overrides
Once you’ve started working on solving your cloud waste problem by scheduling resources to turn off when not needed, you need to be able to adapt to the changing needs of the user and the organization. Anyone with proper access to a system should be able to override a given schedule if necessary, since any tool you use should be helping your users get work done.
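A schedule override can be modeled as a temporary decision that wins over the schedule until it expires. This is a minimal sketch under that assumption; the `Override` type, the function names, and the eight-hour cap are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Override:
    keep_running: bool
    expires_at: datetime


def request_override(now, hours, keep_running=True, max_hours=8):
    """Create an override lasting the given number of hours, capped so it cannot run forever."""
    if not 0 < hours <= max_hours:
        raise ValueError(f"override must last between 0 and {max_hours} hours")
    return Override(keep_running, now + timedelta(hours=hours))


def effective_state(scheduled_running, override, now):
    """Return whether the instance should run, letting an unexpired override win over the schedule."""
    if override is not None and now < override.expires_at:
        return override.keep_running
    return scheduled_running
```

Because the override expires on its own, a user who needs a test environment late at night gets it, and the instance falls back to its normal schedule afterward without anyone having to remember to turn it off.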
7. Team Governance
A huge concern when letting users run wild with any new tool is how you can make sure they aren’t going to break anything. Giving someone the minimum required access is a security best practice, but sometimes those access controls can be confusing. In addition to a simple UI, the role-based access controls should also be simple to set up, modify, and understand.
8. Single Sign-On
Some might consider this a nice-to-have, but most enterprises today have started requiring this for all products they use. Users find it easy to sign in without remembering a million credentials, and admins find it more secure and faster to deploy. If SSO is being used within your organization, then you should start picking tools that integrate with it easily.
This is a starting point, but of course when evaluating an instance management tool, make sure to incorporate any needs unique to your organization. What else would you include on your checklist?
“Is that old cloud instance running?” Perhaps you’ve heard this question around the office. It shouldn’t be too surprising: anyone who’s ever tried to load the Amazon EC2 console has quickly found out how difficult it is to keep a handle on everything that is running. Only one region gets displayed at a time, which makes it common for admins to be surprised when the bill comes at the end of the month. In today’s distributed world, it not only makes sense for different instances to be running in different geographical regions, but it’s encouraged from an availability perspective.
On top of this multi-region setup, many organizations are moving to a multi-cloud strategy as well. Many executives are stressing to their operations teams that it’s important to run systems in both Azure and AWS. This can improve reliability, but it also complicates the day-to-day management of cloud instances.
So is that old cloud instance running?
You may get a chuckle out of the idea that IT administrators can lose servers, but it happens more frequently than we like to admit. If you only ever log in to us-east-1, then you might forget that your dev team in San Francisco was using us-west-2 as their main development environment. Or perhaps you set up a second cloud environment to make sure your apps all work properly, but forgot to shut it down before going back to your main cloud.
That’s where a single-view dashboard (like the view you get with ParkMyCloud) can give administrators visibility across all of their cloud accounts. This leads to cost savings right off the bat, because the running cloud servers that you forgot about, or thought you had turned off, show up in a single pane of glass. Knowledge is power: once you know an instance exists, you can turn it off. You also get an easy view into how your environment changes over time, so you’ll be aware when instances get spun up in various regions.
This level of visibility also has a freeing effect, as it can lead you to use more regions without fear of losing instances. Many folks know they should be distributed geographically, but don’t want to deal with the headache of keeping track of the sprawl. By tracking all of your regions and accounts in one easy-to-use view, you can start to fully benefit from cloud computing without wasting money on unused resources.
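The idea behind a single view can be sketched as a roll-up of running instances by provider and region. The example below assumes you have already gathered inventory records from each account into plain dictionaries; the record fields and function names are hypothetical and do not represent any product's API.

```python
from collections import Counter


def running_by_location(inventory):
    """Count running instances per (provider, region) across all accounts."""
    counts = Counter()
    for record in inventory:
        if record["state"] == "running":
            counts[(record["provider"], record["region"])] += 1
    return dict(counts)


def unexpected_locations(inventory, expected):
    """List locations that have running instances but are not on the expected list."""
    return sorted(loc for loc in running_by_location(inventory) if loc not in expected)


# Example inventory merged from two providers (made-up data).
inventory = [
    {"id": "i-01", "provider": "aws", "region": "us-east-1", "state": "running"},
    {"id": "i-02", "provider": "aws", "region": "us-west-2", "state": "running"},
    {"id": "i-03", "provider": "aws", "region": "us-east-1", "state": "stopped"},
    {"id": "vm-01", "provider": "azure", "region": "westus", "state": "running"},
]
print(running_by_location(inventory))
print(unexpected_locations(inventory, expected={("aws", "us-east-1")}))
```

A roll-up like this makes the forgotten development region or the leftover second-cloud environment visible as soon as anything is running there.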