You’ve gone full-blown DevOps, drank the Agile Kool-Aid, cloudified everything, and turned your monolith into microservices — so why have all of your old monolith costs turned into even bigger microservices costs? There are a few common reasons this happens, and some straightforward first steps to get microservices cost control in place.
Why Monolith to Microservices Drives Costs Up
As companies and departments adapt to modern software development processes and utilize the latest technologies, they assume they’re saving money – or forget to think about it altogether. Smaller applications and services should come with more savings opportunities, but complexity and rapidly evolving environments can actually make the costs skyrocket. Sometimes it happens right under your nose, but the costs are so hard to compile that you don’t even realize it until it’s too late.
The same thing that makes microservices attractive — smaller pieces of infrastructure that can work independently from each other — can also be the main reason that costs spiral out of control. Isolated systems, with their own costs, maintenance, upgrades, and underlying architecture, can each look cheaper than the monolithic system you were running before, but can add up to far more in aggregate.
How to Control Microservices Costs
If your microservices costs are already out of control, there are a few easy first steps to reining them in.
Keep It Simple
As with many new trends, there is a tendency to jump right in and switch everything to the new hotness. Having a drastic cutover, while scrapping all of your old code, can be refreshing and damaging all at the same time. It makes it hard to keep track of everything, so costs can run rampant while you and your team are struggling just to comprehend what pieces are where. By keeping some of what you already have, but slowly creating new functionality in a microservices model, you can maintain a baseline while focusing on costs and infrastructure of your new code.
The other way to keep it simple is to keep each microservice extremely limited in scope. If a microservice does just one thing, without a bunch of bells and whistles, it’s much easier to see if costs are rising and make the infrastructure match the use case. Additionally, using PaaS offerings or picking the cloud provider that best fits each service’s needs can help maximize utilization.
Scalability and Bursting
Microservices architectures, by the very nature of their design, allow you to optimize individual pieces to minimize bottlenecks. This optimization can also include cost optimization of individual components, even to the point of having idle pieces turned completely off until they are needed. Other pieces might be on, but scaled down to the bare minimum, then rapidly scale out when demand runs high. A fluctuating architecture sounds complex, but can really help keep costs down when load is low.
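As a back-of-the-envelope illustration of why scaling down or parking idle pieces pays off, consider a component that only needs to run during weekday business hours. The hourly rate below is purely illustrative:

```python
# Estimate savings from running a component only during business hours
# (weekdays, 12 hours/day) instead of 24/7. All figures are illustrative.
HOURS_PER_WEEK = 24 * 7          # 168 hours if left running 24/7
ON_HOURS_PER_WEEK = 12 * 5       # 60 hours: 12h/day, weekdays only

def weekly_cost(hourly_rate: float, hours_on: int) -> float:
    """Cost of a resource billed only for the hours it is on."""
    return hourly_rate * hours_on

always_on = weekly_cost(0.10, HOURS_PER_WEEK)    # running 24/7
parked = weekly_cost(0.10, ON_HOURS_PER_WEEK)    # parked nights/weekends
savings_pct = (always_on - parked) / always_on * 100

print(f"Savings: {savings_pct:.0f}%")  # ~64% from a simple schedule alone
```

Even this crude schedule recovers roughly two-thirds of the cost of an always-on resource, before any rightsizing.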
Along with a microservices architecture, you may start having certain users and departments be responsible for just a piece of the system. With that in mind, cloud providers and platform tools can help you restrict users to only the systems and infrastructure they are working on, so they can focus on the operation (and costs) of that piece. This lets you grant each user only the minimal role necessary, while still allowing them to get their jobs done.
Ordered Start/Stop and Automation with ParkMyCloud
ParkMyCloud is all about cost control, so we’ve started putting together a cost-savings plan for our customers who are moving from monolith to microservices.
First, they should use ParkMyCloud’s Logical Groups to put multiple instances and databases into a single entity with an ordered list. This way, your users do not have to remember multiple servers to start for their application – instead, they can start one group with a single click. This can help eliminate the support tickets that are due to parts of the system not running.
Additionally, use Logical Groups to set start delays and stop delays between nodes of the group. With delays, ParkMyCloud will know to start database A, then wait 10 minutes before starting instance B, to ensure the database is up and ready to accept connections. Similarly, you can make sure other microservices are shut down before finally shutting down the database.
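The ordered start logic can be sketched as follows. This is a hypothetical illustration of the pattern, not ParkMyCloud’s implementation; the node names and delays are made up:

```python
import time
from typing import Callable, List, Tuple

# A logical group as an ordered list of (name, start_fn, delay_after_seconds).
# The delay gives each node (e.g. a database) time to become ready before
# its dependents start. Names and timings here are hypothetical.
def start_group(nodes: List[Tuple[str, Callable[[], None], float]],
                sleep: Callable[[float], None] = time.sleep) -> List[str]:
    started = []
    for name, start_fn, delay_after in nodes:
        start_fn()                # kick off this node
        started.append(name)
        if delay_after:
            sleep(delay_after)    # wait before starting the next node
    return started

# Example: start the database, wait 10 minutes, then start the app instance.
order = start_group(
    [("database-a", lambda: None, 600),
     ("instance-b", lambda: None, 0)],
    sleep=lambda s: None,         # no-op sleep so the demo runs instantly
)
print(order)  # ['database-a', 'instance-b']
```

Stopping is the same loop in reverse order, so dependents shut down before the database they rely on.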
Everything you can do in the ParkMyCloud user interface can also be done through the ParkMyCloud REST API. This means that you can temporarily override schedules, toggle instances to turn off or on, or change team memberships programmatically. In a microservices setup, you might have certain pieces that are idle for large portions of the day. With the ParkMyCloud API, you could have those nodes turned off on a schedule to save money, then have a separate microservice call the API to turn the node on when it’s needed.
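As a sketch of what a programmatic schedule override might look like, the snippet below builds an HTTP request to toggle an instance on for a couple of hours. The endpoint path, field names, and token are hypothetical — consult the ParkMyCloud API documentation for the real interface:

```python
import json

# Build (but don't send) a hypothetical REST request to override a parking
# schedule. URL, fields, and auth scheme are illustrative placeholders.
def build_override_request(instance_id: str, hours: int, api_token: str) -> dict:
    return {
        "method": "POST",
        "url": f"https://example.com/api/instances/{instance_id}/override",
        "headers": {
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"state": "on", "duration_hours": hours}),
    }

req = build_override_request("i-0abc123", hours=2, api_token="TOKEN")
print(req["url"])
```

In practice, a microservice that needs a parked dependency would send a request like this, do its work, and let the normal schedule park the node again.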
The Goal: Continuous Cost Control
Moving from monolith to microservices can be a huge factor in a successful software development practice. Don’t let cost be a limiting factor – practice continuous cost control, no matter what architecture you choose. By putting a few costs control measures in place with ParkMyCloud, along with some automation and user management, you can make sure your new applications are not only modern, but also cost-effective.
Travel technology company Sabre announced a strategic agreement with Microsoft last week, weeks after a similar agreement with AWS. There are a lot of factors contributing to these decisions, but among them, it seems likely they’ve chosen multi-cloud for cost control.
The company has been under the leadership of CEO Sean Menke for a year and a half, and in that time has already downsized its workforce by 10% – saving the company $110 million in annual costs. Against such a backdrop, clearly, cost control will be front of mind.
So how will a multi-cloud strategy contribute to controlling costs as Sabre aims to “reimagine the business of travel”, in their words?
Why Multi-Cloud for Cost Control Makes Sense
As Sabre moves into AWS and Azure, they plan to write new applications with a microservices architecture deployed on Docker containers. Containerization can be an effective cost-saving strategy by reducing the amount of infrastructure needed – and thereby reducing wasted spend, and simplifying software delivery processes to increase productivity and reduce maintenance.
Plus, containerization has the advantage of ease of portability. With a large and public account like Sabre’s, this becomes a cost reduction strategy as AWS and Azure are forced into competition for their business against each other. “We want to have incentives for (cloud providers) not to take our business for granted,” said CIO Joe DiFonzo.
Avoiding vendor lock-in and optimizing workloads are the top two cited reasons for companies to choose a multi-cloud strategy – both of which contribute to cost control.
Either Way, Cost Has to Be a Factor
Aside from the reasons listed above, Sabre may have chosen to make deals with both AWS and Azure due to each cloud provider’s technological strengths, support offerings, developer familiarity, or for other reasons. Whether they’ve chosen multi-cloud for cost control as the primary reason is debatable, but they certainly need to control costs now that they’re there.
First of all, most cloud migrations go over budget – not to mention that 62% of first-attempt cloud migrations take longer than expected or fail outright, wasting money directly and through opportunity cost.
Second, Sabre’s legacy system of local, on-premises infrastructure means their IT and development staff is used to the idea of resources that are always available. Users need to be re-educated to learn a “cloud as utility” mindset – as a Director of Infrastructure at Avid put it, users need to learn “that there’s a direct monetary impact for every hour that an idle instance is running.” Of course, this is an issue we see every day.
For companies new to the cloud, we recommend providing training and guidelines to IT Ops, DevOps and Development teams about proper use of cloud infrastructure. This should include:
- Clear governance structures – which users can make infrastructure purchases? How are these purchases controlled?
- Turning resources off when not needed – automating non-production resources to turn off when not needed can reduce the cost of those resources by 65% or more (happy to help, Joe DiFonzo!)
- Regular infrastructure reviews – especially as companies get started in the cloud, it’s easy to waste money on orphaned resources, oversized resources, and resources you no longer need. We recommend regular reviews of all infrastructure to ensure every unused item is caught and eliminated.
Cheers to you, Sabre, and best of luck in your cloud journey.
Maybe you’re familiar with the ways idle instances contribute to cloud waste, but orphaned volumes and other resources are also easily missed, needlessly increasing your monthly bill. Since the cloud is a pay-as-you-go utility, it’s easy to lose visibility of specific infrastructure costs and discover charges for resources you aren’t even using. Here’s how orphaned resources contribute to cloud waste, and what you can do about it.
How Orphaned Volumes are Eating Your Budget
The gist of it: when you shut down or terminate an instance or VM, its volumes and snapshots can be left behind – unattached to any server, but still incurring monthly per-GB charges.
Let’s take the example of AWS EC2. You’ve stopped all of your AWS EC2 instances, but you’re still getting charged monthly for Amazon EBS storage. This happens because even though you didn’t leave your instances running (*high five*), you’re still charged for EBS storage, in GB per month, for the amount provisioned to your account. While EC2 instances only accrue charges while they’re running, EBS volumes attached to those instances retain their data and continue charging you even after an instance has been stopped.
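To put a number on it, here is the storage math for a fleet of stopped instances. The per-GB-month rate below is an example figure, not a quote — check current AWS pricing:

```python
# Illustrative EBS storage math: volumes bill per provisioned GB-month
# even while the attached instance is stopped. The rate is an example
# gp2-style figure, not current AWS pricing.
GB_MONTH_RATE = 0.10  # example USD per GB-month

def monthly_storage_cost(provisioned_gb: int, rate: float = GB_MONTH_RATE) -> float:
    return provisioned_gb * rate

# Ten stopped instances, each with a 100 GB root volume still attached:
cost = monthly_storage_cost(10 * 100)
print(f"${cost:.2f}/month for instances that are not even running")
```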
How to Reduce Waste from Orphaned Volumes
To save your data without paying monthly for the storage volume, you can take a snapshot of the volume as a backup and then delete the original volume. You’ll still be charged for EBS snapshots, but they’re billed at a lower rate and you still have the option to restore the volume from the snapshot if you need it later. EBS volume snapshots are backed up to S3. They’re compressed and therefore save storage, but do keep in mind that the initial snapshot is of the entire volume, and depending on how frequently you take subsequent (incremental) snapshots, your total could end up taking as much space as the first snapshot.
When you no longer need these snapshots, Amazon’s user guide has instructions for how to delete EBS volumes and EBS snapshots.
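A first pass at finding orphaned volumes is to list volumes whose state is “available” (i.e. attached to nothing). The sketch below operates on records shaped like the output of EC2’s DescribeVolumes call; with boto3 you would feed it `ec2.describe_volumes()["Volumes"]`:

```python
# Identify orphaned (unattached) volumes from DescribeVolumes-shaped records.
# A volume with State == "available" is attached to no instance but is
# still billing you per GB-month.
def find_orphaned_volumes(volumes):
    return [v["VolumeId"] for v in volumes
            if v.get("State") == "available" and not v.get("Attachments")]

sample = [
    {"VolumeId": "vol-1", "State": "in-use",
     "Attachments": [{"InstanceId": "i-123"}]},
    {"VolumeId": "vol-2", "State": "available", "Attachments": []},
]
print(find_orphaned_volumes(sample))  # ['vol-2']
```

Once identified, each candidate should be snapshotted (if its data matters) and then deleted.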
Similar to EBS, Azure offers Managed Disks as a storage service for VMs and provides backups of persistent disks. But while EBS volume snapshots are compressed and incremental, and therefore take up less storage, Azure only takes full point-in-time snapshots – which can become costly, since you can take as many snapshots as you want from the same Managed Disk.
If you’re using Google Cloud Platform, then Compute Engine also provides backups of persistent disks, with instructions for creating, restoring, and deleting snapshots. Like EBS snapshots, Google’s persistent disk snapshots are automatically compressed and incremental, saving storage space. The benefits (and risks) are the same as with the other cloud providers – lower bills and reduced storage costs – but you will still need to ensure that your snapshotting strategy does not leave you exposed to risk.
Watch Out for Other Orphaned Resources
Moral of the story: delete snapshots that you don’t need from terminated instances and VMs. It’s easy to see how a small feature that is supposed to save you money can end up forgotten, costing you money for resources you’re not using.
Orphaned volumes and snapshots are just one example of how orphaned resources can result in unnecessary charges. Others include:
- Unassociated IPs (AWS – Elastic IPs);
- Load Balancers (with no instances);
- Unused machine images; and
- Object Storage.
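A scan for the first of these – unassociated Elastic IPs – can follow the same pattern as the orphaned-volume check. The records below mirror the shape of EC2 DescribeAddresses output; with boto3 you would pass in `ec2.describe_addresses()["Addresses"]`:

```python
# Flag Elastic IPs that are allocated but not associated with anything --
# AWS charges for these while they sit idle. An address with no
# AssociationId is attached to no instance or network interface.
def unassociated_eips(addresses):
    return [a["PublicIp"] for a in addresses if "AssociationId" not in a]

sample = [
    {"PublicIp": "203.0.113.5", "AssociationId": "eipassoc-1"},
    {"PublicIp": "203.0.113.9"},  # allocated, attached to nothing
]
print(unassociated_eips(sample))  # ['203.0.113.9']
```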
Don’t let orphaned volumes, snapshots, and other forgotten resources drive up your cloud bill. Put a stop to cloud waste by eliminating orphaned resources and inactive storage, saving space, time, and money in the process.
Today we’d like to announce a new Microsoft Teams bot that allows you to fully interact with ParkMyCloud directly through your chat window, without having to access the web GUI. By combining this chatbot with a direct notifications feed of any ParkMyCloud activities through our webhook integration, you can manage your continuous cost control from the Microsoft Teams channels you live in every day — making it easy to save 65% or more on your instance costs.
Organizations that follow DevOps principles are increasingly adopting ChatOps to manipulate their environments and provide a self-service platform for accessing the servers and databases they require for their work. There are a few different chat systems and bot platforms available – we also have a chatbot for Slack – but one that is growing rapidly in popularity is Microsoft Teams.
By setting up the Microsoft Teams bot to interact with your ParkMyCloud account, you can allow users to:
- Assign schedules
- Temporarily override schedules on parked instances
- Toggle instances to turn off or on as needed
Combine this with notifications from ParkMyCloud, and you can have full visibility into your cost control initiatives right from your standard Microsoft Teams chat channels. Notifications allow you to have ParkMyCloud post messages for things like schedule changes or instances that are being turned off automatically.
Now, with the new ParkMyCloud Teams bot, you can reply back to those notifications to:
- Snooze the schedule
- Turn a system back on temporarily
- Assign a new schedule.
The chatbot is open-source, so you can feel free to modify the bot as necessary to fit your environment or use cases. It’s written in NodeJS using the botbuilder library from Microsoft, but even if you’re not a NodeJS expert, we tried to make it easy to edit the commands and responses. We’d love to have you send your ideas and modifications back to us for rapid improvement.
If you haven’t already signed up for ParkMyCloud to help save you 65% on your cloud bills, then start a free trial and get the Microsoft Teams bot hooked up for easy ChatOps control. You’ll find that ParkMyCloud can make continuous cost control easy and help reduce your cloud spend, all while integrating with your favorite DevOps tools.
The time is ripe to take a fresh look at the advantages of multi-cloud. In the past 12 months, we’ve seen a huge increase in the number of our customers who use multiple public clouds – now more than 20% of them do. With this trend in mind, we wanted to take a look at the positives of a multi-cloud strategy as well as the risks – because of course there’s no “easy button.”
What is Multi-Cloud?
First off, let’s define multi-cloud. Clearly, we’re talking about using two or more clouds, but clouds come in different flavors. For example, multi-cloud incorporates the idea of hybrid cloud – a mix of public and private clouds. But multi-cloud can also mean two or more public clouds, or two or more private clouds.
According to the RightScale 2018 State of the Cloud Report, 81% of enterprises have a multi-cloud strategy.
What are the advantages of multi-cloud?
So why are businesses heading this direction with their infrastructure? Simple reasons include the following:
- Risk Mitigation – create resilient architectures
- Managing vendor lock-in – get price protection
- Optimization – place your workloads to optimize for cost and performance
- Cloud providers’ unique capabilities – take advantage of offerings in AI, IoT, machine learning, and more
When I asked our CTO what he sees as the advantages of a multi-cloud strategy, he highlighted risk management. ParkMyCloud’s own platform was born in the cloud: we run on AWS with a multi-region, redundant architecture (let’s call this multi-cloud ‘light’), and if we went multi-cloud, we would leverage another public cloud for risk mitigation.
Specifically, this means managing the risk of one vendor having an infrastructure meltdown or attack. AWS had an issue about 15 months ago when S3 was offline in the US-East-1 region for 5+ hours, affecting many companies large and small – software from web apps to smartphone apps was affected (including ours). There have also been DDoS attacks on certain AWS regions that affected service availability.
Having a backup with another cloud service provider (CSP) or private cloud in these cases could have ensured 100% uptime. Vendors like Alibaba may have a much stronger presence in certain geographic regions due to a long-term presence there. When a vendor is just getting a toehold in a region, its environment may lack the redundancy and safeguards needed for the desired high availability, so an established provider in the same region may be safer from that availability perspective.
Do the advantages of multi-cloud outweigh the challenges?
Now let’s say you want to go multi-cloud – what does this mean for you? From our own experience integrating with AWS, Azure, and Google Cloud, we’ve seen that each cloud has its own set of interfaces and its own challenges. It is not a “write once, run anywhere” situation between the vendors, and any cloud or network management utility needs to do the work to provide deep integration with each CSP.
Further, the nuances of configuring and managing each CSP require both broad and deep knowledge, and it is rare to find employees with the essential expertise for multiple clouds – so more staff is needed to manage multi-cloud with confidence that it is being done in a way that is both secure and highly available. With everyone trying to play catch-up with AWS, and with AWS itself evolving at a breakneck pace, it is very difficult for an individual or organization to best utilize one CSP, let alone multiple clouds.
Things like a common container environment can help mitigate these issues somewhat by isolating engineers from the nuances of virtual machine management, but the issues of network, infrastructure, cost optimization, security, and availability remain very CSP-specific.
On paper there are advantages of having a multi-cloud strategy. In practice, like many things, it ain’t easy.
Given that spring is very much in the air – at least it is here in Northern Virginia – our attention has turned to tidying up the yard and getting things in good shape for summer. While things are not so seasonally-focused in the world of cloud, the metaphor of taking time out to clean things up applies to unused cloud resources as well. We have even seen some call this ‘cloud pruning’ (not to be confused with the Japanese gardening method).
Cloud pruning is important for improving both the cost and performance of your infrastructure. So what are some of the ways you can go about cleaning up, optimizing, and ensuring that your cloud environments are in great shape?
Delete Old Snapshots
Let’s start by focusing on items that we no longer need. One of the most common types of unused cloud resources is old snapshots. These are snapshots of your EBS volumes on AWS, your storage disks (blobs) on Azure, and your persistent disks on GCP. If you have any form of backup strategy, you likely understand the need to manage the number of snapshots you keep for a particular volume, and to delete older, unneeded snapshots. Cleaning these up immediately helps save on your storage costs, and there are a number of best practices documenting how to streamline this process, as well as a number of free and paid tools to help support it.
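A simple age-based retention policy can be sketched as follows. The records mirror the shape of EC2 DescribeSnapshots output, and the 90-day window is just an example:

```python
from datetime import datetime, timedelta, timezone

# Pick out snapshots older than a retention window -- candidates for
# deletion under a simple age-based cleanup policy. Record shape mirrors
# EC2 DescribeSnapshots output; the retention period is illustrative.
def stale_snapshots(snapshots, retention_days=90, now=None):
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]

now = datetime(2018, 6, 1, tzinfo=timezone.utc)
sample = [
    {"SnapshotId": "snap-old",
     "StartTime": datetime(2017, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new",
     "StartTime": datetime(2018, 5, 20, tzinfo=timezone.utc)},
]
print(stale_snapshots(sample, retention_days=90, now=now))  # ['snap-old']
```

Real policies usually also keep a minimum number of snapshots per volume, so a volume with only old backups isn’t left with none.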
Delete Old Machine Images
A machine image provides the information required to launch an instance, which is a virtual server in the cloud. In AWS these are called AMIs, in Azure they’re called Managed Images, and in GCP, Custom Images. When these images are no longer needed, it is possible to deregister them. However, depending on your configuration, you are likely to continue to incur costs, because the snapshot created when the image was first built will typically continue to incur storage charges. Therefore, if you are finished with an AMI, be sure to also delete its accompanying snapshot. Managing your old AMIs does require work, but there are a number of methods made available by the cloud providers as well as third-party vendors to streamline the management of this type of unused cloud resource.
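One way to catch the leftovers this section warns about is to cross-reference registered images against existing snapshots: any snapshot that no image references is a candidate for cleanup. The record shapes below loosely mirror EC2 DescribeImages and DescribeSnapshots output:

```python
# Find snapshots that back no registered machine image -- the orphans
# left behind when an AMI is deregistered but its snapshot survives.
def unreferenced_snapshots(images, snapshots):
    referenced = {
        bdm["Ebs"]["SnapshotId"]
        for img in images
        for bdm in img.get("BlockDeviceMappings", [])
        if "Ebs" in bdm
    }
    return [s["SnapshotId"] for s in snapshots
            if s["SnapshotId"] not in referenced]

images = [{"ImageId": "ami-1",
           "BlockDeviceMappings": [{"Ebs": {"SnapshotId": "snap-a"}}]}]
snapshots = [{"SnapshotId": "snap-a"}, {"SnapshotId": "snap-b"}]
print(unreferenced_snapshots(images, snapshots))  # ['snap-b']
```

Before deleting, confirm the snapshot isn’t also used for volume backups or referenced in another account or region.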
Optimize Container Usage
With the widespread adoption of containers in the last few years and much of the focus on their specific benefits, few have paid attention to ensuring those containers are optimized for performance and cost. One of the most effective ways to maximize the benefits of containers is to host multiple containerized application workloads within a single larger instance (typically a large or x-large VM) rather than on a number of smaller, separate VMs. This is particularly useful in your dev and test environments rather than in production, where you may have just one machine available to deploy to. As containerization continues to evolve, services such as AWS’s Fargate are enabling much finer control of the resources required to run your containers than what is available today using traditional VMs. In particular, you can specify the exact CPU and memory your code requires, so the amount you pay scales directly with the resources your containers actually request.
So alongside pruning your trees or sweeping your deck and taking care of your outside spaces this spring, remember to take a look around your cloud environment and look for opportunities to remove unused cloud resources to optimize not only for cost, but also performance.