Not only has it become apparent that public cloud is here to stay, it’s also growing faster as time goes on (by 2020, it is estimated that more than 40% of enterprise workloads will be in the cloud). IT infrastructure has changed permanently, and enterprise organizations are coming to terms with some of the side effects of this shift. One of those side effects is the need for tools and processes (and even teams in larger organizations) dedicated to cloud cost management and cost control. Executives from all teams within an organization want to see costs, projections, usage, savings, and quantifiable efforts to save the company money while maximizing IT throughput as enterprises shift resources to the cloud.
There’s a variety of tools to solve some of these problems, so let’s take a look at a few of the major ones. All of the tools mentioned below support Amazon AWS, Microsoft Azure, and Google Cloud Platform.
CloudHealth provides detailed analytics and reporting on your overall cloud spend, with the ability to slice-and-dice that data in a variety of ways. Recommendations about your instances are made based on a score driven by instance utilization and cloud provider best practices. This data is collected from agents that are installed on the instances, along with cloud-level information. Analysis and business intelligence tools for cloud spend and infrastructure utilization are featured prominently in the dashboard, with governance provided through policies driven by teams for alerts and thresholds. Some actions can be scripted, such as deleting elastic IPs/snapshots and managing EC2 instances, but reporting and dashboards are the main focus.
Overall, the platform seems to be a popular choice for large enterprises wanting cost and governance visibility across their cloud infrastructure. Pricing is based on a percentage of your monthly cloud spend.
CloudCheckr provides visibility into governance, security, compliance, and cost problems by running analytics and checks against logic built into its platform. It relies on non-native tools and integrations to take action on the recommendations, such as Spotinst, Ansible, or Chef. CloudCheckr’s reports cover a wide range of topics, including inventory, utilization, security, costs, and overall best practices. The UI is simple and is likely equally well regarded by technical and non-technical users.
The platform seems to be a popular choice with small and medium-sized enterprises looking for greater overall visibility and recommendations to help optimize their use of cloud. Given their SMB focus, customers are often provided this service through MSPs. Pricing is based on your cloud spend, but a free tier is also available.
Cloudyn (recently acquired by Microsoft) is focused on providing advice and recommendations along with chargeback and showback capabilities for enterprise organizations. Cloud resources and costs can be managed through their hierarchical team structure. Visibility, alerting, and recommendations are made in real time to assist in right-sizing instances and identifying outlying resources. Like CloudCheckr, it relies on external tools or people to act upon recommendations and lacks automation.
Their platform options include supporting MSPs in the management of their end customers’ cloud environments as well as an interesting cloud benchmarking service called Cloudyndex. Pricing for Cloudyn is also based on your monthly cloud spend. Much of the focus seems to be on current Microsoft Azure customers and users.
Unlike the other tools mentioned, ParkMyCloud focuses on actions and automated scheduling of resources to provide optimization and immediate ROI. Reports and dashboards are available to show the cost savings provided by these schedules and recommendations on which instances to park. The schedules can be manually attached to instances, or automatically assigned based on tags or naming schemes through its Policy Engine. It pairs well with the other previously mentioned recommendation-based tools in this space to provide total cost control through both actions and reporting.
ParkMyCloud is widely used by DevOps and IT Ops in organizations from small startups to global multinationals, all of whom are keen to automate cost control by leveraging ParkMyCloud’s native API and pre-built integrations with tools like Slack, Atlassian, and Jenkins. Pricing is per instance, with a free tier available.
Cloud cost management isn’t just a “should think about” item, it’s a “must have in place” item, regardless of the size of a company’s cloud bill. Specialized tools can help you view, manage, and project your cloud costs no matter which provider you choose. The right toolkit can supercharge your IT infrastructure, so consider a combination of some of the tools above to really get the most out of your AWS, Azure, or Google environment.
Webhooks are user-defined HTTP POST callbacks. They provide a lightweight mechanism for letting remote applications receive push notifications from a service or application, without requiring polling. In today’s IT infrastructure that includes monitoring tools, cloud providers, DevOps processes, and internally-developed applications, webhooks are a crucial way to communicate between individual systems for a cohesive service delivery. Now, in ParkMyCloud, webhooks are available for even more powerful cost control.
For example, you may want to let a monitoring solution like Datadog or New Relic know that ParkMyCloud is stopping a server, so that alerts are suppressed for the period the server is parked, and then re-enable monitoring once the server is unparked (turned on). Another example would be to have ParkMyCloud post to a chatroom or dashboard when schedules have been overridden by users. We do this by enabling system notifications to our cloud webhooks.
Previously only two options were provided when configuring system level and user notifications in ParkMyCloud: System Errors and Parking Actions. We have added three new notification options for both system level and user notifications. Descriptions for all five options are provided below:
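As a rough illustration of the monitoring example, the sketch below maps an incoming webhook POST body to a mute or unmute decision. The payload field names (`type`, `action`, `resource`) and the function name are assumptions made for this example, not the documented ParkMyCloud payload format, so adjust them to whatever your webhook actually receives.

```python
import json


def monitoring_action(raw_body: bytes):
    """Map a webhook POST body to a monitoring action.

    Returns ("mute", resource) when a resource is being parked,
    ("unmute", resource) when it is being started again, or None
    for notifications that should not change alerting.

    The field names used here are hypothetical.
    """
    event = json.loads(raw_body.decode("utf-8"))

    # Only parking actions are relevant to alert suppression.
    if event.get("type") != "parking_action":
        return None

    resource = event.get("resource")
    action = event.get("action")
    if action == "stop":
        return ("mute", resource)
    if action == "start":
        return ("unmute", resource)
    return None
```

The returned tuple could then be handed to whatever call your monitoring tool provides for scheduling or cancelling downtime.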
- System Errors – These are errors occurring within the system itself such as discovery errors, parking errors, invalid credential permissions, etc.
- System Maintenance and Updates – These are the notifications provided via the banner at the top of the dashboard.
- User Actions – These are actions performed by users in ParkMyCloud such as manual resource state toggles, attachment or detachment of schedules, credential updates, etc.
- Parking Actions – These are actions specifically related to parking such as automatic starting or stopping of resources based on defined parking schedules.
- Policy Actions – These are actions specifically related to configured policies in ParkMyCloud such as automatic schedule attachments based on a set rule.
We have made the options more granular to give you better control over the events you do or do not want to see.
These options can be seen when adding or modifying a channel for system level notifications (Settings > System Level Notifications). In the image shown below, a channel is being added.
Note: For additional information regarding these options, click on the Info Icon to the right of Notify About.
The new notification options are also viewable by users who want to set up their own notifications (Username > My Profile). These personal notifications are sent via email to the address associated with your user. Personal notifications can be set up by any user, while Webhooks must be set up by a ParkMyCloud Admin.
After clicking on Notifications, you will see the above options and may use the checkboxes to select the notifications you want to receive. You can also set each webhook to handle a specific ParkMyCloud team, then set up multiple webhooks to handle different parts of your organization. This offers maximum flexibility based on each team’s tools, processes, and procedures. Once finished, click on Save Changes. Any of these notifications can then be sent to your cloud webhook and even Slack to ensure ParkMyCloud is integrated into your cloud management operations.
Large companies have traditionally had an impressive list of batch workloads, which run at night, when people have gone home for the day. These include such things as application and database backup jobs; extract, transform, and load (ETL) jobs; disaster recovery (DR) environment checks and updates; online analytical processing (OLAP) jobs; and monthly/quarterly billing updates or financial “close”, to name a few.
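To make the per-team setup concrete, here is a minimal sketch of how several webhooks, each tied to one team and a set of notification options, might be modeled and looked up. The class, function, team names, and URLs are all hypothetical and exist only for illustration; ParkMyCloud itself handles this through the settings described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Webhook:
    url: str                 # placeholder endpoint, not a real service
    team: str                # the team this webhook handles
    notify_about: frozenset  # notification options selected for it


def webhooks_for(webhooks, team, category):
    """Return the URLs that should receive a notification of the
    given category raised for the given team."""
    return [
        w.url
        for w in webhooks
        if w.team == team and category in w.notify_about
    ]


# Two example channels: a chat room for the dev team and a
# monitoring endpoint for the ops team.
HOOKS = [
    Webhook("https://example.com/hooks/dev-chat", "dev",
            frozenset({"User Actions", "Parking Actions"})),
    Webhook("https://example.com/hooks/ops-monitoring", "ops",
            frozenset({"System Errors", "Parking Actions"})),
]
```

With this layout, a Parking Actions event for the dev team reaches only the dev chat room, while System Errors for that team reach no one until a webhook is configured for them.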
Traditionally, with on-premise data centers, these workloads have run at night to allow the same hardware infrastructure that supports daytime interactive workloads to be repurposed, if you will, to run these batch workloads at night. This served a couple of purposes:
- It avoided network contention between the two workloads (as both are important), allowing the interactive workloads to remain responsive.
- It avoided data center sprawl by using the same infrastructure to run both, rather than having dedicated infrastructure for interactive and batch.
Things Are Different with Public Cloud
As companies move to the public cloud, they are no longer constrained by having to repurpose the same infrastructure. In fact, they can spin up and spin down new resources on demand in AWS, Azure or Google Cloud Platform (GCP), running both interactive and batch workloads whenever they want.
Network contention is also less of a concern, since the public cloud providers typically have plenty of bandwidth. The exception of course is where batch workloads use the same application interfaces or APIs to read/write data.
So, moving to public cloud offers a spectrum of possibilities, and you can use one or any combination of them:
- You can run batch nightly using similar processes as you do in your on-premise data centers, but on separately provisioned instances/virtual machines. This probably requires the least effort to move batch to the public cloud and the least change to your DevOps processes, and perhaps saves you some money by having instances sized specifically for the workloads and being able to leverage cloud cost savings options (e.g., reserved instances);
- You can run batch on separately provisioned instances/virtual machines, but concurrently with existing interactive workloads. This will likely result in some additional work to change your DevOps processes, but offers more freedom and similar benefits mentioned above. You will still need to pay attention to application interfaces/APIs the workloads may have in common; or
- At the extreme end of the cloud adoption spectrum, you could use cloud provider platform as a service (PaaS) offerings, such as AWS Batch, Microsoft Azure Batch or GCP Cloud Dataflow, where batch is essentially treated as a “black box”. A detailed comparison of these services is beyond the scope of this blog. However, in summary, these are fully managed services, where you queue up input data in an S3 bucket, object blob or volume along with a job definition, appropriate environment variables and a schedule, and you’re off to the races. These services employ containers and autoscaling/resource groups/instance groups where appropriate, with options to use less expensive compute in some cases. (For example, with AWS Batch, you have the option of using spot instances.)
The advantage of this approach is potentially faster time to implement and (maybe) less expensive monthly cloud costs, because the compute services run only at the times you specify. The disadvantages may be the reduced degree of operational/configuration control you have; the fact that these services may be totally foreign to your existing DevOps folks/processes (i.e., there is a steep learning curve); and the risk of being tied to that specific cloud provider.
A Simple Alternative
If you are looking to minimize impact to your DevOps processes (that is, the first two approaches mentioned above), but still save money, then ParkMyCloud can help.
Normally, with the first two options, there are cron jobs scheduled to kick off batch jobs at the appropriate times throughout the day, but the underlying instances must be running for cron to do its thing. You could use ParkMyCloud to put parking schedules on these resources, such that they are turned OFF for most of the day, but are turned ON just in time to still allow the cron jobs to execute.
We have been successfully using this approach in our own infrastructure for some time now, to control a batch server used to do database backups. This would, in fact, provide more savings than AWS reserved instances.
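One detail worth checking with this approach is that the parking schedule actually covers the time cron fires, especially when the ON window wraps past midnight. Below is a minimal sketch of that check, assuming whole-hour schedules in UTC; the function name and the hours used are illustrative only.

```python
def instance_is_on(hour: int, on_hour: int, off_hour: int) -> bool:
    """Return True if the instance is scheduled ON at the given hour.

    Hours are integers 0-23 in UTC. The ON window starts at on_hour
    (inclusive) and ends at off_hour (exclusive). A window such as
    23 -> 1 wraps past midnight and is handled by the second branch.
    """
    if on_hour <= off_hour:
        return on_hour <= hour < off_hour
    return hour >= on_hour or hour < off_hour


# A cron job that fires at midnight UTC falls inside a 23:00-01:00
# window, while a job at noon would find the instance parked.
midnight_job_ok = instance_is_on(0, on_hour=23, off_hour=1)
noon_job_ok = instance_is_on(12, on_hour=23, off_hour=1)
```

Running a check like this whenever a cron entry or schedule changes helps avoid a batch job silently missing its window.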
Let’s look at a specific example in AWS. Suppose you have an m4.large server you use to run batch jobs. Assuming Linux pricing in us-east-1, this server costs $0.10 per hour, or about $73 per month. Suppose you have configured cron to start batch jobs at midnight UTC and that they normally complete 1 to 1½ hours later.
You could purchase a Reserved Instance for that server, where you either pay nothing upfront or all upfront and your savings would be 38%-42%.
Or, you could attach a ParkMyCloud schedule so that the instance is only ON from 11 pm to 1 am UTC, allowing enough time for the cron jobs to start and run. The savings in that case would be 87.6% (including the cost of ParkMyCloud) without the need for a one-year commitment. Depending on how many batch servers you run in your environment and their sizes, that could be some hefty savings.
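The arithmetic behind these figures can be sketched as follows. The 730-hour month matches the conversion of $0.10 per hour to roughly $73 per month used above. The $3 per-instance monthly tool fee is an assumption, chosen because it reproduces the 87.6% figure; it is not a price stated in this post.

```python
HOURS_PER_MONTH = 730  # average month; 0.10 * 730 is about $73


def monthly_savings_pct(hourly_rate, on_hours_per_day, tool_fee_per_month=0.0):
    """Percentage saved versus running the instance 24 hours a day.

    tool_fee_per_month is the per-instance cost of the scheduling
    tool (an assumed figure in the example below).
    """
    always_on_cost = hourly_rate * HOURS_PER_MONTH
    parked_cost = always_on_cost * on_hours_per_day / 24 + tool_fee_per_month
    return 100 * (always_on_cost - parked_cost) / always_on_cost


# m4.large at $0.10/hour, ON for 2 hours a day (11 pm to 1 am UTC).
compute_only = round(monthly_savings_pct(0.10, 2), 1)
with_assumed_fee = round(monthly_savings_pct(0.10, 2, tool_fee_per_month=3.0), 1)
print(compute_only, with_assumed_fee)  # prints: 91.7 87.6
```

Either way, the result sits well above the 38%-42% range quoted for a Reserved Instance, which is the comparison the example is making.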
Public cloud will offer you a lot of freedom and some potentially attractive cost savings as you move batch workloads from on premise. You are no longer constrained by having the same infrastructure serve two vastly different types of workloads — interactive and batch. The savings you can achieve by moving to public cloud can vary, depending on the approach you take and the provider/service you use.
The approach you take depends on the amount of change you’re willing to absorb in your DevOps processes. If you are willing to throw caution to the wind, the cloud provider PaaS offerings for batch can be quite compelling.
If you wish to take a more cautious approach, then we engineered ParkMyCloud to park servers without the need for scripting, or the need for you to be a DevOps expert. This approach allows you to achieve decent savings, with minimal change to your DevOps batch processes and without the need for Reserved Instances.
We’re happy to introduce ParkMyCloud’s new reporting dashboard! There are now easy-to-access reports that provide greater insight into cloud costs, team rosters, and more. Details on this update can be found in our support portal.
Now, when you click Reports in the left navigational panel, instead of getting the option to download a full savings report, you’ll see your ParkMyCloud reporting dashboard. This provides a quick view of cloud provider, team and resource costs, and information regarding your ParkMyCloud savings. At the top of the reporting dashboard, two drop-down menus are provided for selecting the report type and the time period. The default selections are Dashboard and Trailing 30 Days, which is what you see after clicking on reporting in the left navigational menu. Click on a drop-down menu to choose other available options.
Underneath the Report Type drop-down menu, you will see several options that are broken down into additional sections (Financial, Resource, Administrative, etc.). Click on an option in the menu to view that specific report within the dashboard. These reports can also be shown using a variety of time periods. Reports may also be exported as a CSV or Excel file by clicking on the desired option to the right of the Report and Time Period drop-down menus.
Click on Legacy if you would prefer to still use the previous reporting functionality rather than the new reporting dashboard in ParkMyCloud. A pop-up window will appear for selecting the start and end date along with the type of legacy report. As part of this change, we have also moved Audit Logs underneath reporting. To access this option, you will need to select Reports in the left navigational panel and then Audit Log.
Check It Out
If you don’t yet use ParkMyCloud, you can try it now for free. We offer a 14-day free trial of all ParkMyCloud features, after which you can choose to subscribe to a premium plan or continue parking your instances using ParkMyCloud’s free tier forever.
If you already use ParkMyCloud, you’ll instantly see a visual representation of your cloud savings just by logging in to the platform. We challenge you to use this as a scoreboard, and try to drive your monthly savings as high as you can!
We chatted with Ryan Alexander, DevOps Engineer at Decision Resources Group (DRG), about his company’s use of AWS and how they automate cloud cost savings. Below is a transcript of our conversation.
Hi Ryan, thanks for speaking with us. To start out, can you please describe what your company does?
Decision Resources Group offers market information and data for the medtech industry. For example, let’s say a medical graduate student is doing a thesis on Viagra use in the Boston area. They can use our tool to see information such as age groups, ethnicities, number of hospitals, and number of people who were issued Viagra in the city of Boston.
What does your team do within the company? What is your role?
I’m a DevOps engineer on a team of two. We provide infrastructure automation to the other teams in the organization. We report to senior tech management, which makes us somewhat of an island within the organization.
Can you describe how you are using AWS?
We have an infrastructure team internally. Once a server or infrastructure is built, we take over to build clusters and environments for what’s required. We utilize pretty much every tool AWS offers — EBS, ELB, RDS, Aurora, CloudFormation, etc.
What prompted you to look for a cost control solution?
When I joined DRG in December, there was a new cost saving initiative developing within the organization. It came from our CTO, who knew we could be doing better and wanted to see where we might be leaving money on the table.
How did you hear about ParkMyCloud?
One of my colleagues actually spoke with your CTO, Dale, at AWS re:Invent, and I had also heard about ParkMyCloud at DevOpsDays Toronto 2016. We realized it could help solve some of our cloud cost control problems and decided to take a look.
What challenges were contributing to the high costs? How has ParkMyCloud helped you solve them?
We knew we had a problem where development, staging, and QA environments were only used for 8 hours a day – but they were running for 24 hours a day. We wanted to shut them down and save money on the off hours, which ParkMyCloud helps us do automatically.
We also have “worker” machines that are used a few times a month, but they need to be there. It was tedious to go in and shut them down individually. Now with ParkMyCloud, I put those in a group and shut them down with one click. It is really just that easy to automate cloud cost savings with ParkMyCloud.
We also have security measures in place, where not everyone has the ability to sign in to AWS and shut down instances. If there was a team that needed them started on demand, but they’re in another country and I’m sleeping, they have to wait until I wake up the next morning, or I get up at 2 AM. Now that we set up Single Sign-On, I can set up the guys who use those servers, and give them the rights to start up and shut down those servers. This has been more efficient for all of us. I no longer have to babysit and turn those on/off as needed, which saves time for all of us.
With ParkMyCloud, we set up teams and users so they can only see their own instances, so they can’t cause a cascading failure because they can only see the servers they need.
Were there any unexpected benefits of ParkMyCloud?
When I started, I deleted 3 servers that were sitting there doing nothing for a year and costing the company lots of money. With ParkMyCloud, that kind of stuff won’t happen, because everything gets sorted into teams. We can see the costs by team and ask the right questions, like, “why is your team’s cost so expensive right now? Why are you ignoring these recommendations from ParkMyCloud to park these instances?”
We rely on tagging to do all of this. Tagging is life in DevOps.