How to Use 9 Cloud DevOps Best Practices For Cost Control

Any organization with a functioning cloud DevOps practice will have some common core tenets. While those tenets are frequently applied to things like code delivery and security, a company that fails to apply them to cost control is destined to have a runaway cloud bill (or at least a series of upcoming meetings with the CFO). Here are some of those tenets, and how they apply to cost control:

1. Leadership

One common excuse for wasted cloud spend is “well that other group has cloud waste too!” By aggressively targeting and eliminating cloud waste, you can set the tone for cost control within your team, which will spread throughout the rest of the organization. This also helps to get everyone thinking about the business, even if it doesn’t seem like wasting a few bucks here or there really matters (hint: it does).

2. Collaborative Culture

By tearing down silos and sharing ideas and services, cost control can become a normal part of your cloud DevOps practice instead of a forced decree that no one wants to take part in. Writing a script that is more generally applicable, or finding a tool that others can be invited to, helps others save money and encourages them to join in. You may also pick up ideas from others that you never thought of, without wasting time or duplicating work.

3. Design for DevOps

Having cost control as a central priority within your team means you build it into your processes and software as you go. Attempting to control costs after the fact can be tough, forcing rewrites or rollbacks instead of forward progress. Tacked-on cost control is also typically less effective and saves less money than designing for it from the start.

4. Continuous Integration

Integrating ideas and code from multiple teams with multiple codebases and processes can be daunting, which is why continually integrating as new commits happen is such a big step forward. Along the same lines, continually controlling costs during the integration phase means you can optimize your cloud spend by sharing resources, slimming down those resources, and shutting down resources until they are needed by the integration.

5. Continuous Testing

Continuous testing of software helps find bugs quickly and while developers are still working on those systems. Cost control during the testing phase can take multiple forms, including controlling the costs of those test servers, or doing continuous testing of the cost models and cost reduction strategies. New scripts and tools that are being used for cost control can also be tested during this phase.

6. Continuous Monitoring

Monitoring and reporting, like cost control, are often haphazardly tacked on to a software project instead of being a core component. For a lot of organizations, this means that costs aren’t actively being monitored and reported, which is what causes yelling from the Finance team when that cloud bill comes. By making everyone aware of how costs are trending and noting when huge spikes occur, you can keep those bills in check and help save yourself from those dreaded finance meetings.
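To make cost trends visible, you could wire up a small scheduled check. Below is a minimal sketch, assuming AWS and the boto3 Cost Explorer client; the spike threshold and SNS topic ARN are hypothetical placeholders.

```python
# A minimal sketch of daily cost-spike monitoring with boto3.
from datetime import date, timedelta

import boto3

ce = boto3.client("ce")
sns = boto3.client("sns")

SPIKE_THRESHOLD = 1.5  # hypothetical: alert if spend jumps 50% day-over-day
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:cost-alerts"  # hypothetical

def check_for_cost_spike():
    # End date is exclusive, so this returns two daily buckets.
    end = date.today()
    start = end - timedelta(days=2)
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
    )
    days = resp["ResultsByTime"]
    previous = float(days[0]["Total"]["UnblendedCost"]["Amount"])
    latest = float(days[1]["Total"]["UnblendedCost"]["Amount"])
    if previous > 0 and latest / previous > SPIKE_THRESHOLD:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="Cloud cost spike detected",
            Message=f"Daily spend jumped from ${previous:.2f} to ${latest:.2f}.",
        )
```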

7. Continuous Security

Cloud cost control can contribute to better security practices. For example, shutting down Virtual Machines when they aren’t in use decreases the number of entry points for would-be hackers, and helps mitigate various attack strategies. Reducing your total number of virtual machines also makes it easier for your security teams to harden and monitor the machines that exist.

8. Elastic Infrastructure

Auto-scaling resources are usually implemented by making services scale up automatically, while the “scaling down” part is an afterthought. It can admittedly be tricky to drain existing users and processes from under-utilized resources, but a fleet of systems sitting at low load is a leading cause of cloud waste. Additionally, different scale patterns based on time of day, day of the week, and business need can be implemented, but that type of cost control requires deliberate thought and effort.
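As one illustration, time-based scale patterns can be expressed as scheduled actions on an EC2 Auto Scaling group. Here is a minimal sketch using boto3; the group name and capacities are hypothetical.

```python
# A minimal sketch of time-based scaling for an EC2 Auto Scaling group.
import boto3

autoscaling = boto3.client("autoscaling")

# Scale down to zero at 5:00 p.m. UTC on weekdays...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="dev-workers",  # hypothetical group
    ScheduledActionName="scale-down-evenings",
    Recurrence="0 17 * * 1-5",  # cron syntax, evaluated in UTC
    MinSize=0,
    MaxSize=0,
    DesiredCapacity=0,
)

# ...and back up to business-hours capacity at 8:00 a.m. UTC.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="dev-workers",
    ScheduledActionName="scale-up-mornings",
    Recurrence="0 8 * * 1-5",
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=2,
)
```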

9. Continuous Delivery/Deployment

Deploying your completed code to production can be exciting and terrifying at the same time. One factor you need to consider is the size and cost of those production resources. Cost savings for production resources usually look different from savings on dev/test/QA resources, as production typically needs to be on 24/7 and can’t tolerate high latency or long spin-up times. However, there are some cost control measures, like pre-paying for instances or building accurate usage patterns for your elastic environments, that your production teams should consider.

Full Cloud DevOps Cost Control

As you can see, there are a lot of paths to lowering your cloud bill using common cloud DevOps tenets. By working these ideas into your teams and weaving them throughout your processes, you can save money and help lead others to do the same. Controlling these costs leads to fewer headaches, more time, and more money for future projects, which is what we’re all aiming to achieve with DevOps.

Should You Use the Cloud-Native Instance Scheduler Tools?

When adopting or optimizing your public cloud use, it’s important to eliminate wasted spend from idle resources – which is why you need to include an instance scheduler in your plan. An instance scheduler ensures that non-production resources – those used for development, staging, testing, and QA – are stopped when they’re not being used, so you aren’t charged for compute time you’re not actually using.

AWS, Azure, and Google Cloud each offer an instance scheduler option. Will these fit your needs – or will you need something more robust? Let’s take a look at the offerings and see the benefits and drawbacks of each.

AWS Instance Scheduler

AWS has a solution called the AWS Instance Scheduler. AWS provides a CloudFormation template that deploys all the infrastructure needed to schedule EC2 and RDS instances. This infrastructure includes DynamoDB tables, Lambda functions, and CloudWatch alarms and metrics, and relies on tagging of instances to shut down and turn on the resources.

The AWS Instance Scheduler is fairly robust: it allows you to have multiple schedules, override those schedules, connect to other AWS accounts, temporarily resize instances, and manage both EC2 instances and RDS databases. However, that management is done exclusively by editing DynamoDB table entries, which is not the most user-friendly experience. The settings in DynamoDB are applied to instances via tags, which is good if your organization is tag-savvy, but can be a problem if not all users have access to change tags.
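For the tag side of that workflow, attaching a schedule to an instance is a one-line tagging call. Here is a minimal sketch with boto3, assuming the solution’s default “Schedule” tag key; the instance ID and schedule name are hypothetical.

```python
# A minimal sketch of attaching an Instance Scheduler schedule via tags.
import boto3

ec2 = boto3.client("ec2")

ec2.create_tags(
    Resources=["i-0123456789abcdef0"],  # hypothetical instance ID
    # "office-hours" would be a schedule defined in the DynamoDB config table.
    Tags=[{"Key": "Schedule", "Value": "office-hours"}],
)
```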

If you will have multiple users adding and updating schedules, the Instance Scheduler does not provide good auditing or multi-user capabilities. You’ll want to strongly consider an alternative.

Microsoft Azure Automation

Microsoft has a feature called Azure Automation, which includes multiple solutions for VM management. One of those solutions is “Start/Stop VMs during off-hours”, which deploys runbooks, schedules, and log analytics in your Azure subscription for managing instances. Configuration is done in the runbook parameters and variables, and email notifications can be sent for each schedule.

This solution steps you through the setup for timing of start and stop, along with email configuration and the target VMs. However, multiple schedules require multiple deployments of the solution, and connecting to additional Azure subscriptions requires even more deployments. They do include the ability to order or sequence your start/stop, which can be very helpful for multi-component applications, but there’s no option for temporary overrides and no UI for self-service management. One really nice feature is the ability to recognize when instances are idle, and automatically stop them after a set time period, which the other tools don’t provide.
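Under the hood, the runbooks boil down to start and deallocate calls against your VMs. Here is a minimal sketch of the equivalent calls with the azure-mgmt-compute SDK; the subscription ID, resource group, and VM name are hypothetical.

```python
# A minimal sketch of what the Start/Stop runbooks effectively do.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

credential = DefaultAzureCredential()
compute = ComputeManagementClient(
    credential, "00000000-0000-0000-0000-000000000000"  # hypothetical subscription
)

# Deallocating (rather than just powering off) stops compute billing.
compute.virtual_machines.begin_deallocate("dev-rg", "test-vm").result()

# Later, start the VM back up for the work day.
compute.virtual_machines.begin_start("dev-rg", "test-vm").result()
```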

Google Cloud Scheduler

Google has also packaged some of its Cloud components together into a Google Cloud Scheduler solution. This includes Google Cloud Functions for running the scripts, Google Cloud Pub/Sub messages for driving the actions, and Google Cloud Scheduler jobs to actually kick off the start and stop of the VMs. Unlike AWS and Azure, this requires individual setup of each piece (instead of a packaged deployment), but the documentation takes you step by step through the process.

Google Cloud Scheduler relies on instance names instead of tags by default, though the functions are all made available for you to modify as needed. The settings are all built into those functions, which makes updating or modifying them more complicated than with the other services. There’s also no real UI available, and the out-of-the-box experience is fairly limited in scope.
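To give a flavor of the setup, here is a minimal sketch of a Pub/Sub-triggered Cloud Function that stops matching instances. The project, zone, label filter, and message payload format are all assumptions, so adapt them to Google’s current documentation.

```python
# A minimal sketch of a Pub/Sub-triggered function that stops Compute
# Engine instances.
import base64
import json

from googleapiclient import discovery

compute = discovery.build("compute", "v1")

def stop_instances(event, context):
    # Assumed payload: {"project": "...", "zone": "..."} published by the
    # Cloud Scheduler job.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    project, zone = payload["project"], payload["zone"]
    # Hypothetical label filter; the packaged solution matches on names.
    result = compute.instances().list(
        project=project, zone=zone, filter="labels.env=dev"
    ).execute()
    for instance in result.get("items", []):
        compute.instances().stop(
            project=project, zone=zone, instance=instance["name"]
        ).execute()
```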

Cloud Native or Third Party?

Each of the instance scheduler tools provided by the cloud providers has a few limitations. One possible dealbreaker is that none of these tools are multi-cloud capable, so if your organization uses multiple public clouds then you may need to go for a third-party tool. They also don’t provide a self-service UI, built-in RBAC capabilities, Single Sign-On, or reporting capabilities. When it comes to cost, all of these tools are “free”, but you end up paying for the deployed infrastructure and services that are used, so the cost can be very hard to pin down.

We built ParkMyCloud to solve the instance scheduler problem (now with rightsizing too). Here’s how the functionality stacks up against the cloud-native options:

Compared across AWS Instance Scheduler, Microsoft Azure Automation, Google Cloud Scheduler, and ParkMyCloud, the capabilities evaluated are:

  • Virtual machine scheduling
  • Database scheduling
  • Scale set scheduling
  • Tag-based scheduling
  • Usage-based recommendations
  • Simple UI
  • Resize instances
  • Override schedules
  • Reporting
  • Start/stop notifications
  • Multi-account
  • Multi-cloud

Overall, the cloud-native instance scheduler tools can help you get started on your cost-saving journey, but may not fulfill your longer-term requirements due to their limitations.

Try ParkMyCloud with a free trial — we think you’ll find that it meets your needs in the long run.  

EC2 Instance Hibernation: Bridging the Gap with Spot

Amidst the truckload of announcements from AWS around re:Invent this year, one that caught my attention was the ability to perform EC2 instance hibernation. This isn’t going to be directly applicable to all workloads or all businesses, but it provides a needed bridge between On-Demand EC2 and Spot instances. With this option, it should be easier to move between the two compute choices to solve more business cases.

Spot Instances 101

One way AWS helps you save money is by letting you utilize spare compute capacity through Spot Instances. There’s a whole economy around Spot Instances, as the price can go up or down based on free resources in AWS data centers. To purchase Spot Instances, you establish your bid price, and if the price of your desired instance drops below the bid, you get the resources. The biggest catch is that once the price rises above your bid price, your instance gets stopped in the middle of whatever it was doing.

This behavior means that you need to have workloads that can be paused. One big consideration is that you don’t want to have time-sensitive workloads operating in this environment, as it may take longer to complete the overall task if the processes keep getting interrupted. This also means that you’ll want to build your subtasks and processes in a way that they can be interrupted without breaking horribly.
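One common pattern for building interruptible workloads is to poll the instance metadata service for the Spot interruption notice and checkpoint when it appears. Here is a minimal sketch; the metadata path is real, but checkpoint() is a hypothetical stand-in for your own save-my-work logic.

```python
# A minimal sketch of watching for a Spot interruption notice from inside
# the instance.
import time

import requests

NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def checkpoint():
    """Hypothetical: persist in-progress work so it can resume later."""

while True:
    # The endpoint returns 404 until AWS schedules a stop/termination.
    response = requests.get(NOTICE_URL, timeout=2)
    if response.status_code == 200:
        checkpoint()
        break
    time.sleep(5)
```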

Interruptible Workloads On-Demand

Now, with EC2 instance hibernation, the processes you’ve already made interruptible can run On-Demand, with you choosing when to pause those workloads. This flexibility eliminates the Spot-instance worry of not finishing a task by a desired date, while keeping the ability to switch to Spot (or out of Spot) if desired. It combines some of the best aspects of Spot and On-Demand Instances.

In addition to the benefit of workloads completing on your timetable, you can also utilize hibernation to pre-warm EC2 instances that have apps that might take a while to spin up. This can be especially true for memory-intensive applications, as any data that was in memory prior to hibernation will be immediately available upon restart. You could even use this as a workaround to long warm-up times for AWS Lambda functions, as instead of waiting for the Lambda to spin up, your instance could be running your function locally with everything pre-loaded.
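In boto3 terms, hibernation is opted into at launch and then requested at stop time. Here is a minimal sketch; the AMI and instance IDs are hypothetical, and the instance must meet AWS’s hibernation prerequisites (for example, an encrypted root volume).

```python
# A minimal sketch of launching a hibernation-capable instance and
# hibernating it with boto3.
import boto3

ec2 = boto3.client("ec2")

# Hibernation must be enabled at launch.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    HibernationOptions={"Configured": True},
)

# Later, hibernate instead of a plain stop: RAM is saved to the EBS root
# volume, so in-memory state survives the stop/start cycle.
ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"], Hibernate=True)
```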

EC2 Instance Hibernation: Supercharging Spot

Last year, AWS added the ability to hibernate Spot instances, which changed the game on how you plan your Spot workloads. Now, with EC2 Instance hibernation, you can take your workload management to the next level by having a wider array of options available to you.

This kind of hibernation seems like a great fit for image processing, video encoding, or after-hours high performance computing. Got any other good ideas or use-cases for EC2 instance hibernation? Let us know what you think!

How to Automate and Secure Your Environment with AWS Server Fleet Management

AWS recently combined AWS Systems Manager and Amazon Inspector into a new offering called AWS Server Fleet Management. The goal of this service is to provide a way to secure, automate, and configure a large array of servers through multiple AWS services working together. Some enterprises already have a config management tool in place, but might be looking for a more AWS-centric way to manage their numerous EC2 servers. Let’s look at how Server Fleet Management works, how it stacks up against other config management tools, and some of the pros and cons of using this solution.

How It Works

AWS Server Fleet Management utilizes quite a few AWS services under the hood. The good news is that you don’t have to deploy these services manually, as there’s a CloudFormation template available that will build the entire stack for you. The services include:

  • Amazon CloudWatch – kicks off events to trigger other services
  • Amazon Inspector – manages the assessment rules for configuration and security
  • Amazon SNS – notification topics for tracking instance IDs and email addresses
  • AWS Lambda – various tasks, including querying Inspector and updating Systems Manager
  • AWS Systems Manager – tracks inventory and configuration for EC2 instances and manages OS patches
  • Amazon S3 – secure storage of artifacts

Before deploying the CloudFormation stack, you’ll need to enter a few configuration details. The main one is the “Managed Instances Tag Value”, the tag you’ll place on the EC2 servers you want managed via Server Fleet Management. This can work in conjunction with the “Patch Group” tag in AWS Systems Manager if you want the instance to be patched automatically. Once you specify the tag, an email address, and whether you want a sample fleet deployed, you’re ready to create the stack!
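If you’d rather script the deployment than click through the console, here is a minimal sketch with boto3. The template URL and parameter keys are assumptions based on the settings described above, so verify them against the actual template.

```python
# A minimal sketch of deploying the Server Fleet Management stack via boto3.
import boto3

cloudformation = boto3.client("cloudformation")

cloudformation.create_stack(
    StackName="server-fleet-management",
    # Hypothetical location; use the URL from the solution's documentation.
    TemplateURL="https://s3.amazonaws.com/example-bucket/server-fleet-management.template",
    Parameters=[
        # Parameter keys are assumptions modeled on the documented settings.
        {"ParameterKey": "ManagedInstancesTagValue", "ParameterValue": "fleet-managed"},
        {"ParameterKey": "NotificationEmail", "ParameterValue": "ops@example.com"},
        {"ParameterKey": "DeploySampleFleet", "ParameterValue": "No"},
    ],
    Capabilities=["CAPABILITY_IAM"],  # the stack creates IAM resources
)
```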

Comparison to other tools

In the config management world, there are a few major players, including Chef, Puppet, Ansible, and SaltStack. From a pure configuration-management perspective, Server Fleet Management doesn’t offer anything new. However, if you’re fully bought in to running everything within AWS, the flexibility of using Lambda functions alongside other AWS services can be a huge advantage. On the flip side, multi-cloud enterprises may want to keep using a cloud-agnostic tool.

Pros and Cons

Along with the possible benefit of being purely within the AWS ecosystem, another major pro of AWS Server Fleet Management is the combination of security enforcement and patch management. Solving both of those problems often requires multiple tools, so this can trim down your list of applications. This solution also has lots of opportunities to tie into other existing AWS solutions or to be customized to fit your use cases.

That expandability can also be considered a con, as the built-in uses are fairly specific and larger fleets require more customization. Some things that aren’t included: cost management (we’ve got you covered), non-EC2 services that need security audits, application grouping, and cross-account access. There also aren’t any built-in hooks into the config management tools you likely already use.

Automated Security and Patching

All in all, AWS Server Fleet Management is worth looking into if you’ve got a large EC2 deployment. Even if you don’t use the pre-made stack, it might give you some ideas on how to use the underlying AWS services to help secure and manage your fleet. With the included sample fleet, it’s easy to get it set up and try it out!

3 Ways To Use Google Cloud Cron for Automation

If you use DevOps processes, automation and orchestration are king — which is why the Google Cloud cron service can be a great tool for managing your Google Compute Engine instances via Google App Engine code. This kind of automation can often involve multiple Google Cloud services, which is great for learning about them or running scheduled tasks that might need to touch multiple instances.  Here are a few ideas on how to use the Google Cloud cron service:

1. Automated Snapshots

Since Google Compute Engine lets you take incremental snapshots of the attached disks, you can use the Google App Engine cron to take these snapshots on a daily or weekly basis. This lets you go back in time on any of your compute instances if you mess something up or have some systems fail. If you use Google’s Pub/Sub service, you can have the snapshots take place on all instances that are subscribed to that topic.

As a bonus, you can use a similar idea to manage old snapshots and delete things you don’t need anymore. For example, schedule a Google Cloud cron to clean up snapshots three months after a server is decommissioned, or to migrate those snapshots to long-term storage.
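As a sketch of what the cron-triggered App Engine handler itself might call, here is the snapshot request via the Compute Engine API client; the project, zone, and disk names are hypothetical.

```python
# A minimal sketch of creating a dated disk snapshot via the Compute
# Engine API.
from datetime import date

from googleapiclient import discovery

compute = discovery.build("compute", "v1")

def snapshot_disk(project: str, zone: str, disk: str):
    # Snapshot names must be unique, so include today's date.
    body = {"name": f"{disk}-{date.today().isoformat()}"}
    return compute.disks().createSnapshot(
        project=project, zone=zone, disk=disk, body=body
    ).execute()

snapshot_disk("my-project", "us-central1-a", "app-server-disk")  # hypothetical
```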

2. Autoscaling a Kubernetes Cluster

With Google at the forefront of Kubernetes development, many GCP users make heavy use of GKE, the managed Kubernetes service. To save money and make sure your containers aren’t running when they aren’t needed, you could set up a cron job that runs at 5:00 p.m. each weekday and scales your Kubernetes cluster down to a size of 0. For maximum cost savings, you can leave it off until you need it and spin the cluster up manually, or you could use a second cron to spin your clusters up at 8:00 a.m. so they’re ready for the day.

(By the way — we’re working on functionality to let you do this automatically in ParkMyCloud, just like you can for VMs. Interested? Let us know & we’ll notify you on release.)
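Here is a minimal sketch of the scale-to-zero call described above, using the GKE API client; the project, zone, cluster, and node pool names are hypothetical.

```python
# A minimal sketch of scaling a GKE node pool down to zero nodes.
from googleapiclient import discovery

container = discovery.build("container", "v1")

container.projects().zones().clusters().nodePools().setSize(
    projectId="my-project",        # hypothetical
    zone="us-central1-a",
    clusterId="dev-cluster",
    nodePoolId="default-pool",
    body={"nodeCount": 0},         # a morning cron would set this back up
).execute()
```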

3. Send Weekly Reports

Is your boss hounding you for updates? Does your team need to know the status of the service? Is your finance group wondering how your GCP costs are trending for this week? Automate these reports using the Google Cloud cron service! You can gather the info needed and post these reports to a Pub/Sub topic, send them out directly, or display them on your internal dashboard or charting tool for mass consumption. These reports can cover various metrics or services, including Google Compute, Cloud SQL, or the billing information for your various projects.
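For the Pub/Sub route, publishing a report takes only a few lines with the google-cloud-pubsub library. Here is a minimal sketch; the project, topic, and report contents are hypothetical.

```python
# A minimal sketch of publishing a weekly cost summary to a Pub/Sub topic.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "weekly-reports")  # hypothetical

report = {"week": "2019-W02", "gcp_spend_usd": 1234.56, "trend": "down 4%"}
future = publisher.publish(topic_path, json.dumps(report).encode("utf-8"))
future.result()  # block until the message is accepted
```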

Other Google Cloud Cron Ideas? Think Outside The Box!

Got any other ideas, or existing ways you use the Google Cloud cron service to automate your Google Cloud environment? Let us know how you’re using it and why it helps you manage your cloud infrastructure.