Like other cloud providers, the Google Cloud Platform (GCP) charges for compute virtual machine instances by the amount of time they are running — which may lead you to search for a Google Cloud instance scheduling solution. If your GCP instances are only busy during or after normal business hours, or only at certain times of the week or month, you can save money by shutting these instances down when they are not being used. So can you set up this scheduling through the Google Cloud console? And if not – what’s the best way to do it?
This post was originally written by Bill Supernor in 2018. I have revised and updated it for 2020.
Why bother scheduling a Google VM to turn off?
As mentioned, depending on your purchasing option, Google Cloud pricing is based on the amount of time an instance is running, charged at a per-second rate. We find that at least 40% of an organization’s cloud resources (and often much more) are for non-production purposes such as development, testing, staging, and QA. These resources are only needed when employees are actively using them for those purposes — so every second that they are left running when not being used is wasted spend. Since non-production VM instances often have predictable workloads, such as a 7 AM to 7 PM work week, 5 days a week, the other 64% of spend is completely wasted. Inconceivable!
The good news is, that means these resources can be scheduled to turn off during nights and weekends to save money. So, let’s take a look at a couple of cloud scheduling options.
Scheduling Option 1: GCP set-scheduling Command
If you were to do a Google search on “google cloud instance scheduling,” hoping to find out how to shut your compute instances down when they are not in use, you would see numerous promising links. The first couple of references appear to discuss how to set instance availability policies and mention a gcloud command line interface for “compute instances set-scheduling”. However, a little digging shows that these interfaces and commands simply describe how to fine-tune what happens when the underlying hardware for your Google virtual machine goes down for maintenance. The options in this case are to migrate the VM to another host (which appears to be a live migration), or to terminate the VM, and if the instance should be restarted if it is terminated. The documentation for the command goes so far as to say that the command is intended to let you set “scheduling options.” While it is great to have control over these behaviors, I feel I have to paraphrase Inigo Montoya – You keep using that word “scheduling” – I do not think it means what you think it means…
Scheduling Option 2: GCP Compute Task Scheduling
The next thing that looks schedule-like is the GCP Cron Service. This is a highly reliable networked version of the Unix cron service, letting you leverage the Google App Engine service to do all sorts of interesting things. One article describes how to use the Cron Service and Google App Engine to schedule tasks to execute on your Compute Instances. With some App Engine code, you could use this system to start and stop instances as part of regularly recurring task sequences. This could be an excellent technique for controlling instances for scheduled builds, or calculations that happen at the same time of a day/week/month/etc.
While very useful for certain tasks, this technique really lacks flexibility. Google Cloud Cron Service schedules are configured by creating a cron.yaml file inside the app engine application. The GCP Cron Service triggers events in the application, and getting the application to do things like start/stop instances are left as an exercise for the developer. If you need to modify the schedule, you need to go back in and modify the cron.yaml. Also, it can be non-intuitive to build a schedule around your working hours, in that you would need one event for when you want to start an instance, and another when you want to stop it. If you want to set multiple instances to be on different schedules, they would each need to have their own events. This brings us to the final issue, which is that any given application is limited to 20 events for free, up to a maximum of 250 events for a paid application. Those sound like some eel-infested waters.
Scheduling Option 3: ParkMyCloud Google Cloud Instance Scheduling
Google Cloud Platform and ParkMyCloud – mawwage – that dweam within a dweam….
Given the lack of other viable instance scheduling options, we at ParkMyCloud created a SaaS application to automate instance scheduling, helping organizations cut cloud costs by 65% or more on their monthly cloud bill with AWS, Azure, and, of course, Google Cloud.
We aim to provide a number of benefits that you won’t find with, say, the GCP Cron Service. ParkMyCloud’s cloud management software:
Automates the process of switching non-production instances on and off with a simple, easy-to-use platform – more reliable than the manual process of switching GCP Compute instances off via the GCP console. Automatic on/off schedules make resource states easy to manage.
Provides a single-pane-of-glass view, allowing you to consolidate multiple clouds, multiple accounts within each cloud, and multiple regions within each account, all in one easy-to-use interface.
Does not require a developer background, coding, or custom scripting. It is also more flexible and cost-effective than having developers write scheduling scripts.
Avoids the hard-coded schedules of the Cron Service. Users of ParkMyCloud’s GCP scheduler can temporarily override schedules if they need to use an instance on short notice.
Supports Teams and User Roles (with optional SSO), ensuring users will only have access to the resources you grant.
Helps you identify idle instances by monitoring instance performance metrics, displaying utilization heatmaps, and automatically generating utilization-based “SmartParking” schedule recommendations, which you can accept or modify as you wish...
Provides “rightsizing” recommendations to identify resources that are routinely underutilized and can be converted to a different Google Cloud server size to save 50-75% of the cost of the resource. These recommendations incorporate custom GCP sizes, so you can adjust specifics around memory and CPU independent of each other.
Has a 14-day free trial, so you can try the GCP cloud scheduler platform out in your own environment. There’s also a free-forever tier, useful for startups and those on the Google Cloud free tier, as well as paid tiers with more advanced options for enterprises with a larger Google Cloud footprint.
Supports multiple GCP products, including Virtual Machines, CloudSQL databases, Autoscaling Groups, and GKE clusters and nodes.
Notifies users and admins of resource shutdowns, startups, and actions taken via Google Hangouts, Slack, Microsoft Teams, or Email.
How Much Can You Save with Google Cloud Scheduling?
While it depends on your exact schedule, many non-production Google Cloud VMs – those used for development, testing, staging, and QA – can be turned off for 12 hours/day on weekdays, and 24 hours/day on weekends. For example, the resource might be running from 7 AM to 7 PM Monday through Friday, and “parked” the rest of the week. This comes out to about 64% savings per resource.
Currently, the average savings per scheduled VM in the ParkMyCloud platform is about $245/month. That does not account for any additional savings gained from rightsizing.
How Enterprises Are Benefitting from ParkMyCloud’s Google Cloud Scheduler + Optimizer
If you’re not quite ready to start your own trial, check out this interview with Workfront, a work management software provider. Workfront uses both AWS and Google Cloud Compute Engine, and needed to coordinate cloud management software across both public clouds. They required automation in order to optimize and control cloud resource costs, especially given users’ tendency to leave resources running when they weren’t being used.
Workfront found that ParkMyCloud would meet their automatic scheduling needs. Now, 200 users throughout the company use ParkMyCloud to:
Get recommendations of resources that are not being used 24×7, and use policies to automatically apply on/off schedules to them
Get notifications and control the state of their resources through Slack
Easily report savings to management
Save hundreds of thousands per year
Ways to Save on Google Cloud VMs, Beyond Scheduling
Google has done a great job of creating offerings for customers to save money through regular cloud usage. The two you’ll see mentioned the most are sustained use discounts and committed use discounts. Sustained use discounts give Google Cloud users automatic discounts the longer an instance is run. This post outlines the break-even points between letting an instance run for the discount vs. parking it. Sustained use discounts have also been expanded with resource-based pricing, which allows the sustained use to be applied based on your use of individual vCPUs and GB of memory regardless of the machine type you use.
Committed use discounts, on the other hand, require an upfront commitment for 1 or 3 years’ usage. We have found that they’re best applicable for predictable workloads such as production environments. There are also the pre-emptible VMs, which are offered at a discount from on demand VMs in exchange for being short-lived – up to 24 hours.
In addition to these discounts, you can also save money by rightsizing your instances. Provisioning your resources to be larger than you need is another form of cloud waste. On average, rightsizing a resource reduces the cost by 50%. Google Cloud makes it easy to change the CPU or the Memory amounts using custom instance sizes. If you’d rather use standard sizing, they offer that as well. By keeping an eye on the usage patterns of your servers, you can make sure that you’re getting the most use of the resources you are paying for.
How to Create a Google Cloud Schedule with ParkMyCloud
Getting started with ParkMyCloud is easy. Simply register for a free trial with your email address and connect to your Google Cloud Platform to allow ParkMyCloud to discover and manage your resources. A 14-day free trial free gives your organization the opportunity to evaluate the benefits of ParkMyCloud while you only pay for the cloud computing power you use. At the end of the trial, there is no obligation on you to continue with our service, and all the money your organization has saved is, of course, yours to keep.
If you do choose to continue, our Google Cloud scheduler/optimizer pricing is straightforward. You will choose a functionality tier and pay per resource per month. There is a free forever tier available – so have at it.
ITSM tools like ServiceNow have capabilities for creating approval workflows and processes for change control tracking such as cloud resource provisioning. Managers and C-level executives love the governance this provides, as they can use this to make sure they have full compliance with regulations and laws while also preventing rogue IT usage across the enterprise. Without these approval workflows, a user could provision a giant virtual machine without anyone knowing about it, or a firewall change could inadvertently take down your network. Surely these are good uses of approvals, right?
Training Users To Spend Money
One downside I’ve covered before is that these processes often slow users down, wasting time or leading them to circumvent the system altogether. Another consequence of using approval workflows for cloud resource provisioning is that it trains your users to always provision as much as possible. For example, let’s say you have an approval workflow that says your cloud operations team must approve all new VM requests in AWS, as well as all VM size change requests. If I’m a user, and I’m on the fence about needing an m5.large or an m5.xlarge server, I might as well request an m5.xlarge server now instead of having to submit another request to change the size if I need it. Each size up doubles the cost – so this single VM is now costing the company twice as much just because the user doesn’t want to go through additional approvals!
Let’s look at another example that we at ParkMyCloud are very familiar with, which is turning off resources overnight. An organization might set up a workflow that resources with a schedule that shuts them down overnight requires approval to turn back on outside of business hours. If a user has the choice of not applying that schedule versus needing approval to turn it on if they need it, they’ll do whatever it takes to not schedule that resource. This leads to unnecessary cloud waste.
Give Users The Tools To Succeed
At ParkMyCloud, we believe in guardrails and RBAC. However, we also know that empowering users to manage their own servers and databases through a self-service portal leads to more scheduling, rightsizing, and cost savings. By allowing a user to override a scheduled instance for a couple of hours when they need it instead of hassling their IT manager for approval, a user can get their job done while still allowing the instance to shut down.
Approval workflows for cloud resource provisioning, resource scheduling, and other infrastructure tasks do help to a certain extent. Compliance and regulatory adherence is a necessity. With that said, don’t be the CloudOps team that starts putting approval processes on every minor detail or dev server, or you’re going to hinder your users more than help them. Cost savings is a team effort, so when you implement tools, look for a self-service model like ParkMyCloud’s to help you manage resources while empowering users. You’ll save money while making users happy, which is a win-win for your enterprise.
Kubernetes – The Latest Holy War
Kubernetes v1.0 was released in 2015, after being developed and used internally at Google since 2014. It quickly emerged as the most popular way to manage and orchestrate large numbers of containers, despite competition from Docker Swarm, Apache Mesos, Hashicorp’s Nomad, and various other software from the usual suspects at IBM, HP, and Microsoft. Google partnered with the Linux Foundation in 2015 to form the Cloud Native Computing Foundation (CNCF) and had Kubernetes as one of the seed technologies.
This public dedication to open-source technology gave Kubernetes instant nerd bonus points, and it being used internally at one of the largest software companies in the world made it the hot new thing. Soon, there were thousands of articles, tweets, blog posts, and conference talks about moving to a microservices architecture built on containers using Kubernetes to manage the pods and services. It wasn’t long before misguided job postings were looking for 10+ years of Kubernetes experience and every startup pivoted from their blockchain-based app to a Kubernetes-based app overnight.
As with any tech fad, the counterarguments started to mount while new fads rose up. Serverless became the new thing, but those who put their eggs in the Kubernetes basket resisted the shift from containers to functions. Zealots from the serverless side argued that you can ditch your entire code base and move to Lambda or Azure scripts, while fanatics of Kubernetes said that you didn’t need functions-as-a-service if you just packaged up your entire OS into a container and just spun up a million of them. So, do you need Kubernetes?
You’re (Probably) Not Google
Here’s the big thing that gets missed when a huge company open-sources their internal tooling – you’re most likely not on their scale. You don’t have the same resources, or the same problems as that huge company. Sure, you are working your hardest to make your company so big that you have the same scaling problems as Google, but you’re probably not there yet. Don’t get me wrong: I love when large enterprises open-source some of their internal tooling (such as Netflix or Amazon), as it’s beneficial to the open-source community and it’s a great learning opportunity, but I have to remind myself that they are solving a fundamentally different problem than I am.
While I’m not suggesting that you avoid planning ahead for scalability, getting something like Kubernetes set up and configured instead of developing your main business application can waste valuable time and funds. There’s a lot of overhead with learning, deploying, and managing Kubernetes that companies like Google can afford. If you can get the same effect from an autoscaling group of VMs with less headache, why wouldn’t you go that route? Remember: something like 60% of global AWS spend is on EC2, and with good reason. You can get surprisingly far using tried-and-true technologies and basics without having to rip everything out and implement the latest fad, which is why Kubernetes (or serverless, or blockchain, or multi-cloud…) shouldn’t be your focus. Kubernetes certainly has its place, and can be the tool you need. But most likely, it’s making things more complex for you without a corresponding benefit.
When most enterprise users hear that their organization will start heavily using ServiceNow governance, they assume that their job is about to get much harder, not easier. This stems from admins putting overly-restrictive policies in place, even with the good intentions of preventing security or financial problems. The negative side effect of this often manifests itself as a huge limitation for users who are just trying to do their job. Ultimately, this can lead to “shadow IT”, angry users, and inefficient business processes. So how can you use ServiceNow governance to increase efficiency rather than prohibit it?
What is ServiceNow governance?
One of the main features of ServiceNow is the ability to implement processes for approvals, requests, and delegation. Governance in ServiceNow includes the policies and definitions of how decisions are made and who can make those decisions. For example, if a user needs a new virtual machine in AWS, they can be required to request one through the ServiceNow portal. Depending on the choices made during this request, cloud admins or finance team members can be alerted to this request and be asked to approve the request before it is carried out. Once approved, the VM will have specific tags and configuration options that match compliance and risk profiles.
What Drives Governance?
Governance policies are implemented with some presumably well-intentioned business goal in mind. Some organizations are trying to keep risk managed through approvals and visibility. Others are trying to rein in IT spending by guiding users to lower-cost alternatives to what they were requesting.
Too often, to the end user, the goal gets lost behind the frustration of actions being slowed, blocked, or redirected by the (beautifully automated) red tape. Admins lose sight of the central business needs while implementing a governance structure that is trying to protect those business needs. For users to comply with these policies, it’s crucial that they understand the motivations behind them – so they don’t work around them.
In practice, this turns into a balancing act. The guiding question that needs to be answered by ServiceNow governance is, “How can we enable our users to do their jobs while preventing costly (or risky) behavior?”
Additionally, it’s critical that new policies are clearly communicated, and that they hook into existing processes. Not to say that this is easy. To be done well, it requires a team of technical and business stakeholders to provide their needs and perspectives. Knowing the technical possibilities and limitations must match up with the business needs and overall organizational plans, while avoiding roadblocks and managing edge cases. There’s a lot to mesh together, and each organization has unique challenges and desires, which makes this whole process hard to generalize.
The End Result
At ParkMyCloud, we try to help facilitate these kinds of governance frameworks. The ParkMyCloud application allows admins to set permissions on users and give access to teams. By reading from resource tags, existing processes for tagging and naming can be utilized. New processes around resource schedule management can be easily communicated via chat or email notifications. Users get the access they need to keep doing their job, but don’t get more access than required. Employing similar ideas in your ServiceNow governance policies can make your users successful and your admins happy.
Google Cloud IAM (Identity and Access Management) is the core component of Google Cloud that keeps you secure. By adopting the “principle of least privilege” methodology, you can work towards having your infrastructure be only accessible by those who need it. As your organization grows in size, the idea of keeping your IAM permissions correct can seem daunting, so here’s a checklist of what you should think about prior to changing permissions. This can also help you as you continuously enforce your access management.
1. Who? (The “Identity”)
Narrowing down the person or thing who will be accessing resources is the first step in granting IAM permissions. This can be one of several options, including:
A Google account (usually used by a human)
A service account (usually used by a script/tool)
A Google group
A G-Suite domain
Our biggest recommendation for this step is to keep this limited to as few identities as possible. While you may need to assign permissions to a larger group, it’s much safer to start with a smaller subset and add permissions as necessary over time. Consider whether this is an automated task or a real person using the access as well, since service accounts with distinct uses makes it easier to track and limit those accounts.
2. What Access? (The “Role”)
Google Cloud permissions often correspond directly with a specific Google Cloud REST API method. These permissions are named based on the GCP service, the specific resource, and the verb that is being allowed. For example, ParkMyCloud requires a permission named “compute.instances.start” in order to issue a start command to Google Compute Engine instances.
These permissions are not granted directly, but instead are included in a role that gets assigned to the identity you’ve chosen. There are three different types of roles:
Primitive Roles – These specific roles (Owner, Editor, and Viewer) include a huge amount of permissions across all GCP services, and should be avoided in favor of more specific roles based on need.
Predefined Roles – Google provides many roles that describe a collection of permissions for a specific service, like “roles/cloudsql.client” (which includes the permissions “cloudsql.instances.connect” and “cloudsql.instances.get”). Some roles are broad, while others are limited.
Custom Roles – If a predefined role doesn’t exist that matches what you need, you can create a custom role that includes a list of specific permissions.
Our recommendation for this step is to use a predefined role where possible, but don’t hesitate to use a custom role. The ParkMyCloud setup has a custom role that specifically lists the exact REST API commands that are used by the system. This ensures that there are no possible ways for our platform to do anything that you don’t intend. When following the “least privilege” methodology, you will find that custom roles are often used.
3. Which Item? (The “Resource”)
Once you’ve decided on the identity and the permissions, you’ll need to assign those permissions to a resource using a Cloud IAM policy. A resource can be very granular or very broad, including things like:
Single Compute Engine instances
Cloud Storage buckets
Each predefined role has a “lowest level” of resource that can be set. For example, the “App Engine Admin” role must be set at the project level, but the “Compute Load Balancer Admin” can be set at the compute instance level. You can always go higher up the resource hierarchy than the minimum. In the hierarchy, you have individual service resources, which all belong to a project, which can either be a part of a folder (in an organization) or directly a part of the organization.
Our recommendation, as with the Identity question, is to limit this to as few resources as possible. In practice, this might mean making a separate project to group together resources so you can assign a project-level role to an identity. Alternatively, you can just select a few resources within a project, or even an individual resource if possible.
And That’s All That IAM
These three questions provide the crucial decisions that you must make regarding Google Cloud IAM assignments. By thinking through these items, you can ensure that security is higher and risks are lower. For an example of how ParkMyCloud recommends a custom role assigned to a new service account in order to schedule and resize your VMs and databases, check out the documentation for ParkMyCloud GCP access, and sign up for a free trial today to get it connected securely to your environment.