Google Cloud IAM (Identity and Access Management) is the core component of Google Cloud that controls who can access what. By adopting the “principle of least privilege” methodology, you can work toward making your infrastructure accessible only to those who need it. As your organization grows, keeping your IAM permissions correct can seem daunting, so here’s a checklist of what you should think about before changing permissions. It can also help you as you continuously enforce your access management.
1. Who? (The “Identity”)
Narrowing down the person or thing that will be accessing resources is the first step in granting IAM permissions. This identity can be one of several options, including:
A Google account (usually used by a human)
A service account (usually used by a script/tool)
A Google group
A G Suite domain
Our biggest recommendation for this step is to keep the list of identities as short as possible. While you may eventually need to assign permissions to a larger group, it’s much safer to start with a smaller subset and add permissions as necessary over time. Also consider whether the access will be used by an automated task or a real person: service accounts with distinct purposes are easier to track and limit.
2. What Access? (The “Role”)
Google Cloud permissions often correspond directly with a specific Google Cloud REST API method. These permissions are named based on the GCP service, the specific resource, and the verb that is being allowed. For example, ParkMyCloud requires a permission named “compute.instances.start” in order to issue a start command to Google Compute Engine instances.
These permissions are not granted directly, but instead are included in a role that gets assigned to the identity you’ve chosen. There are three different types of roles:
Primitive Roles – These broad roles (Owner, Editor, and Viewer) include a huge number of permissions across all GCP services, and should be avoided in favor of more specific roles based on need.
Predefined Roles – Google provides many roles that describe a collection of permissions for a specific service, like “roles/cloudsql.client” (which includes the permissions “cloudsql.instances.connect” and “cloudsql.instances.get”). Some roles are broad, while others are limited.
Custom Roles – If a predefined role doesn’t exist that matches what you need, you can create a custom role that includes a list of specific permissions.
Our recommendation for this step is to use a predefined role where possible, but don’t hesitate to use a custom role. The ParkMyCloud setup uses a custom role that lists exactly the REST API permissions the system needs. This ensures there is no way for our platform to do anything you don’t intend. When following the “least privilege” methodology, you will find yourself reaching for custom roles often.
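For illustration, a custom role can be defined in a YAML file and created with `gcloud iam roles create --file`. The permission list below is a minimal sketch for a start/stop scheduling use case, not ParkMyCloud’s actual role:

```yaml
# scheduler-role.yaml — illustrative custom role definition.
# Create with: gcloud iam roles create vmScheduler --project=MY_PROJECT --file=scheduler-role.yaml
title: VM Scheduler
description: Minimal permissions to list, start, and stop Compute Engine instances
stage: GA
includedPermissions:
- compute.instances.list
- compute.instances.start
- compute.instances.stop
```

Note that each permission follows the service.resource.verb pattern described above, so the role grants exactly those three API actions and nothing else.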
3. Which Item? (The “Resource”)
Once you’ve decided on the identity and the permissions, you’ll need to assign those permissions to a resource using a Cloud IAM policy. A resource can be very granular or very broad, including things like:
Single Compute Engine instances
Cloud Storage buckets
Each predefined role has a “lowest level” of resource that can be set. For example, the “App Engine Admin” role must be set at the project level, but the “Compute Load Balancer Admin” can be set at the compute instance level. You can always go higher up the resource hierarchy than the minimum. In the hierarchy, you have individual service resources, which all belong to a project, which can either be a part of a folder (in an organization) or directly a part of the organization.
Our recommendation, as with the Identity question, is to limit this to as few resources as possible. In practice, this might mean making a separate project to group together resources so you can assign a project-level role to an identity. Alternatively, you can just select a few resources within a project, or even an individual resource if possible.
And That’s All That IAM
These three questions cover the crucial decisions you must make for any Google Cloud IAM assignment. By thinking through them, you can strengthen security and reduce risk. For an example of how ParkMyCloud recommends assigning a custom role to a new service account in order to schedule and resize your VMs and databases, check out the documentation for ParkMyCloud GCP access, and sign up for a free trial today to get it connected securely to your environment.
Analysts are reporting that IT budget cuts are expected to continue, dropping 5-8% this year overall. That puts IT departments in a difficult position: what should they cut, and how? While there is no magic bullet, there are places to trim the fat that will require no sacrifice and make no impact on operations.
Public Cloud Spend is High – And Users Want to Optimize
The largest cost in many enterprises’ IT budget is, of course, labor. You already know that the layoffs are happening and that engineering and operations departments are not immune. Whether you’re trying to avoid layoffs or trying to make the most of a reduced budget and workforce after them, you can look at other portions of your budget, including public cloud – often ranked the third-highest area of spend.
Even before COVID-19 wreaked havoc on businesses the world over, cloud customers ranked cloud cost optimization as a priority. Like water and electricity in your home, public cloud is a utility. It needs to be turned off when not being used.
This is made more urgent by today’s economic climate. There’s a lot of pressure in certain verticals, industries, and enterprises to reduce cloud spend and overall operational expenditures.
The Least Controversial Fix: Wasted Cloud Spend
There’s a reason “optimization” is so important: it implies waste. That faucet running when no one’s in the room – there’s simply no reason for the spend, which makes it an “easy” fix. No one will miss it.
The first step is identifying the waste. We estimate that almost $18 billion will be wasted this year in two major categories. The first is idle resources – resources billed by the hour, minute, or second that are not actually in use all that time. The most common type is non-production resources provisioned for development, staging, testing, and QA, which are often only used during a 40-hour work week. That means that for the other 128 hours of the week, the resources sit idle but are still paid for.
The second-largest swath of wasted spend is overprovisioned infrastructure — paying for resources with more capacity than needed. About 40% of instances are oversized. Reducing an instance by just one size cuts its cost by 50%. Or look at it the other way – every size up doubles your cost.
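The arithmetic behind those two waste categories is simple enough to sketch. The figures below come from the text (40-hour work week, halved price per size step); the hourly rate is an assumed value for illustration:

```python
# Rough arithmetic behind idle-resource and oversizing waste.
# The 40-hour work week and price-halving-per-size-step come from the
# article; the $0.20/hr rate is an illustrative assumption.

HOURS_PER_WEEK = 24 * 7  # 168

def idle_waste_fraction(hours_used_per_week: float) -> float:
    """Fraction of an always-on resource's cost spent while idle."""
    return 1 - hours_used_per_week / HOURS_PER_WEEK

def downsize_savings(current_hourly_rate: float, sizes_down: int = 1) -> float:
    """Hourly savings from shrinking an instance, assuming each size
    step down halves the price (typical of cloud instance families)."""
    return current_hourly_rate - current_hourly_rate / (2 ** sizes_down)

# A dev instance used only 40 hours a week sits idle for the other
# 128 hours -- about 76% of the time it is billed for.
idle = idle_waste_fraction(40)
saved = downsize_savings(0.20)
print(f"idle fraction: {idle:.1%}, downsize savings: ${saved:.2f}/hr")
```

Run as written, this reports an idle fraction of about 76% and $0.10/hr saved per size step on the assumed rate.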
Other sources of waste not included in this calculation include orphaned volumes, inefficient containerization, underutilized databases, instances running on legacy resource types, unused reserved instances, and more.
How to Activate Optimization
Cutting this waste from your budget is an opportunity to keep the spend you actually need, and make more investment in applications to produce revenue for your business. The people who use this infrastructure on a daily basis need to get on board, and that can be challenging.
The key to taking action to address this wasted spend is to bridge the gap between the people who care about the cloud bill – Finance, IT, etc. – and the people working in the cloud infrastructure every day – the app owners, the lines of business, developers, engineers, testers, people in QA, DevOps, SREs, etc. Those internal “end users” need a self-service tool or platform to take action.
However, app owners have a stack of priorities ahead of cost, and little time to evaluate solutions. Ideally, the cloud operations team will administer a platform that enables app owners or lines of business to take action and make changes based on its recommendations. Then Finance and IT see a shrinking – or at least flat – cloud bill, with optimized costs.
For an example of how enterprise Cloud Operations departments can approach this, learn from Sysco. They deployed ParkMyCloud to hundreds of app owners and end users across the globe, and used gamification to get them all on board with reducing costs.
Over the past five years, we’ve seen the challenges of cloud computing evolve – but ultimately, customers’ core needs are the same as ever.
It’s interesting to experience and observe how these needs get translated into products, both our own and others. Depending on company growth and culture, Build ↔ Measure ↔ Learn cycles can continue to turn in a rapid fashion as ideas get adopted and refined over time. Or, they can become bogged down in supporting a large and often demanding installed base of customers.
In a few short years, tools built for optimizing public cloud have evolved into a number of sub-segments, each of which in turn has developed to meet customer needs. In part, this reflects a predictable maturation of enterprises using cloud computing as they have migrated from on prem to public or hybrid cloud, and adopted best practices to enhance performance and security while tackling overall growth in monthly spend.
How This Year’s State of the Cloud Report Stacks Up
Flicking through various analyst reports and “Cool Vendor” white papers, it’s fascinating to see how quickly cool becomes uncool as industry needs develop. As a social scientist by training, I’m always drawn to longitudinal panel-type surveys, and RightScale/Flexera’s annual customer survey ticks a few of those boxes. No doubt the participants have changed, but it remains a valuable source of data on customer needs and challenges in cloud computing.
You do not need to go back to the first RightScale survey in 2013 to see some big changes. Even comparing the 2016 and 2020 surveys in terms of company priorities in the cloud, it’s hard to believe that just a few years ago the number one challenge, regardless of maturity, was a skills/resources gap. It was followed by security and compliance issues, with cost optimization a lower priority – and then only for those at a more mature stage. Roll forward to 2020 and cost management is the number one cloud initiative for all but the most recent adopters of cloud, where it sits at number two. Interestingly, security seems to have dropped off the top-five list, and governance, while holding on, is likely headed the same way.
Why Is Cost Optimization Still #1?
What becomes apparent when reading between the lines of such reports, and when talking with customers, is that unlike migration, security, and governance, there are still some large holes in companies’ practices when it comes to optimization and reducing cloud waste. Despite the plethora of tools on offer in 2020 that promise visibility and cost management, the overall cloud waste number is actually still growing as infrastructure grows.
More money has been spent tackling security and governance issues – and these challenges in cloud computing need to be dealt with. But cost optimization can deliver ROI to free up budget to deal with these issues.
In the wake of COVID-19, finance teams across the world will now be sharpening their pencils and looking more aggressively at such measures. While cloud spending may rise, Gartner and IDC have both forecasted overall IT spending to drop 5-8%.
Yes, You Can Optimize Now
As with security and governance, a mix of human behavioral and business process changes will be required, both supported by effective tooling – native cloud provider and third-party ISV tools alike. Incentives to implement such changes are likely to be higher than in the past, albeit in a more cash-constrained world where low cost, ease of use, and most of all, quantifiable ROI will be prioritized. It has always struck me as somewhat contradictory to hear promises of reducing cloud waste through the use of expensive cloud management tools that charge based on a percentage of your spend.
I foresee a wave of low-cost, multi-cloud, simple-to-use tools emerging. These tools will need to demonstrate a rapid ROI and be built to be used across engineering and operations (not just in the offices of the CIO/CTO/CFO) to ensure the self-service nature of cloud is not disrupted. A similar pattern will emerge as these tools become part of day-to-day cloud operations and cost optimization becomes part of the cloud culture. With this, the need for specific cost optimization initiatives should give way to a new wave of needs, like application resource management.
There’s a vast amount of advice available on Azure best practices. Based on recent recommendations from experts in the field, we’ve put together this list of 10 best practices for 2020 to help you fully utilize and optimize your Azure environment.
1. Ensure Your Azure VMs are the Correct Size
“There are default VM sizes depending on the image that you choose and the affected Region so be careful and check if the proposed one is really what you need. The majority of the times you can reduce the size to something that fits you better and at a lower cost.”
2. If you use the Azure Cost Management Tool, Know the Limitations
Azure Cost Management can be a useful tool in your arsenal: “Listed as “cost management + billing” in the Azure portal, the Azure Cost Management service’s cost analysis feature offers comprehensive insights into the costs incurred by your Azure resources—starting from the subscription level. This can then be drilled down to specific resource groups and/or resources. The service also provides an overview of current costs as well as a monthly forecast based on the current consumption rate.”
However, know that visibility and action are not equivalent: “Even though [cloud efficiency] is a core tenet of Microsoft Azure Cost Management, optimization is one of the weakest features of the product. The essence of the documentation around this is that you should manually eliminate waste, without going into much detail about what is being wasted or how to eliminate it. Plus, this expects manual intervention and review of each resource without giving direct actions to eliminate the waste.”
3. Approach Role-Based Access Control (RBAC) Systematically
“Using Azure RBAC, you can segregate duties within your team and grant only the amount of access to users that they need to perform their jobs. Instead of giving everybody unrestricted permissions in your Azure subscription or resources, you can allow only certain actions at a particular scope.”
“Even with these specific pre-defined roles, the principle of least privilege shows that you’re almost always giving more access than is truly needed. For even more granular permissions, you can create Azure custom roles and list specific commands that can be run.”
4. Identify and Remove Orphaned Disks
“When you delete a virtual machine in Azure, by default, in order to protect against data loss, any disks that are attached to the VM aren’t deleted. One thing to remember is that after a VM is deleted, you will continue to pay for these “orphaned” unattached disks. In order to minimise storage costs, make sure that you identify and remove any orphaned disk resource.”
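As a sketch of how you might spot orphaned disks programmatically – assuming the JSON shape emitted by `az disk list`, where an unattached disk has a null `managedBy` field – you could filter the output like this:

```python
import json

def find_orphaned_disks(disks):
    """Return disks not attached to any VM.

    Assumes the JSON shape emitted by `az disk list`: an unattached
    disk has a null `managedBy` field (and typically a `diskState`
    of "Unattached").
    """
    return [d for d in disks if not d.get("managedBy")]

# Illustrative sample data with the fields we care about:
sample = json.loads("""[
  {"name": "web-vm_OsDisk", "managedBy": "/subscriptions/xxxx/web-vm", "diskSizeGb": 128},
  {"name": "old-data-disk", "managedBy": null, "diskSizeGb": 512}
]""")

for disk in find_orphaned_disks(sample):
    print(f"orphaned: {disk['name']} ({disk['diskSizeGb']} GB)")
```

With the sample data above, only `old-data-disk` is flagged; in practice you would feed in the real output of `az disk list` and review each hit before deleting.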
5. Centralize Tagging Across Your Environments
“Centralize tagging across your Azure environments. This enables you to discover, group and consistently tag cloud resources across your cloud providers – manually or through automated tag rules. Maintaining a consistent tagging structure allows you to see resource information from all cloud providers for enhanced governance, cost analytics and chargeback.”
6. Decipher how and when to utilize the Azure logging services
“Logs are a major factor when it comes to successful cloud management. Azure users can access a variety of native logging services to maintain reliable and secure operations. These logging options can be broken down into three overarching types, as well as eight log categories. The granular data collected by Azure logs enables enterprises to monitor resources and helps identify potential system breaches.”
7. Choose the Right Serverless Services for Your Workflow
“Serverless computing provides a layer of abstraction that offloads maintenance of the underlying infrastructure to the cloud provider. That’s a form of workload automation in and of itself, but IT teams can take it a step further with the right tools.
Developers and admins can use a range of serverless offerings in Azure, but they need to understand how they want their workflow to operate in order to select the right services. To start, determine whether your application has its own logic to direct events and triggers, or whether that orchestration is defined by something else.”
8. Invest in API Security
“APIs handle an immense amount of data, which is why it’s imperative to invest in API security. Think of authentication as an identification card that proves you are who you say you are. Although Azure Database provides a range of security features, end users are required to practice additional security measures. For example, you must manage strong credentials yourself. Active Directory is the authentication solution of choice for enterprises around the world, and the Azure-hosted version only adds to the attraction as companies continue migrating to the cloud.”
10. Multi-Factor Authentication for all standard users
“Businesses that don’t add extra layers of access protection – such as two-step authentication – are more susceptible to credential theft. Credential thefts are usually achieved by phishing or by planting key-logging malware on a user’s device; and it only takes one compromised credential for a cybercriminal to potentially gain access to the whole network.
Enforcing multi-factor authentication for all users is one of the easiest – yet most effective – of the seven Azure security best practices, as it can be done via Azure Active Directory within a few minutes.”
You can use these best practices as a reference to help you ensure you are fully optimizing all available features in your Azure environment. Have any Azure best practices you’ve learned recently? Let us know in the comments below!
Google Sustainability is an effort that ranges across the company’s business, from the Global Fishing Watch to environmental consciousness in the supply chain. Cloud computing has been a major draw on global energy in recent years: the amount of computing done in data centers more than quintupled between 2010 and 2018, yet thanks to improvements in energy efficiency, the energy consumed by the world’s data centers grew only six percent over that period. Still, that’s a lot of power – which is why Google’s sustainability efforts for data centers and cloud computing are especially important.
Google Cloud Sustainability Efforts – As Old as Their Data Centers
Reducing energy usage has been an initiative for Google for more than 10 years. Google has been carbon neutral since 2007, and 2019 marked the third year in a row that it matched its energy usage with 100 percent renewable energy purchases. Google’s innovation in the data center market also comes from building facilities from the ground up instead of buying existing infrastructure, and from using machine learning to monitor and improve power usage effectiveness (PUE) and find new ways to save energy in its data centers.
When comparing the big three cloud providers in terms of sustainability efforts, AWS is by far the largest source of carbon emissions from the cloud globally, due to its dominance. However, AWS’s sustainability team is investing in green energy initiatives and is striving toward an ambitious goal of 100% renewable energy use by 2040 to become as carbon-neutral as Google has been. Microsoft Azure, on the other hand, has run on 100 percent renewable energy since 2014, but would be considered a low-carbon electricity consumer in part because it runs a smaller share of the world’s workloads than Amazon or Google.
Nonetheless, data centers from the big three cloud providers, wherever they are, all run on electricity. How the electricity is generated is the important factor in whether they are more or less favorable for the environment. For Google, reaching 100% renewable energy purchasing on a global and annual basis was just the beginning. In addition to continuing their aggressive move forward with renewable energy technologies like wind and solar, they wanted to achieve the much more challenging long-term goal of powering operations on a region-specific, 24-7 basis with clean, zero-carbon energy.
Why Renewable Energy Needs to Be the Norm for Cloud Computing
It’s no secret that cloud computing is a drain on resources, consuming roughly three percent of all electricity generated on the planet. That’s why it’s important for Google and other cloud providers to be part of the solution to global climate change. Renewable energy is an important element, as is matching operational energy use with clean-energy purchases and helping create pathways for others to purchase clean energy. However, it’s not just about fighting climate change. Purchasing energy from renewable resources also makes good business sense, for two key reasons:
Renewables are cost-effective – The cost to produce renewable energy from technologies like wind and solar has come down precipitously in recent years. By 2016, the levelized cost of wind had fallen 60% and the levelized cost of solar had fallen 80%. In fact, in some areas, renewable energy is the cheapest form of energy available on the grid. Reducing the cost to run servers reduces the cost for public cloud customers – and we’re in favor of anything that does that.
Renewable energy inputs like wind and sunlight are essentially free – Having no fuel input for most renewables allows Google to eliminate exposure to fuel-price volatility, which is especially helpful when managing a global portfolio of operations in a wide variety of markets.
Google Sustainability in the Cloud Goes “Carbon Intelligent”
In keeping with its goal of powering data centers with more renewable energy, Google recently announced that it will also time-shift workloads to take advantage of these resources and make data centers run harder when the sun shines and the wind blows.
“We designed and deployed this first-of-its-kind system for our hyperscale (meaning very large) data centers to shift the timing of many compute tasks to when low-carbon power sources, like wind and solar, are most plentiful,” Google announced.
Google’s latest advancement in sustainability is a newly developed carbon-intelligent computing platform that works by combining two forecasts – one of the future carbon intensity of the local electrical grid near each data center, and one of the data center’s own capacity requirements – and using that data to “align compute tasks with times of low-carbon electricity supply.” The result is that flexible workloads run when Google believes they will generate the lowest possible CO2 emissions.
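The core idea can be sketched with a toy scheduler: given an hourly forecast of grid carbon intensity and a budget of flexible task-hours, run the flexible work in the cleanest hours. The forecast values and the greedy policy below are illustrative, not Google’s actual system:

```python
# Toy carbon-aware scheduler: pick the lowest-carbon hours of the day
# for flexible compute work. Forecast values (gCO2/kWh) are made up.

def schedule_flexible_hours(carbon_forecast, task_hours):
    """Return the `task_hours` hours (0-23) with the lowest forecast
    carbon intensity, in chronological order."""
    ranked = sorted(range(len(carbon_forecast)), key=lambda h: carbon_forecast[h])
    return sorted(ranked[:task_hours])

# Midday solar makes hours 10-15 the cleanest in this made-up forecast.
forecast = [430, 420, 410, 400, 390, 380, 350, 300, 250, 200,
            150, 120, 110, 115, 140, 190, 260, 330, 380, 410,
            430, 440, 445, 440]
print(schedule_flexible_hours(forecast, 6))  # -> [10, 11, 12, 13, 14, 15]
```

Google’s real platform adds the second forecast – its own capacity requirements – as a constraint, so urgent work still runs on time; only flexible tasks get shifted.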
The first version of the carbon-intelligent computing platform focuses on shifting tasks to different times of the day within the same data center. But Google already has plans to expand its capability: in addition to shifting time, it will also move flexible compute tasks between data centers, so that more work is completed when and where doing so is more environmentally friendly. As the platform continues to generate data, Google will document its research and share it with other organizations in hopes they can develop similar tools and follow suit.
By pairing forecasting with artificial intelligence and machine learning, Google can anticipate workloads and improve the overall health, performance, and efficiency of its data centers. Combined with efforts to use cloud resources efficiently – only running VMs when needed, and not oversizing them – resource utilization can be improved to reduce your carbon footprint and save money.
Today we are happy to announce that ParkMyCloud now offers GKE cost optimization! You can now capitalize on your utilization data to automatically schedule Google Kubernetes Engine (GKE) to turn off when not needed in order to reduce costs.
GKE Cost Control is a Priority
GKE is the third Kubernetes service ParkMyCloud has rolled out support for in the past six weeks, following Amazon’s EKS and Azure’s AKS. Inbound requests for container cost control have been on the rise this year, and cloud users continue to tell us that container cost control is a major priority.
For example, Flexera’s 2020 State of the Cloud report found that the #1 cloud initiative for this year is to optimize existing use of cloud, and the #3 initiative is to expand use of containers. The report also found that 58% of cloud users use Kubernetes, and container-as-a-service offerings from AWS, Azure, and Google Cloud are all growing. 451 Research predicts that container spending will rise from $2.7 billion this year to $4.3 billion by 2022.
Wasted spend on inefficient containerization is among the problems contributing to $17.6 billion in cloud waste this year alone. Sources of waste include: nonproduction pods that are idle outside of working hours, oversized pods, oversized nodes, and overprovisioned persistent storage.
How to Reduce GKE Costs with ParkMyCloud
ParkMyCloud now offers optimization of GKE clusters and nodepools through scheduling. As with other cloud resources such as Google Cloud VM instances, preemptible VMs, SQL Databases, and Managed Instance groups – as well as resources in AWS, Azure, and Alibaba Cloud – you can create on/off schedules based on your team’s working hours and automatically assign those schedules with the platform’s policy engine. Better yet, get recommended schedules from ParkMyCloud based on your resources’ utilization data.
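As a rough illustration of what a parking schedule can save – the node count, hourly rate, and 12x5 schedule below are assumed for the example, not ParkMyCloud pricing or defaults:

```python
# Back-of-the-envelope savings from an on/off ("parking") schedule,
# assuming the resource is only needed during working hours.
# Node count, rate, and schedule are illustrative assumptions.

def parked_savings(hourly_rate, on_hours_per_week):
    """Weekly savings vs. running 24x7."""
    off_hours = 24 * 7 - on_hours_per_week
    return off_hours * hourly_rate

# A GKE nodepool of three nodes at an assumed ~$0.19/node-hour,
# parked outside a 12x5 schedule (60 on-hours per week):
weekly = parked_savings(hourly_rate=3 * 0.19, on_hours_per_week=60)
print(f"~${weekly:.0f}/week, ~${weekly * 52:.0f}/year")
```

Even at these modest assumed rates, parking a small non-production nodepool recovers thousands of dollars a year; recommended schedules based on actual utilization data can capture this without guesswork.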
This comes with full user governance, self-service management of all projects in a single view, and flexible features such as schedule overrides (which you can even do through Slack!). Manage your application stacks with intuitive resource grouping and ordered scheduling.
If you haven’t yet tried out ParkMyCloud, please start a free trial and connect to your Google Cloud account through a secure limited access role.
If you already use ParkMyCloud, you will need to update your Google Cloud IAM policy to allow scheduling actions for GKE. Details available in the release notes.
Questions? Requests for features or more cloud services ParkMyCloud should optimize? Let us know – comment below or contact us directly.