Analysts report that IT budget cuts are expected to continue, with overall IT spending dropping 5-8% this year. That puts IT departments in a difficult position: what should they cut, and how? While there is no magic bullet, there are places to trim the fat that require no sacrifice and have no impact on operations.
Public Cloud Spend is High – And Users Want to Optimize
The largest cost in many enterprises’ IT budget is, of course, labor. You already know that the layoffs are happening and that engineering and operations departments are not immune. Whether you’re trying to avoid layoffs or trying to make the most of a reduced budget and workforce after them, you can look at other portions of your budget, including public cloud – often ranked the third-highest area of spend.
Even before COVID-19 wreaked havoc on businesses the world over, cloud customers ranked cloud cost optimization as a priority. Like water and electricity in your home, public cloud is a utility. It needs to be turned off when not being used.
This is made more urgent by today’s economic climate. There’s a lot of pressure in certain verticals, industries, and enterprises to reduce cloud spend and overall operational expenditures.
The Least Controversial Fix: Wasted Cloud Spend
There’s a reason “optimization” is so important: it implies waste. That faucet running when no one’s in the room – there’s simply no reason for the spend, which makes it an “easy” fix. No one will miss it.
The first step is identifying the waste. We estimate that almost $18 billion will be wasted this year in two major categories. The first is idle resources – these are resources being paid for by the hour, minute, or second, that are not actually being used every hour, minute, or second. The most common type is non-production resources provisioned for development, staging, testing, and QA, which are often only used during a 40-hour work week. That means that for the other 128 hours of the week, the resources sit idle, but are still paid for.
The second-largest swath of wasted spend is overprovisioned infrastructure — that is, paying for resources with more capacity than needed. About 40% of instances are oversized. Reducing an instance by just one size cuts its cost by 50%. Or look at it the other way – every size up costs you double.
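The arithmetic behind both categories is simple enough to sketch. Here is a toy calculation for a single non-production instance; the $0.20/hour rate is an invented figure for illustration, not a real price:

```python
# Toy estimate of savings from parking and rightsizing a non-production
# instance. The $0.20/hour rate is an assumption for illustration only.
HOURS_PER_WEEK = 168
WORK_HOURS = 40          # hours/week the instance is actually needed
RATE = 0.20              # $/hour for the current instance size

always_on = HOURS_PER_WEEK * RATE      # cost if left running 24/7
parked = WORK_HOURS * RATE             # cost with an on/off ("parking") schedule
parking_savings = always_on - parked   # the 128 idle hours recovered

# Each instance size step roughly doubles cost, so one size down halves it.
rightsized = parked / 2

print(f"weekly cost always-on:  ${always_on:.2f}")
print(f"after parking:          ${parked:.2f}")
print(f"after also rightsizing: ${rightsized:.2f}")
```

Even at this small scale, parking alone recovers about 76% of the weekly bill; multiply by hundreds of instances and the waste numbers above stop looking surprising.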
Other sources of waste not included in this calculation include orphaned volumes, inefficient containerization, underutilized databases, instances running on legacy resource types, unused reserved instances, and more.
How to Activate Optimization
Cutting this waste from your budget is an opportunity to keep the spend you actually need, and make more investment in applications to produce revenue for your business. The people who use this infrastructure on a daily basis need to get on board, and that can be challenging.
The key to taking action to address this wasted spend is to bridge the gap between the people who care about the cloud bill – Finance, IT, etc. – and the people working in the cloud infrastructure every day – the app owners, the lines of business, developers, engineers, testers, people in QA, DevOps, SREs, etc. Those internal “end users” need a self-service tool or platform to take action.
However, app owners have a stack of priorities ahead of cost, and a lack of time to evaluate solutions. Ideally, the cloud operations team will administer a platform that enables the app owners or lines of business to take action and make changes based on its recommendations. Then Finance and IT see a shrinking – or at least flat – cloud bill, with optimized costs.
For an example of how enterprise Cloud Operations departments can approach this, learn from Sysco. They deployed ParkMyCloud to hundreds of app owners and end users across the globe, and used gamification to get them all on board with reducing costs.
Over the past five years, we’ve seen the challenges of cloud computing evolve – but ultimately, customers’ core needs are the same as ever.
It’s interesting to experience and observe how these needs get translated into products, both our own and others. Depending on company growth and culture, Build ↔ Measure ↔ Learn cycles can continue to turn in a rapid fashion as ideas get adopted and refined over time. Or, they can become bogged down in supporting a large and often demanding installed base of customers.
In a few short years, tools built for optimizing public cloud have evolved into a number of sub-segments, each of which in turn has developed to meet customer needs. In part, this reflects a predictable maturation of enterprises using cloud computing as they have migrated from on prem to public or hybrid cloud, and adopted best practices to enhance performance and security while tackling overall growth in monthly spend.
How This Year’s State of the Cloud Report Stacks Up
Flicking through various analyst reports and “Cool Vendor” white papers, it’s fascinating to see how quickly cool becomes uncool as industry needs develop. As a social scientist by training, I find that longitudinal panel-type surveys always grab my attention. RightScale/Flexera’s annual customer survey ticks a few of these boxes. No doubt the participants have changed, but it likely provides a valuable source of data on customer needs and challenges in cloud computing.
You do not need to go back to the first RightScale survey in 2013 to see some big changes. Even comparing the 2016 survey to the 2020 survey in terms of company priorities in the cloud, it’s hard to believe that just a few years ago, the number one challenge regardless of maturity was a skills/resources gap, followed by security and compliance issues, with cost optimization a lower priority – and then only for those at a more mature state. Roll forward to 2020 and cost management is the number one cloud initiative for all but the most recent adopters of cloud, where it sits at number two. Interestingly, security seems to have dropped off the top-five list, and while governance has held on, it is likely headed the same way. Cost optimization now sits atop all other initiatives.
Why Is Cost Optimization Still #1?
What seems apparent when reading between the lines of such reports, and when talking with customers, is that unlike migration, security, and governance, there are still some large holes in companies’ practices when it comes to optimization and reducing cloud waste. Despite the plethora of tools on offer in 2020 that promise visibility and cost management, the overall cloud waste number is actually still growing as infrastructure grows.
More money has been spent tackling security and governance issues – and these challenges in cloud computing need to be dealt with. But cost optimization can deliver ROI to free up budget to deal with these issues.
In the wake of COVID-19, finance teams across the world will now be sharpening their pencils and looking more aggressively at such measures. While cloud spending may rise, Gartner and IDC have both forecasted overall IT spending to drop 5-8%.
Yes, You Can Optimize Now
As with security and governance, a mix of human behavioral and business process changes will be required, both of which can be supported by effective tooling – native cloud provider and 3rd-party ISV tools alike. Incentives to implement such changes are likely to be higher than in the past, albeit in a more cash-constrained world where low cost, ease of use, and most of all, quantifiable ROI will be prioritized. It has always struck me as somewhat paradoxical when I hear promises of reducing cloud waste through the use of expensive cloud management tools that charge based on a percent of your spend.
I foresee a wave of low-cost, multi-cloud, simple-to-use tools emerging. These tools will need to demonstrate a rapid ROI and be built to be used across engineering and operations (not just in the offices of the CIO/CTO/CFO) to ensure the self-service nature of cloud is not disrupted. A similar pattern will emerge as these tools become part of day-to-day cloud operations and cost optimization becomes part of the cloud culture. With this, the need for specific cost optimization initiatives should be replaced by a new wave of needs, like application resource management.
There’s a vast amount of available resources that give advice on Azure best practices. Based on recent recommendations given by experts in the field, we’ve put together this list of 10 of the best practices for 2020 to help you fully utilize and optimize your Azure environment.
1. Ensure Your Azure VMs are the Correct Size
“There are default VM sizes depending on the image that you choose and the affected Region so be careful and check if the proposed one is really what you need. The majority of the times you can reduce the size to something that fits you better and at a lower cost.”
2. If you use the Azure Cost Management Tool, Know the Limitations
Azure Cost Management can be a useful tool in your arsenal: “Listed as “cost management + billing” in the Azure portal, the Azure Cost Management service’s cost analysis feature offers comprehensive insights into the costs incurred by your Azure resources—starting from the subscription level. This can then be drilled down to specific resource groups and/or resources. The service also provides an overview of current costs as well as a monthly forecast based on the current consumption rate.”
However, know that visibility and action are not equivalent: “Even though [cloud efficiency] is a core tenant of Microsoft Azure Cost Management, optimization is one of the weakest features of the product. The essence of the documentation around this is that you should manually eliminate waste, without going into much detail about what is being wasted or how to eliminate it. Plus, this expects manual intervention and review of each resource without giving direct actions to eliminate the waste.”
3. Approach Role-Based Access Control (RBAC) Systematically
“Using Azure RBAC, you can segregate duties within your team and grant only the amount of access to users that they need to perform their jobs. Instead of giving everybody unrestricted permissions in your Azure subscription or resources, you can allow only certain actions at a particular scope.”
“Even with these specific pre-defined roles, the principle of least privilege shows that you’re almost always giving more access than is truly needed. For even more granular permissions, you can create Azure custom roles and list specific commands that can be run.”
4. Identify and Delete Orphaned Disks
“When you delete a virtual machine in Azure, by default, in order to protect against data loss, any disks that are attached to the VM aren’t deleted. One thing to remember is that after a VM is deleted, you will continue to pay for these “orphaned” unattached disks. In order to minimise storage costs, make sure that you identify and remove any orphaned disk resource.”
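The check itself is simple. Below is a minimal Python sketch operating on plain dicts that stand in for the disk objects an SDK listing call would return; the `managed_by` field name mirrors the Azure SDK’s convention for the owning VM, but treat the whole thing as an illustrative assumption rather than a ready-made script:

```python
def find_orphaned_disks(disks):
    """Return disks not attached to any VM (no owning resource)."""
    return [d for d in disks if not d.get("managed_by")]

# Sample inventory standing in for the result of an SDK disk-listing call.
inventory = [
    {"name": "web-os-disk", "managed_by": "/subscriptions/.../vm-web"},
    {"name": "old-data-disk", "managed_by": None},   # orphaned
    {"name": "scratch-disk", "managed_by": None},    # orphaned
]

for disk in find_orphaned_disks(inventory):
    print(f"orphaned: {disk['name']}")
```

In a real environment you would feed this from an actual disk inventory and review each hit before deleting, since an unattached disk may still hold data someone needs.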
5. Centralize Tagging Across Your Cloud Environments
“Centralize tagging across your Azure environments. This enables you to discover, group and consistently tag cloud resources across your cloud providers – manually or through automated tag rules. Maintaining a consistent tagging structure allows you to see resource information from all cloud providers for enhanced governance, cost analytics and chargeback.”
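Consistency is easiest to maintain when it is checked programmatically. A small sketch of a tag-policy check – the required keys here are hypothetical examples, not a recommended taxonomy:

```python
# Example tag policy; the required keys are invented for illustration.
REQUIRED_TAGS = {"owner", "environment", "cost-center"}

def missing_tags(resource_tags):
    """Return the required tag keys a resource is missing (case-insensitive)."""
    present = {k.lower() for k in resource_tags}
    return sorted(REQUIRED_TAGS - present)

print(missing_tags({"Owner": "team-a", "Environment": "dev"}))  # ['cost-center']
```

A check like this can run in CI or on a schedule, flagging untagged resources before they become unattributable line items on the bill.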
6. Decipher How and When to Utilize the Azure Logging Services
“Logs are a major factor when it comes to successful cloud management. Azure users can access a variety of native logging services to maintain reliable and secure operations. These logging options can be broken down into three overarching types, as well as eight log categories. The granular data collected by Azure logs enables enterprises to monitor resources and helps identify potential system breaches.”
7. Select the Right Serverless Offerings for Your Workflow
“Serverless computing provides a layer of abstraction that offloads maintenance of the underlying infrastructure to the cloud provider. That’s a form of workload automation in and of itself, but IT teams can take it a step further with the right tools.
Developers and admins can use a range of serverless offerings in Azure, but they need to understand how they want their workflow to operate in order to select the right services. To start, determine whether your application has its own logic to direct events and triggers, or whether that orchestration is defined by something else.”
8. Invest in API Security and Authentication
“APIs handle an immense amount of data, which is why it’s imperative to invest in API security. Think of authentication as an identification card that proves you are who you say you are. Although Azure Database provides a range of security features, end users are required to practice additional security measures. For example, you must manage strong credentials yourself. Active Directory is the authentication solution of choice for enterprises around the world, and the Azure-hosted version only adds to the attraction as companies continue migrating to the cloud.”
10. Multi-Factor Authentication for all standard users
“Businesses that don’t add extra layers of access protection – such as two-step authentication – are more susceptible to credential theft. Credential thefts are usually achieved by phishing or by planting key-logging malware on a user’s device; and it only takes one compromised credential for a cybercriminal to potentially gain access to the whole network.
Enforcing multi-factor authentication for all users is one of the easiest – yet most effective – of the seven Azure security best practices, as it can be done via Azure Active Directory within a few minutes.”
You can use these best practices as a reference to help you ensure you are fully optimizing all available features in your Azure environment. Have any Azure best practices you’ve learned recently? Let us know in the comments below!
Google Sustainability is an effort that ranges across their business, from the Global Fishing Watch to environmental consciousness in the supply chain. Cloud computing has been a major draw on global energy in recent years: the amount of computing done in data centers more than quintupled between 2010 and 2018. Yet the amount of energy consumed by the world’s data centers grew only six percent during that period, thanks to improvements in energy efficiency. Still, that’s a lot of power. That’s why Google’s sustainability efforts for data centers and cloud computing are especially important.
Google Cloud Sustainability Efforts – As Old as Their Data Centers
Reducing energy usage has been an initiative for Google for more than 10 years. Google has been carbon neutral since 2007, and 2019 marked the third year in a row that they’ve matched their energy usage with 100 percent renewable energy purchases. Google’s innovation in the data center market also comes from the process of building facilities from the ground up instead of buying existing infrastructures and using machine learning technology to monitor and improve power-usage-effectiveness (PUE) and find new ways to save energy in their data centers.
When comparing the big three cloud providers in terms of sustainability efforts, AWS is by far the largest source of carbon emissions from the cloud globally, due to its dominance. However, AWS’s sustainability team is investing in green energy initiatives and has committed to an ambitious goal of 100% renewable energy use by 2040, to become as carbon-neutral as Google has been. Microsoft Azure, on the other hand, has run on 100 percent renewable energy since 2014 and would be considered a low-carbon electricity consumer – in part because it runs less of the world’s infrastructure than Amazon or Google.
Nonetheless, data centers from the big three cloud providers, wherever they are, all run on electricity. How the electricity is generated is the important factor in whether they are more or less favorable for the environment. For Google, reaching 100% renewable energy purchasing on a global and annual basis was just the beginning. In addition to continuing their aggressive move forward with renewable energy technologies like wind and solar, they wanted to achieve the much more challenging long-term goal of powering operations on a region-specific, 24-7 basis with clean, zero-carbon energy.
Why Renewable Energy Needs to Be the Norm for Cloud Computing
It’s no secret that cloud computing is a drain on resources, consuming roughly three percent of all electricity generated on the planet. That’s why it’s important for Google and other cloud providers to be part of the solution to global climate change. Renewable energy is an important element, as is matching the energy used by operations and helping to create pathways for others to purchase clean energy. However, it’s not just about fighting climate change. Purchasing energy from renewable resources also makes good business sense, for two key reasons:
Renewables are cost-effective – The cost to produce renewable energy technologies like wind and solar has come down precipitously in recent years. By 2016, the levelized cost of wind had come down 60% and the levelized cost of solar had come down 80%. In fact, in some areas, renewable energy is the cheapest form of energy available on the grid. Reducing the cost to run servers reduces the cost for public cloud customers – and we’re in favor of anything that does that.
Renewable energy inputs like wind and sunlight are essentially free – Having no fuel input for most renewables allows Google to eliminate exposure to fuel-price volatility, which is especially helpful when managing a global portfolio of operations in a wide variety of markets.
Google Sustainability in the Cloud Goes “Carbon Intelligent”
In keeping with its goal for data centers to consume more energy from renewable resources, Google recently announced that it will also time-shift workloads to take advantage of these resources – making data centers run harder when the sun shines and the wind blows.
“We designed and deployed this first-of-its-kind system for our hyperscale (meaning very large) data centers to shift the timing of many compute tasks to when low-carbon power sources, like wind and solar, are most plentiful,” Google announced.
Google’s latest advancement in sustainability is a newly developed carbon-intelligent computing platform that seems to work by using two forecasts – one indicating the future carbon intensity of the local electrical grid near its data center, and another of its own capacity requirements – and using that data to “align compute tasks with times of low-carbon electricity supply.” The result is that workloads run when Google believes it can do so while generating the lowest possible CO2 emissions.
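The core idea can be illustrated with a toy scheduler: given a per-hour forecast of grid carbon intensity and a number of flexible task-hours to place, run the work in the cleanest hours. The numbers below are invented for illustration, and this is of course a drastic simplification of Google’s actual system:

```python
def schedule_flexible_hours(carbon_forecast, hours_needed):
    """Pick the hours with the lowest forecast carbon intensity.

    carbon_forecast: {hour: gCO2/kWh}; hours_needed: how many task-hours to place.
    """
    by_intensity = sorted(carbon_forecast, key=carbon_forecast.get)
    return sorted(by_intensity[:hours_needed])

# Invented intensity forecast: midday solar makes late morning cleanest.
forecast = {8: 420, 9: 380, 10: 210, 11: 180, 12: 170, 13: 190, 14: 250, 15: 360}
print(schedule_flexible_hours(forecast, 3))  # → [11, 12, 13]
```

The real platform layers a second forecast (its own capacity requirements) on top, so flexible work only moves into clean hours that also have spare capacity.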
The first version of the carbon-intelligent computing platform focuses on shifting tasks to different times of the day, within the same data center. But Google already has plans to expand its capability: in addition to shifting tasks in time, the platform will also move flexible compute tasks between different data centers, so that more work is completed when and where doing so is more environmentally friendly. As the platform continues to generate data, Google will document its research and share it with other organizations in hopes they can also develop similar tools and follow suit.
Leveraging forecasting with artificial intelligence and machine learning is a powerful combination, and Google is utilizing it in their platform to anticipate workloads and improve the overall health, performance, and efficiency of their data centers. Combine that with efforts to use cloud resources efficiently – only running VMs when needed, and not oversizing them – and you can improve resource utilization to reduce your carbon footprint and save money.
Today we are happy to announce that ParkMyCloud now offers GKE cost optimization! You can now capitalize on your utilization data to automatically schedule Google Kubernetes Engine (GKE) to turn off when not needed in order to reduce costs.
GKE Cost Control is a Priority
GKE is the third Kubernetes service ParkMyCloud has rolled out support for in the past six weeks, following Amazon’s EKS and Azure’s AKS. Inbound requests for container cost control have been on the rise this year, and cloud users continue to tell us that container cost control is a major priority.
For example, Flexera’s 2020 State of the Cloud report found that the #1 cloud initiative for this year is to optimize existing use of cloud, and the #3 initiative is to expand use of containers. The report also found that 58% of cloud users use Kubernetes, and container-as-a-service offerings from AWS, Azure, and Google Cloud are all growing. 451 Research predicts that container spending will rise from $2.7 billion this year to $4.3 billion by 2022.
Wasted spend on inefficient containerization is among the problems contributing to $17.6 billion in cloud waste this year alone. Sources of waste include: nonproduction pods that are idle outside of working hours, oversized pods, oversized nodes, and overprovisioned persistent storage.
How to Reduce GKE Costs with ParkMyCloud
ParkMyCloud now offers optimization of GKE clusters and nodepools through scheduling. As with other cloud resources such as Google Cloud VM instances, preemptible VMs, SQL Databases, and Managed Instance groups – as well as resources in AWS, Azure, and Alibaba Cloud – you can create on/off schedules based on your team’s working hours and automatically assign those schedules with the platform’s policy engine. Better yet, get recommended schedules from ParkMyCloud based on your resources’ utilization data.
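In principle, a utilization-based schedule recommendation works by finding the hours of the day when a resource is reliably idle. The following is a simplified sketch of that idea – the 5% CPU threshold and the utilization profile are invented for illustration, and this is not ParkMyCloud’s actual algorithm:

```python
def recommend_parked_hours(avg_cpu_by_hour, threshold=5.0):
    """Recommend parking (powering off) hours whose average CPU % is below threshold."""
    return [h for h, cpu in sorted(avg_cpu_by_hour.items()) if cpu < threshold]

# Invented utilization profile: busy 9:00-17:00, essentially idle overnight.
profile = {h: (40.0 if 9 <= h < 17 else 1.5) for h in range(24)}

parked = recommend_parked_hours(profile)
print(f"recommend parking {len(parked)} of 24 hours")
```

A production system would average over weeks of data and distinguish weekdays from weekends, but the principle is the same: turn observed idleness into an on/off schedule.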
This comes with full user governance, self-service management of all projects in a single view, and flexible features such as schedule overrides (which you can even do through Slack!). Manage your application stacks with intuitive resource grouping and ordered scheduling.
If you haven’t yet tried out ParkMyCloud, please start a free trial and connect to your Google Cloud account through a secure limited access role.
If you already use ParkMyCloud, you will need to update your Google Cloud IAM policy to allow scheduling actions for GKE. Details available in the release notes.
Questions? Requests for features or more cloud services ParkMyCloud should optimize? Let us know – comment below or contact us directly.
Microsoft Azure IAM, also known as Access Control (IAM), is the product provided in Azure for RBAC and governance of users and roles. Identity management is a crucial part of cloud operations due to security risks that can come from misapplied permissions. Whenever you have a new identity (a user, group, or service principal) or a new resource (such as a virtual machine, database, or storage blob), you should provide proper access with as limited of a scope as possible. Here are some of the questions you should ask yourself to maintain maximum security:
1. Who needs access?
Granting access to an identity includes both human users and programmatic access from applications and scripts. If you are utilizing Azure Active Directory, then you likely want to use those managed identities for role assignments. Consider using an existing group of users or making a new group to apply similar permissions across a set of users, as you can then remove a user from that group in the future to revoke those permissions.
Programmatic access is typically granted through Azure service principals. Since it’s not a user logging in, the application or script will use the App Registration credentials to connect and run any commands. As an example, ParkMyCloud uses a service principal to get a list of managed resources, start them, stop them, and resize them.
2. What role do they need?
Azure IAM uses roles to give specific permissions to identities. Azure has a number of built-in roles based on a few common functions:
Owner – Full management access, including granting access to others
Contributor – Management access to perform all actions except granting access to others
User Access Administrator – Specific access to grant access to others
Reader – View-only access
These built-in roles can be more specific, such as “Virtual Machine Contributor” or “Log Analytics Reader”. However, even with these specific pre-defined roles, the principle of least privilege shows that you’re almost always giving more access than is truly needed.
For even more granular permissions, you can create Azure custom roles and list specific commands that can be run. As an example, ParkMyCloud recommends creating a custom role to list the specific commands that are available as features. This ensures that you start with too few permissions, and slowly build up based on the needs of the user or service account. Not only can this prevent data leaks or data theft, but it can also protect against attacks like malware, former employee revenge, and rogue bitcoin mining.
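For reference, an Azure custom role definition is a short JSON document. The sketch below lists an illustrative subset of actions for read/start/stop-style VM operations – it is an example of the format, not an exact recommended policy:

```json
{
  "Name": "VM Scheduler (example)",
  "IsCustom": true,
  "Description": "Read, start, and deallocate virtual machines only",
  "Actions": [
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/deallocate/action"
  ],
  "NotActions": [],
  "AssignableScopes": ["/subscriptions/<subscription-id>"]
}
```

Keeping the `Actions` list this short is the point: anything not listed is denied, so the role grows only as a genuine need is demonstrated.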
3. Where do they need access?
The final piece of an Azure IAM permission set is deciding the specific resource that the identity should be able to access. This should be at the most granular level possible to maintain maximum security. For example, a Cloud Operations Manager may need access at the management group or subscription level, while a SQL Server utility may just need access to specific database resources. When creating or assigning the role, this is typically referred to as the “scope” in Azure.
Our suggestion for the scope of a role is to always think twice before using the subscription or management group as a scope. The scale of your subscription is going to come into consideration, as organizations with many smaller subscriptions that have very focused purposes may be able to use the subscription-level scope more frequently. On the flip side, some companies have broader subscriptions, then use resource groups or tags to limit access, which means the scope is often smaller than a whole subscription.
More Secure, Less Worry
By revisiting these questions for each new resource or new identity that is created, you can quickly develop habits to maintain a high level of security using Azure IAM. For a real-world look at how we suggest setting up a service principal with a custom role to manage the power scheduling and rightsizing of your VMs, scale sets, and AKS clusters, check out the documentation for ParkMyCloud Azure access, and sign up for a free trial today to get it connected securely to your environment.