Analysts are reporting that IT budget cuts are expected to continue, dropping 5-8% this year overall. That puts IT departments in a difficult position: what should they cut, and how? While there is no magic bullet, there are places to trim the fat that will require no sacrifice and make no impact on operations.
Public Cloud Spend is High – And Users Want to Optimize
The largest cost in many enterprises’ IT budget is, of course, labor. You already know that the layoffs are happening and that engineering and operations departments are not immune. Whether you’re trying to avoid layoffs or trying to make the most of a reduced budget and workforce after them, you can look at other portions of your budget, including public cloud – often ranked the third-highest area of spend.
Even before COVID-19 wreaked havoc on businesses the world over, cloud customers ranked cloud cost optimization as a priority. Like water and electricity in your home, public cloud is a utility. It needs to be turned off when not being used.
This is made more urgent by today’s economic climate. There’s a lot of pressure in certain verticals, industries, and enterprises to reduce cloud spend and overall operational expenditures.
The Least Controversial Fix: Wasted Cloud Spend
There’s a reason “optimization” is so important: it implies waste. That faucet running when no one’s in the room – there’s simply no reason for the spend, which makes it an “easy” fix. No one will miss it.
The first step is identifying the waste. We estimate that almost $18 billion will be wasted this year in two major categories. The first is idle resources – these are resources being paid for by the hour, minute, or second, that are not actually being used every hour, minute, or second. The most common type is non-production resources provisioned for development, staging, testing, and QA, which are often only used during a 40-hour work week. That means that for the other 128 hours of the week, the resources sit idle, but are still paid for.
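To make the 40-versus-128-hour arithmetic concrete, here is a minimal sketch. The function names and the example hourly rate are our own illustrations, not figures from any provider's price list.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def idle_share(used_hours_per_week: float) -> float:
    """Fraction of the week a resource is paid for but not used."""
    if not 0 <= used_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("used hours must be between 0 and 168")
    return (HOURS_PER_WEEK - used_hours_per_week) / HOURS_PER_WEEK


def weekly_idle_cost(hourly_rate: float, used_hours_per_week: float) -> float:
    """Money spent per week on hours when the resource sits idle."""
    return hourly_rate * (HOURS_PER_WEEK - used_hours_per_week)


if __name__ == "__main__":
    # A resource used only during a 40-hour work week.
    print(f"idle share: {idle_share(40):.0%}")
    # 0.10 per hour is a made-up rate for illustration.
    print(f"weekly idle cost: {weekly_idle_cost(0.10, 40):.2f}")
```

A resource used 40 hours a week is idle for roughly three quarters of the time it is billed.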
The second-largest swath of wasted spend is overprovisioned infrastructure: paying for resources that are larger in capacity than needed. About 40% of instances are oversized. Reducing an instance by just one size cuts its cost by 50%. Or look at it the other way – every size up costs you double.
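The size-halving rule of thumb can be sketched in a few lines. This assumes that every size step exactly halves the price and, for the fleet estimate, that spend is spread evenly across instances. Both are simplifications, and the function names are ours.

```python
def cost_after_downsizing(hourly_rate: float, sizes_down: int) -> float:
    """Hourly rate after moving down `sizes_down` sizes (each step halves it)."""
    if sizes_down < 0:
        raise ValueError("sizes_down must be zero or positive")
    return hourly_rate / (2 ** sizes_down)


def fleet_savings_share(oversized_share: float, sizes_down: int = 1) -> float:
    """Share of the total instance bill saved by rightsizing the oversized part.

    Assumes spend is evenly distributed across instances.
    """
    return oversized_share * (1 - 1 / (2 ** sizes_down))


if __name__ == "__main__":
    print(cost_after_downsizing(0.40, 1))  # one size down from a made-up rate
    print(fleet_savings_share(0.40))       # 40% of instances, one size down
```

Under those assumptions, taking the oversized 40% of a fleet down one size trims about a fifth of the instance bill.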
Other sources of waste not included in this calculation include orphaned volumes, inefficient containerization, underutilized databases, instances running on legacy resource types, unused reserved instances, and more.
How to Activate Optimization
Cutting this waste from your budget is an opportunity to keep the spend you actually need, and make more investment in applications to produce revenue for your business. The people who use this infrastructure on a daily basis need to get on board, and that can be challenging.
The key to taking action to address this wasted spend is to bridge the gap between the people who care about the cloud bill – Finance, IT, etc. – and the people working in the cloud infrastructure every day – the app owners, the lines of business, developers, engineers, testers, people in QA, DevOps, SREs, etc. Those internal “end users” need a self-service tool or platform to take action.
However, app owners have a stack of priorities ahead of cost, and a lack of time to evaluate solutions. Ideally, the cloud operations team will administer a platform, and that platform will enable the app owners or lines of business to take action and make changes based on its recommendations. Finance and IT then see a shrinking – or at least flat – cloud bill, with optimized costs.
For an example of how enterprise Cloud Operations departments can approach this, learn from Sysco. They deployed ParkMyCloud to hundreds of app owners and end users across the globe, and used gamification to get them all on board with reducing costs.
There are many resources available that offer advice on Azure best practices. Based on recent recommendations given by experts in the field, we’ve put together this list of 10 of the best practices for 2020 to help you fully utilize and optimize your Azure environment.
1. Ensure Your Azure VMs are the Correct Size
“There are default VM sizes depending on the image that you choose and the affected Region so be careful and check if the proposed one is really what you need. The majority of the times you can reduce the size to something that fits you better and at a lower cost.”
2. If you use the Azure Cost Management Tool, Know the Limitations
Azure Cost Management can be a useful tool in your arsenal: “Listed as “cost management + billing” in the Azure portal, the Azure Cost Management service’s cost analysis feature offers comprehensive insights into the costs incurred by your Azure resources—starting from the subscription level. This can then be drilled down to specific resource groups and/or resources. The service also provides an overview of current costs as well as a monthly forecast based on the current consumption rate.”
However, know that visibility and action are not equivalent: “Even though [cloud efficiency] is a core tenant of Microsoft Azure Cost Management, optimization is one of the weakest features of the product. The essence of the documentation around this is that you should manually eliminate waste, without going into much detail about what is being wasted or how to eliminate it. Plus, this expects manual intervention and review of each resource without giving direct actions to eliminate the waste.”
3. Approach Role-Based Access Control (RBAC) Systematically
“Using Azure RBAC, you can segregate duties within your team and grant only the amount of access to users that they need to perform their jobs. Instead of giving everybody unrestricted permissions in your Azure subscription or resources, you can allow only certain actions at a particular scope.”
“Even with these specific pre-defined roles, the principle of least privilege shows that you’re almost always giving more access than is truly needed. For even more granular permissions, you can create Azure custom roles and list specific commands that can be run.”
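As an illustration of what a narrowly scoped custom role might look like, here is a sketch that builds a role definition as a Python dictionary and prints it as JSON. The role name, description, and subscription ID are hypothetical, and you should check the action strings against Azure's current resource provider operations list before using them.

```python
import json


def vm_operator_role(subscription_id: str) -> dict:
    """Build a custom role that can only view, start, and deallocate VMs."""
    return {
        "Name": "VM Start/Stop Operator (example)",  # hypothetical role name
        "IsCustom": True,
        "Description": "Can view, start, and deallocate virtual machines.",
        "Actions": [
            "Microsoft.Compute/virtualMachines/read",
            "Microsoft.Compute/virtualMachines/start/action",
            "Microsoft.Compute/virtualMachines/deallocate/action",
        ],
        "NotActions": [],
        # Scope the role to one subscription rather than everything.
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


if __name__ == "__main__":
    # Placeholder subscription ID for illustration only.
    role = vm_operator_role("00000000-0000-0000-0000-000000000000")
    print(json.dumps(role, indent=2))
```

The resulting JSON is in the shape that custom role definitions generally take, so it could be saved to a file and reviewed before anyone creates the role.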
“When you delete a virtual machine in Azure, by default, in order to protect against data loss, any disks that are attached to the VM aren’t deleted. One thing to remember is that after a VM is deleted, you will continue to pay for these “orphaned” unattached disks. In order to minimise storage costs, make sure that you identify and remove any orphaned disk resource.”
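One way to spot orphaned disks is to filter a disk inventory for entries with no owning VM. The sketch below assumes the inventory has already been exported as a list of dictionaries (for example, parsed JSON from a disk listing) and that each entry has a `managedBy` field that is empty when the disk is unattached. Verify that field against your own export before relying on it.

```python
def find_unattached_disks(disks: list) -> list:
    """Return the disks that no VM currently owns.

    Each disk is a dict. `managedBy` is assumed to hold the owning VM's ID,
    and to be missing, None, or empty for an unattached disk.
    """
    return [disk for disk in disks if not disk.get("managedBy")]


if __name__ == "__main__":
    # Made-up inventory for illustration.
    inventory = [
        {"name": "web-os-disk", "managedBy": "/subscriptions/x/vm/web"},
        {"name": "old-data-disk", "managedBy": None},
        {"name": "test-disk"},
    ]
    for disk in find_unattached_disks(inventory):
        print("review before deleting:", disk["name"])
```

The script only reports candidates. Deleting a disk is irreversible, so a human review step is worth keeping.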
“Centralize tagging across your Azure environments. This enables you to discover, group and consistently tag cloud resources across your cloud providers – manually or through automated tag rules. Maintaining a consistent tagging structure allows you to see resource information from all cloud providers for enhanced governance, cost analytics and chargeback.”
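A consistent tagging structure is easier to keep if something checks it. Here is a minimal sketch of such a check. The required tag names are a hypothetical policy, not an Azure default, and tag keys are compared case-insensitively.

```python
# Hypothetical tagging policy: every resource must carry these keys.
REQUIRED_TAGS = frozenset({"owner", "environment", "cost-center"})


def missing_tags(resource_tags, required=REQUIRED_TAGS) -> list:
    """Return the required tag keys a resource lacks, sorted alphabetically.

    `resource_tags` is a dict of tag key to value, or None if untagged.
    Required keys are expected in lowercase.
    """
    present = {key.strip().lower() for key in (resource_tags or {})}
    return sorted(set(required) - present)


if __name__ == "__main__":
    print(missing_tags({"Owner": "data-team", "Environment": "dev"}))
    print(missing_tags(None))
```

A check like this can run over an exported resource list from any provider, since it only needs each resource's tag dictionary.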
6. Decipher how and when to utilize the Azure logging services
“Logs are a major factor when it comes to successful cloud management. Azure users can access a variety of native logging services to maintain reliable and secure operations. These logging options can be broken down into three overarching types, as well as eight log categories. The granular data collected by Azure logs enables enterprises to monitor resources and helps identify potential system breaches.”
“Serverless computing provides a layer of abstraction that offloads maintenance of the underlying infrastructure to the cloud provider. That’s a form of workload automation in and of itself, but IT teams can take it a step further with the right tools.
Developers and admins can use a range of serverless offerings in Azure, but they need to understand how they want their workflow to operate in order to select the right services. To start, determine whether your application has its own logic to direct events and triggers, or whether that orchestration is defined by something else.”
“APIs handle an immense amount of data, which is why it’s imperative to invest in API security. Think of authentication as an identification card that proves you are who you say you are. Although Azure Database provides a range of security features, end users are required to practice additional security measures. For example, you must manage strong credentials yourself. Active Directory is the authentication solution of choice for enterprises around the world, and the Azure-hosted version only adds to the attraction as companies continue migrating to the cloud.”
10. Multi-Factor Authentication for all standard users
“Businesses that don’t add extra layers of access protection – such as two-step authentication – are more susceptible to credential theft. Credential thefts are usually achieved by phishing or by planting key-logging malware on a user’s device; and it only takes one compromised credential for a cybercriminal to potentially gain access to the whole network.
Enforcing multi-factor authentication for all users is one of the easiest – yet most effective – of the seven Azure security best practices, as it can be done via Azure Active Directory within a few minutes.”
You can use these best practices as a reference to help you ensure you are fully optimizing all available features in your Azure environment. Have any Azure best practices you’ve learned recently? Let us know in the comments below!
Google Sustainability is an effort that ranges across their business, from the Global Fishing Watch to environmental consciousness in the supply chain. Cloud computing has been a major draw on global energy in recent years: the amount of computing done in data centers more than quintupled between 2010 and 2018. But the amount of energy consumed by the world’s data centers grew only six percent during that period, thanks to improvements in energy efficiency. However, that’s still a lot of power. That’s why Google’s sustainability efforts for data centers and cloud computing are especially important.
Google Cloud Sustainability Efforts – As Old as Their Data Centers
Reducing energy usage has been an initiative for Google for more than 10 years. Google has been carbon neutral since 2007, and 2019 marked the third year in a row that they’ve matched their energy usage with 100 percent renewable energy purchases. Google’s innovation in the data center market also comes from building facilities from the ground up instead of buying existing infrastructure, and from using machine learning technology to monitor and improve power-usage-effectiveness (PUE) and find new ways to save energy in their data centers.
When comparing the big three cloud providers in terms of sustainability efforts, AWS is by far the largest source of carbon emissions from the cloud globally, due to its dominance. However, AWS’s sustainability team is investing in green energy initiatives and is striving to commit to an ambitious goal of 100% use of renewable energy by 2040 to become as carbon-neutral as Google has been. Microsoft Azure, on the other hand, has run on 100 percent renewable energy since 2014, but it would be considered a low-carbon electricity consumer in part because it runs less of the world’s computing than Amazon or Google.
Nonetheless, data centers from the big three cloud providers, wherever they are, all run on electricity. How the electricity is generated is the important factor in whether they are more or less favorable for the environment. For Google, reaching 100% renewable energy purchasing on a global and annual basis was just the beginning. In addition to continuing their aggressive move forward with renewable energy technologies like wind and solar, they wanted to achieve the much more challenging long-term goal of powering operations on a region-specific, 24-7 basis with clean, zero-carbon energy.
Why Renewable Energy Needs to Be the Norm for Cloud Computing
It’s no secret that cloud computing is a drain on resources, consuming roughly three percent of all electricity generated on the planet. That’s why it’s important for Google and other cloud providers to be part of the solution to global climate change. Renewable energy is an important element, as are matching the energy used in operations with renewable purchases and helping to create pathways for others to purchase clean energy. However, it’s not just about fighting climate change. Purchasing energy from renewable resources also makes good business sense, for two key reasons:
Renewables are cost-effective – The cost to produce renewable energy technologies like wind and solar has come down precipitously in recent years. By 2016, the levelized cost of wind had come down 60% and the levelized cost of solar had come down 80%. In fact, in some areas, renewable energy is the cheapest form of energy available on the grid. Reducing the cost to run servers reduces the cost for public cloud customers – and we’re in favor of anything that does that.
Renewable energy inputs like wind and sunlight are essentially free – Having no fuel input for most renewables allows Google to eliminate exposure to fuel-price volatility, which is especially helpful when managing a global portfolio of operations in a wide variety of markets.
Google Sustainability in the Cloud Goes “Carbon Intelligent”
In keeping with its goal for data centers to consume more energy from renewable resources, Google recently announced that it will also be time-shifting workloads to take advantage of these resources, making data centers run harder when the sun shines and the wind blows.
“We designed and deployed this first-of-its-kind system for our hyperscale (meaning very large) data centers to shift the timing of many compute tasks to when low-carbon power sources, like wind and solar, are most plentiful,” Google announced.
Google’s latest advancement in sustainability is a newly developed carbon-intelligent computing platform. It appears to work by using two forecasts – one of the future carbon intensity of the local electrical grid near a data center and another of the data center’s own capacity requirements – and using that data to “align compute tasks with times of low-carbon electricity supply.” The result is that workloads run when Google believes they will generate the lowest possible CO2 emissions.
The first version of the carbon-intelligent computing platform will focus on shifting tasks to different times of the day within the same data center. But Google already has plans to expand its capability: in addition to shifting tasks in time, it will also move flexible compute tasks between different data centers, so that more work is completed when and where doing so is more environmentally friendly. As the platform continues to generate data, Google will document its research and share it with other organizations in hopes they can also develop similar tools and follow suit.
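To give a feel for time-shifting within a single data center, here is a toy sketch. It is not Google's system: it uses only a carbon-intensity forecast, ignores the capacity forecast entirely, and simply picks the cleanest hours before an optional deadline. The function name and the forecast numbers are made up.

```python
def pick_low_carbon_hours(carbon_forecast, hours_needed, deadline_hour=None):
    """Choose the hours with the lowest forecast carbon intensity.

    carbon_forecast: forecast intensity (e.g. gCO2/kWh) for hour 0, 1, 2, ...
    hours_needed:    how many one-hour slots the flexible task requires
    deadline_hour:   only hours before this index may be used (optional)
    Returns the chosen hour indices in chronological order.
    """
    horizon = len(carbon_forecast)
    if deadline_hour is not None:
        horizon = min(horizon, deadline_hour)
    if hours_needed > horizon:
        raise ValueError("not enough hours before the deadline")
    # Rank hours by intensity, breaking ties in favour of the earlier hour.
    ranked = sorted(range(horizon), key=lambda hour: (carbon_forecast[hour], hour))
    return sorted(ranked[:hours_needed])


if __name__ == "__main__":
    forecast = [500, 450, 300, 200, 250, 480]  # made-up hourly forecast
    print(pick_low_carbon_hours(forecast, 2))
    print(pick_low_carbon_hours(forecast, 2, deadline_hour=3))
```

A real scheduler would also have to respect capacity limits and task dependencies, which is where the second forecast comes in.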
Forecasting with artificial intelligence and machine learning is a powerful combination, and Google is using it in its platform to anticipate workloads and make its data centers healthier, better performing, and more efficient. Combined with efforts to use cloud resources efficiently by running VMs only when needed and not oversizing them, resource utilization can be improved to reduce your carbon footprint and save money.
We speak to enterprises large and small about cloud cost optimization, and one of the more dominant themes we have been hearing lately is: who should manage app development costs? Cloud Operations teams (ITOps, DevOps, FinOps, Cloud Center of Excellence, etc.) that are responsible for the management, governance and optimization of an enterprise’s cloud resources need to get the Application owners or the lines of business owners to be responsible for cost. It can’t simply be the centralized cloud team who cares about cost. Folks using cloud services on a daily basis for engineering, development, QA, testing, etc. need to take actions related to optimizing cloud costs, managing user governance and security operations.
I liken this a bit to the response to the COVID-19 pandemic, given that this is the event that has defined 2020. The Federal Government can collect data from across the country, provide resources, and publish guidelines, but ultimately the State Governments need to take the actions to shut down schools and non-essential businesses. Certain counties or jurisdictions within those states can even decide whether they will adhere to the state guidelines, and there could be very good reasons not to, based on local data or essential businesses. We see the same underlying process in enterprises when it comes to cloud cost optimization and management.
Let’s play this out.
Cloud spend has become the largest single IT cost outside of labor and is growing 10-15% month on month. So, the CloudOps team is given a directive from Finance and/or IT Management to find tools or solutions to identify cloud waste and control cloud spend primarily in AWS, Azure and Google clouds.
Then, the CloudOps team researches tools, both 3rd party and native cloud provider tools, and finds a couple of important things:
If the enterprise is multi-cloud, the native CSP tools are a non-starter
Tools must be data-driven, so the recommendations to reduce the app development cost are believable and actually useful
The tools must be self-service, i.e., the application owners or the lines of business need to be able to take the actions. Otherwise, they will deem CloudOps as being draconian (and push back because they know their app better … sounds like the States).
Next, CloudOps brings in a tool to do a pilot. It starts small with a sandbox account, but as data and trust build, the pilot expands to include many AWS, Azure, and/or GCP accounts that are used by the application owners. Then CloudOps determines a “friendly” line of business where the app development cost owner is keen to identify waste, reduce costs, and increase their cloud efficiency.
CloudOps and the cloud optimization vendor provide a demo to the app owners using their own data and showing them where they have waste, such as idle resources, over-provisioned resources, orphaned resources, resources that could leverage reservations, and so forth. The app owners are intrigued, and they want to know whether they will remain masters of their own domain:
Where is this data coming from? Is it reliable?
Can we take our own actions? Is this self-service?
What about user governance? My QA team does not need to manage resources that belong to dev or staging. Can we reject a recommendation because the app we are running requires that configuration?
Can we group resources into application stacks and manage them as a single entity?
Can we override an action?
In order to effectively manage the app development cost, CloudOps needs to involve the owners and users of those applications and provide them with the data and tools to make decisions and take actions. The cloud is self-service, so in order to effectively manage your cloud services, you need the optimization and governance tools to also be self-service and adapt to the needs of each business unit within your organization.
Taking your organization into a full multi-cloud deployment can be a daunting task, but focusing on adopting just an AWS multi-account strategy can provide many benefits without a lot of extra effort. AWS makes it quite easy to create new accounts on a whim, and can simplify things with consolidated billing. Let’s take a look at why you might want to split your monolithic AWS account into micro accounts.
1. Logical Separation of Resources
There are a few options for separating your resources within a single AWS account, including tagging, isolated VPCs, or using different regions for different groups. However, these practices can still lead to extensive lists of resources within your account, making it hard to find what you need. By creating a new account for each project, business unit, or development stage, you can enforce a much better logical separation of your resources. You can still use separate VPCs or regions within an account, but you aren’t forced to do so.
2. Security and Governance
In addition to separation for logical purposes, multiple accounts can also help from a security perspective. For example, having a “production” account separate from a “development” account lets you give broader access to your developers and operations teams based on which account they need access to. AWS provides a great “IAM Access Analyzer” tool that can help you ensure proper security and roles for your users. And if you have ever had a developer hard-code account access information, separated accounts can help bring that to light (we have not had this happen at ParkMyCloud, but we have definitely seen it a couple of times over the years…).
3. Cost Allocation
In addition to tagging your systems for cost reporting, separation into different accounts can help with the chargeback and showback to your business units. Knowing which accounts are spending too much money can help you tweak your processes and find cloud waste. The AWS Cost and Usage Reports show exactly which account is associated with each expense.
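For a rough idea of how per-account showback can be derived from report rows, here is a sketch that totals cost by account. It assumes the rows have already been loaded as dictionaries and use the column names `line_item_usage_account_id` and `line_item_unblended_cost`. Your export may name these columns differently, so treat them as assumptions to check.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_account(rows) -> dict:
    """Sum cost per account, returned with the highest spender first.

    Each row is a dict. Costs are parsed as Decimal to avoid float drift.
    """
    totals = defaultdict(Decimal)
    for row in rows:
        account = row["line_item_usage_account_id"]
        totals[account] += Decimal(str(row["line_item_unblended_cost"]))
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Made-up rows for illustration.
    sample_rows = [
        {"line_item_usage_account_id": "111111111111", "line_item_unblended_cost": "1.50"},
        {"line_item_usage_account_id": "222222222222", "line_item_unblended_cost": "10.00"},
        {"line_item_usage_account_id": "111111111111", "line_item_unblended_cost": "2.25"},
    ]
    for account, total in cost_by_account(sample_rows).items():
        print(account, total)
```

With one account per project or business unit, this grouping is all the chargeback logic you need. Tags only have to cover the finer detail inside each account.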
4. Cost Savings Automation
You can apply cost savings automation at a granular level – but it’s easier if you don’t have to. For example, you should enforce schedules to automatically turn off resources outside of business hours. Some of our customers are eager to add their development-focused account to ParkMyCloud to allow for scheduling automation, but are a bit leery of adding Production accounts where someone might turn something off by accident. Automated scripts and platforms such as ParkMyCloud can be fully adopted on dev and sandbox accounts to streamline your continuous cost control, while automation around your production environment can be used to make sure everything is up and running. AWS IAM policies can also allow you to set different policies on different accounts, for example, allowing scheduling and rightsizing automation in dev/test accounts, but only manual rightsizing in production.
5. Reserved Instances and Savings Plans
In an AWS environment where you have multiple accounts all rolling up to an Organization account, Reserved Instances and Savings Plans can be shared across all the associated accounts. Say you buy an RI or Savings Plan in one account, but then end up not fully using it in that account. AWS will automatically allocate that RI to any other account in the Organization that has the right kind of system running at the right time. A couple of our larger customers with really mature cloud management practices take this a step further and carefully manage all RI purchases using a dedicated “cloud management” account within the Organization. This allows them to maintain a portfolio of RIs and Savings Plans (kind of like a stock market portfolio) designed to optimize spend across the entire company, and to avoid new RI commitments that might not be needed because idle RIs purchased by some other group on some other account could cover the usage. It also allows them to smooth out the purchase of expensive multi-year and all-upfront RIs and Savings Plans over the course of time.
6. Keeping Your Options Open
Even if you aren’t multi-cloud at the moment, you never know how your cloud strategy might evolve over the next few years. Separating into multiple AWS accounts helps you keep your options open for individual groups or applications to move to different cloud providers without disrupting other departments. This flexibility can also help your management feel at ease with choosing AWS, as they won’t feel as locked-in as they otherwise might.
Get Started With An AWS Multi-Account Strategy
If you haven’t already started using multiple AWS accounts, Amazon provides a few different resources to help. One recent announcement was AWS Control Tower, which helps with the deployment of new accounts in an automated and repeatable fashion. This is a step beyond the AWS Landing Zone solution, which was provided by Amazon as an infrastructure-as-code deployment. Once you have more than one account, you’ll want to look into AWS Organizations to help with management and grouping of accounts and sharing reservations.
For maximum cost savings and cloud waste reduction, use ParkMyCloud to find and eliminate cloud waste – it’s fully multi-account aware, allowing you to see all of your accounts in a single pane of glass. Give it a try today and get recommended parking schedules across all of your AWS accounts.
When it was announced in December last year, AWS called the AWS IAM Access Analyzer “the sort of thing that will improve security for just about everyone that builds on AWS.” Last week, it was expanded to the AWS Organizations level. If you use AWS, use this tool to ensure your access is granted as intended across your accounts.
“IAM” Having Problems
AWS provides robust security and user/role management, but that doesn’t mean you’re protected from the issues that can arise from improperly configured IAM access. Here are a few we’ve seen the most often.
Creating a user when it should have been a role. IAM roles and IAM users can both be assigned policies, but they are intended to be used differently. IAM users should correspond to specific human users, who can be assigned long-term credentials and directly interact with AWS services. IAM roles are sets of capabilities that can be assumed by other entities – for example, third-party software that interacts with your AWS account (hi! 👋). Check out this post for more about roles vs. users.
Assigning a pre-built policy vs. creating a custom policy. There are plenty of pre-built policies – here are a few dozen examples – but you can also create custom policies. The problems arise when, in a hurry to grant access to users, you grant more than necessary, leaving holes. For example, we’ve seen people get frustrated when their users don’t have access to a VM but have little insight into why – while it could be that the VM has been terminated or moved to a region the user can’t view, an “easy fix” is to broaden that user’s access.
Leaving regions or resource types open. If an IAM role needs permission to spin EC2 instances up and down, you might grant full EC2 privileges. But if the users with that role only ever use us-east-1 and don’t look around the other regions (why would they?) or keep a close eye on their bill, they may have no idea that some bad actor is bitcoin mining in your account over in us-west-2.
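As one way to close off unused regions, a policy can list only the EC2 actions a role needs and attach a region condition. The sketch below builds such a policy document as a Python dictionary, using the `aws:RequestedRegion` condition key. The statement ID and the exact action list are our own choices for illustration, so adapt them to what your role really does.

```python
import json


def ec2_policy_for_regions(allowed_regions) -> dict:
    """Build a policy allowing a few EC2 actions only in the given regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEc2InApprovedRegionsOnly",  # hypothetical name
                "Effect": "Allow",
                # Named actions rather than "ec2:*".
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:StartInstances",
                    "ec2:StopInstances",
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ec2_policy_for_regions(["us-east-1"]), indent=2))
```

With a policy like this, a request to start instances in us-west-2 is simply not allowed, whether it comes from your team or from someone who has stolen the credentials.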
Potential attacks need only an opportunity to get access to your account, and the impact could range from exposing customer data to ransomware to total resource deletion. So it’s important to know what IAM paths are open and whether they’re in use.
Enter the AWS IAM Access Analyzer
The IAM Access Analyzer uses “automated reasoning”, which is a type of mathematical logic, to review your IAM roles, S3 buckets, KMS keys, AWS Lambda functions, and Amazon SQS queues. It’s free to use and straightforward to set up.
Once you set up an analyzer, you will see a list of findings that shows items for you to review and address or dismiss. With the expansion to the organizational level, you can establish your entire organization as a “zone of trust”, so that issues identified are for resources accessible from outside the organization.
The Access Analyzer continuously monitors for new & updated policies, and you can manually re-analyze as well.
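If you prefer to pull findings into a script rather than read them in the console, something like the following sketch may help. It takes an already-created client object (for example, one made with `boto3.client("accessanalyzer")`) and pages through `list_findings`, keeping only active findings. The helper name is ours, and you should confirm the filter shape and response keys against the current SDK documentation.

```python
def list_active_findings(client, analyzer_arn: str) -> list:
    """Collect all ACTIVE findings for one analyzer, following pagination.

    `client` is expected to behave like a boto3 Access Analyzer client.
    """
    findings = []
    kwargs = {
        "analyzerArn": analyzer_arn,
        "filter": {"status": {"eq": ["ACTIVE"]}},
    }
    while True:
        page = client.list_findings(**kwargs)
        findings.extend(page.get("findings", []))
        token = page.get("nextToken")
        if not token:
            return findings
        # Ask for the next page on the following call.
        kwargs["nextToken"] = token


# Example usage (requires boto3 and AWS credentials, so it is left commented):
#   import boto3
#   client = boto3.client("accessanalyzer")
#   for finding in list_active_findings(client, "<your analyzer ARN>"):
#       print(finding.get("resourceType"), finding.get("resource"))
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object, without touching a real account.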
3 Things to Go Do Now
If you had time to read this, you probably have time to go set up an analyzer:
Review your findings and address any potential issues.
Check the access you’re granting to any third-party service. For example, ParkMyCloud requests only the minimum permissions needed to do its job. Are you assigning anyone the AWS-provided “ReadOnlyAccess” policy? If so, you are sharing far more than is likely needed.