Taking control of your cloud finance is now more important than ever, and there is no room for wasted spend. More organizations are shifting to cloud-based infrastructure: according to a forecast Gartner published last year, worldwide public cloud revenue is expected to grow 17.3 percent in 2019. While this is good news for technology innovation, elastic infrastructure poses a challenge from the finance side of the table. CFOs need to ensure that IT and development departments are optimizing spend even while encouraging innovation and growth.
The Challenge When it Comes to Cloud Finance
Finance departments continue to search for capital optimization, lowering costs while prioritizing flexible business models that can transform and expand worldwide. With this flexibility, though, comes complexity that is difficult to manage, to deploy, and, most frustrating of all, to forecast.
With rapid growth comes rapid responsibility. If an organization is not cautious, cloud spending can spiral out of control, and using the cloud might seem counterproductive. Finance and IT departments must come together and work toward key business goals, closing the gap between them so that a cost control strategy becomes an actionable, executable plan rather than an open-ended project.
Smart Questions CFOs Should Be Asking
Given the struggle to control cloud spend, CFOs need to address key cloud finance questions and understand their impact on operations. After all, most organizations cite lowering costs as one of their primary reasons for moving to the cloud. To make sure that financial teams and IT departments are on the same page, here are three smart cloud finance questions CFOs should ask.
1. Are we thinking about the cloud cost model correctly?
Out of habit from the on-premises mindset, many organizations moving to the cloud purchase far more capacity than they actually need. Two of the major benefits of moving to the cloud are supposed to be flexibility, which lets you use the cloud based on your real-time needs, and capacity, which in theory matches the physical space an on-site data center would provide. In practice, that second benefit rarely plays out as intended: the majority of companies overspend on cloud resources that go unused for much or all of the time.
So, when CFOs talk to their IT counterparts about cloud spending, they need to ensure that everyone is now in an OpEx mindset, rather than the on-prem model of CapEx.
2. Are we wasting cloud spend?
The answer is most likely yes. To understand why, we need to look at the factors that contribute to this waste. A huge contributing factor is idle resources. The cloud runs 24/7, but most non-production resources used for development, testing, staging, and QA are only needed during the work week. To put this in perspective, if you work a 40-hour week and only need resources during those hours, you are paying for them to sit idle the rest of the time. Even assuming a generous twelve-hour workday window five days a week, the resources sit idle for 108 of the 168 hours in a week, or nearly two-thirds (about 64%) of the time you're paying for.
Another contributing factor is oversized resources. We recently found that the average CPU usage of resources managed in our platform is only 4.9%. That points to a trend of massive underutilization, in which many resources could easily be sized down for 50-70% cost savings.
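The idle-time arithmetic is easy to check. Here is a minimal Python sketch, assuming resources are needed only during a fixed daily window on weekdays and are left running the rest of the time; the function name is illustrative.

```python
HOURS_PER_WEEK = 24 * 7  # a resource left running is billed for all 168 hours


def idle_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Share of a 24/7 billing week in which a resource is running but not needed.

    Assumes the resource is only needed for `hours_per_day` hours on
    `days_per_week` days and is otherwise left running.
    """
    needed = hours_per_day * days_per_week
    return (HOURS_PER_WEEK - needed) / HOURS_PER_WEEK


# Twelve-hour window, five days a week: 60 hours needed, 108 hours idle.
print(f"{idle_fraction(12, 5):.1%}")  # 64.3%
# Strict 40-hour week (8 hours x 5 days): 40 hours needed, 128 hours idle.
print(f"{idle_fraction(8, 5):.1%}")   # 76.2%
```

Even the generous twelve-hour window leaves resources idle nearly two-thirds of the week; a strict 40-hour week pushes that past three-quarters.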
3. What steps are we taking to control and reduce cloud spend?
IT and development departments will be focused on growth, so it’s often the role of Finance to ensure that these teams are putting cost control measures in place on public cloud. Ensure that your technical departments have an actionable – preferably, automated – plan in place to combat wasted cloud spend. Ask for reports broken down by project or team over time, and research cloud optimization platforms that the technical teams should take advantage of. Furthermore, using a cloud optimization platform with automated and analytical capabilities will help you discover cost-savings opportunities and enable more efficient workflows between departments.
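Under the hood, a cost report by team over time is a group-by over billing line items keyed by a team or project tag. Here is a minimal Python sketch, assuming the line items have already been exported into simple (month, team, cost) tuples; the data and tag values are made up. Untagged spend is reported as its own row on purpose, since that is the cost nobody owns yet.

```python
from collections import defaultdict

# Hypothetical billing line items: (month, team tag, cost in USD).
line_items = [
    ("2019-01", "web", 1200.00),
    ("2019-01", "data", 800.00),
    ("2019-02", "web", 1350.00),
    ("2019-02", "data", 950.00),
    ("2019-02", None, 300.00),  # a resource nobody tagged
]


def cost_by_team(items):
    """Sum costs per (month, team); missing team tags are grouped as 'untagged'."""
    report = defaultdict(float)
    for month, team, cost in items:
        report[(month, team or "untagged")] += cost
    return dict(report)


for (month, team), total in sorted(cost_by_team(line_items).items()):
    print(f"{month}  {team:<9} ${total:,.2f}")
```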
The Bottom Line
Finance departments can push the cloud conversation toward optimization of resources, ensuring that IT departments are both innovative and within budget. Create a competitive cloud finance strategy that includes visibility, flexibility, and governance so the business can function effectively across departments. This will improve ROI and reporting and, fundamentally, help you implement better solutions to thrive in the cloud.
There are a ton of great blogs that cover AWS best practices and use cases. To provide a little more insight into the latest practices offered by AWS, we put together 15 of the best practices published since the beginning of 2019, consisting of tips and quotes from different experts.
1. Take Advantage of AWS Free Online Training Resources
“There’s no shortage of good information on the internet on how to use Amazon Web Services (AWS). Whether you’re looking for ways to supplement your certification study efforts or just want to know what the heck it’s all about, check out this compilation of free training and resources on all things AWS.”
2. Keep Up With Instance Updates So You Can Periodically Make Changes to Costs and Uses
“AWS expands its choices regularly, so you need to dynamically re-evaluate as your business evolves. The cloud presents many arbitrage opportunities including instance families, generations, types, and regions—but trying to do this manually is a recipe for time-consuming frustration. Don’t fall victim to Instance Inertia: even though the process of making a change is simple enough, it can be difficult to accomplish without having any conclusive evidence of either cost gains or performance improvements.”
“Your configuration of IAM, like any user permission system, should comply with the principle of “least privilege.” That means any user or group should only have the permissions required to perform their job, and no more.”
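To make "least privilege" concrete, here is a sketch of an IAM policy document, built as a Python dict, that grants read-only access to a single S3 bucket and nothing else, plus a small hypothetical helper that flags statements granting wildcard permissions. The bucket name and the helper are illustrative assumptions; IAM's own tooling, such as the policy simulator, remains the authoritative check.

```python
import json

# Hypothetical bucket name; substitute your own.
BUCKET = "example-reports-bucket"

# Least privilege: this principal can list one bucket and read its objects, nothing more.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": f"arn:aws:s3:::{BUCKET}"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": f"arn:aws:s3:::{BUCKET}/*"},
    ],
}


def _as_list(value):
    """IAM accepts either a single value or a list for Statement, Action, and Resource."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: dict) -> list:
    """Return Allow statements that grant every action ('*' or 'service:*')
    or apply to every resource ('*'). A rough lint, not a full policy analysis."""
    flagged = []
    for stmt in _as_list(policy["Statement"]):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if any(a == "*" or a.endswith(":*") for a in actions) or "*" in resources:
            flagged.append(stmt)
    return flagged


print(json.dumps(read_only_policy, indent=2))
print(len(overly_broad_statements(read_only_policy)))  # 0
```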
4. Visibility Across Multiple Accounts in One Frame Helps Make More Informed Decisions
“Use a cloud security solution that provides visibility into the volume and types of resources (virtual machines, load balancers, security groups, users, etc.) across multiple cloud accounts and regions in a single pane of glass. Having visibility and an understanding of your environment enables you to implement more granular policies and reduce risk.”
5. Tag IAM Entities to Help Manage Access Granted to Resources Based on an Attribute
“AWS has now added the ability to tag IAM users and roles, which eases management of IAM entities by enabling the delegation of tagging rights and enforcement of tagging schemes.”
“A primary use case for the new feature is to grant IAM principals access to AWS resources dynamically based on attributes. This can now be achieved by matching AWS resource tags with principal tags in a condition.”
“As cloud deployments grow, teams deal with an increasing number of resources that are constantly moving, growing, and changing. Projects may be shared between teams or customers and can rely on different regions and platforms. This makes it easy to lose track of what’s being used until the bill comes due. For tags to be actionable at scale, most teams need visibility into exactly which resources are in play at any given time, who is using them, what they are being used for, and who is responsible for them. Essentially, the more high-quality information associated with a resource, the easier it becomes to manage.”
“Within each of these categories, you can then define your own tags that are specific to your organization for standardization.”
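Here is what matching resource tags with principal tags in a condition can look like. The policy below, written as a Python dict, uses the `aws:ResourceTag` condition key and the `${aws:PrincipalTag/...}` policy variable so a user or role can start and stop only EC2 instances whose `project` tag matches its own. The `project` tag key is an assumption, and `tags_match` is a hypothetical, simplified local model of the condition for illustration, not how IAM actually evaluates policies.

```python
import json

# ABAC sketch: start/stop is allowed only when the instance's "project" tag
# equals the calling principal's "project" tag. The tag key is an assumption.
abac_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/project": "${aws:PrincipalTag/project}"
                }
            },
        }
    ],
}


def tags_match(principal_tags: dict, resource_tags: dict, key: str = "project") -> bool:
    """Simplified local model of the condition above (not IAM's evaluator):
    the principal must carry the tag, and the resource's value must equal it."""
    return key in principal_tags and principal_tags[key] == resource_tags.get(key)


print(json.dumps(abac_policy["Statement"][0]["Condition"], indent=2))
print(tags_match({"project": "alpha"}, {"project": "alpha"}))  # True
print(tags_match({"project": "alpha"}, {"project": "beta"}))   # False
```

The appeal of this pattern is that new projects need new tags, not new policies.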
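A tagging standard only helps if it is enforced. Below is a minimal sketch of a checker, assuming a hypothetical standard with four required keys and a fixed set of allowed `environment` values; the keys and values are placeholders for whatever categories your organization defines.

```python
# Hypothetical organization-wide tagging standard: required keys and,
# where listed, the only values allowed for that key.
REQUIRED_TAGS = {
    "owner": None,  # any non-empty value
    "project": None,
    "environment": {"prod", "staging", "dev", "qa"},
    "cost-center": None,
}


def tag_violations(tags: dict) -> list:
    """Return human-readable problems with one resource's (string-valued) tags."""
    problems = []
    for key, allowed in REQUIRED_TAGS.items():
        value = tags.get(key, "").strip()
        if not value:
            problems.append(f"missing tag '{key}'")
        elif allowed is not None and value not in allowed:
            problems.append(f"tag '{key}' has non-standard value '{value}'")
    return problems


print(tag_violations({"owner": "jane", "project": "checkout",
                      "environment": "Production", "cost-center": "1234"}))
# ["tag 'environment' has non-standard value 'Production'"]
```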
7. Decrease Errors and Streamline Your Deployments With An Automation Tool
“Whether you choose to use AWS CodeDeploy or a different tool, automating your software deployments helps you more consistently deploy an application across development, test, and production environments. The importance of automation in deployment in order to decrease errors and increase speed cannot be overstated.”
“Automate your deployment. This saves you from potentially costly and damaging human error. With the automation services available today, you have many options to customize every part of your deployment without letting automation fully take over if you prefer.”
“Purchasing an RI is only the beginning; you should have a process in place to continuously monitor RI utilization and modify unused RIs (split/join or exchange convertible RIs) to maximize their usage. A common AWS billing model is a centralized account with consolidated billing, linked to autonomous accounts so individual accounts can purchase RIs based on their individual usage patterns.”
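Monitoring RI utilization comes down to comparing the hours a reservation covers with the hours that running instances actually consumed. Here is a minimal sketch, assuming you already have those hour counts per reservation from your billing data; the identifiers, figures, and 80% threshold are hypothetical. Flagged RIs are the candidates for the modify or exchange actions the quote describes.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    ri_id: str              # hypothetical identifier
    purchased_hours: float  # hours covered by the RI in the period
    used_hours: float       # hours actually matched by running instances


def utilization(ri: Reservation) -> float:
    return ri.used_hours / ri.purchased_hours if ri.purchased_hours else 0.0


def underused(reservations, threshold: float = 0.8) -> list:
    """Return IDs of RIs whose utilization falls below `threshold` (80% is an assumed target)."""
    return [ri.ri_id for ri in reservations if utilization(ri) < threshold]


month = 720  # hours in a 30-day month
ris = [
    Reservation("ri-a", month, 715),
    Reservation("ri-b", month, 360),
]
print(underused(ris))  # ['ri-b']
```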
9. Account For the Capacity You Will Need So You Have a Size That Fits Your Environment
“We know that AWS EC2 instance types are sized and priced exponentially. With millions of sizing options and pricing points, choosing the wrong instance type can mean a major pricing premium—or worse, a substantial performance penalty! We see many organizations choose an instance type based on generic guidelines that do not take their specific requirements into account.”
“AWS offers a variety of types and sizes of EC2 instances. That means that it’s perfectly possible to select an instance type that’s too large for your actual needs, which means you’ll be paying more than necessary. In fact, the data shows that this is happening most of the time.”
10. Save Your Team Time and Money with Serverless Management
“AWS data is housed in different regions all over the world. Its cloud-based system means you’re able to access your data in just a matter of minutes.”
“No more having to set up and maintain your own servers. That’s just more stress and money out of your pocket. Instead, you can leave it to the experts at AWS, who will ensure the infrastructure your business runs on is running efficiently.”
“The AWS Serverless Application Repository allows developers to deploy, publish, and share common serverless components among their teams and organizations. Its public library contains community-built, open-source, serverless components that are instantly searchable and deployable with customizable parameters and predefined licensing. They are built and published using the AWS Serverless Application Model (AWS SAM), the YAML-based infrastructure-as-code language used for templating AWS resources.”
11. Set up a Secure Multi-Account with AWS Landing Zone
“With the large number of design choices, setting up a multi-account environment can take a significant amount of time, involve the configuration of multiple accounts and services, and require a deep understanding of AWS services.
This solution can help save time by automating the set-up of an environment for running secure and scalable workloads while implementing an initial security baseline through the creation of core accounts and resources.”
12. Ensure Consistency in your Environment with Containers
“Containers offer a lightweight way to consistently port software environments for applications. This makes them a great resource for developers looking to improve infrastructure efficiency, becoming the new normal over virtual machines (VMs).”
“Auto Scaling Groups can be used to control backend resources behind an ELB, provide self-replication (when an instance crashes, the Auto Scaling Group will immediately provision a new one to maintain the desired capacity), simplify deployments (regular releases, blue/green deployments, etc.), and support many other use cases…
Unnecessary spending on EC2 instances is usually caused by unused or underused compute resources that increase your monthly bill. This is an age-old problem: you provision more than you need to make sure you can handle both the expected and the unexpected traffic. An Auto Scaling Group solves this issue by handling the scalability requirements for you.”
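The core idea behind letting an Auto Scaling Group handle scalability is sizing capacity to demand. Here is a rough Python sketch of a proportional rule in the spirit of target tracking: if average CPU is above the target, add capacity in proportion; if below, remove it; always stay within the group's minimum and maximum. This is an approximation for illustration, not AWS's algorithm, which also handles instance warm-up and other details; the numbers are hypothetical.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Scale capacity proportionally so the average metric moves toward `target`,
    then clamp to the group's min/max. A simplification, not AWS's exact algorithm."""
    if current == 0:
        return min_size
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))


# Peak: 10 instances at 90% average CPU with a 50% target -> scale out.
print(desired_capacity(10, 90.0, 50.0, min_size=2, max_size=40))  # 18
# Off-peak: 18 instances at 12% average CPU -> scale in.
print(desired_capacity(18, 12.0, 50.0, min_size=2, max_size=40))  # 5
```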
“AWS Backup performs automated backup tasks across an organization’s various assets stored in the AWS cloud, as well as on-premises. It provides a centralized environment, accessible through the AWS Management Console, for organizations to manage their overall backup strategies.
AWS Backup eliminates the need for organizations to custom-create their own backup scripts for individual AWS services, the company contends.”
“Capable of accepting and processing hundreds of thousands of concurrent API calls, API Gateway can manage such related tasks as: API version management; authorization and access control; traffic management and monitoring.”
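AWS documents that API Gateway throttles requests using the token bucket algorithm, with a steady-state rate and a burst limit. The class below is a generic, single-process sketch of that algorithm for illustration, not API Gateway's implementation; the rate and burst values are arbitrary, and the injectable clock exists only so the example is deterministic.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: tokens refill at `rate` per second up to `burst`.

    Each request spends one token; a request that finds the bucket empty is
    throttled (API Gateway answers such requests with HTTP 429).
    """

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Deterministic demo with a fake clock: 10 requests/second steady state, burst of 5.
fake_now = [0.0]
bucket = TokenBucket(rate=10, burst=5, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(6)])  # [True, True, True, True, True, False]
fake_now[0] += 0.5  # half a second later, 5 tokens have been refilled
print(bucket.allow())  # True
```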
Part of the role of any managed service provider managing cloud services is to guide their customers through the process of creating and evaluating cloud cost models. This is important whether migrating to the cloud, re-evaluating an existing cloud environment, or simply understanding a monthly cloud bill. Many customers may be more familiar with on-prem cost models, so relating to that mindset is crucial. Here are a few important things to keep in mind when educating your customers about cloud costs.
1. Explain CapEx vs. OpEx
One of the biggest shifts in mentality that must be made when evaluating cloud cost models is the move from predominantly Capital Expenditures with on-prem workloads to predominantly Operational Expenditures with cloud workloads.
“It’s been a challenge educating our team on the cloud model. They’re learning that there’s a direct monetary impact for every hour that an idle instance is running.”
Another contact added: “The world of physical servers was all CapEx driven, requiring big up-front costs, and ending in systems running full time. Now the model is OpEx, and getting our people to see the benefits of the new cost-per-hour model has been challenging but rewarding.”
Deploying a project in a private cloud involves lots of up-front purchases and ongoing maintenance, including servers, power, hardware, buildings, and more. On top of the actual purchase cost, you must account for amortization, depreciation, and the opportunity cost of those purchases.
Cloud workloads often work on a pay-as-you-go model, where you pay only for what services and features you use and how long you use them. This provides organizations with almost no Capital Expenditures for these resources, but results in a dramatic increase in Operational Expenditures. Neither is necessarily a bad thing, but your job as an MSP is to clearly articulate this shift so the customer can understand why the ongoing costs appear so much higher. And, of course, you’ll have to incorporate your own value into the equation.
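A small worked comparison can help a customer see why cloud bills look higher month to month even when total cost is not. The sketch assumes a fixed three-year horizon, spreads total cost evenly across it, and uses made-up prices; it ignores financing, tax treatment of depreciation, opportunity cost, and your own MSP fees, all of which a real model must include.

```python
def onprem_costs(purchase_price: float, monthly_overhead: float, months: int) -> dict:
    """CapEx model: a large up-front purchase, then only overhead on the monthly bill."""
    total = purchase_price + monthly_overhead * months
    # "amortized_monthly" spreads the full cost evenly over the horizon.
    return {"up_front": purchase_price, "monthly_bill": monthly_overhead,
            "total": total, "amortized_monthly": total / months}


def cloud_costs(hourly_rate: float, hours_per_month: float, months: int) -> dict:
    """OpEx model: nothing up front; every running hour shows up on the monthly bill."""
    bill = hourly_rate * hours_per_month
    total = bill * months
    return {"up_front": 0.0, "monthly_bill": bill,
            "total": total, "amortized_monthly": total / months}


# Hypothetical figures for one server-equivalent over three years.
onprem = onprem_costs(purchase_price=9000, monthly_overhead=150, months=36)
cloud = cloud_costs(hourly_rate=0.40, hours_per_month=730, months=36)

for name, c in (("on-prem", onprem), ("cloud", cloud)):
    print(f"{name:8} up front ${c['up_front']:>9,.2f}  "
          f"monthly bill ${c['monthly_bill']:>7,.2f}  "
          f"amortized ${c['amortized_monthly']:>7,.2f}")
```

With these numbers, the cloud's monthly bill ($292) is nearly double the on-prem overhead ($150), yet its amortized cost is lower ($292 vs. $400 per month), because the $9,000 purchase never appears on a monthly bill.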
2. Make Sure Your Clients Understand Their Cloud Bill Breakdown
For on-prem services, the cost model doesn’t usually need to account for what software or service is actually running on the physical machine. A database server and a web server may have different specs, but everything is normalized to the physical hardware, purchased as a one-time fee. This makes the calculations simpler, though they must still account for all the additional physical factors like power, air conditioning, redundancy, cabling, racks, and maintenance.
Cloud services not only charge based on time used, but also have very different costs for each service. A database server and a web server are going to have very different cost structures, and will show up on your monthly bill as separate items. This often makes the bill look much more complex, but the flip side of that is that you have many opportunities for optimization and cost allocation.
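Because each cloud service is priced separately, usually by time used, a bill is easier to explain once it is rolled up by service. The sketch below uses made-up rates and hours purely to show the mechanics; real rates vary by service, instance type, and region.

```python
from collections import Counter

# Hypothetical usage: (service, resource, hourly rate in USD, hours run this month).
usage = [
    ("database", "orders-db", 0.68, 730),
    ("compute", "web-1", 0.10, 730),
    ("compute", "web-2", 0.10, 300),
    ("load balancer", "web-lb", 0.025, 730),
]


def bill_by_service(records) -> Counter:
    """Price each resource separately (rate x hours), then roll the charges up by service."""
    totals = Counter()
    for service, _resource, rate, hours in records:
        totals[service] += rate * hours
    return totals


for service, total in bill_by_service(usage).most_common():
    print(f"{service:<14} ${total:>8,.2f}")
```

Broken down this way, the line items also point to optimization questions, such as whether the database really needs to run all 730 hours of the month.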
3. Be the Authority on IT Costs
Creating cloud cost models for your customers can require a big mental shift from other cost models, but it’s an important step for current and future IT projects. The available options, their costs, and the expected usage are all factors. Make sure to convey each of these clearly to your client’s stakeholders to avoid a surprise bill at the end of each month.
Ultimately, the market for cloud managed services is growing, which is good for managed service providers. As customers migrate to the cloud, they will need cost optimization expertise, which is a great angle for MSPs to get a foot in the door.
AWS optimization might be on your mind if you saw last week’s headlines that Lyft has committed to spend $300 million with Amazon Web Services (AWS) per year over the next three years. This information was revealed in Lyft’s IPO prospectus, filed last Friday.
Lyft isn’t the first startup to draw attention for its massive public cloud bills; Snap’s and Spotify’s Google Cloud bills are two other examples.
And this level of spend is no surprise, either. Lyft was born and scaled to “unicorn” status in the cloud, from the first three EC2 servers that powered their first ride to the massive infrastructure of microservices that now powers the ride sharing giant. The question is, how do they use those resources efficiently — with a mindset of AWS optimization?
How Lyft is Already Optimizing AWS
Several case studies from AWS as well as an AWS press release put out last week tell us how Lyft is already using cloud services – and give us insight into how they’re already well-versed in AWS optimization.
The fact that Lyft has such commitments at all tells us that they’re taking advantage of AWS’s Enterprise Discount Program (EDP), as we would expect for any company with that scale of infrastructure. An EDP is a private agreement with AWS: a minimum spend commitment in exchange for discounted pricing. It’s a smart move, as Lyft anticipates no slowdown in its use of AWS.
2. Auto Scaling
When you learn that Lyft does eight times as many rides on a Saturday night as they do on Sunday morning, you realize the importance of auto scaling – scaling up to meet demand, and back down when the infrastructure is no longer needed.
3. Spot Instances
AWS has a published case study with Lyft about their use of Spot Instances – AWS’s offering of spare capacity at steeply discounted prices, which are interruptible and therefore only useful in certain circumstances. By using Spot Instances for testing, Lyft reduced testing costs by 75%.
4. Microservices Architecture
Lyft runs more than 150 microservices that use Amazon DynamoDB, Amazon EKS, and AWS Lambda — allowing individual workloads to scale as needed for the myriad processes involved in the on-demand ride sharing service.
5. Pre-Built Container Configuration
In addition to Amazon EKS, Lyft uses Amazon Elastic Container Registry (ECR) to store container images and deliver them to test and deployment systems. They likely have a good start in the battle for container optimization, though this market in general will mature greatly this year, so it’s something they’re sure to keep optimizing.
Things Lyft Needs to Do to Keep their Infrastructure Optimized
The case studies and press releases mentioned above, as well as Lyft’s own engineering blog, give some insight into their tech stack and processes. Beyond that, there are several things they may well be focusing on that we would highly recommend as they continue to scale (and IPO):
Many cloud customers we talk to name governance as their top priority. Automated policies and user roles are key for ensuring that no one can spend outside their bounds. Sometimes, it’s as simple an idea as proper tagging – but one that can set automated processes in motion to assign resource access to team members, proper on/off schedules for non-production resources, and configuration management processes.
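Tag-driven governance can be as simple as a function that reads a resource's tags and decides whether it should be running. Here is a minimal sketch, assuming hypothetical `environment` and `schedule` tag keys and a single office-hours schedule; a real system would then call the cloud provider's start/stop APIs and handle time zones, which this sketch ignores.

```python
from datetime import datetime

# Hypothetical schedules a team could reference from a "schedule" tag.
# Weekdays use Python's convention: Monday == 0 ... Sunday == 6.
SCHEDULES = {
    "office-hours": {"days": range(0, 5), "start_hour": 7, "end_hour": 19},
    "always-on": None,
}


def should_run(tags: dict, now: datetime) -> bool:
    """Decide from a resource's tags whether it should be running at `now`.

    Production resources, untagged resources, and unknown schedule names are
    left running, so a tagging mistake never shuts something down by accident.
    """
    if tags.get("environment") == "prod":
        return True
    schedule = SCHEDULES.get(tags.get("schedule", "always-on"))
    if schedule is None:
        return True
    return (now.weekday() in schedule["days"]
            and schedule["start_hour"] <= now.hour < schedule["end_hour"])


dev_box = {"environment": "dev", "schedule": "office-hours"}
print(should_run(dev_box, datetime(2019, 3, 6, 10)))  # Wednesday 10:00 -> True
print(should_run(dev_box, datetime(2019, 3, 9, 10)))  # Saturday 10:00 -> False
```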
2. Resource Rightsizing
Our recent research showed that average CPU utilization for the instances in our data set (which leaned non-production) was less than 5%. Given that going one instance size down can save 50% of the cost, and two sizes can save 75%, this is a huge area for optimization that we recommend cloud users of all sizes focus on this year. At Lyft’s scale, this will require automated policies to resize underutilized resources automatically.
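The 50% and 75% figures follow from each size step within an instance family roughly doubling the price, so moving n sizes down saves 1 - 0.5^n of the cost. Here is a quick sketch of that arithmetic; the hourly price is hypothetical, and real price ratios between sizes can deviate slightly from exact halving.

```python
HOURS_PER_MONTH = 730  # common approximation of hours in a month


def savings_fraction(steps_down: int) -> float:
    """Fraction of cost saved by moving `steps_down` sizes down within a family,
    assuming each size step halves the price."""
    return 1 - 0.5 ** steps_down


def monthly_savings(hourly_price: float, steps_down: int) -> float:
    return hourly_price * savings_fraction(steps_down) * HOURS_PER_MONTH


print(savings_fraction(1), savings_fraction(2))  # 0.5 0.75
# A hypothetical $0.384/hour instance that could drop two sizes:
print(f"${monthly_savings(0.384, 2):,.2f} saved per month")
```

Multiplied across hundreds of instances, this is why automated rightsizing policies matter at Lyft's scale.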
3. Continuous Evaluation of Microservices
With 150 microservices, blanket policies won’t apply to all cases. Each microservice needs to be evaluated against newer AWS offerings and cost control techniques on an individual basis. Once each of the 150 has been evaluated, it’s time to go back to the beginning of the list and start again — a mindset of continuous cost control would serve them well.
Lyft has gotten this far built and grown on AWS — and their “culture of cloud” has enabled the growth in platform adoption that has brought them to the brink of IPO. One thing is clear: up to this point, growth at any cost has been the goal. That means that the mere amount of cloud spend has not been of huge concern. As they transition into being a public company, margins and profit will start to matter more, which will bring costs into focus. It will soon be important for Lyft to continually optimize infrastructure – in the cloud and across the board.
One of the terms we have been hearing used more often when talking to prospects and customers alike is Cloud Center of Excellence (CCoE). DevOps, CloudOps, Infrastructure and Finance teams are joining together to create a cloud center to improve cloud operations in the enterprise. These are also known as a Cloud Command Center, Cloud Operations Center, Cloud Knowledge Center, or perhaps Cloud Enablement Team.
Essentially, a CCoE brings together a cross-functional team to manage cloud strategy, governance, and best practices, and serve as cloud leaders for the entire organization.
Who Needs a Cloud Center of Excellence?
When we talk to prospects and customers that have adopted a CCoE, there seem to be a couple of common themes:
Cloud-centric organizations where the DevOps, Security and Finance teams want to ensure that the organization’s diverse set of business units are using a common set of best practices, as no one wants the wild west for cloud management
Large organizations that are now multi-cloud and need to standardize on a set of tools and processes that work across the CSPs for security, governance, operations, and cost control
MSPs who are developing cloud centers focused on creating best practices for their customers, for both single and multi-cloud; for example, you would have an Azure Cloud Center of Excellence (ACCoE) or a Google Cloud Center of Excellence (GCCoE)
For more, see this presentation from Zendesk and CloudHealth from AWS re:Invent 2018 to understand how a large, cloud-centric organization leverages the CCoE concept to improve governance and operational efficiency.
What Should the Cloud Center of Excellence Prioritize?
No matter why you have established a cloud center within your organization, there are a few important priorities in order to make your effort a success:
Interdepartmental Communication — the CCoE serves as a bridge between departments that use, measure, or fund cloud operations. All of these departments and stakeholders need to be on the same page about goals, timelines, and budgets for cloud operations, which is the entire idea of establishing a CCoE.
Technology Expertise — as a resource and driver of innovation throughout the organization, the CCoE must be the organization’s experts on the cloud technology it uses. Given the rate of innovation by the cloud providers, this requires dedicated time and effort.
Governance — there are two major elements important for governance: authority and standardization. In order for the CCoE to be effective, it needs to be granted authority to set policies and standards for cloud security, compliance, and cost control — with the expectation that people throughout the organization will follow these policies. Once that authority is held, the CCoE needs to set, communicate, and enforce the policy standards as an initial priority.
Repeatability and Automation — once policies are established, it’s time to make deployment processes repeatable with reference architectures, and to get tools and platforms in place for governance and cost control.
End-User Buy-In — we all know that if a developer doesn’t want to do something, they’re probably not going to do it. Developing a sense of engagement, if not exactly excitement, is important for your new structure to succeed. Several of our customers with cloud centers regularly host tech talks, brown bag lunches, and other learning experiences to promote buy-in and adoption of tools and processes.
Call it What You Want: A Dedicated Effort is Key
Maybe Cloud Center of Excellence is too cheesy a phrase for your taste. What matters is cross-departmental collaboration and standardizing a plan for cloud migration, growth, and management.
Is your organization using a Cloud Center of Excellence model? How’s it going? We’d love to hear in the comments below!
Amazon Web Services (AWS) provides a treasure trove of documents and CloudFormation templates in their AWS Solutions portal, including AWS right sizing, the AWS instance scheduler, a chatbot framework, and more. These solutions can be used as-is for immediate integration into your existing environment, or can be the starting point for developing your own unique toolsets. Today, we’re reviewing the AWS Right Sizing tool to see how much it can help you optimize your infrastructure.
AWS Right Sizing: What It Does
AWS offers a variety of types and sizes of EC2 instances. That means that it’s perfectly possible to select an instance type that’s too large for your actual needs, which means you’ll be paying more than necessary. In fact, the data shows that this is happening most of the time. The AWS Right Sizing tool exists to help users find the correct instance size to meet their needs at the lowest cost.
The tool uses a CloudFormation template that deploys the infrastructure and scripts needed to make right sizing recommendations for your AWS account. This infrastructure includes an EC2 instance that runs Python scripts, a 2-node Redshift cluster for the right sizing analysis, and an S3 bucket for the raw CloudWatch data and the final CSV output. The total cost of this deployment in the us-east-1 region is $0.65 per hour.
The basis of the right sizing logic is to look at the Max CPU from the past 2 weeks of CloudWatch data for each EC2 instance. If the max CPU is above 50% at any point, then it will not recommend a change, but if it is always below 50% then it will attempt to find the cheapest instance size that matches the I/O, memory, network, and at least the max CPU that was found. The final output is a CSV file that includes information about the existing instance sizes, the utilization of those instances, the recommended instance size, and the cost saved per month.
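The decision rule described above can be sketched in a few lines. This is not the tool's actual script; it is a simplified reading of the documented logic that assumes the two-week maximum CPU has already been pulled from CloudWatch, interprets "at least the max CPU" as enough vCPUs to cover the observed peak, and skips the I/O and network checks. The instance names and prices are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


def recommend(current: InstanceType, max_cpu_pct: float,
              catalog: List[InstanceType], threshold: float = 50.0) -> Optional[InstanceType]:
    """Simplified reading of the right sizing rule described above.

    If the peak CPU over the look-back window reached `threshold`, keep the
    instance (return None). Otherwise return the cheapest cheaper type that has
    at least the current memory and enough vCPUs to absorb the observed peak.
    I/O and network matching are deliberately left out of this sketch.
    """
    if max_cpu_pct >= threshold:
        return None
    vcpus_needed = current.vcpus * max_cpu_pct / 100
    candidates = [t for t in catalog
                  if t.memory_gib >= current.memory_gib
                  and t.vcpus >= vcpus_needed
                  and t.hourly_usd < current.hourly_usd]
    return min(candidates, key=lambda t: t.hourly_usd, default=None)


# Hypothetical catalog; names and prices are illustrative only.
catalog = [
    InstanceType("general.large", 2, 8, 0.10),
    InstanceType("general.xlarge", 4, 16, 0.20),
    InstanceType("memory.large", 2, 16, 0.13),
    InstanceType("memory.xlarge", 4, 32, 0.26),
]
current = catalog[1]
choice = recommend(current, max_cpu_pct=30.0, catalog=catalog)
if choice:
    saved = (current.hourly_usd - choice.hourly_usd) * 730
    print(f"{current.name} -> {choice.name}, about ${saved:,.2f}/month")
```

Like the tool itself, this rule only ever recommends downsizing, which is one reason it cannot help an instance that needs to grow.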
Worth the hassle?
Based on the logic above, the AWS Right Sizing tool does a very basic level of recommendation for instance sizing. There are a few scenarios where these recommendations may not be helpful, such as applications that are memory-intensive or cases where the instance needs to be a larger size than it currently is. The tool also only spits out a CSV with the recommendations, which means you still have to make decisions and take actions based on those recommendations. The CSV file looks like this:
If those recommendations don’t fit what you’re looking for, the nice thing is that AWS offers the full stack, along with all scripts and CloudFormation templates, as an open-source repository. This means you can take the core of the recommendation engine and tweak it to follow your own logic for customized recommendations, or even use it to trigger the resizing of the instance. AWS also offers Trusted Advisor as a part of their Business-level and Enterprise-level support plans, which can offer right sizing recommendations in real time (amongst other health checks and recommendations).
Overall, this AWS right sizing tool can either be a useful check-up tool for your environment, or the basis for your own cost-optimization initiative, but many users will want a more out-of-the-box automated solution for this.
Since changing server sizes and timing this with maintenance windows can be a hassle, ParkMyCloud has introduced a feature to automate the resizing of your EC2 instances. Interested? Check it out with a free trial.