AWS optimization might be on your mind if you saw last week’s headlines that Lyft has committed to spend at least $300 million with Amazon Web Services (AWS) over the next three years. This information was revealed in Lyft’s IPO prospectus, filed last Friday.
This level of spend is no surprise. Lyft was born and scaled to “unicorn” status in the cloud, from the three EC2 servers that powered their first ride to the massive infrastructure of microservices that now powers the ride-sharing giant. The question is: how do they use those resources efficiently, with a mindset of AWS optimization?
How Lyft is Already Optimizing AWS
Several case studies from AWS as well as an AWS press release put out last week tell us how Lyft is already using cloud services – and give us insight into how they’re already well-versed in AWS optimization.
1. Enterprise Discount Program
The fact that Lyft has such commitments at all tells us that they’re taking advantage of AWS’s Enterprise Discount Program (EDP), as we would expect for any company with that scale of infrastructure. An EDP is a private agreement with AWS that trades a minimum spend commitment for discounted pricing – a smart move, as Lyft anticipates no slowdown in its use of AWS.
2. Auto Scaling
When you learn that Lyft does eight times as many rides on a Saturday night as they do on Sunday morning, you realize the importance of auto scaling – scaling up to meet demand, and back down when the infrastructure is no longer needed.
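To make the scale-up and scale-down idea concrete, here is a minimal sketch of the sizing arithmetic behind an auto scaling policy. It is not Lyft's configuration or an AWS API: the function name, the request rates, the per-instance capacity, and the minimum and maximum bounds are all hypothetical values chosen to mirror the eight-to-one ratio above.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance,
                      min_instances=2, max_instances=100):
    """Return the instance count needed for the load, clamped to [min, max].

    All parameters are illustrative assumptions, not values from Lyft or AWS.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Hypothetical loads with an 8x difference between quiet and peak periods.
sunday_morning = desired_instances(500, capacity_per_instance=100)
saturday_night = desired_instances(4000, capacity_per_instance=100)
print(sunday_morning, saturday_night)
```

Under these assumed numbers, the fleet that serves the Saturday peak is eight times larger than what Sunday morning needs, which is the capacity a fixed-size deployment would leave idle.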
3. Spot Instances
AWS has a published case study with Lyft about their use of Spot Instances – AWS’s offering of spare capacity at steeply discounted prices, which are interruptible and therefore only useful in certain circumstances. By using Spot Instances for testing, Lyft reduced testing costs by 75%.
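As a rough illustration of what a 75% reduction means in practice, the sketch below compares the same test workload priced at an On-Demand rate and at an effective 75% discount. The instance-hours and hourly rate are hypothetical, and treating the savings as a flat discount is a simplifying assumption; real Spot prices vary by instance type, region, and time.

```python
def monthly_cost(instance_hours, on_demand_rate, spot_discount=0.0):
    """Cost of a workload given an hourly rate and a fractional discount.

    spot_discount=0.75 models an effective 75% saving (an assumption).
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    return instance_hours * on_demand_rate * (1 - spot_discount)


# Hypothetical test fleet: 10,000 instance-hours at $0.50 per hour.
on_demand = monthly_cost(10_000, 0.50)
with_spot = monthly_cost(10_000, 0.50, spot_discount=0.75)
print(f"On-Demand: ${on_demand:,.2f}  Spot: ${with_spot:,.2f}")
```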
4. Microservices Architecture
Lyft runs more than 150 microservices that use Amazon DynamoDB, Amazon EKS, and AWS Lambda — allowing individual workloads to scale as needed for the myriad processes involved in the on-demand ride sharing service.
5. Pre-Built Container Configuration
In addition to Amazon EKS, Lyft uses Amazon Elastic Container Registry (ECR) to store container images and deliver these images to test and deployment systems. They likely have a good start on the battle for container optimization, though in general, this market will mature greatly this year – so it’s something they’re sure to continue to optimize.
Things Lyft Needs to Do to Keep their Infrastructure Optimized
The case studies and press releases mentioned above, as well as Lyft’s own engineering blog, give some insight into their tech stack and processes. Beyond that, there are several things they may well be focusing on already, and that we would highly recommend as they continue to scale (and IPO):
1. Governance
Many cloud customers we talk to name governance as their top priority. Automated policies and user roles are key for ensuring that no one can spend outside their bounds. Sometimes, it’s as simple an idea as proper tagging – but one that can set automated processes in motion to assign resource access to team members, apply proper on/off schedules to non-production resources, and drive configuration management processes.
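A tag-driven policy of this kind can be sketched in a few lines. Everything in the example is a hypothetical assumption for illustration: the required tag keys, the resource dictionaries, and the weekday 08:00 to 18:00 schedule for non-production resources. It does not call any AWS API.

```python
# Hypothetical tagging policy: every resource must carry these tag keys.
REQUIRED_TAGS = {"owner", "environment", "team"}


def missing_tags(resource):
    """Return the set of required tag keys the resource does not have."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))


def should_be_running(resource, hour, weekday):
    """Decide whether a resource should be on at a given time.

    Production is always on. Anything else (including untagged resources)
    runs only Monday to Friday, 08:00-18:00. weekday uses 0 for Monday.
    """
    env = resource.get("tags", {}).get("environment")
    if env == "production":
        return True
    return weekday < 5 and 8 <= hour < 18


staging_box = {"id": "i-example", "tags": {"owner": "alice", "environment": "staging"}}
print(missing_tags(staging_box))                            # tags still to add
print(should_be_running(staging_box, hour=22, weekday=2))   # off at night
```

The point of the sketch is that the tag is the only input: once the environment tag is reliable, schedules and access rules can be derived from it automatically.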
2. Resource Rightsizing
Our recent research showed that average CPU utilization for the instances in our data set (which leaned non-production) was less than 5%. Given that going one instance size down can save 50% of the cost, and two sizes can save 75%, this is a huge area for optimization that we recommend cloud users of all sizes focus on this year. At Lyft’s scale, this will require automated policies to resize underutilized resources automatically.
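The 50% and 75% figures follow from each step down in instance size roughly halving the price. The sketch below shows that arithmetic, plus a deliberately simple rule for how many sizes down an instance might go. The 40% target utilization, the cap of three steps, and the assumption that utilization doubles when capacity halves are all hypothetical simplifications, not a recommendation engine.

```python
def savings_fraction(sizes_down):
    """Fraction of cost saved, assuming each size down halves the price."""
    return 1 - 0.5 ** sizes_down


def sizes_down_for(avg_cpu_pct, target_cpu_pct=40.0, max_steps=3):
    """Count how many sizes down keep projected CPU at or below the target.

    Assumes utilization doubles each time capacity is halved (a rough model).
    """
    steps = 0
    projected = avg_cpu_pct
    while steps < max_steps and projected * 2 <= target_cpu_pct:
        projected *= 2
        steps += 1
    return steps


print(savings_fraction(1), savings_fraction(2))   # 0.5 0.75
steps = sizes_down_for(5.0)                       # instance averaging 5% CPU
print(steps, savings_fraction(steps))
```

Under these assumptions, an instance averaging 5% CPU could drop three sizes and still sit at a projected 40% utilization.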
3. Continuous Evaluation of Microservices
With 150 microservices, blanket policies won’t apply to all cases. Each microservice needs to be evaluated against newer AWS offerings and cost control techniques on an individual basis. Once each of the 150 has been evaluated, it’s time to go back to the beginning of the list and start again — a mindset of continuous cost control would serve them well.
Lyft was built on AWS and has grown this far on it, and their “culture of cloud” has enabled the growth in platform adoption that has brought them to the brink of IPO. One thing is clear: up to this point, growth at any cost has been the goal. That means that the sheer amount of cloud spend has not been of huge concern. As they transition into being a public company, margins and profit will start to matter more, which will bring costs into focus. It will soon be important for Lyft to continually optimize infrastructure – in the cloud and across the board.