When approaching new problems, such as cost optimization or task automation, development and IT teams are faced with the decision to buy vs. build a solution. There are a number of financial and strategic factors to consider when determining the best choice in each case, which can be difficult to parse through. Here are our tips for building a buy vs. build business case, whether for your own use or to present to management.
Reasons to Build Your Own Solution
1. An off-the-shelf product doesn’t exist to solve your problem. If you can’t buy a product, or hack together several different existing solutions, you are probably going to have to build your own software. There is not too much “blue ocean” left out there, but if you have a need and no product can solve it, then it can make sense. Be wary and make sure you’ve completed your research before determining this is the case: perhaps the solution is called something other than what you’re searching, or exists as part of a larger suite of offerings.
2. It will provide you with a significant competitive advantage over your rivals. This typically requires unique IP (some special sauce) that you can build into the product which other existing products can not offer and which will help your company succeed.
3. You can see a business opportunity whereby not only can you use the product yourself in-house, but you will also be able to offer it to your customers, thus leveraging your company’s investment.
4. You have a team of engineers sitting on the bench with nothing better to do (i.e. minimal opportunity cost). This does actually happen from time-to-time and such a project can make them productive.
5. The specialist knowledge already exists within the company and a natural product owner exists. This is not reason enough to decide to build, but without it, things are likely more difficult.
Reasons to Buy Pre-Built Solutions
1. Building software is complex and expensive. If this is a software product that you are going to roll out across the enterprise, it will require support and likely a commitment for the life of the product to feature updates and improvements.
2. Supporting products that your team might build is a significant commitment and typically is where the ‘big bucks are spent’. An MVP style product is unlikely to keep the masses happy for long, and you will need to budget for ongoing updates, improvements, patching and support. This typically multiplies the cost of building v1.0.
3. Commercializing a product built primarily for in-house usage is a great theory but in reality rarely works. Such examples do exist but are few and far between. Building a new product company requires a lot more than just technology and execution risk is high unless it is to become the #1 priority for your company.
4. A long time to value of a new product venture means that you are often missing out on significant value which would be realized if an existing ‘off the shelf’ (today that often means a SaaS solution) were selected.
5. Enterprise-grade software comes with the bells and whistles that enterprises need. This typically means lots of points of integration, single sign-on requirements, and security as a given. Home-baked products typically do not include these items which are considered ‘added extras’ and not core to solving the problem at hand.
Create Your Business Case
If you work in an organization with access to technical resources (which today includes a lot of companies), there is often a desire to build because “they can” and a sense that they can meet the needs in a more custom manner solving the precise needs of their organization. Even if the opportunity cost of diverting resources away from other projects is low, there can be a tendency to overlook to include the longer term maintenance, upgrade, and support requirements of enterprise-grade software. Additionally, we often encounter companies who have started on the journey toward building an in-house solution, only to discover additional complexity or seeing internal priorities change. In such cases, even when there are significant sunk costs, reappraising alternative paths and third-party solutions can still make sense.
Ultimately, every case is unique and weighing the relative pros/cons and building the business case to buy vs. build will require considering both financial and non-financial aspects to help the right decision is made.
Have you been hearing a lot about Azure Databricks lately? We have. One of the nice things about talking with ParkMyCloud users is that we get to see trends often before they are more widely recognized within the industry. Whether it is adoption of new instances or databases, or usage of new tools and services it’s always interesting to see change occur.
What is Databricks?
One such change over the last year or so has been an enormous increase in the use of very short-lived instances, typically less than 60 minutes, which get spun up as part of clusters. These are in fact Databricks being used to undertake data analytics workloads. I had come across Databricks in relation to their unicorn status in the startup world – as of six months ago were valued at close to $4B – so I guess it was only a matter of time before we began to see the fruits of their labor become popular.
The Databricks story is an interesting one which begins at UC Berkeley with the development of a research project, Apache Spark in 2009. Apache Spark is described as a unified analytics engine for large-scale data processing. It provides an extremely rapid cluster computing technology, designed for fast computation. The team who developed Spark went on to found Databricks in 2013 since which time they have raised $500MM in funding.
The Databricks platform allows enterprises to build their data pipelines across data storage systems and prepare data sets for data scientists and engineers. To do this, Databricks offers a range of tools for building, managing and monitoring data pipelines. It enables the building of machine learning (ML) models, which have grown in parallel with the growth in big data within the enterprise.
The product also has an interesting approach to pricing with the introduction of their own usage-based billing methodology based on DBU’s. A DBU is a Databricks Unit (DBU) which is a unit of processing capability per hour, billed on per-second usage. This cost excludes the cost of the underlying instance (VM). The good thing is that the model is very transparent and provides a number of pricing options and tiers. Based on the tier and type of service required prices range from $0.07/DBU for their Standard product on the Data Engineering Light tier to $0.55 for the Premium product on the Data Analytics tier. Helpfully, they do offer online calculators for both Azure and AWS to help estimate cost including underlying infrastructure. The Azure Databricks pricing example can be seen here.
Databricks + Microsoft = Azure Databricks
A major breakthrough for the company was a unique partnership with Microsoft whereby their product is not just another item in the MS Azure Marketplace but rather is fully integrated into Azure with the ability to spin up Azure Databricks in the same way you would a virtual machine. Once running, the service can scale automatically as the users need change in the same way cloud is able to scale using autoscaling groups to match supply against demand.
Databricks are also available for other public cloud vendors, most notably AWS (available within the Marketplace). However, the level of integration is not the same as on Azure, and the service looks much more like a standard AWS marketplace offering.
Why More and More Companies are Using Azure Databricks
What is clear is that opportunities for use of ML and AI has progressed from experimentation to workloads, and these workloads are now at a massive scale. This has also been accompanied by the emergence of a new subset of DevOps called AIOps, which makes a lot of sense given the amount of infrastructure and services now needing to be configured and deployed to run such workloads.
In a forthcoming blog we will dig a little deeper in terms of the usage patterns for such workloads and the changes in terms of the way organizations running these workloads are now utilizing the public cloud for these non-production workloads.
AWS Firecracker was announced at AWS re:Invent in November 2018 as a new AWS open source virtualization technology. The technology is purpose-built for creating and managing secure, multi-tenant container and function-based services. It was described by the AWS Chief Evangelist Jeff Barr as “what a virtual machine would look like if it was designed for today’s world of containers and functions.”
What is AWS Firecracker?
Firecracker is a Virtual Machine Manager (VMM) exclusively designed for running transient and short-lived processes. In other words, it helps to optimize the running of functions and serverless workloads. It’s also an important new component in the emerging world of serverless technologies and is used to enhance the backend implementation of Lambda and Fargate. Firecracker helps deliver the speed of containers combined with the security of VMs. If you use Lambda or Fargate, you’re already receiving the benefits of Firecracker. However, if you run/orchestrate a large volume of containers, you should take a look at this service with optimization in mind.
How AWS Firecracker Creates Efficiencies
AWS can realize the economic benefits of Firecracker by creating what they call “microVMs”, which allows them to spread serverless workloads around multiple servers thus getting a greater ROI from its investment in the servers behind serverless. In terms of customer benefit, using Firecracker enables these new microVMs to launch in 125 milliseconds or less, compared to the seconds (or longer) it can take to launch a container or spin up a traditional virtual machine. In a world where thousands of VMs can be spun up and down to tackle a specific workload, this will constitute a significant savings. And remember, these are fully fledged micro virtual machines, not just containers.The micro VM’s themselves are worth a closer look as each includes an in-process rate limiter to optimize shared network and storage resources. As a result, one server can support thousands of microVMs with widely varying processor and memory configurations.\
There is also the enhanced security and workload isolation only available from Kernel-based Virtual Machine (KVMs) – more secure than containers, which are less isolated. One particularly valuable security feature is that Firecracker is statically linked, which means all the libraries it needs to run are included in its executable code. This makes new Firecracker environments safer by eliminating outside libraries. Altogether, this offering and the combination of efficiency, security and speed created quite the buzz at the AWS re:Invent launch.
Will Firecracker make a “bang”?
There are a few caveats related to the still novel aspects of the technology. In particular, compared to alternatives, such as containers or Hyper-V VMs, it is prudent to confine to non-production workloads as the technology is still new and needs to be more fully battle-tested for production use.
However, as confidence, adoption, and experience grow in the use of serverless technologies it certainly seems like Firecracker can offer a popular new method for provisioning compute resources and will likely help bridge the current gap between VMs and containers.
Once or twice a year we like to take a look at what is going on in the world of reserved instance pricing. We review both the latest offerings and options put out by cloud providers, as well as how users are choosing to use Reserved Instances (AWS), Reserved VMs (Azure) and Committed Use (Google Cloud).
A good place to start when it comes to usage patterns and trends is the annual Rightscale (Flexera) State of Cloud Report. The 2019 report shows that current reservation usage stands at 47% for AWS, 23% for Azure and 10 percent of GCP. These are some interesting data when you view them alongside companies overall reporting that their number one cloud initiative for the coming year is optimizing their existing use of the cloud. All of these cloud providers have a major focus on pre-selling infrastructure via their reservations programs as this provides them with predictable revenue (something much loved by Wall St) plus also allows them to plan for and match supply with demand. In return for an upfront commitment they offer discounts of ‘up to 80%”, albeit much as your local furniture retailer has big saving headlines, these discount levels still warrant further investigation.
While working on an upcoming new feature release we began to dig a little deeper into the nature of current reserved instance pricing and discounts. From our research it appears that a real world discount level is in the 30%-50% range. To achieve some of the much higher level discounts you might see the cloud providers pushing, typically requires commitments of three years; being restricted to only certain regions; restrictions on OS types; and generally a willingness to commit to spending a few million dollars.
Reservation discounts, while not as volatile as spot instances, do change and need to be carefully monitored and analyzed. For example as of this writing, one of the more popular modern m5.large instance types in a US East Region costs $0.096 per hour when purchased on demand, but reduces to $0.037, a significant 62% saving. However, to secure such a discount requires a three-year commitment and prepayment in full up front. While the numbers of such organizations committing to contracts of this nature is not publicly known, it is likely that only the most confident of organizations with large cash reserves would be positioned to make a play like this.
Depending on the precise program used to purchase the reservations, there can be certain options to either convert specific instance families, instance types and OS’s for other types or even to resell the instances on a secondary exchange for a penalty fee of 12%, on AWS for example. Or to terminate the agreement for the same 12% fee on Azure. GCP’s Committed Use program seems to be the most stringent as there is no way to cancel the contract or resell pre-purchased instances, albeit Google does not offer a pre-purchase option.
As the challenge of optimizing cloud spend has slowly moved up the priority list to take the #1 slot, so has a maturation process taken place inside organizations when it comes to undertaking economic analysis and understanding the various tradeoffs. Some organizations are using tools to support such analysis, others are hiring consultants or using in house analytics resources. Whatever the approach in terms of analyzing an organization’s use of cloud, this typically requires looking at balancing the purchase of different types of reservations, spot instances or using on-demand infrastructure that is highly optimized through automation tools. Whatever the approach, the level of complexity in such analysis is certainly not reducing, and mistakes are common. However, the potential savings are significant if you achieve the right balance and is clearly something you should not ignore.
The relative balance between the different options to purchase and consume cloud services in many ways reflects the overall context within which organizations operate, their specific business models and broader macro issues such as the outlook for the overall economy. Understanding the breadth of options is key and although for most organizations, reservations are likely to be a key component it is worth digging into just how large the relative trade offs might be.
As container technology moves past something new and into the mainstream, users are concerned about the next step: container optimization. In our conversations with customers and potential customers, containers have been a consistent topic for the last few years, typically focused on production environments. However, recent conversations have become more focused, specifically on how to optimize container spending.
Kubernetes – which seems to be the most popular of container services among our customer base – does allow for a number of ways to optimize for costs and to maximize performance. We have identified five specific opportunities ripe for container optimization. Take a look at these within your own environments.
1) Rightsize Your Pods
Kubernetes Pods are the smallest deployable computing units in the Kubernetes container environment. It is a common practice to use a standard template for limits and requests for pod provisioning. If requests describe the minimal requirement for the CPU and memory for a pod to be scheduled on a node, the limits describe the max amount of CPU and memory the pod can consume on that specific node. Typically engineers set the initial limits by using a rule of thumb, such as doubling it just to be on the safe side and then planning to change it later once they have some data to look at. As with many things in life, “later” rarely happens. As a result, the footprint of the cluster inflates over time, exceeding the actual demand for the services running inside the cluster.
Just think about it, if every pod is over-provisioned by 50% and the cluster is always is 80% full, that means that 40% of the cluster capacity is allocated but not used, or simply put — wasted.
2) Turn Off Idle Pods
Many standard instances/VMs and databases in non-production environments are idle outside of working hours and can be turned off or “parked”. The same case exists for Pods, which in non-production environments can and should be scheduled in the same way.
3) Rightsize Your Nodes
Too many worker nodes are the wrong size and type. Kubernetes permits co-allocating the applications on the same nodes, which can dramatically reduce the cloud bill. Yet, incorrectly sized instances and volumes can lead to the inflation of the cost of Kubernetes clusters. Rightsizing could save up to 50% (particularly if no previous action has been taken to rightsize your nodes.)
Another thing to consider is that smaller nodes have a higher relative OS footprint and increase management overhead. The smaller the node, the higher the number of stranded resources. Stranded resources are CPU or memory which are idle, yet cannot be allocated to any of the pods, because the pods which are to be scheduled are too big to claim it. If a pod’s sizes are close to the size of the node (server) the percentage of the resources which are stranded gets higher.
4) Consider Storage Opportunities
Out of the box, containers lose their data when they restart. This is fine for stateless components but becomes an issue when a persistent data store is required. One place to look for additional container optimization opportunities is the overprovisioning of persistent storage (EBS, Azure Storage Disks, etc) related to your containers. There are a number of options to optimize container storage, particularly virtualized storage that can be shared by multiple containers, and which persists over time, without being destroyed when individual containers are destroyed. There are a few different persistent-storage plugins and plugin-driven storage solutions available from third-party vendors.
5) Review Purchasing Options
All of the preceding options related to the actual configuration of your container infrastructure. Just as important as this is ensuring that your purchasing options closely align with your needs. Ensuring the correct instance/VM purchase type for your containerized infrastructure is critical to ensuring flexibility and maximizing ROI. Carefully analyze your purchasing options (e.g. on-demand, reservations and spot) to select the right option for your workload, both in terms of size and usage schedule. Note that reserved instances are not always the best option for resources that can be scheduled to be turned off. Leverage cost optimization tools to support the earlier options for instance scheduling and rightsizing. Such tools can often change the equation and help avoid lock-ins and upfront commitments.
Container Optimization is Just Another Kind of Resource Optimization
The opportunities to save money through container optimization are in essence no different than for your non-containerized resources. Native tools, from either the cloud provider or open source, can help with this, but their capabilities are limited. For a fully optimized environment, you’ll want to take advantage of the growing ecosystem of specific cost optimization tools.
Stay tuned for news from ParkMyCloud on this front coming soon!