Have you been hearing a lot about Azure Databricks lately? We have. One of the nice things about talking with ParkMyCloud users is that we get to see trends often before they are more widely recognized within the industry. Whether it is adoption of new instances or databases, or usage of new tools and services it’s always interesting to see change occur.
What is Databricks?
One such change over the last year or so has been an enormous increase in the use of very short-lived instances, typically less than 60 minutes, which get spun up as part of clusters. These are in fact Databricks being used to undertake data analytics workloads. I had come across Databricks in relation to their unicorn status in the startup world – as of six months ago were valued at close to $4B – so I guess it was only a matter of time before we began to see the fruits of their labor become popular.
The Databricks story is an interesting one which begins at UC Berkeley with the development of a research project, Apache Spark in 2009. Apache Spark is described as a unified analytics engine for large-scale data processing. It provides an extremely rapid cluster computing technology, designed for fast computation. The team who developed Spark went on to found Databricks in 2013 since which time they have raised $500MM in funding.
The Databricks platform allows enterprises to build their data pipelines across data storage systems and prepare data sets for data scientists and engineers. To do this, Databricks offers a range of tools for building, managing and monitoring data pipelines. It enables the building of machine learning (ML) models, which have grown in parallel with the growth in big data within the enterprise.
The product also has an interesting approach to pricing with the introduction of their own usage-based billing methodology based on DBU’s. A DBU is a Databricks Unit (DBU) which is a unit of processing capability per hour, billed on per-second usage. This cost excludes the cost of the underlying instance (VM). The good thing is that the model is very transparent and provides a number of pricing options and tiers. Based on the tier and type of service required prices range from $0.07/DBU for their Standard product on the Data Engineering Light tier to $0.55 for the Premium product on the Data Analytics tier. Helpfully, they do offer online calculators for both Azure and AWS to help estimate cost including underlying infrastructure. The Azure Databricks pricing example can be seen here.
Databricks + Microsoft = Azure Databricks
A major breakthrough for the company was a unique partnership with Microsoft whereby their product is not just another item in the MS Azure Marketplace but rather is fully integrated into Azure with the ability to spin up Azure Databricks in the same way you would a virtual machine. Once running, the service can scale automatically as the users need change in the same way cloud is able to scale using autoscaling groups to match supply against demand.
Databricks are also available for other public cloud vendors, most notably AWS (available within the Marketplace). However, the level of integration is not the same as on Azure, and the service looks much more like a standard AWS marketplace offering.
Why More and More Companies are Using Azure Databricks
What is clear is that opportunities for use of ML and AI has progressed from experimentation to workloads, and these workloads are now at a massive scale. This has also been accompanied by the emergence of a new subset of DevOps called AIOps, which makes a lot of sense given the amount of infrastructure and services now needing to be configured and deployed to run such workloads.
In a forthcoming blog we will dig a little deeper in terms of the usage patterns for such workloads and the changes in terms of the way organizations running these workloads are now utilizing the public cloud for these non-production workloads.