During its virtual Google Cloud Next ’20 “On Air” series, Google announced the introduction of BigQuery Omni. This is an extension of its existing BigQuery data analytics solution to now analyze data in multiple public clouds, currently including Google Cloud and Amazon Web Services (AWS), with Microsoft Azure coming soon. Powered by Google Cloud’s Anthos, and using a unified interface, BigQuery Omni allows developers to analyze data locally without having to move data sets between the platforms.
BigQuery Engine to Analyze Multi-Cloud Data
Google Cloud’s general manager and VP of engineering, Debanjan Saha, says “BigQuery Omni is an extension of Google Cloud’s continued innovation and commitment to multi-cloud that brings the best analytics and data warehouse technology, no matter where the data is stored.” And that, “BigQuery Omni represents a new way of analyzing data stored in multiple public clouds, which is made possible by BigQuery’s separation of compute and storage.”
According to Google Cloud, this provides scalable storage that can reside in Google Cloud or other public clouds, and stateless, resilient compute that executes standard SQL queries.
Google Cloud reports that BigQuery Omni will:
- Break down silos and gain insights on data with a flexible, multi-cloud analytics solution that doesn’t require moving or copying data from other public clouds into Google Cloud for analysis.
- Get consistent data experience across clouds and datasets with a unified analytics experience across datasets, in Google Cloud, AWS, and Azure (coming soon) using standard SQL and BigQuery’s familiar interface. BigQuery Omni supports Avro, CSV, JSON, ORC, and Parquet.
- Securely run analytics to another public cloud with a fully managed infrastructure, powered by Anthos, so you can query data without worrying about the underlying infrastructure. Users can choose the public cloud region where their data is located, and run the query.
Why is Google Aiming Multi-Cloud?
Many organizations leveraging public cloud are doing so with multiple clouds: 55% of organizations are multi-cloud according to a recent survey from IDG, and 80% according to a recent Gartner survey. (Is this actually necessary? Maybe.)
Google Cloud has been the most open to supporting this multi-cloud reality, and perhaps implicit in releases like Anthos and BigQuery Omni is Google’s recognition that it’s #3 in the market, and many of its customers have a presence in AWS or Azure.
So, BigQuery Omni actually involves physically running BigQuery clusters in the cloud on which the remote data resides. This is something that in the past, could only be done if your data was stored only in Google Cloud. Now with Kubernetes-powered Anthos, as well as the visualization tool gained in Google’s acquisition of Looker, Google is moving toward a middleware strategy. Now, it is offering services to bridge data silos, as a strategy to gain market share from its bigger competitors. Expect to see more similar service offerings coming from Google as they look to break AWS’s lead on public cloud.
Spot instances and similar “spare capacity” models are frequently cited as one of the top ways to save money on public cloud. However, we’ve noticed that fewer cloud customers are taking advantage of this discounted capacity than you might expect.
We say “spot instances” in this article for simplicity, but each cloud provider has their own name for the sale of discounted spare capacity – AWS’s spot instances, Azure’s spot VMs and Google Cloud’s preemptible VMs.
Spot instances are a type of purchasing option that allows users to take advantage of spare capacity at a low price, with the possibility that it could be reclaimed for other workloads with just brief notice.
In the past, AWS’s model required users to bid on Spot capacity. However, the model has since been simplified so users don’t actually have to bid for Spot Instances anymore. Instead, they pay the Spot price that’s in effect for the current hour for the instances that they launch. The prices are now more predictable with much less volatility. Customers still have the option to control costs by providing a maximum price that they’re willing to pay in the console when they request Spot Instances.
Spot Instances in Each Cloud
Variations of spot instances are offered across different cloud providers. AWS has Spot Instances while Google Cloud offers preemptible VMs and as of March of this year, Microsoft Azure announced an even more direct equivalent to Spot Instances, called Azure Spot Virtual Machines.
Spot VMs have replaced the preview of Azure’s low-priority VMs on scale sets – all eligible low-priority VMs on scale sets have automatically been transitioned to Spot VMs. Azure Spot VMs provide access to unused Azure compute capacity at deep discounts. Spot VMs can be evicted at any time if Azure needs capacity.
AWS spot instances have variable pricing. Azure Spot VMs offer the same characteristics as a pay-as-you-go virtual machine, the differences being pricing and evictions. Google Preemptible VMs offer a fixed discounting structure. Google’s offering is a bit more flexible, with no limitations on the instance types. Preemptible VMs are designed to be a low-cost, short-duration option for batch jobs and fault-tolerant workloads.
Adoption of Spot Instances
Our research indicates that less than 20% of cloud users use spot instances on a regular basis, despite spot being on nearly every list of ways to reduce costs (including our own).
While applications can be built to withstand interruption, specific concerns remain, such as loss of log data, exhausting capacity and fluctuation in the spot market price.
In AWS, it’s important to note that while spot prices can reach the on-demand price, since they are driven by long-term supply and demand, they don’t normally reach on-demand price.
A Spot Fleet, in which you specify a certain capacity of instances you want to maintain, is a collection of Spot Instances and can also include On-Demand Instances. AWS attempts to meet the target capacity specified by using a Spot Fleet to launch the number of Spot Instances and On-Demand Instances specified in the Spot Fleet request.
To help reduce the impact of interruptions, you can set up Spot Fleets to respond to interruption notices by hibernating or stopping instances instead of terminating when capacity is no longer available. Spot Fleets will not launch on-demand capacity if Spot capacity is not available on all the capacity pools specified.
AWS also has a capability that allows you to use Amazon EC2 Auto Scaling to scale Spot Instances – this feature also combines different EC2 instance types and pricing models. You are in control of the instance types used to build your group – groups are always looking for the lowest cost while meeting other requirements you’ve set. This option may be a popular choice for some as ASGs are more familiar to customers compared to Fleet, and more suitable for many different workload types. If you switch part or all of your ASGs over to Spot Instances, you may be able to save up to 90% when compared to On-Demand Instances.
Another interesting feature worth noting is Amazon’s capacity-optimized spot instance allocation strategy. When customers diversify their Fleet or Auto Scaling group, the system will launch capacity from the most available capacity pools, effectively decreasing interruptions. In fact, by switching to capacity-optimized allocation users are able to reduce their overall interruption rate by about 75%.
Is “Eviction” Driving People Away?
There is one main caveat when it comes to spot instances – they are interruptible. All three major cloud providers have mechanisms in place for these spare capacity resources to be interrupted, related to changes in capacity availability and/or changes in pricing.
This means workloads can be “evicted” from a spot instance or VM. Essentially, this means that if a cloud provider needs the resource at any given time, your workloads can be kicked off. You are notified when an AWS spot instance is going to be evicted: AWS emits an event two minutes prior to the actual interruption. In Azure, you can opt to receive notifications that tell you when your VM is going to be evicted. However, you will have only 30 seconds to finish any jobs and perform shutdown tasks prior to the eviction making it almost impossible to manage. Google Cloud also gives you 30 seconds to shut down your instances when you’re preempted so you can save your work for later. Google also always terminates preemptible instances after 24 hours of running. All of this means your application must be designed to be interruptible, and should expect it to happen regularly – difficult for some applications, but not so much for others that are rather stateless, or normally process work in small chunks.
Companies such as Spot – recently acquired by NetApp (congrats!) – help in this regard by safely moving the workload to another available spot instance automatically.
Our research has indicated that fewer than one-quarter of users agree that their spot eviction rate was too low to be a concern – which means for most, eviction rate is a concern. Of course, it’s certainly possible to build applications to be resilient to eviction. For instance, applications can make use of many instance types in order to tolerate market fluctuations and make appropriate bids for each type.
AWS also offers an automatic scaling feature that has the ability to increase or decrease the target capacity of your Spot Fleet automatically based on demand. The goal of this is to allow users to scale in conservatively in order to protect your application’s availability.
Early Adopters of Spot and Other Innovations May be One and the Same
People who are hesitant to build for spot more likely use regular VMs, perhaps with Reserved Instances for savings. It’s likely that people open to the idea of spot instances are the same who would be early adopters for other tech, like serverless, and no longer have a need for Spot.
For the right architecture, spot instances can provide significant savings. It’s a matter of whether you want to bother.
Google Sustainability is an effort that ranges across their business, from the Global Fishing Watch to environmental consciousness in the supply chain. Given that cloud computing has been a major draw of global energy in recent years, the amount of computing done in data centers more than quintupled between 2010 and 2018. But, the amount of energy consumed by the world’s data centers grew only six percent during that period, thanks to improvements in energy efficiency. However, that’s still a lot of power. That’s why Google’s sustainability efforts for data centers and cloud computing are especially important.
Google Cloud Sustainability Efforts – As Old as Their Data Centers
Reducing energy usage has been an initiative for Google for more than 10 years. Google has been carbon neutral since 2007, and 2019 marked the third year in a row that they’ve matched their energy usage with 100 percent renewable energy purchases. Google’s innovation in the data center market also comes from the process of building facilities from the ground up instead of buying existing infrastructures and using machine learning technology to monitor and improve power-usage-effectiveness (PUE) and find new ways to save energy in their data centers.
When comparing the big three cloud providers in terms of sustainability efforts, AWS is by far the largest source of carbon emissions from the cloud globally, due to its dominance. However, AWS’s sustainability team is investing in green energy initiatives and is striving to commit to an ambitious goal of 100% use of renewable energy by 2040 to become as carbon-neutral as Google has been. Microsoft Azure, on the other hand, has run on 100 percent renewable energy since 2014 but would be considered a low-carbon electricity consumer and that’s in part because it runs less of the world than Amazon or Google.
Nonetheless, data centers from the big three cloud providers, wherever they are, all run on electricity. How the electricity is generated is the important factor in whether they are more or less favorable for the environment. For Google, reaching 100% renewable energy purchasing on a global and annual basis was just the beginning. In addition to continuing their aggressive move forward with renewable energy technologies like wind and solar, they wanted to achieve the much more challenging long-term goal of powering operations on a region-specific, 24-7 basis with clean, zero-carbon energy.
Why Renewable Energy Needs to Be the Norm for Cloud Computing
It’s no secret that cloud computing is a drain of resources, roughly three percent of all electricity generated on the planet. That’s why it’s important for Google and other cloud providers to be part of the solution to solving global climate change. Renewable energy is an important element, as is matching the energy use from operations and by helping to create pathways for others to purchase clean energy. However, it’s not just about fighting climate change. Purchasing energy from renewable resources also makes good business sense, for two key reasons:
- Renewables are cost-effective – The cost to produce renewable energy technologies like wind and solar had come down precipitously in recent years. By 2016, the levelized cost of wind had come down 60% and the levelized cost of solar had come down 80%. In fact, in some areas, renewable energy is the cheapest form of energy available on the grid. Reducing the cost to run servers reduces the cost for public cloud customers – and we’re in favor of anything that does that.
- Renewable energy inputs like wind and sunlight are essentially free – Having no fuel input for most renewables allows Google to eliminate exposure to fuel-price volatility and especially helpful when managing a global portfolio of operations in a wide variety of markets.
Google Sustainability in the Cloud Goes “Carbon Intelligent”
In continuum with their goals for data centers to consume more energy from renewable resources, Google recently revealed in their latest announcement that it will also be time-shifting workloads to take advantage of these resources and make data centers run harder when the sun shines and the wind blows.
“We designed and deployed this first-of-its-kind system for our hyperscale (meaning very large) data centers to shift the timing of many compute tasks to when low-carbon power sources, like wind and solar, are most plentiful.”, Google announced.
Google’s latest advancement in sustainability is a newly developed carbon-intelligent computing platform that seems to work by using two forecasts – one indicating future carbon intensity of the local electrical grid near its data center and another of its own capacity requirements – and using that data “align compute tasks with times of low-carbon electricity supply.” The result is that workloads run when Google believes it can do so while generating the lowest-possible CO2 emissions.
The carbon-intelligent computing platform’s first version will focus on shifting tasks to different times of the day, within the same data center. But, Google already has plans to expand its capability, in addition to shifting time, it will also move flexible compute tasks between different data centers so that more work is completed when and where doing so is more environmentally friendly. As the platform continues to generate data, Google will document its research and share it with other organizations in hopes they can also develop similar tools and follow suit.
Leveraging forecasting with artificial intelligence and machine learning is the next best thing and Google is utilizing this powerful combination in their platform to anticipate workloads and improve the overall health and performance of their data center to be more efficient. Combined with efforts to use cloud resources efficiently by only running VMs when needed, and not oversizing, resource utilization can be improved to reduce your carbon footprint and save money.
As you accelerate your organization’s containerization in the cloud, key stakeholders may worry about putting all your eggs in one cloud provider’s basket. This combination of fears – both a fear of converting your existing (or new) workloads into containers, plus a fear of being too dependent on a single cloud provider like Amazon AWS, Microsoft Azure, or Google Cloud – can lead to hasty decisions to use less-than-best-fit technologies. But what if using more of your chosen cloud provider’s features meant you were less reliant on that cloud provider?
The Core Benefit of Containers
Something that can get lost in the debate about whether containerization is good or worthwhile is the feature of portability. When Docker containers were first being discussed, one of the main use cases was the ability to run the container on any hardware in any datacenter without worrying if it would be compatible. This seemed to be a logical progression from virtual machines, which had provided the ability to run a machine image on different hardware, or even multiple machines on the same hardware. Most container advocates seem to latch on to this from the perspective of container density and maximizing hardware resources, which makes much more sense in the on-prem datacenter world.
In the cloud, however, hardware resource utilization is now someone else’s problem. You choose your VM or container size and pay just for that size, instead of having to buy a whole physical server and pay for the entirety of it up-front. Workload density still matters, but is much more flexible than on-prem datacenters and hardware. With a shift to containers as the base unit instead of Virtual Machines, your deployment options in the cloud are numerous. This is where container portability comes into play.
The Dreaded “Vendor Lock-in”
Picking a cloud provider is a daunting task, and choosing one and later migrating away from it can have enormous impacts of lost money and lost time. But do you need to worry about vendor lock-in? What if, in fact, you can pivot to another provider down the road with minimal disruption and no application refactoring?
Implementing containerization in the cloud means that if you ever choose to move your workloads to a different cloud provider, you’ll only need to focus on pointing your tooling to the new provider’s APIs, instead of having to test and tinker with the packaged application container. You also have the option of running the same workload on-prem, so you could choose to move out of the cloud as well. That’s not to say that there would be no effort involved, but the major challenge of “will my application work in this environment” is already solved for you. This can help your Operations team and your Finance team to worry less about the initial choice of cloud, since your containers should work anywhere. Your environment will be more agile, and you can focus on other factors (like cost) when considering your infrastructure options.
If you’re in engineering or development, communicating about cloud infrastructure and other software development costs with your finance department is tricky. For one thing, those costs are almost certainly rising.
Also, you are in different roles with different priorities. This naturally creates barriers of communication. You may think your development costs are perfectly reasonable while your CFO thinks there’s a problem – or you may be focused on different parts of the bill than your colleagues in finance are.
Here are some ways to break down that communication barrier and make your software development costs sound a little less scary.
Use the CFO’s Language
Engineering and finance use different language to talk about the same things – which means there’s going to be an element of translation involved. Before meeting with someone who lives in a different day-to-day world than you do, consider how they may talk about cost areas in a way that’s meaningful to their role. For example:
- Dev-speak: “Non-production workloads” – or dev or test or stage
- Finance-speak: R&D costs
- Dev-speak: “Production workloads”
- Finance-speak: Cost of goods sold or COGS
Focus on Business Growth Impact
So your software development costs are probably going up. There will be some wasted spend that can be eliminated, but for the most part, this growth is unavoidable for a growing business. Highlight the end results that drove decisions to increase spending on software development, for example:
- We increased our headcount and sprint velocity to speed time to market and beat our competition for offering A.
- We are developing multiple applications in parallel.
- Our user base is growing, which is increasing our infrastructure costs.
- Our open bug count is down by 50% YOY, increasing customer satisfaction and retention.
Know the Details, But Don’t Get Bogged Down in Them
Are your S3 costs surging? Did you just commit to a bunch of 3-year reserved instances upfront (wait –– did you really?) Did your average salary per developer increase due to specialized skill requirements, or by moving outsourced QA in-house?
You should know the answers to all of these questions, but there’s no need to lead with them in a conversation. Use them as supporting information to answer questions, but not the headline.
Share Your Cost Control Plans – and Automate
Everybody likes an action plan. Identify the areas where you can reduce costs.
- Consider roles where outsourcing may be prudent – such as apps outside your core offering
- Automate QA testing – You’re not going to replace human software developers with bots (yet), but there are a few areas where automation can reduce costs, such as QA testing.
- Optimize your existing infrastructure to turn off when not needed, and size resources to match demand based on utilization metrics, automatically
- Reduce other wasted infrastructure spend by decommissioning legacy systems,
Like many things in business, effective communication and collaboration can go a long way. While it’s important to optimize costs to make your software development costs go the furthest, they are going to continue to rise. And that’s okay.
When it comes to AWS training resources, there’s no shortage of information out there. Considering the wide range of videos, tutorials, blogs, and more, it’s hard knowing where to look or how to begin. Finding the best resource depends on your learning style, your needs for AWS, and getting the most updated information available. Whether you’re just getting started in AWS or consider yourself an expert, there’s an abundance of resources for every learning level. With this in mind, we came up with our 7 favorite AWS training resources, sure to give you the tools you need to learn AWS:
1. AWS Self-Paced Labs
What better way to learn that at your own pace? AWS self-paced labs give you hands-on learning in a live AWS environment, with AWS cloud services, and actual scenarios you would encounter in the cloud. There are two different ways to learn with these labs. You can either take an individual lab or follow a learning quest. Individual labs are intended to help users get familiar with an AWS service as quickly as 15 minutes. Learning quests guide you through a series of labs so you can master any AWS scenario at your own pace. Once completed, you will earn a badge that you can boast on your resume, LinkedIn, website, etc.
Whatever your experience level may be, there are plenty of different options offered. Among the recommended labs you’ll find an Introduction to Amazon Elastic Compute Cloud (EC2), and for more advanced users, a lab on Maintaining High Availability with Auto Scaling (for Linux).
2. AWS Free Tier
Sometimes the best way to learn something is by jumping right in. With the AWS Free Tier, you can try AWS services for free. This is a great way to test out AWS for your business, or for the developers out there, to try services like AWS CodePipeLine, AWS Data Pipeline, and more. While you are still getting a hands-on opportunity to learn a number of AWS services, the only downside is that there are certain usage limits. You can track your usage with a billing alarm to avoid unwanted charges, or you can try ParkMyCloud and park your instances when they’re not in use so you get the most out of your free tier experience. In fact, ParkMyCloud started its journey by using AWS’s free tier!
3. AWS Documentation and Whitepapers
AWS Documentation is like a virtual encyclopedia of tools, terms, training, and everything AWS. You’ll find case studies, tutorials, cloud computing basics, and so much more. This resource is a one-stop-shop for all of your AWS documentation needs, whether you’re a beginner or advanced user. No matter where you are in your AWS training journey, AWS documentation is always a useful reference and certainly deserves a spot in your bookmarks.
Additionally, you’ll find whitepapers that give users access to technical AWS content that is written by AWS and individuals from the AWS community, to help further your knowledge of their cloud. These whitepapers include things from technical guides, reference material, and architecture diagrams.
So far, we’ve gone straight to the source for 3 out of 7 of our favorite AWS training resources. Amazon really does a great job of providing hands-on training, tutorials, and documentation for users with a range of experience. However, YouTube opens up a whole new world of video training that includes contributions from not only Amazon, but other great resources as well. Besides the obvious Amazon Web Services channel, there are also popular and highly rated videos by Edureka, Simplilearn, Eli the Computer Guy, and more.
As cloud technology usage continues to expand and evolve, blogs are a great way to stay up to speed with AWS and the world of cloud computing. Of course, in addition to aws labs, a free-trial, extensive documentation, and their own YouTube channel, AWS also has their own blog. Since AWS actually has a number of blogs that vary by region and technology, we recommend that you start by following Jeff Barr – Chief Evangelist at Amazon Web Services, and primary contributor. Edureka was mentioned in our recommended YouTube channels, they also have a blog that covers plenty of AWS topics. The CloudThat blog is an excellent resource for AWS and all things cloud, and was co-founded by Bhaves Goswami – a former member of the AWS product development team. Additionally, AWS Insider is a great source for all things AWS. Here you’ll find blogs, webcasts, how-to, tips, tricks, news articles and even more hands-on guidance for working with AWS. If you prefer newsletters straight to your inbox, check out Last Week in AWS and Inside Cloud.
6. Online Learning Platforms
As public cloud computing continues to grow – and AWS continues to dominate the market – people have become increasingly interested in this CSP and what it has to offer. In the last 8-10 years, two massive learning platforms were developed, Coursera and Udemy. These platforms offer online AWS courses, specializations, training, and degrees. The abundance of courses that these platforms provide can help you learn all things AWS and give you a wide array of resources to help you train for different AWS certifications and degrees.
GitHub is a developer platform where users work together to review and host code, build software and manage projects. This platform has access to a number of materials that can help further your AWS training. In fact, here’s a great list of AWS training resources that can help you prepare for an Amazon Cloud certification. The great thing about this site is the collaboration among the users. The large number of people in this community brings together people from all different backgrounds so they are able to provide knowledge about their own specialties and experiences. With access to everything from ebooks, video courses, free lectures, and sample tests, posts like these can help you get on the right certification track.
There’s plenty of information out there when it comes to AWS training resources. We picked our 7 favorite resources for their reliability, quality, and range of information. Whether you’re new to AWS or consider yourself an expert, these resources are sure to help you find what you’re looking for.