There’s a simple fact for public cloud users today: you need to use cloud agnostic tools. Yes – even if you only use one public cloud. Why? This recommendation comes down to a few drivers that we see time and time again.
You won’t always use just this cloud
There is an enterprise IT trend toward multi-cloud and hybrid cloud – a trend so prevalent that even if you are currently single-cloud, you should plan for the eventuality of using more than one cloud: the multi-cloud future has arrived. Dave Bartoletti, VP and Principal Analyst at Forrester Research, broke down multi-cloud and hybrid cloud by the numbers:
- 62 percent of public cloud adopters are using 2+ unique cloud platforms
- 74 percent of enterprises describe their strategy as hybrid/multi-cloud today
In addition, standardizing on cloud agnostic tools can also alleviate the costs associated with policy design, deployment, and enforcement across different cloud environments. Managing and monitoring with the same service platform greatly reduces mismatched security policies and uncertainty in enforcement. Cloud agnostic tools that also operate in the context of the data center – whether cloud, virtualized, container, or traditional infrastructure – are a boon for organizations that need to be agile and move quickly. Being able to reuse policies and services across the entire multi-cloud spectrum reduces friction in the deployment process and offers consistency in performance and security.
How do you decide what tools to adopt?
We talk to enterprises of all sizes using the cloud on a daily basis, and we always ask whether they are using cloud native tools or third-party tools that are cloud agnostic. The answer: it’s a mix, to be sure – often a blend of cloud-native and third-party tools within the same enterprise.
What we hear is that managing cloud infrastructure is quite a complex job, especially when you have different clouds, technologies, and a diverse and opinionated user community to support. Common themes among the third-party tools we see in use include freemium models, technologies someone used at a previous company, tools recommended by the cloud service provider (CSP) itself, and open-API-driven solutions that allow for maximum automation in cloud operations. It also serves tool vendors well if deploying the tool takes minimal effort – in other words, SaaS tools that do not require a bunch of services and integration work. Plug and play is a must.
For context, here at ParkMyCloud we support AWS, Azure, Google, and Alibaba clouds, and we usually talk to DevOps and IT Ops folks responsible for their cloud infrastructure. Those folks are usually after cloud cost control and governance when speaking with us, so our conversations tend to focus on the tools they use and need for cloud infrastructure management: CI/CD, monitoring, cost control, cost visibility and optimization, and user governance. For user governance and internal communication, single sign-on (SSO) and ChatOps are must-haves.
So we decided to compile a list of the most common clouds and tools we run across here at ParkMyCloud, in order of popularity:
- Cloud Service Provider
- AWS, Google Cloud, Microsoft Azure, Alibaba Cloud – and we do get requests for IBM and Oracle clouds
- Infrastructure Monitoring (not APM)
- Cloud Native (AWS CloudWatch, Azure Metrics, Google Stackdriver), DataDog, Nagios, SolarWinds, Microsoft, BMC, Zabbix, IBM
- Cost Visibility and Optimization
- CloudHealth Technologies, Cloudability, Cloudyn/Azure Cost Management, Apptio
- CI/CD + DevOps (this is broad but these are most common names we hear that fit into this category)
- Cloud Native, CloudBees Jenkins, Atlassian Bamboo, HashiCorp, Spinnaker, Travis CI
- Single Sign-On (SSO)
- ADFS, Ping, Okta, Azure AD, Centrify, One Login, Google OAuth, JumpCloud
- ChatOps
- Slack, Microsoft Teams, Google Hangouts
- Cloud Cost Control
- Cloud Native/Scripter, ParkMyCloud, GorillaStack, Skeddly, Nutanix (BotMetric)
Beat the curve with cloud agnostic tools
Our suggestion is to use cloud agnostic tools wherever possible. Our experience tells us that a majority of enterprises lean this way anyway. The upfront cost in license fees and/or setup may be higher, but we think it comes down to (1) most people will end up hybrid/multi-cloud in the future, even if they aren’t now, and (2) cloud agnostic tools are more likely to meet your needs as a user, as the companies building those tools stay laser-focused on supporting and improving that functionality across the big CSPs.
Lately, we’ve been thinking about the cloud computing jobs and titles we’ve been seeing in the space. One of the great things about talking with ParkMyCloud users is that we get to talk to a variety of different people. That’s right – even though we’re laser-focused on cloud cost optimization, it turns out that can matter to a lot of different people in an organization. (And no wonder, given the size of wasted spend – that hits people’s buttons.)
You know the cloud computing market is growing. You know that means new employment opportunities, and new niches in which to make yourself valuable. So what cloud computing jobs should you check out?
If you are a sysadmin or ops engineer:
Cloud Operations. Cloud operations engineers, managers, and similar are the people we speak with most often at ParkMyCloud, and they are typically the cloud infrastructure experts in the organization. This is a great opportunity for sysadmins looking to work in newer technology.
If you’re interested in cloud operations, definitely work on certifications from AWS, Azure, Google, or your cloud provider of choice. Attend meetups and subscribe to industry blogs – the cloud providers innovate at a rapid pace, and the better you keep up with their products and solutions, the more competitive you’ll be.
See also: DevOps, cloud infrastructure, cloud architecture, and IT Operations.
If you like technology but you also like working with people:
Customer Success, cloud support, or other customer-facing job at a managed service provider (MSP). As we recently discussed, there’s a growing market of small IT providers focusing on hybrid cloud in the managed services space. The opportunities at MSPs aren’t limited to customer success, of course – just in the past week we’ve talked to people with the following titles at MSPs: Cloud Analyst, Cloud Engineer, Cloud Champion/Cloud Optimization Engineer, CTO, and Engagement Architect.
Also consider: pre-sales engineering at one of the many software providers in the cloud space.
If you love process:
Site Reliability Engineer. This title, invented by Google, is used for operations specialists who focus on keeping the lights on and the sites running. Job descriptions in this discipline tend to focus on people and processes rather than on specific infrastructure or tools.
If you have a financial background:
Cloud Financial Analyst. See also: cloud cost analyst, cloud financial administrator, IT billing analyst, and similar. Cloud computing jobs aren’t just for technical people — there is a growing field that allows experts to adapt financial skills to this hot market. As mentioned above, since the cloud cost problem is only going to grow, IT organizations need professionals in financial roles focused on cloud. Certifications from cloud providers can be a great way to stand out.
What cloud computing jobs are coming next?
As the cloud market continues to grow and change, there will be new cloud computing job opportunities – and it can be difficult to predict what’s coming next. Just a few years ago, it was rare to meet someone running an entire cloud enablement team, but that’s becoming the norm at larger, tech-forward organizations. We also see companies narrowing “DevOps” roles so that professionals focus on “CloudOps” specifically – as well as variations such as DevFinOps. And although some people hear “automation” and worry that their jobs will disappear, there will always be a need for someone to keep the automation engines running and optimized. We’ll be here.
In the world of infrastructure as code, the biggest divide seems to be the battle between HashiCorp’s Terraform and AWS CloudFormation. Both tools can help you deploy new cloud infrastructure in a repeatable way, but they have some pretty big differences that can mean the difference between a smooth rollout and a never-ending battle with your tooling. Let’s look at some of the similarities and some of the differences between the two.
While the tools have some unique features, they also share some common aspects. In general, both CloudFormation and Terraform help you provision new AWS resources from a text file. This means you can iterate and manage the entire infrastructure stack the same way you would any other piece of code. Both tools are also declarative, which means you define what you want the end goal to be rather than specifying how to get there (as with tools like Chef or Puppet). This isn’t necessarily a good or bad thing, but it is good to know if you’re used to other config management tools.
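To make the declarative idea concrete, here is a minimal Python sketch – not tied to either tool, and the resource names are invented for the example – of how a declarative engine works: you state the desired end state, and the engine computes what must change.

```python
# Hypothetical sketch of the declarative model: you describe the desired
# end state; the engine diffs it against what currently exists and
# derives the actions to take, rather than you scripting each step.

def plan(current, desired):
    """Compare current vs. desired resources and return the actions needed."""
    actions = []
    for name, config in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != config:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    return actions

current = {"web": {"size": "t2.micro"}}
desired = {"web": {"size": "t2.small"}, "db": {"size": "t2.medium"}}
print(plan(current, desired))
# [('update', 'web'), ('create', 'db')]
```

Both Terraform and CloudFormation perform a far more sophisticated version of this diffing, but the principle is the same: the order of operations is derived for you from the desired state.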
Unique Characteristics of CloudFormation
One of the biggest benefits of using CloudFormation is that it is an AWS product, which means it has tighter tie-ins to other AWS services. This can be a huge benefit if you’re all-in on AWS products and services, as this can help you maximize your cost-effectiveness and efficiency within the AWS ecosystem. CloudFormation also makes use of either YAML or JSON as the format for your code, which might be familiar to those with dev experience. Along the same lines, each change to your infrastructure is a changeset from the previous one, so devs will feel right at home.
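As an illustration, a minimal CloudFormation template in YAML might look like the following – the resource name and property values here are placeholders, not a working deployment:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal example stack
Resources:
  ExampleInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      ImageId: ami-12345678   # placeholder AMI ID
```

Each deployed copy of a template like this becomes a “stack,” and subsequent edits to the template are applied as changesets against that stack.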
There are some additional tools available around CloudFormation, such as:
- Stacker – for handling multiple CloudFormation stacks simultaneously
- Troposphere – if you prefer Python for creating your configuration files
- StackMaster – if you prefer Ruby
- Sceptre – for organizing CloudFormation stacks into environments
Unique Characteristics of Terraform
Just as being an AWS product is a benefit of CloudFormation if you’re in AWS, the fact that Terraform isn’t affiliated with any particular cloud makes it much more suited for multi-cloud and hybrid-cloud environments, and of course, for non-AWS clouds. There are Terraform modules for almost any major cloud or hypervisor in the Terraform Registry, and you can even write your own modules if necessary.
Terraform treats all deployed infrastructure as a state, with any subsequent changes to any particular piece being an update to the state (unlike the changesets mentioned above for CloudFormation). This means you can keep the state and share it, so others know what your stack should look like, and also means you can see what would change if you modify part of your configuration before you actually decide to do it. The Terraform configuration files are written in HCL (Hashicorp Configuration Language), which some consider easier to read than JSON or YAML.
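For comparison, a minimal Terraform configuration in HCL might look like this – the provider, region, and values are illustrative placeholders:

```hcl
# Illustrative Terraform configuration; values are placeholders.
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "example" {
  ami           = "ami-12345678"  # placeholder AMI ID
  instance_type = "t2.micro"
}
```

Running `terraform plan` against a configuration like this shows the changes Terraform would make to the state before you actually apply them.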
More on Terraform: How to Use Terraform Provisioning and ParkMyCloud to Manage AWS
Terraform vs. CloudFormation: Which to choose?
The good news is that if you’re trying to decide between Terraform vs. CloudFormation, you can’t really go wrong with either. Both tools have large communities with lots of support and examples, and both can really get the job done in terms of creating stacks of resources in your environments. They are both also free, with CloudFormation having no costs (aside from the infrastructure that gets created) and Terraform being open-source while offering a paid Enterprise version for additional collaboration and governance options. Each has their pros and cons, but using either one will help you scale up your infrastructure and manage it all as code.
Alibaba Cloud is growing at an amazing rate, recently claiming to have overtaken both Google and IBM as the #3 public cloud provider globally, and it is certainly the #1 provider in China. Many sites and services hosted outside China are accessible from within China, but they can suffer high latency and potentially lose functionality if their web interface requires interaction with blocked social media systems. As such, it is no surprise that a number of our (non-Chinese) customers have expressed interest in actually running Alibaba virtual machine instances in China. In this blog we are going to outline the process…and give an alternate plan.
General Process to Run Alibaba Instances in China
The steps to roll-out a deployment on Alibaba in mainland China are relatively clear:
- Establish a “legal commercial entity” in Mainland China
- Select what services you want to run on Alibaba Cloud
- Apply for Internet Content Provider (ICP) certification
Each of these steps is described in more detail below.
Establish a Legal Commercial Entity
Or, putting it another way – you need to have an office in China. This can range from an actual office with your own employees to a Joint Venture, a legal LLC between your organization and an established Chinese company. If your service is more informational in nature and is not actually selling anything, this can be relatively easy, taking only a couple of weeks (at least for the legal side), though you will still need to find a Joint Venture partner and make the deal worth their while financially. For commerce or trade-related services, the complexity, time requirements, and costs go up significantly.
What to run on Alibaba Cloud
There is a decision point here, as there is one set of rules for Alibaba-hosted web/app servers and additional rules for everything else. Base virtual machines, databases, and other core IT building blocks require the ICP registration described below, plus “real-name registration”, where a passport is needed to confirm the identity of whoever is purchasing the resource. If all you need is a web server, you can skip this step. In either case, some of the filing requirements involve having a server and/or DNS record prepared in order to complete the later steps. A web site does not need to be completely finished until launch, but a placeholder may be needed.
Internet Content Provider (ICP) certification
There are two flavors of ICP certification:
- A “simple” ICP Filing – which is the bare minimum needed for informational websites that are not directly generating revenue.
- ICP Commercial Filing – This starts with an approved ICP Filing, and also includes a Commercial License that must be obtained from a province/municipality in China. In some cases, this appears to be related to which Alibaba region you are using, and even the physical location of your public IP address.
Many references recommend finding an experienced consultant to guide you through these processes, and it is easy to see why!
OK…WAY too much work. What is Plan B?
The other way to run Alibaba instances in China is to host your site or services in Hong Kong. All of the rules described above apply to “Mainland China”, which does not include Hong Kong. Taiwan is also not included in Mainland China, but Hong Kong has the advantage of being better connected to the rest of China. If the main problem you are trying to solve is to reduce latency to your site for China-based customers, Hong Kong is the closest you can get without actually being there, and Alibaba appears to do a pretty good job optimizing the Hong Kong experience. No local office or legal filings required!
Once you are all set up: Optimize your Costs!
After your instances are set up, make sure you’re optimizing Alibaba costs. Our Mainland China-based customers using Alibaba have confirmed that ParkMyCloud is able to access the Alibaba APIs from our US-based servers – so you can go ahead and try it out.
Since the beginning of public cloud, users have been attempting to improve cloud automation. This can be driven by laziness, scale, organizational mandate, or some combination of those. Since the rise of DevOps practices and principles, this “automate everything” approach has become even more popular, as it’s one of the main pillars of DevOps. One of the ways you can help sort, filter, and automate your cloud environment is to utilize tags on your cloud resources.
In the cloud infrastructure world, tags are labels or identifiers that are attached to your instances. This is a way for you to provide custom metadata to accompany the existing metadata, such as instance family and size, region, VPC, IP information, and more. Tags are created as key/value pairs, although the value is optional if you just want to use the key. For instance, your key could be “Department” with a value of “Finance”, or you could have a key of just “Finance”.
There are four general tag categories, as laid out in the best practices from AWS:
- Technical – This often includes things like the application that is running on the resource, what cluster it belongs to, or which environment it’s running in (such as “dev” or “staging”).
- Automation – These tags are read by automated software, and can include things like dates for when to decommission the resource, a flag for opting in or out of a service, or what version of a script or package to install.
- Business and billing – Companies with lots of resources need to track which department or user owns a resource for billing purposes, which customer an instance is serving, or some sort of tracking ID or internal asset management tag.
- Security – Tags can help with compliance and information security, as well as with access controls for users and roles who may be listing and accessing resources.
In general, more tags are better, even if you aren’t actively using them just yet. Planning ahead for the ways you might search through or group instances and resources can save headaches down the line. You should also standardize your tags: be consistent with capitalization and spelling, and limit the scope of both the keys and the values for those keys. Management and provisioning tools like Terraform or Ansible can help automate and maintain your tagging standards.
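As a sketch of what tag standardization might look like in practice, here is a small, hypothetical Python helper that normalizes tag key casing and validates values against an agreed-upon set. The key names and allowed values are invented for the example:

```python
# Hypothetical tag-standardization helper: normalizes key casing and
# rejects values outside an agreed-upon list, so "department"/"finance"
# and "Department"/"Finance" don't end up as two different tags.

ALLOWED_VALUES = {
    "Department": {"Finance", "Engineering", "Marketing"},
    "Environment": {"dev", "staging", "prod"},
}

def normalize_tags(raw_tags):
    """Return a cleaned tag dict with standardized keys and checked values."""
    clean = {}
    for key, value in raw_tags.items():
        key = key.strip().title()          # standardize capitalization
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not an allowed value for {key!r}")
        clean[key] = value
    return clean

print(normalize_tags({"department": "Finance", "Environment": "dev"}))
# {'Department': 'Finance', 'Environment': 'dev'}
```

A check like this can run in a provisioning pipeline so that inconsistently tagged resources are caught before they ever reach your cloud account.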
Once you’ve got your tagging system implemented and your resources labeled properly, you can really dive into your cloud automation strategy. Many different automation tools can read and act on these tags; here are a few ideas to help make your life better:
- Configuration Management – Tools like Chef, Puppet, Ansible, and Salt are often used for installing and configuring systems once they are provisioned. This can determine which settings to change or configuration bundles to run on the instances.
- Cost Control – This is the automation area we focus on at ParkMyCloud: our platform’s automated policies can read the tags on servers, scale groups, and databases to determine which schedule to apply and which team to assign the resource to, among other actions.
- CI/CD – If your build tool (like Jenkins or Bamboo) is set to provision or utilize cloud resources for the build or deployment, you can use tags for the build number or code repository to help with the continuous integration or continuous delivery.
- Cloud Account Clean-up – Scripts and tools that help keep your account tidy can use tags that set an end date for the resource as a way to ensure that only necessary systems are around long-term. You can also take steps to automatically shut down or terminate instances that aren’t properly tagged, so you know your resources won’t be orphaned.
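The cost-control idea above can be sketched in a few lines of Python: an automated policy reads a schedule tag and decides whether an instance should be running at a given hour. The tag name and "start-stop" schedule format here are hypothetical, not any particular vendor's convention:

```python
# Hypothetical tag-driven schedule check: a "Schedule" tag of the form
# "start-stop" (24-hour clock) says when an instance should run; the
# policy engine reads the tag and decides if the instance belongs on.

def should_be_running(tags, hour):
    """Return True if the instance should be on at the given hour."""
    schedule = tags.get("Schedule")
    if schedule is None:          # untagged resources stay on
        return True
    start, stop = (int(h) for h in schedule.split("-"))
    return start <= hour < stop

tags = {"Schedule": "8-18", "Department": "Finance"}
print(should_be_running(tags, 12))  # True  (inside working hours)
print(should_be_running(tags, 22))  # False (outside working hours)
```

A scheduler running this check hourly against your instance inventory is all it takes to park non-production resources outside working hours.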
Conclusion: Tagging Will Improve Your Cloud Automation
As your cloud use grows, implementing cloud automation will be a crucial piece of your infrastructure management. Utilizing tags not only helps with human sorting and searching, but also with automated tasks and scripts. If you’re not already tagging your systems, having a strategy on the tagging and the automation can save you both time and money.
Over the past year or so, we have spoken with quite a few prospective users who describe their responsibilities as site reliability engineering (SRE). If, like me, you’re not familiar with the term, I’ll save you the Google search. SRE is a discipline that applies aspects of software engineering to IT operations problems. Practitioners aim to create ultra-scalable and highly reliable software systems. According to Ben Treynor, founder of Google’s Site Reliability Team, SRE is “what happens when a software engineer is tasked with what used to be called operations.” Its origins trace back to 2003, when Google hired Treynor to lead software engineers running a production environment.
The site reliability engineering footprint at Google is now larger than 1,500 engineers. Many products have small- to medium-sized SRE teams supporting them, though not all do. The SRE processes honed at Google over the years are now being adopted by other, mainly large-scale, companies, including ServiceNow, Microsoft, Apple, Twitter, Facebook, Dropbox, Amazon, Target, IBM, Xero, Oracle, Zalando, Acquia, and GitHub.
The people we talk to on a daily basis are typically charged with operational management of their company’s cloud infrastructure, and thus with governing and controlling costs (that’s where we come in). I got to wondering: how is this approached differently by, say, a site reliability engineer vs. someone who labels themselves “DevOps”?
How Does Site Reliability Engineering Compare to DevOps?
In simple terms, the difference between SREs and DevOps seems clear based on our conversations. SREs are engineers focused on production environments, while DevOps is a philosophy as well as a role. DevOps folks are definitely less concerned with production vs. non-production and more concerned with overall cloud management and operations. Side note: “DevOps” was coined around 2008, so the SRE role actually predates the DevOps engineer.
A site reliability engineer (SRE) will spend up to 50% of their time doing “ops” related work such as issues, on-call, and manual intervention. Since the software system that an SRE oversees is expected to be highly automatic and self-healing, the SRE should spend the other 50% of their time on development tasks such as new features, scaling or automation. The ideal SRE candidate is a highly skilled system administrator with knowledge of code and automation.
When I first encountered it, site reliability engineering just seemed like another buzzword to replace “IT” or “Ops”. As I read more about it, I came to understand that it’s more about the people and the process and less about the technology. There is rarely a mention of the underlying infrastructure or tools, and it seems like the main requirement is simply the desire to improve. With that, you can align your development and operations (funny, right – DevOps) around the discipline of SRE.
Should Your Company Implement a Site Reliability Engineering Approach?
So while all the hype is around implementing DevOps in your organization, should you really be adopting the idea of site reliability engineering? It certainly makes sense based on the name alone, as “site reliability” is synonymous with “business availability” in our modern internet-connected culture. Any downtime for your service or application means lost revenue and dissatisfied customers, which means the business takes a hit. Using site reliability engineering to keep things running smoothly, while employing DevOps principles to improve those smooth-running processes, seems to be the best combination to really empower your company.