Alibaba Cloud is growing at an amazing rate, recently claiming to have overtaken both Google and IBM as the #3 public cloud provider globally, and it is certainly the #1 provider in China. Many sites and services hosted outside China are accessible from within China, but they can suffer from high latency, and may lose functionality if their web interface requires interaction with blocked social media systems. As such, it is no surprise that a number of our (non-Chinese) customers have expressed interest in actually running Alibaba virtual machine instances in China. In this blog we are going to outline the process…and give an alternate plan.
General Process to Run Alibaba Instances in China
The steps to roll out a deployment on Alibaba in mainland China are relatively clear:
Establish a “legal commercial entity” in Mainland China.
Select what services you want to run on Alibaba Cloud
Apply for Internet Content Provider (ICP) certification
Each of these steps is described in more detail below.
Establish a Legal Commercial Entity
Or putting it another way – you need to have an office in China. This can range from an actual office with your own employees, to a Joint Venture, which is a legal LLC between your organization and an established Chinese company. If your service is more informational in nature and is not actually selling anything via the service, then this can be relatively easy, taking only a couple weeks (at least for the legal side), though you will still need to find a Joint Venture partner and make the deal worth their while financially. For commerce or trade-related services, the complexity, time requirements, and costs start going up significantly.
What to run on Alibaba Cloud
There is a decision point here, as there is one set of rules for Alibaba-hosted web/app servers, and additional rules for everything else. Base virtual machines, databases and other such core IT building blocks require the ICP registration described below, plus “real-name registration”, where a passport is needed to confirm the identity of whoever is purchasing the resource. If all you need is a web server, then you can skip the real-name registration. In either case, some of the filing requirements involve having a server and/or DNS record prepared in order to complete the later steps. A web site does not need to be completely finished until launch, but a placeholder may be needed.
Internet Content Provider (ICP) certification
There are two flavors of ICP certification:
A “simple” ICP Filing – which is the bare minimum needed for informational websites that are not directly generating revenue.
ICP Commercial Filing – This starts with getting an approved ICP Filing, and then also includes a Commercial License that must be obtained from a province/municipality in China. In some cases, this appears to be related to which Alibaba region you are using, and even the physical location of your public IP address.
Many references recommend finding an experienced consultant to guide you through these processes, and it is easy to see why!
OK…WAY too much work. What is Plan B?
The other way to run Alibaba instances in China is to host your site or services in Hong Kong. All of the rules described above apply to “Mainland China”, which does not include Hong Kong. Taiwan is also not included in Mainland China, but Hong Kong has the advantage of being better connected to the rest of China. If the main problem you are trying to solve is to reduce latency to your site for China-based customers, Hong Kong is the closest you can get without actually being there, and Alibaba appears to do a pretty good job optimizing the Hong Kong experience. No local office or legal filings required!
Once you are all set up: Optimize your Costs!
After your instances are set up, make sure you’re optimizing Alibaba costs. Our Mainland China-based customers using Alibaba have confirmed that ParkMyCloud is able to access the Alibaba APIs from our US-based servers – so you can go ahead and try it out.
Since the beginning of public cloud, users have been attempting to improve cloud automation. This can be driven by laziness, scale, organizational mandate, or some combination of those. Since the rise of DevOps practices and principles, this “automate everything” approach has become even more popular, as it’s one of the main pillars of DevOps. One of the ways you can help sort, filter, and automate your cloud environment is to utilize tags on your cloud resources.
In the cloud infrastructure world, tags are labels or identifiers that are attached to your instances. This is a way for you to provide custom metadata to accompany the existing metadata, such as instance family and size, region, VPC, IP information, and more. Tags are created as key/value pairs, although the value is optional if you just want to use the key. For instance, your key could be “Department” with a value of “Finance”, or you could have a key of just “Finance”.
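To make the key/value idea concrete, here is a minimal Python sketch that models a tag set as a dictionary. The `has_tag` helper and the sample instances are hypothetical and are not tied to any provider's API; representing a key-only tag as a value of `None` is also an assumption, since some providers store an empty string instead.

```python
from typing import Optional

# A tag set modeled as a mapping of key -> optional value.
# The keys and values below are illustrative, not a provider schema.
Tags = dict[str, Optional[str]]


def has_tag(tags: Tags, key: str, value: Optional[str] = None) -> bool:
    """Return True if `key` is present and, when `value` is given, matches it."""
    if key not in tags:
        return False
    return value is None or tags[key] == value


instance_a: Tags = {"Department": "Finance", "Environment": "dev"}
instance_b: Tags = {"Finance": None}  # key-only tag, no value

print(has_tag(instance_a, "Department", "Finance"))  # True
print(has_tag(instance_b, "Finance"))                # True
print(has_tag(instance_b, "Department"))             # False
```

Note that the two styles do not match each other: a filter for “Department” = “Finance” will not find a resource tagged with only the key “Finance”, which is one reason to pick a single convention.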
Technical – This often includes things like the application that is running on the resource, what cluster it belongs to, or which environment it’s running in (such as “dev” or “staging”).
Automation – These tags are read by automated software, and can include things like dates for when to decommission the resource, a flag for opting in or out of a service, or what version of a script or package to install.
Business and billing – Companies with lots of resources need to track which department or user owns a resource for billing purposes, which customer an instance is serving, or some sort of tracking ID or internal asset management tag.
Security – Tags can help with compliance and information security, as well as with access controls for users and roles who may be listing and accessing resources.
In general, more tags are better, even if you aren’t actively using those tags just yet. Planning ahead for ways you might search through or group instances and resources can help save headaches down the line. You should also ensure that you standardize your tags by being consistent with the capitalization/spelling and limiting the scope of both the keys and the values for those keys. Using management and provisioning tools like Terraform or Ansible can automate and maintain your tagging standards.
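As a rough illustration of what standardizing on capitalization and limiting values can look like, the sketch below normalizes tags and checks them against a small allow-list. The policy in `ALLOWED_VALUES` and both function names are assumptions made up for this example; a real policy would come from your own organization.

```python
# Hypothetical tagging policy: required keys and the values each may take.
ALLOWED_VALUES = {
    "environment": {"dev", "staging", "prod"},
    "department": {"finance", "engineering"},
}


def normalize_tags(raw):
    """Lower-case and trim keys and values so ' Environment' and 'environment' match."""
    return {k.strip().lower(): (v or "").strip().lower() for k, v in raw.items()}


def validate_tags(raw):
    """Return a list of problems; an empty list means the tags conform to the policy."""
    tags = normalize_tags(raw)
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key not in tags:
            problems.append(f"missing required tag '{key}'")
        elif tags[key] not in allowed:
            problems.append(f"tag '{key}' has unexpected value '{tags[key]}'")
    return problems


print(validate_tags({"Environment": "Dev", "Department": "Finance"}))
print(validate_tags({"Environment": "qa"}))
```

A check like this could run in a provisioning pipeline so that nonconforming tags are caught before a resource is created, rather than cleaned up afterwards.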
Once you’ve got your tagging system implemented and your resources labeled properly, you can really dive into your cloud automation strategy. Many different automation tools can read these tags and utilize them, but here are a few ideas to help make your life better:
Configuration Management – Tools like Chef, Puppet, Ansible, and Salt are often used for installing and configuring systems once they are provisioned. This can determine which settings to change or configuration bundles to run on the instances.
Cost Control – this is the automation area we focus on at ParkMyCloud – our platform’s automated policies can read the tags on servers, scale groups, and databases to determine which schedule to apply and which team to assign the resource to, among other actions.
CI/CD – If your build tool (like Jenkins or Bamboo) is set to provision or utilize cloud resources for the build or deployment, you can use tags for the build number or code repository to help with the continuous integration or continuous delivery.
Cloud Account Clean-up – Scripts and tools that help keep your account tidy can use tags that set an end date for the resource as a way to ensure that only necessary systems are around long-term. You can also take steps to automatically shut down or terminate instances that aren’t properly tagged, so you know your resources won’t be orphaned.
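The clean-up idea above can be sketched in a few lines of Python. This is only the decision logic, under assumed conventions: the `owner` and `end-date` tag keys, the ISO date format, and the `cleanup_action` function are all hypothetical, and the actual stop or terminate call to your cloud provider is left out.

```python
from datetime import date

# Hypothetical policy: every resource must carry these tag keys.
REQUIRED_KEYS = {"owner", "end-date"}


def cleanup_action(tags, today):
    """Decide what a clean-up script would do with one resource.

    Returns "stop" for resources that are missing required tags or have an
    unreadable end date, "terminate" for resources past their end date,
    and "keep" otherwise.
    """
    if not REQUIRED_KEYS.issubset(tags):
        return "stop"
    try:
        end = date.fromisoformat(tags["end-date"])
    except (TypeError, ValueError):
        return "stop"
    return "terminate" if end < today else "keep"


today = date(2024, 1, 15)
print(cleanup_action({"owner": "alice", "end-date": "2024-01-01"}, today))
print(cleanup_action({"owner": "alice", "end-date": "2024-02-01"}, today))
print(cleanup_action({"owner": "alice"}, today))
```

Stopping rather than terminating improperly tagged resources is a deliberately cautious choice here, since a missing tag is more often an oversight than a sign that the resource is unused.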
Conclusion: Tagging Will Improve Your Cloud Automation
As your cloud use grows, implementing cloud automation will be a crucial piece of your infrastructure management. Utilizing tags not only helps with human sorting and searching, but also with automated tasks and scripts. If you’re not already tagging your systems, having a strategy on the tagging and the automation can save you both time and money.
Over the past year or so, we have spoken with quite a few prospective users who have defined their responsibilities as site reliability engineering (SRE). If, like me, you weren’t familiar with the term, I’ll save you the Google search. SRE is a discipline that incorporates aspects of software engineering and applies them to IT operations problems. Practitioners aim to create ultra-scalable and highly reliable software systems. According to Ben Treynor, founder of Google’s Site Reliability Team, SRE is “what happens when a software engineer is tasked with what used to be called operations.” Its origins can be traced back to Google in 2003, when Ben was hired to lead a team of software engineers running a production environment.
The site reliability engineering footprint at Google is now larger than 1,500 engineers. Many products have small to medium sized SRE teams supporting them, though not all products do. The SRE processes that have been honed over the years are being used by other, mainly large scale, companies that are also starting to implement this paradigm, including ServiceNow, Microsoft, Apple, Twitter, Facebook, Dropbox, Amazon, Target, IBM, Xero, Oracle, Zalando, Acquia, and GitHub.
The people we talk to on a daily basis are typically charged with operational management of their company’s cloud infrastructure, and thus with governing and controlling costs (that’s where we come in). I got to wondering: how is this approached differently by, say, a site reliability engineer vs. someone who labels themselves as “DevOps”?
How Does Site Reliability Engineering Compare to DevOps?
In simple terms, the difference between SREs and DevOps seems clear based on our conversations with folks. SREs are engineers focused on production environments, while DevOps is a philosophy as well as a role. DevOps folks are less concerned with production vs. non-production, and more concerned with overall cloud management and operations. Side note: the term DevOps was coined around 2008, so the SRE role actually predates the DevOps engineer.
A site reliability engineer (SRE) will spend up to 50% of their time doing “ops” related work such as issues, on-call, and manual intervention. Since the software system that an SRE oversees is expected to be highly automatic and self-healing, the SRE should spend the other 50% of their time on development tasks such as new features, scaling or automation. The ideal SRE candidate is a highly skilled system administrator with knowledge of code and automation.
When I first encountered it, site reliability engineering just seemed like another buzzword to replace “IT” or “Ops”. As I read more on it, I understand that it’s more about the people and the process and less about the technology. There is rarely a mention of the underlying infrastructure or tools, and it seems like the main requirement is just the desire to improve. With that, you can align your development and operations (funny, right – DevOps) around the discipline of SRE.
Should Your Company Implement a Site Reliability Engineering Approach?
So while all the hype is around implementing DevOps in your organization, should you really be adopting the idea of site reliability engineering? It certainly makes sense based on the name alone, as “site reliability” is synonymous with “business availability” in our modern internet-connected culture. Any downtime for your service or application means lost revenue and dissatisfied customers, which means the business takes a hit. Using site reliability engineering to keep things running smoothly, while employing DevOps principles to improve those smooth-running processes, seems to be the best combination to really empower your company.
Implementing DevOps practices in small organizations seems like standard practice, but what if you’re trying to utilize DevOps in large organizations? Trying to modernize workflows can be a challenge for any company, but there are different challenges, risks, and benefits for bigger companies. Let’s take a look at how enterprises might approach a DevOps transformation through a few of the core tenets of DevOps.
There are a few different forms of feedback that come with DevOps: automated feedback about specific code (typically through unit and integration testing software), personal feedback from other team members, consumer feedback from customers using your product, and cross-team feedback throughout the organization. Startups and small companies may find it easier to have open lines of communication between individual team members as well as across teams.
Large organizations will need to make a conscious effort to keep team communication open. On the other hand, they will have more resources available (both money and employees) to field customer and in-house feedback about individual services or larger products. They may also be able to better purchase and implement automated testing and CI/CD tools, which leads to…
One of the biggest tech benefits to a DevOps approach is automating away the manual tasks that bog down critical projects. Large organizations often have the time, money, and people to set up automated tools, like CI/CD pipelines, unit and integration test suites, and config management systems. The biggest challenge in the enterprise world is trying to make everyone happy.
One approach is to standardize on a single tool for each purpose, such as Jenkins or Chef. This can enable your IT staff to specialize in those tools, but may make some users unhappy with being forced into a tool they may not prefer. The alternative is to allow each team or business unit to use their own preferred software, but this can turn into a “toolset hell” with a mashup of every combination of applications within your organization. Each approach has its pros and cons, and often comes down to a management decision.
Having individual teams that handle their part of the puzzle and nothing else is the biggest hurdle that enterprises face when trying to apply DevOps principles. The combination of ‘dev’ and ‘ops’ (and other disciplines, like ‘sec’ and ‘fin’) is naturally split out in a large organization, so recombining them can be a huge undertaking. Then again, that gap is exactly the problem the DevOps approach seeks to solve.
Some companies solve this by having a separate team that handles the cross-team support and communication. Other companies break down these silos by enabling employees to seamlessly migrate between teams depending on the project or application. The more “devopsy” method is to utilize ChatOps and centralized documentation repositories for open communication and collaboration, which can help unify the distinct teams.
The idea of holistic thinking tends to come easier to larger organizations, as successful enterprises typically have a system in place for “big picture” thinking, either through a management or product team, or through a cross-functional committee. That said, communication of this vision down to the employees, along with communication up to that management team, is crucial for enabling outside-the-box thinking to get past any roadblocks and hurdles that are in the way of creating and deploying the end product. Sometimes, the hardest part is convincing programmers that not everything needs to be solved with code!
DevOps in Large Organizations: Challenging but Rewarding
Some folks think that DevOps only applies to startups and small companies, but we’re seeing more and more teams benefit from implementing DevOps in large organizations. The benefits of the above DevOps principles are numerous, but frequently come with a different set of challenges based on your organizational size. Once you are aware of those challenges and have a plan to overcome them, you can start to transform your enterprise to a DevOps shop.
Sometimes we ask potential customers what their top ParkMyCloud alternative is. Usually, they don’t have one, but sometimes, they’re considering scripting their own on/off solution instead.
It makes sense: at first glance, scheduling cloud resources looks like a simple problem, and it’s easy to say, “my team can write a scheduler.” However, there are more factors than you may have considered – including cost optimization over a variety of resources, maintenance time, visibility and reporting, opportunity cost, and more.
11 Things to Include in Your Scripts – Besides Scheduling
While you may be able to write scripts to turn resources on and off on a schedule, there are a number of associated functionalities that would be more difficult and time consuming:
Multi-account/user – scripting typically doesn’t support multi-cloud/multi-user/multi-account access, and it is difficult to support existing team structures and ensure appropriate controls
Schedule override – difficult to let users override a schedule when they need access to a resource that is scheduled to be off
Logical Groups – hard to find a way to let users group resources and start/stop sequentially
Scale group parking – must develop means to create a single view and the ability to manage and start/stop scale groups
On-demand access – must develop a process to enable on-demand access to stopped instances in off hours
Visibility – need to develop custom application to determine cost savings based upon application of automation or removal of schedules (to date we have not encountered anyone who has developed such an application)
Reporting – not only do cost savings need to be tracked, they need to be reportable via ad hoc utilization, savings, and scheduling reports over arbitrary date ranges
Policies – difficult to build custom policies regarding the scheduling of instances like “Never Park” or “Snooze Only”
Standardization – difficult to ensure consistency and standardization of automation approach across entire organization unless highly centralized
Easy-to-use UI for non-developers – no easy way to create a UI that allows you to devolve management of cloud resources to non-technical teams who may not be familiar with the cloud provider console
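For a sense of scale, here is a minimal Python sketch of the part that is easy to script: a weekday on/off schedule with a simple snooze override. The `ParkingSchedule` class, its default hours, and its method names are hypothetical, and the sketch makes no calls to a cloud provider. Everything else in the list above (multiple accounts, grouping, reporting, policies, a UI) would still have to be built around it.

```python
from datetime import datetime, time, timedelta


class ParkingSchedule:
    """Keep a resource running on weekdays between `start` and `stop`."""

    def __init__(self, start=time(8, 0), stop=time(18, 0)):
        self.start = start
        self.stop = stop
        self.snooze_until = None  # set when a user overrides the schedule

    def snooze(self, now, hours):
        """Override the schedule: keep the resource on for `hours` from `now`."""
        self.snooze_until = now + timedelta(hours=hours)

    def should_run(self, now):
        """Return True if the resource should be running at `now`."""
        if self.snooze_until is not None and now < self.snooze_until:
            return True
        is_weekday = now.weekday() < 5  # Monday is 0, Saturday is 5
        return is_weekday and self.start <= now.time() < self.stop


schedule = ParkingSchedule()
monday_evening = datetime(2024, 1, 15, 20, 0)
print(schedule.should_run(monday_evening))  # False: outside working hours
schedule.snooze(monday_evening, hours=2)
print(schedule.should_run(monday_evening))  # True: override is active
```

Even this small sketch ignores time zones, holidays, and persistence of the override across restarts, which hints at how quickly the real requirements grow.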
If you’re interested in automating on/off times for your cloud resources, then you’re probably interested in optimizing costs. So don’t lose sight of the cost behind “building” – the man-hours and opportunity cost. After all, every time you have your team working on creating solutions for side projects, you distract them from your core business activities.
And it will take more time than you think. In addition to the functionality listed above, consider the following maintenance tasks:
Must keep up-to-date on changes to public cloud APIs
Must keep up-to-date on changes/updates to public cloud services
When your business’s desired policies, schedules, or behavior change, must update and test
Is Scripting a Viable ParkMyCloud Alternative?
Of course, it’s up to you to determine whether scripting is a worthwhile ParkMyCloud alternative for your business. We’d say, it’s not worth the cost and sacrifice of value. Besides, ParkMyCloud users save an average of $12 on their cloud bills per dollar spent on the product – that’s an ROI that will keep your finance team happy. And that’s just the paid versions. If it’s still hard for you to justify, then use ParkMyCloud’s free tier – with no cost, there’s no reason to waste your time scripting.
The latest time-saving automation to add to your DevOps tool belt: ChatOps cloud cost control. That’s right – you may already be using ChatOps to make your life easier, but did you know that, among its other advantages, you can also use it to control your cloud resources?
Whatever communication platform you’re already using for chatting with your team members, you can use for chatting with your applications and services. And with the increasing rise of ChatOps, that brings us to one of the questions we’ve been getting asked more frequently by our DevOps users: how can I manage schedules and instances from Slack, Microsoft Teams, Atlassian Stride, and other chat programs?
One of the cool things you can do using ChatOps is control your cloud resources through ParkMyCloud. Learn how it’s done in this quick YouTube demo:
ParkMyCloud has the ability to send messages to chat rooms via notifications and receive commands from chat bots via the API. This video details the Slackbot specifically, but similar bots can be used with Microsoft Teams or Atlassian Stride. There are multiple settings you can configure within Slack to manage your account, including notifications to let you know when a schedule is shutting an instance down. You can also set up the ability to override a schedule and turn the system on from Slack. Watch the video for a brief overview of how to:
Set up a notification that uses the Slack type
Adjust settings to be notified of user actions, parking actions, policy actions, and more
Set up the ParkMyCloud Slackbot to respond to notifications
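To give a feel for what a chat notification involves under the hood, the sketch below builds and posts a message to a Slack incoming webhook, which accepts a JSON body with a `text` field. This is not the ParkMyCloud Slackbot or its API: the function names, the message wording, and the sample instance and schedule names are all assumptions for illustration, and you would need to supply your own webhook URL.

```python
import json
import urllib.request


def build_notification(instance_name, action, schedule_name):
    """Build a Slack incoming-webhook payload describing a parking event."""
    text = (
        f"Schedule '{schedule_name}' is about to {action} "
        f"instance '{instance_name}'."
    )
    return {"text": text}


def send_notification(webhook_url, payload):
    """POST the payload as JSON to the webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical instance and schedule names; nothing is sent here.
payload = build_notification("web-01", "stop", "weekdays-8-to-6")
print(json.dumps(payload))
```

Calling `send_notification` with a real webhook URL would post the message to the channel that webhook is bound to; receiving commands back from chat requires a bot, which is beyond this sketch.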
Once you set up Slack with ParkMyCloud, you’ll be able to do anything you normally would in the UI or API: snooze and toggle instances to override their schedules, receive notifications, and control your account directly from your Slack chat room. The Slackbot is available on our GitHub. Give it a try, and enjoy full ChatOps control of your cloud costs!