SysAdmin vs. DevOps? IT Operations Management vs. Cloud Operations Management? Unless your head has been under a rock, you’re probably aware that the cloud has been rapidly reshaping and redefining IT as we know it — from the language we use to describe it to the management models and infrastructure itself.
Cloud providers like Amazon Web Services, Google Cloud Platform, and Microsoft Azure have transformed cloud computing, giving businesses access to IT resources anytime, anywhere. At the same time, this rapid migration to off-premise cloud has been reshaping the needs and roles in the IT department.
Here are 4 ways that the cloud is redefining IT roles and operations:
Sysadmin vs. DevOps
When you compare sysadmin vs. DevOps, you’ll find that the roles are similar but distinct. A System Administrator, or sysadmin, is the person responsible for configuring, operating, and maintaining computer systems – servers in particular. This jack-of-all-trades IT role handles everything from installations and upgrades to security, troubleshooting, technical support and more.
And then we have the evolution of DevOps, which could very well be the biggest gamechanger to the IT process. Under the DevOps umbrella, a team of software developers, IT operations, and product management people must combine strengths to effectively streamline and stabilize operations for rolling out new apps and updating code to support and improve the whole business.
With the cloud taking over and less need for physical, on-prem servers, a large portion of the sysadmin role has been lost to automation. As this change occurred, sysadmins remained effective by shifting towards supporting developers, and that combination of efforts gave birth to the term DevOps. So can you truly compare sysadmin vs. DevOps? Well, the roles are similar in the sense that DevOps engineers can do a lot of what sysadmins do, but not the other way around, making DevOps the newer, bigger jack of all trades.
IT Operations Management vs. Cloud Operations Management
IT Operations Management is responsible for the efficiency and performance of IT processes, which can include anything from administrative processes to hardware and software support, and for both internal and external clients. IT management sets the standard policies and procedures for how service and support is carried out and how issues are resolved.
Thanks to the cloud, IT management has also given way to automation and outsourcing. Cloud operational processes are now a more efficient way of using resources, providing services, and meeting compliance requirements. In the same way that ITOM manages IT processes, Cloud Operations Management does so in a cloud environment, with resource capacity planning and cloud analytics that provide vital intelligence into how to control resources and run them cost-effectively (speaking of, check out our recent partnership aimed at making this easier for you).
IT Service Management vs. Cloud Service Management
Traditional IT service management (ITSM) dealt with strategizing in the design, delivery, management and innovation of the way an organization is using IT. This involved developing, implementing, and monitoring IT governance and management through the use of frameworks like COBIT, Microsoft Operations Framework, Six Sigma, and ITIL, for example.
As the cloud became a better option for operational management, companies have turned to cloud computing to transform their business model via service providers like Amazon Web Services, Google Cloud Platform, and Microsoft Azure to outsource IT for more efficient, scalable cloud services.
Since cloud computing resources are hosted as off-site VMs and managed externally, ITSM has grown more complex, giving rise to Cloud Service Management (CSM) as an extension of ITSM that focuses on three core areas: automated service provisioning, DevOps, and asset management. And as ITSM shifts towards CSM, the concerns lie in cloud adoption strategy and the approach for designing, deploying, and running cloud services.
Finance and Operations vs. DevFinOps
In a world where IT projects are known to exceed budgets and coming up with cost estimates is no easy feat, how can businesses break down a reasonable overall estimate for projects where we develop, build and run applications on a utility? The answer is to make estimates little by little as parts of the work get completed, integrating financial planning directly into IT development operations. In other words: DevFinOps.
IT asset management merges the financial, contractual, and inventory components of an IT project to support life cycle management and strategic decision making. The strategy involves both software and hardware inventory and the decision making process for purchases and redistribution. DevFinOps expands and builds upon ITAM by fixing financial cost and value of IT assets directly into IT infrastructure, updating calculations in real time and simplifying the budgeting process.
What This Means For You
Cell phones, self-driving cars, DevOps: cloud computing is yet another evolution in technology, albeit a huge one, and IT is simply going through a metamorphosis. The best way of looking at it is that cloud is not killing IT, it’s redefining IT, and enterprises are following suit as they shift towards the cloud and change or update traditional IT roles. As IT evolves, the cloud is paving the way for opportunities for those who adapt and evolve their roles with it.
As cloud becomes more mature, the need for cloud operations management becomes more pervasive. In my world, it seems pretty much like IT Operations Management (ITOM) from decades ago. In the way-back machine I used to work at Micromuse, the Netcool company, which was acquired by IBM Tivoli, the Smarter Planet company, which then turned Netcool into Smarter Cloud … well you get the drift. Here we are 10+ years later, and IT = Cloud (and maybe chuck in some Watson).
Cloud operations management is the process concerned with designing, overseeing, controlling, and subsequently redesigning cloud operational processes. This involves management of both hardware and software as well as network infrastructures to promote an efficient and lean cloud.
Analytics is heavily involved in cloud operations management and used to maximize visibility of the cloud environment, which gives the organization the intelligence required to control the resources and running services confidently and cost-effectively.
Cloud operations management can:
Improve efficiency and minimize the risk of disruption
Deliver the speed and quality that users expect and demand
Reduce the cost of delivering cloud services and justify your investments
Since ParkMyCloud helps enterprises control cloud costs, we mostly talk to customers about the part of cloud operations concerned with running and managing resources. We are all about that third bullet – reducing the cost of delivering cloud services and justifying investments. We strive to accomplish that while also helping with the first two bullets to really maximize the value the cloud brings to an enterprise.
So what’s really cool is when we get to ask people what tools they are using to deploy, secure, govern, automate and manage their public cloud infrastructure, as those are the tools that they want us to integrate into as part of their cost optimization efforts, and we need to understand the roles operation folks now play in public cloud (CloudOps).
And, no it’s not easier to manage cloud. In fact I would say it’s harder. The cloud provides numerous benefits – agility, time to market, OpEx vs. CapEx, etc. – but you still have to automate, manage and optimize all those resources. The pace of change is mind boggling – AWS advertises 150+ services now, from basic compute to AI, and everything in between.
So who are these people responsible for cloud operations management? Their titles tend to be DevOps, CloudOps, IT Ops and Infrastructure-focused, and they are tasked with operationalizing their cloud infrastructure while teams of developers, testers, stagers, and the like are constantly building apps in the cloud and leveraging a bottoms-up tools approach. Ten years ago, people could not just stand up a stack in their office and have at it, but they sure as hell can now.
So what does this look like in the cloud? I think KPMG did a pretty good job with this graphic, which generally hits on the functional buckets we see people stick tools into for cloud operations management.
So how should you approach your cloud operations management journey? Let’s revisit the goals from above.
Efficiency – Automation is the name of the game. Narrow in on the tools that provide automation to free up your team’s development time.
Deliverability – See the bullet above. When your team has time, they can focus on delivering the best possible product to your customers.
Cost control – Think of “continuous cost control” as a companion to continuous integration and continuous delivery. This area, too, can benefit from automated tools – learn more about continuous cost control.
We recently announced that ParkMyCloud and CloudHealth Technologies have joined forces, merging cloud cost optimization with hybrid cloud governance and bringing you the best of both worlds. To demonstrate the value of this partnership, we held a joint webinar last week to discuss the customer case study of Connotate – a business that specializes in providing web data extraction solutions, and was able to optimize their cloud environment and maximize their ROI thanks to ParkMyCloud and CloudHealth, together.
The webinar panel included Chris Parlette – Director of Cloud Solutions at ParkMyCloud, JP Nahmias – Director of Product Development at CloudHealth, and Andrew Dawson – Solutions Architect at CloudHealth and the person who worked directly with Reed Savory – Director of IT at Connotate, to help in adopting the two platforms. You can replay the entire webinar, or if you prefer, check out the recap below:
CloudHealth Technologies: Leader in Cloud Management & Hybrid Cloud Governance
CloudHealth Technologies manages $4.1 billion worth of spend in cloud management – just above a quarter of Amazon’s total in AWS. CloudHealth boasts a robust infrastructure, incurring about $100 million in monthly RI purchases from their clients.
CloudHealth works by collecting data from different cloud providers (AWS, Azure, Google) and consolidating it into tables, then evaluating your cloud environment based on the metrics you’re interested in, such as cost, usage, optimization, etc. CloudHealth reports metrics directly to you, but also goes a step further by optimizing your infrastructure, making recommendations, and providing hybrid cloud governance. As the customer takes the recommended actions, CloudHealth automates the environment for them.
ParkMyCloud: Leader in Cloud Cost Optimization
ParkMyCloud was created to help cloud users realize cost savings in a world of tools that give cloud users visibility into their environment without helping them take action.
ParkMyCloud automates cloud cost savings by integrating into DevOps processes. The need for such a tool came to fruition after enterprises migrated to the cloud thinking it was supposed to be cheaper, but when their bill came at the end of the month, they noticed that something wasn’t right. That something is called cloud waste.
The Cloud Waste Problem
Always on means always paying. Cloud services are like any other public utility: you pay for what you use. If you leave them running, you continue paying whether you’re actually using them or not.
44% of workloads are classified as non-production (i.e. test, development, staging, etc.) and don’t need to run 24/7
Over-provisioning. Are you using more than you need with oversized resources?
55% of all public cloud resources are not correctly sized for their workloads.
15% of spend goes to paying for resources that are no longer used.
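To put rough numbers on the "always on means always paying" point, here is a minimal Python sketch. The schedule (12 hours a day, weekdays only) and the function name are illustrative assumptions, not figures from the webinar.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours in an always-on week


def parked_savings_fraction(hours_on_per_day: float, days_on_per_week: int) -> float:
    """Fraction of the always-on cost avoided by running only on a schedule.

    Assumes billing is proportional to running time and ignores storage or
    other charges that continue while an instance is stopped.
    """
    hours_on = hours_on_per_day * days_on_per_week
    return 1 - hours_on / HOURS_PER_WEEK


# Hypothetical schedule: a dev server that only runs 12 hours on weekdays.
fraction = parked_savings_fraction(hours_on_per_day=12, days_on_per_week=5)
print(f"Savings vs. running 24/7: {fraction:.0%}")
```

Under those assumptions the server runs 60 of 168 hours a week, so a little under two thirds of the always-on compute cost goes away.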
Not only does ParkMyCloud cut your cloud costs and eliminate waste, we make it easy through automation. Some of the ways we automate the process include:
Visibility and control across multiple clouds (AWS, Azure, and Google), accounts, and regions in a single UI
User governance – RBAC and SSO for multi-tenant user control and enterprise security
DevOps Integration – Policy engine, REST API, and Slack integration to automate continued cost control
Actionable cost control – policy driven, automated cost control for compute and database resources
ParkMyCloud Integrates Cost Control into CloudOps
ParkMyCloud also integrates with DevOps tools and into various DevOps tool kits, including:
Single sign-on – including Okta, Ping, ADFS, Centrify, and more, which, as you go further into identity management, is a requirement for quite a few platforms.
DevOps & CI/CD tools – such as Chef, Puppet, and Atlassian Bamboo
Chat & notification – notifications through chat services like Slack and HipChat, or via email.
IT Service Management – integrates with ITSM tools to provide a one-stop shop for costs and savings information
Monitoring & Logging – pushing to monitoring tools, like Splunk or DataDog, but also reading information from those tools.
Connotate: A ParkMyCloud & CloudHealth Success Story
Connotate is a provider of web data extraction solutions. They make the internet a database for customers to use and ingest, harvesting data for things like price comparisons and financial analysis. Connotate differentiates from competitors because of its ease of use; anyone can easily go in and highlight a web page in a browser to extract data, and they’re able to extract from both static and dynamic web content.
Connotate had a legacy deployment in AWS along with a ton of data centers all over the U.S. Last year they decided to shut down all those data centers and go from a small AWS footprint to running everything in Amazon. With thousands of VMs and physical servers that needed to migrate to the cloud, Connotate quickly realized that planning the migration themselves would take extensive work.
Connotate needed something to help smooth their migration to the cloud, a tool that would give them financial visibility and predictability to model their cloud costs with. They used CloudHealth’s migration assessment to run analysis against all their data center workloads, which suggested what AWS instance to use for each and predicted the cost to run it, giving them transparency into how the migration would run before it actually happened.
Migrating to the cloud with confidence
A few months after moving their infrastructure to the cloud, Connotate found that CloudHealth’s predicted numbers for cloud costs were within $400 of their actual spend after the migration. After seeing the results, they continued doing all of their Amazon cloud monitoring through CloudHealth.
Saving money automatically in AWS
To help control cloud costs, Connotate turned to ParkMyCloud. One of their business models involves data harvesting for customers that don’t want to do it themselves, and to do this, they use hundreds of virtual machines through Amazon for retrieving data and sending it back to customers. Essentially they were spinning up servers, harvesting the data, and then spinning them back down. Virtual machines were also being used by the sales team and other non-tech people for the purpose of running a demo. The servers were left running 24/7, resulting in waste for the organization. Connotate needed a tool for scheduling servers that was user-friendly enough for non-technical people, but technical enough that the DevOps team in the organization could also use it efficiently.
ParkMyCloud checked all the boxes they needed to turn off servers after they were spun up, and the simple web UI really helped the non-tech people know how and when to use it, as well as not be afraid to use it. ParkMyCloud is now a big portion of Connotate’s business for cost control, and one of the major methods they use today for containing cloud sprawl.
ParkMyCloud & CloudHealth: Better Together for Efficient Cloud Management
Now that Connotate has fully migrated to Amazon, they’re making common use of CloudHealth’s rightsizing capabilities and using rightsizing reporting to understand how to better utilize their servers, as well as ParkMyCloud to park unnecessary test servers that shouldn’t be left on 24/7. Using both tools in tandem results in more efficiency, cost control, and transparency into the entire cloud and on-premise environment – a win for all!
A technical integration is in the works, which will encompass:
Importing cost and savings data from ParkMyCloud into CloudHealth, so you can see what you’ve been doing in ParkMyCloud within CloudHealth
Using CloudHealth recommendations to trigger parking actions in ParkMyCloud
And more, based on customer feedback
With ParkMyCloud and CloudHealth together, users can maximize hybrid cloud governance, cost visibility, cost savings, and make their CFOs happy. For a live demo of ParkMyCloud and more information about the integration – watch the entire webinar.
If you are using AWS EC2 in production, chances are good that you’re using the AWS M instance type. The M family is a “General Purpose” instance type in AWS, most closely matching a typical off-the-shelf server one would buy from Dell, HP, etc, and was the first instance family released by AWS in 2006.
If you are looking for mnemonics for an AWS certification exam, you may want to think of the M instance type as the Main choice, or the happy Medium between the more specialized instances. The M instance provides a good balance of CPU, RAM, and disk size/performance. The other instance types specialize in different ways, providing above average CPU, RAM, or disk size/performance, and include a price premium. The one exception is the “T” instance type, discussed further below.
For a normal web or application server workload, the M instance type is probably the best tool for the job. Unless you KNOW you are going to be running a highly RAM/CPU/IO-intensive workload, you can usually start with an M instance, monitor its performance for a while, and then if the instance is performance-limited by one of the hardware characteristics, switch over to a more specialized instance to remove the constraint. For example:
“C” instances for Compute/CPU performance.
“R” or “X” instances for lots of memory – RAM or eXtreme RAM
“D”, “H”, or “I” instances optimize for storage with different types/quantities of local storage drives (i.e., HDD or SSD that are part of the physical hardware the instance is running on) for high-Density storage (up to 48TB), High sequential throughput, or fast random I/O (IOPS), respectively. (The latter two categories are much more specialized – see here for more details)
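The "start with M, then specialize" rule of thumb above can be written down as a small lookup. This is only a sketch: the bottleneck labels and the function name are hypothetical, and the mapping simply restates the list above rather than any official AWS guidance.

```python
from typing import Dict, List, Optional

# Family letters come from the list above; the dictionary keys are
# made-up labels for whatever your monitoring shows as the constraint.
FAMILY_BY_BOTTLENECK: Dict[str, List[str]] = {
    "cpu": ["C"],
    "memory": ["R", "X"],
    "dense_storage": ["D"],
    "sequential_throughput": ["H"],
    "random_iops": ["I"],
}


def suggest_families(bottleneck: Optional[str]) -> List[str]:
    """Return candidate instance families for an observed bottleneck.

    With no known (or an unrecognized) bottleneck, fall back to the
    general purpose M family.
    """
    if bottleneck is None:
        return ["M"]
    return FAMILY_BY_BOTTLENECK.get(bottleneck, ["M"])


print(suggest_families(None))      # no measured constraint yet
print(suggest_families("memory"))  # RAM-bound workload
```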
The “T” instance family is much like the “M” family, in that it is aimed at general purpose workloads, but at a lower price point. The key difference (and perhaps the only difference) is that the CPU performance is restricted to bursts of high performance (or “bursTs”) that are tracked by AWS through a system of CPU credits. Credits build up when the system is idle, and are consumed when the CPU load exceeds a certain baseline. When the CPU credit balance is used up, the CPU is Throttled to a fraction of its full speed. T instances are good for low-load web servers and non-production systems, such as those used by developers or testers, where continuous predictable high performance is not needed.
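To make the credit mechanism more concrete, here is a toy hour-by-hour simulation in Python. It assumes one CPU credit equals one vCPU running at 100% for one minute, and the default earn rate (6 credits per hour) and cap (144 credits) are illustrative values in the range of a small T2 instance; treat them as assumptions and check the current AWS documentation for real figures. The model ignores launch credits and does not simulate the throttled performance itself.

```python
from typing import List


def simulate_credits(
    utilization_by_hour: List[float],
    earn_per_hour: float = 6.0,   # assumed credit earn rate
    max_balance: float = 144.0,   # assumed cap on banked credits
    start_balance: float = 0.0,
) -> List[float]:
    """Return the CPU credit balance at the end of each hour.

    utilization_by_hour holds average CPU utilization (0.0 to 1.0) of a
    single vCPU for each hour. A balance of 0 means the instance would be
    throttled to its baseline in a real system.
    """
    balance = start_balance
    history = []
    for utilization in utilization_by_hour:
        spent = utilization * 60.0  # one credit = one vCPU-minute at 100%
        balance = min(max_balance, max(0.0, balance + earn_per_hour - spent))
        history.append(balance)
    return history


# A day of idling banks credits, then one hour at full load spends them.
day_idle_then_burst = [0.0] * 24 + [1.0]
print(simulate_credits(day_idle_then_burst)[-2:])
```

In this simplified model an instance that idles overnight can afford a couple of hours of full-speed work before the balance runs out, which matches the developer and tester usage pattern described above.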
Looking at some statistics, the Botmetric Public Cloud Usage Report for 2017 states that 46% of AWS EC2 usage is on the M family, and 83% of non-production workloads are on T instances. Within the ParkMyCloud environment, we see the following top instance family statistics across our customers’ environments:
I instances: 39%
M instances: 22%
T instances: 27%
Since many of our customers are focused on cost optimization for non-production cloud resources (i.e., a lot of developers and test environments), we are probably seeing more “T” instances than “M” instances as they are less expensive, and the “bursty” nature of T instances is not a factor in their work. For a production workload, M instances with dedicated CPU resources are more predictable. While we cannot say for sure why we are also seeing a very large number of “I” instances, it is quite possible that developers/testers are running database software in an EC2 instance, rather than in RDS, in order to have more direct control and visibility into the database system. Still, 49% of the resources are in the General Purpose M and T families.
The Nitty and/or Gritty
Assuming you have decided that an M instance is the right tool for your job, your next choice will be to decide which one. As of the date of this blog, there are twelve different instance types within the M family, covering two generations of systems.
Table 1 – The M Instance Family Specs (Pricing per hour for on-demand instances in US-East-1 Region)
The M4 generation was released in June 2015. The M4 runs 64-bit operating systems on hardware with the 2.3 GHz Intel Xeon E5-2686 v4 (Broadwell) or 2.4 GHz Intel Xeon E5-2676 v3 (Haswell) processors, potentially jumping to 3GHz with Turbo Boost. None of the M4 instances support instance store disks, but all are EBS-optimized by default. These instances also support Enhanced Networking, a no-extra-cost option that allows up to 10 Gbps of network bandwidth.
The M5 generation was just released this past November at re:Invent 2017. The M5 generation is based on custom Intel Xeon Platinum 8175M processors running at 2.5GHz. When communicating with other systems in a Cluster Placement Group (a grouping of instances in a single Availability Zone), the m5.24xlarge instance can support an amazing 25 Gbps of network bandwidth. The M5 type also supports EBS via an NVMe driver, a block storage interface designed for flash memory. Interestingly, AWS has not jacked up the EBS performance guarantee for this faster EBS interface. This may be because it is the customer’s responsibility to install the right driver to get the higher performance on older OS images, so this could also be a cheap/free performance win if you can migrate to M5.
Amazon states that the M5 generation delivers 14% better price/performance on a per-core basis than the M4 generation. In the pricing above, one can do the math and find that all of the M5 instances cost $0.048 per vCPU per hour, and that the M4 instances all cost $0.05 per vCPU per hour. So right out of the box, the M5 is priced 4% cheaper than an equivalently configured M4. Do the same math for RAM vs vCPU and you can see that AWS allocates 4GB of RAM per vCPU in both the M4 and M5 generations. This probably says a lot about how the underlying hardware is sliced/diced for virtual machines in the AWS data centers.
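The per-vCPU math above is easy to reproduce. The sketch below uses the ".large" size of each generation (2 vCPUs, 8 GiB) as a representative example, with hourly prices derived from the per-vCPU figures quoted in the text; the class and function names are just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    ram_gib: float
    hourly_usd: float

    @property
    def usd_per_vcpu_hour(self) -> float:
        return self.hourly_usd / self.vcpus

    @property
    def ram_gib_per_vcpu(self) -> float:
        return self.ram_gib / self.vcpus


def relative_discount(old_price: float, new_price: float) -> float:
    """How much cheaper new_price is, as a fraction of old_price."""
    return (old_price - new_price) / old_price


# Prices derived from the $0.05 and $0.048 per-vCPU figures in the text.
m4 = InstanceType("m4.large", vcpus=2, ram_gib=8, hourly_usd=2 * 0.05)
m5 = InstanceType("m5.large", vcpus=2, ram_gib=8, hourly_usd=2 * 0.048)

discount = relative_discount(m4.usd_per_vcpu_hour, m5.usd_per_vcpu_hour)
print(f"M5 discount per vCPU-hour: {discount:.1%}")
print(f"RAM per vCPU: {m4.ram_gib_per_vcpu} GiB (M4), {m5.ram_gib_per_vcpu} GiB (M5)")
```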
Beware the sticker shock – cloud services pricing is nothing close to simple, especially as you come to terms with the dollar amount on your monthly cloud bill. While cloud service providers like AWS, Azure, and Google were meant to provide compute resources to save enterprises money on their infrastructure, cloud services pricing is complicated, messy, and difficult to understand. Here are 7 ways that cloud providers obscure pricing on your monthly bill:
1 – They use varying terminology
For the purpose of this post, we’ll focus on the three biggest cloud service providers: AWS, Azure, and Google. Between these three cloud providers alone, different analogies are used for just about every component of services offered.
For example, when you think of a virtual machine (VM), that’s what AWS calls an “instance,” Azure calls a “virtual machine,” and Google calls a “virtual machine instance.” If you have a group of these different machines, or instances, in Amazon and Google they’re called “auto-scaling” groups, whereas in Azure they’re called “scale sets.” There’s also different terminology for their pricing models. AWS offers on-demand instances, Azure calls it “pay as you go,” and Google refers to it as “sustained use.” You’ve also got “reserved instances” in AWS, “reserved VM instances” in Azure, and “committed use” in Google. And you have spot instances in AWS, which are the same as low-priority VMs in Azure, and preemptible instances in Google.
2 – There’s a multitude of variables
Operating systems, compute, network, memory, and disk space are all different factors that go into the pricing and sizing of these instances. Each of these virtual machine instances also falls into a category: general purpose, compute optimized, memory optimized, disk optimized and other various types. Then, within each of these different instance types, there are different families. In AWS, the cheapest and smallest instances are in the “t2” family; in Azure they’re called the “A” family. On top of that, there are different generations within each of those families, so in AWS there’s t2, t3, m2, m3, m4, and within each of those processor families, different sizes (small, medium, large, and extra large). So there are lots of different options available. Oh, and lots of confusion, too.
3 – It’s hard to see what you’re spending
If you aren’t familiar with AWS, Azure, or Google Cloud’s consoles or dashboards, it can be hard to find what you’re looking for. To find specific features, you really need to dig in, but even just trying to figure out the basics of how much you’re currently spending, and predicting how much you will be spending – all can be very hard to understand. You can go with the option of building your own dashboard by pulling in from their APIs, but that takes a lot of upfront effort, or you can purchase an external tool to manage overall cost and spending.
4 – It’s based on what you provision…not what you use
Cloud services pricing can charge on a per-hour, per-minute, or per-second basis. If you’re used to the on-prem model where you just deploy things and leave them running 24/7, then you may not be used to this kind of pricing model. But when you move to the cloud’s on-demand pricing models, everything is based on the amount of time you use it.
When you’re charged per hour, it might seem like 6 cents per hour is not that much, but after running instances for 730 hours in a month, it turns out to be a lot of money. This leads to another sub-point: the bill you get at the end of the month doesn’t come until 5 days after the month ends, and it’s not until that point that you get to see what you’ve used. As you’re using instances (or VMs) during the time you need them, you don’t really think about turning them off or even losing track of servers. We’ve had customers who have servers in different regions, or on different accounts that don’t get checked regularly, and they didn’t even realize those servers had been running all this time, charging up bill after bill.
You might also be overprovisioning or oversizing resources — for example, provisioning multiple extra large instances thinking you might need them someday or use them down the line. If you’re used to that, and overprovisioning everything by twice as much as you need, it can really come back to bite you when you go look at the bill and you’ve been running resources without utilizing them, but are still getting charged for them – constantly.
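The arithmetic behind those two points is simple enough to sketch. The 6 cents and 730 hours come from the text above; the fleet size of 20 and the "provisioned twice what you need" split are illustrative assumptions.

```python
HOURS_PER_MONTH = 730  # average hours in a month, as used above


def monthly_cost(hourly_rate_usd: float, instance_count: int = 1,
                 hours: float = HOURS_PER_MONTH) -> float:
    """On-demand cost of running instance_count instances for `hours`."""
    return hourly_rate_usd * hours * instance_count


def wasted_spend(total_cost: float, needed_fraction: float) -> float:
    """Part of the bill paying for capacity you provisioned but do not need."""
    return total_cost * (1 - needed_fraction)


one_instance = monthly_cost(0.06)
fleet = monthly_cost(0.06, instance_count=20)  # hypothetical fleet size
print(f"One instance, always on: ${one_instance:.2f}/month")
print(f"Twenty instances:        ${fleet:.2f}/month")
print(f"Wasted if only half is needed: ${wasted_spend(fleet, 0.5):.2f}/month")
```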
5 – They change the pricing frequently
Cloud services pricing has changed quite often. So far, prices have been trending downward, so things have been getting cheaper over time due to factors like competition and increased utilization of data centers in their space. However, don’t jump to the conclusion that prices will never go up.
Frequent price changes make it hard to map out usage and costs over time. Amazon has already made changes to their price more than 60 times since they’ve been around, making it hard for users to plan a long-term approach. Also for some of these instances, if you have them deployed for a long time, the prices of instances don’t display in a way that is easy to track, so you may not even realize that there’s been a price change if you’ve been running the same instances on a consistent basis.
6 – They offer cost savings options… but they’re difficult to understand (or implement)
In AWS, there are some cost savings measures available for shutting things down on a schedule, but in order to run them you need to be familiar with Amazon’s internal tools like Lambda and RDS. If you’re not already familiar, it may be difficult to actually implement this just for the sake of getting things to turn off on a schedule.
One of the other things you can use in AWS is Reserved Instances, or with Azure you can pay upfront for a full year or two years. The problem: you need to plan ahead for the next 12 to 24 months and know exactly what you’re going to use over that time, which sort of goes against the nature of cloud as a dynamic environment where you can just use what you need. Not to mention, going back to point #1, the obscure terminology for spot instances, reserved instances, and what the different sizes are.
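One way to reason about the commitment trade-off is a break-even calculation. The sketch below assumes a deliberately simplified model: a reservation is a flat percentage discount off the on-demand rate, and you pay for every hour of the term whether the instance runs or not. Real reservation pricing has more options, and the 40% discount is a made-up example rather than a quoted price.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of the term an instance must run for a reservation to pay off.

    On-demand cost is rate * hours * utilization; reserved cost is
    rate * (1 - discount) * hours, so the two are equal at 1 - discount.
    """
    return 1.0 - discount


def cheaper_option(utilization: float, discount: float) -> str:
    if utilization > break_even_utilization(discount):
        return "reserved"
    return "on-demand"


# Hypothetical 40% reservation discount.
print(break_even_utilization(0.40))
print(cheaper_option(utilization=0.90, discount=0.40))  # near always-on server
print(cheaper_option(utilization=0.30, discount=0.40))  # dev box parked most of the week
```

Under this model, a resource that is parked most of the time is usually better left on demand, which is why the planning burden described above matters.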
7 – Each service is billed in a different way
Cloud services pricing differs between IaaS (infrastructure as a service), which uses VMs that are billed one way, and PaaS (platform as a service), which gets billed another way. Different mechanisms for billing can be very confusing as you start expanding into different services that cloud providers offer.
As an example, Lambda functions in AWS are charged based on the number of requests for your functions and the duration, that is, the time it takes for your code to execute. The Lambda free tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month. Beyond that, you pay $0.20 per 1M requests and $0.00001667 for every GB-second used – simple, right? Not so much.
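Here is a rough sketch of that Lambda math in Python, using only the free-tier and unit prices quoted above. It assumes both the request charge and the duration charge apply to every invocation, and it ignores the rounding of each invocation's duration to a billing increment, so treat the result as an estimate. The workload in the example (3M requests, 1 second each, 512 MB) is hypothetical.

```python
FREE_REQUESTS = 1_000_000          # free requests per month
FREE_GB_SECONDS = 400_000          # free compute per month
USD_PER_MILLION_REQUESTS = 0.20
USD_PER_GB_SECOND = 0.00001667


def lambda_monthly_cost(requests: int, avg_duration_s: float, memory_mb: int) -> float:
    """Estimate a monthly Lambda bill from request count, duration and memory."""
    # Compute is metered in GB-seconds: seconds run times memory in GB.
    gb_seconds = requests * avg_duration_s * (memory_mb / 1024)

    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)

    request_cost = billable_requests / 1_000_000 * USD_PER_MILLION_REQUESTS
    duration_cost = billable_gb_seconds * USD_PER_GB_SECOND
    return request_cost + duration_cost


# Hypothetical workload: 3M requests a month, 1 second each, 512 MB of memory.
print(f"${lambda_monthly_cost(3_000_000, 1.0, 512):.2f}")
```

Note how the duration term dominates in this example: the request charge is 40 cents, while the compute charge is roughly 18 dollars.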
Another example comes from the databases you can run in Azure. Databases can run as a single server or can be priced by elastic pools, each with different tables based on the type of database, then priced by storage, number of databases, etc.
With Google Kubernetes clusters, you’re getting charged per node in the cluster, and each node is charged based on size. Nodes are auto-scaled, so price will go up and down based on the amount that you need. Once again, there’s no easy way of knowing how much you use or how much you need, making it hard to plan ahead.
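A small sketch shows why an autoscaled cluster is hard to budget for: the bill depends on the node count in every hour, not on a single cluster size. The per-node price of $0.10 per hour and the daily scaling pattern below are made-up assumptions, and the function name is only for illustration.

```python
from typing import List


def cluster_cost(node_counts_by_hour: List[int], usd_per_node_hour: float) -> float:
    """Total cost of a cluster given how many nodes ran in each hour."""
    return sum(node_counts_by_hour) * usd_per_node_hour


# Hypothetical day: 3 nodes for 16 quiet hours, 10 nodes for 8 busy hours.
day = [3] * 16 + [10] * 8
price = 0.10  # assumed price per node-hour

actual = cluster_cost(day, price)
if_sized_for_peak = cluster_cost([max(day)] * len(day), price)
print(f"Autoscaled: ${actual:.2f}/day, fixed at peak size: ${if_sized_for_peak:.2f}/day")
```

The same pattern on a different day gives a different bill, which is the planning difficulty the paragraph above describes.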
What can you do about it?
Ultimately, cloud service offerings are there to help enterprises save money on their infrastructures, and they’re great options IF you know how to use them. To optimize your cloud environment and save money on costs, we have a few suggestions: