Shadow IT: you’ve probably heard of it. Also known as Stealth IT, this refers to information technology (IT) systems built and used within organizations without explicit organizational approval or deployed by departments other than the IT department.
A recent survey of IT decision makers ranked shadow IT as the lowest priority concern for 2019 out of seven possible options. Are these folks right not to worry? In the age of public cloud, how much of a problem is shadow IT?
What is Shadow IT?
So-called shadow IT includes any system employees are using for work that is not explicitly approved by the IT department. These unapproved systems are common, and chances are you’re using some yourself. One survey found that 86% of cloud applications used by enterprises are not explicitly approved.
A common example of shadow IT is the use of online cloud storage. With the numerous online or cloud-based storage services like Dropbox, Box, and Google Drive, users have quick and easy methods to store files online. These solutions may or may not have been approved and vetted by your IT department as “secure” and/or a “company standard”.
Another example is personal email accounts. Companies require their employees to conduct business using the corporate email system. However, users frequently turn to personal email accounts instead, whether to attach large files, to connect from personal devices, or because they find the corporate system too slow. One in three federal employees has reported using personal email for work. Another survey found that 4 in 10 employees overall used personal email for work.
After consumer applications, we come to the issue of public cloud. Companies employ infrastructure standards to make support manageable throughout the organization, manage costs, and protect data security. However, employees can find these limiting.
In our experience, the spread of technologies without approval comes down to enterprise IT not serving business needs well enough. Typically, the IT group is too slow or not responsive enough to the business users. Technology is too costly and doesn’t align well with the needs of the business. IT focuses on functional costs per unit as the value it delivers, but the business cares more about gaining quick functionality and capability to serve its needs and its customers’ needs. IT is also focused on security and risk management, and vetting the numerous cloud-based applications takes time – assuming the application provider even makes the information available. Generally, enterprise IT simply doesn’t or cannot operate at the speed of the other business units it supports. So, business users build their own functionalities and capabilities through shadow IT purchases.
Individuals or even whole departments may turn to public cloud providers like AWS to have testing or even production environments ready to go in less time than their own IT departments, with the flexibility to deploy what they like, on demand.
Is Shadow IT a problem?
With the advent of SaaS, IaaS, and PaaS services with ‘freemium’ offerings that anyone can start using (like Slack, GitHub, Google Drive, and even AWS), Shadow IT has become an adoption strategy for new technologies. Many of these services count on individuals to use and share their applications so they can grow organically within an organization. One person or department decides a tool makes their job easier and shares it with co-workers; the service spreads from department to department and grows past the free tier, until IT’s hand is forced into explicit or implicit approval through support. In cases like these, shadow IT can be a route to innovation and official IT approval.
On the other hand, shadow IT solutions are not often in line with organizational requirements for control, documentation, security, and reliability. This can open up both security and legal risks for a company. Gartner predicted in 2016 that by 2020, a third of successful attacks experienced by enterprises will be on their shadow IT resources. It’s impossible for enterprises to secure what they’re not aware of.
There is also the issue of budgeting and spend. Research from Everest Group estimates that shadow IT comprises 50% or more of IT spending in large enterprises. While this could reduce the need for chargeback/showback processes by putting spend within individual departments, it makes technology spend far less trackable, and such fragmentation eliminates the possibility of bulk or enterprise discounting when services are purchased for the business as a whole.
Is it a problem?
As with many things, the answer is “it depends.” Any given Shadow IT project needs to be evaluated from a risk-management perspective. What is the nature of the data exposed in the project? Is it a sales engineer’s cloud sandbox where she is getting familiar with new technology? Or is it a marketing data mining and analysis project using sensitive customer information? Either way, the reaction to a Shadow IT “discovery” should not be to shame the users, but rather to adapt the IT processes and provide more approved/negotiated options that make users’ jobs easier. If Shadow IT is particularly prevalent in your organization, you may want to provide some risk-management guidance and training on what is acceptable and what is not. In this way, Shadow IT can be turned into a strength rather than a weakness, by outsourcing the work to the end users.
But, of course, IT cannot evaluate the risk of systems it does not know about. The hardest part is still finding those in the shadows.
Exciting news: RightSizing is now generally available in ParkMyCloud! You can now use this method for automated cost optimization alongside scheduling to achieve an optimized cloud bill in AWS, Azure, and Google Cloud.
How it Works
When you RightSize an instance, you find the optimal virtual machine size and type for its workload.
Why is this necessary? Cloud providers offer a myriad of instance type options, which can make it difficult to select the right option for the needs of each and every one of your instances. Additionally, users often select the largest size and compute power available, whether it’s because they don’t know their workload needs in advance, don’t see cost as their problem, or “just in case”.
In fact, our analysis of instances being managed in ParkMyCloud showed that 95% of instances were operating at less than 50% average CPU, which means they are oversized and wasting money.
Now with ParkMyCloud’s RightSizing capability, you can quickly and easily – even automatically – resolve these sizing discrepancies to save money. ParkMyCloud uses your actual usage data to make these recommendations, and provides three recommendation options, which can include size changes, family/type changes, and modernization changes. Users can choose to accept these recommendations manually or schedule the changes to occur at a later date.
How Much You Can Save
A single instance change can save 50% or more of the cost. In the example shown here, ParkMyCloud recommends three possible changes for this instance, which would save 40-68% of the cost.
At scale, the savings potential can be dramatic. For example, one enterprise customer who beta-tested RightSizing found that their RightSizing recommendations added up to $82,775.60 in savings – an average of more than $90 per month (over $1,000 per year) for every instance in their environment.
How to Get Started
Are you already using ParkMyCloud? If not, go ahead and register for a free trial. You’ll have full access for 14 days to try out ParkMyCloud in your own environment – including RightSizing.
If you already use ParkMyCloud, you’ll need to make sure you’re subscribed to the Pro or Enterprise tier to have access to this advanced feature.
Now it’s time to RightSize! Watch this video to see how you can get started in just 90 seconds:
Earlier this week, AWS announced the launch of AWS resource optimization recommendations within their cost management portal. AWS claims that this will “identify opportunities for cost efficiency and act on them by terminating idle instances and rightsizing under-used instances.” Here’s what that actually means, and what functionality AWS still does not provide that users need in order to automate cost control.
AWS Recommendations Overview
AWS Recommendations are an enhancement to the existing cost optimization functionality covered by AWS Cost Explorer and AWS Trusted Advisor. Cost Explorer allows users to examine usage patterns over time. Trusted Advisor alerts users about resources with low utilization. These new recommendations actually suggest instances that may be a better fit.
AWS Resource Optimization provides two types of recommendations for EC2 instances:
Terminate idle instances
Rightsize underutilized instances
These recommendations are generated from 14 days of usage data. The tool considers “idle” instances to be those with lower than 1% peak CPU utilization, and “underutilized” instances to be those with maximum CPU utilization between 1% and 40%.
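To make those thresholds concrete, here is a minimal sketch – not AWS’s actual implementation – of how an instance could be bucketed from a set of CPU utilization samples:

```python
# Illustrative sketch of AWS's stated classification rules.
# Input: CPU utilization samples (percent) over the lookback window.

def classify_instance(cpu_samples):
    """Return 'idle', 'underutilized', or 'ok' per the stated thresholds."""
    max_cpu = max(cpu_samples)
    if max_cpu < 1.0:
        return "idle"           # AWS would recommend termination
    if max_cpu < 40.0:
        return "underutilized"  # AWS would recommend rightsizing
    return "ok"

print(classify_instance([0.3, 0.5, 0.7, 0.4]))  # idle
print(classify_instance([12.0, 35.0, 8.0]))     # underutilized
print(classify_instance([55.0, 80.0]))          # ok
```

Note that in the real tool these are fixed values; the point of the sketch is that the thresholds are simple constants a user might reasonably want to tune.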
While any native functionality to control costs is certainly an improvement, users often express that they wish AWS would just have less complex billing in the first place.
AWS Resource Optimization Tool vs. ParkMyCloud
ParkMyCloud offers cloud cost optimization through RightSizing for AWS, as well as Azure and Google Cloud, in addition to our automated scheduling to shut down resources when they are idle. Note that AWS’s new functionality does not include on/off schedule recommendations.
Here’s how the new AWS resource optimization tool stacks up against ParkMyCloud.
Types of Recommendations Generated
The AWS Resource Optimization tool will provide up to three recommendations for size changes within the same instance family, with the most conservative recommendation listed as the primary recommendation. Putting it another way, the top recommendation will be one size down from the current instance, the second recommendation will be two sizes down, etc. ParkMyCloud recommends the optimal instance type and size for the workload, regardless of the existing instance’s configuration. This includes instance modernization recommendations, which AWS does not offer.
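The same-family, step-down behavior described above can be sketched as follows. The size ladder here is illustrative, not pulled from any AWS API:

```python
# Sketch of AWS-style same-family recommendations: up to three steps
# down the size ladder, most conservative (one size down) listed first.

M5_SIZES = ["m5.large", "m5.xlarge", "m5.2xlarge", "m5.4xlarge", "m5.8xlarge"]

def same_family_recommendations(current, ladder=M5_SIZES, max_recs=3):
    idx = ladder.index(current)
    # Walk downward; stop at the smallest size or after max_recs options.
    return [ladder[i] for i in range(idx - 1, max(idx - 1 - max_recs, -1), -1)]

print(same_family_recommendations("m5.4xlarge"))
# ['m5.2xlarge', 'm5.xlarge', 'm5.large']
```

The limitation is visible in the code: the candidate list is just smaller entries in the same family, with no cross-family or modernization options considered.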
The AWS tool generates recommendations for EC2 instances only, while ParkMyCloud provides scheduling and RightSizing recommendations for both EC2 and RDS. AWS also does not support GPU-based instances in its recommendations, while ParkMyCloud does.
AWS customers must explicitly enable generation of recommendations in the AWS Cost Management tools. In ParkMyCloud, recommendations are generated automatically (with some access limitations based on subscription tier).
ParkMyCloud allows you to manage resources across a multi-cloud environment including AWS, Azure, Google Cloud, and Alibaba Cloud. AWS’s tool, of course, only allows you to manage AWS resources.
When you start to dig in, you’ll notice several limitations of the recommendations provided by AWS. The recommendations are based on utilization data from the last 14 days, a range that is not configurable. ParkMyCloud’s recommendations, on the other hand, can be based on 1-24 weeks of data, with the range configurable by the customer per team, cloud provider, and resource type.
Another important aspect of “optimization” that AWS does not allow the user to configure is the utilization thresholds. AWS assumes that any instance at less than 1% CPU utilization is idle, and that any instance between 1-40% CPU utilization is undersized. While these are reasonable rules of thumb, users need the ability to customize such thresholds to best suit their own environment and use cases. AWS also takes an “all or nothing” approach – it recommends that any instance detected as idle simply be terminated. ParkMyCloud does not assume that low utilization means the instance should be terminated; instead, it suggests sizing and/or scheduling solutions specific to the utilization patterns, and allows users to select between Conservative, Balanced, or Aggressive schedule recommendations with customizable thresholds.
AWS also only evaluates “maximum CPU utilization” to determine idle resources. However, for resource schedule recommendations, ParkMyCloud uses both Peak and Average CPU plus network utilization for all instances, and memory utilization for instances with the CloudWatch agent installed. For sizing recommendations, ParkMyCloud uses maximum Average CPU plus memory utilization data if available.
Perhaps the most dangerous aspect of the AWS recommendations is that they will suggest an instance size change based on CPU alone, even without memory metrics. Without cross-family recommendations, each size down typically cuts the memory in half. ParkMyCloud Rightsizing Recommendations do not assume this is OK: in the absence of memory metrics, we make downsizing recommendations that keep memory constant. For a concrete example, here is an AWS recommendation to downsize from m5.xlarge to m5.large, cutting both CPU and memory, and resulting in a net savings of $60 per month.
In contrast, here is the ParkMyCloud Rightsizing Recommendation for the same instance:
You can see that while the AWS recommendation saves $60 per month by downsizing from m5.xlarge to m5.large, the top ParkMyCloud recommendation saves a very similar $57.67 by moving from m5.xlarge to r5a.large, keeping memory constant. While the savings differ by $2.33, this is a far less risky transition and probably worth the difference. In both cases, of course, memory data from the CloudWatch Agent would likely result in better recommendations.
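A memory-constant downsizing check can be sketched in a few lines. The specs and on-demand prices below are illustrative approximations, not authoritative Amazon pricing:

```python
# Sketch of a downsizing check that refuses to shrink memory.
# Catalog values are approximate/illustrative, not official AWS pricing.

SPECS = {
    # name: (vCPU, memory GiB, approx on-demand $/hr)
    "m5.xlarge": (4, 16, 0.192),
    "m5.large":  (2, 8,  0.096),
    "r5a.large": (2, 16, 0.113),
}

def downsize_keeping_memory(current, catalog=SPECS):
    cpu, mem, price = catalog[current]
    candidates = [
        name for name, (c, m, p) in catalog.items()
        if name != current and c < cpu and m >= mem and p < price
    ]
    # Pick the cheapest candidate that does not shrink memory.
    return min(candidates, key=lambda n: catalog[n][2]) if candidates else None

print(downsize_keeping_memory("m5.xlarge"))  # r5a.large
```

With these approximate rates, the saving works out to (0.192 − 0.113) $/hr × 730 hr ≈ $57.67/month, in line with the figure quoted above; m5.large is excluded because it halves memory.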
As shown in the AWS recommendation above, AWS provides the “RI hours” for the preceding 14 days, giving better visibility into the impact of resizing on your reserved instance usage, and uses this data for the cost and savings calculations. ParkMyCloud does not yet provide correlation of the size to RI usage, though that is planned for a future release. That said, the AWS documentation also states “Rightsizing recommendations don’t capture second-order effects of rightsizing, such as the resulting RI hour’s availability and how they will apply to other instances. Potential savings based on reallocation of the RI hours aren’t included in the calculation.” So the RI visibility on the AWS side has minimal impact on the quality of their recommendations.
If the user is viewing the AWS recommendation from within the same account as the target EC2 instance, a “Go to the Amazon EC2 Console” button appears on the recommendation details, but it leads to the EC2 console for whatever your last-used region was, with no automatic filter for the specific instance ID. This means you need to navigate to the right region yourself (perhaps also requiring a new console login if the recommendation is for a different account in the Organization), and then find the instance to see the details. ParkMyCloud provides better ease-of-use in that you can jump directly from the recommendation into the instance details, regardless of your AWS Organization account structure. ParkMyCloud: 1 click. AWS: at least five, plus copy/paste of the instance ID, and possibly a login.
ParkMyCloud also shows utilization data below the recommendation text, giving excellent context. AWS again requires navigating to the right account, then to EC2 and the right region, or to CloudWatch and the right metrics using the instance ID.
AWS Resource Optimization also ignores instances that have not been run for the past three days. ParkMyCloud takes this lack of utilization into consideration and does not discard these instances from recommendations.
AWS regenerates recommendations every 24 hours. ParkMyCloud regenerates recommendations based on the calculation window set by the customer.
Automation & Ease of Use
While AWS’s new recommendations are generated automatically, they all must be applied manually. ParkMyCloud allows users to accept and apply scheduling recommendations automatically, via a Policy Engine based on resource tagging and other criteria. RightSizing changes can be “applied now”, or scheduled to occur in the future, such as during a designated maintenance window.
There is also the question of visibility and access across team members. In AWS, users will need access to billing, which most users will not have. In ParkMyCloud, Team Leads have access to view and execute recommendations for resources assigned to their respective teams. Additionally, recommendations can be easily exported, so business units or teams can share and review recommendations before they’re accepted if required by their management process.
AWS’s management console and user interface are often cited as confusing and difficult to use, a trend that has unfortunately carried forward to this feature. On the other hand, ParkMyCloud makes resource management straightforward with a user-friendly UI.
Want to see what ParkMyCloud will recommend for your environment? Try it out with a free trial, which gives you 14-day access to our entire feature set, and you can see what cost optimization recommendations we have for you.
On our first day as Turbonomic employees, our team had some great discussions with CTO Charles Crouchman about Turbonomic, ParkMyCloud, and the market for infrastructure automation tools. Charles explained his vision of the future of infrastructure automation, which parallels the automation trajectory that cars and other vehicles have been following for decades. It’s a comparison that’s useful in order to understand the goals of fully-automated cloud infrastructure – and the mindset of cloud users adopting this paradigm. (And of course, given our name, we’re all in on driving analogies!)
The Five Levels of Vehicle Autonomy
The five levels of vehicle autonomy – or six, if you include level 0 – are a concept that comes from the Society of Automotive Engineers.
The levels are as follows:
Level 0 – No Automation. The driver performs all driving tasks with no tools or assistance.
Level 1 – Driver Assistance. The vehicle is controlled by the driver, but the vehicle may have driver-assist features such as cruise control or an automated emergency brake.
Level 2 – Partial Automation or Occasional Self-Driving. The driver must remain in control and engaged in driving and monitoring, but the vehicle has combined automated functions such as acceleration and steering/lane position.
Level 3 – Conditional Automation or Limited Self-Driving. The driver is a necessity, but not required to monitor the environment. The vehicle monitors the road and traffic, and informs the driver when he or she must take control.
Level 4 – High Automation or Full Self-Driving Under Certain Conditions. The vehicle is capable of driving under certain conditions, such as urban ride-sharing, and the driver may have the option to control the vehicle. This is where airplanes are today – for the most part, they can fly themselves, but there’s always a human pilot present.
Level 5 – Full Automation or Full Self-Driving Under All Conditions. The vehicle can drive without a human driver or occupants under all conditions. This is an ideal, but right now, neither the technology nor the people are ready for this level of automation.
How These Levels Apply to Infrastructure Automation Tools
Now let’s take a look at how these levels apply to infrastructure automation tools and infrastructure:
Level 0 – No Automation. No tools in place.
Level 1 – Driver Assistance. Some level of script-based automation with limited applications, such as scripting the installation of an application so it’s just one user command, instead of hand-installing it.
Level 2 – Partial Automation or Occasional Self-Driving. In cloud infrastructure, this translates to having a monitoring system in place that can alert you to potential issues, but cannot take action to resolve those issues.
Level 3 – Conditional Automation or Limited Self-Driving. Think of this as traditional incident resolution or traditional orchestration. You can build specific automations to handle specific use cases, such as opening a ticket in a service desk, but you have to know what the event trigger is in order to automate a response.
Level 4 – High Automation or Full Self-Driving Under Certain Conditions. This is the step where analytics are integrated. A level-4 automated infrastructure system uses analytics to decide what to do. A human can monitor this, but is not needed to take action.
Level 5 – Full Automation or Full Self-Driving Under All Conditions. Full automation. Like in the case of vehicles, both the technology and the people are a long way from this nirvana.
So where are most cloud users in the process right now? There are cloud users and organizations all over this spectrum, which makes sense when you think about vehicle automation: there are early adopters who are perfectly willing to buy a Tesla, turn on auto-pilot, and let the car drive them to their destination. But, there are also plenty of laggards who are not ready to take their hands off the wheel, or even turn on cruise control.
Most public cloud users have at least elements of levels 1 and 2 via scripts and monitoring solutions. Many are at level 3, and with the most advanced platforms, organizations reach level 4. However, there is a barrier between levels 4 and 5: you will need an integrated hardware/software solution. The companies that are closest to full automation are the hyperscale cloud companies like Netflix, Facebook, and Google, who have basically built their own proprietary stack including the hardware. This is where Kubernetes and tools like Netflix’s Scryer come from.
In our conversation, Charles said: “The thing getting in the way is heterogeneity, which is to say, most customers buy their hardware from one vendor, application software from another company, storage from another, cloud capacity from another, layer third-party software applications in there, use different development tools –– and none of these things were effectively built to be automated. So right now, automation needs to happen from outside the system, with adaptors into the systems. To get to level 5, the automation needs to be baked in from the system software through the application all the way up the stack.”
What Defines Early Adopters of Infrastructure Automation Tools
While there’s a wide scale of adoption in the market right now, there are a few indicators that can predict whether an organization or an individual will be open to infrastructure automation tools.
The first is a DevOps approach. If an organization is using DevOps, it has already agreed to let software automate deployments, which means it is accepting of automation in general – and likely to be open to more.
Another is whether resource management is centralized within the organization or not. If it is centralized, the team or department doing the management tends to be more open to automation and software solutions. If ownership is distributed throughout the organization, it’s naturally more difficult to make unified change.
Ultimately, the goal we should all be striving for is to use infrastructure automation tools to step up the levels of automated resource configuration and cost control. Through automation, we can reduce management time and room for human error to achieve optimized environments.
In today’s entry in our exploration of container services, we’ll look at Azure Kubernetes Service (AKS). AKS manages your hosted Kubernetes environment, making it simple to deploy and manage containerized applications without container orchestration expertise by delegating much of that responsibility to Azure – much like EKS and GKE do for AWS and Google Cloud. Azure handles critical tasks like health monitoring of ongoing operations, as well as maintenance through provisioning, upgrading, and scaling resources on demand.
Azure AKS Overview
Azure AKS is, as of this writing, just over a year old, released for general availability in June 2018. With AKS, you can deploy, scale, and manage Docker containers and applications. AKS gives developers greater flexibility and automation, and reduces management overhead for administrators. This is because it’s a managed service, which takes some of the management burden off the user.
As applications grow to span multiple containers deployed across multiple servers, operating them becomes more complex. To manage this complexity, Azure AKS provides an open source API to deploy, scale and manage Docker containers and container-based applications across a cluster of container hosts.
Use cases for AKS include:
Easily migrating existing applications to Kubernetes
Simplifying the deployment and management of microservices based applications
Easily integrating DevSecOps
IoT device deployment and management on demand
Machine Learning model training with AKS
If AKS is free, what do you pay for?
Yes, Azure AKS is a free service since there is no charge for managing Kubernetes clusters. However, you pay for the VM instances, storage and networking resources consumed by your Kubernetes cluster. These should be managed like any other cloud resources, with attention paid to potential areas of waste.
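As a rough illustration of that billing model, here is a minimal sketch of estimating a cluster’s monthly cost from its node VMs and disks. The rates are hypothetical placeholders, not real Azure prices:

```python
# Back-of-the-envelope AKS cost model: the control plane is free,
# so the bill is roughly node VMs + disks (+ networking, omitted here).
# All rates below are illustrative placeholders, not Azure list prices.

HOURS_PER_MONTH = 730

def monthly_cluster_cost(node_count, vm_hourly, disk_monthly_per_node):
    vms = node_count * vm_hourly * HOURS_PER_MONTH
    disks = node_count * disk_monthly_per_node
    return round(vms + disks, 2)

# e.g. 3 nodes at a hypothetical $0.10/hr with $5/month disks each
print(monthly_cluster_cost(3, 0.10, 5.0))  # 234.0
```

The takeaway is that AKS cost scales with node count and size, which is exactly why idle or oversized nodes deserve the same scrutiny as any other VM.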
AKS vs. ACS
Microsoft’s experience with cluster orchestration began with Azure Container Service (ACS) back in 2017, which supported Kubernetes, Docker Swarm, and Mesosphere’s DC/OS. At the time, it was a simple, open, and flexible way to run container applications in the cloud; it has since been succeeded by Azure Kubernetes Service (AKS), which was made generally available in 2018.
ACS users who run Kubernetes can migrate to AKS, but migration should be planned and reviewed for it to be successful, as the two services differ in many key areas. If considering migration, check out Azure’s guide to migrating from ACS to AKS here.
Should you use Azure AKS?
Chances are, you’re locked into a cloud provider – or have a preferred cloud provider – already, so you’re likely to use the container management service offered on your provider of choice. If you’re on Azure, AKS will be the natural choice as you increase use of microservices and app portability with containers.
Google Cloud recently released a new pricing option: Google Cloud capacity reservations. This new option is intended for users with anticipated spikes in usage, such as over holidays or planned backups. Google also expanded its Committed Use discount program to apply to more types of resources.
Manish Dalwadi, product manager for Compute Engine, said in Google’s announcement of these releases, “you shouldn’t need an advanced degree in finance to get the most out of your cloud investment.”
We’ve noted Google Cloud’s positioning as “best in customer-first pricing” in previous articles on Sustained Use Discounts and Resource-Based Pricing. However, the new options – particularly capacity reservations – may not be the best example of this.
How Google Cloud Capacity Reservations Work
Google Cloud capacity reservations are a bit different from options we see at the other major cloud providers. They are not a cost-savings plan like AWS’s and Azure’s “reserved instance” programs, which allow users to pay upfront for lower prices. Instead, they actually reserve capacity, to ensure it’s available when you need it. Use cases include holiday/Black Friday demand, planned organic growth, and backup/disaster recovery.
VMs you have reserved in advance will be billed at the same rate as on-demand. However, other discounts may apply. As you consume reserved VMs, you’ll also get the benefit of any applicable Sustained and Committed Use discounts.
One potential issue is that once you make a reservation, you will continue to consume and be charged for the resources until the reservation expires or you delete it. By default, any instance that matches the reservation configuration will be allocated against the reservation. On the one hand, this can prevent you from having to pay for reserved capacity above what you are using, but this may actually defeat your purpose of trying to have additional guaranteed capacity available. To guarantee the extra capacity for a specific instance even if it is stopped (or “parked” as we like to say), you will need to explicitly set an option when the instance is created. Note that you will still be paying for the reservation if you do not have any running instances that match the reservation.
Another caveat is that “a VM instance can only use a reservation if its properties exactly match the properties of the reservation.” In other words, you cannot buy a bunch of small reservations and expect that they can be combined into a big reservation, like you can do with certain types of reserved instances from the other cloud providers. This is consistent with the idea of a capacity reservation, rather than a discount program, and is worth keeping in mind.
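The “exact match” rule can be sketched as a simple property comparison. The property names here are illustrative, not the actual Compute Engine field names:

```python
# Sketch of exact-match reservation consumption: a VM draws from a
# reservation only when its relevant properties all match exactly.
# Property names are illustrative, not real GCE API field names.

def matches_reservation(vm, reservation):
    keys = ("machine_type", "zone", "min_cpu_platform", "gpu")
    return all(vm.get(k) == reservation.get(k) for k in keys)

res = {"machine_type": "n1-standard-4", "zone": "us-central1-a",
       "min_cpu_platform": None, "gpu": None}

vm_ok  = {"machine_type": "n1-standard-4", "zone": "us-central1-a",
          "min_cpu_platform": None, "gpu": None}
vm_big = {"machine_type": "n1-standard-8", "zone": "us-central1-a",
          "min_cpu_platform": None, "gpu": None}

print(matches_reservation(vm_ok, res))   # True
print(matches_reservation(vm_big, res))  # False
```

As the second case shows, a larger VM cannot draw on two smaller reservations: there is no aggregation step, only the all-or-nothing property match.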
This is a new avenue for customers to easily commit themselves to spending on resources they may not actually need, so we encourage you to evaluate carefully before reserving capacity and to keep a close watch on your monthly bill and review the cloud waste checklist.
More Committed Use Discounts
Alongside the capacity reservations, Google also announced an expansion of Committed Use Discounts to include GPUs, Cloud TPU Pods, and local SSDs.
Ultimately, Google Cloud pricing fares well on measures of user-friendliness and options for cost savings, but we question if the reserved capacity changes will do anything to improve the readability of the bill. On the other hand, the expansion of Committed Use discounts does provide more savings-in-hand options for customers.
Take a few minutes to ensure you’re not oversizing or spending money on resources that should be turned off, and you’ll be well on your way to an optimized Google Cloud bill.