AWS Neptune Overview – Amazon’s Graph Database Service

AWS Neptune is Amazon’s managed graph database service, designed to let customers easily build and run applications that work with highly connected datasets. It was first announced at AWS re:Invent 2017, and made generally available in May 2018.

Graph databases like AWS Neptune were created to address the limitations of relational databases, and offer an efficient way to work with complex data. 

What is a graph database?

A graph database is a database optimized to store and process highly connected data – in short, it’s about relationships. The data structure for these databases consists of vertices (or nodes) and directed links called edges.

Use cases for such highly-connected data include social networking, restaurant recommendations, retail fraud detection, knowledge graphs, life sciences, and network & IT ops. For a restaurant recommendations use case, for example, you may be interested in the relationships between various users, where those users live, what types of restaurants those users like, where the restaurants are located, what sort of cuisine they serve, and more. With a graph database, you can use the relationships between these data points to provide contextual restaurant recommendations to users.

Details on the AWS Neptune Offering

AWS Neptune Pricing 

The AWS Neptune cost calculation depends on a few factors:

  • On-Demand instance pricing – you’ll need to pay for the compute instances needed for read-write workloads as well as for Amazon Neptune replicas. These follow the general pricing for AWS On-Demand instances.
  • Database Storage & I/Os – storage is also paid per usage with no upfront commitments. Storage is billed in per GB-month increments and I/Os are billed in per million request increments. 
  • Backup storage – you are charged for the storage associated with your automated database backups and database cluster snapshots. As per usual, increasing the retention period will cost more. 
  • Data transfer – you are charged per GB for data transferred in and out of AWS Neptune.

As with most AWS services, this pricing is confusing and difficult to predict.
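
To make the pricing components above concrete, here is a back-of-the-envelope sketch in Python. The rates are placeholders, not actual AWS prices – look up current per-region rates on the Neptune pricing page before relying on any numbers.

```python
# A minimal cost-model sketch. All rates below are PLACEHOLDERS, not
# current AWS prices -- substitute the real per-region numbers from the
# Neptune pricing page.

INSTANCE_RATE_PER_HOUR = 0.348    # placeholder: on-demand rate per instance
STORAGE_RATE_PER_GB_MONTH = 0.10  # placeholder: per GB-month
IO_RATE_PER_MILLION = 0.20        # placeholder: per million I/O requests
HOURS_PER_MONTH = 730

def estimate_monthly_cost(instances: int, storage_gb: float, io_millions: float) -> float:
    """Sum the main Neptune line items: compute, storage, and I/O.
    Backup storage and data transfer are omitted for simplicity."""
    compute = instances * INSTANCE_RATE_PER_HOUR * HOURS_PER_MONTH
    storage = storage_gb * STORAGE_RATE_PER_GB_MONTH
    io = io_millions * IO_RATE_PER_MILLION
    return compute + storage + io

# Example: a writer plus two read replicas, 200 GB, 50M requests/month.
print(f"${estimate_monthly_cost(instances=3, storage_gb=200, io_millions=50):,.2f}")
```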

AWS Neptune Use Cases

Use cases for the AWS graph database and other similar offerings include:

  • Machine learning, such as intelligent image recognition, speech recognition, intelligent chatbots, and recommendation engines.
  • Social networking
  • Fraud detection – flexibility at scale makes graph databases useful for working with the huge amounts of transactional data needed to detect fraud. 
  • Regulatory compliance – ever more important as HIPAA, GDPR, and other regulations place strict requirements on the way organizations use data about customers.
  • Knowledge graphs – such as advanced results for keyword searches and complex content searches.
  • Life sciences – graph databases are uniquely suited to store models of disease and gene interactions, protein patterns, chemical compounds, and more. 
  • Network/IT Operations to keep networks secure, including identity and access management, detection of malicious file paths, and more. 
  • Supply chain transparency – graph databases are great for modeling complex supply chains that span the globe. 

Tired of SQL?

If you’re tired of SQL, AWS Neptune may be for you. A graph database is fundamentally different from a relational SQL database. There are no tables, columns, or rows – it feels more like a NoSQL database. There are only two data types: vertices and edges, both of which have properties stored as key-value pairs.

AWS Neptune is fully managed, which means that database management tasks like hardware provisioning, software patching, setup, configuration, and backups are taken care of for you.

It’s also highly available, with data replicated across multiple availability zones. In its architecture and availability, it is very similar to Aurora, the relational database from Amazon.

Neptune supports Property Graph and W3C’s RDF. You can use these to build your own web of data sets that you care about, and build networks across the data sets in the way that makes sense for your data, not with arbitrary presets. You can do this using the graph models’ query languages: Apache TinkerPop Gremlin and SPARQL.
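
For a flavor of what querying Neptune looks like, here is a minimal Gremlin sketch using the gremlinpython driver, applied to the restaurant recommendation example from earlier. The endpoint, vertex labels, and edge names ('user', 'friendsWith', 'likes', 'restaurant') are hypothetical placeholders for your own graph model.

```python
# A minimal sketch, assuming a hypothetical Neptune endpoint and a graph
# with hypothetical 'user' and 'restaurant' vertices connected by
# 'friendsWith' and 'likes' edges.
from gremlin_python.driver import client

neptune = client.Client(
    "wss://your-neptune-endpoint:8182/gremlin",  # hypothetical endpoint
    "g",
)

# Recommend restaurants that a user's friends like.
results = neptune.submit(
    "g.V().hasLabel('user').has('name', 'alice')"
    ".out('friendsWith').out('likes').hasLabel('restaurant')"
    ".values('name').dedup().limit(10)"
).all().result()

print(results)
neptune.close()
```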

Visualization is not built into AWS Neptune natively. However, data can be visualized with Amazon SageMaker Jupyter notebooks, or with third-party options like Metaphactory, Tom Sawyer Software, Cambridge Intelligence/KeyLines, and Arcade. 

Other Graph Database Options

There’s certainly competition in the market from other graph database solutions. Here are a few that are frequently mentioned. 

AWS Neptune vs. Neo4j

Neo4j is the graph database most often rated most popular by mindshare and adoption. Version 1.0 was released in February 2010. Unlike AWS Neptune, Neo4j is open source. Neo4j uses the query language Cypher, which it originally developed. While there are several languages available in the graph database market, Cypher is widely known by now. 

Neo4j, unlike AWS Neptune, does actually come with graph visualization, which is a huge plus for working with this kind of data, though as mentioned above, there are several ways to visualize your Neptune data.

Other

Other graph databases include: AllegroGraph, AnzoGraph, ArangoDB, DataStax Enterprise Graph, InfiniteGraph, JanusGraph, MarkLogic, Microsoft SQL Server 2017, OpenLink Virtuoso, Oracle Spatial and Graph (part of Oracle Database), OrientDB, RedisGraph, SAP HANA, Sparksee, Sqrrl Enterprise, and Teradata Aster.

AWS Neptune – Getting Started

If you’re interested in the service, you can check out more about AWS Neptune. As you get started, the AWS Neptune docs are a great resource. Or, check out some AWS Neptune tutorials on YouTube.

Once you’re on board, make sure you have cost control as a priority. ParkMyCloud can now park Neptune databases to ensure you’re only paying for what you’re actually using. Try it out for yourself!

Microsoft’s Start/Stop VM Solution vs. ParkMyCloud

Users looking to save money on public cloud may be in the market for a start/stop VM solution. While it sounds simple, there is huge savings potential in simply stopping VMs, typically on a schedule. The basic idea is that non-production instances don’t need to run 24×7, so by turning VMs off when they’re not needed, you can save money.

If you use Microsoft Azure, perhaps you’ve seen the Start/Stop VM solution in the Azure Marketplace. You may want this tool if you want to configure Azure to start/stop VMs for the weekend or on weekday nights. It may also serve as a way to avoid writing your own stop-VM PowerShell script (a sketch of the do-it-yourself approach follows below).
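
For comparison, here is roughly what such a script looks like if you roll it yourself – a minimal sketch using Python and the azure-mgmt-compute SDK rather than PowerShell. The subscription ID, resource group, and VM name are hypothetical, and you’d still need a scheduler (cron or similar) to run it on evenings and weekends.

```python
# A minimal DIY start/stop sketch with the Azure SDK for Python.
# Assumes azure-identity and azure-mgmt-compute are installed and that
# DefaultAzureCredential can authenticate in your environment.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<your-subscription-id>"  # hypothetical
compute = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

def stop_vm(resource_group: str, vm_name: str) -> None:
    # Deallocate (not just power off) so compute charges stop accruing.
    compute.virtual_machines.begin_deallocate(resource_group, vm_name).result()

def start_vm(resource_group: str, vm_name: str) -> None:
    compute.virtual_machines.begin_start(resource_group, vm_name).result()

stop_vm("dev-rg", "dev-vm-01")  # e.g. run this Friday evening via cron
```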

Users of Azure have taken advantage of this option to start/stop VMs during off-hours, but have found that it is lacking some key functionality that they require for their business. Let’s take a look at what this Start/Stop tool offers and what it lacks, then compare it to ParkMyCloud’s comprehensive offering.

Azure Start/Stop VM Solution

Let’s take a look at Azure’s start/stop VM solution. The crux of this solution is the use of a few Azure services: Automation and Log Analytics to schedule the VMs, and Azure Monitor emails to let you know when a system was shut down or started. Both scheduling and keeping track of those schedules are important. 

As far as the backbone of Azure services, the use of native tools within Azure can be useful if you’re already baked into the Azure ecosystem, but can be an obstacle to exploring other cloud options. You may only use Azure at the moment, but having the flexibility to use other public clouds in the future is a strong reason to use cloud-agnostic tools today.

Next, this solution costs money, but it’s not very easy to estimate the cost (does that surprise you?). The total cost is based on the underlying services (Automation, Log Analytics, and Azure Monitor), which means it could be very cheap or very expensive depending on what else you use and how often you’re scheduling resources. 

The schedules themselves can be based on time, but only on a single start and stop time – which is not practical for typical applications. The page claims schedules can be based on utilization, but the initial setup offers no place to configure that. The solution also needs to run for 4 hours before it can show you any log or monitoring information. 

The interface for setting up schedules and automation is not very user-friendly. It requires creating automation scripts that are either for stopping or starting only, and only have one time attached. This is tedious, and the single-time configuration makes it difficult to maximize off time and therefore savings. 

To create new schedules, you have to create new scripts, which makes the interface confusing for those who aren’t used to the Azure portal. At the end of the setup, you’ll have at least a dozen new objects in your Azure subscription, which only grows if you have any significant number of VMs.

Users have noted numerous complaints in the solution’s reviews:

  • “Great idea – painful to use – I don’t know why it couldn’t work like the auto shutdown built into the VM config with maybe a few more options (on/off weekdays vs. weekends). Feels like a painful set of scripts with no config options once it’s deployed (or I don’t understand how to use it).”
  • “Tried to boil the ocean – This solution is complex and bloated. It still supports classic VMs. The autostop solution only supports stop not start. Why bother using this?”
  • “Start/Stop VM Azure – Difficult to do and harder to modify/change components. I’ll have difficulty to repeat to create another schedule for different VM.”

Luckily, there’s an easier option.

How it stacks up to ParkMyCloud

So if the Start/Stop VM Solution from Microsoft can start and stop Azure VMs, what more do you need? Well, we at ParkMyCloud have heard from customers (ranging from day-1 startups to Fortune 100 companies) that there are features a cloud cost optimization tool needs if it is going to get widespread adoption. 

That’s why we created ParkMyCloud: to provide simple, straightforward cost optimization that provides rapid ROI while being easy to use. You can use ParkMyCloud to save money through Azure start/stop VM schedules for non-production resources that are not needed evenings and weekends, as well as RightSizing overprovisioned resources.

Here are some of the features ParkMyCloud has that are missing from the Microsoft tool:

  • Single Pane of Glass – ParkMyCloud can work with multiple clouds, multiple accounts within each cloud, and multiple regions within each account, all in one easy-to-use interface.
  • Easy to change or override schedules – Users can change schedules or temporarily override them through the UI, our API, our Slackbot, or through our iOS app. 
  • Schedule recommendations – the Azure tool requires users to determine their own schedules. ParkMyCloud recommends on/off schedules based on keywords found in tags and names, and based on resource utilization history. 
  • Policy engine – ParkMyCloud can assign schedules automatically based on rules you create based on teams, names, or other criteria.
  • RightSizing – in addition to on/off schedules, you can also save money with RightSizing. Our data shows that more than 95% of VMs are operating at less than 50% average CPU, which means they are oversized and wasting money. Changing the VM size or family, or modernizing instance types, saves 50-75% of the cost of the instance.
  • User Management – Admins can delegate access to users and assign Team Leads to manage sub-groups within the organization, providing user governance over schedules and VMs. Admin, Team Lead, and Team Member roles are able to be modified to fit your organization’s needs.
  • No Azure-specific knowledge needed – Users don’t need to know details about setting up Automation Scripts or Log Analytics to get their servers up and running. Many ParkMyCloud administrators provide access to users throughout their organizations via the ParkMyCloud RBAC. This is useful for users who may need to, say, start and stop a demo environment on demand, but who do not have the knowledge necessary to do this through the Azure console.
  • Enterprise features – Single sign-on, savings reports, notifications straight to your email or chat group, and full support access helps your large organization save money quickly.
  • Integrations – use ParkMyCloud with your favorite SSO tools such as Ping and Okta. Get notifications and send commands back to ParkMyCloud through tools like Slack and Microsoft Teams.
  • Straightforward setup – it usually takes new users 15 minutes or less to set up a ParkMyCloud account, connect to Azure, and get started saving money. 
  • Reporting – with ParkMyCloud, users can view, download, and email savings reports covering costs, actions, and savings by team, credential, provider, resource, and more.
  • Notifications – users can get configurable notifications of ParkMyCloud updates & activities via email, webhook or ChatOps.
  • Huge cost savings and ROI – here are just a few examples from some of our customers:
    • A global fast food chain is managing 3,500+ resources in ParkMyCloud and saving more than $200,000 per month on their cloud spend
    • A global registry software company has saved more than $2.2 million on their cloud spend since signing up for ParkMyCloud – an ROI of 6173%
    • A global consumer goods company with 200+ ParkMyCloud users saves more than $100,000 per month on their cloud spend.

As you can tell, the Start/Stop VM solution from Microsoft can be useful for very specific cases, but most customers will find it lacking the features they really need to make cloud cost savings a priority. ParkMyCloud offers these features at a low cost, so try out the free trial now to see how quickly you can cut your Azure cloud bill.

9 Key Takeaways from our AWS Webinar on Automated Cost Control

We recently held our first AWS webinar, featuring speakers from AWS, Sysco, and our CTO Bill Supernor. If you missed “How to Turn AWS Utilization Data into Automated Cost Control,” not to worry! You can watch a replay here.

Here are 9 takeaways from this AWS webinar – and more resources to learn about them:

    • Cost Optimization is one of five key pillars in the AWS Well-Architected Framework, and we’re glad to see AWS prioritizing controlled costs so highly. If you’re not already familiar with the Well-Architected Framework, learn more on the AWS site. The other pillars, by the way, include operational excellence, security, reliability, and performance efficiency. 
    • Choose the right pricing model for your workload needs. Make sure to evaluate whether Reserved Instances are a good choice before committing, and don’t forget about Spot Instances either. 
    • Tagging resources according to cost allocation was emphasized by AWS as important for decision making – and of course it is! You have to be able to categorize your resources to make decisions about them. Here’s more on how to improve cloud automation through tagging.
    • Use AWS CloudWatch – similarly, use your CloudWatch data to optimize your environment. AWS is collecting data about your usage whether you’re looking at it or not – so put it to work! (See the sketch after this list.)
    • Bagels work – Sysco Foods’ Kurt Brochu shared that he could motivate his team to show up for cost optimization trainings by providing bagels. Sometimes it takes a bit of prodding to get team members not directly responsible for budget to care about cost, so don’t be afraid to get creative. 
    • Use gamification as a motivator – similarly, by turning cost savings into a race or other competition, you can spark interest that might otherwise be hard to find.
    • There are plenty more AWS webinars – AWS partners frequently hold webinars in conjunction with the cloud provider. One of the best places to learn about them is the @AWS_Partners Twitter channel.
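
As a starting point for putting that CloudWatch data to work, here is a minimal Python sketch that pulls an instance’s average CPU over the past two weeks with boto3. The instance ID is a hypothetical placeholder, and the 5% threshold is just an illustrative cutoff for flagging idle resources.

```python
# A minimal sketch: average CPU for one EC2 instance via CloudWatch.
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

def average_cpu(instance_id: str, days: int = 14) -> float:
    """Average CPUUtilization over the last `days` days, hourly datapoints."""
    end = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else 0.0

# Illustrative use: flag a likely-idle instance (hypothetical ID).
if average_cpu("i-0123456789abcdef0") < 5.0:
    print("Candidate for parking or rightsizing")
```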

Watch the replay of our AWS webinar for the full story – and let us know in the comments below what else you’d like to learn about in future webinars!

How Big is AWS?

If you’re at all familiar with cloud computing, you know Amazon Web Services is a giant – but just how big is AWS? There are a number of ways to measure the size of a cloud business like Amazon’s – here are answers to just a few of those questions.

How big is AWS’s staff?

While numbers for employees of Amazon as a whole are reported in the company’s quarterly earnings reports (630,600 as of Q1 2019), the number of those under AWS is less clear. 

AWS has just over 40,000 employees listed on LinkedIn, but of course, that’s not the most accurate measure. By eyeballing the operating expenses reported for the AWS segment compared to Amazon’s business as a whole, you could estimate up to 62,000, but it’s likely lower than that. As of this writing, AWS has 12,280 full-time job openings listed on their website, while Amazon as a whole has 32,454 openings.

We’ll be interested to see how this is affected by HQ2 joining ParkMyCloud’s neighborhood in Northern Virginia later this year.

How big is AWS’s infrastructure?

AWS has 66 availability zones within 21 geographic regions around the world. Each availability zone consists of one to dozens of individual data centers. To visualize these data centers, check out AWS’s exploration of them here.

How big is AWS’s list of products?

When we last counted in April, there were 170 unique services listed on AWS’s offerings page, and there could certainly be more by now. These range from core compute products like EC2 to newer releases like AWS DeepRacer for machine learning. Look for a spike in this count after AWS re:Invent in early December, as the cloud provider tends to save up announcements for its yearly user conference. 

Speaking of…

How big is AWS re:Invent?

In 2018, AWS re:Invent pulled 52,000 attendees, and AWS estimates a crowd of 62,000 for 2019, each year taking up a large portion of the Las Vegas Strip.

In comparison, Microsoft Ignite expects 25,000 attendees this year, Google Cloud Next estimated 30,000 attendees, and VMworld estimates 21,000 attendees. Then again, Salesforce’s Dreamforce 2018 drew 170,000 attendees, and Consumer Electronics Show (CES) reports 175,212 attendees for 2019. So while AWS re:Invent may be large for a cloud-specific conference, it’s not quite a giant as far as tech shows go. 

How big is AWS’s market share?

When looking at public cloud, it’s clear AWS still holds the largest portion of the market. A recent report put AWS at 47% of the market, with the next-closest competitor, Azure, at 22%. More about AWS vs. Azure vs. Google cloud market share.

How big is AWS’s revenue?

For Q1 2019, AWS reported sales of $7.7 billion, showing consistent growth and the largest of any of the cloud service providers. For the full year of 2018, AWS reported $25.7 billion in revenue – that’s more than McDonald’s. Additionally, AWS has $16 billion or more in backlog revenue from contracts for future services. It is a growing proportion of Amazon’s business. In fact:

How big is AWS as a portion of Amazon?

In first quarter reports, AWS contributed about 50% of Amazon’s overall operating income, with an operating margin of 29%. Overall, AWS is growing as a contributor to Amazon’s income and growth. More here.

So how big is AWS? It’s up to you how you want to measure, but suffice it to say: big.

How AWS Firecracker Makes Containers and Serverless More Efficient

AWS Firecracker was announced at AWS re:Invent in November 2018 as a new AWS open source virtualization technology. The technology is purpose-built for creating and managing secure, multi-tenant container and function-based services. It was described by the AWS Chief Evangelist Jeff Barr as “what a virtual machine would look like if it was designed for today’s world of containers and functions.”

What is AWS Firecracker?

Firecracker is a Virtual Machine Manager (VMM) exclusively designed for running transient and short-lived processes. In other words, it helps to optimize the running of functions and serverless workloads. It’s also an important new component in the emerging world of serverless technologies and is used to enhance the backend implementation of Lambda and Fargate. Firecracker helps deliver the speed of containers combined with the security of VMs. If you use Lambda or Fargate, you’re already receiving the benefits of Firecracker. However, if you run/orchestrate a large volume of containers, you should take a look at this service with optimization in mind.

How AWS Firecracker Creates Efficiencies

AWS can realize the economic benefits of Firecracker by creating what it calls “microVMs,” which allows it to spread serverless workloads across multiple servers, getting a greater ROI from its investment in the servers behind serverless. In terms of customer benefit, Firecracker enables these new microVMs to launch in 125 milliseconds or less, compared to the seconds (or longer) it can take to launch a container or spin up a traditional virtual machine. In a world where thousands of VMs can be spun up and down to tackle a specific workload, this constitutes a significant savings. And remember, these are fully fledged micro virtual machines, not just containers. The microVMs themselves are worth a closer look, as each includes an in-process rate limiter to optimize shared network and storage resources. As a result, one server can support thousands of microVMs with widely varying processor and memory configurations.
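
For the curious, here is a minimal sketch of what launching a microVM looks like through Firecracker’s REST API, which listens on a Unix socket. It follows the project’s getting-started flow; the socket path, kernel path, and rootfs path are hypothetical, and it assumes the firecracker process is already running and that the requests-unixsocket Python package is installed.

```python
# A minimal sketch of configuring and booting a Firecracker microVM over
# its Unix-socket REST API. Paths are hypothetical; assumes firecracker
# was started with --api-sock /tmp/firecracker.sock.
import requests_unixsocket

session = requests_unixsocket.Session()
base = "http+unix://%2Ftmp%2Ffirecracker.sock"  # percent-encoded socket path

# Size the microVM -- this tiny footprint is where the density comes from.
session.put(f"{base}/machine-config", json={"vcpu_count": 1, "mem_size_mib": 128})

# Point it at a kernel and a root filesystem image (hypothetical paths).
session.put(f"{base}/boot-source", json={
    "kernel_image_path": "/images/vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1",
})
session.put(f"{base}/drives/rootfs", json={
    "drive_id": "rootfs",
    "path_on_host": "/images/rootfs.ext4",
    "is_root_device": True,
    "is_read_only": False,
})

# Boot the microVM -- the step Firecracker completes in ~125 ms.
session.put(f"{base}/actions", json={"action_type": "InstanceStart"})
```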

There is also the enhanced security and workload isolation that comes from the Linux Kernel-based Virtual Machine (KVM) – more secure than containers, which are less isolated. One particularly valuable security feature is that Firecracker is statically linked, which means all the libraries it needs to run are included in its executable code. This makes new Firecracker environments safer by eliminating dependencies on outside libraries. Altogether, this offering’s combination of efficiency, security, and speed created quite the buzz at the AWS re:Invent launch.

Will Firecracker make a “bang”?

There are a few caveats related to the still-novel aspects of the technology. In particular, compared to alternatives such as containers or Hyper-V VMs, it is prudent to confine Firecracker to non-production workloads, as the technology is still new and needs to be more fully battle-tested for production use.

However, as confidence, adoption, and experience grow in the use of serverless technologies it certainly seems like Firecracker can offer a popular new method for provisioning compute resources and will likely help bridge the current gap between VMs and containers.

SaaS vs. PaaS vs. IaaS – Where the Market is Going

SaaS, PaaS, IaaS – these are the three essential models of cloud services to compare, otherwise known as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Each of these has its own benefits, and it’s good to understand why providers offer these different models and what implications they have for the market. While SaaS, PaaS, and IaaS are different, they are not competitive – most software-focused companies use some form of all three. Let’s take a look at these main categories, and because I like to understand things by company name, I’ll include a few of the more common SaaS, PaaS, and IaaS providers in the market today.

SaaS: Software as a Service

Software as a Service, also known as cloud application services, represents the most commonly utilized option for businesses in the cloud market. SaaS utilizes the internet to deliver applications, which are managed by a third-party vendor, to its users. A majority of SaaS applications are run directly through the web browser, and do not require any downloads or installations on the client side.

Prominent providers: Salesforce, ServiceNow, Google Apps, Dropbox and Slack (and ParkMyCloud, of course).

PaaS: Platform as a Service

Cloud platform services, or Platform as a Service (PaaS), provide a cloud-based environment used mainly for building and running applications. PaaS delivers a framework that developers can build upon and use to create customized applications. The servers, storage, and networking can be managed by the enterprise or a third-party provider, while the developers maintain management of the applications.

Prominent providers and offerings: AWS Elastic Beanstalk, RedHat Openshift, IBM Bluemix, Windows Azure, and VMware Pivotal CF.

IaaS: Infrastructure as a Service

Cloud infrastructure services, known as Infrastructure as a Service (IaaS), are made of highly scalable and automated compute resources. IaaS is fully self-service for accessing and monitoring things like compute, storage, networking, and other infrastructure related services, and it allows businesses to purchase resources on-demand and as-needed instead of having to buy hardware outright.

Prominent Providers: Amazon Web Services (AWS), Microsoft Azure (Azure), Google Cloud Platform (GCP), and IBM Cloud.

SaaS vs. PaaS vs. IaaS

SaaS, PaaS, and IaaS all fall under the umbrella of cloud computing (building, running, and storing data in the cloud). Think about them in terms of out-of-the-box functionality and building from the bottom up.

IaaS helps build the infrastructure of a cloud-based technology. PaaS helps developers build custom apps via an API that can be delivered over the cloud. And SaaS is cloud-based software companies can sell and use.

Think of IaaS as the foundation of building a cloud-based service – whether that’s content, software, or the website to sell a physical product; PaaS as the platform on which developers can build apps without having to host them; and SaaS as the software you can buy or sell to help enterprises (or others) get stuff done.

SaaS, PaaS, IaaS Market Share Breakdown

The SaaS market is by far the largest, according to a Gartner study that reported that enterprises spent $182B+ on cloud services, with SaaS making up 43% of that spend – roughly $78 billion.

While SaaS is currently the largest cloud service in terms of spend, IaaS is projected to be the fastest-growing market, with a CAGR of more than 20% over the next 3 to 4 years. This bodes very well for the “big three” providers: AWS, Azure, and GCP.

Where the Market is Going

What’s interesting is that many pundits argue that PaaS is the future, along with FaaS, DaaS, and every other X-as-a-service. However, the data shows otherwise. As evidenced by the Gartner reports above, IaaS has a larger market share than PaaS and is growing the fastest.

First of all, this is because IaaS offers all the important benefits of using the cloud such as scalability, flexibility, location independence and potentially lower costs. In comparison with PaaS and SaaS, the biggest strength of IaaS is the flexibility and customization it offers. The leading cloud computing vendors offer a wide range of different infrastructure options, allowing customers to pick the performance characteristics that most closely match their needs.

In addition, IaaS is the least likely of the three cloud delivery models to result in vendor lock-in. With SaaS and PaaS, it can be difficult to migrate to another option or simply stop using a service once it’s baked into your operations. IaaS also charges customers only for the resources they actually use, which can result in cost reductions if used strategically. While much of the growth is from existing customers, it’s also because more organizations are using IaaS across more functions than either of the other models of cloud services.