If you use AWS, it’s likely you’ll use or at least run across Amazon Redshift – so make sure you know these eight things about how AWS Redshift Pricing works.
Amazon Redshift Overview
AWS calls Redshift the “most popular and fastest” cloud data warehouse. It is fully-managed, and scalable to petabytes of data for storage and analysis. You can use Redshift to analyze your data using SQL and business intelligence tools. It features:
- Integration with data lakes and other AWS services that allows you to query data and write data back to data lake in open formats
- High performance – fast query performance, columnar storage, data compression, and zone maps.
- Petabyte scale – “virtually unlimited” according to AWS, with scalable number and type of nodes, with limitless concurrency.
There are three node types, two current and one older:
- RA3 nodes with managed storage – scale and pay for compute and managed storage independently. You choose the number of nodes based on performance requirements and pay only for the storage you used. Managed storage uses SSDs in each RA3 nodes for local storage, and Amazon S3 for longer-term storage. If the data in your node exceeds the local SSDs, data will be automatically offloaded to S3.
- DC2 nodes – these compute-intensive data warehouses include local SSD storage. You choose the number of nodes based on data size and performance requirements. Data is stored locally, and as it grows, you can add more compute nodes. AWS recommends this for datasets under 1TB uncompressed, otherwise, use RA3 for the S3 offloading capability to keep costs down.
- DS2 nodes – these use HDDs, and AWS recommends you use RA3 instead.
Where did the name come from? In astronomy, “redshift” refers to the lengthening of electromagnetic radiation toward longer wavelengths as an object moves away from the observer – the light equivalent of the change in an ambulance siren pitch as it passes you, collectively known as the Doppler Effect. Or, if you’re into gossip, it’s a thumb to the nose at “big red” Oracle.
AWS Redshift Pricing Structure
So, how much does Amazon Redshift cost? Like EC2 and other services, the core cost is on-demand by the hour, based on the type and number of nodes in your cluster.
Core On-Demand Pricing
- RA3 – as of this writing, prices range from $3.26 per hour for an ra3.4xlarge in US East (N. Virginia) to $5.195 for the same type in Sao Paulo. Price increases linearly with size, with the ra3.16xlarge costing 4 times the ra3.4xlarge.
- Data stored in managed storage is billed separately based on actual data stored in the RA3 node types, at the same rate whether your data is in SSDs or S3.
- DC2 – the dc2.large currently costs $0.25 per hour in US East (N. Virginia) up to $0.40 in Sao Paulo.
Note: We’ve omitted DS2 as those are no longer recommended.
Pricing for Additional Capabilities
Amazon Redshift Spectrum Pricing – Redshift Spectrum allows you to run SQL queries directly against Amazon S3 data. You are charged for the number of bytes scanned by Spectrum, rounded up by megabyte, with a 10MB minimum per query. Compressed and columnar data will keep costs down. Current pricing is $5 per terabyte of data scanned.
Concurrency Scaling Pricing – you can accumulate one hour of concurrency scaling cluster credits every 24 hours while your main cluster is running. Past that, you will be charged the per-second on-demand rate
Redshift Managed Storage Pricing – managed storage for RA3 node types is at a fixed GB-month rate for the region you are using. It is calculated per hour based on total data present in the managed storage.
8 Things to Keep in Mind
Of course, there’s always more to know.
- Free trial – for your first Redshift cluster, you can get a two month free trial of a DC2.Large node. It’s actually a trial of 750 hours per month, so if you use more than 160GB of compressed SSD storage or run multiple nodes, you will exhaust that in less than one month, at which point you’ll be charged the on-demand rate unless you spin down the cluster. This is separate from the AWS Free Tier.
- Reserved Instances are available – by paying upfront, you can pay less overall – 3% to 63% depending on the payment options you choose. But should you use them?
- Billed per-second – for partial hours, you will only be billed at a per-second rate. Surprisingly, this was only released in February of this year.
- You can pause – you can pause and resume to suspend billing, but you will still pay for backup storage while a cluster is paused.
- Redshift Spectrum pricing does not include costs for requests made against your S3 buckets – see S3 pricing for those rates.
- Redshift managed storage pricing does not include backup storage due to snapshots, and once the cluster is terminated, you will still be charged for your backups. Don’t let these get orphaned!
- Data transfer – there is no charge for data transferred between Amazon Redshift and Amazon S3 within the same region for backup, restore, load, and unload operations – but for all other data transfers in and out, you will be billed at standard data transfer rates.
- RA3 is not available in all regions
Keep Your AWS Redshift Costs In Control
There are a few things you can do to optimize your AWS Redshift costs:
- Use Reserved Instances where you have predictable needs, and where the savings over on-demand is high enough
- Delete orphaned snapshots – like all backups, ensure that you are managing your snapshots and deleting when clusters are deleted
- Schedule on/off times – for Redshift clusters used for development, testing, staging, and other purposes not needed 24×7, make sure you schedule them to turn off when not needed – now possible with the announcement from last month that Redshift clusters can now be paused. Automated scheduling coming soon in ParkMyCloud!