Modernise your Data Warehouse with Amazon Redshift and Amazon Redshift Spectrum

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Paul Macey
Solutions Architect – Big Data, Amazon Web Services
Why Modernise?
• Performance
• Scalability
• Cost

Introducing Amazon Redshift
• Load
• Unload
• Query
• Backup
• Restore
Amazon Redshift Architecture
Massively parallel, shared-nothing columnar architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
Compute nodes
• Local, columnar storage
• Execute queries in parallel
• Load, unload, backup, restore
Amazon Redshift Spectrum nodes
• Execute queries directly against Amazon Simple Storage Service (Amazon S3)
[Diagram: SQL clients/BI tools connect over JDBC/ODBC to the leader node, which sits in front of the compute nodes. Each node is labelled 128GB RAM, 16TB disk, 16 cores. Amazon Redshift Spectrum nodes 1 2 3 4 ... N sit between the compute nodes and Amazon S3.]
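The split of work between the leader node and the compute nodes can be illustrated with a minimal Python sketch. It is not how Amazon Redshift is implemented: the function names (`compute_node_count`, `leader_count_by_key`), the thread pool, and the round-robin split of rows are all assumptions made for illustration. It only shows the shared-nothing idea that each node aggregates its own slice and the leader merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def compute_node_count(rows):
    """A compute node counts rows per key using only its local slice."""
    return Counter(key for key, _value in rows)


def leader_count_by_key(rows, node_count=3):
    """Leader: split the work, run it on every node in parallel, merge partials."""
    # Round-robin split is an illustrative assumption, not Redshift's
    # actual data distribution logic.
    slices = [rows[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_count, slices))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


rows = [("sydney", 1), ("perth", 2), ("sydney", 3), ("hobart", 4), ("sydney", 5)]
result = leader_count_by_key(rows)
```

No node needs to see another node's rows to produce its partial count, which is what lets the work scale out by adding nodes.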

Amazon Redshift + Spectrum
• Fast queries: performance at EB scale
• Elastic: elastic and highly available
• Cost effective: on-demand, pay-per-query
• High concurrency: multiple clusters access the same data
• No ETL: query data in place using open file formats
• Standardised: full Amazon Redshift SQL support

Life Of A Query
[Diagram shown on every step: a SQL client connects over JDBC/ODBC to Amazon Redshift; Amazon Redshift Spectrum nodes 1 2 3 4 ... N sit between the cluster and Amazon S3 (exabyte-scale object storage); the Data Catalog is an Apache Hive-compatible metastore.]
1. The client submits a query:
   SELECT COUNT(*)
   FROM S3.EXT_TABLE
   GROUP BY…
2. The query is optimised and compiled at the leader node, which determines what gets run locally and what goes to Amazon Redshift Spectrum.
3. The query plan is sent to all compute nodes.
4. Compute nodes obtain partition info from the Data Catalog and dynamically prune partitions.
5. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer.
6. Amazon Redshift Spectrum nodes scan your S3 data.
7. Amazon Redshift Spectrum projects, filters, and aggregates.
8. Final aggregations and joins with local Amazon Redshift tables are done in-cluster.
9. The result is sent back to the client.
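Steps 4 to 8 of the query life cycle can be traced with a toy Python model. Everything here is assumed for illustration: the in-memory `CATALOG` dictionary stands in for the Data Catalog and S3, and the function names are hypothetical. The sketch shows partition pruning first, then per-partition projection and aggregation, then a final merge in the cluster.

```python
# Hypothetical catalog: partition key (year) -> rows stored under that prefix.
CATALOG = {
    2016: [{"event": "click", "qty": 1}, {"event": "view", "qty": 2}],
    2017: [{"event": "click", "qty": 3}, {"event": "click", "qty": 4}],
    2018: [{"event": "view", "qty": 5}, {"event": "click", "qty": 6}],
}


def prune_partitions(catalog, min_year):
    """Step 4: keep only partitions that can satisfy the predicate."""
    return [year for year in sorted(catalog) if year >= min_year]


def spectrum_scan(catalog, year):
    """Steps 6-7: scan one partition, project one column, aggregate."""
    counts = {}
    for row in catalog[year]:
        event = row["event"]  # projection: only this column is read
        counts[event] = counts.get(event, 0) + 1
    return counts


def run_query(catalog, min_year):
    """Model of: SELECT event, COUNT(*) ... WHERE year >= min_year GROUP BY event."""
    partitions = prune_partitions(catalog, min_year)
    # Step 5: one request per surviving partition.
    partials = [spectrum_scan(catalog, year) for year in partitions]
    final = {}
    for partial in partials:  # step 8: final aggregation in-cluster
        for event, n in partial.items():
            final[event] = final.get(event, 0) + n
    return partitions, final


scanned, counts = run_query(CATALOG, min_year=2017)
```

In this model the 2016 partition is never read, and only small partial counts travel back to the cluster rather than raw rows.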

Amazon Redshift Spectrum Is Fast
• Leverages Amazon Redshift’s advanced cost-based optimiser
• Pushes down projections, filters, aggregations and join reduction
• Dynamic partition pruning to minimise data processed
• Automatic parallelisation of query execution against S3 data
• Efficient join processing within the Amazon Redshift cluster

Amazon Redshift Spectrum Is Cost-effective
• You pay for your Amazon Redshift cluster plus $5 per TB scanned from S3
• Each query can leverage thousands of Amazon Redshift Spectrum nodes
• You can reduce the TB scanned and improve query performance by:
  • Partitioning data
  • Using a columnar file format
  • Compressing data
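The effect of the three techniques on the scan charge can be estimated with simple arithmetic. The sketch below uses the $5 per TB figure from the slide; the multiplicative model and the parameter names (`partition_fraction`, `column_fraction`, `compression_ratio`) are simplifying assumptions, and it ignores the cluster cost and any billing details not stated here.

```python
PRICE_PER_TB_USD = 5.0  # from the slide: $5 per TB scanned from S3


def estimated_scan_cost(raw_tb, partition_fraction=1.0,
                        column_fraction=1.0, compression_ratio=1.0):
    """Estimate the S3 scan charge for one query.

    partition_fraction: share of partitions left after pruning (0-1)
    column_fraction:    share of bytes in the columns actually read (0-1)
    compression_ratio:  uncompressed size divided by compressed size (>= 1)
    """
    scanned_tb = raw_tb * partition_fraction * column_fraction / compression_ratio
    return scanned_tb * PRICE_PER_TB_USD


# 10 TB of raw data, scanned in full.
full_scan = estimated_scan_cost(10)

# Same data, but the query touches 10% of partitions and 20% of the
# column bytes, and the files are compressed 4:1 (all assumed figures).
tuned = estimated_scan_cost(10, partition_fraction=0.1,
                            column_fraction=0.2, compression_ratio=4)
```

Under these assumed fractions the estimate falls from $50 to about $0.25 for the same query, because the three reductions multiply.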

Amazon Redshift Spectrum Uses Standard SQL
• Spectrum seamlessly integrates with your existing SQL & BI apps
• Support for complex joins, nested queries & window functions
• Support for data partitioned in S3 by any key
  • Date, time and any other custom keys
  • e.g., year, month, day, hour
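Partitions on an external table are registered with `ALTER TABLE ... ADD PARTITION`, pointing each partition at an S3 location. A minimal Python sketch of generating such a statement for one hour of data follows. The table name `spectrum.events`, the bucket `s3://example-bucket/events`, and the partition columns `p_year`, `p_month`, `p_day` and `p_hour` are hypothetical, and the folder layout is only one possible convention.

```python
from datetime import datetime


def add_partition_sql(table, bucket_prefix, ts):
    """Build an ALTER TABLE ... ADD PARTITION statement for one hour of data."""
    # Folder names mirror the partition columns (an assumed convention).
    location = (
        f"{bucket_prefix}/p_year={ts:%Y}/p_month={ts:%m}"
        f"/p_day={ts:%d}/p_hour={ts:%H}/"
    )
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
        f"(p_year='{ts:%Y}', p_month='{ts:%m}', "
        f"p_day='{ts:%d}', p_hour='{ts:%H}') "
        f"LOCATION '{location}';"
    )


sql = add_partition_sql(
    "spectrum.events",             # hypothetical external table
    "s3://example-bucket/events",  # hypothetical S3 prefix
    datetime(2018, 3, 14, 9),
)
```

A query that filters on these partition columns then only needs to read the matching folders.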

Demo: Modern Data Architecture
Amazon Redshift + Spectrum

Demo Architecture
Amazon Redshift + Spectrum
• Database Tables & Views
• Traditional Star Schema
• Tickit Sample Database
• Data for the past 5 years
Amazon S3
• S3 Bucket
• Multiple Folders (1 for each Tickit table)
• Multiple data files
• Data for the past 25 years
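The slides do not show how the demo combines the two stores. One common pattern is a view that unions the recent in-cluster table with the older data in S3. The sketch below uses Python's built-in `sqlite3` purely as a stand-in, so both tables are local and the names `sales_recent`, `sales_history` and `sales_all` are hypothetical. In Amazon Redshift, a view that references an external table has to be created `WITH NO SCHEMA BINDING`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the in-cluster table (recent years).
    CREATE TABLE sales_recent (sale_year INTEGER, qty INTEGER);
    -- Stand-in for the external table over S3 (older years).
    CREATE TABLE sales_history (sale_year INTEGER, qty INTEGER);
    INSERT INTO sales_recent VALUES (2017, 10), (2018, 20);
    INSERT INTO sales_history VALUES (1995, 1), (2005, 2);
    -- One view so that clients query a single logical table.
    CREATE VIEW sales_all AS
        SELECT sale_year, qty FROM sales_recent
        UNION ALL
        SELECT sale_year, qty FROM sales_history;
""")

total_qty = conn.execute("SELECT SUM(qty) FROM sales_all").fetchone()[0]
recent_qty = conn.execute(
    "SELECT SUM(qty) FROM sales_all WHERE sale_year >= 2014"
).fetchone()[0]
conn.close()
```

SQL clients and BI tools see one table, while the choice of what lives in the cluster and what stays in S3 remains a storage decision.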

Next Steps
• Amazon Redshift Spectrum Getting Started Guide
• Amazon Redshift Documentation
• And much, much more!
