Redshift 101

Understanding the Basics & Avoiding Common Mistakes
Presented by: Michael Krouze, CTO & VP Analytics, Charter Solutions, Inc.

Charter Solutions’ Partnerships

What is Amazon Redshift?
Amazon Redshift is a cloud-hosted, fast, fully managed, petabyte-scale data warehouse.

Distributed rather than single node
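In a distributed (massively parallel) warehouse, each slice scans only its share of the data and the leader combines the partial results. The sketch below is a toy model of that scatter/gather shape, assuming round-robin placement across four slices; the threads only illustrate the structure (Python threads do not give real CPU parallelism here), and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_round_robin(values, num_slices):
    """Hypothetical helper: deal values out to slices in round-robin order."""
    return [values[i::num_slices] for i in range(num_slices)]

def count_matching(slice_values, predicate):
    """Work done independently on one slice: count rows matching the filter."""
    return sum(1 for v in slice_values if predicate(v))

def distributed_count(values, predicate, num_slices=4):
    """Scatter the scan across slices, then gather and sum the partial counts."""
    slices = split_round_robin(values, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partial_counts = list(pool.map(lambda s: count_matching(s, predicate), slices))
    # The "leader" step: combine the per-slice results into one answer
    return sum(partial_counts)

ages = [24, 31, 45, 24, 18, 60, 24, 37]
print(distributed_count(ages, lambda age: age == 24))
```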

Columnar rather than row-based
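A columnar store keeps each column's values together, so a query touching one column reads only that column. The sketch below counts values touched rather than disk blocks, which is a simplification; the sample table and helper names are hypothetical.

```python
# Hypothetical sample table in row-based form: each record stored together
rows = [
    {"id": 1, "name": "Ann", "age": 24, "state": "MN"},
    {"id": 2, "name": "Bob", "age": 31, "state": "WI"},
    {"id": 3, "name": "Cy", "age": 45, "state": "MN"},
]

# The same data in columnar form: one list per column
columns = {name: [row[name] for row in rows] for name in rows[0]}

def values_scanned_row_store(rows, needed_columns):
    # A row store reads whole rows, so needed_columns does not reduce the work
    return sum(len(row) for row in rows)

def values_scanned_column_store(columns, needed_columns):
    # A column store reads only the columns the query asks for
    return sum(len(columns[name]) for name in needed_columns)

# Equivalent of: select avg(age) from customers
avg_age = sum(columns["age"]) / len(columns["age"])
print(values_scanned_row_store(rows, ["age"]), values_scanned_column_store(columns, ["age"]))
```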

Enough intro; on to the meat of the presentation.

Pick the right node type for your cluster

Redshift Node Options

Dense Compute
dc1.large: 15 GB RAM, 2 cores, 2 slices, 160 GB SSD, 5.12 TB max/cluster
dc1.8xlarge: 244 GB RAM, 32 cores, 32 slices, 2.56 TB SSD, 326 TB max/cluster
• Geared to high performance
• SSD storage (326 TB max)
• ~95 GB memory per TB of storage
• Starts at $0.25/hr

Dense Storage
ds2.xlarge: 31 GB RAM, 4 cores, 2 slices, 2 TB HDD, 64 TB max/cluster
ds2.8xlarge: 244 GB RAM, 36 cores, 16 slices, 16 TB HDD, 2 PB max/cluster
• Geared to large data sets
• HDD storage (2 PB max)
• ~15 GB memory per TB of storage
• Starts at $0.85/hr
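The "memory per TB" figures above follow from each node's RAM divided by its storage. The sketch below reproduces that arithmetic from the node specs listed on this slide, assuming 1 TB = 1000 GB. The hourly prices are only the two "starts at" on-demand figures quoted here; actual prices vary by region and over time.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NodeType:
    name: str
    ram_gb: int
    storage_gb: int                      # local storage per node
    hourly_usd: Optional[float] = None   # only the slide's "starts at" prices

NODES = [
    NodeType("dc1.large", 15, 160, 0.25),
    NodeType("dc1.8xlarge", 244, 2560),
    NodeType("ds2.xlarge", 31, 2000, 0.85),
    NodeType("ds2.8xlarge", 244, 16000),
]

def memory_per_tb(node: NodeType) -> float:
    """GB of RAM per TB of storage (assumes 1 TB = 1000 GB)."""
    return node.ram_gb * 1000 / node.storage_gb

def usd_per_tb_hour(node: NodeType) -> Optional[float]:
    """Hourly price per TB of storage, when a price is known."""
    if node.hourly_usd is None:
        return None
    return node.hourly_usd * 1000 / node.storage_gb

for n in NODES:
    cost = usd_per_tb_hour(n)
    cost_text = f"${cost:.3f}/TB-hr" if cost is not None else "n/a"
    print(f"{n.name:12} {memory_per_tb(n):6.2f} GB RAM per TB   {cost_text}")
```

The ratios come out near 95 GB/TB for dense compute and near 15 GB/TB for dense storage, while the entry dense storage node costs far less per TB stored.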

Understand and use sort keys properly

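Zone Maps
Each block records the minimum and maximum value it holds, so a query can skip any block whose range cannot contain the value it is looking for.
Query: Select count(*) from customers where age = 24

Unsorted table (4 of 5 blocks read):
• Block 1: min 5, max 45 (read)
• Block 2: min 9, max 32 (read)
• Block 3: min 30, max 42 (skipped)
• Block 4: min 22, max 80 (read)
• Block 5: min 18, max 50 (read)

Sorted table (1 of 5 blocks read):
• Block 1: min 1, max 10 (skipped)
• Block 2: min 11, max 25 (read)
• Block 3: min 26, max 40 (skipped)
• Block 4: min 41, max 55 (skipped)
• Block 5: min 56, max 95 (skipped)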
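The block-skipping logic in the zone-map example can be sketched in a few lines. The min/max pairs below are copied from the example; `build_zone_map` and `blocks_to_read` are hypothetical helpers, not Redshift internals.

```python
# (min, max) per block, copied from the zone-map example above
UNSORTED_BLOCKS = [(5, 45), (9, 32), (30, 42), (22, 80), (18, 50)]
SORTED_BLOCKS = [(1, 10), (11, 25), (26, 40), (41, 55), (56, 95)]

def build_zone_map(values, block_size):
    """Record the min and max of each fixed-size block of column values."""
    return [
        (min(values[i:i + block_size]), max(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    ]

def blocks_to_read(zone_map, value):
    """Indices of blocks whose [min, max] range could contain value."""
    return [i for i, (lo, hi) in enumerate(zone_map) if lo <= value <= hi]

print(blocks_to_read(UNSORTED_BLOCKS, 24))  # [0, 1, 3, 4]
print(blocks_to_read(SORTED_BLOCKS, 24))    # [1]
```

Sorting the data first narrows each block's range, which is why the sorted table lets the query skip four of the five blocks.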

Sort Key Options
Single Column Sort Key
• Table is sorted by 1 column
• Best for queries that use the 1st column (e.g., date) as the primary filter
• Can speed up joins and group-bys
• Quickest to VACUUM

Compound Sort Key
• Table is sorted by the 1st column, then the 2nd column, etc.
• Best for queries that use the 1st column as the primary filter, then the other columns
• Can speed up joins and group-bys
• Slower to VACUUM

Interleaved Sort Key
• Equal weight is given to each column
• Best for queries that filter on different columns
• Queries get faster the more sort key columns they filter on (up to 8)
• Slowest to VACUUM
• More effective with large tables (100M+ rows)
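To see why a compound key favors its 1st column while an interleaved key spreads the benefit, the toy model below sorts 64 rows of (a, b) two ways and counts zone-map block reads for a filter on each column. It uses a z-order (Morton) curve as a stand-in for interleaving; this is an illustrative assumption, not Redshift's exact internal algorithm, and all names are hypothetical.

```python
from itertools import product

def morton(a, b, bits=3):
    """Interleave the bits of a and b (z-order); a's bit sits above b's at each level."""
    code = 0
    for i in range(bits):
        code |= ((a >> i) & 1) << (2 * i + 1)
        code |= ((b >> i) & 1) << (2 * i)
    return code

def blocks_read(rows, block_size, col, value):
    """Count blocks whose min/max for column `col` could contain `value`."""
    reads = 0
    for start in range(0, len(rows), block_size):
        block_vals = [row[col] for row in rows[start:start + block_size]]
        if min(block_vals) <= value <= max(block_vals):
            reads += 1
    return reads

rows = list(product(range(8), range(8)))                      # 64 rows of (a, b)
compound = sorted(rows)                                        # sorted by a, then b
interleaved = sorted(rows, key=lambda r: morton(r[0], r[1]))   # equal weight to a and b

BLOCK_SIZE = 4  # 16 blocks of 4 rows each
for label, layout in [("compound", compound), ("interleaved", interleaved)]:
    print(label,
          "filter a=3:", blocks_read(layout, BLOCK_SIZE, 0, 3),
          "filter b=3:", blocks_read(layout, BLOCK_SIZE, 1, 3))
```

In this model the compound layout reads 2 blocks when filtering on `a` but 8 when filtering on `b`; the interleaved layout reads 4 blocks for either column.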

Understand and use distribution styles and keys properly

Distribution Style Options
(Diagram: a two-node cluster. Node 1 holds slices 1 and 2; Node 2 holds slices 3 and 4. Each style places rows on those slices differently.)

All: all data on every node
• Medium dimension tables (1K-2M rows)

Key: same key to same location
• Large fact tables
• Large dimension tables

Even: round-robin distribution
• Tables with no joins or group-bys
• Small dimension tables (<1000 rows)
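The three styles can be sketched as placement rules on the diagram's 2-node, 4-slice cluster. This is a simplified model: `zlib.crc32` stands in for Redshift's internal hash function, and the sample `orders` rows are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_NODES = 2
SLICES_PER_NODE = 2
NUM_SLICES = NUM_NODES * SLICES_PER_NODE

def distribute_key(rows, key):
    """KEY: hash the distribution key so equal keys land on the same slice."""
    slices = defaultdict(list)
    for row in rows:
        # crc32 is a stand-in; Redshift's actual hash function is internal
        slices[zlib.crc32(str(row[key]).encode()) % NUM_SLICES].append(row)
    return slices

def distribute_even(rows):
    """EVEN: deal rows out to slices round-robin, ignoring their values."""
    slices = defaultdict(list)
    for i, row in enumerate(rows):
        slices[i % NUM_SLICES].append(row)
    return slices

def distribute_all(rows):
    """ALL: keep one full copy of the table on every node."""
    return {node: list(rows) for node in range(NUM_NODES)}

orders = [{"order_id": i, "customer_id": i % 3} for i in range(12)]
by_key = distribute_key(orders, "customer_id")
```

With KEY distribution, all rows for a given `customer_id` share a slice, so a join on that column can run without moving rows between nodes; EVEN balances row counts but gives no such co-location.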

Primary keys and foreign keys don’t work the way you think

How are they different?
• Primary key and foreign key constraints are not enforced by Redshift
• No indexes are created (sort keys are the closest thing to an index)
• They do help the query planner optimize queries, though

Redshift Compression
• Each column can be compressed with the algorithm best suited to its content
• Many encodings are supported: raw, byte-dictionary, delta, mostly, run-length, text, and LZO
• Compression ratios of 2-4x are common
• Can cut query time by as much as 50%
• Use ANALYZE COMPRESSION to get recommendations

Vacuum and analyze regularly

Adding new rows creates unsorted regions

Vacuum reclaims space and re-sorts tables
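A toy model of what a full vacuum does: drop rows marked as deleted, then merge the unsorted tail into the already-sorted region. Rows here are hypothetical (sort_key, row_id) tuples; Redshift's real implementation works on blocks and is far more involved.

```python
import heapq

def vacuum_full(sorted_region, unsorted_region, deleted_ids):
    """Toy VACUUM FULL: reclaim deleted rows, then re-sort by merging.

    Rows are (sort_key, row_id) tuples; sorted_region must already be sorted.
    """
    live_sorted = [r for r in sorted_region if r[1] not in deleted_ids]
    live_unsorted = sorted(r for r in unsorted_region if r[1] not in deleted_ids)
    # Merging two sorted runs is cheaper than re-sorting the whole table
    return list(heapq.merge(live_sorted, live_unsorted))

sorted_region = [(1, "a"), (4, "b"), (7, "c")]
unsorted_region = [(5, "d"), (2, "e")]      # rows appended since the last vacuum
print(vacuum_full(sorted_region, unsorted_region, deleted_ids={"b"}))
```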

Vacuum
• Four modes:
  • FULL: reclaims space and re-sorts
  • DELETE ONLY: reclaims space but does not re-sort
  • SORT ONLY: re-sorts but does not reclaim space
  • REINDEX: used for tables with interleaved sort keys; re-analyzes the distribution of values in the sort key columns, then performs a full vacuum
• Vacuum is I/O intensive and can take a long time to run
• Run it regularly to minimize the impact of each run

Analyze
• Updates the table statistics used by the query planner
• Run regularly to keep statistics up to date
• Especially after large data loads

Monitor and tune workload management

Workload Management
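• Workload management (WLM) is about creating separate queues for different workloads
(Diagram: queries from User Group A are routed by query group: the "Short" query group goes to a short-running queue and the "Long" query group goes to a long-running queue.)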
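The routing in the diagram can be modeled as "first matching queue wins": Redshift assigns a query to the first queue whose listed user groups or query groups match, and to the default queue otherwise. A session can pick its query group with `SET query_group TO 'short';`. The queue names and configuration below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class WlmQueue:
    name: str
    user_groups: Set[str] = field(default_factory=set)
    query_groups: Set[str] = field(default_factory=set)

def route_query(queues: List[WlmQueue], default_queue: str,
                user_group: Optional[str] = None,
                query_group: Optional[str] = None) -> str:
    """Return the first queue whose user groups or query groups match."""
    for queue in queues:
        if user_group in queue.user_groups or query_group in queue.query_groups:
            return queue.name
    return default_queue

# Hypothetical configuration mirroring the diagram
QUEUES = [
    WlmQueue("short_running", query_groups={"short"}),
    WlmQueue("long_running", query_groups={"long"}),
]

print(route_query(QUEUES, "default", user_group="A", query_group="short"))
```

Because the first match wins, queue order matters: listing a user group on an early queue would send all of that group's queries there regardless of query group.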

u Contact me:
u michael.krouze@chartersolutions.com
u @mjkrouze
u Resources:
u www.chartersolutions.com
u github.com/awslabs/amazon-redshift-utils
u AWS YouTube channel
u AWS on SlideShare
