Understanding the Basics & Avoiding Common Mistakes
Presented by: Michael Krouze, CTO & VP Analytics, Charter Solutions, Inc.
Redshift 101
Charter Solutions’ Partnerships
What is Amazon Redshift?
Amazon Redshift is a cloud-hosted, fast, fully managed, petabyte-scale data warehouse.
Distributed rather than single-node
Columnar rather than row-based
Enough intro, on to the meat of the presentation
Pick the right node type for your cluster
Redshift Node Options
Dense Compute
• dc1.large: 15 GB RAM, 2 cores, 2 slices, 160 GB SSD, 5.12 TB max/cluster
• dc1.8xlarge: 244 GB RAM, 32 cores, 32 slices, 2.56 TB SSD, 326 TB max/cluster
• Geared to high performance
• SSD storage (326 TB max)
• ~95 GB memory per TB of storage
• Starts at $0.25/hr
Dense Storage
• ds2.xlarge: 31 GB RAM, 4 cores, 2 slices, 2 TB HDD, 64 TB max/cluster
• ds2.8xlarge: 244 GB RAM, 36 cores, 16 slices, 16 TB HDD, 2 PB max/cluster
• Geared to large data sets
• HDD storage (2 PB max)
• ~15 GB memory per TB of storage
• Starts at $0.85/hr
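The memory-per-terabyte figures quoted for the two node families can be checked with a few lines of arithmetic. The sketch below is illustrative Python, not an AWS tool. The `NODES` table restates per-node RAM and storage for the four node types (ds2.xlarge is entered with 31 GB of RAM, the figure AWS documented for it), and `ram_per_tb` is a name made up for this example.

```python
# Per-node RAM (GB) and local storage (TB) for each node type.
NODES = {
    "dc1.large":   {"ram_gb": 15,  "storage_tb": 0.16},
    "dc1.8xlarge": {"ram_gb": 244, "storage_tb": 2.56},
    "ds2.xlarge":  {"ram_gb": 31,  "storage_tb": 2.0},
    "ds2.8xlarge": {"ram_gb": 244, "storage_tb": 16.0},
}


def ram_per_tb(node_type: str) -> float:
    """Return GB of RAM available per TB of local storage."""
    spec = NODES[node_type]
    return spec["ram_gb"] / spec["storage_tb"]


if __name__ == "__main__":
    for name in NODES:
        print(f"{name}: {ram_per_tb(name):.1f} GB RAM per TB")
```

The dense compute types come out at roughly 94 to 95 GB per TB and the dense storage types at roughly 15 GB per TB, which is the trade-off between the two families.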
Understand and use sort keys properly
Zone Maps
Select count(*) from customers where age = 24
Unsorted: block min/max ranges are 5-45, 9-32, 30-42, 22-80 and 18-50, so four of the five blocks are read.
Sorted: block min/max ranges are 1-10, 11-25, 26-40, 41-55 and 56-95, so only the 11-25 block is read.
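The zone map comparison can be reproduced in a few lines. This is a minimal Python sketch of the idea, not Redshift's implementation: each block carries the min and max of the column, and a block is read only if the filter value falls inside that range. The `(min, max)` pairs are the ones drawn on the slide; `blocks_to_read` is a name chosen for this example.

```python
def blocks_to_read(zone_maps, value):
    """Return the indexes of blocks whose [min, max] range could contain value."""
    return [i for i, (lo, hi) in enumerate(zone_maps) if lo <= value <= hi]


# (min, max) of the age column per block, as drawn on the slide.
UNSORTED = [(5, 45), (9, 32), (30, 42), (22, 80), (18, 50)]
SORTED = [(1, 10), (11, 25), (26, 40), (41, 55), (56, 95)]

if __name__ == "__main__":
    print(blocks_to_read(UNSORTED, 24))  # blocks scanned for age = 24, unsorted table
    print(blocks_to_read(SORTED, 24))    # blocks scanned for age = 24, sorted table
```

With the unsorted layout the ranges overlap, so most blocks have to be scanned. With the sorted layout the ranges are disjoint and a single block is enough.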
Sort Key Options
Single Column Sort Key
• Table is sorted by one column
• Best for queries that use that column (e.g. date) as the primary filter
• Can speed up joins and group-bys
• Quickest to VACUUM
Compound Sort Key
• Table is sorted by the 1st column, then the 2nd column, etc.
• Best for queries that use the 1st column as the primary filter, then other columns
• Can speed up joins and group-bys
• Slower to VACUUM
Interleaved Sort Key
• Equal weight is given to each column
• Best for queries that filter on different columns
• Queries get faster the more sort key columns are used in the filter (up to 8)
• Slowest to VACUUM
• More effective with large tables (100M+ rows)
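Why a compound sort key favors filters on its first column can be seen with a toy table. The Python sketch below is a simulation under stated assumptions, not Redshift code: the table, the block size of 8 rows, and the helper names `zone_maps` and `blocks_read` are all hypothetical.

```python
def zone_maps(rows, column, block_size):
    """Split rows (in storage order) into blocks and return (min, max) of `column` per block."""
    maps = []
    for start in range(0, len(rows), block_size):
        values = [row[column] for row in rows[start:start + block_size]]
        maps.append((min(values), max(values)))
    return maps


def blocks_read(rows, column, value, block_size=8):
    """Count the blocks whose zone map for `column` could contain `value`."""
    return sum(lo <= value <= hi for lo, hi in zone_maps(rows, column, block_size))


# Hypothetical table: 8 days x 8 regions, stored in compound sort order (day, region).
ROWS = sorted((day, region) for day in range(8) for region in range(8))

if __name__ == "__main__":
    print(blocks_read(ROWS, 0, 3))  # filter on day, the leading sort column
    print(blocks_read(ROWS, 1, 3))  # filter on region, the second sort column
```

Filtering on the leading column touches one block out of eight, while filtering on the second column alone touches all eight because every block spans the full range of regions. That second pattern is the one interleaved sort keys are meant to help with.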
Understand and use distribution styles and keys properly
Distribution Style Options
(Diagram: two nodes with two slices each; Node 1 holds Slices 1 and 2, Node 2 holds Slices 3 and 4.)
Even: round-robin distribution
• Tables with no joins or group-bys
• Small dimension tables (<1000 rows)
All: all data on every node
• Medium dimension tables (1K – 2M rows)
Key: same key to same location
• Large fact tables
• Large dimension tables
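The three distribution styles can be sketched as a small simulation. This is illustrative Python only: the two-node, four-slice layout follows the diagram, but the `distribute` helper, the sample rows, and the use of `zlib.crc32` as the hash are assumptions made for the example (Redshift's own hash function is internal), and ALL is simplified to one copy per node.

```python
import zlib
from collections import defaultdict

SLICES = [1, 2, 3, 4]        # two nodes with two slices each, as in the diagram
FIRST_SLICE_OF_NODE = [1, 3]  # where each node keeps its copy under ALL (simplification)


def distribute(rows, style, key=None):
    """Return {slice: [rows]} for the style 'EVEN', 'KEY' or 'ALL'."""
    placement = defaultdict(list)
    for i, row in enumerate(rows):
        if style == "EVEN":
            # round robin across slices
            placement[SLICES[i % len(SLICES)]].append(row)
        elif style == "KEY":
            # stand-in hash: equal key values always map to the same slice
            h = zlib.crc32(str(row[key]).encode())
            placement[SLICES[h % len(SLICES)]].append(row)
        elif style == "ALL":
            # a full copy of the table on every node
            for s in FIRST_SLICE_OF_NODE:
                placement[s].append(row)
        else:
            raise ValueError(f"unknown style: {style}")
    return dict(placement)


# Hypothetical fact rows: 8 orders spread over 3 customers.
ROWS = [{"order_id": i, "customer_id": i % 3} for i in range(8)]

if __name__ == "__main__":
    for style in ("EVEN", "KEY", "ALL"):
        placed = distribute(ROWS, style, key="customer_id")
        print(style, {s: len(r) for s, r in sorted(placed.items())})
```

EVEN balances row counts but scatters a customer's orders. KEY keeps every row for a customer on one slice, which lets joins on that key run without moving data. ALL trades storage for having the whole table local to each node.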
Primary keys and foreign keys don’t work the way you think
How are they different?
• Primary and foreign key constraints are not enforced by Redshift
• Indexes are not created (only sort keys exist for indexing)
• They do help with query plan optimization, though
Compress your columns
Redshift Compression
• Each column can be compressed with the most appropriate algorithm for its content
• Many algorithms are supported: raw, byte-dictionary, delta, mostly, run-length, text and LZO encoding
• Compression ratios of 2-4x are common
• Can cut query time by as much as 50%
• Use ANALYZE COMPRESSION to get recommendations
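Two of the listed encodings are easy to illustrate. The Python sketch below shows the general idea behind run-length and delta encoding and why the right choice depends on the column's content. It is not Redshift's on-disk format, and the function names and sample values are made up for the example.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def delta_encode(values):
    """Keep the first value, then store each value as the difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    # A sorted, low-cardinality column suits run-length encoding.
    print(run_length_encode(["MN", "MN", "MN", "WI", "WI"]))
    # A steadily increasing column (ids, timestamps) suits delta encoding.
    print(delta_encode([1000, 1001, 1003, 1006]))
```

Run-length encoding pays off when the same value repeats in long runs, and delta encoding pays off when neighboring values are close together, so each column is best judged on its own data.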
Vacuum and analyze regularly
Adding new rows creates unsorted regions
Vacuum reclaims space and re-sorts tables
Vacuum
• 4 modes:
  • FULL – Reclaims space and re-sorts
  • DELETE ONLY – Reclaims space but does not re-sort
  • SORT ONLY – Re-sorts but does not reclaim space
  • REINDEX – Used for INTERLEAVED sort keys. Re-analyzes sort keys and then runs a FULL vacuum
• Vacuum is I/O intensive and can take time to run
• Run regularly to minimize impact
Analyze
• Updates statistics used by the query planner
• Run regularly to keep statistics up to date
• Especially after large data loads
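The difference between the vacuum modes can be shown with a toy model. In the Python sketch below, a table is a list of `(value, deleted)` pairs in storage order; this representation and the `vacuum` function are assumptions made for illustration and say nothing about how Redshift does the work internally. REINDEX is left out because it only applies to interleaved sort keys.

```python
def vacuum(rows, mode="FULL"):
    """Return the table after a vacuum in the given mode.

    rows: list of (value, deleted) pairs in storage order.
    """
    if mode not in ("FULL", "DELETE ONLY", "SORT ONLY"):
        raise ValueError(f"unsupported mode: {mode}")
    if mode in ("FULL", "DELETE ONLY"):
        # reclaim space: drop rows that were marked as deleted
        rows = [r for r in rows if not r[1]]
    if mode in ("FULL", "SORT ONLY"):
        # re-sort: merge the unsorted region back into sort order
        rows = sorted(rows, key=lambda r: r[0])
    return rows


# Sorted region 1, 5, 9 (with 5 marked deleted), then newly appended rows 3 and 7.
TABLE = [(1, False), (5, True), (9, False), (3, False), (7, False)]

if __name__ == "__main__":
    for mode in ("FULL", "DELETE ONLY", "SORT ONLY"):
        print(mode, vacuum(TABLE, mode))
```

Only FULL leaves the table both compact and fully sorted. DELETE ONLY keeps the appended rows out of order, and SORT ONLY keeps the deleted row in place.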
Monitor and tune workload management
Workload Management
• Workload management is about creating queues for different workloads
(Diagram: a user group, User Group A, and two query groups, Short and Long, are routed to a short-running queue and a long-running queue.)
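Queue routing can be sketched as a first-match lookup: a query goes to the first queue that lists one of the user's groups or the query's query group, and otherwise to a default queue. The Python below is a simplified model, not the WLM configuration format. The `Queue` class, the `route` function, and the particular assignment of groups to queues in `QUEUES` are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    user_groups: set = field(default_factory=set)
    query_groups: set = field(default_factory=set)


def route(queues, user_groups, query_group=None, default="default"):
    """Return the name of the first queue matching the user's groups or the query group."""
    for q in queues:
        if q.user_groups & set(user_groups) or query_group in q.query_groups:
            return q.name
    return default


# Hypothetical setup: which groups feed which queue is an assumption.
QUEUES = [
    Queue("short-running", user_groups={"user_group_a"}, query_groups={"short"}),
    Queue("long-running", query_groups={"long"}),
]

if __name__ == "__main__":
    print(route(QUEUES, ["user_group_a"]))         # matched by user group
    print(route(QUEUES, [], query_group="long"))   # matched by query group
    print(route(QUEUES, ["analysts"]))             # no match, falls to the default queue
```

Separating short and long work this way keeps quick queries from waiting behind long-running ones, which is the point of tuning the queues.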
Thank you!
Contact me:
• michael.krouze@chartersolutions.com
• @mjkrouze
Resources:
• www.chartersolutions.com
• github.com/awslabs/amazon-redshift-utils
• AWS YouTube channel
• AWS on SlideShare
