Understanding the Basics & Avoiding Common Mistakes
Presented by: Michael Krouze, CTO & VP Analytics, Charter Solutions, Inc.
Redshift 101
Charter Solutions’ Partnerships
What is Amazon Redshift?
Amazon Redshift is a cloud-hosted, fast, fully managed, petabyte-scale data warehouse.
Distributed rather than single node
Columnar rather than row-based
Enough intro, on to the meat of the presentation
Pick the right node type for your cluster
Redshift Node Options
Dense Compute
• dc1.large: 15 GB RAM, 2 cores, 2 slices, 160 GB SSD, 5.12 TB max/cluster
• dc1.8xlarge: 244 GB RAM, 32 cores, 32 slices, 2.56 TB SSD, 326 TB max/cluster
• Geared to high performance
• SSD storage (326 TB max)
• ~95 GB memory per TB of storage
• Starts at $0.25/hr
Dense Storage
• ds2.xlarge: 31 GB RAM, 4 cores, 2 slices, 2 TB HDD, 64 TB max/cluster
• ds2.8xlarge: 244 GB RAM, 36 cores, 16 slices, 16 TB HDD, 2 PB max/cluster
• Geared to large data sets
• HDD storage (2 PB max)
• ~15 GB memory per TB of storage
• Starts at $0.85/hr
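The memory-per-TB figures on this slide can be checked with simple division. The sketch below uses only the two 8xlarge rows as examples; the dictionary and function names are illustrative, not part of any AWS tooling.

```python
# Node specs copied from the slide: (RAM in GB, local storage per node in TB).
NODE_SPECS = {
    "dc1.8xlarge": (244, 2.56),  # Dense Compute, SSD
    "ds2.8xlarge": (244, 16.0),  # Dense Storage, HDD
}


def memory_per_tb(ram_gb, storage_tb):
    """Return GB of RAM available per TB of local storage."""
    return ram_gb / storage_tb


for name, (ram_gb, storage_tb) in NODE_SPECS.items():
    print(f"{name}: ~{memory_per_tb(ram_gb, storage_tb):.0f} GB RAM per TB")
```

The ratio is one rough way to compare the two families: Dense Compute holds far more of its data in memory, while Dense Storage trades that for capacity.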
Understand and use sort keys properly
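Zone Maps
Query: SELECT count(*) FROM customers WHERE age = 24
Unsorted table (4 of 5 blocks read):
• Min: 5, Max: 45 (read)
• Min: 9, Max: 32 (read)
• Min: 30, Max: 42 (skipped)
• Min: 22, Max: 80 (read)
• Min: 18, Max: 50 (read)
Sorted table (1 of 5 blocks read):
• Min: 1, Max: 10 (skipped)
• Min: 11, Max: 25 (read)
• Min: 26, Max: 40 (skipped)
• Min: 41, Max: 55 (skipped)
• Min: 56, Max: 95 (skipped)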
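The zone map idea on this slide can be sketched in a few lines: each block keeps the minimum and maximum of the column, and a block is read only if the filter value falls inside that range. This is a conceptual sketch using the slide's numbers; the function name is illustrative and this is not Redshift's internal code.

```python
# Each block is summarized by the (min, max) of the filtered column.
UNSORTED = [(5, 45), (9, 32), (30, 42), (22, 80), (18, 50)]
SORTED = [(1, 10), (11, 25), (26, 40), (41, 55), (56, 95)]


def blocks_to_read(zone_map, value):
    """Return indexes of blocks whose [min, max] range could contain value."""
    return [i for i, (lo, hi) in enumerate(zone_map) if lo <= value <= hi]


age = 24  # WHERE age = 24
print("unsorted:", len(blocks_to_read(UNSORTED, age)), "of", len(UNSORTED))
print("sorted:  ", len(blocks_to_read(SORTED, age)), "of", len(SORTED))
```

With sorted data the block ranges do not overlap, so an equality filter can touch at most one block per value.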
Sort Key Options
Single Column Sort Key
• Table is sorted by one column
• Best for queries that use that column (e.g. date) as the primary filter
• Can speed up joins and group-bys
• Quickest to VACUUM
Compound Sort Key
• Table is sorted by the 1st column, then the 2nd column, etc.
• Best for queries that use the 1st column as the primary filter, then other columns
• Can speed up joins and group-bys
• Slower to VACUUM
Interleaved Sort Key
• Equal weight is given to each column
• Best for queries that filter on different columns
• Queries get faster the more sort key columns are used in the filter (up to 8)
• Slowest to VACUUM
• More effective with large tables (100M+ rows)
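The difference between compound and interleaved keys can be illustrated with a toy model. The sketch below sorts 64 rows two ways and counts how many 8-row blocks a min/max check would have to read. The interleaved ordering uses Z-order bit interleaving as a stand-in, which is an assumption made for illustration and not a description of Redshift's actual algorithm.

```python
def interleave_bits(a, b, bits=3):
    """Z-order value: bits of a and b alternate, with a in the higher position."""
    z = 0
    for i in range(bits):
        z |= ((a >> i) & 1) << (2 * i + 1)
        z |= ((b >> i) & 1) << (2 * i)
    return z


def blocks_read(rows, col, value, block_size=8):
    """Count blocks whose min/max on column `col` could contain `value`."""
    count = 0
    for start in range(0, len(rows), block_size):
        vals = [r[col] for r in rows[start:start + block_size]]
        if min(vals) <= value <= max(vals):
            count += 1
    return count


# 64 rows: every combination of two columns with values 0..7.
rows = [(a, b) for a in range(8) for b in range(8)]
compound = sorted(rows)  # sort by column 0, then column 1
interleaved = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))

for name, table in (("compound", compound), ("interleaved", interleaved)):
    print(name,
          "| filter on col 0:", blocks_read(table, 0, 3), "blocks",
          "| filter on col 1:", blocks_read(table, 1, 3), "blocks")
```

In this toy model the compound ordering is ideal for a filter on the first column and useless for a filter on the second alone, while the interleaved ordering gives partial pruning on either column.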
Understand and use distribution styles and keys properly
Distribution Style Options
(Diagram: for each style, two nodes with two slices each. Node 1 holds Slices 1 and 2; Node 2 holds Slices 3 and 4.)
• All: all data on every node
• Key: same key to same location
• Even: round-robin distribution
Table types covered on the slide:
• Tables with no joins or group-bys
• Small dimension tables (<1000 rows)
• Medium dimension tables (1K – 2M rows)
• Large fact tables
• Large dimension tables
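The three distribution styles can be modeled in a few lines of Python. This is a conceptual sketch only: the function names are illustrative, the sample rows are hypothetical, and `zlib.crc32` is a stand-in hash chosen for determinism, not the function Redshift uses.

```python
import zlib


def distribute_all(rows, n_nodes):
    """ALL: every node gets a full copy of the table."""
    return [list(rows) for _ in range(n_nodes)]


def distribute_even(rows, n_slices):
    """EVEN: rows are dealt to slices round-robin."""
    slices = [[] for _ in range(n_slices)]
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    return slices


def distribute_key(rows, key_index, n_slices):
    """KEY: rows with the same key value always land on the same slice."""
    slices = [[] for _ in range(n_slices)]
    for row in rows:
        # crc32 is a stand-in hash (an assumption, not Redshift's function).
        h = zlib.crc32(str(row[key_index]).encode())
        slices[h % n_slices].append(row)
    return slices


# Hypothetical (customer_id, amount) rows.
orders = [(101, 5.0), (102, 7.5), (101, 3.0), (103, 9.0), (102, 1.0), (101, 4.0)]

print("even:", [len(s) for s in distribute_even(orders, 4)])
for i, s in enumerate(distribute_key(orders, 0, 4)):
    print("key, slice", i + 1, "holds customer ids", sorted({row[0] for row in s}))
```

Key distribution keeps all rows for one customer together, which is what lets a join on that key run without moving data between slices; the cost is that a skewed key produces uneven slices.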
Primary keys and foreign keys don’t work the way you think
How are they different?
• Primary and foreign key constraints are not enforced by Redshift
• Indexes are not created (only sort keys exist for indexing)
• They do help with query plan optimization, though
Compress your columns
Redshift Compression
• Each column can be compressed with the most appropriate algorithm for its content
• Many algorithms are supported: Raw encoding, Byte-dictionary, Delta encoding, Mostly encoding, Runlength encoding, Text encoding, LZO encoding
• Compression ratios of 2-4x are common
• Can cut query time by as much as 50%
• Use ANALYZE COMPRESSION to get recommendations
Vacuum and analyze regularly
Adding new rows creates unsorted regions
Vacuum reclaims space and re-sorts tables
Vacuum
• 4 modes:
  • FULL – Reclaims space and re-sorts
  • DELETE ONLY – Reclaims space but does not re-sort
  • SORT ONLY – Re-sorts but does not reclaim space
  • REINDEX – Used for INTERLEAVED sort keys. Re-analyzes the sort keys and then runs a FULL VACUUM
• Vacuum is I/O intensive and can take time to run
• Run regularly to minimize impact
Analyze
• Updates statistics used by the query planner
• Run regularly to keep statistics up to date
• Especially after large data loads
Monitor and tune workload management
Workload Management
• Workload management is about creating queues for different workloads
(Diagram: User Group A, a Short query group, and a Long query group are routed to a short-running queue and a long-running queue.)
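The routing idea behind workload queues can be sketched as a first-match rule list. Everything in this example is hypothetical: the queue names echo the slide, but which user group or query group maps to which queue is an assumption made for illustration, and a real cluster is configured through its WLM settings rather than code like this.

```python
# Hypothetical routing rules, checked in order: the first match wins.
QUEUES = [
    {"name": "short-running", "user_groups": {"group_a"}, "query_groups": {"short"}},
    {"name": "long-running", "user_groups": set(), "query_groups": {"long"}},
]
DEFAULT_QUEUE = "default"


def route(user_group=None, query_group=None):
    """Return the name of the first queue matching the user or query group."""
    for queue in QUEUES:
        if user_group in queue["user_groups"] or query_group in queue["query_groups"]:
            return queue["name"]
    return DEFAULT_QUEUE


print(route(user_group="group_a"))
print(route(query_group="long"))
print(route(user_group="analysts"))
```

Separating short and long work this way keeps a few heavy queries from holding up many quick ones.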
Thank you!
u Contact me:
u michael.krouze@chartersolutions.com
u @mjkrouze
u Resources:
u www.chartersolutions.com
u github.com/awslabs/amazon-redshift-utils
u AWS YouTube channel
u AWS on SlideShare
