© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pavan Pothukuchi, Principal Product Manager, AWS
September 20, 2016
Deep Dive: Amazon Redshift for Big Data Analytics
Agenda
• Service Overview
• Best Practices
• Schema / Table Design
• Data Ingestion
• Database Tuning
• Migration
• Examples
Service Overview
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon
Redshift
a lot faster
a lot simpler
a lot cheaper
Selected Amazon Redshift customers
Amazon Redshift system architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates query execution
Compute nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH
Two hardware platforms
• Optimized for data processing
• DS2: HDD; scale from 2TB to 2PB
• DC1: SSD; scale from 160GB to 326TB
[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion; Backup; Restore]
A deeper look at compute node architecture
Each node contains multiple slices
• DS2 – 2 slices on XL, 16 on 8XL
• DC1 – 2 slices on L, 32 on 8XL
A slice can be thought of as a “virtual compute node”
• Unit of data partitioning
• Parallel query processing
Facts about slices:
• Each compute node has either 2, 16, or 32 slices
• Table rows are distributed to slices
• A slice processes only its own data
Leader Node
Amazon Redshift dramatically reduces I/O
Data compression
Zone maps
ID   Age  State  Amount
123  20   CA     500
345  25   WA     250
678  40   FL     125
957  37   WA     375
• Calculating SUM(Amount) with row storage:
– Need to read everything
– Unnecessary I/O
Amazon Redshift dramatically reduces I/O
Data compression
Zone maps
• Calculating SUM(Amount) with column storage:
– Only scan the necessary blocks
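The row-versus-column comparison can be sketched in a few lines of Python. This is a toy model of the slide's four-row table, not Redshift's actual storage format; the function names and the "values read" counter are assumptions made for illustration.

```python
# Toy model of the slide's four-row table; not Redshift's on-disk format.
COLUMNS = ("id", "age", "state", "amount")
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def sum_amount_row_store(rows):
    """Row layout: every field of every row is read to reach Amount."""
    amount_index = COLUMNS.index("amount")
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk
        total += row[amount_index]
    return total, values_read


def sum_amount_column_store(rows):
    """Column layout: each column is stored separately, so only Amount is read."""
    columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}
    amount = columns["amount"]
    return sum(amount), len(amount)


row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(rows)
print("row store:   ", row_total, "from", row_reads, "values read")
print("column store:", col_total, "from", col_reads, "values read")
```

Both layouts return the same sum; the column layout touches a quarter of the values because the table has four columns.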
Amazon Redshift dramatically reduces I/O
Column storage
Data compression
Zone maps
• Columnar compression
– Effective due to like data
– Reduces storage requirements
– Reduces I/O
analyze compression orders;
 Table  | Column | Encoding
--------+--------+----------
 orders | id     | mostly32
 orders | age    | mostly32
 orders | state  | lzo
 orders | amount | mostly32
Amazon Redshift dramatically reduces I/O
Column storage
Data compression
Zone maps
• In-memory block metadata
• Contains per-block MIN and MAX values
• Effectively prunes blocks that don’t contain data for a given query
• Minimizes unnecessary I/O
Best Practices: Schema Design
Data Distribution
• Distribution style is a table property that dictates how a table’s data is distributed throughout the cluster:
• KEY: Value is hashed, same value goes to same location (slice)
• ALL: Full table data goes to first slice of every node
• EVEN: Round robin
• Goals:
• Distribute data evenly for parallel processing
• Minimize data movement during query processing
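The three distribution styles can be sketched as follows. This is a simplified model, not Redshift's implementation: `zlib.crc32` stands in for the real hash function (an assumption), and the function names, the four-slice layout, and the two-node layout are hypothetical.

```python
import zlib
from itertools import cycle

customers = [
    (101, "M", "John Smith"),
    (292, "F", "Jane Jones"),
    (139, "M", "Peter Black"),
    (446, "M", "Pat Partridge"),
    (658, "F", "Sarah Cyan"),
    (164, "M", "Brian Snail"),
    (209, "M", "James White"),
    (306, "F", "Lisa Green"),
]


def distribute_even(rows, num_slices):
    """EVEN: round robin across slices, regardless of row content."""
    slices = [[] for _ in range(num_slices)]
    for row, target in zip(rows, cycle(range(num_slices))):
        slices[target].append(row)
    return slices


def distribute_key(rows, num_slices, key_index):
    """KEY: hash the key column; equal values always land on the same slice."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        # crc32 is a stand-in hash chosen for this sketch only.
        digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
        slices[digest % num_slices].append(row)
    return slices


def distribute_all(rows, num_nodes):
    """ALL: every node holds a full copy of the table."""
    return [list(rows) for _ in range(num_nodes)]


even = distribute_even(customers, 4)
by_gender = distribute_key(customers, 4, key_index=1)
everywhere = distribute_all(customers, 2)
print("EVEN rows per slice:        ", [len(s) for s in even])
print("KEY(gender) rows per slice: ", [len(s) for s in by_gender])
print("ALL rows per node:          ", [len(n) for n in everywhere])
```

Hashing on a low-cardinality column such as Gender can use at most two slices, which is the uneven case the distribution goals warn against.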
[Diagram: a two-node cluster, Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4), shown once each for KEY, ALL, and EVEN]

Example table
ID   Gender  Name
101  M       John Smith
292  F       Jane Jones
139  M       Peter Black
446  M       Pat Partridge
658  F       Sarah Cyan
164  M       Brian Snail
209  M       James White
306  F       Lisa Green

DISTSTYLE EVEN
Round robin places two rows on each of the four slices (one line per slice):
• 101 John Smith; 306 Lisa Green
• 292 Jane Jones; 209 James White
• 139 Peter Black; 164 Brian Snail
• 446 Pat Partridge; 658 Sarah Cyan

DISTSTYLE KEY
The hash function places two rows on each of the four slices (one line per slice):
• 101 John Smith; 306 Lisa Green
• 292 Jane Jones; 209 James White
• 139 Peter Black; 164 Brian Snail
• 446 Pat Partridge; 658 Sarah Cyan

DISTSTYLE KEY
The hash function places the rows in only two groups, split by Gender value:
• M: 101 John Smith; 139 Peter Black; 446 Pat Partridge; 164 Brian Snail; 209 James White
• F: 292 Jane Jones; 658 Sarah Cyan; 306 Lisa Green

DISTSTYLE ALL
The full table (all eight rows) is copied to each destination.
CUSTOMERS
CUST_ID  GENDER  NAME
101      M       John Smith
306      F       James White
ORDERS
ORDER_ID  CUST_ID  Amount
A1600     101      120
B8765     306      340
RESULTS
CUST_ID  GENDER  Amount
101      M       120
306      F       340

CUSTOMERS
CUST_ID  GENDER  NAME
292      F       Jane Jones
209      M       Lyall Green
ORDERS
ORDER_ID  CUST_ID  Amount
C0967     292      750
D8753     209      601
RESULTS
CUST_ID  GENDER  Amount
292      F       750
209      M       601
Choosing a Distribution Style
KEY
• Large FACT tables
• Large or rapidly changing tables used in joins
• Localize columns used within aggregations
ALL
• Tables with slowly changing data
• Reasonable size (i.e., a few million rows, not hundreds of millions)
• No common distribution key for frequent joins
• Typical use case: joined dimension table without a common distribution key
EVEN
• Tables not frequently joined or aggregated
• Large tables without acceptable candidate keys
Data Sorting
Goals
Physically order rows of table data based on certain column(s)
Optimize effectiveness of zone maps
Enable MERGE JOIN operations
Impact
Enables range-restricted scans (rrscans) to prune blocks by leveraging zone maps
Overall reduction in block I/O
Achieved with the table property SORTKEY defined over one or more columns
Optimal SORTKEY is dependent on:
Query patterns
Data profile
Business requirements
Zone Maps
SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013';
Unsorted Table (one line per block)
• MIN: 01-JUNE-2013, MAX: 20-JUNE-2013 (READ)
• MIN: 08-JUNE-2013, MAX: 30-JUNE-2013 (READ)
• MIN: 12-JUNE-2013, MAX: 20-JUNE-2013
• MIN: 02-JUNE-2013, MAX: 25-JUNE-2013 (READ)
• MIN: 06-JUNE-2013, MAX: 12-JUNE-2013 (READ)
Sorted By Date (one line per block)
• MIN: 01-JUNE-2013, MAX: 06-JUNE-2013
• MIN: 07-JUNE-2013, MAX: 12-JUNE-2013 (READ)
• MIN: 13-JUNE-2013, MAX: 18-JUNE-2013
• MIN: 19-JUNE-2013, MAX: 24-JUNE-2013
• MIN: 25-JUNE-2013, MAX: 30-JUNE-2013
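Block pruning with zone maps can be sketched as a range check against each block's MIN and MAX. The block ranges below are the ones on the slide; the function name and the list-of-tuples representation are assumptions for illustration, not Redshift internals.

```python
from datetime import date


def june(day):
    """Helper for the slide's June 2013 dates."""
    return date(2013, 6, day)


def blocks_to_read(zone_map, target):
    """Return indexes of blocks whose [MIN, MAX] range could contain target."""
    return [i for i, (low, high) in enumerate(zone_map) if low <= target <= high]


# Per-block (MIN, MAX) pairs taken from the slide.
unsorted_blocks = [
    (june(1), june(20)),
    (june(8), june(30)),
    (june(12), june(20)),
    (june(2), june(25)),
    (june(6), june(12)),
]
sorted_blocks = [
    (june(1), june(6)),
    (june(7), june(12)),
    (june(13), june(18)),
    (june(19), june(24)),
    (june(25), june(30)),
]

target = june(9)
print("unsorted:", blocks_to_read(unsorted_blocks, target))
print("sorted:  ", blocks_to_read(sorted_blocks, target))
```

With overlapping ranges the unsorted table must read four of five blocks, while the table sorted by date reads one.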
Single Column
• Table is sorted by 1 column
Date Region Country
2-JUN-2015 Oceania New Zealand
2-JUN-2015 Asia Singapore
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Hong Kong
3-JUN-2015 Europe Germany
3-JUN-2015 Asia Korea
[ SORTKEY ( date ) ]
Best for:
• Queries that use the 1st column (i.e., date) as the primary filter
• Can speed up joins and GROUP BYs
Compound
Date Region Country
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Korea
2-JUN-2015 Asia Singapore
2-JUN-2015 Europe Germany
3-JUN-2015 Asia Hong Kong
3-JUN-2015 Asia Korea
[ SORTKEY COMPOUND ( date, region, country) ]
Best for:
• Queries that use the 1st column as the primary filter, then the other columns
• Can speed up joins and group bys
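A compound sort key behaves like sorting on a tuple: the first column dominates, and later columns only break ties. A minimal sketch, using the unsorted rows from the single-column slide (the helper names are hypothetical):

```python
from datetime import date

rows = [
    (date(2015, 6, 2), "Oceania", "New Zealand"),
    (date(2015, 6, 2), "Asia", "Singapore"),
    (date(2015, 6, 2), "Africa", "Zaire"),
    (date(2015, 6, 2), "Asia", "Hong Kong"),
    (date(2015, 6, 3), "Europe", "Germany"),
    (date(2015, 6, 3), "Asia", "Korea"),
]


def compound_sort(rows):
    """Order rows by (date, region, country), like SORTKEY COMPOUND."""
    return sorted(rows, key=lambda r: (r[0], r[1], r[2]))


def matching_positions(sorted_rows, day, region):
    """Positions of rows matching a filter on the first two sort columns."""
    return [
        i for i, r in enumerate(sorted_rows) if r[0] == day and r[1] == region
    ]


ordered = compound_sort(rows)
for day, region, country in ordered:
    print(day.isoformat(), region, country)
print(matching_positions(ordered, date(2015, 6, 2), "Asia"))
```

Rows that match a filter on the leading columns end up adjacent, which is why filters that start with the first sort column benefit most.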
Interleaved
• Equal weight is given to each column.
Date Region Country
2-JUN-2015 Africa Zaire
3-JUN-2015 Asia Singapore
2-JUN-2015 Asia Korea
2-JUN-2015 Europe Germany
3-JUN-2015 Asia Hong Kong
2-JUN-2015 Asia Korea
[ SORTKEY INTERLEAVED ( date, region, country) ]
Best for:
• Queries that use different columns in filter
• Queries get faster the more columns used in the filter
Choosing a SORTKEY
COMPOUND
• Most common
• Well-defined filter criteria
• Time-series data
INTERLEAVED
• Edge cases
• Large tables (more than a billion rows)
• No common filter criteria
• Non-time-series data
Choosing the column(s)
• Primarily a column used as a query predicate (date, identifier, …)
• Optionally choose a column frequently used for aggregates
• Optionally choose the same column as the distribution key for the most efficient joins (merge join)
Compressing Data
• COPY automatically analyzes and compresses data when loading into empty tables
• ANALYZE COMPRESSION checks existing tables and proposes optimal compression algorithms for each column
• Changing column encoding requires a table rebuild
Compressing Data
If you have a regular ETL process and you use temp tables or staging tables, turn off automatic compression
• Use ANALYZE COMPRESSION to determine the right encodings
• Bake those encodings into your DDL
• Use CREATE TABLE … LIKE
Compressing Data
• From the zone maps we know:
• Which block(s) contain the range
• Which row offsets to scan
• Highly compressed sort keys:
• Many rows per block
• Large row offset
Skip compression on just the leading column of the compound sortkey
Best Practices: Ingestion
Amazon Redshift Loading Data Overview
[Diagram labels. Corporate data center: logs / files, source DBs. Transfer options: VPN connection, AWS Direct Connect, S3 multipart upload, AWS Import/Export. AWS Cloud: Amazon S3, Amazon DynamoDB, Amazon Elastic MapReduce, Amazon RDS, Amazon Glacier, Amazon Redshift, EC2 or on-prem (using SSH). Axis: data volume]
Parallelism is a function of load files
Each slice’s query processors are able to load one file at a time
• Streaming Decompression
• Parse
• Distribute
• Write
A single input file means only one slice is ingesting data
On a 16-slice cluster, that leaves only 6.25% of slices (1 of 16) active
Maximize Throughput with Multiple Files
Use at least as many input files as there are slices in the cluster
With 16 input files, all slices are working, so you maximize throughput
COPY continues to scale linearly as you add nodes
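The 6.25% figure follows from one file keeping one of 16 slices busy. A small sketch of that arithmetic, assuming each slice loads one file at a time as stated above (the function names are hypothetical):

```python
def total_slices(num_nodes, slices_per_node):
    """Slices in the cluster, e.g. 8 nodes with 2 slices each gives 16."""
    return num_nodes * slices_per_node


def active_slice_fraction(num_files, num_slices):
    """Fraction of slices kept busy when each slice loads one file at a time."""
    if num_slices < 1:
        raise ValueError("a cluster needs at least one slice")
    return min(num_files, num_slices) / num_slices


slices = total_slices(num_nodes=8, slices_per_node=2)
print(active_slice_fraction(1, slices))    # a single input file
print(active_slice_fraction(16, slices))   # one file per slice
```

Splitting the input into a multiple of the slice count keeps every slice working for the whole load.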
New feature: ALTER TABLE APPEND
ELT workloads typically “massage” or aggregate data in a
staging table and then append to production table
ALTER TABLE APPEND moves data from staging to
production table by manipulating metadata
Much faster than INSERT INTO as data is not duplicated
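The difference between copying rows and moving metadata can be illustrated with a toy model in which a table is just a list of references to blocks. This is only an analogy for the behavior described above, not Redshift internals; both function names are hypothetical.

```python
def insert_into(target, staging):
    """INSERT INTO ... SELECT analogue: every block is copied; staging keeps its data."""
    blocks_written = 0
    for block in staging:
        target.append(list(block))  # a new copy of the block's rows is written
        blocks_written += 1
    return blocks_written


def alter_table_append(target, staging):
    """ALTER TABLE APPEND analogue: block references move; no rows are rewritten."""
    target.extend(staging)  # re-point the existing blocks at the target table
    staging.clear()         # the staging table is left empty
    return 0                # no blocks written


staging = [[1, 2, 3], [4, 5, 6]]
production = [[0]]
written = alter_table_append(production, staging)
print(production, staging, written)
```

The append writes nothing and leaves the staging table empty, whereas the copy writes one new block per staged block.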
Best Practices: Performance Tuning
Optimizing a database for querying
• Periodically check your table status
• Vacuum and Analyze regularly
• SVV_TABLE_INFO
• Missing statistics
• Table skew
• Uncompressed Columns
• Unsorted Data
• Check your cluster status
• WLM queuing
• Commit queuing
• Database Locks
Missing Statistics
• Amazon Redshift’s query optimizer relies on up-to-date statistics
• Statistics are only necessary for data which you are accessing
• Updated stats are important on:
• SORTKEY
• DISTKEY
• Columns in query predicates
Table Skew
• Unbalanced workload
• Query completes only as fast as the slowest slice completes
• Can cause skew in flight:
• Temp data fills a single node, resulting in query failure
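One way to put a number on skew is to compare the fullest slice with a perfectly even spread. The metric below (maximum rows divided by mean rows per slice) is an assumption chosen for this sketch, not the definition used by any Redshift system view.

```python
def skew_ratio(rows_per_slice):
    """Rows on the fullest slice divided by the mean rows per slice.

    1.0 means perfectly even; larger values mean the slowest slice
    carries proportionally more of the work.
    """
    total = sum(rows_per_slice)
    if not rows_per_slice or total == 0:
        return 1.0  # an empty table has no skew
    mean = total / len(rows_per_slice)
    return max(rows_per_slice) / mean


print(skew_ratio([2, 2, 2, 2]))  # eight rows spread evenly over four slices
print(skew_ratio([5, 3, 0, 0]))  # the same rows split on a two-value key
```

Because a query finishes only when its slowest slice does, a ratio of 2.5 suggests the busiest slice handles two and a half times its fair share.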
Table Maintenance and Status
Unsorted Table
• Sortkey is just a guide, but data needs to actually be sorted
• VACUUM or DEEP COPY to sort
• Scans against unsorted tables continue to benefit from zone maps:
• Load sequential blocks
WLM Queue
Identify short/long-running queries and prioritize them
Define multiple queues to route queries appropriately
Default concurrency of 5
Leverage wlm_apex_hourly to tune WLM based on peak concurrency requirements
Cluster Status: Commits and WLM
Commit Queue
How long is your commit queue?
• Identify needless transactions
• Group dependent statements
within a single transaction
• Offload operational workloads
• STL_COMMIT_STATS
Cluster Status: Database Locks
• Database Locks
• Read locks, Write locks, Exclusive locks
• Reads block exclusive
• Writes block writes and exclusive
• Exclusives block everything
• Ungranted locks block subsequent lock requests
• Exposed through SVV_TRANSACTIONS
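The blocking rules above can be written as a small compatibility table. This sketch encodes only what the bullets state; the lock names as strings, the function names, and the way waiting requests are modeled are assumptions for illustration.

```python
# For each held lock type, the request types it blocks (from the bullets above).
BLOCKS = {
    "read": {"exclusive"},
    "write": {"write", "exclusive"},
    "exclusive": {"read", "write", "exclusive"},
}


def conflicts(held, requested):
    """True if a lock of type `held` blocks a request of type `requested`."""
    return requested in BLOCKS[held]


def can_grant(requested, granted, waiting=()):
    """A request is granted only if nothing ahead of it conflicts.

    Requests that are still waiting (ungranted) are treated like held locks,
    so they block conflicting requests that arrive after them.
    """
    ahead = list(granted) + list(waiting)
    return not any(conflicts(lock, requested) for lock in ahead)


print(can_grant("read", granted=["write"]))                        # reads pass writes
print(can_grant("write", granted=["write"]))                       # writes block writes
print(can_grant("read", granted=["read"], waiting=["exclusive"]))  # queued exclusive blocks
```

The last call shows why a single pending exclusive lock can stall otherwise harmless reads that queue up behind it.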
Migration Considerations
Typical ETL/ELT on legacy data warehouse
• One file per table, maybe a few if too big
• Many updates (“massage” the data)
• Every job clears the data, then loads
• Count on primary key to block double loads
• High concurrency of load jobs
• Small table(s) to control the job stream
Two questions to ask
Why do you do what you do?
• Many times, users don’t know
What is the customer need?
• Many times, needs do not match current practice
• You might benefit from adding other AWS services
On Amazon Redshift
Updates are delete + insert of the row
• Deletes just mark rows for deletion
Blocks are immutable
• Minimum space used is one block per column, per slice
Commits are expensive
• 4 GB write on 8XL per node
• Mirrors WHOLE dictionary
• Cluster-wide serialized
On Amazon Redshift
• Not all aggregations are created equal
• Pre-aggregation can help
• Order on group by matters
• Concurrency should be low for better throughput
• Caching layer for dashboards is recommended
• WLM parcels RAM to queries. Use multiple queues for better control.
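How WLM parcels RAM can be sketched as a simple division, assuming a queue's memory is split evenly across its concurrency slots. The 10,000 MB queue size is a made-up figure for illustration, and the function name is hypothetical.

```python
def memory_per_slot(queue_memory_mb, concurrency):
    """Memory available to each query slot if the queue's memory is split evenly."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return queue_memory_mb / concurrency


# Hypothetical queue with 10,000 MB of memory.
print(memory_per_slot(10_000, 5))   # default concurrency of 5
print(memory_per_slot(10_000, 20))  # higher concurrency, less memory per query
```

Under this assumption, raising concurrency shrinks each query's share of memory, which is one reason lower concurrency can give better throughput.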
Workload Management (WLM)
Concurrency and memory can now be changed dynamically
You can have distinct values for load time and query time
Use wlm_apex_hourly.sql to monitor “queue pressure”
New Feature – WLM Queue Hopping
Query throughput vs. Concurrency
• Query throughput (QPM or QPH) is more representative
of end user experience than concurrency
• Several improvements over the last 6 months
• Commit improvements
• Dynamic resource management
• Query throughput doubled over the last 6 months
Resources
https://github.com/awslabs/amazon-redshift-utils
https://github.com/awslabs/amazon-redshift-monitoring
https://github.com/awslabs/amazon-redshift-udfs
https://s3.amazonaws.com/chriz-webinar/webinar.zip
Admin scripts
Collection of utilities for running diagnostics on your cluster
Admin views
Collection of utilities for managing your cluster, generating schema DDL, etc.
ColumnEncodingUtility
Gives you the ability to apply optimal column encoding to an established schema with data already loaded
If you want to learn more, register for our upcoming DevDay Austin:
Monday, October 24, 2016
JW Marriott Austin
https://aws.amazon.com/events/devday-austin
Free, one-day developer event featuring tracks, labs, and workshops around Serverless, Containers, IoT, and Mobile
Q&A
Appendix: Performance optimization examples
Use SORTKEYs to effectively prune blocks
Don’t compress initial SORTKEY column
Use compression encoding to reduce I/O
Choose a DISTKEY which avoids data skew
Ingest: Disable predictable compression analysis
Ingest: Load multiple files to match cluster slices
VACUUM to physically remove deleted rows
VACUUM to keep your tables sorted
Gather statistics to assist the query planner

More Related Content

What's hot

Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan BlueDatabricks
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon Web Services Korea
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseDatabricks
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareAltinity Ltd
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglyTyler Wishnoff
 
AWS Cloud 환경으로​ DB Migration 전략 수립하기
AWS Cloud 환경으로​ DB Migration 전략 수립하기AWS Cloud 환경으로​ DB Migration 전략 수립하기
AWS Cloud 환경으로​ DB Migration 전략 수립하기BESPIN GLOBAL
 
Amazon Redshift パフォーマンスチューニングテクニックと最新アップデート
Amazon Redshift パフォーマンスチューニングテクニックと最新アップデートAmazon Redshift パフォーマンスチューニングテクニックと最新アップデート
Amazon Redshift パフォーマンスチューニングテクニックと最新アップデートAmazon Web Services Japan
 
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal HealthSRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal HealthAmazon Web Services
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Amazon Web Services
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfAmazon Web Services
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Amazon Web Services
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 

What's hot (20)

Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan Blue
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
AWS Cloud 환경으로​ DB Migration 전략 수립하기
AWS Cloud 환경으로​ DB Migration 전략 수립하기AWS Cloud 환경으로​ DB Migration 전략 수립하기
AWS Cloud 환경으로​ DB Migration 전략 수립하기
 
Amazon Redshift パフォーマンスチューニングテクニックと最新アップデート
Amazon Redshift パフォーマンスチューニングテクニックと最新アップデートAmazon Redshift パフォーマンスチューニングテクニックと最新アップデート
Amazon Redshift パフォーマンスチューニングテクニックと最新アップデート
 
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal HealthSRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 

Viewers also liked

Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...Amazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAmazon Web Services
 
RedShift-Performance turning in few clicks
RedShift-Performance turning in few clicksRedShift-Performance turning in few clicks
RedShift-Performance turning in few clicksSadagopan Iyengar
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL ServicesAmazon Web Services
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호Amazon Web Services Korea
 
Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuningCarlos del Cacho
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722Amazon Web Services
 
Security Innovations in the Cloud
Security Innovations in the CloudSecurity Innovations in the Cloud
Security Innovations in the CloudAmazon Web Services
 
Deep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSDeep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSAmazon Web Services
 
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAmazon Web Services
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheAmazon Web Services
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...Amazon Web Services
 
Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSAmazon Web Services
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAmazon Web Services
 
DevOps at Amazon: A Look at Our Tools and Processes
 DevOps at Amazon: A Look at Our Tools and Processes DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and ProcessesAmazon Web Services
 
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAmazon Web Services
 
AWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAmazon Web Services
 

Viewers also liked (20)

Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...Building prediction models with Amazon Redshift and Amazon Machine Learning -...
Building prediction models with Amazon Redshift and Amazon Machine Learning -...
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
 
RedShift-Performance turning in few clicks
RedShift-Performance turning in few clicksRedShift-Performance turning in few clicks
RedShift-Performance turning in few clicks
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuning
 
Benchmark slideshow
Benchmark slideshowBenchmark slideshow
Benchmark slideshow
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722
 
Security Innovations in the Cloud
Security Innovations in the CloudSecurity Innovations in the Cloud
Security Innovations in the Cloud
 
Getting Started on AWS
Getting Started on AWS Getting Started on AWS
Getting Started on AWS
 
Deep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSDeep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECS
 
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCache
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
 
Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWS
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
 
DevOps at Amazon: A Look at Our Tools and Processes
 DevOps at Amazon: A Look at Our Tools and Processes DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and Processes
 
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
 
AWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing Zone
 

Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pavan Pothukuchi, Principal Product Manager, AWS. September 20, 2016. Deep Dive: Amazon Redshift for Big Data Analytics
  • 2. Agenda • Service Overview • Best Practices • Schema / Table Design • Data Ingestion • Database Tuning • Migration • Examples
  • 4. Relational data warehouse Massively parallel; petabyte scale Fully managed HDD and SSD platforms $1,000/TB/year; starts at $0.25/hour Amazon Redshift a lot faster a lot simpler a lot cheaper
  • 6. Amazon Redshift system architecture
      Leader node
      • SQL endpoint
      • Stores metadata
      • Coordinates query execution
      Compute nodes
      • Local, columnar storage
      • Execute queries in parallel
      • Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH
      Two hardware platforms
      • Optimized for data processing
      • DS2: HDD; scale from 2TB to 2PB
      • DC1: SSD; scale from 160GB to 326TB
      Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore
  • 7. A deeper look at compute node architecture
      Each node contains multiple slices
      • DS2: 2 slices on XL, 16 on 8XL
      • DC1: 2 slices on L, 32 on 8XL
      A slice can be thought of as a “virtual compute node”
      • Unit of data partitioning
      • Parallel query processing
      Facts about slices:
      • Each compute node has either 2, 16, or 32 slices
      • Table rows are distributed to slices
      • A slice processes only its own data
      Diagram label: Leader Node
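The slice counts on this slide determine how many workers can process a query in parallel. A minimal sketch of that arithmetic follows; the dictionary layout and function names are assumptions made for illustration, while the slice counts themselves come from the slide.

```python
# Slices per node as listed on the slide; the (platform, size) keys are
# labels chosen for this sketch, not an official API.
SLICES_PER_NODE = {
    ("DS2", "XL"): 2,
    ("DS2", "8XL"): 16,
    ("DC1", "L"): 2,
    ("DC1", "8XL"): 32,
}


def total_slices(platform, size, node_count):
    """Number of slices available for parallel work across the cluster."""
    return SLICES_PER_NODE[(platform, size)] * node_count


def rows_per_slice(total_rows, slices):
    """Rows each slice holds if the table is spread perfectly evenly."""
    return total_rows / slices


if __name__ == "__main__":
    slices = total_slices("DC1", "8XL", 4)
    print(slices, rows_per_slice(1_000_000, slices))
```

Because a slice processes only its own data, the slowest (fullest) slice tends to set the pace for a query, which is why the later slides stress even distribution.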
  • 8. Amazon Redshift dramatically reduces I/O
      Data compression
      Zone maps
      ID   Age  State  Amount
      123  20   CA     500
      345  25   WA     250
      678  40   FL     125
      957  37   WA     375
      • Calculating SUM(Amount) with row storage:
          – Need to read everything
          – Unnecessary I/O
  • 9. Amazon Redshift dramatically reduces I/O
      Data compression
      Zone maps
      ID   Age  State  Amount
      123  20   CA     500
      345  25   WA     250
      678  40   FL     125
      957  37   WA     375
      • Calculating SUM(Amount) with column storage:
          – Only scan the necessary blocks
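The difference between the two layouts can be sketched by counting how many values each one touches to compute SUM(Amount). This is a toy in-memory model using the four rows from the slide; the function names are hypothetical and real storage works on disk blocks rather than Python lists.

```python
# The four rows shown on the slide: (id, age, state, amount).
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]
COLUMNS = ("id", "age", "state", "amount")


def sum_amount_row_store(rows):
    """Row layout: every field of every row is read to reach 'amount'."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)
        total += row[3]
    return total, values_read


def to_column_store(rows):
    """Pivot the rows into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    print(sum_amount_row_store(ROWS))                      # (1250, 16)
    print(sum_amount_column_store(to_column_store(ROWS)))  # (1250, 4)
```

Both layouts return the same total, but the column layout reads 4 values instead of 16 in this toy model.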
  • 10. Amazon Redshift dramatically reduces I/O
      Column storage
      Data compression
      Zone maps
      • Columnar compression
          – Effective due to like data
          – Reduces storage requirements
          – Reduces I/O
      analyze compression orders;
       Table  | Column | Encoding
      --------+--------+----------
       orders | id     | mostly32
       orders | age    | mostly32
       orders | state  | lzo
       orders | amount | mostly32
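Why "like data" compresses well can be illustrated with a very simple encoding. The sketch below uses run-length encoding as a stand-in; it is not the mostly32 or lzo encoding shown in the slide output, and the sample column values are made up for illustration.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical 'state' column; a column holds values of one type,
    # so repeats are common, especially when the data is sorted.
    state_column = ["CA", "CA", "CA", "WA", "WA", "FL"]
    encoded = run_length_encode(state_column)
    print(encoded)  # [('CA', 3), ('WA', 2), ('FL', 1)]
    assert run_length_decode(encoded) == state_column
```

Six stored values shrink to three pairs here; a row layout would interleave ids, ages, and amounts between the states and break up those runs.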
  • 11. Amazon Redshift dramatically reduces I/O
      Column storage
      Data compression
      Zone maps
      • In-memory block metadata
      • Contains per-block MIN and MAX value
      • Effectively prunes blocks which don’t contain data for a given query
      • Minimizes unnecessary I/O
  • 13. Data Distribution
      • Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster:
          – KEY: Value is hashed, same value goes to same location (slice)
          – ALL: Full table data goes to first slice of every node
          – EVEN: Round robin
      • Goals:
          – Distribute data evenly for parallel processing
          – Minimize data movement during query processing
      Diagram: KEY, ALL, and EVEN each shown on two nodes with two slices each (Node 1: Slice 1, Slice 2; Node 2: Slice 3, Slice 4)
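The three distribution styles can be sketched as small placement functions. This is an illustrative model only: `zlib.crc32` is an arbitrary stand-in because the deck does not say which hash function Redshift uses, and the function names are hypothetical. The sample rows are the ones used on the following slides.

```python
import zlib

# (id, gender, name) rows from the example slides.
PEOPLE = [
    (101, "M", "John Smith"), (292, "F", "Jane Jones"),
    (139, "M", "Peter Black"), (446, "M", "Pat Partridge"),
    (658, "F", "Sarah Cyan"), (164, "M", "Brian Snail"),
    (209, "M", "James White"), (306, "F", "Lisa Green"),
]


def distribute_even(rows, slice_count):
    """EVEN: deal rows out to slices round robin."""
    slices = [[] for _ in range(slice_count)]
    for i, row in enumerate(rows):
        slices[i % slice_count].append(row)
    return slices


def distribute_key(rows, slice_count, key_index):
    """KEY: hash the key column so equal values land on the same slice.

    crc32 is a stand-in hash chosen for this sketch.
    """
    slices = [[] for _ in range(slice_count)]
    for row in rows:
        digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
        slices[digest % slice_count].append(row)
    return slices


def distribute_all(rows, node_count, slices_per_node):
    """ALL: a full copy goes to the first slice of every node."""
    slices = [[] for _ in range(node_count * slices_per_node)]
    for node in range(node_count):
        slices[node * slices_per_node] = list(rows)
    return slices


if __name__ == "__main__":
    print([len(s) for s in distribute_even(PEOPLE, 4)])    # [2, 2, 2, 2]
    print([len(s) for s in distribute_key(PEOPLE, 4, 0)])
    print([len(s) for s in distribute_all(PEOPLE, 2, 2)])  # [8, 0, 8, 0]
```

EVEN guarantees balance but ignores join keys; KEY keeps equal values together at the risk of imbalance; ALL trades storage for never having to move the table at query time.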
  • 14. DISTSTYLE EVEN
      ID   Gender  Name
      101  M       John Smith
      292  F       Jane Jones
      139  M       Peter Black
      446  M       Pat Partridge
      658  F       Sarah Cyan
      164  M       Brian Snail
      209  M       James White
      306  F       Lisa Green

      Round Robin

      ID   Gender  Name
      101  M       John Smith
      306  F       Lisa Green

      ID   Gender  Name
      292  F       Jane Jones
      209  M       James White

      ID   Gender  Name
      139  M       Peter Black
      164  M       Brian Snail

      ID   Gender  Name
      446  M       Pat Partridge
      658  F       Sarah Cyan
  • 15. DISTSTYLE KEY
      ID   Gender  Name
      101  M       John Smith
      292  F       Jane Jones
      139  M       Peter Black
      446  M       Pat Partridge
      658  F       Sarah Cyan
      164  M       Brian Snail
      209  M       James White
      306  F       Lisa Green

      Hash Function

      ID   Gender  Name
      101  M       John Smith
      306  F       Lisa Green

      ID   Gender  Name
      292  F       Jane Jones
      209  M       James White

      ID   Gender  Name
      139  M       Peter Black
      164  M       Brian Snail

      ID   Gender  Name
      446  M       Pat Partridge
      658  F       Sarah Cyan
  • 16. DISTSTYLE KEY
      ID   Gender  Name
      101  M       John Smith
      292  F       Jane Jones
      139  M       Peter Black
      446  M       Pat Partridge
      658  F       Sarah Cyan
      164  M       Brian Snail
      209  M       James White
      306  F       Lisa Green

      Hash Function

      ID   Gender  Name
      101  M       John Smith
      139  M       Peter Black
      446  M       Pat Partridge
      164  M       Brian Snail
      209  M       James White

      ID   Gender  Name
      292  F       Jane Jones
      658  F       Sarah Cyan
      306  F       Lisa Green
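This slide shows what happens when the distribution key has only two distinct values: the rows pile up on at most two slices. A rough way to quantify that skew is sketched below; `zlib.crc32` is again a stand-in hash, and `skew_ratio` is a hypothetical helper rather than a Redshift metric.

```python
import zlib
from collections import Counter

# (id, gender, name) rows from the slide.
PEOPLE = [
    (101, "M", "John Smith"), (292, "F", "Jane Jones"),
    (139, "M", "Peter Black"), (446, "M", "Pat Partridge"),
    (658, "F", "Sarah Cyan"), (164, "M", "Brian Snail"),
    (209, "M", "James White"), (306, "F", "Lisa Green"),
]


def slice_for(value, slice_count):
    """Map a key value to a slice with a stand-in hash (crc32)."""
    return zlib.crc32(str(value).encode("utf-8")) % slice_count


def rows_per_slice(rows, key_index, slice_count):
    """Count how many rows each slice would receive for a given key column."""
    counts = Counter(slice_for(row[key_index], slice_count) for row in rows)
    return [counts.get(s, 0) for s in range(slice_count)]


def skew_ratio(counts):
    """Largest slice divided by the average slice; 1.0 is perfectly even."""
    average = sum(counts) / len(counts)
    return max(counts) / average


if __name__ == "__main__":
    by_gender = rows_per_slice(PEOPLE, 1, 4)
    by_id = rows_per_slice(PEOPLE, 0, 4)
    print("gender:", by_gender, skew_ratio(by_gender))
    print("id:    ", by_id, skew_ratio(by_id))
```

With Gender as the key, at least two of the four slices stay empty no matter which hash is used, so a high-cardinality column such as ID is usually the better candidate.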
  • 17. DISTSTYLE ALL
      ID   Gender  Name
      101  M       John Smith
      292  F       Jane Jones
      139  M       Peter Black
      446  M       Pat Partridge
      658  F       Sarah Cyan
      164  M       Brian Snail
      209  M       James White
      306  F       Lisa Green

      ALL: the same eight rows are repeated in full in each of the four copies shown on the slide.
  • 18. CUSTOMERS
      CUST_ID  GENDER  NAME
      101      M       John Smith
      306      F       James White
      ORDERS
      ORDER_ID  CUST_ID  Amount
      A1600     101      120
      B8765     306      340
      RESULTS
      CUST_ID  GENDER  Amount
      101      M       120
      306      F       340

      CUSTOMERS
      CUST_ID  GENDER  NAME
      292      F       Jane Jones
      209      M       Lyall Green
      ORDERS
      ORDER_ID  CUST_ID  Amount
      C0967     292      750
      D8753     209      601
      RESULTS
      CUST_ID  GENDER  Amount
      292      F       750
      209      M       601
  • 19. CUSTOMERS
      CUST_ID  GENDER  NAME
      101      M       John Smith
      306      F       James White
      ORDERS
      ORDER_ID  CUST_ID  Amount
      A1600     101      120
      B8765     306      340
      RESULTS
      CUST_ID  GENDER  Amount
      101      M       120
      306      F       340

      CUSTOMERS
      CUST_ID  GENDER  NAME
      292      F       Jane Jones
      209      M       Lyall Green
      ORDERS
      ORDER_ID  CUST_ID  Amount
      C0967     292      750
      D8753     209      601
      RESULTS
      CUST_ID  GENDER  Amount
      292      F       750
      209      M       601
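These two slides show CUSTOMERS and ORDERS rows with the same CUST_ID sitting side by side, so each slice can produce its share of the join result without moving data. A minimal sketch of that idea, assuming both tables are distributed on CUST_ID with the same hash (crc32 here is a stand-in, and the function names are hypothetical):

```python
import zlib

# Rows as shown on the slides.
CUSTOMERS = [
    (101, "M", "John Smith"), (306, "F", "James White"),
    (292, "F", "Jane Jones"), (209, "M", "Lyall Green"),
]
ORDERS = [
    ("A1600", 101, 120), ("B8765", 306, 340),
    ("C0967", 292, 750), ("D8753", 209, 601),
]


def distribute(rows, key_index, slice_count):
    """Place each row on the slice chosen by hashing its key column."""
    slices = [[] for _ in range(slice_count)]
    for row in rows:
        digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
        slices[digest % slice_count].append(row)
    return slices


def local_join(customers, orders):
    """Join orders to customers on CUST_ID using only the rows given."""
    gender_by_id = {cust_id: gender for cust_id, gender, _name in customers}
    return [
        (cust_id, gender_by_id[cust_id], amount)
        for _order_id, cust_id, amount in orders
        if cust_id in gender_by_id
    ]


def collocated_join(customers, orders, slice_count):
    """Each slice joins its own rows; results are simply concatenated."""
    customer_slices = distribute(customers, 0, slice_count)
    order_slices = distribute(orders, 1, slice_count)
    results = []
    for cust_part, order_part in zip(customer_slices, order_slices):
        results.extend(local_join(cust_part, order_part))
    return results


if __name__ == "__main__":
    print(sorted(collocated_join(CUSTOMERS, ORDERS, 2)))
```

Because both tables hash the same CUST_ID value to the same slice, the concatenated per-slice results match a join over the whole tables; if the tables used different keys, rows would first have to be redistributed.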
  • 20. Choosing a Distribution Style
      KEY
      • Large FACT tables
      • Large or rapidly changing tables used in joins
      • Localize columns used within aggregations
      ALL
      • Have slowly changing data
      • Reasonable size (i.e., a few million but not hundreds of millions of rows)
      • No common distribution key for frequent joins
      • Typical use case: joined dimension table without a common distribution key
      EVEN
      • Tables not frequently joined or aggregated
      • Large tables without acceptable candidate keys
  • 21. Data Sorting
      Goals
      • Physically order rows of table data based on certain column(s)
      • Optimize effectiveness of zone maps
      • Enable MERGE JOIN operations
      Impact
      • Enables rrscans (range-restricted scans) to prune blocks by leveraging zone maps
      • Overall reduction in block I/O
      Achieved with the table property SORTKEY defined over one or more columns
      Optimal SORTKEY is dependent on:
      • Query patterns
      • Data profile
      • Business requirements
  • 22. Zone Maps
      SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013'

      Unsorted Table
      MIN: 01-JUNE-2013  MAX: 20-JUNE-2013  READ
      MIN: 08-JUNE-2013  MAX: 30-JUNE-2013  READ
      MIN: 12-JUNE-2013  MAX: 20-JUNE-2013
      MIN: 02-JUNE-2013  MAX: 25-JUNE-2013  READ
      MIN: 06-JUNE-2013  MAX: 12-JUNE-2013  READ

      Sorted By Date
      MIN: 01-JUNE-2013  MAX: 06-JUNE-2013
      MIN: 07-JUNE-2013  MAX: 12-JUNE-2013  READ
      MIN: 13-JUNE-2013  MAX: 18-JUNE-2013
      MIN: 19-JUNE-2013  MAX: 24-JUNE-2013
      MIN: 25-JUNE-2013  MAX: 30-JUNE-2013
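The pruning rule behind this slide is a simple range check: a block has to be read only if the filter value falls between its MIN and MAX. The sketch below applies that check to the block ranges from the slide; the function and variable names are hypothetical.

```python
from datetime import date


def june(day):
    """Shorthand for a date in June 2013, the month used on the slide."""
    return date(2013, 6, day)


# (MIN, MAX) per block, copied from the slide.
UNSORTED = [
    (june(1), june(20)), (june(8), june(30)), (june(12), june(20)),
    (june(2), june(25)), (june(6), june(12)),
]
SORTED_BY_DATE = [
    (june(1), june(6)), (june(7), june(12)), (june(13), june(18)),
    (june(19), june(24)), (june(25), june(30)),
]


def blocks_to_read(zone_map, value):
    """Indexes of blocks whose [MIN, MAX] range could contain the value."""
    return [i for i, (low, high) in enumerate(zone_map) if low <= value <= high]


if __name__ == "__main__":
    target = june(9)  # WHERE DATE = '09-JUNE-2013'
    print(blocks_to_read(UNSORTED, target))        # [0, 1, 3, 4]
    print(blocks_to_read(SORTED_BY_DATE, target))  # [1]
```

For the same query, the unsorted table reads four of its five blocks and the sorted table reads one, which accounts for the five READ markers on the slide.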
  • 23. Single Column
      • Table is sorted by 1 column
      [ SORTKEY ( date ) ]
      Date        Region   Country
      2-JUN-2015  Oceania  New Zealand
      2-JUN-2015  Asia     Singapore
      2-JUN-2015  Africa   Zaire
      2-JUN-2015  Asia     Hong Kong
      3-JUN-2015  Europe   Germany
      3-JUN-2015  Asia     Korea
      Best for:
      • Queries that use 1st column (i.e. date) as primary filter
      • Can speed up joins and group bys
  • 24. Compound • Example rows (Date, Region, Country): 2-JUN-2015, Africa, Zaire; 2-JUN-2015, Asia, Korea; 2-JUN-2015, Asia, Singapore; 2-JUN-2015, Europe, Germany; 3-JUN-2015, Asia, Hong Kong; 3-JUN-2015, Asia, Korea • [ COMPOUND SORTKEY ( date, region, country ) ] Best for: • Queries that use 1st column as primary filter, then other cols • Can speed up joins and group bys
  • 25. Interleaved • Equal weight is given to each column. • Example rows (Date, Region, Country): 2-JUN-2015, Africa, Zaire; 3-JUN-2015, Asia, Singapore; 2-JUN-2015, Asia, Korea; 2-JUN-2015, Europe, Germany; 3-JUN-2015, Asia, Hong Kong; 2-JUN-2015, Asia, Korea • [ INTERLEAVED SORTKEY ( date, region, country ) ] Best for: • Queries that use different columns in filter • Queries get faster the more columns used in the filter
  • 26. Choosing a SORTKEY. COMPOUND • Most common • Well-defined filter criteria • Time-series data. INTERLEAVED • Edge cases • Large tables (> 1 billion rows) • No common filter criteria • Non-time-series data. Choosing the column: • Primarily a column used as a query predicate (date, identifier, …) • Optionally, a column frequently used for aggregates • Optionally, the same column as the distribution key, for the most efficient joins (merge join)
  • 27. Compressing Data • COPY automatically analyzes and compresses data when loading into empty tables • ANALYZE COMPRESSION checks existing tables and proposes optimal compression algorithms for each column • Changing column encoding requires a table rebuild
  • 28. Compressing Data If you have a regular ETL process and you use temp tables or staging tables, turn off automatic compression • Use ANALYZE COMPRESSION to determine the right encodings • Bake those encodings into your DDL • Use CREATE TABLE … LIKE
  • 29. Compressing Data • From the zone maps we know: • Which block(s) contain the range • Which row offsets to scan • Highly compressed sort keys: • Many rows per block • Large row offset Skip compression on just the leading column of the compound sortkey
  • 31. Amazon Redshift Loading Data Overview [Diagram labels. Corporate data center: logs / files, source DBs. Transfer paths: VPN Connection, AWS Direct Connect, S3 Multipart Upload, AWS Import/Export. AWS Cloud: Amazon S3, Amazon DynamoDB, Amazon Elastic MapReduce, Amazon RDS, Amazon Glacier, EC2 or On-Prem (using SSH), Amazon Redshift. Axis: Data Volume.]
  • 32. Parallelism is a function of load files Each slice’s query processors are able to load one file at a time • Streaming Decompression • Parse • Distribute • Write A single input file means only one slice is ingesting data, so the cluster is only partially used: in the 16-slice example, 6.25% of slices are active
  • 33. Maximize Throughput with Multiple Files Use at least as many input files as there are slices in the cluster. With 16 input files, all slices are working, so you maximize throughput. COPY continues to scale linearly as you add additional nodes.
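The arithmetic behind these two slides can be checked with a short Python sketch. It assumes, as the slides state, that one slice loads one file at a time; the helper names and the round-robin split are illustrative choices, not part of any AWS tool.

```python
def active_slice_fraction(num_files, num_slices):
    """Fraction of slices kept busy when each slice loads one file at a time."""
    return min(num_files, num_slices) / num_slices


def split_round_robin(rows, num_files):
    """Split rows into num_files roughly equal parts, one part per output file."""
    return [rows[i::num_files] for i in range(num_files)]


single_file = active_slice_fraction(num_files=1, num_slices=16)
sixteen_files = active_slice_fraction(num_files=16, num_slices=16)
print(f"1 file: {single_file:.2%} of slices active")
print(f"16 files: {sixteen_files:.2%} of slices active")

# Splitting the input before upload gives every slice a file to load.
parts = split_round_robin(list(range(10)), num_files=4)
print(parts)
```

Equal-sized parts matter as much as the count: a load finishes only when the slice with the largest file finishes.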
  • 34. New feature: ALTER TABLE APPEND ELT workloads typically “massage” or aggregate data in a staging table and then append to production table ALTER TABLE APPEND moves data from staging to production table by manipulating metadata Much faster than INSERT INTO as data is not duplicated
  • 36. Optimizing a database for querying • Periodically check your table status • Vacuum and Analyze regularly • SVV_TABLE_INFO • Missing statistics • Table skew • Uncompressed Columns • Unsorted Data • Check your cluster status • WLM queuing • Commit queuing • Database Locks
  • 37. Missing Statistics • Amazon Redshift’s query optimizer relies on up-to-date statistics • Statistics are only necessary for data which you are accessing • Updated stats important on: • SORTKEY • DISTKEY • Columns in query predicates
  • 38. Table Maintenance and Status. Table Skew • Unbalanced workload • Query completes only as fast as the slowest slice • Can cause skew in flight: • Temp data fills a single node, resulting in query failure. Unsorted Table • Sortkey is just a guide, but data needs to actually be sorted • VACUUM or DEEP COPY to sort • Scans against unsorted tables continue to benefit from zone maps: • Load sequential blocks
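The effect of skew on the slowest slice can be estimated with a short Python sketch. Both measures below are illustrative assumptions: the max-to-min ratio is one common way to quantify skew, and scan time is assumed proportional to the rows a slice holds. The row counts are made up.

```python
def skew_ratio(rows_per_slice):
    """Rows on the fullest slice divided by rows on the emptiest slice."""
    fewest = min(rows_per_slice)
    if fewest == 0:
        return float("inf")  # at least one slice holds no rows at all
    return max(rows_per_slice) / fewest


def relative_scan_time(rows_per_slice):
    """Slowest slice's work relative to a perfectly even spread (1.0 = ideal)."""
    ideal = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / ideal


even = [250, 250, 250, 250]      # hypothetical row counts per slice
skewed = [700, 100, 100, 100]    # same total, one hot slice

print(skew_ratio(even), relative_scan_time(even))
print(skew_ratio(skewed), relative_scan_time(skewed))
```

Both layouts hold 1,000 rows, but under these assumptions the skewed one takes 2.8 times as long because every other slice waits for the hot one.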
  • 39. Cluster Status: Commits and WLM. WLM Queue • Identify short/long-running queries and prioritize them • Define multiple queues to route queries appropriately • Default concurrency of 5 • Leverage wlm_apex_hourly to tune WLM based on peak concurrency requirements. Commit Queue • How long is your commit queue? • Identify needless transactions • Group dependent statements within a single transaction • Offload operational workloads • STL_COMMIT_STATS
  • 40. Cluster Status: Database Locks • Database Locks • Read locks, Write locks, Exclusive locks • Reads block exclusive • Writes block writes and exclusive • Exclusives block everything • Ungranted locks block subsequent lock requests • Exposed through SVV_TRANSACTIONS
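The blocking rules on this slide can be written down as a small lookup table. The Python sketch below is a simplified model of those rules only; the `can_grant` function and its arguments are hypothetical and do not correspond to any Redshift API.

```python
# What each HELD lock type blocks, as listed on the slide.
BLOCKS = {
    "read": {"exclusive"},
    "write": {"write", "exclusive"},
    "exclusive": {"read", "write", "exclusive"},
}


def can_grant(requested, held, waiting=()):
    """Return True if `requested` can be granted right now.

    held    -- lock types already granted on the table
    waiting -- earlier requests still ungranted; per the slide, these
               block any later request regardless of its type
    """
    if waiting:
        return False
    return all(requested not in BLOCKS[h] for h in held)


print(can_grant("read", held=["write"]))        # a read is not blocked by a write
print(can_grant("write", held=["write"]))       # writes block writes
print(can_grant("exclusive", held=["read"]))    # reads block exclusive
print(can_grant("read", held=["read"], waiting=["exclusive"]))  # queued behind it
```

The last call shows why a single pending exclusive lock can stall otherwise compatible readers, which is worth checking for when SVV_TRANSACTIONS shows ungranted locks.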
  • 42. Typical ETL/ELT on legacy data warehouse • One file per table, maybe a few if too big • Many updates (“massage” the data) • Every job clears the data, then loads • Count on primary key to block double loads • High concurrency of load jobs • Small table(s) to control the job stream
  • 43. Two questions to ask Why do you do what you do? • Many times, users don’t know What is the customer need? • Many times, needs do not match current practice • You might benefit from adding other AWS services
  • 44. On Amazon Redshift Updates are delete + insert of the row • Deletes just mark rows for deletion Blocks are immutable • Minimum space used is one block per column, per slice Commits are expensive • 4 GB write on 8XL per node • Mirrors WHOLE dictionary • Cluster-wide serialized
  • 45. On Amazon Redshift • Not all aggregations created equal • Pre-aggregation can help • Order on group by matters • Concurrency should be low for better throughput • Caching layer for dashboards is recommended • WLM parcels RAM to queries. Use multiple queues for better control.
  • 46. Workload Management (WLM) Concurrency and memory can now be changed dynamically You can have distinct values for load time and query time Use wlm_apex_hourly.sql to monitor “queue pressure”
  • 47. New Feature – WLM Queue Hopping
  • 48. Query throughput vs. Concurrency • Query throughput (QPM or QPH) is more representative of end user experience than concurrency • Several improvements over the last 6 months • Commit improvements • Dynamic resource management • Query throughput doubled over the last 6 months
  • 49. Resources https://github.com/awslabs/amazon-redshift-utils https://github.com/awslabs/amazon-redshift-monitoring https://github.com/awslabs/amazon-redshift-udfs https://s3.amazonaws.com/chriz-webinar/webinar.zip Admin scripts Collection of utilities for running diagnostics on your cluster Admin views Collection of utilities for managing your cluster, generating schema DDL, etc. ColumnEncodingUtility Gives you the ability to apply optimal column encoding to an established schema with data already loaded
  • 50. Q&A If you want to learn more, register for our upcoming DevDay Austin: Monday, October 24, 2016, JW Marriott Austin https://aws.amazon.com/events/devday-austin Free, one-day developer event featuring tracks, labs, and workshops around Serverless, Containers, IoT, and Mobile
  • 52. Use SORTKEYs to effectively prune blocks
  • 53. Use SORTKEYs to effectively prune blocks
  • 54. Use SORTKEYs to effectively prune blocks
  • 55. Don’t compress initial SORTKEY column
  • 56. Use compression encoding to reduce I/O
  • 57. Choose a DISTKEY which avoids data skew
  • 58. Ingest: Disable predictable compression analysis
  • 59. Ingest: Load multiple files to match cluster slices
  • 60. VACUUM to physically remove deleted rows
  • 61. VACUUM to keep your tables sorted
  • 62. Gather statistics to assist the query planner