SlideShare a Scribd company logo
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Eric Ferreira, Principal Engineer, AWS
Philipp Mohr, Sr. CRM Director , King.com
November 29, 2016
Best Practices for Data Warehousing
with Amazon Redshift
BDM402
What to Expect from the Session
• Brief recap of Amazon Redshift service
• How King implemented their CRM
• Why their best practices work
What is Amazon Redshift ?
• Relational data warehouse
• Massively parallel; petabyte
scale
• Fully managed
• HDD and SSD platforms
• $1,000/TB/year; starts at
$0.25/hour
Columnar
MPP
OLAP
IAMAmazon
VPC
Amazon SWF
Amazon
S3 AWS KMS Amazon
Route 53
Amazon
CloudWatch
Amazon
EC2
regionvirtual private cloud
Availability Zone
CN0 CN1 CN2 CN3
Leader Node
February 2013
November 2016
> 135 Significant Features
Are you a user ?
© King.com Ltd 2016 – Commercially confidential
© King.com Ltd 2016 – Commercially confidential
as an operational CRM database
@
© King.com Ltd 2016 – Commercially confidential Page 11
Business challenges @ CRM
© King.com Ltd 2016 – Commercially
© King.com Ltd 2016 – Commercially confidential
SAGA
CRM
Previously in...
© King.com Ltd 2016 – Commercially confidential
Campaign
Email
Extraction
Email
CRM
The CRM Saga
© King.com Ltd 2016 – Commercially confidential
The scale we are talking about…#
9.5K
campaigns
executed / week
1.5B
messages
sent / month
12
Games
supported
8
promotions
specialists
© King.com Ltd 2016 – Commercially confidential
The scale we are talking about…#
9.5K
campaigns
executed / week
1.5B
messages
sent / month
12
Games
supported
8
promotions
specialists
5
campaigns
executed / week
23k
messages
sent / month
5
Game
supported
10
Promotions specialist
DS and Dev support
Starting point
© King.com Ltd 2016 – Commercially confidential
Emisario: campaign manager
© King.com Ltd 2016 – Commercially confidential
Why Amazon Redshift?
No dedicated admin
team needed
Peace of mind support
Column based and
massively parallel nature
Execute analytical /
marketing queries quickly
Quick time
to market
Scalability
> 10k
queries per
day
Value
Bs customer
records
+
+
+
+
+
++
+
+
© King.com Ltd 2016 – Commercially confidential
Why Amazon Redshift?
No dedicated admin
team needed
Peace of mind support
Column based and
massively parallel nature
Execute analytical /
marketing queries quickly
Quick time
to market
Scalability
> 10k
queries per
day
Value
Bs customer
records
Performanc
e
+
+
+
+
+
++
+
+
Performanc
e
Performanc
ePerformance
© King.com Ltd 2016 – Commercially confidential
6 x DC1.Large
24 x DC1.8Xlarge
EC2 Compute Units
(768 virtual cores) 60 TB
2 x DC1.Large
Development
Staging
ProductionScale of Amazon Redshift clusters
© King.com Ltd 2016 – Commercially confidential
Technical architecture
© King.com Ltd 2016 – Commercially confidential
Requirements for Amazon
Redshift
• DB needs to be part of an operational system
• Must be able to handle parallel queries on very large and dynamic data
• Must respond to queries within 15 seconds in order not to disrupt user experience
• Must be financially viable
© King.com Ltd 2016 – Commercially confidential
Use of distribution keys in all joins
• Segmentation and data merging queries require joining multiple tables with up to 4 billion rows each
• In such cases, anything other than a Merge-join was practically impossible
• Extra join condition added to all queries to join on the Distribution key even when they are
semantically redundant
• Dramatic reduction of query times. In certain cases, up to 1,000% increase in performance
SELECT ...
FROM tbl_crm_event e
JOIN tbl_crm_visit v on v.visitid = e.associatedvisitid
and e.playerid = v.playerid
© King.com Ltd 2016 – Commercially confidential
Migrate to natural distribution keys
• Restructuring the data and switching to natural distribution keys reduced the average
completion time to less than 30 minutes (quite often less than 5 minutes)
PROBLEM
SOLUTION
WHY
• Merge process can join existing and new data using the common distribution key
• Multiple processing steps of updating primary keys and related foreign keys were no longer
necessary
• No operation required data re-distribution
• update of their values requires moving data between nodes, which is costly
• When scaling from a 100M rows to 3B+, the merge (upsert) process took over 24 hours to complete
© King.com Ltd 2016 – Commercially confidential
Data pre-processing outside the primary
schema tables
• Segmentation can run in parallel without affecting performance
• (mostly) they do not access the same tables, and therefore, are not affected by locks
ACTIONS
IMPACT
• Merge (upsert) process performs all pre-processing on temporary tables
• If needed necessary primary tables are used by segmenting (read) queries
• E.g. final insert/update of the pre-processed data
© King.com Ltd 2016 – Commercially confidential
Thanks to column compression encoding…
• Heavy reduction of I/O
Use Amazon Redshift column encoding utility to determine best encoding
• Cluster size reduced from
48 X DC1.8XLarge to 24 nodes
• Near 100% performance
increase compared to raw
uncompressed data
© King.com Ltd 2016 – Commercially confidential
• Amazon Redshift utils from GitHub https://github.com/awslabs/amazon-redshift-utils
/* query showing queries which are waiting on a WLM Query Slot */
SELECT w.query
,substring(q.querytxt,1,100) AS querytxt
,w.queue_start_time
,w.service_class AS class
,w.slot_count AS slots
,w.total_queue_time / 1000000 AS queue_seconds
,w.total_exec_time / 1000000 exec_seconds
,(w.total_queue_time + w.total_Exec_time) / 1000000 AS total_seconds
FROM stl_wlm_query w
LEFT JOIN stl_query q
ON q.query = w.query
AND q.userid = w.userid
WHERE w.queue_start_Time >= dateadd(day,-7,CURRENT_DATE)
AND w.total_queue_Time > 0
ORDER BY w.total_queue_time DESC
,w.queue_start_time DESC limit 35
Concurrency optimizations in WLM
queue_seconds
© King.com Ltd 2016 – Commercially confidential
• Workload management (WLM) defines the number of query queues that are
available and how queries are routed to those queues for processing
Default configuration
["query_concurrency":5,]
Current configuration
["query_concurrency":10,]
Concurrency optimizations in WLM contd…
Extensive tests need to be done to ensure no query runs out of memory
© King.com Ltd 2016 – Commercially confidential
Eliminate concurrent modification of data
• All data upserts are handled by a single process
• No concurrent writes
• Performance of sequential batch queries are better than parallel small queries
© King.com Ltd 2016 – Commercially confidential
• Data merge 10% of data can get updated
• Daily vacuum not sufficient as in 24 hours query performance is severely affected
On demand vacuum based on the table state
• Periodically monitor unsorted regions of tables + vacuum them when it’s above threshold X
• Set threshold value per table
• SVV_TABLE_INFO system view used to diagnose and address table design issues that can influence
query performance, including issues with compression encoding, distribution keys, sort style, data
distribution skew, table size, and statistics.
• Less fluctuations, and therefore, predictable query performance
PROBLEM
SOLUTION
© King.com Ltd 2016 – Commercially confidential
Reduce the number of selected columns
• Due to columnar model, extracting extra columns is more expensive compared to OLTP databases
PROBLEM
PERFORMANCE IMPACT
SOLUTION
• Query generation process optimized to select ONLY the columns that are required for a certain use-case.
• Segmentation queries are automatically generated
• Often requested more columns than necessary for the use-case
© King.com Ltd 2016 – Commercially confidential
Increase the batch size as much as possible
• Increased performance: less selects performed
• We operate at 5 million batch size (up from 100K)
• Upper limit set by memory constraints on operational servers
• But: Balance with data freshness requirements
Batch
Operations
© King.com Ltd 2016 – Commercially confidential
Reduce use of leader node as much as possible
• Often, the leader node acts as a bottleneck
• Extracting a large number of rows (Some segmentation queries return hundreds of millions
of rows)
• Aggregate calculations across distribution keys
Problem
Solution
• Ensure data is unloaded to S3 (or other AWS channels) which the individual nodes
can communicate directly with
• Modify, if possible, queries to NOT span distribution keys. Each calculation can be
performed in each node
© King.com Ltd 2016 – Commercially confidential
Technical recommendations
• Use distribution keys that can
be used in all joins
• Migrate to natural keys
• Reduce use of leader-node
as much as possible
• Column compression
encoding
• Data pre-processing
outside the main tables
• WLM optimizations
• Increase batch-size as
much as possible
• Prohibit concurrent
modification of data
• Reduce selected columns
• On-demand vacuum
based on the state of the
database
Our vision:
Fast, Cheap and
Easy-to-use
Think: Toaster
You submit your job
Choose a few options
It runs
Amazon Redshift Cluster Architecture
Massively parallel, shared nothing
Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, backup, restore
10 GigE
(HPC)
Ingestion
Backup
Restore
SQL Clients/BI Tools
128GB RAM
16TB disk
16 cores
S3 / EMR / DynamoDB / SSH
JDBC/ODBC
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
Leader
Node
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with row storage:
– Need to read everything
– Unnecessary I/O
aid loc dt
CREATE TABLE reinvent_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with columnar storage:
– Only scan blocks for relevant
column
aid loc dt
CREATE TABLE reinvent_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Columns grow and shrink independently
• Effective compression ratios due to like data
• Reduces storage requirements
• Reduces I/O
aid loc dt
CREATE TABLE reinvent_deep_dive (
aid INT ENCODE LZO
,loc CHAR(3) ENCODE BYTEDICT
,dt DATE ENCODE RUNLENGTH
);
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
aid loc dt
CREATE TABLE reinvent_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
• In-memory block metadata
• Contains per-block MIN and MAX value
• Effectively prunes blocks which cannot
contain data for a given query
• Eliminates unnecessary I/O
SELECT COUNT(*) FROM reinvent_deep_dive WHERE DT = ‘09-JUNE-2013’
MIN: 01-JUNE-2013
MAX: 20-JUNE-2013
MIN: 08-JUNE-2013
MAX: 30-JUNE-2013
MIN: 12-JUNE-2013
MAX: 20-JUNE-2013
MIN: 02-JUNE-2013
MAX: 25-JUNE-2013
Unsorted Table
MIN: 01-JUNE-2013
MAX: 06-JUNE-2013
MIN: 07-JUNE-2013
MAX: 12-JUNE-2013
MIN: 13-JUNE-2013
MAX: 18-JUNE-2013
MIN: 19-JUNE-2013
MAX: 24-JUNE-2013
Sorted By Date
Zone Maps
Compound Sort Keys
• Records in Amazon Redshift are stored in
blocks
• For this illustration, let’s assume that four
records fill a block
• Records with a given cust_id are all in one
block
• However, records with a given prod_id are
spread across four blocks
Interleaved Sort Keys
• Column values mapped in buckets and
bits interleaved (order is maintained)
• Data is sorted in equal measures for both
keys
• New values get assigned “others” bucket
• User has to re-map and re-write the whole
table to incorporate new mappings
• Records with a given cust_id are spread
across two blocks
• Records with a given prod_id are also
spread across two blocks
Interleaved Sort Key - Limitations
• Only makes sense on very large tables
Table Size: Blocks per column per slice
• Columns domain should be stable
Data Distribution
• Distribute data evenly for parallel processing
• Minimize data movement
• Co-located joins
• Localized aggregations
All
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Full table data on first
slice of every node
Distribution key
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Same key to same location
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Even
Round robin
distribution
We help you migrate your database…
AWS Schema Conversion Tool Current Sources:
• Oracle
• Teradata
• Netezza
• Greenplum
• Redshift
Data migration available though partners
today…
• Schema optimized for Amazon Redshift
• Convert SQL inside your code
QuickSight + Redshift
Redshift is one the fastest growing services in the AWS platform. QuickSight seamlessly
connects to Redshift giving you native access to all of your instances, and tables.
Amazon Redshift
Achieve high concurrency by
offloading end user queries to SPICE
Calculations can be done in SPICE reducing
the load on the underlying database
Parallelism considerations with Amazon Redshift slices
DS2.8XL Compute Node
Ingestion Throughput:
• Each slice’s query processors can load one file at a time:
• Streaming decompression
• Parse
• Distribute
• Write
Realizing only partial node usage as 6.25% of slices are active
0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
Design considerations for Amazon Redshift slices
Use at least as many input files
as there are slices in the cluster
With 16 input files, all slices are
working so you maximize
throughput
COPY continues to scale linearly
as you add nodes 16 Input Files
DS2.8XL Compute Node
0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
Optimizing a database for querying
• Periodically check your table status
• Vacuum and analyze regularly
• SVV_TABLE_INFO
• Missing statistics
• Table skew
• Uncompressed columns
• Unsorted data
• Check your cluster status
• WLM queuing
• Commit queuing
• Database locks
Missing statistics
• Amazon Redshift query optimizer
relies on up-to-date statistics
• Statistics are necessary only for
data that you are accessing
• Updated stats important on:
• SORTKEY
• DISTKEY
• Columns in query predicates
Table skew
• Unbalanced workload
• Query completes as fast as the
slowest slice completes
• Can cause skew inflight:
• Temp data fills a single
node, resulting in query
failure
Table maintenance and status
Unsorted table
• Sortkey is just a guide, but
data actually needs to be
sorted
• VACUUM or DEEP COPY to
sort
• Scans against unsorted tables
continue to benefit from zone
maps:
• Load sequential blocks
WLM queue
Identify short/long-running queries
and prioritize them
Define multiple queues to route
queries appropriately
Default concurrency of 5
Leverage wlm_apex_hourly to tune
WLM based on peak concurrency
requirements
Cluster status: commits and WLM
Commit queue
How long is your commit queue?
• Identify needless transactions
• Group dependent statements
within a single transaction
• Offload operational workloads
• STL_COMMIT_STATS
Open source tools
https://github.com/awslabs/amazon-redshift-utils
https://github.com/awslabs/amazon-redshift-monitoring
https://github.com/awslabs/amazon-redshift-udfs
Admin scripts
Collection of utilities for running diagnostics on your cluster
Admin views
Collection of utilities for managing your cluster, generating schema DDL, etc.
ColumnEncodingUtility
Gives you the ability to apply optimal column encoding to an established
schema with data already loaded
What’s next ?
Don’t Miss…
BDA304 - What’s New with Amazon Redshift
DAT202-R - [REPEAT] Migrating Your Data Warehouse to
Amazon Redshift
BDA203 - Billions of Rows Transformed in Record Time
Using Matillion ETL for Amazon Redshift
BDM306-R - [REPEAT] Netflix: Using Amazon S3 as the
fabric of our big data ecosystem
Thank you!
Remember to complete
your evaluations!

More Related Content

What's hot

AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
Amazon Web Services
 
Deep Dive on MySQL Databases on AWS - AWS Online Tech Talks
Deep Dive on MySQL Databases on AWS - AWS Online Tech TalksDeep Dive on MySQL Databases on AWS - AWS Online Tech Talks
Deep Dive on MySQL Databases on AWS - AWS Online Tech Talks
Amazon Web Services
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Amazon Web Services
 
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
Amazon Web Services
 
Deep Dive On Amazon Redshift
Deep Dive On Amazon RedshiftDeep Dive On Amazon Redshift
Deep Dive On Amazon Redshift
Amazon Web Services
 
ENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the CloudENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the Cloud
Amazon Web Services
 
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
Amazon Web Services
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
Amazon Web Services
 
AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)
AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)
AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)
Amazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
Amazon Web Services
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
Amazon Web Services
 
ENT202 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity O...
ENT202 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity O...ENT202 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity O...
ENT202 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity O...
Amazon Web Services
 
ENT401 Deep Dive with Amazon EC2 Systems Manager
ENT401 Deep Dive with Amazon EC2 Systems ManagerENT401 Deep Dive with Amazon EC2 Systems Manager
ENT401 Deep Dive with Amazon EC2 Systems Manager
Amazon Web Services
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
Amazon Web Services
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
Amazon Web Services
 
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
Amazon Web Services
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
Amazon Web Services
 
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
Amazon Web Services
 
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Amazon Web Services
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
Amazon Web Services
 

What's hot (20)

AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
 
Deep Dive on MySQL Databases on AWS - AWS Online Tech Talks
Deep Dive on MySQL Databases on AWS - AWS Online Tech TalksDeep Dive on MySQL Databases on AWS - AWS Online Tech Talks
Deep Dive on MySQL Databases on AWS - AWS Online Tech Talks
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
 
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
 
Deep Dive On Amazon Redshift
Deep Dive On Amazon RedshiftDeep Dive On Amazon Redshift
Deep Dive On Amazon Redshift
 
ENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the CloudENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the Cloud
 
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
 
AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)
AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)
AWS re:Invent 2016: Amazon EC2 Foundations (CMP203)
 
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
 
ENT202 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity O...
ENT202 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity O...ENT202 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity O...
ENT202 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity O...
 
ENT401 Deep Dive with Amazon EC2 Systems Manager
ENT401 Deep Dive with Amazon EC2 Systems ManagerENT401 Deep Dive with Amazon EC2 Systems Manager
ENT401 Deep Dive with Amazon EC2 Systems Manager
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
 
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
 
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
 

Similar to AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift (BDM402)

Get the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost downGet the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost down
Agilisium Consulting
 
Optimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceOptimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak Performance
Amazon Web Services
 
Which Database is Right for My Workload?: Database Week San Francisco
Which Database is Right for My Workload?: Database Week San FranciscoWhich Database is Right for My Workload?: Database Week San Francisco
Which Database is Right for My Workload?: Database Week San Francisco
Amazon Web Services
 
Which Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SFWhich Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SF
Amazon Web Services
 
Which Database is Right for My Workload?
Which Database is Right for My Workload?Which Database is Right for My Workload?
Which Database is Right for My Workload?
Amazon Web Services
 
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Web Services
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Amazon Web Services
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
Amazon Web Services
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
AWS Summit 2013 | Auckland - Building Web Scale Applications with AWS
AWS Summit 2013 | Auckland - Building Web Scale Applications with AWSAWS Summit 2013 | Auckland - Building Web Scale Applications with AWS
AWS Summit 2013 | Auckland - Building Web Scale Applications with AWS
Amazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Amazon Web Services
 
Optimization SQL Server for Dynamics AX 2012 R3
Optimization SQL Server for Dynamics AX 2012 R3Optimization SQL Server for Dynamics AX 2012 R3
Optimization SQL Server for Dynamics AX 2012 R3
Juan Fabian
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Sam Palani
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
Amazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
Amazon Web Services
 
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
Amazon Web Services
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
Amazon Web Services
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute Services
Amazon Web Services
 

Similar to AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift (BDM402) (20)

Get the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost downGet the most out of your AWS Redshift investment while keeping cost down
Get the most out of your AWS Redshift investment while keeping cost down
 
Optimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceOptimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak Performance
 
Which Database is Right for My Workload?: Database Week San Francisco
Which Database is Right for My Workload?: Database Week San FranciscoWhich Database is Right for My Workload?: Database Week San Francisco
Which Database is Right for My Workload?: Database Week San Francisco
 
Which Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SFWhich Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SF
 
Which Database is Right for My Workload?
Which Database is Right for My Workload?Which Database is Right for My Workload?
Which Database is Right for My Workload?
 
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
 
Operational-Analytics
Operational-AnalyticsOperational-Analytics
Operational-Analytics
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
AWS Summit 2013 | Auckland - Building Web Scale Applications with AWS
AWS Summit 2013 | Auckland - Building Web Scale Applications with AWSAWS Summit 2013 | Auckland - Building Web Scale Applications with AWS
AWS Summit 2013 | Auckland - Building Web Scale Applications with AWS
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Optimization SQL Server for Dynamics AX 2012 R3
Optimization SQL Server for Dynamics AX 2012 R3Optimization SQL Server for Dynamics AX 2012 R3
Optimization SQL Server for Dynamics AX 2012 R3
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute Services
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 

AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift (BDM402)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Eric Ferreira, Principal Engineer, AWS Philipp Mohr, Sr. CRM Director , King.com November 29, 2016 Best Practices for Data Warehousing with Amazon Redshift BDM402
  • 2. What to Expect from the Session • Brief recap of Amazon Redshift service • How King implemented their CRM • Why their best practices work
  • 3. What is Amazon Redshift ? • Relational data warehouse • Massively parallel; petabyte scale • Fully managed • HDD and SSD platforms • $1,000/TB/year; starts at $0.25/hour
  • 4. Columnar MPP OLAP IAMAmazon VPC Amazon SWF Amazon S3 AWS KMS Amazon Route 53 Amazon CloudWatch Amazon EC2
  • 5. regionvirtual private cloud Availability Zone CN0 CN1 CN2 CN3 Leader Node
  • 6. February 2013 November 2016 > 135 Significant Features
  • 7. Are you a user ?
  • 8.
  • 9. © King.com Ltd 2016 – Commercially confidential
  • 10. © King.com Ltd 2016 – Commercially confidential as an operational CRM database @
  • 11. © King.com Ltd 2016 – Commercially confidential Page 11 Business challenges @ CRM © King.com Ltd 2016 – Commercially
  • 12. © King.com Ltd 2016 – Commercially confidential SAGA CRM Previously in...
  • 13. © King.com Ltd 2016 – Commercially confidential Campaign Email Extraction Email CRM The CRM Saga
  • 14. © King.com Ltd 2016 – Commercially confidential The scale we are talking about…# 9.5K campaigns executed / week 1.5B messages sent / month 12 Games supported 8 promotions specialists
  • 15. © King.com Ltd 2016 – Commercially confidential The scale we are talking about…# 9.5K campaigns executed / week 1.5B messages sent / month 12 Games supported 8 promotions specialists 5 campaigns executed / week 23k messages sent / month 5 Game supported 10 Promotions specialist DS and Dev support Starting point
  • 16. © King.com Ltd 2016 – Commercially confidential Emisario: campaign manager
  • 17. © King.com Ltd 2016 – Commercially confidential Why Amazon Redshift? No dedicated admin team needed Peace of mind support Column based and massively parallel nature Execute analytical / marketing queries quickly Quick time to market Scalability > 10k queries per day Value Bs customer records + + + + + ++ + +
  • 18. © King.com Ltd 2016 – Commercially confidential Why Amazon Redshift? No dedicated admin team needed Peace of mind support Column based and massively parallel nature Execute analytical / marketing queries quickly Quick time to market Scalability > 10k queries per day Value Bs customer records Performanc e + + + + + ++ + + Performanc e Performanc ePerformance
  • 19. © King.com Ltd 2016 – Commercially confidential 6 x DC1.Large 24 x DC1.8Xlarge EC2 Compute Units (768 virtual cores) 60 TB 2 x DC1.Large Development Staging ProductionScale of Amazon Redshift clusters
  • 20. © King.com Ltd 2016 – Commercially confidential Technical architecture
  • 21. © King.com Ltd 2016 – Commercially confidential Requirements for Amazon Redshift • DB needs to be part of an operational system • Must be able to handle parallel queries on very large and dynamic data • Must respond to queries within 15 seconds in order not to disrupt user experience • Must be financially viable
  • 22. © King.com Ltd 2016 – Commercially confidential Use of distribution keys in all joins • Segmentation and data merging queries require joining multiple tables with up to 4 billion rows each • In such cases, anything other than a Merge-join was practically impossible • Extra join condition added to all queries to join on the Distribution key even when they are semantically redundant • Dramatic reduction of query times. In certain cases, up to 1,000% increase in performance SELECT ... FROM tbl_crm_event e JOIN tbl_crm_visit v on v.visitid = e.associatedvisitid and e.playerid = v.playerid
  • 23. © King.com Ltd 2016 – Commercially confidential Migrate to natural distribution keys • Restructuring the data and switching to natural distribution keys reduced the average completion time to less than 30 minutes (quite often less than 5 minutes) PROBLEM SOLUTION WHY • Merge process can join existing and new data using the common distribution key • Multiple processing steps of updating primary keys and related foreign keys were no longer necessary • No operation required data re-distribution • update of their values requires moving data between nodes, which is costly • When scaling from a 100M rows to 3B+, the merge (upsert) process took over 24 hours to complete
  • 24. © King.com Ltd 2016 – Commercially confidential Data pre-processing outside the primary schema tables • Segmentation can run in parallel without affecting performance • (mostly) they do not access the same tables, and therefore, are not affected by locks ACTIONS IMPACT • Merge (upsert) process performs all pre-processing on temporary tables • If needed necessary primary tables are used by segmenting (read) queries • E.g. final insert/update of the pre-processed data
  • 25. © King.com Ltd 2016 – Commercially confidential Thanks to column compression encoding… • Heavy reduction of I/O Use Amazon Redshift column encoding utility to determine best encoding • Cluster size reduced from 48 X DC1.8XLarge to 24 nodes • Near 100% performance increase compared to raw uncompressed data
  • 26. © King.com Ltd 2016 – Commercially confidential • Amazon Redshift utils from GitHub https://github.com/awslabs/amazon-redshift-utils /* query showing queries which are waiting on a WLM Query Slot */ SELECT w.query ,substring(q.querytxt,1,100) AS querytxt ,w.queue_start_time ,w.service_class AS class ,w.slot_count AS slots ,w.total_queue_time / 1000000 AS queue_seconds ,w.total_exec_time / 1000000 exec_seconds ,(w.total_queue_time + w.total_Exec_time) / 1000000 AS total_seconds FROM stl_wlm_query w LEFT JOIN stl_query q ON q.query = w.query AND q.userid = w.userid WHERE w.queue_start_Time >= dateadd(day,-7,CURRENT_DATE) AND w.total_queue_Time > 0 ORDER BY w.total_queue_time DESC ,w.queue_start_time DESC limit 35 Concurrency optimizations in WLM queue_seconds
  • 27. © King.com Ltd 2016 – Commercially confidential • Workload management (WLM) defines the number of query queues that are available and how queries are routed to those queues for processing Default configuration ["query_concurrency":5,] Current configuration ["query_concurrency":10,] Concurrency optimizations in WLM contd… Extensive tests need to be done to ensure no query runs out of memory
  • 28. © King.com Ltd 2016 – Commercially confidential Eliminate concurrent modification of data • All data upserts are handled by a single process • No concurrent writes • Performance of sequential batch queries are better than parallel small queries
  • 29. © King.com Ltd 2016 – Commercially confidential • Data merge 10% of data can get updated • Daily vacuum not sufficient as in 24 hours query performance is severely affected On demand vacuum based on the table state • Periodically monitor unsorted regions of tables + vacuum them when it’s above threshold X • Set threshold value per table • SVV_TABLE_INFO system view used to diagnose and address table design issues that can influence query performance, including issues with compression encoding, distribution keys, sort style, data distribution skew, table size, and statistics. • Less fluctuations, and therefore, predictable query performance PROBLEM SOLUTION
  • 30. © King.com Ltd 2016 – Commercially confidential Reduce the number of selected columns • Due to columnar model, extracting extra columns is more expensive compared to OLTP databases PROBLEM PERFORMANCE IMPACT SOLUTION • Query generation process optimized to select ONLY the columns that are required for a certain use-case. • Segmentation queries are automatically generated • Often requested more columns than necessary for the use-case
  • 31. © King.com Ltd 2016 – Commercially confidential Increase the batch size as much as possible • Increased performance: less selects performed • We operate at 5 million batch size (up from 100K) • Upper limit set by memory constraints on operational servers • But: Balance with data freshness requirements Batch Operations
  • 32. © King.com Ltd 2016 – Commercially confidential Reduce use of leader node as much as possible • Often, the leader node acts as a bottleneck • Extracting a large number of rows (Some segmentation queries return hundreds of millions of rows) • Aggregate calculations across distribution keys Problem Solution • Ensure data is unloaded to S3 (or other AWS channels) which the individual nodes can communicate directly with • Modify, if possible, queries to NOT span distribution keys. Each calculation can be performed in each node
  • 33. © King.com Ltd 2016 – Commercially confidential Technical recommendations • Use distribution keys that can be used in all joins • Migrate to natural keys • Reduce use of leader-node as much as possible • Column compression encoding • Data pre-processing outside the main tables • WLM optimizations • Increase batch-size as much as possible • Prohibit concurrent modification of data • Reduce selected columns • On-demand vacuum based on the state of the database
  • 34.
  • 35. Our vision: Fast, Cheap and Easy-to-use
  • 36. Think: Toaster You submit your job Choose a few options It runs
  • 37. Amazon Redshift Cluster Architecture Massively parallel, shared nothing Leader node • SQL endpoint • Stores metadata • Coordinates parallel SQL processing Compute nodes • Local, columnar storage • Executes queries in parallel • Load, backup, restore 10 GigE (HPC) Ingestion Backup Restore SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores S3 / EMR / DynamoDB / SSH JDBC/ODBC 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node Leader Node
  • 38. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 • Accessing dt with row storage: – Need to read everything – Unnecessary I/O aid loc dt CREATE TABLE reinvent_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date );
  • 39. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 • Accessing dt with columnar storage: – Only scan blocks for relevant column aid loc dt CREATE TABLE reinvent_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date );
  • 40. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 • Columns grow and shrink independently • Effective compression ratios due to like data • Reduces storage requirements • Reduces I/O aid loc dt CREATE TABLE reinvent_deep_dive ( aid INT ENCODE LZO ,loc CHAR(3) ENCODE BYTEDICT ,dt DATE ENCODE RUNLENGTH );
  • 41. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 aid loc dt CREATE TABLE reinvent_deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); • In-memory block metadata • Contains per-block MIN and MAX value • Effectively prunes blocks which cannot contain data for a given query • Eliminates unnecessary I/O
  • 42. SELECT COUNT(*) FROM reinvent_deep_dive WHERE DT = ‘09-JUNE-2013’ MIN: 01-JUNE-2013 MAX: 20-JUNE-2013 MIN: 08-JUNE-2013 MAX: 30-JUNE-2013 MIN: 12-JUNE-2013 MAX: 20-JUNE-2013 MIN: 02-JUNE-2013 MAX: 25-JUNE-2013 Unsorted Table MIN: 01-JUNE-2013 MAX: 06-JUNE-2013 MIN: 07-JUNE-2013 MAX: 12-JUNE-2013 MIN: 13-JUNE-2013 MAX: 18-JUNE-2013 MIN: 19-JUNE-2013 MAX: 24-JUNE-2013 Sorted By Date Zone Maps
  • 43. Compound Sort Keys • Records in Amazon Redshift are stored in blocks • For this illustration, let’s assume that four records fill a block • Records with a given cust_id are all in one block • However, records with a given prod_id are spread across four blocks
  • 44. Interleaved Sort Keys • Column values mapped in buckets and bits interleaved (order is maintained) • Data is sorted in equal measures for both keys • New values get assigned “others” bucket • User has to re-map and re-write the whole table to incorporate new mappings • Records with a given cust_id are spread across two blocks • Records with a given prod_id are also spread across two blocks
  • 45. Interleaved Sort Key - Limitations • Only makes sense on very large tables Table Size: Blocks per column per slice • Columns domain should be stable
  • 46. Data Distribution • Distribute data evenly for parallel processing • Minimize data movement • Co-located joins • Localized aggregations All Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Full table data on first slice of every node Distribution key Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Same key to same location Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Even Round robin distribution
  • 47. We help you migrate your database… AWS Schema Conversion Tool Current Sources: • Oracle • Teradata • Netezza • Greenplum • Redshift Data migration available though partners today… • Schema optimized for Amazon Redshift • Convert SQL inside your code
  • 48. QuickSight + Redshift Redshift is one the fastest growing services in the AWS platform. QuickSight seamlessly connects to Redshift giving you native access to all of your instances, and tables. Amazon Redshift Achieve high concurrency by offloading end user queries to SPICE Calculations can be done in SPICE reducing the load on the underlying database
  • 49. Parallelism considerations with Amazon Redshift slices DS2.8XL Compute Node Ingestion Throughput: • Each slice’s query processors can load one file at a time: • Streaming decompression • Parse • Distribute • Write Realizing only partial node usage as 6.25% of slices are active 0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
  • 50. Design considerations for Amazon Redshift slices Use at least as many input files as there are slices in the cluster With 16 input files, all slices are working so you maximize throughput COPY continues to scale linearly as you add nodes 16 Input Files DS2.8XL Compute Node 0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
  • 51. Optimizing a database for querying • Periodically check your table status • Vacuum and analyze regularly • SVV_TABLE_INFO • Missing statistics • Table skew • Uncompressed columns • Unsorted data • Check your cluster status • WLM queuing • Commit queuing • Database locks
  • 52. Missing statistics • Amazon Redshift query optimizer relies on up-to-date statistics • Statistics are necessary only for data that you are accessing • Updated stats important on: • SORTKEY • DISTKEY • Columns in query predicates
  • 53. Table skew • Unbalanced workload • Query completes as fast as the slowest slice completes • Can cause skew inflight: • Temp data fills a single node, resulting in query failure Table maintenance and status Unsorted table • Sortkey is just a guide, but data actually needs to be sorted • VACUUM or DEEP COPY to sort • Scans against unsorted tables continue to benefit from zone maps: • Load sequential blocks
  • 54. WLM queue Identify short/long-running queries and prioritize them Define multiple queues to route queries appropriately Default concurrency of 5 Leverage wlm_apex_hourly to tune WLM based on peak concurrency requirements Cluster status: commits and WLM Commit queue How long is your commit queue? • Identify needless transactions • Group dependent statements within a single transaction • Offload operational workloads • STL_COMMIT_STATS
  • 55. Open source tools https://github.com/awslabs/amazon-redshift-utils https://github.com/awslabs/amazon-redshift-monitoring https://github.com/awslabs/amazon-redshift-udfs Admin scripts Collection of utilities for running diagnostics on your cluster Admin views Collection of utilities for managing your cluster, generating schema DDL, etc. ColumnEncodingUtility Gives you the ability to apply optimal column encoding to an established schema with data already loaded
  • 57. Don’t Miss… BDA304 - What’s New with Amazon Redshift DAT202-R - [REPEAT] Migrating Your Data Warehouse to Amazon Redshift BDA203 - Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift BDM306-R - [REPEAT] Netflix: Using Amazon S3 as the fabric of our big data ecosystem
  • 58.