© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
GPS: Optimizing Tips: Amazon Redshift for Cloud Data Warehousing
Mike Kalberer, Solutions Architect
Anupam Mishra, Principal Solutions Architect
GPSTEC315
November 28, 2017
Agenda
Service overview of Amazon Redshift
Typical node architecture
Optimization tips
Demonstration
Question & Answer (approx. 15 minutes)
Customers Using Amazon Redshift
Service Overview
Amazon Redshift
• Fast, simple, secure, cost-effective data warehousing
• Petabyte-scale data warehousing
• Query your Amazon S3 “data lake”
• No up-front costs
Amazon Redshift Node Architecture
Leader node
• Optimizes
• Compiles
• Coordinates query execution with the compute nodes
Compute nodes
• Parallel execution of queries
• Local, columnar storage
Amazon Redshift Node Architecture
• Each slice has a portion of the overall node’s memory and disk
• The number of slices per node is determined by the node size of the cluster
Load Data Efficiently
Load data using the COPY command
• Leverages all nodes in the cluster to load data in parallel
Compress the files whenever possible
• The COPY command can automatically perform compression analysis
• Or specify compression manually with help from the ANALYZE COMPRESSION command
Split your data into multiple files
• File count should be a multiple of the number of slices in the cluster
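The file-count guideline above can be sketched in a few lines of Python. This is only an illustration: the cluster shape (4 nodes with 2 slices each), the desired file count, and the helper names `target_file_count` and `split_rows` are assumptions made for the example, not values from the talk.

```python
import math


def target_file_count(total_slices: int, desired_files: int) -> int:
    """Round desired_files up to the nearest multiple of total_slices."""
    if total_slices <= 0:
        raise ValueError("total_slices must be positive")
    batches = max(1, math.ceil(desired_files / total_slices))
    return batches * total_slices


def split_rows(rows, file_count):
    """Deal rows round-robin into file_count chunks of near-equal size."""
    chunks = [[] for _ in range(file_count)]
    for i, row in enumerate(rows):
        chunks[i % file_count].append(row)
    return chunks


# Hypothetical cluster: 4 nodes x 2 slices per node = 8 slices.
slices = 4 * 2
files = target_file_count(slices, desired_files=20)

# Each chunk would be written out (and compressed) as one input file for COPY.
chunks = split_rows(list(range(1000)), files)
```

With a file count that is a multiple of the slice count, every slice can be handed the same number of files, so no slice sits idle while others are still loading.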
Amazon Redshift
• Up-front table design is critical
• DISTKEY and SORTKEY influence performance
• PRIMARY KEY, FOREIGN KEY, and UNIQUE constraints help the query planner (they are not enforced)
• Compress for faster speed and lower cost
Sort Keys
WHERE to use sort keys
• Create on columns which are most commonly used in WHERE clauses
COMPOUND sort key
• Good for known query patterns
• Time-series data
INTERLEAVED sort key
• Gives equal weight to each column in the sort key
• Large tables (> billion rows)
Distribution Keys
• With an uneven distribution, your queries complete only as fast as your slowest slice
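A toy model makes the "slowest slice" point concrete. It assumes, purely for illustration, that every slice scans rows at the same fixed rate and that the query finishes when the last slice finishes; the row counts and the scan rate are made-up numbers.

```python
def slice_times(rows_per_slice, rows_per_second):
    """Seconds each slice needs to process its share of the rows."""
    return [rows / rows_per_second for rows in rows_per_slice]


def query_time(rows_per_slice, rows_per_second=1_000_000):
    """Slices work in parallel, so the query takes as long as the slowest one."""
    return max(slice_times(rows_per_slice, rows_per_second))


# Same 10 million rows on 4 slices, distributed two different ways.
even = [2_500_000, 2_500_000, 2_500_000, 2_500_000]
skewed = [7_000_000, 1_000_000, 1_000_000, 1_000_000]

even_seconds = query_time(even)
skewed_seconds = query_time(skewed)
```

In this model the skewed layout takes 7.0 seconds against 2.5 seconds for the even one, even though the total row count is identical.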
Distribution Keys
Skewed distribution
• Consider changing the distribution key to a column that exhibits high cardinality and uniform distribution
• Run TABLE_INSPECTOR.SQL against tables to analyze data skew
• Try EVEN distribution if there is no good distribution key in the record set
• If you create a new table with the same data as the USERS table but set the DISTSTYLE to EVEN, rows are always evenly distributed across slices
• Use DISTSTYLE ALL for smaller tables
• Distributes a full copy of the table data to every node
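The effect of the distribution-key choice on skew can be simulated roughly as follows. This sketch does not reproduce how Amazon Redshift hashes rows: an MD5-based hash stands in for the real distribution function, and the column values, the 8-slice cluster, and the helper names are hypothetical.

```python
import hashlib
from collections import Counter


def slice_for(value, slice_count):
    """Stand-in hash (not Redshift's own): map a key value to a slice number."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % slice_count


def distribute_by_key(values, slice_count):
    """Rows per slice when rows are placed by hashing a key column."""
    counts = Counter(slice_for(v, slice_count) for v in values)
    return [counts.get(s, 0) for s in range(slice_count)]


def distribute_even(row_count, slice_count):
    """Rows per slice under round-robin (EVEN-style) placement."""
    base, extra = divmod(row_count, slice_count)
    return [base + (1 if s < extra else 0) for s in range(slice_count)]


def skew_ratio(rows_per_slice):
    """Largest slice divided by the average slice; 1.0 means no skew."""
    average = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / average


SLICES = 8
# Low-cardinality key: three country codes, heavily weighted toward one value.
country = ["US"] * 8000 + ["CA"] * 1500 + ["MX"] * 500
# High-cardinality key: one distinct id per row.
user_id = list(range(10_000))

low_card = distribute_by_key(country, SLICES)
high_card = distribute_by_key(user_id, SLICES)
even = distribute_even(10_000, SLICES)

for name, layout in [("country", low_card), ("user_id", high_card), ("EVEN", even)]:
    print(f"{name:8s} skew ratio {skew_ratio(layout):.2f}")
```

The low-cardinality key can occupy at most three slices, so one slice ends up holding most of the table, while the high-cardinality key and the round-robin layout stay close to a ratio of 1.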
Table Column Size
Large VARCHAR columns
• Complex queries may need to store data in temporary tables
• Temporary tables are not compressed (consume extra memory and disk)
• Use the smallest possible column size for the use case
Disk-Based Queries
Queries that need to write to disk incur unnecessary I/O, a performance hit compared with queries executed in memory
Identify queries writing to disk
• SELECT DISTINCT query FROM svl_query_summary WHERE is_diskbased = 't';
Queue assignment rules
• Allow additional memory allocation for selected queue
WLM dynamic memory allocation
• Increase allocated memory to specific sessions
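The memory arithmetic behind these two options can be sketched as below. It assumes a queue's memory is split equally across its concurrency slots and that a session may claim several slots at once (in Amazon Redshift this is done with the `wlm_query_slot_count` session setting). The queue size, concurrency, and function names are hypothetical.

```python
def memory_per_slot_mb(queue_memory_mb: float, concurrency: int) -> float:
    """Memory available to one query slot, assuming an equal split per slot."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return queue_memory_mb / concurrency


def session_memory_mb(queue_memory_mb: float, concurrency: int, slot_count: int = 1) -> float:
    """Memory available to a session that claims slot_count slots in the queue."""
    if not 1 <= slot_count <= concurrency:
        raise ValueError("slot_count must be between 1 and the queue concurrency")
    return memory_per_slot_mb(queue_memory_mb, concurrency) * slot_count


# Hypothetical queue: 10,000 MB of memory shared by 5 concurrent slots.
default_memory = session_memory_mb(10_000, concurrency=5)
boosted_memory = session_memory_mb(10_000, concurrency=5, slot_count=3)
```

The trade-off is concurrency: a session holding three of five slots leaves only two for everyone else in that queue until it releases them.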
Commit Usage
Single commit queue
• Commits are expensive
• Cause queries to wait
• Identify unnecessary transactions
• Group dependent statements
• Check commit queue with STL_COMMIT_STATS
• Table only visible to superusers
WLM Queue
• Two queues created by default
• Superuser
• Only use this queue when you need to run queries that affect the system or for troubleshooting purposes
• Default user
• The default queue is initially configured to run five queries concurrently; you can change the concurrency, timeout, and memory allocation properties for the default queue
WLM Queue
• Identify short- and long-running queries and prioritize them
• Define multiple queues to route queries
• Use WLM_APEX_HOURLY.SQL to tune on peak concurrency
• https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/wlm_apex_hourly.sql
WLM Queue Routing
Table Maintenance
Table statistics missing or out of date
• Run ANALYZE weekly for all columns
• Run ANALYZE after loading data into frequently used columns
Vacuum tables
• As often as possible (weekly)
Use AWS AnalyzeVacuumUtility tool
• https://github.com/awslabs/amazon-redshift-utils/tree/master/src/AnalyzeVacuumUtility
Amazon Redshift Optimization Demo
• WLM queue optimization
• Updating stale table statistics
• Amazon Redshift Spectrum
Question & Answer
Mike Kalberer—mjk@amazon.com
Anupam Mishra—anupamm@amazon.com
https://github.com/awslabs/amazon-redshift-utils/
Thank you!
GPSTEC315

OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Optimize Cloud Data Warehousing with Amazon Redshift Tips

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT. GPS: Optimizing Tips: Amazon Redshift for Cloud Data Warehousing. Mike Kalberer, Solutions Architect; Anupam Mishra, Principal Solutions Architect. GPSTEC315, November 28, 2017.
  • 2. Agenda: service overview of Amazon Redshift; typical node architecture; optimization tips; demonstration; question & answer (approx. 15 minutes).
  • 3. Customers Using Amazon Redshift
  • 4. Service Overview
  • 5. Amazon Redshift: fast, simple, secure, cost-effective data warehousing; petabyte-scale data warehousing; query your Amazon S3 “data lake”; no up-front costs.
  • 6. Amazon Redshift Node Architecture. Leader node: optimizes, compiles, and coordinates query execution with the compute nodes. Compute nodes: parallel execution of queries; local, columnar storage.
  • 7. Amazon Redshift Node Architecture. Each compute node is partitioned into slices, and each slice has a portion of the overall node’s memory and disk. The number of slices per node is determined by the node size of the cluster.
  • 8. Load Data Efficiently. Load data using the COPY command: it leverages all nodes in the cluster to load data in parallel. Compress the files whenever possible: the COPY command will automatically perform compression analysis, or you can specify compression manually with help from the ANALYZE COMPRESSION command. Split your data into multiple files: the file count should be a multiple of the number of slices in the cluster.
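The file-count rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the table name, the S3 prefix, and the IAM role ARN are hypothetical placeholders, and the slice count has to come from your own cluster.

```python
def file_count_for_load(slice_count: int, min_files: int = 1) -> int:
    """Return the smallest multiple of slice_count that is >= min_files."""
    if slice_count < 1:
        raise ValueError("slice_count must be at least 1")
    # Ceiling division, so that every slice receives the same number of files.
    multiples = max(1, -(-min_files // slice_count))
    return multiples * slice_count


def build_copy_statement(table: str, s3_prefix: str, iam_role: str) -> str:
    """Build a COPY statement for gzip-compressed, pipe-delimited files.

    All three arguments are hypothetical placeholders for this sketch.
    """
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' GZIP DELIMITER '|';"
    )


if __name__ == "__main__":
    # A cluster with 8 slices and data that would naturally fall into 10 files.
    print(file_count_for_load(slice_count=8, min_files=10))
    print(build_copy_statement(
        "sales",
        "s3://example-bucket/sales/part_",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

Because COPY treats the S3 path as a key prefix, all the split files that share the prefix are loaded by the single statement.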
  • 9. Amazon Redshift. Up-front table design is critical. DISTKEY and SORTKEY influence performance. PRIMARY KEY, FOREIGN KEY, and UNIQUE constraints help the query planner even though they are not enforced. Compress for faster speed and lower cost.
  • 10. Sort Keys. Where to use sort keys: create them on the columns most commonly used in WHERE clauses. COMPOUND sort key: good for known query patterns and time-series data. INTERLEAVED sort key: gives equal weight to each column in the sort key; suited to large tables (more than a billion rows).
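As a rough sketch of how the two sort key styles appear in table DDL, the Python helper below assembles the SORTKEY clause. The helper name, the table, and its columns are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def sort_key_clause(style: str, columns: Sequence[str]) -> str:
    """Return a COMPOUND or INTERLEAVED SORTKEY clause for CREATE TABLE."""
    style = style.upper()
    if style not in ("COMPOUND", "INTERLEAVED"):
        raise ValueError("style must be COMPOUND or INTERLEAVED")
    if not columns:
        raise ValueError("at least one sort key column is required")
    return f"{style} SORTKEY ({', '.join(columns)})"


if __name__ == "__main__":
    # Hypothetical time-series table: filter mostly by date, then by user.
    clause = sort_key_clause("compound", ["event_date", "user_id"])
    print(
        "CREATE TABLE events (event_date DATE, user_id BIGINT) "
        f"DISTKEY (user_id) {clause};"
    )
```

With a compound key the column order matters, so the column that appears most often in WHERE clauses goes first.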
  • 11. Distribution Keys. With an uneven distribution, your queries complete only as fast as your slowest slice.
  • 12. Distribution Keys. Skewed distribution: consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. Run TABLE_INSPECTOR.SQL against tables to analyze data skew. Try EVEN distribution if there is no good distribution key in the record set; for example, if you create a new table with the same data as the USERS table but set the DISTSTYLE to EVEN, rows are always evenly distributed across slices. Use DISTSTYLE ALL for smaller tables; it distributes the table data to every node.
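One simple way to put a number on skew is the ratio of the fullest slice to the average slice. The sketch below assumes you have already collected per-slice row counts for a table; the function name and the sample counts are hypothetical, and this ratio is an illustrative metric rather than the one TABLE_INSPECTOR.SQL reports.

```python
from typing import Sequence


def skew_ratio(rows_per_slice: Sequence[int]) -> float:
    """Ratio of the largest slice to the mean slice; 1.0 means perfectly even."""
    if not rows_per_slice:
        raise ValueError("need at least one slice")
    total = sum(rows_per_slice)
    if total == 0:
        return 1.0  # an empty table has no skew
    mean = total / len(rows_per_slice)
    return max(rows_per_slice) / mean


if __name__ == "__main__":
    print(skew_ratio([100, 100, 100, 100]))  # evenly distributed
    print(skew_ratio([300, 100, 100, 100]))  # one slice does double the average work
```

Since a query finishes only when its slowest slice finishes, a ratio of 2.0 suggests the busiest slice scans about twice as many rows as it would under an even distribution.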
  • 13. Table Column Size. Large VARCHAR columns: complex queries may need to store data in temporary tables, and temporary tables are not compressed, so they consume extra memory and disk. Use the smallest possible column size for the use case.
  • 14. Disk-Based Queries. Queries that need to write to disk incur unnecessary I/O, a performance hit when compared to queries executed in memory. Identify queries writing to disk: SELECT DISTINCT query FROM svl_query_summary WHERE is_diskbased = 't'; Queue assignment rules: allow additional memory allocation for the selected queue. WLM dynamic memory allocation: increase the memory allocated to specific sessions.
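The memory arithmetic behind these tips can be sketched as follows. The sketch assumes that a WLM queue divides its memory equally among its concurrency slots and that a session claims extra slots with the wlm_query_slot_count setting, which is one way (assumed here) to give a specific session more memory. The function names and the sample numbers are hypothetical.

```python
def memory_per_query_mb(queue_memory_mb: float, concurrency: int,
                        slot_count: int = 1) -> float:
    """Memory available to one query that occupies slot_count slots of a queue."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if not 1 <= slot_count <= concurrency:
        raise ValueError("slot_count must be between 1 and concurrency")
    return queue_memory_mb / concurrency * slot_count


def slot_count_statement(slot_count: int) -> str:
    """Session-level statement that claims several slots for later queries."""
    return f"SET wlm_query_slot_count TO {slot_count};"


if __name__ == "__main__":
    # Hypothetical queue: 10,000 MB shared by 5 slots.
    print(memory_per_query_mb(10_000, 5))      # one slot
    print(memory_per_query_mb(10_000, 5, 3))   # three slots for a heavy query
    print(slot_count_statement(3))
```

A session that claims three of five slots leaves only two for everyone else in the queue, so the setting is best reset once the heavy statement has finished.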
  • 15. Commit Usage. Single commit queue: commits are expensive and cause queries to wait. Identify unnecessary transactions. Group dependent statements. Check the commit queue with STL_COMMIT_STATS.
  • 16. Commit Usage. Check the commit queue with STL_COMMIT_STATS. The table is visible only to superusers.
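Grouping dependent statements means wrapping them in one explicit transaction so that they share a single commit. The Python sketch below illustrates the idea; the helper name and the sample statements are hypothetical, and the commit-queue query is an assumed example of reading STL_COMMIT_STATS rather than a query taken from the slides.

```python
from typing import Iterable

# Assumed example: commits that waited longest in the commit queue.
COMMIT_QUEUE_SQL = (
    "SELECT xid, node, queuelen, "
    "DATEDIFF(ms, startqueue, startwork) AS queue_wait_ms "
    "FROM stl_commit_stats ORDER BY queue_wait_ms DESC LIMIT 20;"
)


def as_single_transaction(statements: Iterable[str]) -> str:
    """Wrap dependent statements in one BEGIN/COMMIT so they commit once."""
    cleaned = [s.strip().rstrip(";") for s in statements if s.strip()]
    if not cleaned:
        raise ValueError("no statements to run")
    body = "\n".join(f"{s};" for s in cleaned)
    return f"BEGIN;\n{body}\nCOMMIT;"


if __name__ == "__main__":
    print(as_single_transaction([
        "DELETE FROM sales WHERE sale_date = '2017-11-28'",
        "INSERT INTO sales SELECT * FROM sales_staging;",
    ]))
```

Run separately, the two sample statements would each enter the commit queue; wrapped together, they enter it once.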
  • 17. WLM Queue. Two queues are created by default. Superuser: only use this queue when you need to run queries that affect the system or for troubleshooting purposes. Default user: the default queue is initially configured to run five queries concurrently; you can change the concurrency, timeout, and memory-allocation properties for the default queue.
  • 18. WLM Queue. Identify short- and long-running queries and prioritize them. Define multiple queues to route queries. Use WLM_APEX_HOURLY.SQL to tune for peak concurrency: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/wlm_apex_hourly.sql
  • 19. WLM Queue Routing
  • 20. Table Maintenance. Table statistics missing or out of date: run ANALYZE weekly for all columns; run ANALYZE when loading popular columns. Vacuum tables as often as possible (weekly). Use the AWS AnalyzeVacuumUtility tool: https://github.com/awslabs/amazon-redshift-utils/tree/master/src/AnalyzeVacuumUtility
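A maintenance pass along these lines could be sketched as below. The sketch assumes that per-table health figures have already been read from the SVV_TABLE_INFO system view (its stats_off and unsorted columns); the class name, the function name, the 10 percent thresholds, and the sample tables are hypothetical, and this is not how the AnalyzeVacuumUtility itself is implemented.

```python
from typing import List, NamedTuple, Optional


class TableHealth(NamedTuple):
    name: str
    stats_off: Optional[float]  # staleness of statistics; 0 means current
    unsorted: Optional[float]   # percentage of unsorted rows


def maintenance_statements(tables: List[TableHealth],
                           stats_threshold: float = 10.0,
                           unsorted_threshold: float = 10.0) -> List[str]:
    """Emit VACUUM and ANALYZE only for tables that exceed the thresholds."""
    statements = []
    for t in tables:
        if t.unsorted is not None and t.unsorted > unsorted_threshold:
            statements.append(f"VACUUM {t.name};")
        if t.stats_off is not None and t.stats_off > stats_threshold:
            statements.append(f"ANALYZE {t.name};")
    return statements


if __name__ == "__main__":
    sample = [
        TableHealth("sales", stats_off=25.0, unsorted=40.0),
        TableHealth("users", stats_off=0.0, unsorted=None),
    ]
    for statement in maintenance_statements(sample):
        print(statement)
```

Filtering by threshold keeps a weekly job from vacuuming or analyzing tables that have not changed since the previous run.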
  • 21. Amazon Redshift Optimization Demo: WLM queue optimization; updating stale table statistics; Amazon Redshift Spectrum.
  • 22. Question & Answer. Mike Kalberer (mjk@amazon.com), Anupam Mishra (anupamm@amazon.com). https://github.com/awslabs/amazon-redshift-utils/
  • 23. Thank you! GPSTEC315