© 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
Amazon Redshift Best Practices
Part 2
May 2013
Eric Ferreira & John Loughlin
Agenda
Introduction & Recap
Best Practices for
• Workload Migration
• Copy Command Options
• Vacuum
• Space Management
Q&A
AWS Database Services
Scalable High Performance Application Storage in the Cloud
• Amazon DynamoDB: Fast, Predictable, Highly-Scalable NoSQL Data Store
• Amazon RDS: Managed Relational Database Service for MySQL, Oracle and SQL Server
• Amazon ElastiCache: In-Memory Caching Service
• Amazon Redshift: Fast, Powerful, Fully Managed, Petabyte-Scale Data Warehouse Service
[Diagram labels: AWS Global Infrastructure; Compute; Storage; Database; Networking; Application Services; Deployment & Administration]
Amazon Redshift architecture
Leader Node
• SQL endpoint
• Postgres based
• Stores metadata
• Communicates with client
• Compiles queries
• Coordinates query execution
Compute Nodes
• Local, columnar storage
• Execute queries in parallel - slices
• Load, backup, restore via Amazon S3
Everything is mirrored
[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]
In Part 1…
This is Part 2 of the Redshift Best Practices series.
Visit http://aws.amazon.com/resources/databaseservices/webinars/ to watch Part 1.
Workload Migration
ELT/ETL Process
• Load Atomic Data (target table or staging area)
• Transform data (include cleanup and aggregation)
• Prepare target tables for query/reports
• Includes Statistics gathering and vacuum
• Includes data retention policy
Re-evaluate to take advantage of cloud characteristics.
Workload Migration cont.
Make provision for testing multiple options before you
migrate the production workflow
• Different number of nodes
• Few large nodes versus many small nodes (16xXL versus 2x8XL)
• WLM Settings
• Concurrency versus response time
• Different Sort and Distribution Keys
• Test both queries and load/vacuum times
• Compression
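A minimal sketch of how candidate keys and encodings might be compared before migrating. The TPC-H style `lineitem` table and its column names are assumptions used for illustration only:

```sql
-- Build a test copy with a different distribution key and sort key,
-- then time the same queries, loads, and vacuums against both tables.
create table lineitem_test
distkey (l_orderkey)
sortkey (l_shipdate)
as
select * from lineitem;

-- Sample the table and report a suggested encoding for each column.
analyze compression lineitem;
```

Repeating the same test on clusters of different sizes gives a like-for-like comparison of node counts and WLM settings.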
Workload Best Practices
Organizing and keeping your load files in S3 allows for re-run or scenario testing
as you evolve your workflow in the platform.
• Keep in S3 or Glacier for fiscal/legal reasons
Data that is updated only for a short term
• Consider having a short-term version of the table for staging and a long-term version once
data gets stable.
Round-robin distribution (DISTSTYLE EVEN)
• When you don’t have a good Distribution Key
• Check Part 1 for query on checking for distribution skew
• Trade off with collocated joins
Loading the target (final) table
• Use a chronological date/timestamp column as the first sortkey column. Vacuum is then needed
less often and runs faster
• When the first sort column has low cardinality/resolution (e.g., date instead of timestamp),
subsequent columns should match common filters and/or grouping columns
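The staging and target table advice above could be sketched as follows. The table and column names are hypothetical, and the choice of `customer_id` as a distribution key is only an assumption about which column the fact table is joined on:

```sql
-- Hypothetical short-term staging table: no good distribution key,
-- so rows are spread round-robin across slices.
create table events_staging (
    event_time  timestamp not null,
    customer_id integer not null,
    event_type  varchar(32),
    amount      decimal(12,2)
)
diststyle even;

-- Hypothetical long-term table: chronological column first in the sort key,
-- followed by a column that queries commonly filter or group on.
create table events (
    event_time  timestamp not null,
    customer_id integer not null,
    event_type  varchar(32),
    amount      decimal(12,2)
)
distkey (customer_id)
sortkey (event_time, event_type);

-- Move rows to the long-term table once they are stable.
insert into events
select * from events_staging;
```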
Workload Best Practices cont.
Use the UNLOAD command to archive data that is no longer needed for
business purposes
• Data that needs to exist only for fiscal/legal reasons can be re-loaded as
needed.
Consider applying retention policies less often than the regular
workflow
• Weekly/Monthly process during a less busy time
• Make space provision for the data growth
• Make sure all queries have date/timestamp range filters (> and <)
• Keep a sliding window of data to minimize block re-write during vacuum
Take manual snapshots to save status at specific mileposts (year-end).
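A sketch of a periodic retention step, assuming a hypothetical `events` table sorted on `event_time`. The bucket path, cutoff dates, and credentials string are placeholders:

```sql
-- Archive rows older than the retention window to S3 as gzipped files.
unload ('select * from events where event_time < ''2012-01-01''')
to 's3://my-archive-bucket/events/2011/part_'
credentials 'aws_access_credentials'
gzip;

-- Remove the archived rows, then reclaim the space.
delete from events where event_time < '2012-01-01';
vacuum events;

-- Regular queries keep an explicit range filter on the sort key column.
select event_type, count(*)
from events
where event_time >= '2013-04-01' and event_time < '2013-05-01'
group by event_type;
```

The archived files can later be reloaded with COPY if the data is needed again.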
Workload Best Practices cont.
Ratio between Load/Query performance needs
• Low ratio: Consider Load -> Snapshot -> Spin up “Query” clusters -> Tear down
• High ratio: Consider performance above space needs when
choosing the number of nodes
Normalization Rule of Thumb
• De-normalize only to avoid non-collocated joins
• Slowly Changing Dimensions (type II): Keep normalized, match
distkey with fact table
COPY Command
COPY table_name [ (column1 [,column2, ...]) ]
FROM 's3://objectpath' [ WITH ] CREDENTIALS [AS] 'aws_access_credentials'
[ option [ ... ] ]
Options worth mentioning:
GZIP
• Using compressed files saves network bandwidth and can speed up loads.
MAXERROR and NOLOAD
• The default MAXERROR is 0. Set it to a larger value while troubleshooting a new data
stream
• Combine it with the NOLOAD option to speed up file validation
STATUPDATE
• When loading a significant amount of data into a non-empty table, STATUPDATE ON updates
statistics at the end of the load.
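The options above might be combined as in this sketch. The bucket path and credentials string are placeholders, and the `lineitem` table with pipe-delimited files is an assumption:

```sql
-- Validation pass: parse the gzipped files without loading any rows,
-- tolerating up to 100 bad rows so several problems surface in one run.
copy lineitem
from 's3://mybucket/lineitem.tbl.'
credentials 'aws_access_credentials'
gzip
delimiter '|'
maxerror 100
noload;

-- Real load into a non-empty table, refreshing statistics when it finishes.
copy lineitem
from 's3://mybucket/lineitem.tbl.'
credentials 'aws_access_credentials'
gzip
delimiter '|'
statupdate on;
```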
COPY Command Common Issues
UTF-8
• Currently Redshift can only load well-formed UTF-8 characters of up to 3 bytes.
NULL AS and ESCAPE
• Common issues loading files can be circumvented with these options
• Narrow down to small set of rows and visually find what type of problem you
have
• Note that the error message might refer to a later portion of the row. For example,
“Delimiter not found” might be caused by an EOL that was not escaped.
DATEFORMAT and TIMEFORMAT
• Currently all date/timestamp columns have to use the same formatting
defined by the option
• With ACCEPTANYDATE, values that do not match the format are loaded as NULL
instead of generating errors
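A sketch of a COPY that uses these options together. The `web_events` table, the bucket path, the `'NULL'` marker string, and the format strings are assumptions about the input files:

```sql
copy web_events
from 's3://mybucket/web_events/part_'
credentials 'aws_access_credentials'
delimiter '|'
-- Treat the literal text NULL in the file as a SQL NULL.
null as 'NULL'
-- Honor backslash-escaped delimiters and line breaks inside field values.
escape
-- One format applies to every date column, and one to every timestamp column.
dateformat 'YYYY-MM-DD'
timeformat 'YYYY-MM-DD HH:MI:SS'
-- Load NULL instead of failing when a value does not match the format.
acceptanydate;
```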
COPY Command Troubleshooting
STL_LOAD_ERRORS / STL_LOADERROR_DETAIL
• Find errors during specific loads
• You can create a view to simplify troubleshooting process
create view loadview as
(select distinct tbl, trim(name) as table_name, query, starttime,
        trim(filename) as input, line_number, colname, err_code,
        trim(err_reason) as reason
 from stl_load_errors sl, stv_tbl_perm sp
 where sl.tbl = sp.id);
• Then run “select * from loadview where table_name = '<table>'” if you have any issues.
STL_LOAD_COMMITS / STL_FILE_SCAN / STL_S3CLIENT
• Load times for specific files. Confirms a given file was read
STL_S3CLIENT_ERROR
• Information about specific S3 or file transfer errors that happen during load
process
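A sketch of how these tables might be queried after a load. It assumes the `loadview` view above has been created, that the table is named `lineitem`, and that the checks run in the same session as the COPY so that `pg_last_copy_id()` returns its query ID:

```sql
-- Most recent load errors for one table.
select table_name, input, line_number, colname, reason
from loadview
where table_name = 'lineitem'
order by starttime desc
limit 10;

-- Confirm which files the last COPY in this session committed.
select query, trim(filename) as file, curtime as updated
from stl_load_commits
where query = pg_last_copy_id();

-- Look for S3 transfer problems during the same COPY.
select query, sliceid,
       substring(key from 1 for 40) as file,
       substring(error from 1 for 60) as error
from stl_s3client_error
where query = pg_last_copy_id();
```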
COPY Command – Historical Information
Look back to confirm the number of files and bytes loaded by
each COPY statement
select substring(q.querytxt, 1, 40) as querytxt,
       s.n_files,
       s.size_mb,
       s.time_seconds,
       s.size_mb / decode(s.time_seconds, 0, 1, s.time_seconds) as mb_per_s
from (select query,
             count(*) as n_files,
             sum(transfer_size / (1024 * 1024)) as size_mb,
             (max(end_time) - min(start_time)) / (1000000) as time_seconds,
             max(end_time) as end_time
      from stl_s3client
      where query > 0 and transfer_time > 0
      group by query) as s
left join stl_query as q on q.query = s.query
order by mb_per_s desc
limit 10;
COPY Command – Historical Information cont.
querytxt | n_files | size_mb | time_seconds | mb_per_s
--------------------------------------------------------------+---------+---------+--------------+----------
copy lineitem from 's3://tpc-h/100/lineitem.tbl.' credential | 603 | 22201 | 2390 | 9
copy lineitem from 's3://tpc-h/1/lineitem.tbl.' credentials | 34 | 192 | 21 | 8
copy customer from 's3://tpc-h/100/customer.tbl.' credential | 152 | 750 | 85 | 8
copy partsupp from 's3://tpc-h/100/partsupp.tbl.' credential | 82 | 2720 | 367 | 7
COPY ANALYZE part | 22 | 40 | 7 | 5
copy orders from 's3://tpc-h/100/orders.tbl.' credentials '' | 152 | 4800 | 1035 | 4
copy orders from 's3://tpc-h/1/orders.tbl.' credentials '' g | 34 | 32 | 7 | 4
copy part from 's3://tpc-h/100/part.tbl.' credentials '' gzi | 202 | 400 | 95 | 4
COPY ANALYZE supplier | 34 | 0 | 3 | 0
copy supplier from 's3://tpc-h/100/supplier.tbl.' credential | 102 | 0 | 10 | 0
(10 rows)
Vacuum
Before Vacuum
• Data inserted goes to a “non-sorted” area at the end of the table
• As this area grows, query times grow
• Data deleted is “marked” in a special column
• As that column grows, query times grow
What vacuum does
• Non-sorted area gets sorted and integrated into the table
• Deleted rows are removed and blocks reorganized
Vacuum cont.
• Vacuum takes advantage of sortkey and skips
blocks that don’t need to be modified.
• Vacuum is a maintenance type operation
• Only one vacuum can be running at a time
(cluster-wide)
• More Memory = Faster Vacuum
– set wlm_query_slot_count to 4;
• Keep track of Vacuum progress (ETA)
– SVV_VACUUM_PROGRESS
• Review vacuum details afterwards to decide whether to adjust the
frequency
– SVV_VACUUM_SUMMARY
[Diagram labels: sorted blocks for March/2013, April/2013, May/2013, June/2013; Unsorted region]
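A sketch of a vacuum run that follows the points above. The `lineitem` table name is an assumption, and the slot count of 4 is the value from the slide rather than a general recommendation:

```sql
-- Give this session more WLM slots (and therefore memory) for the vacuum.
set wlm_query_slot_count to 4;
vacuum lineitem;
-- Return to the default of one slot for the rest of the session.
set wlm_query_slot_count to 1;

-- From a second session while the vacuum runs:
-- table being vacuumed, current status, and estimated time remaining.
select * from svv_vacuum_progress;

-- Afterwards: elapsed time plus row and block deltas for each vacuum of the table.
select *
from svv_vacuum_summary
where trim(table_name) = 'lineitem'
order by xid desc;
```

Comparing the elapsed time across runs is one way to judge whether the vacuum schedule is too sparse or too frequent.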
Space Management
Redshift has a single pool of space used for tables and
temporary segments.
• Loads need 2.5 times the space of the data being loaded if table
has a sortkey
• Vacuum may need 2.5 times the size of the table.
Monitor the free space
• Performance Tab in the console
• CloudWatch Alarms
• SQL
Space Management cont.
Table Sizes
select trim(pgdb.datname) as Database,
       trim(pgn.nspname) as Schema,
       trim(a.name) as Table,
       b.mbytes,
       a.rows
from (select db_id, id, name, sum(rows) as rows
      from stv_tbl_perm a
      group by db_id, id, name) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, count(*) as mbytes
      from stv_blocklist
      group by tbl) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
Free Space
select sum(capacity)/1024 as capacity_gbytes,
sum(used)/1024 as used_gbytes,
(sum(capacity) - sum(used))/1024
as free_gbytes
from stv_partitions
where part_begin=0;
• Redshift allows you to resize your cluster up and down and across node
types. The cluster stays online with read-only access during the resize.
Summary
• Experiment to optimize your workflows
• Various STL/STV tables hold most information needed for
troubleshooting
• Space Management and Vacuum schedule should be
considered during implementation phase
More information
COPY Command
http://docs.aws.amazon.com/redshift/latest/dg/t_Loading_tables_with_the_COPY_command.html
Loads Troubleshooting
http://docs.aws.amazon.com/redshift/latest/dg/t_Troubleshooting_load_errors.html
Vacuum
http://docs.aws.amazon.com/redshift/latest/dg/t_Reclaiming_storage_space202.html
UNLOADING data
http://docs.aws.amazon.com/redshift/latest/dg/c_unloading_data.html
Q&A

More Related Content

More from Amazon Web Services

Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWS
Amazon Web Services
 
AWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei serverAWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei server
Amazon Web Services
 
Crea dashboard interattive con Amazon QuickSight
Crea dashboard interattive con Amazon QuickSightCrea dashboard interattive con Amazon QuickSight
Crea dashboard interattive con Amazon QuickSight
Amazon Web Services
 
Costruisci modelli di Machine Learning con Amazon SageMaker Autopilot
Costruisci modelli di Machine Learning con Amazon SageMaker AutopilotCostruisci modelli di Machine Learning con Amazon SageMaker Autopilot
Costruisci modelli di Machine Learning con Amazon SageMaker Autopilot
Amazon Web Services
 
Migra le tue file shares in cloud con FSx for Windows
Migra le tue file shares in cloud con FSx for Windows Migra le tue file shares in cloud con FSx for Windows
Migra le tue file shares in cloud con FSx for Windows
Amazon Web Services
 
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
Amazon Web Services
 

More from Amazon Web Services (20)

Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWS
 
AWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei serverAWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei server
 
Crea dashboard interattive con Amazon QuickSight
Crea dashboard interattive con Amazon QuickSightCrea dashboard interattive con Amazon QuickSight
Crea dashboard interattive con Amazon QuickSight
 
Costruisci modelli di Machine Learning con Amazon SageMaker Autopilot
Costruisci modelli di Machine Learning con Amazon SageMaker AutopilotCostruisci modelli di Machine Learning con Amazon SageMaker Autopilot
Costruisci modelli di Machine Learning con Amazon SageMaker Autopilot
 
Migra le tue file shares in cloud con FSx for Windows
Migra le tue file shares in cloud con FSx for Windows Migra le tue file shares in cloud con FSx for Windows
Migra le tue file shares in cloud con FSx for Windows
 
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
 

Recently uploaded

4 Benefits of Partnering with an OnlyFans Agency for Content Creators.pdf
4 Benefits of Partnering with an OnlyFans Agency for Content Creators.pdf4 Benefits of Partnering with an OnlyFans Agency for Content Creators.pdf
4 Benefits of Partnering with an OnlyFans Agency for Content Creators.pdf
onlyfansmanagedau
 
Innovation Management Frameworks: Your Guide to Creativity & Innovation
Innovation Management Frameworks: Your Guide to Creativity & InnovationInnovation Management Frameworks: Your Guide to Creativity & Innovation
Innovation Management Frameworks: Your Guide to Creativity & Innovation
Operational Excellence Consulting
 
Digital Transformation Frameworks: Driving Digital Excellence
Digital Transformation Frameworks: Driving Digital ExcellenceDigital Transformation Frameworks: Driving Digital Excellence
Digital Transformation Frameworks: Driving Digital Excellence
Operational Excellence Consulting
 
一比一原版新西兰奥塔哥大学毕业证(otago毕业证)如何办理
一比一原版新西兰奥塔哥大学毕业证(otago毕业证)如何办理一比一原版新西兰奥塔哥大学毕业证(otago毕业证)如何办理
一比一原版新西兰奥塔哥大学毕业证(otago毕业证)如何办理
taqyea
 
Innovative Uses of Revit in Urban Planning and Design
Innovative Uses of Revit in Urban Planning and DesignInnovative Uses of Revit in Urban Planning and Design
Innovative Uses of Revit in Urban Planning and Design
Chandresh Chudasama
 
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
➒➌➎➏➑➐➋➑➐➐Dpboss Matka Guessing Satta Matka Kalyan Chart Indian Matka
 
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdfGarments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Pridesys IT Ltd.
 
Lundin Gold Corporate Presentation - June 2024
Lundin Gold Corporate Presentation - June 2024Lundin Gold Corporate Presentation - June 2024
Lundin Gold Corporate Presentation - June 2024
Adnet Communications
 
Chapter 7 Final business management sciences .ppt
Chapter 7 Final business management sciences .pptChapter 7 Final business management sciences .ppt
Chapter 7 Final business management sciences .ppt
ssuser567e2d
 
Call8328958814 satta matka Kalyan result satta guessing
Call8328958814 satta matka Kalyan result satta guessingCall8328958814 satta matka Kalyan result satta guessing
Call8328958814 satta matka Kalyan result satta guessing
➑➌➋➑➒➎➑➑➊➍
 
list of states and organizations .pdf
list of  states  and  organizations .pdflist of  states  and  organizations .pdf
list of states and organizations .pdf
Rbc Rbcua
 
The latest Heat Pump Manual from Newentide
The latest Heat Pump Manual from NewentideThe latest Heat Pump Manual from Newentide
The latest Heat Pump Manual from Newentide
JoeYangGreatMachiner
 
Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...
Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...
Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...
Herman Kienhuis
 
GKohler - Retail Scavenger Hunt Presentation
GKohler - Retail Scavenger Hunt PresentationGKohler - Retail Scavenger Hunt Presentation
GKohler - Retail Scavenger Hunt Presentation
GraceKohler1
 
Industrial Tech SW: Category Renewal and Creation
Industrial Tech SW:  Category Renewal and CreationIndustrial Tech SW:  Category Renewal and Creation
Industrial Tech SW: Category Renewal and Creation
Christian Dahlen
 
一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理
一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理
一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理
taqyea
 
How HR Search Helps in Company Success.pdf
How HR Search Helps in Company Success.pdfHow HR Search Helps in Company Success.pdf
How HR Search Helps in Company Success.pdf
HumanResourceDimensi1
 
TIMES BPO: Business Plan For Startup Industry
TIMES BPO: Business Plan For Startup IndustryTIMES BPO: Business Plan For Startup Industry
TIMES BPO: Business Plan For Startup Industry
timesbpobusiness
 
Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...
Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...
Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...
my Pandit
 
DearbornMusic-KatherineJasperFullSailUni
DearbornMusic-KatherineJasperFullSailUniDearbornMusic-KatherineJasperFullSailUni
DearbornMusic-KatherineJasperFullSailUni
katiejasper96
 

Recently uploaded (20)

4 Benefits of Partnering with an OnlyFans Agency for Content Creators.pdf
4 Benefits of Partnering with an OnlyFans Agency for Content Creators.pdf4 Benefits of Partnering with an OnlyFans Agency for Content Creators.pdf
4 Benefits of Partnering with an OnlyFans Agency for Content Creators.pdf
 
Innovation Management Frameworks: Your Guide to Creativity & Innovation
Innovation Management Frameworks: Your Guide to Creativity & InnovationInnovation Management Frameworks: Your Guide to Creativity & Innovation
Innovation Management Frameworks: Your Guide to Creativity & Innovation
 
Digital Transformation Frameworks: Driving Digital Excellence
Digital Transformation Frameworks: Driving Digital ExcellenceDigital Transformation Frameworks: Driving Digital Excellence
Digital Transformation Frameworks: Driving Digital Excellence
 
一比一原版新西兰奥塔哥大学毕业证(otago毕业证)如何办理
一比一原版新西兰奥塔哥大学毕业证(otago毕业证)如何办理一比一原版新西兰奥塔哥大学毕业证(otago毕业证)如何办理
一比一原版新西兰奥塔哥大学毕业证(otago毕业证)如何办理
 
Innovative Uses of Revit in Urban Planning and Design
Innovative Uses of Revit in Urban Planning and DesignInnovative Uses of Revit in Urban Planning and Design
Innovative Uses of Revit in Urban Planning and Design
 
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
 
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdfGarments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
 
Lundin Gold Corporate Presentation - June 2024
Lundin Gold Corporate Presentation - June 2024Lundin Gold Corporate Presentation - June 2024
Lundin Gold Corporate Presentation - June 2024
 
Chapter 7 Final business management sciences .ppt
Chapter 7 Final business management sciences .pptChapter 7 Final business management sciences .ppt
Chapter 7 Final business management sciences .ppt
 

AWS Webcast - Amazon Redshift Best Practices Part 2 – Performance

  • 1. Amazon Redshift Best Practices, Part 2
    May 2013
    Eric Ferreira & John Loughlin
  • 2. Agenda
    Introduction & Recap
    Best Practices for
    • Workload Migration
    • Copy Command Options
    • Vacuum
    • Space Management
    Q&A
  • 3. AWS Database Services: Scalable High Performance Application Storage in the Cloud
    • Amazon DynamoDB: Fast, Predictable, Highly-Scalable NoSQL Data Store
    • Amazon RDS: Managed Relational Database Service for MySQL, Oracle and SQL Server
    • Amazon ElastiCache: In-Memory Caching Service
    • Amazon Redshift: Fast, Powerful, Fully Managed, Petabyte-Scale Data Warehouse Service
    Diagram layers: Compute, Storage, AWS Global Infrastructure, Database, Application Services, Deployment & Administration, Networking
  • 4. Amazon Redshift architecture
    Leader Node
    • SQL endpoint
    • Postgres based
    • Stores metadata
    • Communicates with client
    • Compiles queries
    • Coordinates query execution
    Compute Nodes
    • Local, columnar storage
    • Execute queries in parallel - slices
    • Load, backup, restore via Amazon S3
    Everything is mirrored
    Diagram labels: 10 GigE (HPC), Ingestion, Backup, Restore, JDBC/ODBC
  • 5. In Part 1…
    This is Part 2 of the Redshift Best Practices series.
    To watch Part 1, visit: http://aws.amazon.com/resources/databaseservices/webinars/
  • 6. Workload Migration
    ELT/ETL Process
    • Load atomic data (target table or staging area)
    • Transform data (including cleanup and aggregation)
    • Prepare target tables for queries/reports
    • Includes statistics gathering and vacuum
    • Includes data retention policy
    Re-evaluate the process to take advantage of cloud characteristics.
  • 7. Workload Migration cont.
    Make provision for testing multiple options before you migrate the production workflow
    • Different number of nodes
    • Few large nodes versus many small nodes (16xXL versus 2x8XL)
    • WLM settings
    • Concurrency versus response time
    • Different sort and distribution keys
    • Test both queries and load/vacuum times
    • Compression
  • 8. Workload Best Practices
    Organizing and keeping your load files in S3 allows for re-runs or scenario testing as you evolve your workflow on the platform.
    • Keep in S3 or Glacier for fiscal/legal reasons
    Data updated in the short term
    • Consider having a short-term version of the table for staging and a long-term version once the data gets stable.
    Round Robin distribution key
    • Use when you don’t have a good distribution key
    • Check Part 1 for a query that checks for distribution skew
    • Trade-off with collocated joins
    Loading the target (final) table
    • Use a chronological date/timestamp column as the first sortkey. Vacuum is needed less often and runs faster
    • When the first sort column has low cardinality/resolution (e.g., date instead of timestamp), subsequent columns should match common filters and/or grouping columns
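As a rough illustration of the distribution and sort key advice in slide 8, the sketch below defines a hypothetical `page_views` table. The table and column names are assumptions made for the example and do not come from the talk. `DISTSTYLE EVEN` is Redshift's round-robin distribution, and a chronological timestamp leads the sort key.

```sql
-- Hypothetical table; all names here are illustrative only.
create table page_views (
    view_time   timestamp     not null,
    customer_id integer       not null,
    url         varchar(2048)
)
diststyle even                      -- round robin: no good distribution key available
sortkey (view_time, customer_id);   -- chronological column first, then a common filter column
```

With `view_time` leading the sort key, newly loaded rows mostly land after the existing sorted data, which is why vacuum has less work to do.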
  • 9. Workload Best Practices cont.
    Use the UNLOAD command to archive data that is not needed for business reasons
    • Data that needs to exist only for fiscal/legal reasons can be re-loaded as needed.
    Consider applying retention policies less often than the regular workflow
    • Weekly/monthly process during a less busy time
    • Make space provision for the data growth
    • Make sure all queries have date/timestamp range filters (> and <)
    • Keep a sliding window of data to minimize block re-writes during vacuum
    Take manual snapshots to save status at specific mileposts (year-end).
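The archive-then-trim pattern from slide 9 might look roughly like the sketch below. The `orders` table, the `order_date` column, the bucket name and the cutoff date are all hypothetical, and the credential string holds placeholders that must be replaced.

```sql
-- Hypothetical names: table "orders", column "order_date", bucket "my-archive-bucket".
-- Archive rows older than the retention cutoff to S3 as gzip-compressed files.
unload ('select * from orders where order_date < ''2012-01-01''')
to 's3://my-archive-bucket/orders/2011/orders_'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
gzip;

-- After verifying the archive, remove the same rows so the table keeps a sliding window.
delete from orders where order_date < '2012-01-01';
```

The deleted rows are only marked until the next vacuum, so this kind of retention step pairs naturally with the weekly or monthly maintenance window the slide suggests.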
  • 10. Workload Best Practices cont.
    Ratio between Load/Query performance needs
    • Low ratio: Consider Load -> Snapshot -> Spin up “Query” clusters -> Tear down
    • High ratio: Consider performance above space needs when choosing the number of nodes
    Normalization Rule of Thumb
    • De-normalize only to avoid non-collocated joins
    • Slowly Changing Dimensions (type II): Keep normalized, match distkey with fact table
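One way to read the "match distkey with fact table" advice is sketched below with a hypothetical type II customer dimension and sales fact table. Every table and column name is an assumption for illustration. Both tables are distributed on `customer_id`, so the join can be resolved within each slice.

```sql
-- Hypothetical type II dimension: one row per customer version.
create table customer_dim (
    customer_sk   bigint       not null,
    customer_id   integer      not null,
    customer_name varchar(100),
    valid_from    date         not null,
    valid_to      date                     -- null marks the current version
)
distkey (customer_id)
sortkey (customer_id, valid_from);

-- Hypothetical fact table, distributed on the same key as the dimension.
create table sales_fact (
    sale_time   timestamp     not null,
    customer_id integer       not null,
    amount      decimal(12,2)
)
distkey (customer_id)
sortkey (sale_time);

-- The join includes the shared distkey, so matching rows are collocated.
select d.customer_name, sum(f.amount) as total_amount
from sales_fact f
join customer_dim d
  on d.customer_id = f.customer_id
 and f.sale_time >= d.valid_from
 and (d.valid_to is null or f.sale_time < d.valid_to)
group by d.customer_name;
```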
  • 11. COPY Command
        COPY table_name [ (column1 [,column2, ...]) ]
        FROM 's3://objectpath'
        [ WITH ] CREDENTIALS [AS] 'aws_access_credentials'
        [ option [ ... ] ]
    Options worth mentioning:
    GZIP
    • Using compressed files saves network bandwidth and can speed up loads.
    MAXERROR and NOLOAD
    • Default MAXERROR is 0. Set it to a larger value while troubleshooting a new data stream
    • Use with the NOLOAD option to speed up file validation
    STATUPDATE
    • When loading a significant amount of data into a non-empty table, STATUPDATE can update stats at the end of the load.
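A minimal sketch of how the options in slide 11 might be combined, assuming a pipe-delimited, gzip-compressed `lineitem` feed. The bucket name is hypothetical and the credential string holds placeholders.

```sql
-- Validation pass: parse the files without loading any rows,
-- tolerating up to 100 bad rows while the new data stream is being debugged.
copy lineitem
from 's3://my-load-bucket/lineitem/lineitem.tbl.'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter '|' gzip maxerror 100 noload;

-- Real load: strict error handling (MAXERROR defaults to 0),
-- with statistics refreshed at the end of the load.
copy lineitem
from 's3://my-load-bucket/lineitem/lineitem.tbl.'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter '|' gzip statupdate on;
```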
  • 12. COPY Command Common Issues
    UTF-8
    • Currently Redshift can only load well-formed UTF-8 characters up to 3 bytes.
    NULL AS and ESCAPE
    • Common issues loading files can be circumvented with these options
    • Narrow down to a small set of rows and visually find what type of problem you have
    • Note that the error message might refer to a later portion. For example, “Delimiter not found” might be caused by an EOL that was not escaped.
    DATEFORMAT and TIMEFORMAT
    • Currently all date/timestamp columns have to use the same formatting defined by the option
    • Using ACCEPTANYDATE will not generate errors but loads NULL when the format does not match
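The options named in slide 12 could be applied together along the lines of the sketch below. The bucket name, the `'NULL'` marker string and the chosen date/time formats are assumptions about a hypothetical `orders` feed, not values from the talk.

```sql
-- Hypothetical feed: pipe-delimited, the literal text NULL marks missing values,
-- and embedded delimiters or newlines are backslash-escaped in the files.
copy orders
from 's3://my-load-bucket/orders/orders.tbl.'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter '|'
null as 'NULL'                      -- load fields equal to this string as NULL
escape                              -- treat backslash as an escape character in the input
dateformat 'YYYY-MM-DD'             -- applies to every date column in the load
timeformat 'YYYY-MM-DD HH:MI:SS'    -- applies to every timestamp column in the load
acceptanydate;                      -- load NULL instead of failing on a non-matching date
```

Because `ACCEPTANYDATE` hides format mismatches, it may be worth counting NULL dates after the load to see how many values were silently dropped.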
  • 13. COPY Command Troubleshooting
    STL_LOAD_ERRORS / STL_LOADERROR_DETAIL
    • Find errors during specific loads
    • You can create a view to simplify the troubleshooting process
        create view loadview as
        (select distinct tbl, trim(name) as table_name, query, starttime,
                trim(filename) as input, line_number, colname, err_code,
                trim(err_reason) as reason
         from stl_load_errors sl, stv_tbl_perm sp
         where sl.tbl = sp.id);
    • Then run “select * from loadview where table_name = <table>” if you have any issues.
    STL_LOAD_COMMITS / STL_FILE_SCAN / STL_S3CLIENT
    • Load times for specific files. Confirms a given file was read
    STL_S3CLIENT_ERROR
    • Information about specific S3 or file transfer errors that happen during the load process
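Two follow-up queries against the system tables named in slide 13, as a hedged sketch. The `'%lineitem%'` filename pattern and the row limit are illustrative choices.

```sql
-- Confirm which files a load actually read and committed.
select query, trim(filename) as filename, curtime, status
from stl_load_commits
where filename like '%lineitem%'
order by query;

-- Show the most recent load errors, newest first.
select query, trim(filename) as input, line_number,
       trim(colname) as colname, err_code, trim(err_reason) as reason
from stl_load_errors
order by starttime desc
limit 10;
```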
  • 14. COPY Command – Historical Information
    Look back to confirm the number of files and bytes loaded by each COPY statement
        select substring(q.querytxt,1,40) as querytxt, s.n_files, size_mb, s.time_seconds,
               s.size_mb/decode(s.time_seconds,0,1,s.time_seconds) as mb_per_s
        from (select query, count(*) as n_files,
                     sum(transfer_size/(1024*1024)) as size_MB,
                     (max(end_Time) - min(start_Time))/(1000000) as time_seconds,
                     max(end_time) as end_time
              from stl_s3client
              where query > 0 and transfer_time > 0
              group by query) as s
        LEFT JOIN stl_Query as q on q.query = s.query
        order by mb_per_s desc
        limit 10
  • 15. COPY Command – Historical Information cont.
     querytxt                                                     | n_files | size_mb | time_seconds | mb_per_s
    --------------------------------------------------------------+---------+---------+--------------+----------
     copy lineitem from 's3://tpc-h/100/lineitem.tbl.' credential |     603 |   22201 |         2390 |        9
     copy lineitem from 's3://tpc-h/1/lineitem.tbl.' credentials  |      34 |     192 |           21 |        8
     copy customer from 's3://tpc-h/100/customer.tbl.' credential |     152 |     750 |           85 |        8
     copy partsupp from 's3://tpc-h/100/partsupp.tbl.' credential |      82 |    2720 |          367 |        7
     COPY ANALYZE part                                            |      22 |      40 |            7 |        5
     copy orders from 's3://tpc-h/100/orders.tbl.' credentials '' |     152 |    4800 |         1035 |        4
     copy orders from 's3://tpc-h/1/orders.tbl.' credentials '' g |      34 |      32 |            7 |        4
     copy part from 's3://tpc-h/100/part.tbl.' credentials '' gzi |     202 |     400 |           95 |        4
     COPY ANALYZE supplier                                        |      34 |       0 |            3 |        0
     copy supplier from 's3://tpc-h/100/supplier.tbl.' credential |     102 |       0 |           10 |        0
    (10 rows)
  • 16. Vacuum
    Before Vacuum
    • Data inserted goes to a “non-sorted” area at the end of the table
    • As this area grows, query times grow
    • Data deleted is “marked” in a special column
    • As that column grows, query times grow
    What vacuum does
    • Non-sorted area gets sorted and integrated into the table
    • Deleted rows are removed and blocks reorganized
  • 17. Vacuum cont.
    • Vacuum takes advantage of the sortkey and skips blocks that don’t need to be modified.
    • Vacuum is a maintenance-type operation
    • Only one vacuum can be running at a time (cluster-wide)
    • More Memory = Faster Vacuum
      – set wlm_query_slot_count to 4;
    • Keep track of Vacuum progress (ETA)
      – SVV_VACUUM_PROGRESS
    • Record vacuum details afterwards to consider adjusting the frequency
      – SVV_VACUUM_SUMMARY
    Diagram labels: April/2013, May/2013, Unsorted, March/2013, May/2013, June/2013, April/2013
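Put together, a vacuum session following slide 17 might look like the sketch below. The `lineitem` table name is borrowed from the load examples earlier in the deck. The slot count of 4 is the value from the slide and assumes the session's WLM queue has at least that many slots.

```sql
-- Claim extra query slots (and therefore memory) for this session only.
set wlm_query_slot_count to 4;

vacuum lineitem;

-- Hand the extra slots back once the vacuum finishes.
set wlm_query_slot_count to 1;

-- From a second session while the vacuum runs: current status and estimated time remaining.
select * from svv_vacuum_progress;

-- Afterwards: per-vacuum details, useful when deciding how often to vacuum.
select * from svv_vacuum_summary;
```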
  • 18. Space Management
    Redshift has a single pool of space used for tables and temporary segments.
    • Loads need 2.5 times the space of the data being loaded if the table has a sortkey
    • Vacuum may need 2.5 times the size of the table.
    Monitor the free space
    • Performance Tab in the console
    • CloudWatch Alarms
    • SQL
  • 19. Space Management cont.
    Table Sizes
        select trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema,
               trim(a.name) as Table, b.mbytes, a.rows
        from (select db_id, id, name, sum(rows) as rows
              from stv_tbl_perm a
              group by db_id, id, name) as a
        join pg_class as pgc on pgc.oid = a.id
        join pg_namespace as pgn on pgn.oid = pgc.relnamespace
        join pg_database as pgdb on pgdb.oid = a.db_id
        join (select tbl, count(*) as mbytes
              from stv_blocklist
              group by tbl) b on a.id=b.tbl
        order by mbytes desc, a.db_id, a.name;
    Free Space
        select sum(capacity)/1024 as capacity_gbytes,
               sum(used)/1024 as used_gbytes,
               (sum(capacity) - sum(used))/1024 as free_gbytes
        from stv_partitions
        where part_begin=0;
    • Redshift allows you to resize your cluster up and down and across node types. The cluster stays online (read-only access) during the resize.
  • 20. Summary
    • Experiment to optimize your workflows
    • Various STL/STV tables hold most information needed for troubleshooting
    • Space Management and Vacuum schedule should be considered during implementation phase
  • 21. More information
    COPY Command: http://docs.aws.amazon.com/redshift/latest/dg/t_Loading_tables_with_the_COPY_command.html
    Loads Troubleshooting: http://docs.aws.amazon.com/redshift/latest/dg/t_Troubleshooting_load_errors.html
    Vacuum: http://docs.aws.amazon.com/redshift/latest/dg/t_Reclaiming_storage_space202.html
    UNLOADING data: http://docs.aws.amazon.com/redshift/latest/dg/c_unloading_data.html
  • 23. Q&A

Editor's Notes

  1. Usual Progression: Steps that happen at a certain frequency (daily, hourly, weekly)
  2. If your data has updates in the short term, consider having a short-term version of the table for staging and a long term version once data gets stable - Example: Orders stay on a short term table while in process and goes to