SlideShare a Scribd company logo
1 of 40
Download to read offline
Sao Paulo
Amazon Redshift Deep Dive
Eric Ferreira
AWS
Wanderlei Paiva
Movile
Amazon Redshift system architecture
• Leader node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via
Amazon S3; load from
Amazon DynamoDB, Amazon EMR, or SSH
• Two hardware platforms
– Optimized for data processing
– DW1: HDD; scale from 2TB to 2PB
– DW2: SSD; scale from 160GB to 326TB
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
A deeper look at compute node architecture
• Each node is split into slices
– One slice per core
– DW1 – 2 slices on XL, 16 on 8XL
– DW2 – 2 slices on L, 32 on 8XL
• Each slice is allocated memory,
CPU, and disk space
• Each slice processes a piece of
the workload in parallel
Leader Node
Implications for ingestion
Use multiple input files to maximize
throughput
• Use the COPY command
• Each slice can load one file at
a time
• A single input file means only
one slice is ingesting data
• Instead of 100MB/s, you’re
only getting 6.25MB/s
Use multiple input files to maximize
throughput
• Use the COPY command
• You need at least as many
input files as you have slices
• With 16 input files, all slices
are working so you maximize
throughput
• Get 100MB/s per node; scale
linearly as you add nodes
Primary keys and manifest files
• Amazon Redshift doesn’t enforce primary key
constraints
– If you load data multiple times, Amazon Redshift won’t complain
– If you declare primary keys in your DML, the optimizer will expect
the data to be unique
• Use manifest files to control exactly what is loaded
and how to respond if input files are missing
– Define a JSON manifest on Amazon S3
– Ensures the cluster loads exactly what you want
Analyze sort/dist key columns after every load
• Amazon Redshift’s query
optimizer relies on up-to-
date statistics
• Maximize performance by
updating stats on sort/dist
key columns after every
load
Automatic compression is a good thing (mostly)
• Better performance, lower costs
• COPY samples data automatically when loading into an
empty table
– Samples up to 100,000 rows and picks optimal encoding
• If you have a regular ETL process and you use temp tables or
staging tables, turn off automatic compression
– Use analyze compression to determine the right encodings
– Bake those encodings into your DML
Be careful when compressing your sort keys
• Zone maps store min/max per block
• Once we know which block(s) contain the
range, we know which row offsets to scan
• Highly compressed sort keys means many
rows per block
• You’ll scan more data blocks than you need
• If your sort keys compress significantly
more than your data columns, you may
want to skip compression
Keep your columns as narrow as possible
• During queries and ingestion,
the system allocates buffers
based on column width
• Wider than needed columns
mean memory is wasted
• Fewer rows fit into memory;
increased likelihood of
queries spilling to disk
Customer Testimony .
Wanderlei Paiva
“Com os serviços da AWS
pudemos dosar os investimento
iniciais e prospectar os custos
para expansões futuras”
• Líder em Mobile Commerce na América Latina
– 50 Milhões de pessoas usam serviços Movile todo mês
– Estamos conectados a + de 70 Operadoras em toda
América
– + de 50 Bilhões de transações por ano
– + de 700 colaboradores em 11 escritórios (AL e EUA)
• PlayKids
– 10M de downloads / 3M usuários ativos
– Conteúdo licenciado em 27 países e usuários em 102
países (6 idiomas: português, inglês, espanhol, alemão,
francês e chines)
– App #1 top grossing for Kids in Apple Store
“O Redshift nos
permitiu
transformar dados
em informações
self-service”
- Wanderley Paiva
Database Specialist
PlayKids iFood MapLink Apontador
Rapiddo Superplayer Cinepapaya ChefTim
e
O Desafio
• Escalabilidade
• Disponibilidade
• Centralização dos dados
• Custos reduzidos e
preferencialmente diluído
Solução
Solução v2
Expanding Amazon Redshift’s query
capabilities
New SQL functions
• We add SQL functions regularly to expand Amazon Redshift’s query
capabilities
• Added 25+ window and aggregate functions since launch, including:
– APPROXIMATE_COUNT
– DROP IF EXISTS, CREATE IF NOT EXISTS
– REGEXP_SUBSTR, _COUNT, _INSTR, _REPLACE
– PERCENTILE_CONT, _DISC, MEDIAN
– PERCENT_RANK, RATIO_TO_REPORT
• We’ll continue iterating but also want to enable you to write your own
User Defined Functions
• We’re enabling User Defined Functions (UDFs) so you can
add your own
– Scalar and Aggregate Functions supported
• You’ll be able to write UDFs using Python 2.7
– Syntax is largely identical to PostgreSQL UDF Syntax
– System and network calls within UDFs are prohibited
• Comes with Pandas, NumPy, and SciPy pre-installed
– You’ll also be able import your own libraries for even more flexibility
Scalar UDF example – URL parsing
CREATE FUNCTION f_hostname (VARCHAR url)
RETURNS varchar
IMMUTABLE AS $$
import urlparse
return urlparse.urlparse(url).hostname
$$ LANGUAGE plpythonu;
Multidimensional indexing
with space filling curves
You’re a small Internet bookstore
• You’re interested in
how you’re doing
– Total sales
– Best customers
– Best-selling items
– Top-selling author this
month
• A row store with
indexes works well
Orders
Product
Time
Customer
Site
You get a little bigger
• Your queries start taking
longer
• You move to a column store
• Now you have zone maps,
large data blocks, but no
indexes
• You have to choose which
queries you want to be fast
10 | 13 | 14 | 26 |…
… | 100 | 245 | 324
375 | 393 | 417…
… 512 | 549 | 623
637 | 712 | 809 …
… | 834 | 921 | 959
Today’s state of the art: Zone maps, sorting,
projections
• Zone maps store the min/max values for every block in memory
• Works great for sorted columns
– O(log n) access to blocks
• Doesn’t work so well for unsorted columns
– O(n) access
• Projections are multiple copies of data sorted different ways
– Optimizer decides which copy to use for responding to queries
– Loads are slower
– Gets unwieldy quickly. If you have 8 columns, you need have 8 factorial (40,320) combinations.
Blocks are points in multidimensional space
00 01 10 11
00
01
10
11
Customers
Products
00 01 10 11
00
01
10
11
Customers
Products
• The 2D tables on
the left are over-
specified
• You don’t need
every product or
customer to be in
consecutive rows
• You just need to
make sure that
each appears in the
right sequence
Space filling curves
00 01 10 11
00
01
10
11
Customers
Products
• You need a way of traversing
the space that preserves order
• And you need to touch every
point in the space
• You need a space filling curve
– There are many of these, for example the
curve on the right
• Products appear in order as
do customers and you don’t
favor one over the other
Compound Sort Keys Illustrated
• Records in Redshift
are stored in blocks.
• For this illustration,
let’s assume that
four records fill a
block
• Records with a given
cust_id are all in one
block
• However, records
with a given prod_id
are spread across
four blocks
1
1
1
1
2
3
4
1
4
4
4
2
3
4
4
1
3
3
3
2
3
4
3
1
2
2
2
2
3
4
2
1
1 [1,1] [1,2] [1,3] [1,4]
2 [2,1] [2,2] [2,3] [2,4]
3 [3,1] [3,2] [3,3] [3,4]
4 [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
cust_id prod_id other columns blocks
1 [1,1] [1,2] [1,3] [1,4]
2 [2,1] [2,2] [2,3] [2,4]
3 [3,1] [3,2] [3,3] [3,4]
4 [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
Interleaved Sort Keys Illustrated
• Records with a given
cust_id are spread
across two blocks
• Records with a given
prod_id are also
spread across two
blocks
• Data is sorted in equal
measures for both
keys
1
1
2
2
2
1
2
3
3
4
4
4
3
4
3
1
3
4
4
2
1
2
3
3
1
2
2
4
3
4
1
1
cust_id prod_id other columns blocks
How to use the feature
• New keyword ‘INTERLEAVED’ when defining sort keys
– Existing syntax will still work and behavior is unchanged
– You can choose up to 8 columns to include and can query with any or all of them
• No change needed to queries
• We’re just getting started with this feature
– Benefits are significant; load penalty is higher than we’d like and we’ll fix that quickly
• Will be available in a couple of weeks and we’d love to get your feedback
[[ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ]
Migration Considerations
Forklift = BAD
Typical ETL/ELT
• One file per table, maybe a few if too big
• Many updates (“massage” the data)
• Every job clears the data, then load
• Count on PK to block double loads
• High concurrency of load jobs
• Small table(s) to control the job stream
Two Questions to ask
• Why you do what you do?
– Many times they don’t even know
• What is the customer need ?
– Many times needs do not match practice
– You might have to add other AWS solutions
On Redshift
• Updates are delete + insert
– Deletes just mark rows for deletion
• Commits are expensive
– 4GB write on 8XL per node
– Mirrors WHOLE dictionary
– Cluster-wide Serialized
On Redshift
• Not all Aggregations created equal
– Pre aggregation can help
• Concurrency should be low
• No dashboard connected directly to RS
• WLM only parcels RAM to sessions, not priority
• Compression is for speed as well
• Distkey, Sortkey and datatypes are important
Not all MPP/Columnar are the same
• Only RS can DIST STYLE ALL and have a copy
per node (not per slice/core)
• Some columnar DB have a row-based version of
data on insert. Or a option for it
• RS does not charge millions of dollars and come
do the work for you.
Open Source Tools
• https://github.com/awslabs/amazon-redshift-utils
• Admin Scripts
– Collection of utilities for running diagnostics on your Cluster
• Admin Views
– Collection of utilities for managing your Cluster, generating Schema
DDL, etc
• Column Encoding Utility
– Gives you the ability to apply optimal Column Encoding to an
established Schema with data already loaded
Sao Paulo

More Related Content

What's hot

Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceAmazon Web Services
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
 
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon RedshiftAmazon Web Services
 
Scalability of Amazon Redshift Data Loading and Query Speed
Scalability of Amazon Redshift Data Loading and Query SpeedScalability of Amazon Redshift Data Loading and Query Speed
Scalability of Amazon Redshift Data Loading and Query SpeedFlyData Inc.
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Web Services
 
Building AWS Redshift Data Warehouse with Matillion and Tableau
Building AWS Redshift Data Warehouse with Matillion and TableauBuilding AWS Redshift Data Warehouse with Matillion and Tableau
Building AWS Redshift Data Warehouse with Matillion and TableauLynn Langit
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Amazon Web Services
 
Migration to Redshift from SQL Server
Migration to Redshift from SQL ServerMigration to Redshift from SQL Server
Migration to Redshift from SQL Serverjoeharris76
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Amazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftAmazon Web Services
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData FlyData Inc.
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features Amazon Web Services
 
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013Amazon Web Services
 
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesDeep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesAmazon Web Services
 

What's hot (20)

Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performance
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
 
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
 
Scalability of Amazon Redshift Data Loading and Query Speed
Scalability of Amazon Redshift Data Loading and Query SpeedScalability of Amazon Redshift Data Loading and Query Speed
Scalability of Amazon Redshift Data Loading and Query Speed
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 
Building AWS Redshift Data Warehouse with Matillion and Tableau
Building AWS Redshift Data Warehouse with Matillion and TableauBuilding AWS Redshift Data Warehouse with Matillion and Tableau
Building AWS Redshift Data Warehouse with Matillion and Tableau
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
 
Migration to Redshift from SQL Server
Migration to Redshift from SQL ServerMigration to Redshift from SQL Server
Migration to Redshift from SQL Server
 
What's New in Amazon Aurora
What's New in Amazon AuroraWhat's New in Amazon Aurora
What's New in Amazon Aurora
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
 
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesDeep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
 

Similar to Redshift deep dive

Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseAmazon Web Services
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsKeeyong Han
 
Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftAmazon Web Services LATAM
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware ProvisioningMongoDB
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...Edgar Alejandro Villegas
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesAlexandra Sasha Blumenfeld
 
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...Microsoft
 
Cassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixCassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixJason Brown
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event PresentationChartio
 
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)Amazon Web Services
 
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013Michael Bohlig
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systemselliando dias
 

Similar to Redshift deep dive (20)

Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
 
Breaking data
Breaking dataBreaking data
Breaking data
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon Redshift
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 Minutes
 
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
Business Insight 2014 - Microsofts nye BI og database platform - Erling Skaal...
 
Cassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixCassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating Netflix
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event Presentation
 
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
AWS Summit London 2014 | Uses and Best Practices for Amazon Redshift (200)
 
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
 

More from Amazon Web Services LATAM

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAmazon Web Services LATAM
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAmazon Web Services LATAM
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSAmazon Web Services LATAM
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSAmazon Web Services LATAM
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAmazon Web Services LATAM
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAmazon Web Services LATAM
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosAmazon Web Services LATAM
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSAmazon Web Services LATAM
 

More from Amazon Web Services LATAM (20)

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWS
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWS
 
Cómo empezar con Amazon EKS
Cómo empezar con Amazon EKSCómo empezar con Amazon EKS
Cómo empezar con Amazon EKS
 
Como começar com Amazon EKS
Como começar com Amazon EKSComo começar com Amazon EKS
Como começar com Amazon EKS
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWS
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWS
 
Ransomware: Estratégias de Mitigação
Ransomware: Estratégias de MitigaçãoRansomware: Estratégias de Mitigação
Ransomware: Estratégias de Mitigação
 
Ransomware: Estratégias de Mitigación
Ransomware: Estratégias de MitigaciónRansomware: Estratégias de Mitigación
Ransomware: Estratégias de Mitigación
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWS
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administrados
 
Simplifique su BI con AWS
Simplifique su BI con AWSSimplifique su BI con AWS
Simplifique su BI con AWS
 
Simplifique o seu BI com a AWS
Simplifique o seu BI com a AWSSimplifique o seu BI com a AWS
Simplifique o seu BI com a AWS
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWS
 

Recently uploaded

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 

Redshift deep dive

  • 2. Amazon Redshift Deep Dive Eric Ferreira AWS Wanderlei Paiva Movile
  • 3. Amazon Redshift system architecture • Leader node – SQL endpoint – Stores metadata – Coordinates query execution • Compute nodes – Local, columnar storage – Execute queries in parallel – Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH • Two hardware platforms – Optimized for data processing – DW1: HDD; scale from 2TB to 2PB – DW2: SSD; scale from 160GB to 326TB 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 4. A deeper look at compute node architecture • Each node is split into slices – One slice per core – DW1 – 2 slices on XL, 16 on 8XL – DW2 – 2 slices on L, 32 on 8XL • Each slice is allocated memory, CPU, and disk space • Each slice processes a piece of the workload in parallel Leader Node
  • 6. Use multiple input files to maximize throughput • Use the COPY command • Each slice can load one file at a time • A single input file means only one slice is ingesting data • Instead of 100MB/s, you’re only getting 6.25MB/s
  • 7. Use multiple input files to maximize throughput • Use the COPY command • You need at least as many input files as you have slices • With 16 input files, all slices are working so you maximize throughput • Get 100MB/s per node; scale linearly as you add nodes
  • 8. Primary keys and manifest files • Amazon Redshift doesn’t enforce primary key constraints – If you load data multiple times, Amazon Redshift won’t complain – If you declare primary keys in your DML, the optimizer will expect the data to be unique • Use manifest files to control exactly what is loaded and how to respond if input files are missing – Define a JSON manifest on Amazon S3 – Ensures the cluster loads exactly what you want
  • 9. Analyze sort/dist key columns after every load • Amazon Redshift’s query optimizer relies on up-to- date statistics • Maximize performance by updating stats on sort/dist key columns after every load
  • 10. Automatic compression is a good thing (mostly) • Better performance, lower costs • COPY samples data automatically when loading into an empty table – Samples up to 100,000 rows and picks optimal encoding • If you have a regular ETL process and you use temp tables or staging tables, turn off automatic compression – Use analyze compression to determine the right encodings – Bake those encodings into your DML
  • 11. Be careful when compressing your sort keys • Zone maps store min/max per block • Once we know which block(s) contain the range, we know which row offsets to scan • Highly compressed sort keys means many rows per block • You’ll scan more data blocks than you need • If your sort keys compress significantly more than your data columns, you may want to skip compression
  • 12. Keep your columns as narrow as possible • During queries and ingestion, the system allocates buffers based on column width • Wider than needed columns mean memory is wasted • Fewer rows fit into memory; increased likelihood of queries spilling to disk
  • 14. “Com os serviços da AWS pudemos dosar os investimento iniciais e prospectar os custos para expansões futuras” • Líder em Mobile Commerce na América Latina – 50 Milhões de pessoas usam serviços Movile todo mês – Estamos conectados a + de 70 Operadoras em toda América – + de 50 Bilhões de transações por ano – + de 700 colaboradores em 11 escritórios (AL e EUA) • PlayKids – 10M de downloads / 3M usuários ativos – Conteúdo licenciado em 27 países e usuários em 102 países (6 idiomas: português, inglês, espanhol, alemão, francês e chines) – App #1 top grossing for Kids in Apple Store “O Redshift nos permitiu transformar dados em informações self-service” - Wanderley Paiva Database Specialist PlayKids iFood MapLink Apontador Rapiddo Superplayer Cinepapaya ChefTim e
  • 15. O Desafio • Escalabilidade • Disponibilidade • Centralização dos dados • Custos reduzidos e preferencialmente diluído
  • 18. Expanding Amazon Redshift’s query capabilities
  • 19. New SQL functions • We add SQL functions regularly to expand Amazon Redshift’s query capabilities • Added 25+ window and aggregate functions since launch, including: – APPROXIMATE_COUNT – DROP IF EXISTS, CREATE IF NOT EXISTS – REGEXP_SUBSTR, _COUNT, _INSTR, _REPLACE – PERCENTILE_CONT, _DISC, MEDIAN – PERCENT_RANK, RATIO_TO_REPORT • We’ll continue iterating but also want to enable you to write your own
  • 20. User Defined Functions • We’re enabling User Defined Functions (UDFs) so you can add your own – Scalar and Aggregate Functions supported • You’ll be able to write UDFs using Python 2.7 – Syntax is largely identical to PostgreSQL UDF Syntax – System and network calls within UDFs are prohibited • Comes with Pandas, NumPy, and SciPy pre-installed – You’ll also be able import your own libraries for even more flexibility
  • 21. Scalar UDF example – URL parsing CREATE FUNCTION f_hostname (VARCHAR url) RETURNS varchar IMMUTABLE AS $$ import urlparse return urlparse.urlparse(url).hostname $$ LANGUAGE plpythonu;
  • 23. You’re a small Internet bookstore • You’re interested in how you’re doing – Total sales – Best customers – Best-selling items – Top-selling author this month • A row store with indexes works well Orders Product Time Customer Site
  • 24. You get a little bigger • Your queries start taking longer • You move to a column store • Now you have zone maps, large data blocks, but no indexes • You have to choose which queries you want to be fast 10 | 13 | 14 | 26 |… … | 100 | 245 | 324 375 | 393 | 417… … 512 | 549 | 623 637 | 712 | 809 … … | 834 | 921 | 959
  • 25. Today’s state of the art: Zone maps, sorting, projections • Zone maps store the min/max values for every block in memory • Works great for sorted columns – O(log n) access to blocks • Doesn’t work so well for unsorted columns – O(n) access • Projections are multiple copies of data sorted different ways – Optimizer decides which copy to use for responding to queries – Loads are slower – Gets unwieldy quickly. If you have 8 columns, you need have 8 factorial (40,320) combinations.
  • 26. Blocks are points in multidimensional space 00 01 10 11 00 01 10 11 Customers Products 00 01 10 11 00 01 10 11 Customers Products • The 2D tables on the left are over- specified • You don’t need every product or customer to be in consecutive rows • You just need to make sure that each appears in the right sequence
  • 27. Space filling curves 00 01 10 11 00 01 10 11 Customers Products • You need a way of traversing the space that preserves order • And you need to touch every point in the space • You need a space filling curve – There are many of these, for example the curve on the right • Products appear in order as do customers and you don’t favor one over the other
  • 28. Compound Sort Keys Illustrated • Records in Redshift are stored in blocks. • For this illustration, let’s assume that four records fill a block • Records with a given cust_id are all in one block • However, records with a given prod_id are spread across four blocks 1 1 1 1 2 3 4 1 4 4 4 2 3 4 4 1 3 3 3 2 3 4 3 1 2 2 2 2 3 4 2 1 1 [1,1] [1,2] [1,3] [1,4] 2 [2,1] [2,2] [2,3] [2,4] 3 [3,1] [3,2] [3,3] [3,4] 4 [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id cust_id prod_id other columns blocks
  • 29. 1 [1,1] [1,2] [1,3] [1,4] 2 [2,1] [2,2] [2,3] [2,4] 3 [3,1] [3,2] [3,3] [3,4] 4 [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id Interleaved Sort Keys Illustrated • Records with a given cust_id are spread across two blocks • Records with a given prod_id are also spread across two blocks • Data is sorted in equal measures for both keys 1 1 2 2 2 1 2 3 3 4 4 4 3 4 3 1 3 4 4 2 1 2 3 3 1 2 2 4 3 4 1 1 cust_id prod_id other columns blocks
  • 30. How to use the feature • New keyword ‘INTERLEAVED’ when defining sort keys – Existing syntax will still work and behavior is unchanged – You can choose up to 8 columns to include and can query with any or all of them • No change needed to queries • We’re just getting started with this feature – Benefits are significant; load penalty is higher than we’d like and we’ll fix that quickly • Will be available in a couple of weeks and we’d love to get your feedback [[ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ]
  • 33. Typical ETL/ELT • One file per table, maybe a few if too big • Many updates (“massage” the data) • Every job clears the data, then load • Count on PK to block double loads • High concurrency of load jobs • Small table(s) to control the job stream
  • 34. Two Questions to ask • Why you do what you do? – Many times they don’t even know • What is the customer need ? – Many times needs do not match practice – You might have to add other AWS solutions
  • 35. On Redshift • Updates are delete + insert – Deletes just mark rows for deletion • Commits are expensive – 4GB write on 8XL per node – Mirrors WHOLE dictionary – Cluster-wide Serialized
  • 36. On Redshift • Not all Aggregations created equal – Pre aggregation can help • Concurrency should be low • No dashboard connected directly to RS • WLM only parcels RAM to sessions, not priority • Compression is for speed as well • Distkey, Sortkey and datatypes are important
  • 37. Not all MPP/Columnar are the same • Only RS can DIST STYLE ALL and have a copy per node (not per slice/core) • Some columnar DB have a row-based version of data on insert. Or a option for it • RS does not charge millions of dollars and come do the work for you.
  • 38. Open Source Tools • https://github.com/awslabs/amazon-redshift-utils • Admin Scripts – Collection of utilities for running diagnostics on your Cluster • Admin Views – Collection of utilities for managing your Cluster, generating Schema DDL, etc • Column Encoding Utility – Gives you the ability to apply optimal Column Encoding to an established Schema with data already loaded
  • 39.