© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tony Gibbs, Data Warehousing Solutions Architect at AWS
Chris Sanders, Director Data Services at Ancestry
Tanner Pratt, Manager Data Warehousing at Ancestry
August 14, 2017
Ancestry’s Journey to Amazon Redshift
Deep Dive Overview
• Amazon Redshift history and development
• Cluster architecture
• Concepts and terminology
• New & upcoming features
• Ancestry’s journey to Amazon Redshift
• Open Q&A
Amazon Redshift History & Development
Amazon Redshift combines columnar storage, MPP, and OLAP techniques with PostgreSQL, built on AWS services: IAM, Amazon VPC, Amazon SWF, Amazon S3, AWS KMS, Amazon Route 53, Amazon CloudWatch, and Amazon EC2.
February 2013 to August 2017:
> 100 Significant Patches
> 150 Significant Features
Amazon Redshift Cluster Architecture
Amazon Redshift Cluster Architecture
Massively parallel, shared nothing
Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, backup, restore
[Diagram: SQL clients/BI tools connect via JDBC/ODBC to the leader node; the leader node coordinates compute nodes (each, for example, 128 GB RAM, 16 TB disk, 16 cores) over a 10 GigE (HPC) interconnect; ingestion, backup, and restore run between the compute nodes and S3 / Amazon EMR / DynamoDB / SSH.]
Concepts & Terminology
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with row storage:
o Need to read everything
o Unnecessary I/O
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with columnar storage
o Only scan blocks for relevant column
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
CREATE TABLE deep_dive (
aid INT ENCODE LZO
,loc CHAR(3) ENCODE BYTEDICT
,dt DATE ENCODE RUNLENGTH
);
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Columns grow and shrink independently
• Reduces storage requirements
• Reduces I/O
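Rather than guessing encodings, Amazon Redshift can recommend them from a sample of the table's data. A minimal sketch against the deep_dive table above:

-- Sample the table and report a suggested compression encoding
-- (and estimated space reduction) for each column.
ANALYZE COMPRESSION deep_dive;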
Designed for I/O Reduction
Columnar storage
Data compression
Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
• In-memory block metadata
• Contains per-block MIN and MAX value
• Effectively prunes blocks that cannot contain
data for a given query
• Eliminates unnecessary I/O
Zone Maps
SELECT COUNT(*) FROM deep_dive WHERE dt = '09-JUNE-2013';
Unsorted table: block ranges overlap, so most blocks could contain the date and must be scanned:
MIN: 01-JUNE-2013 MAX: 20-JUNE-2013
MIN: 08-JUNE-2013 MAX: 30-JUNE-2013
MIN: 12-JUNE-2013 MAX: 20-JUNE-2013
MIN: 02-JUNE-2013 MAX: 25-JUNE-2013
Sorted by date: block ranges are disjoint, so only the one block whose range covers 09-JUNE-2013 is scanned:
MIN: 01-JUNE-2013 MAX: 06-JUNE-2013
MIN: 07-JUNE-2013 MAX: 12-JUNE-2013
MIN: 13-JUNE-2013 MAX: 18-JUNE-2013
MIN: 19-JUNE-2013 MAX: 24-JUNE-2013
Terminology and Concepts: Data Sorting
• Goals:
• Physically order rows of table data based on certain column(s)
• Optimize effectiveness of zone maps
• Enable MERGE JOIN operations
• Impact:
• Enables range-restricted scans (rrscans) to prune blocks by leveraging zone maps
• Overall reduction in block I/O
• Achieved with the table property SORTKEY defined over one or more columns
• Optimal SORTKEY is dependent on:
• Query patterns
• Data profile
• Business requirements
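A minimal sketch of declaring a sort key, assuming (as in the zone-map example above) that queries filter mostly on dt; the table name is illustrative:

CREATE TABLE deep_dive_sorted (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
)
SORTKEY (dt); -- rows are stored in dt order, so zone maps prune tightly on date filters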
Terminology and Concepts: Slices
A slice can be thought of as a “virtual compute node”
• Unit of data partitioning
• Parallel query processing
Facts about slices:
• Each compute node has either 2, 16, or 32 slices
• Table rows are distributed to slices
• A slice processes only its own data
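To see the slice layout of your own cluster, the STV_SLICES system table lists the slice-to-node mapping (a small sketch):

-- One row per slice; node identifies the compute node that owns it.
SELECT node, slice
FROM stv_slices
ORDER BY node, slice;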
Data Distribution
• Distribution style is a table property which dictates how that table’s data is
distributed throughout the cluster:
• KEY: Value is hashed, same value goes to same location (slice)
• ALL: Full table data goes to first slice of every node
• EVEN: Round robin
• Goals:
• Distribute data evenly for parallel processing
• Minimize data movement during query processing
[Diagram: the same two-node cluster (Node 1: slices 1–2, Node 2: slices 3–4) shown for each style: KEY sends each hashed value to a specific slice, EVEN round-robins rows across all slices, and ALL places a full copy on the first slice of each node.]
Data Distribution: Example
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE (EVEN|KEY|ALL);
[Diagram: two compute nodes, CN1 (slices 0–1) and CN2 (slices 2–3); each slice stores table deep_dive as user columns aid, loc, dt plus hidden system columns ins, del, row.]
Data Distribution: EVEN Example
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE EVEN;
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: after the INSERT, each of the four slices holds one row of deep_dive (rows: 1, 1, 1, 1), stored as user columns aid, loc, dt plus system columns ins, del, row.]
(3 user columns + 3 system columns) x (4 slices) = 24 blocks (24 MB)
Data Distribution: KEY Example #1
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (loc);
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: DISTKEY (loc) hashes only two distinct values ('SFO', 'JFK'), so all four rows land on two slices (2 rows each) while the other two slices hold none: a skewed distribution.]
(3 user columns + 3 system columns) x (2 slices) = 12 blocks (12 MB)
Data Distribution: KEY Example #2
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (aid);
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: DISTKEY (aid) hashes four distinct values, so each slice holds exactly one row: an even distribution.]
(3 user columns + 3 system columns) x (4 slices) = 24 blocks (24 MB)
Data Distribution: ALL Example
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE ALL;
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: DISTSTYLE ALL places a full copy of the table (all 4 rows) on the first slice of each node; the remaining slices hold no rows.]
(3 user columns + 3 system columns) x (2 slices) = 12 blocks (12 MB)
Terminology and Concepts: Data Distribution
KEY
• Use when the key yields an even distribution of the data
• Joins are performed between large fact/dimension tables
• Optimizes merge joins and GROUP BY
ALL
• Small and medium size dimension tables (< 2–3 million rows)
EVEN
• When a key cannot produce an even distribution
(See the skew-check query below.)
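One way to verify that a chosen distribution style is actually even is the SVV_TABLE_INFO system view; skew_rows is the ratio of rows on the largest slice to rows on the smallest (a sketch):

-- Tables with high skew_rows are candidates for a better DISTKEY.
SELECT "table", diststyle, skew_rows
FROM svv_table_info
ORDER BY skew_rows DESC;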
New & Upcoming Features
Enter Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
• Fast @ exabyte scale
• Elastic & highly available
• On-demand, pay-per-query
• High concurrency: multiple clusters access the same data
• No ETL: query data in-place using open file formats
• Full Amazon Redshift SQL support
Paradigm Shift Enabled by Redshift Spectrum
• Traditional approach: you had to pick and choose which data to analyze; analyze only the data that fits in your data warehouse
• Redshift Spectrum approach: analyze ALL available data, any of the data in your data lake
Query:
SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY…
[Diagram: the query arrives at Amazon Redshift over JDBC/ODBC; the cluster fans the scan out to Redshift Spectrum nodes (1, 2, 3, 4 … N), which read the data from Amazon S3 (exabyte-scale object storage) using a data catalog (Apache Hive Metastore) for table metadata.]
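A minimal sketch of how such a query is set up; the schema, database, role, table, and bucket names are illustrative, not from the talk:

-- Register a data catalog database as an external schema.
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'spectrumdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Define an external table over Parquet files in S3.
CREATE EXTERNAL TABLE spectrum.events (
aid INT
,loc CHAR(3)
,dt DATE
)
STORED AS PARQUET
LOCATION 's3://my-bucket/events/';

-- Query it like any other table; the S3 scan runs on Spectrum nodes.
SELECT loc, COUNT(*) FROM spectrum.events GROUP BY loc;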
Amazon Redshift Spectrum is secure
• End-to-end data encryption: encrypt S3 data using SSE and AWS KMS; encrypt all Amazon Redshift data using KMS, AWS CloudHSM, or your on-premises HSMs; enforce SSL with perfect forward secrecy using ECDHE
• Virtual private cloud: the Amazon Redshift leader node runs in your VPC; compute nodes run in a private VPC; Redshift Spectrum nodes run in a private VPC and store no state
• Alerts & notifications: communicate event-specific notifications via email, text message, or call with Amazon SNS
• Audit logging: all API calls are logged using AWS CloudTrail; all SQL statements are logged within Amazon Redshift
• Certifications & compliance: PCI DSS, FedRAMP, SOC 1/2/3, HIPAA/BAA
Customers Love Amazon Redshift Spectrum
• “Redshift Spectrum will let us expand the universe of the data we analyze to hundreds of petabytes over time. This is truly a game changer, and we can think of no other system in the world that can get us there.”
• “Multiple teams can now query the same Amazon S3 data sets using both Amazon Redshift and Amazon EMR.”
• “Redshift Spectrum will help us scale yet further while also lowering our costs.”
• “Redshift Spectrum’s fast performance across massive data sets is unprecedented.”
• “Redshift Spectrum enables us to directly operate on our data in its native format in Amazon S3 with no preprocessing or transformation.”
• “Our data science team using Amazon EMR can now collaborate with our marketing and product teams using Redshift Spectrum to analyze the same Amazon S3 data sets.”
Query Monitoring Rules (QMR)
• Allow automatic handling of runaway (poorly written) queries
• A metric with an operator and a value (e.g., query_cpu_time > 1000) creates a predicate
• Multiple predicates can be AND-ed together to create a rule
• Multiple rules can be defined for a queue in WLM; these rules are OR-ed together
If { rule } then [action]
{ rule: metric operator value }, e.g., rows_scanned > 100000
• Metric: cpu_time, query_blocks_read, rows_scanned, query_execution_time, CPU & I/O skew per slice, join_row_count, etc.
• Operator: <, >, ==
• Value: integer
• [action]: hop, log, abort
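Once rules are in place, their firings can be reviewed from SQL; the STL_WLM_RULE_ACTION system table logs each triggered rule (a small sketch):

-- Which queries tripped which QMR rule, and the action taken.
SELECT userid, query, service_class, rule, action, recordtime
FROM stl_wlm_rule_action
ORDER BY recordtime DESC;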
IAM Authentication
• Single sign-on through identity providers: corporate Active Directory federates via ADFS to IAM, which grants access to Amazon Redshift
• New Amazon Redshift ODBC/JDBC drivers grab the ticket (userid) and get a SAML assertion
• Works for BI tools, SQL clients, and analytics tools, mapped to user groups or individual users
[Diagram: client (BI tools, SQL clients, analytics tools) connects via ODBC/JDBC; authentication flows through ADFS and corporate Active Directory to IAM, then on to Amazon Redshift.]
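With the IAM-enabled drivers, the connection string selects IAM authentication; a sketch of the documented JDBC URL form (the cluster endpoint is a placeholder):

jdbc:redshift:iam://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev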
Coming Soon: SQL Scalar User-Defined Functions
Language SQL support added for scalar UDFs. Example:
CREATE OR REPLACE FUNCTION inet_ntoa(bigint)
RETURNS varchar(15)
AS
$$
SELECT
  CASE WHEN $1 BETWEEN 0 AND 4294967295 THEN
    (($1 >> 24) & 255) || '.' ||
    (($1 >> 16) & 255) || '.' ||
    (($1 >> 8) & 255) || '.' ||
    (($1 >> 0) & 255)
  ELSE
    NULL
  END
$$ LANGUAGE SQL IMMUTABLE;
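A quick usage check of the function above (3232235777 is the integer form of 192.168.1.1):

SELECT inet_ntoa(3232235777); -- returns '192.168.1.1'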
Ancestry’s Journey to Amazon Redshift
20 billion
historical records
90 million
family trees
10 billion
profiles
330 million
user-generated photos and
stories
5 million
people in genomics networks
Why DNA matching is so powerful
• Number of people in the room: 200
• Number of ways I could pick two different people in this room: 19,900
• Chance that a random pair of individuals share enough DNA to be considered fourth cousins: 0.0001875
• Chance that there is at least one pair of genetic fourth cousins in the room: 97.6%
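The arithmetic behind those numbers, sketched as a birthday-problem style estimate:

\binom{200}{2} = \frac{200 \times 199}{2} = 19{,}900, \qquad
1 - (1 - 0.0001875)^{19900} \approx 1 - e^{-19900 \times 0.0001875} \approx 1 - e^{-3.73} \approx 0.976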
Where we were
• Actian Matrix
• On-premises
• Hardware Bound
• Single Admin
• SSIS for ETL
• 161 Packages
• 50+ Data Sources
How we successfully migrated.
Step 1. PLAN
• Document ALL Components That Need to Migrate
• Identify Dependencies
• Prioritize Importance
• Create Roadmap with Milestones
• Align Resources to Swimlanes
• Rally Team
Roadmap (Q1–Q2: JAN–APR)
Workstreams:
• Infrastructure: AWS infrastructure (VPC, subnet, network); AWS and DW security (ACLs, groups, IAM, encryption); DW environment design & PoC; DW infrastructure configuration (EDW, SOX, Tableau); DW infrastructure (monitoring, DR)
• Data warehouse: design extract; extract and transfer data; design S3 to Amazon Redshift load; load schema & data into Amazon Redshift; refactor and test ELT packages (161 packages, 5 resources 80% allocated); migrate to AWS; run ELT to “catch up” (the transfer time determines the ELT “catch up” time); export, transfer, and load SQL Server data; validate environment & cutover; run MATRIX and Amazon Redshift in parallel
• Tableau & MicroStrategy: backup; restore to AWS; validate Tableau & MicroStrategy (900 Tableau workbooks)
Milestones:
• JAN 27: core infrastructure established; PoC complete
• FEB 7: Amazon Redshift environment established
• FEB 24: data loaded onto S3
• MAR 17: data loaded into Amazon Redshift; ELT packages migrated to AWS and verified
• APR 7: Redshift “caught up”; begin running in parallel
• APR 20: business live on Amazon Redshift (presentation status date: Apr 18)
• Cutover to Amazon Redshift completed 9 days
ahead of schedule! Business is live on
Amazon Redshift.
• ETL and Tools Migration 96% complete
• Tableau/MicroStrategy Migration 94% complete
How we successfully migrated.
Step 2. ADJUST
• Review Tasks and Roadmap Frequently
• There Will Be Unknowns
• Evaluate Impacts
• Be Flexible and Agile
How we successfully migrated.
Step 3. COMMUNICATE
• Internal with Team
• Up with Executives and Management
• Out with Customers
• Be Transparent, Honest, and Realistic
Data Warehouse Migration Status Scorecard, 4/18
• AWS Infrastructure: common infrastructure 100% complete
• Security Management: security management 100% complete
• DW Infrastructure Configuration: DW infrastructure configuration 100% complete; monitoring and DR setup 100% complete
• DW Data Extract and Load to AWS: data extract and load to AWS 100% complete
• ETL and Tools Migration: 96% complete. Done: 100% (165/165) migration; 100% (127/127) ETL catchup; 85% (108/127) validation. In progress: 4/19, 85% validation complete; 4/28, run MATRIX and Amazon Redshift in parallel with 100% validation complete
• SQL Server Data Migration: 100% complete
• Tableau and MicroStrategy Migration: 94% complete. In progress: final Tableau backup and restore; 4/12, test connectivity using the ELB/ENI for Tableau; 4/18, upgrade MicroStrategy; 4/19, MicroStrategy validation; 4/21, Tableau validation
Status key: not on schedule & high risk to complete on time / on schedule & moderate risk to complete on time / on track to deliver on time
How we successfully migrated.
Step 4. RUN PARALLEL
• Start as SOON as Possible
• Quickly Identify Issues
• Added Work, but WORTH IT!
• Insights Are Invaluable
How we successfully migrated.
Step 5. HAVE THE RIGHT PARTNER
• We Were Scared
• Meet Often
• Communicate Concerns
• Amazon Has Been AMAZING
How we successfully migrated.
End result:
• Anticlimactic Failover
• Flawless Migration
• Zero Business Downtime
• 88 Business Days. 9 Days EARLY!
• 470 BILLION Records. Migrated AND Validated. 100% Accurate.
• 161 ETL Packages Migrated
• 600+ Tableau Workbooks Migrated
Lessons learned during migration
1. AWS Infrastructure Creation
2. Data Migration
3. Code Migration
Lessons learned during AWS infrastructure
How do I set up a production-level AWS account with all appropriate services in 3 weeks?
• Preparation
• Practice
• Patience
Lessons learned during data migration
How do I get 80 TB of highly compressed data to the cloud
securely, quickly, and accurately?
• AWS Snowball method (physical data migration)
• AWS Direct Connect method
Lessons learned during code migration
How do I migrate 161 ETL code packages
with no business logic loss or data loss?
• Automation of changes
• Strict tracking of changes
• No deleting source data
Optimized for the cloud
Changes we have made since migrations
• Change in mindset
• Node size and type change: 96 DC1.8XL -> 47 DS2.8XL
• Rewrite of ETL code to leverage commit blocks, temp tables, distribution keys, sort keys, etc. For example:
-- Before: each statement auto-commits, so two commits hit the commit queue
DROP TABLE IF EXISTS foo.me;
CREATE TABLE foo.me AS (
  SELECT x, y
  FROM foo.you
);

-- After: wrapped in an explicit transaction, one commit
BEGIN;
DROP TABLE IF EXISTS foo.me;
CREATE TABLE foo.me AS (
  SELECT x, y
  FROM foo.you
);
COMMIT;
Amazon Redshift Spectrum use case
• Had 3 simple analysis use cases that needed to be performed on the data
• Loaded 20 TB of data from source data systems to Amazon S3 in raw form, then converted it to Parquet
• Partitioned by day (a partitioning sketch follows after the link below)
• Delivered analysis that was previously impossible to produce
• https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
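A sketch of the day-partitioning described above; the table, bucket, and partition values are illustrative:

-- External table partitioned by day; each day's Parquet files live
-- under their own S3 prefix.
CREATE EXTERNAL TABLE spectrum.daily_events (
aid INT
,loc CHAR(3)
)
PARTITIONED BY (dt DATE)
STORED AS PARQUET
LOCATION 's3://my-bucket/daily_events/';

-- Register each day's partition as its data lands.
ALTER TABLE spectrum.daily_events
ADD IF NOT EXISTS PARTITION (dt = '2017-08-01')
LOCATION 's3://my-bucket/daily_events/dt=2017-08-01/';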
Unlock your past.
Inspire your future.
Thank You!
Tony Gibbs tonygibb@amazon.com
Chris Sanders csanders@ancestry.com
Tanner Pratt tpratt@ancestry.com
More Related Content

What's hot

Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationDatabase Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationAmazon Web Services
 
Getting started with Amazon DynamoDB
Getting started with Amazon DynamoDBGetting started with Amazon DynamoDB
Getting started with Amazon DynamoDBAmazon Web Services
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...Amazon Web Services
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon EC2 and AWS Compute Services
Getting Started with Amazon EC2 and AWS Compute ServicesGetting Started with Amazon EC2 and AWS Compute Services
Getting Started with Amazon EC2 and AWS Compute ServicesAmazon Web Services
 
How to Migrate your Startup to AWS
How to Migrate your Startup to AWSHow to Migrate your Startup to AWS
How to Migrate your Startup to AWSAmazon Web Services
 
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...Amazon Web Services
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...Amazon Web Services
 
Getting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoGetting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoAmazon Web Services
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)Amazon Web Services
 
How to Scale to Millions of Users with AWS
How to Scale to Millions of Users with AWSHow to Scale to Millions of Users with AWS
How to Scale to Millions of Users with AWSAmazon Web Services
 
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesAmazon Web Services
 
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for RedisManaging Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for RedisAmazon Web Services
 
Getting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDBGetting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDBAmazon Web Services
 
SRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon AuroraSRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon AuroraAmazon Web Services
 

What's hot (20)

Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationDatabase Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
 
Getting started with Amazon DynamoDB
Getting started with Amazon DynamoDBGetting started with Amazon DynamoDB
Getting started with Amazon DynamoDB
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
 
Getting Started with Amazon EC2 and AWS Compute Services
Getting Started with Amazon EC2 and AWS Compute ServicesGetting Started with Amazon EC2 and AWS Compute Services
Getting Started with Amazon EC2 and AWS Compute Services
 
How to Migrate your Startup to AWS
How to Migrate your Startup to AWSHow to Migrate your Startup to AWS
How to Migrate your Startup to AWS
 
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
 
Getting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoGetting started with amazon aurora - Toronto
Getting started with amazon aurora - Toronto
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
 
How to Scale to Millions of Users with AWS
How to Scale to Millions of Users with AWSHow to Scale to Millions of Users with AWS
How to Scale to Millions of Users with AWS
 
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
 
Accelerating DynamoDB with DAX
Accelerating DynamoDB with DAXAccelerating DynamoDB with DAX
Accelerating DynamoDB with DAX
 
AWS RDS Migration Tool
AWS RDS Migration Tool AWS RDS Migration Tool
AWS RDS Migration Tool
 
Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute Services
 
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for RedisManaging Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
 
Getting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDBGetting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDB
 
SRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon AuroraSRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon Aurora
 

Similar to SRV405 Ancestry's Journey to Amazon Redshift

Data Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon RedshiftData Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSCobus Bernard
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftAmazon Web Services
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataAmazon Web Services
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & SpectrumABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & SpectrumAmazon Web Services
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 

Similar to SRV405 Ancestry's Journey to Amazon Redshift (20)

Data Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon RedshiftData Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big Data
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & SpectrumABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
ABD304-R-Best Practices for Data Warehousing with Amazon Redshift & Spectrum
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 
Presentation
PresentationPresentation
Presentation
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringWSO2
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseWSO2
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Recently uploaded (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

SRV405 Ancestry's Journey to Amazon Redshift

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tony Gibbs, Data Warehousing Solutions Architect at AWS Chris Sanders, Director Data Services at Ancestry Tanner Pratt, Manager Data Warehousing at Ancestry August 14, 2017 Ancestry’s Journey to Amazon Redshift
  • 2. Deep Dive Overview • Amazon Redshift history and development • Cluster architecture • Concepts and terminology • New & upcoming features • Ancestry’s journey to Amazon Redshift • Open Q&A
  • 3. Amazon Redshift History & Development
  • 4. Columnar MPP OLAP IAMAmazon VPC Amazon SWF Amazon S3 AWS KMS Amazon Route 53 Amazon CloudWatch Amazon EC2 PostgreSQL Amazon Redshift
  • 5. February 2013 August 2017 > 100 Significant Patches > 150 Significant Features
  • 6. Amazon Redshift Cluster Architecture
  • 7. Amazon Redshift Cluster Architecture Massively parallel, shared nothing Leader node • SQL endpoint • Stores metadata • Coordinates parallel SQL processing Compute nodes • Local, columnar storage • Executes queries in parallel • Load, backup, restore 10 GigE (HPC) Ingestion Backup Restore SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores S3 / Amazon EMR / DynamoDB / SSH JDBC/ODBC 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node Leader Node
  • 9. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 • Accessing dt with row storage: o Need to read everything o Unnecessary I/O
  • 10. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt Designed for I/O Reduction CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 • Accessing dt with columnar storage o Only scan blocks for relevant column
  • 11. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt CREATE TABLE deep_dive ( aid INT ENCODE LZO ,loc CHAR(3) ENCODE BYTEDICT ,dt DATE ENCODE RUNLENGTH ); aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 • Columns grow and shrink independently • Reduces storage requirements • Reduces I/O
  • 12. Designed for I/O Reduction Columnar storage Data compression Zone maps aid loc dt 1 SFO 2016-09-01 2 JFK 2016-09-14 3 SFO 2017-04-01 4 JFK 2017-05-14 aid loc dt CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); • In-memory block metadata • Contains per-block MIN and MAX value • Effectively prunes blocks that cannot contain data for a given query • Eliminates unnecessary I/O
  • 13. SELECT COUNT(*) FROM deep_dive WHERE dt = '09-JUNE-2013' MIN: 01-JUNE-2013 MAX: 20-JUNE-2013 MIN: 08-JUNE-2013 MAX: 30-JUNE-2013 MIN: 12-JUNE-2013 MAX: 20-JUNE-2013 MIN: 02-JUNE-2013 MAX: 25-JUNE-2013 Unsorted Table MIN: 01-JUNE-2013 MAX: 06-JUNE-2013 MIN: 07-JUNE-2013 MAX: 12-JUNE-2013 MIN: 13-JUNE-2013 MAX: 18-JUNE-2013 MIN: 19-JUNE-2013 MAX: 24-JUNE-2013 Sorted by Date Zone Maps
  • 14. Terminology and Concepts: Data Sorting • Goals: • Physically order rows of table data based on certain column(s) • Optimize effectiveness of zone maps • Enable MERGE JOIN operations • Impact: • Enables rrscans to prune blocks by leveraging zone maps • Overall reduction in block I/O • Achieved with the table property SORTKEY defined over one or more columns • Optimal SORTKEY is dependent on: • Query patterns • Data profile • Business requirements
  • 15. Terminology and Concepts: Slices A slice can be thought of like a “virtual compute node” • Unit of data partitioning • Parallel query processing Facts about slices: • Each compute node has either 2, 16, or 32 slices • Table rows are distributed to slices • A slice processes only its own data
  • 16. Data Distribution • Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster: • KEY: Value is hashed, same value goes to same location (slice) • ALL: Full table data goes to first slice of every node • EVEN: Round robin • Goals: • Distribute data evenly for parallel processing • Minimize data movement during query processing KEY ALL Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 EVEN
• 17. Data Distribution: Example

CREATE TABLE deep_dive (
  aid INT      --audience_id
  ,loc CHAR(3) --location
  ,dt DATE     --date
)
DISTSTYLE (EVEN|KEY|ALL);

[Diagram: table deep_dive, with user columns (aid, loc, dt) and system columns (ins, del, row), spread across CN1 (Slices 0–1) and CN2 (Slices 2–3)]
• 18. Data Distribution: EVEN Example

CREATE TABLE deep_dive (
  aid INT      --audience_id
  ,loc CHAR(3) --location
  ,dt DATE     --date
)
DISTSTYLE EVEN;

INSERT INTO deep_dive VALUES
  (1, 'SFO', '2016-09-01'),
  (2, 'JFK', '2016-09-14'),
  (3, 'SFO', '2017-04-01'),
  (4, 'JFK', '2017-05-14');

• Round robin: each of the four slices on CN1 and CN2 ends up with one row (Rows: 1 per slice)
• (3 user columns + 3 system columns) x (4 slices) = 24 blocks (24 MB)
• 19. Data Distribution: KEY Example #1

CREATE TABLE deep_dive (
  aid INT      --audience_id
  ,loc CHAR(3) --location
  ,dt DATE     --date
)
DISTSTYLE KEY DISTKEY (loc);

INSERT INTO deep_dive VALUES
  (1, 'SFO', '2016-09-01'),
  (2, 'JFK', '2016-09-14'),
  (3, 'SFO', '2017-04-01'),
  (4, 'JFK', '2017-05-14');

• loc has only two distinct values, so all rows hash to two slices: one slice holds both 'SFO' rows and another holds both 'JFK' rows (Rows: 2 each); the other two slices stay empty
• (3 user columns + 3 system columns) x (2 slices with data) = 12 blocks (12 MB)
• 20. Data Distribution: KEY Example #2

CREATE TABLE deep_dive (
  aid INT      --audience_id
  ,loc CHAR(3) --location
  ,dt DATE     --date
)
DISTSTYLE KEY DISTKEY (aid);

INSERT INTO deep_dive VALUES
  (1, 'SFO', '2016-09-01'),
  (2, 'JFK', '2016-09-14'),
  (3, 'SFO', '2017-04-01'),
  (4, 'JFK', '2017-05-14');

• aid has four distinct values that hash across all four slices, so each slice receives one row (Rows: 1 per slice)
• (3 user columns + 3 system columns) x (4 slices) = 24 blocks (24 MB)
• 21. Data Distribution: ALL Example

CREATE TABLE deep_dive (
  aid INT      --audience_id
  ,loc CHAR(3) --location
  ,dt DATE     --date
)
DISTSTYLE ALL;

INSERT INTO deep_dive VALUES
  (1, 'SFO', '2016-09-01'),
  (2, 'JFK', '2016-09-14'),
  (3, 'SFO', '2017-04-01'),
  (4, 'JFK', '2017-05-14');

• The entire table (Rows: 4) is replicated to the first slice of each node (Slice 0 on CN1, Slice 2 on CN2); the remaining slices hold no rows
• (3 user columns + 3 system columns) x (2 slices, one per node) = 12 blocks (12 MB)
• 22. Terminology and Concepts: Data Distribution
• KEY
o Use when the key yields an even distribution of data
o Joins are performed between large fact/dimension tables
o Optimizes MERGE JOINs and GROUP BYs (see the collocated-join sketch below)
• ALL
o Small and medium-size dimension tables (< 2–3M rows)
• EVEN
o Use when a key cannot produce an even distribution
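To make the KEY guidance concrete, here is a hedged sketch of collocating a join; the audience dimension is hypothetical, invented only to pair with deep_dive on aid:

CREATE TABLE audience (   -- hypothetical dimension table
  aid  INT
  ,name VARCHAR(64)
)
DISTSTYLE KEY DISTKEY (aid);

CREATE TABLE deep_dive (
  aid INT
  ,loc CHAR(3)
  ,dt  DATE
)
DISTSTYLE KEY DISTKEY (aid);

-- Rows with the same aid hash to the same slice in both tables,
-- so the join runs locally on each slice with no redistribution
SELECT a.name, COUNT(*)
FROM deep_dive d
JOIN audience a ON a.aid = d.aid
GROUP BY a.name;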
  • 23. New & Upcoming Features
• 24. Enter Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
• Fast @ exabyte scale
• Elastic & highly available
• On-demand, pay-per-query
• High concurrency: multiple clusters access the same data
• No ETL: query data in place using open file formats
• Full Amazon Redshift SQL support
• 25. Paradigm Shift Enabled by Redshift Spectrum
• Traditional approach: analyze subsets of data
o Had to pick and choose which data you wanted to analyze
o Analyze only the data that fits in your data warehouse
• Redshift Spectrum approach: analyze ALL available data
o Analyze any of the data in your data lake
• 26. Query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY …
[Diagram: the query arrives over JDBC/ODBC at the Amazon Redshift cluster, which fans the S3 scan out to Redshift Spectrum nodes (1, 2, 3, 4 … N); those nodes read Amazon S3 (exabyte-scale object storage) using table definitions from a data catalog (Apache Hive Metastore)]
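To ground the diagram, a hedged sketch of how such an external table might be defined; the schema name, catalog database, IAM role ARN, bucket, and columns are all illustrative:

CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'spectrumdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

CREATE EXTERNAL TABLE spectrum.ext_table (
  aid INT
  ,loc CHAR(3)
)
PARTITIONED BY (dt DATE)
STORED AS PARQUET
LOCATION 's3://my-bucket/deep-dive/';

-- Queried like a local table; the S3 scan runs on Spectrum nodes
SELECT loc, COUNT(*)
FROM spectrum.ext_table
GROUP BY loc;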
• 27. Amazon Redshift Spectrum is secure
• End-to-end data encryption:
o Encrypt S3 data using SSE and AWS KMS
o Encrypt all Amazon Redshift data using KMS, AWS CloudHSM, or your on-premises HSMs
o Enforce SSL with perfect forward secrecy using ECDHE
• Virtual private cloud: Amazon Redshift leader node in your VPC; compute nodes in a private VPC; Redshift Spectrum nodes in a private VPC, storing no state
• Alerts & notifications: communicate event-specific notifications via email, text message, or call with Amazon SNS
• Audit logging: all API calls are logged using AWS CloudTrail; all SQL statements are logged within Amazon Redshift
• Certifications & compliance: SOC1/2/3, PCI DSS, FedRAMP, HIPAA/BAA
• 28. Customers Love Amazon Redshift Spectrum
• “Redshift Spectrum will let us expand the universe of the data we analyze to hundreds of petabytes over time. This is truly a game changer, and we can think of no other system in the world that can get us there.”
• “Multiple teams can now query the same Amazon S3 data sets using both Amazon Redshift and Amazon EMR.”
• “Redshift Spectrum will help us scale yet further while also lowering our costs.”
• “Redshift Spectrum’s fast performance across massive data sets is unprecedented.”
• “Redshift Spectrum enables us to directly operate on our data in its native format in Amazon S3 with no preprocessing or transformation.”
• “Our data science team using Amazon EMR can now collaborate with our marketing and product teams using Redshift Spectrum to analyze the same Amazon S3 data sets.”
• 29. Query Monitoring Rules (QMR)
• Allows automatic handling of runaway (poorly written) queries
• Metrics with operators and values (e.g., query_cpu_time > 1000) create a predicate
• Multiple predicates can be AND-ed together to create a rule
• Multiple rules can be defined for a queue in WLM; these rules are OR-ed together
• If { rule } then [action]
o { rule : metric operator value }, e.g., rows_scanned > 100000
o Metric: cpu_time, query_blocks_read, rows_scanned, query execution time, CPU & I/O skew per slice, join_row_count, etc.
o Operator: <, >, ==
o Value: integer
o [action]: hop, log, abort
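When a rule fires, its action is recorded in a system table; a quick sketch of reviewing recent rule actions, assuming STL_WLM_RULE_ACTION is where the log/hop/abort events land:

-- Most recent QMR actions: which query tripped which rule, and what happened
SELECT userid, query, service_class, rule, action, recordtime
FROM stl_wlm_rule_action
ORDER BY recordtime DESC
LIMIT 20;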
• 30. IAM Authentication
• New Amazon Redshift ODBC/JDBC drivers
• Single sign-on via identity providers (e.g., ADFS backed by corporate Active Directory): grab the ticket (userid) and get a SAML assertion
• Maps to IAM user groups or an individual user
[Diagram: BI tools, SQL clients, and analytics tools on the client side connect through Amazon Redshift ODBC/JDBC; ADFS and corporate Active Directory federate through IAM into Amazon Redshift in AWS]
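For illustration only, a hedged sketch of what an ADFS-federated JDBC connection might look like with the new driver; the cluster endpoint, IdP host, and role ARN are placeholders, and the property names should be verified against your driver version:

jdbc:redshift:iam://examplecluster.abc123.us-west-2.redshift.amazonaws.com:5439/dev
# illustrative connection properties for ADFS federation
plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider
idp_host=adfs.example.com
idp_port=443
preferred_role=arn:aws:iam::123456789012:role/RedshiftFederatedQuery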
• 31. Coming Soon: SQL Scalar User-Defined Functions
Language SQL support added for scalar UDFs. Example:

CREATE OR REPLACE FUNCTION inet_ntoa(bigint)
RETURNS varchar(15) AS
$$
  SELECT CASE
    WHEN $1 BETWEEN 0 AND 4294967295 THEN
      (($1 >> 24) & 255) || '.' ||
      (($1 >> 16) & 255) || '.' ||
      (($1 >> 8)  & 255) || '.' ||
      (($1 >> 0)  & 255)
    ELSE NULL
  END
$$ LANGUAGE SQL IMMUTABLE;
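A quick usage check (my own example, not from the deck): 3232235777 is the integer form of 192.168.1.1.

SELECT inet_ntoa(3232235777);  -- returns '192.168.1.1'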
  • 32. Ancestry’s Journey to Amazon Redshift
• 33.
• 20 billion historical records
• 90 million family trees
• 10 billion profiles
• 330 million user-generated photos and stories
• 5 million people in genomics networks
• 34. Why DNA matching is so powerful
• Number of people in the room: 200
• Number of ways I could pick two different people in this room: 19,900
• Chance that a random pair of individuals share enough DNA to be considered fourth cousins: 0.0001875
• Chance that there is at least one pair of genetic fourth cousins in the room: 97.6% (1 - (1 - 0.0001875)^19,900 ≈ 0.976)
• 35. Where we were
• Actian Matrix
o On Premises
o Hardware Bound
o Single Admin
• SSIS for ETL
o 161 Packages
o 50+ Data Sources
• 36. How we successfully migrated. Step 1: PLAN
• Document ALL Components That Need to Migrate
• Identify Dependencies
• Prioritize Importance
• Create a Roadmap with Milestones
• Align Resources to Swimlanes
• Rally the Team
• 37. Roadmap
[Gantt chart, January through April (Q1–Q2), with infrastructure and data warehouse swimlanes: AWS infrastructure (VPC, subnet, network); AWS and DW security (ACLs, groups, IAM, encryption); DW environment design & PoC; DW infrastructure configuration (EDW, SOX, Tableau, monitoring, DR); design extract; extract and transfer data; design S3-to-Amazon-Redshift load; refactor and test ELT packages (161 packages, 5 resources at 80% allocation); load schema & data into Amazon Redshift; run ELT to “catch up” (transfer time determines catch-up time); export, transfer, and load SQL Server data; back up Tableau & MicroStrategy and restore to AWS; validate Tableau & MicroStrategy (900 Tableau workbooks); validate environment & cut over; run Matrix and Amazon Redshift in parallel]
Milestones:
• Jan 27: core infrastructure established; PoC complete
• Feb 7: Amazon Redshift environment established
• Feb 24: data loaded onto S3
• Mar 17: data loaded into Amazon Redshift; ELT packages migrated to AWS and verified
• Apr 7: Redshift “caught up”; begin running in parallel
• Apr 20: business live on Amazon Redshift
Status as of today, Apr 18:
• Cutover to Amazon Redshift completed 9 days ahead of schedule! Business is live on Amazon Redshift.
• ETL and Tools Migration 96% complete
• Tableau/MicroStrategy Migration 94% complete
• 38. How we successfully migrated. Step 2: ADJUST
• Review Tasks and Roadmap Frequently
• There Will Be Unknowns
• Evaluate Impacts
• Be Flexible and Agile
• 39. How we successfully migrated. Step 3: COMMUNICATE
• Internally with the Team
• Up with Executives and Management
• Out with Customers
• Be Transparent, Honest, and Realistic
• 40. Data Warehouse Migration Status Scorecard – 4/18
• AWS Infrastructure: Common Infrastructure 100% complete
• Security Management: 100% complete
• DW Infrastructure Configuration: 100% complete; Monitoring and DR Setup 100% complete
• DW Data Extract and Load to AWS: 100% complete
• ETL and Tools Migration: 96% complete
o 100% (165/165) migration complete
o 100% (127/127) ETL catch-up complete
o 85% (108/127) validation complete
o Upcoming: 4/19, 85% validation complete; 4/28, run Matrix and Amazon Redshift in parallel, 100% validation complete
• SQL Server Data Migration: 100% complete
• Tableau and MicroStrategy Migration: 94% complete
o Remaining: final Tableau backup and restore
o 4/12: test connectivity using the ELB/ENI for Tableau
o 4/18: upgrade MicroStrategy
o 4/19: MicroStrategy validation
o 4/21: Tableau validation
Status legend: not on schedule & high risk to complete on time / on schedule & moderate risk to complete on time / on track to deliver on time
• 41. How we successfully migrated. Step 4: RUN PARALLEL
• Start as SOON as Possible
• Quickly Identify Issues
• Added Work, but WORTH IT!
• Insights Are Invaluable
• 42. How we successfully migrated. Step 5: HAVE THE RIGHT PARTNER
• We Were Scared
• Meet Often
• Communicate Concerns
• Amazon Has Been AMAZING
• 43. How we successfully migrated. End result:
• Anticlimactic Failover
• Flawless Migration
• Zero Business Downtime
• 88 Business Days. 9 Days EARLY!
• 470 BILLION Records. Migrated AND Validated. 100% Accurate.
• 161 ETL Packages Migrated
• 600+ Tableau Workbooks Migrated
• 44. Lessons learned during migration
1. AWS Infrastructure Creation
2. Data Migration
3. Code Migration
• 45. Lessons learned during AWS infrastructure creation
How do I set up a production-level AWS account with all the appropriate services in 3 weeks?
• Preparation
• Practice
• Patience
• 46. Lessons learned during data migration
How do I get 80 TB of highly compressed data to the cloud securely, quickly, and accurately?
• AWS Snowball method
• AWS Direct Connect method
[Diagram: AWS Direct Connect vs. AWS Snowball]
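Whichever transfer path lands the extracts in S3, the Amazon Redshift load itself is a parallel COPY; a minimal sketch, with the bucket, prefix, IAM role, and file options all illustrative:

-- Loads all files under the prefix in parallel across the cluster's slices
COPY deep_dive
FROM 's3://my-migration-bucket/extracts/deep_dive/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftLoadRole'
GZIP
DELIMITER '|';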
• 48. Lessons learned during code migration
How do I migrate 161 ETL code packages with no business-logic loss or data loss?
• Automation of changes
• Strict tracking of changes
• No deleting source data
• 49. Optimized for the cloud
Changes we have made since migration:
• Change in mindset
• Node size and type change: 96 DC1.8XL -> 47 DS2.8XL
• Rewrite of ETL code to leverage commit blocks, temp tables, dist keys, sort keys, etc.

Before (2 commits hitting the commit queue):

DROP TABLE IF EXISTS foo.me;
CREATE TABLE foo.me AS (
  SELECT x, y FROM foo.you
);

After (1 commit):

BEGIN;
DROP TABLE IF EXISTS foo.me;
CREATE TABLE foo.me AS (
  SELECT x, y FROM foo.you
);
COMMIT;
• 50. Amazon Redshift Spectrum use case
• Had 3 simple use cases of analysis that needed to be performed on the data
• Loaded 20 TB of data from source systems to Amazon S3 in raw form, then converted it to Parquet
• Partitioned by day
• Delivered analysis that was previously impossible to produce
• https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
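Day partitions on an external table are registered explicitly; a sketch of adding one day, reusing the hypothetical spectrum.ext_table from earlier (the S3 path is illustrative):

-- Registers the 2017-08-01 slice of the data lake as a queryable partition
ALTER TABLE spectrum.ext_table
ADD PARTITION (dt='2017-08-01')
LOCATION 's3://my-bucket/deep-dive/dt=2017-08-01/';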
• 52. Thank You!
Tony Gibbs: tonygibb@amazon.com
Chris Sanders: csanders@ancestry.com
Tanner Pratt: tpratt@ancestry.com