© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Migrating Your Data Warehouse to
Amazon Redshift
Ashok Sundaram
Solutions Architect
AWS/AWS Partner Program
DAT337
Arun Kannan
Solutions Architect
AWS/AWS Partner Program
Agenda
AWS Database Migration Service (AWS DMS)
AWS Schema Conversion Tool (AWS SCT)
Migration patterns
Best practices
Q&A
Amazon Redshift cluster architecture
• Leader node
  • SQL endpoint
  • Stores metadata
  • Coordinates parallel SQL processing
• Compute nodes
  • Local, columnar storage
  • Executes queries in parallel
  • Load, back up, restore
• Two hardware platforms
  • Optimized for data processing
  • DS2: HDD; scale from 2 TB to 2 PB
  • DC2: SSD; scale from 160 GB to 326 TB
[Diagram: SQL clients/BI tools connect to the leader node over JDBC/ODBC. The leader node and three compute nodes (each node shown with 128 GB RAM, 16 TB disk, 16 cores) are linked by 10 GigE (HPC). Ingestion, backup, and restore arrows connect the compute nodes with Amazon S3 / Amazon EMR / Amazon DynamoDB / SSH.]
Migrating to Amazon Redshift
Step 1: Convert or copy your schema
Source DB or DW -> AWS SCT (copy or convert schema) -> Destination DB or DW
Step 2: Move your data
Source DB or DW -> AWS DMS (data) -> Destination DB or DW
AWS SCT helps automate many database schema and code conversion tasks when migrating to Amazon Redshift
AWS SCT features
• Convert tables, views, and code
• Convert SQL in your application code
• Migration compatibility assessment
[Diagram: AWS DMS moves data into Amazon Redshift from three groups of sources: 1. Non-relational databases; 2. Relational databases; 3. Other sources. Azure SQL Database and Amazon S3 appear among the labeled sources.]
Migration using AWS DMS
Replication instance
o Size
o VPC
o Security group
o Encryption
Source database endpoint definition
o Source DB connection details
Target database endpoint definition
o Target DB connection details
AWS DMS task (refers to the items below)
o Source endpoint
o Target endpoint
o Replication instance
o Objects/tables
o Where clause
[Diagram: Running the task moves data from the source database through the replication instance, which loads it into Amazon Redshift with the COPY command.]
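The "Objects/tables" and "Where clause" items of a task are usually expressed as a JSON table-mapping document. The following is a minimal sketch of building one selection rule with a source filter in Python; the schema, table, and column names are hypothetical, and the helper `build_table_mapping` is an illustration rather than part of any AWS SDK.

```python
import json


def build_table_mapping(schema, table, filter_column=None, min_value=None):
    """Build a DMS-style table-mapping document with one selection rule."""
    rule = {
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "1",
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": "include",
    }
    if filter_column is not None:
        # Source filter: plays the role of the task's "where clause".
        rule["filters"] = [
            {
                "filter-type": "source",
                "column-name": filter_column,
                "filter-conditions": [
                    {"filter-operator": "gte", "value": str(min_value)}
                ],
            }
        ]
    return {"rules": [rule]}


# Hypothetical schema, table, and column names.
mapping = build_table_mapping("sales", "orders", "order_id", 1000)
print(json.dumps(mapping, indent=2))
```

The resulting JSON string is what a task definition would carry as its table mappings; creating the replication instance, endpoints, and task themselves is not shown here.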
SCT data extractors
Extract data from your data warehouse and migrate to Amazon Redshift
• Extracts through local migration agents
• Data is optimized for Amazon Redshift and saved in local files
• Files are loaded to an Amazon S3 bucket (through network or AWS Snowball) and then to Amazon Redshift
[Diagram: AWS SCT -> S3 bucket -> Amazon Redshift]
Migration using AWS SCT extraction agents
[Diagram: In the corporate data center, SCT coordinates multiple migration agents that extract from the data warehouse; the extracted files go to an Amazon S3 bucket and are then loaded into Amazon Redshift.]
AWS Snowball
• Scale and speed
  • 80 TB capacity
  • 10 Gbps connectivity
  • Parallel data transfer lets you move petabytes in a week
• Secure
  • Tamper-resistant enclosure
  • 256-bit encryption with AWS Key Management Service (AWS KMS)
  • Industry-standard TPM
• Simple
  • Manage entire process through AWS Management Console
  • Lightweight data transfer client
  • Notifications
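To put the capacity and connectivity figures in perspective, here is a rough back-of-the-envelope calculation in Python. It is a sketch only: the 1 Gbps WAN link and the 80% efficiency factor are assumptions chosen for illustration, decimal terabytes are used, and shipping time and the import into Amazon S3 are ignored.

```python
def transfer_hours(data_tb, link_gbps, efficiency=1.0):
    """Hours needed to move data_tb terabytes over a link_gbps link."""
    bits = data_tb * 1e12 * 8                      # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Filling one 80 TB device at the full 10 Gbps line rate.
hours_snowball = transfer_hours(80, 10)
# The same 80 TB over a hypothetical 1 Gbps WAN link at 80% efficiency.
hours_wan = transfer_hours(80, 1, efficiency=0.8)
print(f"{hours_snowball:.1f} h vs {hours_wan:.1f} h")  # 17.8 h vs 222.2 h
```

Under these assumptions a local copy onto the device finishes in under a day, while the same volume over the assumed WAN link takes more than nine days, which is the kind of gap that motivates the device-based path for very large warehouses.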
Migrating large data warehouses using Snowball
[Diagram: In the corporate data center, SCT coordinates multiple migration agents that extract from the data warehouse onto an AWS Snowball device; the data is then loaded to an Amazon S3 bucket and from there into Amazon Redshift.]
Comparison of different approaches
Parameter | AWS DMS | AWS SCT extraction agents | AWS SCT extraction agents and AWS Snowball
Size | Moderate | Large | Very large
CDC (change data capture) | Yes | No | No
Setup | SCT; replication instance; source/target endpoints; tasks | SCT; Amazon S3 bucket; extraction agents | SCT; Amazon S3 bucket; extraction agents; Snowball job
Considerations | Latency; DW size | On-premises networking; agent host configuration | On-premises networking; host configuration; Snowball transfer time
Data compression | No | Yes | Yes
Supported engines | Oracle, Amazon RDS, Aurora, SQL Server | Oracle, SQL Server DW, Greenplum, Netezza, Teradata, Vertica, Amazon Redshift | Oracle, SQL Server DW, Greenplum, Netezza, Teradata, Vertica, Amazon Redshift
Amazon Redshift architecture: Slices
A slice can be thought of as a virtual compute node
• Unit of data partitioning
• Parallel query processing
Facts about slices
• Each compute node has 2, 16, or 32 slices
• Table rows are distributed to slices
• A slice processes only its own data
Data ingestion: COPY statement
• The number of input files should be a multiple of the number of slices
• When a single file is split into 16 input files, all 16 slices of a DC2.8XL compute node work in parallel, which maximizes ingestion performance
• COPY continues to scale linearly as you add nodes
• Recommendation: use delimited files of 1 MB to 1 GB each after gzip compression
[Diagram: 16 input files, one per slice (slices 0 to 15) of a DC2.8XL compute node]
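The file-splitting advice can be sketched in Python as follows. This is an illustration under stated assumptions: `split_for_copy` is a hypothetical helper, the rows are made-up pipe-delimited data, and the table, bucket, prefix, and IAM role in the COPY statement are placeholders. Uploading the parts to Amazon S3 is not shown.

```python
import gzip
import io


def split_for_copy(lines, slices, multiple=1):
    """Deal lines round-robin into slices * multiple gzip-compressed parts."""
    n_parts = slices * multiple
    buckets = [[] for _ in range(n_parts)]
    for i, line in enumerate(lines):
        buckets[i % n_parts].append(line)
    parts = []
    for bucket in buckets:
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
            gz.write("".join(bucket).encode("utf-8"))
        parts.append(buf.getvalue())
    return parts


# Hypothetical pipe-delimited rows; 16 slices as on the slide's DC2.8XL node.
rows = [f"{i}|item-{i}\n" for i in range(1000)]
parts = split_for_copy(rows, slices=16)

# Hypothetical table, bucket, prefix, and IAM role. COPY loads every object
# whose key starts with the given prefix, so one statement covers all parts.
copy_sql = (
    "COPY sales.orders "
    "FROM 's3://example-bucket/orders/part_' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole' "
    "DELIMITER '|' GZIP;"
)
print(len(parts), copy_sql)
```

Passing `multiple=2` would produce 32 parts, which still keeps the file count a multiple of the slice count.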
Data distribution
Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster:
• KEY: Value is hashed, same value goes to same location (slice)
• ALL: Full table data goes to the first slice of every node
• EVEN: Round-robin
Goals:
• Distribute data evenly for parallel processing
• Minimize data movement during query processing
[Diagram: One panel each for ALL, KEY, and EVEN, showing rows placed on a two-node cluster (Node 1: Slice 1, Slice 2; Node 2: Slice 3, Slice 4)]
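The three distribution styles can be illustrated with a toy simulation in Python on the same two-node, four-slice layout. This is only a sketch: slices are numbered from 0 here, the rows are invented, and `zlib.crc32` is a stand-in hash, not the function Amazon Redshift actually uses.

```python
import zlib
from collections import defaultdict

NODES = 2
SLICES_PER_NODE = 2
TOTAL_SLICES = NODES * SLICES_PER_NODE


def distribute(rows, style, key=None):
    """Return {slice_index: [rows]} for a toy KEY, ALL, or EVEN placement."""
    placement = defaultdict(list)
    for i, row in enumerate(rows):
        if style == "EVEN":
            placement[i % TOTAL_SLICES].append(row)  # round-robin
        elif style == "KEY":
            # Stand-in hash: equal key values always land on the same slice.
            h = zlib.crc32(str(row[key]).encode("utf-8"))
            placement[h % TOTAL_SLICES].append(row)
        elif style == "ALL":
            # Full copy of the table on the first slice of every node.
            for node in range(NODES):
                placement[node * SLICES_PER_NODE].append(row)
        else:
            raise ValueError(f"unknown distribution style: {style}")
    return dict(placement)


# Hypothetical rows with a repeating customer_id.
rows = [{"customer_id": i % 3, "amount": i} for i in range(8)]
even = distribute(rows, "EVEN")
by_key = distribute(rows, "KEY", key="customer_id")
everywhere = distribute(rows, "ALL")
print({s: len(r) for s, r in even.items()})
```

With KEY, all rows for one `customer_id` sit on a single slice, so a join on that column needs no data movement, but a skewed key would leave some slices with far more rows than others. That is the tension between the two goals listed above.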
VACUUM and ANALYZE
• VACUUM globally sorts the table and removes rows that are marked as deleted
• ANALYZE collects table statistics for optimal query planning
Best practices:
• Run VACUUM only as necessary, typically nightly or weekly
• Consider a deep copy (re-creating the table and copying the data) for large or wide tables
• Run ANALYZE periodically after ingestion, on just the columns used in WHERE predicates
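As an illustration of these practices, the following Python sketch only builds the SQL text; it does not execute anything. The table and column names are hypothetical, and the deep copy shown is one common variant (CREATE TABLE ... LIKE, copy the rows, swap the names) using unqualified table names.

```python
def maintenance_sql(table, predicate_columns=None):
    """SQL for a routine VACUUM plus ANALYZE on selected columns."""
    if predicate_columns:
        analyze = f"ANALYZE {table} ({', '.join(predicate_columns)});"
    else:
        analyze = f"ANALYZE {table};"
    return [f"VACUUM {table};", analyze]


def deep_copy_sql(table):
    """SQL for a deep copy: re-create the table, copy rows, swap names."""
    tmp = f"{table}_copy"
    return [
        f"CREATE TABLE {tmp} (LIKE {table});",
        f"INSERT INTO {tmp} (SELECT * FROM {table});",
        f"DROP TABLE {table};",
        f"ALTER TABLE {tmp} RENAME TO {table};",
    ]


# Hypothetical table and predicate columns.
for stmt in maintenance_sql("orders", ["order_date", "customer_id"]):
    print(stmt)
for stmt in deep_copy_sql("orders"):
    print(stmt)
```

A scheduler could run the first list nightly or weekly and reserve the deep copy for tables where a full VACUUM would take too long.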
Migration of stored procedures
# | Stored procedure content (high level) | Complexity (high level)
1 | Dynamic DDLs and DMLs; simple transformations; simple control statements (for loops, while-do loops) | Small
2 | Dynamic DDLs and DMLs; complex transformations; control statements and cursors; aggregations/summaries | Medium
3 | Dynamic DDLs and DMLs; complex transformations; control statements and cursors; complex workflows; joins/aggregations/summaries; data quality checks and cleansing; must meet certain performance criteria | Complex
Stored procedures are not currently supported in Amazon Redshift (as of this 2018 session). The workaround for migrating a stored procedure depends on its complexity level.
Migration of stored procedures
• Amazon Redshift UDFs: Non-SQL processing & Scala to Python UDFs
• Amazon EMR + AWS Glue: Rewrite the stored procedure as Amazon EMR workloads using Pig, Hive, and MapReduce or Spark, and then perform a bulk load into the Amazon Redshift database
• ELT using SQL files: Convert the stored procedure into a series of SQL statements in an Amazon S3 file
• Data integration tools: Convert the stored procedure workload into an ETL workload
• Python: Convert stored procedures to Python
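The "ELT using SQL files" and "Python" options can be combined into a small driver that runs a procedure's steps as plain SQL statements in order. The sketch below is an assumption-heavy illustration: it uses an in-memory SQLite database purely as a runnable stand-in for a Redshift connection, the script content and table names are invented, and the statement splitter is naive (it assumes no semicolons inside string literals or comments).

```python
import sqlite3


def run_sql_script(conn, script):
    """Run each ';'-terminated statement in order; roll back on failure."""
    # Naive split: assumes no ';' inside string literals or comments.
    statements = [s.strip() for s in script.split(";") if s.strip()]
    cur = conn.cursor()
    try:
        for stmt in statements:
            cur.execute(stmt)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    return len(statements)


# Hypothetical procedure body rewritten as plain SQL steps.
SCRIPT = """
CREATE TABLE staging_orders (id INTEGER, amount REAL);
INSERT INTO staging_orders VALUES (1, 10.0), (2, 25.5), (3, 4.5);
CREATE TABLE order_totals AS
    SELECT COUNT(*) AS n, SUM(amount) AS total FROM staging_orders;
"""

conn = sqlite3.connect(":memory:")  # stand-in for a Redshift connection
executed = run_sql_script(conn, SCRIPT)
result = conn.execute("SELECT n, total FROM order_totals").fetchone()
print(executed, result)  # 3 (3, 40.0)
```

In a real migration the script text would be read from the file kept in Amazon S3 and the connection would come from a Redshift-compatible driver; loops and branching from the original procedure would move into the surrounding Python code.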
Other SCT features
• SCT recommends sort keys and distribution keys to optimize your database
• SCT extension pack wizard can help you install AWS Lambda functions and Python libraries to emulate the features that can’t be converted
Migration phases
Phase | Description | Automation
1 | Deprecate any objects in source database that are no longer needed | -
2 | Assessment of schema conversion state between source and target | SCT
3 | Remediation of schema conversion issues (source, scripts, or target) | SCT
4 | Application conversion/remediation | SCT
5 | Data migration | DMS / SCT
6 | Functional testing of the entire system | -
7 | Performance tuning | SCT
8 | Deployment | -
Thank you!
Ashok Sundaram – sunashok@amazon.com
Arun Kannan - arunkan@amazon.com
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
