Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Migrating Your Data Warehouse to
Amazon Redshift
Ashok Sundaram
Solutions Architect
AWS/AWS Partner Program
DAT337
Arun Kannan
Solutions Architect
AWS/AWS Partner Program
Agenda
AWS Database Migration Service (AWS DMS)
AWS Schema Conversion Tool (AWS SCT)
Migration patterns
Best practices
Q&A
Amazon Redshift cluster architecture
• Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
• Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, back up, restore
• Two hardware platforms
• Optimized for data processing
• DS2: HDD; scale from 2 TB to 2 PB
• DC2: SSD; scale from 160 GB to 326 TB
[Diagram: SQL clients/BI tools connect over JDBC/ODBC to the leader node, which coordinates compute nodes (each with 128 GB RAM, 16 TB disk, and 16 cores) over a 10 GigE (HPC) network; ingestion, backup, and restore flow through Amazon S3, Amazon EMR, Amazon DynamoDB, or SSH.]
Migrating to Amazon Redshift
Step 1: Convert or copy your schema (source DB or DW → AWS SCT → destination DB or DW)
Step 2: Move your data (source DB or DW → AWS DMS → destination DB or DW)
AWS SCT helps automate many database schema and code conversion tasks during migration.
[Diagram: AWS SCT converts source schemas for Amazon Redshift.]
AWS SCT features
• Convert tables, views,
and code
• Convert SQL in your
application code
• Migration compatibility
assessment
[Diagram: AWS DMS replicates from three source categories into Amazon Redshift:
1. Non-relational databases
2. Relational databases (for example, Azure SQL Database)
3. Other sources (for example, Amazon S3)]
Migration using AWS DMS
• Replication instance: size, VPC, security group, encryption
• Source database endpoint definition: source DB connection details
• Target database endpoint definition: target DB connection details
• AWS DMS task: refers to the source endpoint, the target endpoint, and the replication instance; specifies the objects/tables to migrate and an optional WHERE clause
When the task runs, data is read from the source database and loaded into Amazon Redshift using the COPY command.
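The objects/tables and WHERE-clause selection described above are expressed in an AWS DMS task as table-mapping JSON. A minimal sketch (the schema, table pattern, and column names here are hypothetical):

```json
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-sales-tables",
      "object-locator": {
        "schema-name": "SALES",
        "table-name": "%"
      },
      "rule-action": "include",
      "filters": [
        {
          "filter-type": "source",
          "column-name": "ORDER_DATE",
          "filter-conditions": [
            { "filter-operator": "gte", "value": "2018-01-01" }
          ]
        }
      ]
    }
  ]
}
```

The selection rule plays the role of the objects/tables list, and the filter conditions correspond to the WHERE clause applied at the source.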
SCT data extractors
Extract data from your data warehouse and migrate to Amazon Redshift:
• Extraction runs through local migration agents
• Data is optimized for Amazon Redshift and saved in local files
• Files are loaded to an Amazon S3 bucket (through the network or AWS Snowball) and then into Amazon Redshift
Migration using AWS SCT extraction agents
[Diagram: In the corporate data center, SCT coordinates multiple migration agents that extract from the data warehouse; extracted files flow to an Amazon S3 bucket and then into Amazon Redshift.]
AWS Snowball
• Scale and speed
• 80 TB capacity
• 10 Gbps connectivity
• Parallel data transfer enables petabytes to be transferred in a week
• Secure
• Tamper-resistant enclosure
• 256-bit encryption with AWS Key Management Service (AWS KMS)
• Industry-standard TPM
• Simple
• Manage entire process through AWS Management Console
• Lightweight data transfer client
• Notifications
Migrating large data warehouses using Snowball
[Diagram: SCT-coordinated migration agents in the corporate data center extract from the data warehouse onto an AWS Snowball appliance, which ships the data to an Amazon S3 bucket for the load into Amazon Redshift.]
Comparison of different approaches

AWS DMS
• Size: moderate
• CDC: yes
• Setup: SCT, replication instance, source/target endpoints, tasks
• Considerations: latency, DW size
• Data compression: no
• Supported engines: Oracle, Amazon RDS, Aurora, SQL Server

AWS SCT extraction agents
• Size: large
• CDC: no
• Setup: SCT, Amazon S3 bucket, extraction agents
• Considerations: on-premises networking, agent host configuration
• Data compression: yes
• Supported engines: Oracle, SQL Server DW, Greenplum, Netezza, Teradata, Vertica, Amazon Redshift

AWS SCT extraction agents and AWS Snowball
• Size: very large
• CDC: no
• Setup: SCT, Amazon S3 bucket, extraction agents, Snowball job
• Considerations: on-premises networking, host configuration, Snowball transfer time
• Data compression: yes
• Supported engines: Oracle, SQL Server DW, Greenplum, Netezza, Teradata, Vertica, Amazon Redshift
Amazon Redshift architecture: Slices
A slice can be thought of as a virtual compute node:
• Unit of data partitioning
• Parallel query processing
Facts about slices:
• Each compute node has 2, 16, or 32 slices
• Table rows are distributed to slices
• A slice processes only its own data
Data ingestion: COPY statement
• The number of input files should be a multiple of the number of slices
• Splitting a single file into 16 input files lets all slices work in parallel, maximizing ingestion performance
• COPY continues to scale linearly as you add nodes
Recommendation: use delimited files of 1 MB to 1 GB after gzip compression
[Diagram: 16 input files loading in parallel across slices 0-15 of a DC2.8XL compute node.]
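The file-splitting step can be sketched in a few lines. This is an illustrative helper, not an AWS tool; the function name and the round-robin strategy are assumptions, and it keeps whole lines so each slice loads complete rows:

```python
import gzip
import os


def split_for_copy(src_path, out_dir, n_parts):
    """Split a delimited text file into n_parts gzip files, keeping whole
    lines, so each Amazon Redshift slice can load one file in parallel."""
    with open(src_path, "rb") as f:
        lines = f.readlines()
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(src_path)
    paths = []
    for i in range(n_parts):
        # Round-robin keeps part sizes roughly even for similarly sized rows.
        chunk = lines[i::n_parts]
        path = os.path.join(out_dir, "%s.part%02d.gz" % (base, i))
        with gzip.open(path, "wb") as out:
            out.writelines(chunk)
        paths.append(path)
    return paths
```

With 16 parts on a 16-slice node, a single COPY referencing the common Amazon S3 prefix lets every slice ingest one file at once.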
Data distribution
Distribution style is a table property that dictates how the table's data is distributed throughout the cluster:
• KEY: The value is hashed; the same value goes to the same location (slice)
• ALL: The full table data goes to the first slice of every node
• EVEN: Round-robin
Goals:
• Distribute data evenly for parallel processing
• Minimize data movement during query processing
[Diagram: KEY, ALL, and EVEN placement across two nodes with two slices each.]
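The KEY and EVEN styles can be sketched in a few lines. The hash function here is only illustrative; Amazon Redshift's internal hash is not specified:

```python
import hashlib


def slice_for_key(value, n_slices):
    """KEY distribution sketch: hash the distribution-key value so rows
    with the same key always land on the same slice (illustrative hash)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_slices


def distribute_even(rows, n_slices):
    """EVEN distribution sketch: round-robin rows across slices."""
    slices = [[] for _ in range(n_slices)]
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    return slices
```

Because equal keys hash to the same slice, tables that join on their distribution key are colocated, which is what minimizes data movement during query processing.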
VACUUM and ANALYZE
VACUUM globally sorts the table and removes rows that are marked as deleted.
ANALYZE collects table statistics for optimal query planning.
Best practices:
• Run VACUUM only as necessary, typically nightly or weekly
• Consider a deep copy (re-creating the table and copying the data) for larger or wide tables
• Run ANALYZE periodically after ingestion, on just the columns that WHERE predicates filter on
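The deep-copy alternative and column-scoped ANALYZE can be sketched as follows (table and column names are hypothetical):

```sql
-- Deep copy: re-create the table and reload the data,
-- instead of running a long VACUUM on a large table.
CREATE TABLE sales_new (LIKE sales);
INSERT INTO sales_new (SELECT * FROM sales);
DROP TABLE sales;
ALTER TABLE sales_new RENAME TO sales;

-- Routine maintenance: sort/reclaim, then refresh statistics
-- on just the columns used in WHERE predicates.
VACUUM sales;
ANALYZE sales (sale_date, region);
```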
Migration of stored procedures
Stored procedures are not currently supported in Amazon Redshift; the workaround for migrating a stored procedure depends on its complexity:
1. Small: dynamic DDLs and DMLs; simple transformations; simple control statements (for loops, while-do loops)
2. Medium: dynamic DDLs and DMLs; complex transformations; control statements and cursors; aggregations/summaries
3. Complex: dynamic DDLs and DMLs; complex transformations; control statements and cursors; complex workflows; joins/aggregations/summaries; data quality checks and cleansing; must meet certain performance criteria
Migration of stored procedures
• Amazon Redshift UDFs: Convert non-SQL processing to scalar Python UDFs
• Amazon EMR + AWS Glue: Rewrite the stored procedure as Amazon EMR workloads using Pig, Hive, and MapReduce or Spark, and then bulk load into the Amazon Redshift database
• ELT using SQL files: Convert the stored procedure into a series of SQL statements stored in an Amazon S3 file
• Data integration tools: Convert the stored-procedure workload into an ETL workload
• Python: Convert stored procedures to Python
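For the UDF route, the procedural logic becomes a pure-Python function body. A hypothetical example of the kind of scalar logic (here, phone-number cleanup lifted out of a stored procedure) that fits an Amazon Redshift Python UDF:

```python
def normalize_phone(raw):
    """Hypothetical scalar logic: strip non-digits and format a 10-digit
    US phone number; return None for anything unparseable. A body like
    this is what an Amazon Redshift Python UDF wraps."""
    if raw is None:
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    # Drop a leading country code of 1 on 11-digit numbers.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return "(%s) %s-%s" % (digits[:3], digits[3:6], digits[6:])
```

Such a function is then registered in the cluster with CREATE FUNCTION and called inline from SQL, replacing the row-by-row loop in the original procedure.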
Other SCT features
• SCT recommends sort keys and distribution keys to optimize your database
• The SCT extension pack wizard can help you install AWS Lambda functions and Python libraries to emulate the features that can't be converted
Migration phases
(The tool in parentheses indicates where AWS SCT or AWS DMS automates the phase.)
1. Deprecate any objects in the source database that are no longer needed
2. Assess the schema conversion state between source and target (SCT)
3. Remediate schema conversion issues in the source, scripts, or target (SCT)
4. Application conversion/remediation (SCT)
5. Data migration (DMS/SCT)
6. Functional testing of the entire system
7. Performance tuning (SCT)
8. Deployment
Thank you!
Ashok Sundaram - sunashok@amazon.com
Arun Kannan - arunkan@amazon.com