One of the most critical roles of an IT department is to protect and serve its corporate data. As a result, IT departments spend tremendous amounts of resources developing, designing, testing, and optimizing data recovery and replication options in order to improve data availability and service response time. This session outlines replication challenges, key design patterns, and methods commonly used in today’s IT environment. Furthermore, the session provides different data replication solutions available in the AWS cloud. Finally, the session outlines several key factors to be considered when implementing data replication architectures in the AWS cloud.
2. Thomas Jefferson first acquired the letter-copying device he called "the finest invention of the present age" in March of 1804.
http://www.monticello.org/site/house-and-gardens/polygraph
3. Agenda
• Data replication design options in AWS
• Replication design factors and challenges
• Use cases
– Do-it-yourself options
– Managed and built-in AWS options
• Demos
4. Data Replication Capabilities
AWS and partner data replication capabilities span every layer of the platform: deployment & administration, application services, compute, storage, database, networking, and the AWS Global Infrastructure.
5. Data Replication Solution Architecture
Work from business drivers to architecture: measure the metrics that matter, evaluate the AWS capabilities and options, then design the solution architecture.
• Business drivers (measure metrics): business continuity, disaster recovery, customer experience, productivity, mismatched SLAs, compliance, reducing cost, information security risks, global expansion, performance/availability
• AWS capabilities (evaluate options): AMIs, Amazon EBS and DB snapshots, AMI copy, Amazon EBS and DB snapshot copy, Multi-AZ DBs, read replica DBs, Provisioned IOPS, offline backups and archives, data lifecycle policies, highly durable storage
• Solutions architecture: multiple Availability Zones, multiple regions
6. Focus of Our Discussion Today
For each data type — databases and storage/content (files and objects) — we map business drivers and design factors (data preservation, performance) onto replication design options.
8. Design Options in AWS
Replication design options for both databases and files/objects, in order of increasing scope: single AZ, multi-AZ, cross-region, and hybrid IT (AWS plus the corporate data center).
10. DR Metrics – RTO/RPO
Example timeline: last backup at 2:00am, failure event at 6:00am, data restored at 11:00am. The recovery point objective (RPO) is the 4 hours of data between the last backup and the event; the recovery time objective (RTO) is the 5 hours between the event and the data being restored. Replication choices here include physical vs. logical and synchronous vs. asynchronous.
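The RPO/RTO arithmetic from the timeline above, as a small sketch (the date is illustrative; only the clock times come from the slide):

```python
from datetime import datetime

last_backup = datetime(2013, 11, 14, 2, 0)   # 2:00am - last backup taken
event = datetime(2013, 11, 14, 6, 0)         # 6:00am - failure event
restored = datetime(2013, 11, 14, 11, 0)     # 11:00am - data restored

rpo = event - last_backup   # data you can lose: the gap back to the last backup
rto = restored - event      # time to recover after the event

print(rpo.total_seconds() / 3600, rto.total_seconds() / 3600)  # 4.0 5.0
```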
11. Data Replication Options in AWS
Mapping the data preservation driver onto the design options — single AZ, multi-AZ, cross-region, and hybrid IT — for both databases and files/objects.
13. Performance Metric – Total Time
The source DB is an estimated ~35 TB, with 600M records (320 GB) of daily and weekly updates; after ETL, the target DB is an estimated ~48 TB. Can we do this in 10 hours?
14. Performance Metric – Total Time
Now the source DB has grown to an estimated ~70 TB, with the same ~48 TB target. Can we STILL do this in 10 hours?
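A back-of-the-envelope check on the 10-hour window: the sustained end-to-end throughput (extract + transfer + load) the pipeline must achieve. Only the sizes and the window come from the slides; the helper is illustrative.

```python
def required_mb_per_s(source_tb, window_hours):
    """Sustained throughput needed to move source_tb within window_hours."""
    return source_tb * 1024 * 1024 / (window_hours * 3600)  # TB -> MB, h -> s

print(round(required_mb_per_s(35, 10)))  # ~1019 MB/s for the 35 TB source
print(round(required_mb_per_s(70, 10)))  # ~2039 MB/s once it doubles to 70 TB
```

Roughly 1 GB/s sustained for the first case, and double that when the source doubles — which is why parallelism and change-rate-aware (incremental) replication dominate the designs that follow.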
15. Data Replication Options in AWS
Adding the performance driver alongside preservation: again evaluate single AZ, multi-AZ, cross-region, and hybrid IT options for both databases and files/objects.
17. Factors Affecting Replication Designs
Six factors shape a replication design between a source and a target: size of data, data change rate, read/write patterns on each side, throughput, and consistency requirements — all constrained by the bandwidth and latency of the link between them.
31. Demo – AWS S3 Multipart Upload
A full backup of a generic database in the corporate data center is uploaded to an Amazon S3 bucket using multipart upload.
32. Think Parallel
• Use multipart upload (API/SDK or command line tools) – minimum part size is 5 MB
• Use multiple threads
– GNU parallel: parallel -j0 -N2 --progress /usr/bin/s3cmd put {1} {2}
– Python multiprocessing, .NET parallel extensions, etc.
• Use multiple machines
– Limited by host CPU / memory / network / IO
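A minimal sketch of the part-planning half of a multipart upload. Only the 5 MB minimum part size comes from the slide; `part_ranges` and its 8 MB default are hypothetical helpers, and the actual transfer is assumed to happen via an SDK such as boto3 or a tool like s3cmd.

```python
import math

MIN_PART = 5 * 1024 * 1024  # S3 minimum part size (all parts but the last)

def part_ranges(total_size, part_size=8 * 1024 * 1024):
    """Split an object of total_size bytes into (offset, length) parts."""
    if part_size < MIN_PART:
        raise ValueError("part size below the S3 minimum of 5 MB")
    n = max(1, math.ceil(total_size / part_size))
    return [(i * part_size, min(part_size, total_size - i * part_size))
            for i in range(n)]

# Each range can then be uploaded on its own thread or process (an
# upload_part call per range, in boto3 terms) and the parts stitched
# together with a final complete_multipart_upload call.
print(part_ranges(20 * 1024 * 1024))  # 20 MB object -> 8 MB, 8 MB, 4 MB parts
```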
35. Non-RDS to RDS Database Replication
Seeding replication from Bob's on-premises MySQL server in the corporate data center to Amazon RDS in Availability Zone A:
1. Configure the on-premises MySQL server to be a master.
2. Take a dump with mysqldump.
3. Copy the dump to an Amazon S3 bucket with aws s3 cp.
4. Initialize the Amazon RDS MySQL instance from the dump.
36. Non-RDS to RDS Database Replication
5. On the Amazon RDS MySQL instance, run mysql.rds_set_external_master to point it at the on-premises master.
37. Non-RDS to RDS Database Replication
6. Run mysql.rds_start_replication to begin replicating from the corporate data center into Availability Zone A.
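Step 5 rendered as the stored-procedure call RDS expects. The host, user, password, and binlog coordinates below are placeholders (the binlog file/position would come from the mysqldump output); the parameter order follows the mysql.rds_set_external_master procedure.

```python
# Hypothetical helper that renders the CALL statement for step 5.
def set_external_master_sql(host, port, user, password, log_file, log_pos, ssl=0):
    return (
        "CALL mysql.rds_set_external_master("
        f"'{host}', {port}, '{user}', '{password}', "
        f"'{log_file}', {log_pos}, {ssl});"
    )

sql = set_external_master_sql(
    "onprem-mysql.example.com", 3306,   # placeholder on-premises master
    "repl_user", "repl_pass",           # placeholder replication credentials
    "mysql-bin.000042", 120)            # placeholder binlog coordinates
print(sql)
```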
39. Database Replication Options
Diagram: from Bob's office, on-premises SQL Server replicates to SQL Server on EC2; on-premises Oracle uses RMAN with the OSB Cloud Module to back up to an Amazon S3 bucket, with RMAN restore into Oracle on EC2; Amazon RDS MySQL replicates natively.
40. Database Replication Options
• Amazon RDS MySQL
– Replication
• SQL Server and Oracle on EC2
– SQL Server log shipping, AlwaysOn, mirroring, etc.
– Oracle RMAN/OSB, Active Data Guard, etc.
Bob, DBA
41. HA Database Replication Options
"We need a highly available solution." – Katie, Director, to Bob, DBA
42. HA DB Replication Options
Data Guard configuration between an Oracle primary in Availability Zone A and an Oracle standby in Availability Zone B:
Prepare the primary database:
1. Enable logging
2. Add standby redo logs
3. Add Data Guard parameters to init.ora/spfile
4. Update tnsnames.ora and listener.ora
Prepare the standby database environment:
1. Install or clone the Oracle home
2. Copy the password file (orapwdSID) from the primary database
3. Add Data Guard parameters to init.ora/spfile
4. Update tnsnames.ora and listener.ora
Create the standby database using RMAN:
1. Duplicate the target database for standby
Configure the Data Guard broker:
1. Set up database parameters on the primary and standby init.ora/spfile
2. Create the Data Guard configuration for primary and standby using dgmgrl
3. Set up StaticConnectIdentifier for primary and standby
4. Enable the Data Guard configuration
5. Show configuration – should return success
43. HA DB Replication Options
Amazon RDS Multi-AZ deployments (Amazon RDS MySQL Multi-AZ, Amazon RDS Oracle Multi-AZ) use physical, synchronous replication from the DB instance in Availability Zone A to the standby instance in Availability Zone B.
45. Increase Throughput Options
• Amazon EC2 instance type
– Amazon RDS MySQL
• PIOPS
– Amazon RDS MySQL
• Read replicas
– Amazon RDS MySQL
Bob, DBA Manager
46. Amazon RDS Performance Options
Scale up the Amazon RDS DB instance type (e.g., m1.small to m2.4xlarge), add Provisioned IOPS storage, and add read replicas — which replicate asynchronously, here from the DB instance in Availability Zone A to a read replica in Availability Zone B.
61. AWS Data Pipeline
AWS Data Pipeline schedules and runs data-movement activities — copy, EMR jobs (Hive, Pig), Hive copy, Redshift copy, SQL, and shell commands — between sources and targets such as JDBC databases, a DB on an instance, Amazon S3, DynamoDB, and Amazon Redshift.
62. Demo – Data Warehouse Replication
AWS Data Pipeline replicates data from Oracle, Amazon DynamoDB, and a bucket with objects into Amazon Redshift for BI reports.
63. Attunity CloudBeam for Amazon Redshift – Incremental Load (CDC)
1. The replication server generates change files (CDC net-changes files) from the source Oracle database.
2. The files are beamed to S3, in the customer's S3 account.
3. File content is validated upon arrival.
4. A 'copy' command is executed to load the data tables from S3.
5. Change data is 'copied' to the CDC table.
6. SQL commands 'merge' the changes into the data tables in Amazon Redshift, inside the AWS region.
70. Amazon DynamoDB Built-in Replication
Within Region A, Amazon DynamoDB automatically replicates each table 3 ways across Availability Zones A, B, and C, with provisioned throughput. For cross-region replication, AWS Data Pipeline copies the table via an Amazon S3 bucket into an Amazon DynamoDB table in Region B.
71. Amazon Redshift Replication Patterns
An Amazon Redshift cluster in Availability Zone A (a leader node plus compute nodes on 10GigE) can 'unload' data to an Amazon S3 bucket and 'copy' it into a cluster in Region B.
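The unload/copy round trip above, sketched as the SQL each cluster would run. The table name, bucket, and the `<aws-credentials>` placeholder are illustrative; the helpers are hypothetical.

```python
# Region A cluster: export the table to S3 as compressed files.
def unload_sql(table, s3_prefix):
    return (f"UNLOAD ('SELECT * FROM {table}') "
            f"TO '{s3_prefix}' CREDENTIALS '<aws-credentials>' GZIP;")

# Region B cluster: load those files into the replica table in parallel.
def copy_sql(table, s3_prefix):
    return (f"COPY {table} FROM '{s3_prefix}' "
            f"CREDENTIALS '<aws-credentials>' GZIP;")

print(unload_sql("orders", "s3://replica-bucket/orders/"))
print(copy_sql("orders", "s3://replica-bucket/orders/"))
```

COPY loads slices in parallel across the compute nodes, which is why staging through S3 is the standard Redshift replication pattern rather than row-by-row inserts.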
73. Tsunami-UDP – Here's How
Install Tsunami, then on the origin server:
$ cd /path/to/files
$ tsunamid *
On the destination server:
$ cd /path/to/receive/files
$ tsunami
tsunami> connect ec2-XX-XX-XX-83.compute-1.amazonaws.com
tsunami> get *
*** Note: firewall ports need opening between servers
74. BBCP – Here's How
Install BBCP:
$ sudo wget http://www.slac.stanford.edu/~abh/bbcp/bin/amd64_linux26/bbcp
$ sudo cp bbcp /usr/bin/
Transmit with 64 parallel streams:
$ bbcp -P 2 -V -w 8m -s 64 /local/files/* ec2-your-instance.ap-southeast-1.compute.amazonaws.com:/mnt/
Notes:
• SSH key pairs are used to authenticate between systems
• TCP port 5031 needs to be open between machines
• Does NOT encrypt data in transit
• Instance type matters: the better the instance type, the better the performance
• Many dials and options to tweak to improve performance, plus friendly features like retries and restarts
*** Note: firewall ports need opening between servers
75. Minutes to Send a 30GB File from US-East-1
Chart (y-axis 0–60 minutes): time to send a 30 GB file from us-east-1 to Dublin and to Singapore, comparing SCP, BBCP, and Tsunami.
*** Note: using hs1.xl