DAT340_Hands-On Journey for Migrating Oracle Databases to the Amazon Aurora PostgreSQL-Compatible Edition

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Hands-On Journey for Migrating Oracle
Databases to the Amazon Aurora
PostgreSQL-Compatible Edition
S a m e e r M a l i k
G o w r i B a l a s u b r a m a n i a n
K y l e H a i l e y
K e v i n J e r n i g a n
J i g n e s h S h a h
N o v e m b e r 2 8 , 2 0 1 7
AWS re:INVENT

Introduction
Gowri Balasubramanian

What to Expect from This Session
• Introduction to Aurora PostgreSQL and DMS
• LAB1: Oracle to Aurora PostgreSQL Migration using AWS SCT and
DMS
• LAB2:Walk-Through of Aurora Console and DBA Tasks
• Introduction to Performance Insights
• LAB3: Performance Analysis Using PI
• Aurora Best Practices for DBAs and Wrap-Up

Amazon Aurora with PostgreSQL
compatibility
Kevin Jernigan

Reimagining the Relational Database
What if you were inventing the database today?
You would break apart the stack
You would build something that:
 Can scale out…
 Is self-healing…
 Leverages distributed services…

A Service-Oriented Architecture Applied to the
Database
Move the logging and storage layer into a
multitenant, scale-out, database-optimized
storage service
Integrate with other AWS services like Amazon
Elastic Compute Cloud (Amazon EC2), Amazon
Virtual Private Cloud (Amazon VPC), Amazon
DynamoDB, Amazon Simple Workflow Service
(Amazon SWF), and Amazon Route 53 for
control and monitoring
Make it a managed service – using Amazon
Relational Database Service (Amazon RDS).
Takes care of management and administrative
functions.
Amazon
DynamoDB
Amazon SWF
Amazon Route 53
Logging + Storage
SQL
Transactions
Caching
Amazon S3
1
2
3
Amazon RDS

PostgreSQL – Open Source Database
• Open source database
• In active development for 20 years
• Owned by a foundation, not a single company
• Permissive innovation-friendly open source license
• High performance out of the box
• Object-oriented and ANSI-SQL:2008 compatible
• Most geospatial features of any open source database
• Supports stored procedures in 12 languages (Java, Perl,
Python, Ruby, Tcl, C/C++, its own Oracle-like PL/pgSQL, and
so on)
• Most Oracle-compatible open source databases
• Highest AWS Schema Conversion Tool automatic conversion
rates are from Oracle to PostgreSQL

PostgreSQL Compatibility
PostgreSQL 9.6 + Amazon Aurora cloud-optimized storage
 Performance: 2x to 3x higher throughput than PostgreSQL alone
 Availability: failover time of < 30 seconds
 Durability: six copies across three Availability Zones
 Read Replicas: single-digit millisecond lag times on up to 15 replicas
Amazon Aurora Storage

PostgreSQL Compatibility
Cloud-native security and encryption
 AWS Key Management Service (AWS KMS)
 AWS Identity and Access Management (IAM)
Easy to manage with Amazon RDS
Easy to load and unload
 AWS Database Migration Service
 AWS Schema Conversion Tool
Fully compatible with PostgreSQL, now and for the
foreseeable future
 Not a compatibility layer – native PostgreSQL
implementation
AWS DMS
Amazon RDS
PostgreSQL

Amazon Aurora with PostgreSQL Compatibility
Durability and Availability

Scale-Out, Distributed, Log Structured Storage
Master Replica Replica Replica
Availability Zone 1
Shared Storage Volume – Transaction Aware
Primary
Database
Node
Read
Replica/
Secondary
Node
Read
Replica/
Secondary
Node
Read
Replica/
Secondary
Node
Availability Zone 2 Availability Zone 3
AWS Region
Storage
Monitoring
Database and
Instance
Monitoring

Amazon Aurora Storage Engine Overview
Data is replicated six times across three
Availability Zones
Continuous backup to Amazon S3
(built for 11 9s durability)
Continuous monitoring of nodes and disks for
repair
10 GB segments as unit of repair or hotspot
rebalance
Quorum system for read/write; latency tolerant
Quorum membership changes do not stall writes
Storage volume automatically grows up to 64 TB
AZ 1 AZ 2 AZ 3
Amazon S3
Database
Node
Storage
Node
Storage
Node
Storage
Node
Storage
Node
Storage
Node
Storage
Node
Storage
Monitoring

Faster, More Predictable Failover with Amazon Aurora
App
RunningFailure Detection DNS Propagation
Recovery
Database
Failure
Amazon RDS for PostgreSQL is good: failover times of ~60 seconds
Replica-Aware App Running
Failure Detection DNS Propagation
Recovery
Database
Failure
Amazon Aurora is better: failover times < 30 seconds
1 5 - 2 0 s e c 3 - 1 0 s e c
App
Running

Performance Architecture

Do fewer IOs
Minimize network packets
Offload the database engine
DO LESS WORK
Process asynchronously
Reduce latency path
Use lock-free data structures
Batch operations together
BE MORE EFFICIENT
How Does Amazon Aurora Achieve High Performance?
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING NEEDS CPU AND MEMORY OPTIMIZATIONS

Write IO Traffic in an Amazon Aurora Database Node
AZ 1 AZ 3
Primary
Database
Node
Amazon S3
AZ 2
Read
Replica/
Secondary
Node
AMAZON AURORA
ASYNC
4/6 QUORUM
DISTRIBUTED
WRITES
DATAAMAZON AURORA + WAL LOG COMMIT LOG & FILES
IO FLOW
Only write WAL records; all steps asynchronous
No data block writes (checkpoint, cache replacement)
6x more log writes, but 9x less network traffic
Tolerant of network and storage outlier latency
OBSERVATIONS
2x or better PostgreSQL Community Edition performance on
write-only or mixed read-write workloads
PERFORMANCE
Boxcar log records – fully ordered by LSN
Shuffle to appropriate segments – partially ordered
Boxcar to storage nodes and issue writes
WAL
T Y P E O F W R IT E
Read
Replica/
Secondary
Node

Write IO Traffic in an Amazon Aurora Storage Node
LOG RECORDS
Primary
Database
Node
INCOMING QUEUE
STORAGE NODE
AMAZON S3 BACKUP
1
2
3
4
5
6
7
8
UPDATE
QUEUE
ACK
HOT
LOG
DATA
BLOCKS
POINT IN TIME
SNAPSHOT
GC
SCRUB
COALESCE
SORT
GROUP
PEER TO PEER GOSSIPPeer
Storage
Nodes
All steps are asynchronous
Only steps 1 and 2 are in foreground latency path
Input queue is far smaller than PostgreSQL
Favors latency-sensitive operations
Uses disk space to buffer against spikes in activity
OBSERVATIONS
IO FLOW
① Receive record and add to in-memory queue
② Persist record and acknowledge
③ Organize records and identify gaps in log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new data block versions
⑥ Periodically stage log and new block versions to Amazon
S3
⑦ Periodically garbage collect old versions
⑧ Periodically validate CRC codes on blocks

IO Traffic in Aurora Replicas
PAGE CACHE
UPDATE
Aurora Master
30% Read
70% Write
Aurora Replica
100% New Reads
Shared Multi-AZ Storage
PostgreSQL Master
30% Read
70% Write
PostgreSQL Replica
30% New Reads
70% Write
SINGLE-THREADED
WAL APPLY
Data Volume Data Volume
Physical: Ship redo (WAL) to Replica
Write workload similar on both instances
Independent storage
Physical: Ship redo (WAL) from Master to Replica
Replica shares storage: No writes performed
Cached pages have redo applied
Advance read view when all commits seen
POSTGRESQL READ SCALING AMAZON AURORA READ SCALING

Performance as opposed to PostgreSQL

System Configuration
PostgreSQL – Single AZ, no backup
Amazon Aurora
EBS EBS EBS
60,000 total IOPS
AZ 1 AZ 2 AZ 3
Amazon S3
r4.16xlarge
database
instance
Storage
Node
Storage
Node
Storage
Node
Storage
Node
Storage
Node
Storage
Node
r4.8xlarge
client
driver
r4.16xlarge
database
instance
r4.8xlarge
client
driver
r4.8xlarge
client
driver
Amazon S3
Amazon RDS
r4.16xlarge
30,000 iops

Amazon Aurora Is Faster on PgBench
Running the standard pgbench benchmark, Amazon Aurora delivers
1.6x the peak throughput of PostgreSQL and 2.9x at high client counts
0
5
10
15
20
25
30
35
40
45
128 256 512 768 1024 1280 1536 1792 2048
Throughput(tps,thousands)
Number of clients
pgbench tpcb-like throughput, 150 GiB
PostgreSQL (Single AZ) Amazon Aurora (Three AZs)

Amazon Aurora Is 2x–5x Faster on Sysbench
0
20
40
60
80
100
120
140
256 512 768 1024 1280 1536 1792 2048 2305 2560
writes/second,thousands
Number of clients
sysbench write-only 30 GiB
PostgreSQL (Single AZ, No Backup) Amazon Aurora (Three AZs, Continuous Backup)
2.2x 5.3x
Running the standard sysbench benchmark, Amazon Aurora delivers
> 2x the absolute peak of PostgreSQL and 5x at high client counts.

Amazon Aurora Is 4x Faster at Large Scale
Scales from 1.8x to 4.4x better as database grows from 10 GiB to 100
GiB
74
49
30
136 134 131
0
20
40
60
80
100
120
140
160
10 GiB 30 GiB 100 GiB
writes/second,thousands
Database size
sysbench write-only
4.4x1.8x 2.8x

Amazon Aurora Improvements for GA
GA release improves performance on larger working sets
112
92
83
136 134 131
0
20
40
60
80
100
120
140
160
10 GiB 30 GiB 100 GiB
writes/sec.thousands
Database Size
sysbench write-only
Amazon Aurora Preview Amazon Aurora GA

Amazon Aurora Loads Data Faster
Database initialization is > 2x faster than PostgreSQL using the
standard pgbench benchmark
Copy In
Copy In
Vacuum
Vacuum
Index Build
Index Build
0 500 1000 1500 2000 2500 3000 3500 4000
PostgreSQL
Amazon Aurora
Runtime (seconds)
pgbench initialization, scale 10000 (150 GiB)
86% reduction in vacuum time

Amazon Aurora Gives > 2x Lower Response Times
Response time under heavy write load > 2x lower than PostgreSQL,
variance reduced by 99%
0.00
100.00
200.00
300.00
400.00
500.00
600.00
700.00
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Responsetime,ms
Minutes
SYSBENCH RESPONSE TIME (p95), 30 GiB, 1024 CLIENTS

Amazon Aurora Has More Consistent Throughput
While running pgbench at load, throughput is 3x more consistent
than PostgreSQL
0
5000
10000
15000
20000
25000
30000
35000
40000
45000
10 15 20 25 30 35 40 45 50 55 60
Throughput,tps
Minutes
pgbench throughput over time, 150 GiB, 1024 clients
PostgreSQL (Single AZ) Amazon Aurora (Three AZs)

Amazon Aurora Reduced Recovery Time Up to 97%
Transaction-aware storage system recovers almost instantly
3 GiB redo
recovered in 19 seconds
10 GiB redo
30 GiB redo
Amazon Aurora has no redo.
Recovered in 3 seconds while maintaining
significantly greater throughput.
0
20
40
60
80
100
120
140
160
0 20000 40000 60000 80000 100000 120000 140000
Recoverytimeinseconds(lessisbetter)
writes/second (more is better)
Recovery time from crash under load
Bubble size represents redo log which must be recovered
As PostgreSQL throughput
goes up, so does log size
and crash recovery time.

Performance by the Numbers
Measurement Result
PgBench Up to 2.9x faster
Sysbench 2x–5x faster
Vacuum Time reduced up to 86%
Response time > 2x lower, 99% < variance
Throughput jitter 3x more consistent
Throughput at scale 4x faster
Recovery speed Time reduced up to 97%

Database Migration

DB migration is part of the cloud journey
How quickly and easily can I migrate my on-premises data to the cloud?
How can I minimize application downtime during database migration?
Are there ways to automate schema conversion and conflict analysis?
How can my databases be merged or modularized during the migration for data
consolidation and application refactoring?
Can I migrate off commercial databases?

What are DMS and SCT?
AWS Database Migration Service (DMS) easily and securely
migrates and/or replicates your databases and data
warehouses to AWS
AWS Schema Conversion Tool (SCT) converts your commercial
database and data warehouse schemas to open-source engines or
AWS-native services, such as Amazon Aurora and Amazon Redshift
We have migrated over 45,000 unique databases. And counting…

Introduction to Performance Insights
Kyle Hailey

What Is Performance Insights?
Customers ask for
• Visibility into performance of RDS databases
• Want to optimize cloud database workloads
• Easy tool
• Often only part time DBA or no DBA
• Single Pane of glass

First Step: RDS Enhanced Monitoring
Released 2016
OS Metrics
Process/thread List
Down to 1 second granularity

Introducing: Performance Insights
Dashboard
• DB load
• Adjustable timeframe
• Filterable by attribute (SQL, User, Host, Wait)
• SQL causing load
Phased RDS delivery
• Aurora, MySQL/MariaDB, PostgreSQL, Commercial DB
Guided discovery of performance problems
• For both beginners and experts
• Core metric “Database Load”

What Is “Database Load”?
All engines have a connections list showing
• active
• Idle
We sample every second
• For each active session, collect
• SQL,
• State :CPU, I/O, Lock, Commit log wait, and so on
• Host
• User
Expose as “Average Active Sessions” (AAS)

Performance Insights Dashboard

Sampling Every Second
Query run often
Fast query run rarely
Slow query
User 1
User 2
User 3
Time

Sampling Is Like a Movie

Load Chart: Average Active Sessions(AAS)
User 1
User 2
User 3
User 4
Active Sessions
=
1
2
3
4

Queries Running Indicate Active Sessions
idleidle idle idleQuery 1 Query 2 Query 3
Time

Different Active Session States
CPU IO Wait
idleidle idle idleQuery 1 Query 2 Query 3
Time

AAS by Session State

Showing Per Second Samples

AAS over 1-Minute Averages

AAS Compared to Max CPU
Max vCPU

Use CPU Count As Yardstick
AAS < 1
• Database is not blocked
AAS ~= 0
• Database basically idle
• Problems are in the APP not DB
AAS < # of CPUs
• CPU available
• Are any single sessions 100% active?
AAS > # of CPUs
• Could have performance problems
AAS >> # of CPUS
• There is a bottleneck

Idle Database
Bottleneck is not the database

Performance Insights Useful for Sizing
Load << less than #vCPU: oversized
Load > #vCPU: undersized

Access to Performance Insights

Access to Performance Insights
High
Load

Summary: Amazon RDS Performance Insights
 DB Load : Average Active Sessions
 Identifies database bottlenecks
 Easy
 Powerful
 Top SQL
 Identifies source of bottleneck
 Enables Problem Discovery
 Adjustable Time frame
 Hour, day, week and longer

pgbench
Gowri Subramanian

pgbench
 Simple program for running benchmark tests on PostgreSQL
 Runs the same sequence of SQL commands using multiple concurrent database
sessions
 Actual Pgbench schema is like TPC-B (we will use migrated HR schema for load
testing)
 Provides average transaction rate (transactions per second)
Sample pgbench command
pgbench -h <Cluster_Endpoint> -n -U <user> -d <db_name> -c <No of sessions> -T
<duration in seconds> -f <SQL File>

pgbench—Sample Output
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 300 s
number of transactions actually processed: 35287*5
tps = 117.574256 (including connections establishing)
tps = 119.573901 (excluding connections establishing)
statement latencies in milliseconds:
83.602274select add_employee_data(5);

Aurora Best Practices
Jignesh Shah

PostgreSQL Events and Logs

Read Replica Lag
Increase wal_keep_segments to allow replicas to catch up after
interruption
Use wal_compression = on
Use higher checkpoint_timeout
Use hot_standby_feedback = on on read replicas
Use similar sized Instance size
Lag for replicas vs long running queries on replicas
• max_standby_archive_delay
• max_standby_streaming_delay

CloudWatch—Replication Lag

Fast Failover Using JDBC Driver
Aggressively set TCP keepalives parameters
• to ensure long running queries are killed before read timeout
expires in case of failover
Reducing DNS caching timeouts in Java application
• to ensure Aurora Read-Only Endpoint can properly cycle through
read-only nodes on subsequent connection attempts
Use the provided read and write Aurora endpoints
• to establish a connection to the cluster
Use RDS APIs to test application response on server-side failures
Use a package dropping tool to test application response for client-side
failures

Maintenance: Bloat Management
MVCC in PostgreSQL
• DELETE operations mark a row as dead
• UPDATE operations are essentially
• Marking current row version as dead
• Row with new version is added
• Eventually, all dead rows need to be garbage collected
VACUUM utility
• Cleans up dead rows but does not release space to OS
• FULL option also reorganizes the table to release space to OS

AUTOVACUUM Tuning
For 24/7 constant load on Database server
• AUTOVACUUM may not get a chance to finish its job
For high number of tables
• Increase autovacuum_max_workers from default 3 to higher number
• Increase autovacuum_vacuum_cost_limit
• Note: With this change, there may be performance impact
For large tables use
• Decrease autovacuum_vacuum_scale_factor from 0.2 (20%) to 0.05
(5%)
• ALTER TABLE myablename SET autovacuum_scale_factor = 0.02

Managing Bloat: Extension—pg_repack
Reorganizes Table with a very short lock
• Must have PRIMARY key or UNIQUE NOT-NULL key
Quick Setup
• Add pg_repack to shared_preload_libraries in a custom parameter group
in PostgreSQL 9.6 family
• CREATE EXTENSION pg_repack
• Use pg_repack client utility using rds_superuser privileges with –k option
pg_repack -h myproductiondb.cw7jjfgdr4on8.us-west-
2.rds.amazonaws.com -U pgadmin -k postgres

Avoiding Transaction ID Wraparound
Each unit of work (even read-only queries) is a transaction
Each transaction is tracked by transaction ID, which is 32 bit
Two billion “in-flight” un-vacuumed transactions before PostgreSQL takes
dramatic action to avoid data loss
If the number of un-vacuumed transactions reaches (2^31 - 1,000,000):
• PostgreSQL sets the database to read-only mode and requires an
offline, single-user, standalone vacuum

CloudWatch Metric—Max Used Trans IDs

Performance Tuning
Module—auto_explain (9.6.3+ only)
Verify auto_explain is in shared_preload_libraries
Set following values in db parameters
• auto_explain.log_min_duration = 5000
• auto_explain.log_nested_statements = on

Thank you!

DAT340_Hands-On Journey for Migrating Oracle Databases to the Amazon Aurora PostgreSQL-Compatible Edition

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to DAT340_Hands-On Journey for Migrating Oracle Databases to the Amazon Aurora PostgreSQL-Compatible Edition

Similar to DAT340_Hands-On Journey for Migrating Oracle Databases to the Amazon Aurora PostgreSQL-Compatible Edition (20)

More from Amazon Web Services

More from Amazon Web Services (20)

DAT340_Hands-On Journey for Migrating Oracle Databases to the Amazon Aurora PostgreSQL-Compatible Edition