Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. Aurora is a disruptive technology in the database space, bringing a new architectural model and distributed systems techniques that provide far higher performance, availability, and durability than conventional monolithic database techniques. In this session, we take a deep dive into some of the key innovations behind Amazon Aurora, discuss best practices and configurations, and share customer experiences from the field.
Learning Objectives:
Learn how Amazon Aurora delivers up to 5x the throughput of standard MySQL at 1/10th the cost of commercial databases
Learn best practices for using Amazon Aurora
4. Expedia: Online travel marketplace
The world's leading online travel company, with a portfolio that includes 150+ travel sites in 70 countries.
Challenge: Real-time business intelligence and analytics on a growing corpus of online travel marketplace data. The current Microsoft SQL Server–based architecture is too expensive, and performance degrades as data volume grows. Cassandra with a Solr index requires a large memory footprint and hundreds of nodes, adding cost.
Aurora benefits: Meets the scale and performance requirements at much lower cost: 25,000 inserts/sec with peaks up to 70,000, and average response times of 30 milliseconds for writes and 17 milliseconds for reads with one month of data.
5. Alfresco: Enterprise content management
Alfresco is leading the convergence of enterprise content management and business process management. More than 1,800 organizations in 195 countries rely on Alfresco, including leaders in financial services, healthcare, and the public sector.
Challenge: Scaling Alfresco document repositories to billions of documents while supporting user applications that require sub-second response times.
Aurora benefits: Scaled to 1 billion documents with a throughput of 3 million per hour, 10 times faster than their current environment, while moving from large data centers to cost-effective management with AWS and Aurora.
6. A service-oriented architecture applied to databases
Moved the logging and storage layer into a multitenant, scale-out, database-optimized storage service.
Integrated with other AWS services such as Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations.
Integrated with Amazon S3 for continuous backup with 99.999999999% durability.
[Diagram: data plane (SQL, transactions, and caching in the database instance; logging + storage in the storage service; Amazon S3 for backup) and control plane (Amazon DynamoDB, Amazon SWF, Amazon Route 53).]
7. SQL benchmark results
WRITE PERFORMANCE: 4 client machines with 1,000 connections each.
READ PERFORMANCE: single client machine with 1,600 connections.
Measured using MySQL SysBench against an Amazon Aurora R3.8XL instance (32 cores, 244 GB RAM).
8. Reproducing these results
https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf
1. Create an Amazon VPC (or use an existing one).
2. Create 4 EC2 R3.8XL client instances to run the SysBench client. All 4 should be in the same Availability Zone (AZ).
3. Enable enhanced networking on your clients.
4. Tune Linux settings (see the whitepaper referenced above).
5. Install SysBench version 0.5.
6. Launch an r3.8xlarge Amazon Aurora DB instance in the same VPC and AZ as your clients.
7. Start your benchmark!
9. Beyond benchmarks
If only real-world applications saw benchmark performance.
POSSIBLE DISTORTIONS
Real-world requests contend with each other.
Real-world metadata rarely fits in the data dictionary cache.
Real-world data rarely fits in the buffer cache.
Real-world production databases need to run at high availability.
10. Scaling user connections
SysBench OLTP workload, 250 tables.

Connections   Amazon Aurora   Amazon RDS MySQL (30 K IOPS, single AZ)
50            40,000          10,000
500           71,000          21,000
5,000         110,000         13,000

UP TO 8x FASTER
11. Scaling table count
SysBench write-only workload, 1,000 connections, default settings. Values are write operations per second.

Tables   Amazon Aurora   MySQL on I2.8XL (local SSD)   MySQL on I2.8XL (RAM disk)   RDS MySQL (30 K IOPS, single AZ)
10       60,000          18,000                        22,000                       25,000
100      66,000          19,000                        24,000                       23,000
1,000    64,000          7,000                         18,000                       8,000
10,000   54,000          4,000                         8,000                        5,000

UP TO 11x FASTER
12. Scaling dataset size
SYSBENCH WRITE-ONLY

DB Size   Amazon Aurora   RDS MySQL (30 K IOPS, single AZ)
1 GB      107,000         8,400
10 GB     107,000         2,400
100 GB    101,000         1,500
1 TB      26,000          1,200

UP TO 67x FASTER

CLOUDHARMONY TPC-C

DB Size   Amazon Aurora   RDS MySQL (30 K IOPS, single AZ)
80 GB     12,582          585
800 GB    9,406           69

UP TO 136x FASTER
13. Running with read replicas
SysBench write-only workload, 250 tables. Values are replica lag at the given update rate.

Updates per second   Amazon Aurora   RDS MySQL (30 K IOPS, single AZ)
1,000                2.62 ms         0 s
2,000                3.42 ms         1 s
5,000                3.94 ms         60 s
10,000               5.38 ms         300 s

UP TO 500x LOWER LAG
(A SQL-level check of Aurora replica lag is sketched below.)
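For readers who want to observe replica lag directly from SQL rather than from CloudWatch, recent Aurora MySQL versions expose it through an information_schema table. The query below is a sketch that assumes the information_schema.replica_host_status table is available in your engine version; verify against the Aurora documentation before relying on it.

-- Per-replica lag as reported by the cluster, in milliseconds
-- (assumes your Aurora MySQL version provides this table).
SELECT server_id,
       replica_lag_in_milliseconds
FROM information_schema.replica_host_status;

The same information is also surfaced as the AuroraReplicaLag metric in Amazon CloudWatch.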
14. How do we achieve these results?
DATABASES ARE ALL ABOUT I/O.
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS PER SECOND.
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES.
DO LESS WORK
Do fewer I/Os.
Minimize network packets.
Cache prior results.
Offload the database engine.
BE MORE EFFICIENT
Process asynchronously.
Reduce the latency path.
Use lock-free data structures.
Batch operations together.
15. I/O traffic in RDS MySQL
MYSQL WITH STANDBY
[Diagram: primary instance in AZ 1 and standby instance in AZ 2, each writing to Amazon EBS and its EBS mirror, with backups to Amazon S3; write steps numbered 1–5. Types of write: log, binlog, data, double-write, FRM files.]
I/O FLOW
Issue the write to Amazon EBS; EBS issues it to the mirror and acknowledges when both are done.
Stage the write to the standby instance using DRBD.
Issue the write to EBS on the standby instance.
OBSERVATIONS
Steps 1, 3, and 5 are sequential and synchronous, which amplifies both latency and jitter.
There are many types of write for each user operation.
Data blocks have to be written twice to avoid torn writes.
PERFORMANCE
780 K transactions.
7,388 K I/Os per million transactions (excludes mirroring and standby).
Average of 7.4 I/Os per transaction.
30-minute SysBench write-only workload, 100 GB dataset, RDS Single-AZ, 30 K PIOPS.
16. I/O traffic in Aurora (database)
AMAZON AURORA
[Diagram: primary instance in AZ 1 and replica instance in AZ 2 issuing asynchronous 4/6-quorum distributed writes to storage in AZ 1, AZ 2, and AZ 3, with continuous backup to Amazon S3. Type of write: redo log records only.]
I/O FLOW
Boxcar redo log records, fully ordered by LSN.
Shuffle them to the appropriate segments, partially ordered.
Boxcar to storage nodes and issue the writes.
OBSERVATIONS
Only redo log records are written; all steps are asynchronous.
No data block writes (checkpoint, cache replacement).
6x more log writes, but 9x less network traffic.
Tolerant of network and storage outlier latency.
PERFORMANCE
27,378 K transactions: 35x MORE.
950 K I/Os per 1 M transactions (6x amplification): 7.7x LESS.
30-minute SysBench write-only workload, 100 GB dataset.
17. I/O traffic in Aurora (storage node)
[Diagram: a storage node receiving log records from the primary instance into an incoming queue, with an update queue, hot log, data blocks, point-in-time snapshots and backup to S3, sort/group, coalesce, garbage collection, scrubbing, and peer-to-peer gossip with peer storage nodes; steps numbered 1–8.]
I/O FLOW
1. Receive the record and add it to the in-memory queue.
2. Persist the record and acknowledge.
3. Organize records and identify gaps in the log.
4. Gossip with peers to fill in holes.
5. Coalesce log records into new data block versions.
6. Periodically stage the log and new block versions to S3.
7. Periodically garbage-collect old versions.
8. Periodically validate CRC codes on blocks.
OBSERVATIONS
All steps are asynchronous; only steps 1 and 2 are in the foreground latency path.
The input queue is 46x smaller than MySQL's (unamplified, per node).
Favors latency-sensitive operations.
Uses disk space to buffer against spikes in activity.
18. Asynchronous group commits
[Diagram: transactions T1–Tn issuing reads, writes, and commits; each write produces log records at increasing LSNs, pending commits are queued in LSN order, and group commits are acknowledged as the durable LSN at the head node advances.]
TRADITIONAL APPROACH
Maintain a buffer of log records to write out to disk.
Issue the writes when the buffer is full, or on a timeout waiting for more writes.
The first writer pays a latency penalty when the write rate is low.
AMAZON AURORA
Request the I/O with the first write, and keep filling the buffer until the write is picked up.
An individual write is durable when 4 of 6 storage nodes acknowledge it.
Advance the DB durable point up to the earliest pending acknowledgement.
19. Adaptive thread pool
MYSQL THREAD MODEL
Standard MySQL: one thread per connection; doesn't scale with connection count.
MySQL EE: connections assigned to a thread group; requires careful stall-threshold tuning.
AURORA THREAD MODEL
Re-entrant connections multiplexed to active threads.
Kernel-space epoll() inserts into a latch-free event queue.
The thread pool is sized dynamically.
Gracefully handles 5,000+ concurrent client sessions on an r3.8xlarge.
20. I/O traffic in Aurora (read replica)
MYSQL READ SCALING
[Diagram: MySQL master (30% read, 70% write) replicating to a MySQL replica (30% new reads, 70% write) through single-threaded binlog apply, each with its own data volume.]
Logical replication: SQL statements are shipped to the replica.
The write workload is similar on both instances, each with independent storage.
Can result in data drift between master and replica.
AMAZON AURORA READ SCALING
[Diagram: Aurora master (30% read, 70% write) and Aurora replica (100% new reads) sharing multi-AZ storage, with page-cache updates shipped from master to replica.]
Physical replication: redo is shipped from the master to the replica.
The replica shares storage and performs no writes.
Cached pages have redo applied; the read view advances once all commits have been seen.
22. Performance Best Practices
MySQL/RDBMS practices still apply
Choose the right tool for the right job (OLAP vs OLTP vs NoSQL)
Create appropriate indexes
Tune your SQL code; use EXPLAIN plans and the Performance Schema (see the sketch after this list)
Many more…
Leverage high concurrency
Aurora throughput increases with number of connections
Architect your applications to leverage high concurrency in Aurora
Read Scaling
Aurora offers read replicas with virtually no replication lag
Leverage multiple read replicas to distribute your reads
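To make the indexing and EXPLAIN advice above concrete, here is a small MySQL-compatible sketch. The orders table and its columns are hypothetical and used only for illustration; they are not part of the deck.

-- Hypothetical table used only to illustrate the advice above.
CREATE TABLE orders (
  order_id    BIGINT PRIMARY KEY,
  customer_id BIGINT NOT NULL,
  created_at  DATETIME NOT NULL,
  total       DECIMAL(10,2) NOT NULL
);

-- Check the plan for a frequent query; a full table scan signals a missing index.
EXPLAIN SELECT order_id, total
FROM orders
WHERE customer_id = 42 AND created_at >= '2015-01-01';

-- Add a composite index matching the predicate, then re-check the plan.
CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at);

-- Performance Schema: find the statement shapes consuming the most total time.
-- Timer columns are in picoseconds; divide by 1e12 to get seconds.
SELECT digest_text,
       count_star,
       sum_timer_wait / 1e12 AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10;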
23. Performance Best Practices
Parameter tuning
No need to migrate your performance-related MySQL parameters to Aurora
Aurora Parameter Groups are pre-tuned and already optimal in most cases
Performance comparison
Don't obsess over individual metrics (CPU, IOPS, I/O throughput)
Focus on what matters, i.e., application performance
Other best practices
Keep the query cache on (a quick check is sketched below)
Leverage CloudWatch metrics
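To verify the query cache is on and actually serving reads, the standard MySQL variables and status counters below can be checked from any client; these are stock MySQL names that also apply to Aurora's MySQL-compatible engine.

-- Confirm the query cache is enabled and sized.
SHOW VARIABLES LIKE 'query_cache_type';
SHOW VARIABLES LIKE 'query_cache_size';

-- Compare Qcache_hits with Com_select to see how often reads are served from the cache.
SHOW GLOBAL STATUS LIKE 'Qcache%';
SHOW GLOBAL STATUS LIKE 'Com_select';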
24. Advanced monitoring
50+ system/OS metrics | sorted process list view | 1-60 sec granularity
alarms on specific metrics | egress to CloudWatch Logs | integration with 3rd-party tools
25. Important systems and OS metrics
CPU utilization: user, system, wait, IRQ, idle, nice, steal.
Load average: 1 min, 5 min, 15 min.
Network: Rx and Tx per declared ethN.
Processes: sleeping, running, total, stopped, blocked, zombie.
Process list: process ID, process name, VSS, Res, Mem % consumed, CPU % used, CPU time, parent ID.
Memory: free, cached, buffered, total, writeback, inactive, dirty, mapped, slab, page tables, huge pages (free, reserved, surplus, size, total), swap, swap free, swap committed.
Device I/O: read/write latency, read/write throughput, read/write IO/s, queue depth, read/write queue depth, free local storage.
File system: used, total, used inodes/%, max inodes/%.
26. Important database metrics
View database-level metrics from the Amazon Aurora and CloudWatch consoles.
Perform retroactive workload analysis.
Select throughput
Select latency
DML throughput
DML latency
Commit throughput
Commit latency
DDL throughput
DDL latency
DB connections
Active connections
Login failures
Buffer cache hit ratio
Resultset cache hit ratio
Deadlocks
Blocked transactions
Failed SQL statements
Replica lag
Replica lag maximum
Replica lag minimum
Free local storage
27. Analysis example
SELECT traffic reached ~74,500 SELECTs per second during the first period.
The SELECT statements were extremely quick: their average latency was under 0.3 milliseconds (0.0003 seconds).
Conclusion: the server experienced a storm of very short SELECT statements.
(A Performance Schema query that can confirm this kind of pattern is sketched below.)
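The deck draws this conclusion from the monitoring console. As a complementary check from inside the database (not part of the original analysis), the Performance Schema statement digests can confirm a storm of short SELECTs:

-- High-frequency, low-latency SELECT shapes.
-- Timer columns are in picoseconds; divide by 1e9 to get milliseconds.
SELECT digest_text,
       count_star           AS executions,
       avg_timer_wait / 1e9 AS avg_latency_ms
FROM performance_schema.events_statements_summary_by_digest
WHERE digest_text LIKE 'SELECT%'
ORDER BY count_star DESC
LIMIT 10;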
29. Storage node availability
Quorum system for read/write; latency tolerant.
Peer-to-peer gossip replication to fill in holes.
Continuous backup to S3 (designed for 11 9s durability).
Continuous scrubbing of data blocks.
Continuous monitoring of nodes and disks for repair.
10 GB segments serve as the unit of repair and hotspot rebalancing, so load can be rebalanced quickly.
Quorum membership changes do not stall write operations.
[Diagram: storage nodes across AZ 1, AZ 2, and AZ 3, with backup to Amazon S3.]
30. Instant crash recovery
TRADITIONAL DATABASES
Have to replay logs since the last checkpoint, typically 5 minutes apart.
Replay is single-threaded in MySQL and requires a large number of disk accesses.
A crash at T0 requires re-application of the redo log generated since the last checkpoint.
AMAZON AURORA
The underlying storage replays redo records on demand as part of a disk read.
Recovery is parallel, distributed, and asynchronous; no replay is needed at startup.
A crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously.
31. Survivable caches
We moved the cache out of the
database process.
Cache remains warm in the event
of a database restart.
Lets you resume fully loaded
operations much faster.
Instant crash recovery +
survivable cache = quick and
easy recovery from DB failures.
The caching process runs outside the DB process and remains warm across a database restart.
32. Faster, more predictable failover
MYSQL
[Timeline: app running → DB failure → failure detection → DNS propagation → recovery → app running. Roughly 15–20 seconds of downtime.]
AURORA WITH MARIADB DRIVER
[Timeline: app running → DB failure → failure detection → DNS propagation → recovery → app running. Roughly 3–20 seconds of downtime.]
33. High availability with Read Replicas
[Diagram: an Aurora primary instance (db.r3.8xlarge) with two Aurora Replicas (db.r3.8xlarge and db.r3.2xlarge) assigned failover priorities tier-0 and tier-1, all sharing a cluster volume that spans three AZs, with continuous backup to Amazon S3.]
34. High availability with Read Replicas
[Diagram: the same cluster after failover; an Aurora Replica (db.r3.8xlarge) has been promoted to be the Aurora primary instance, with the db.r3.2xlarge replica (priority tier-1) remaining; the cluster volume spans three AZs, with continuous backup to Amazon S3.]
35. Simulate failures using SQL
To cause the failure of a component at the database node:
ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]
To simulate the failure of disks:
ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN [DISK index | NODE index] FOR INTERVAL interval
To simulate the failure of networking:
ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type [TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval
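As an illustration of how these templates might be filled in: the percentages and intervals below are hypothetical, and exact keywords can vary by Aurora version, so confirm against the Aurora fault-injection documentation before running them.

-- Crash the database instance process on the node; it recovers automatically.
ALTER SYSTEM CRASH INSTANCE;

-- Simulate a disk failure affecting a fraction of requests for a short interval.
ALTER SYSTEM SIMULATE 20 PERCENT DISK FAILURE FOR INTERVAL 1 MINUTE;

-- Simulate a network failure toward all replicas for a short interval.
ALTER SYSTEM SIMULATE 10 PERCENT NETWORK FAILURE TO ALL FOR INTERVAL 30 SECOND;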