Steve Abraham, Solutions Architect
September 2016
Deep Dive on Amazon Aurora
What is Amazon Aurora?
MySQL-compatible relational database
Performance and availability of commercial databases
Simplicity and cost-effectiveness of open-source databases
Aurora customer adoption
Fastest growing service in AWS history
Business applications
Web and mobile
Content management
E-commerce, retail
Internet of Things
Search, advertising
BI, analytics
Games, media
A service-oriented architecture applied to databases
Moved the logging and storage layer into a multitenant, scale-out, database-optimized storage service
Integrated with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations
Integrated with Amazon S3 for continuous backup with 99.999999999% durability
[Diagram: data plane (SQL, transactions, caching, logging + storage) backed by Amazon S3; control plane operations handled by Amazon DynamoDB, Amazon SWF, and Amazon Route 53.]
SQL benchmark results
WRITE PERFORMANCE: 4 client machines with 1,000 connections each
READ PERFORMANCE: single client machine with 1,600 connections
Using MySQL SysBench with Amazon Aurora R3.8XL (32 cores and 244 GB RAM)
Reproducing these results
https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf
[Diagram: R3.8XLARGE SysBench client instances driving an Amazon Aurora R3.8XLARGE instance.]
• Create an Amazon VPC (or use an existing one).
• Create 4 EC2 R3.8XL client instances to run the SysBench
client. All 4 should be in the same Availability Zone (AZ).
• Enable enhanced networking on your clients.
• Tune Linux settings (see whitepaper referenced below).
• Install SysBench version 0.5.
• Launch a r3.8xlarge Amazon Aurora DB instance in the
same VPC and AZ as your clients.
• Start your benchmark!
Performance best practices
MySQL and RDBMS practices still apply
 Choose the right tool for the right job (OLAP vs. OLTP vs. NoSQL)
 Create appropriate indexes
 Tune your SQL code; use EXPLAIN plans and the performance schema (see the example below)
 Many more…
Leverage high concurrency
 Aurora throughput increases with number of connections
 Architect your applications to leverage high concurrency in Aurora
Read scaling
 Aurora offers read replicas with virtually no replication lag
 Leverage multiple read replicas to distribute your reads
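As a concrete illustration of the indexing and EXPLAIN advice above, here is a minimal sketch; the table and column names are hypothetical, invented for the example rather than taken from the deck:

-- Check whether a frequent lookup uses an index (hypothetical orders table)
EXPLAIN SELECT order_id, order_total
FROM orders
WHERE customer_id = 42;

-- If the plan reports a full table scan (type: ALL), add an index
CREATE INDEX idx_orders_customer ON orders (customer_id);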
Performance best practices
Parameter tuning
 No need to migrate your performance-related MySQL parameters to Aurora
 Aurora parameter groups are pre-tuned and already optimal in most cases
Performance comparison
 Don’t obsess over individual metrics (CPU, IOPS, I/O throughput)
 Focus on what matters—that is, application performance
Other best practices
 Keep query cache on
 Leverage Amazon CloudWatch metrics
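A quick way to verify the query cache settings mentioned above, using standard MySQL 5.6 variables and status counters (stock MySQL names, which Aurora exposes through its MySQL compatibility):

SHOW VARIABLES LIKE 'query_cache_type';
SHOW VARIABLES LIKE 'query_cache_size';
-- Hit/insert counters indicate how effective the cache is for your workload
SHOW GLOBAL STATUS LIKE 'Qcache%';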
Advanced monitoring
50+ system/OS metrics | sorted process list view | 1–60 sec granularity
Alarms on specific metrics | egress to CloudWatch Logs | integration with third-party tools
Important systems and OS metrics
CPU utilization: User, System, Wait, IRQ, Idle, Nice, Steal
Network: Rx and Tx per declared Ethernet interface (ethN)
Processes: Sleeping, Running, Total, Stopped, Blocked, Zombie
Process list: Process ID, Process name, VSS, Res, Mem % consumed, CPU % used, CPU time, Parent ID
Memory: Free, Cached, Buffered, Total, Writeback, Inactive, Dirty, Mapped, Slab, Page tables, Huge pages (free, rsvd, surp, size, total), Swap, Swap free, Swap committed
Device I/O: Read latency, Write latency, Read throughput, Write throughput, Read I/O/sec, Write I/O/sec, Queue depth, Read queue depth, Write queue depth, Free local storage
File system: Used, Total, Used Inodes/%, Max Inodes/%
Load average: 1 min, 5 min, 15 min
Important database metrics
 View database-level metrics from the Aurora and CloudWatch consoles
 Perform retroactive workload analysis
Metrics include: Select throughput, Select latency, DML throughput, DML latency, Commit throughput, Commit latency, DDL throughput, DDL latency, DB connections, Active connections, Login failures, Buffer cache hit ratio, Resultset cache hit ratio, Deadlocks, Blocked transactions, Failed SQL statements, Replica lag, Replica lag maximum, Replica lag minimum, Free local storage
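Many of these metrics can also be sampled directly from the engine with standard MySQL status counters (stock MySQL counter names, exposed through Aurora's MySQL 5.6 compatibility); a minimal sketch:

-- Cumulative throughput counters; sample twice and take the difference
SHOW GLOBAL STATUS LIKE 'Com_select';
SHOW GLOBAL STATUS LIKE 'Com_insert';
-- Connection count
SHOW GLOBAL STATUS LIKE 'Threads_connected';
-- Buffer cache effectiveness: hit ratio is roughly
-- 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';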
Beyond benchmarks
If only real-world applications saw benchmark performance
POSSIBLE DISTORTIONS
Real-world requests contend with each other
Real-world metadata rarely fits in the data dictionary cache
Real-world data rarely fits in the buffer cache
Real-world production databases need to run at high availability
Scaling user connections
SysBench OLTP workload, 250 tables

Connections | Amazon Aurora | Amazon RDS MySQL, 30K IOPS (single AZ)
50          | 40,000        | 10,000
500         | 71,000        | 21,000
5,000       | 110,000       | 13,000

UP TO 8x FASTER
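Because Aurora throughput keeps climbing with connection count, it is worth confirming the connection ceiling on your instance. The variable below is standard MySQL; Aurora's pre-tuned parameter groups derive its default from the instance class:

SHOW VARIABLES LIKE 'max_connections';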
Scaling table count
SysBench write-only workload, 1,000 connections, default settings
Number of write operations per second

Tables | Amazon Aurora | MySQL I2.8XL local SSD | MySQL I2.8XL RAM disk | RDS MySQL 30K IOPS (single AZ)
10     | 60,000        | 18,000                 | 22,000                | 25,000
100    | 66,000        | 19,000                 | 24,000                | 23,000
1,000  | 64,000        | 7,000                  | 18,000                | 8,000
10,000 | 54,000        | 4,000                  | 8,000                 | 5,000

UP TO 11x FASTER
Scaling dataset size

SYSBENCH WRITE-ONLY
DB size | Amazon Aurora | RDS MySQL 30K IOPS (single AZ)
1 GB    | 107,000       | 8,400
10 GB   | 107,000       | 2,400
100 GB  | 101,000       | 1,500
1 TB    | 26,000        | 1,200
UP TO 67x FASTER

CLOUDHARMONY TPC-C
DB size | Amazon Aurora | RDS MySQL 30K IOPS (single AZ)
80 GB   | 12,582        | 585
800 GB  | 9,406         | 69
UP TO 136x FASTER
Running with read replicas
SysBench write-only workload, 250 tables (replica lag)

Updates per second | Amazon Aurora | RDS MySQL 30K IOPS (single AZ)
1,000              | 2.62 ms       | 0 s
2,000              | 3.42 ms       | 1 s
5,000              | 3.94 ms       | 60 s
10,000             | 5.38 ms       | 300 s

UP TO 500x LOWER LAG
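To watch replica lag from inside the database, Aurora provides a cluster-wide status view; the table and column names below come from Aurora's information_schema extension, so verify them against your engine version:

-- Lists every instance in the cluster with its current replication lag
SELECT server_id,
       session_id,
       replica_lag_in_milliseconds
FROM information_schema.replica_host_status;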
How do we achieve these results?
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES

DO LESS WORK
Do fewer I/Os
Minimize network packets
Cache prior results
Offload the database engine

BE MORE EFFICIENT
Process asynchronously
Reduce latency path
Use lock-free data structures
Batch operations together
Aurora cluster
[Diagram: Aurora primary instance; cluster volume spans 3 AZs (AZ 1, AZ 2, AZ 3); continuous backup to Amazon S3.]
Aurora cluster with replicas
[Diagram: Aurora primary instance plus two Aurora Replicas; cluster volume spans 3 AZs (AZ 1, AZ 2, AZ 3); continuous backup to Amazon S3.]
I/O traffic in RDS MySQL
MYSQL WITH STANDBY
[Diagram: primary instance in AZ 1 and standby instance in AZ 2, each writing to Amazon EBS and its EBS mirror, with snapshots to Amazon S3.]
TYPES OF WRITE: LOG, BINLOG, DATA, DOUBLE-WRITE, FRM FILES

I/O FLOW (diagram callouts 1-5)
Issue write to Amazon EBS; EBS issues it to the mirror and acknowledges when both are done
Stage the write to the standby instance using storage-level replication
Issue the write to EBS on the standby instance

OBSERVATIONS
Steps 1, 3, and 5 are sequential and synchronous
This amplifies both latency and jitter
Many types of write operations for each user operation
Have to write data blocks twice to avoid torn write operations

PERFORMANCE
30-minute SysBench write-only workload, 100 GB dataset, RDS Single-AZ, 30K PIOPS:
780 K transactions
7,388 K I/Os per million transactions (excludes mirroring, standby)
Average 7.4 I/Os per transaction
I/O traffic in Aurora (database)
AMAZON AURORA
[Diagram: primary instance in AZ 1 and replica instance in AZ 2 issuing asynchronous 4/6 quorum distributed writes to storage across three AZs, with backup to Amazon S3.]
TYPE OF WRITE: LOG only (no binlog, data, double-write, or FRM file writes)

I/O FLOW
Boxcar redo log records, fully ordered by LSN
Shuffle to the appropriate segments, partially ordered
Boxcar to storage nodes and issue write operations

OBSERVATIONS
Only write redo log records; all steps asynchronous
No data block writes (checkpoint, cache replacement)
6x more log writes, but 9x less network traffic
Tolerant of network and storage outlier latency

PERFORMANCE
30-minute SysBench write-only workload, 100 GB dataset:
27,378 K transactions (35x MORE)
950 K I/Os per 1M transactions (6x amplification; 7.7x LESS)
I/O traffic in Aurora (storage node)
[Diagram: log records flow from the primary instance into the storage node's incoming queue, then through the update queue, hot log, and data blocks, with sort/group, coalesce, point-in-time snapshot, GC, and scrub stages, backup to S3, and peer-to-peer gossip with peer storage nodes.]

I/O FLOW
① Receive record and add to in-memory queue
② Persist record and acknowledge
③ Organize records and identify gaps in log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new data block versions
⑥ Periodically stage log and new block versions to S3
⑦ Periodically garbage-collect old versions
⑧ Periodically validate CRC codes on blocks

OBSERVATIONS
All steps are asynchronous
Only steps 1 and 2 are in the foreground latency path
Input queue is 46x smaller than MySQL's (unamplified, per node)
Favors latency-sensitive operations
Uses disk space to buffer against spikes in activity
Asynchronous group commits
[Diagram: transactions T1-Tn each perform reads, writes, and a commit; pending commits (T1-T8, LSN 10-50) wait in a commit queue in LSN order and are acknowledged as the durable LSN at the head node advances.]

TRADITIONAL APPROACH
Maintain a buffer of log records to write out to disk
Issue write operations when the buffer is full, or on timing out while waiting for writes
First writer has a latency penalty when the write rate is low

AMAZON AURORA
Request I/O with the first write; fill the buffer until the write is picked up
An individual write is durable when 4 of 6 storage nodes acknowledge it
Advance the DB durable point up to the earliest pending acknowledgement
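For contrast, in stock MySQL the commit durability and group commit trade-off is controlled by a couple of well-known settings; a minimal way to inspect them (standard MySQL 5.6 variables, shown only for comparison, since Aurora gets durability from its 4/6 storage quorum instead):

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'sync_binlog';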
Adaptive thread pool

MYSQL THREAD MODEL
Standard MySQL: one thread per connection
Doesn't scale with connection count
MySQL EE: connections assigned to thread groups
Requires careful stall threshold tuning

AURORA THREAD MODEL
Re-entrant connections multiplexed to active threads
Kernel-space epoll() inserts into a latch-free task queue
Thread pool sized dynamically
Gracefully handles 5,000+ concurrent client sessions on r3.8xlarge
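A simple way to see the thread model at work: with a pooled model, connected threads grow with clients while running threads track active work. These are standard MySQL status counters:

SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Threads_running';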
I/O traffic in Aurora (Aurora Replica)

MYSQL READ SCALING
[Diagram: MySQL master (30% read, 70% write) ships the binlog to a MySQL replica (30% new reads, 70% write) with single-threaded binlog apply; each instance has its own data volume.]
Logical: ship SQL statements to the replica
Write workload similar on both instances
Independent storage
Can result in data drift between master and replica

AMAZON AURORA READ SCALING
[Diagram: Aurora master (30% read, 70% write) and Aurora Replica (100% new reads) share Multi-AZ storage; the replica's page cache is updated by applying redo.]
Physical: ship redo from master to replica
Replica shares storage; no writes performed
Cached pages have redo applied
Advance read view when all commits are seen
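Because an Aurora Replica shares storage and performs no writes, it runs read-only; a common way to tell a reader from the writer at the SQL level is the standard InnoDB variable below (returns 1 on a replica, 0 on the primary):

SELECT @@innodb_read_only;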
Availability
“Performance only matters if your database is up”
Storage node availability
Quorum system for read/write; latency tolerant
Peer-to-peer gossip replication to fill in holes
Continuous backup to S3 (designed for 11 9s durability)
Continuous scrubbing of data blocks
Continuous monitoring of nodes and disks for repair
10 GB segments as the unit of repair or hotspot rebalance, to quickly rebalance load
Quorum membership changes do not stall write operations
[Diagram: storage nodes spread across AZ 1, AZ 2, and AZ 3, backed up to Amazon S3.]
Instant crash recovery

Traditional databases
Have to replay logs since the last checkpoint
Typically 5 minutes between checkpoints
Single-threaded in MySQL; requires a large number of disk accesses
A crash at T0 requires reapplication of the SQL in the redo log since the last checkpoint

Amazon Aurora
Underlying storage replays redo records on demand as part of a disk read
Parallel, distributed, asynchronous
No replay needed at startup
A crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously
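For comparison, in stock MySQL the size of the recovery replay window is bounded by the InnoDB redo log capacity between checkpoints; these standard (non-Aurora-specific) variables show that capacity:

SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_log_files_in_group';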
Survivable caches
We moved the cache out of the database process
Cache remains warm in the event of a database restart
Lets you resume fully loaded operations much faster
Instant crash recovery + survivable cache = quick and easy recovery from DB failures
[Diagram: the SQL and transaction layers sit inside the DB process; the caching process runs outside the DB process and remains warm across a database restart.]
Faster, more predictable failover
MySQL: DB failure, failure detection, DNS propagation, recovery, app running again: 15-20 sec.
AURORA WITH MARIADB DRIVER: DB failure, failure detection, recovery, app running again: 3-20 sec.
High availability with Aurora Replicas
[Diagram: Aurora primary instance (db.r3.8xlarge) with two Aurora Replicas, a db.r3.8xlarge at priority tier-0 and a db.r3.2xlarge at priority tier-1; cluster volume spans 3 AZs; continuous backup to Amazon S3.]
High availability with Aurora Replicas
[Diagram: after the original primary fails, the db.r3.8xlarge Aurora Replica is promoted to be the new Aurora primary instance, while the db.r3.2xlarge replica (priority tier-1) remains a reader; cluster volume spans 3 AZs; continuous backup to Amazon S3.]
Simulate failures using SQL

To cause the failure of a component at the database node:
ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]

To simulate the failure of disks:
ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN
[DISK index | NODE index] FOR INTERVAL interval

To simulate the failure of networking:
ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type
[TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval
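A concrete invocation following the templates above; the values are illustrative, the syntax follows the Aurora fault-injection documentation of this era (verify against your engine version), and these statements should only ever be run against a test cluster:

-- Crash the current instance (triggers failover if a replica exists)
ALTER SYSTEM CRASH INSTANCE;

-- Degrade the network to all replicas by 25 percent for 20 seconds
ALTER SYSTEM SIMULATE 25 PERCENT NETWORK FAILURE TO ALL FOR INTERVAL 20 SECOND;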
Thank You.
Editor's Notes
1. Amazon Aurora is a MySQL-compatible, ACID-compliant, enterprise-grade relational database engine built for the cloud. In many ways, Aurora is a game changer that helps overcome the limitations of traditional relational database engines. Most traditional databases were built and optimized for on-premises environments, and the fundamentals of their architecture have not changed much in the last few decades. One of the key objectives of Aurora is to overcome the performance, scalability, and availability limitations of traditional databases in a cost-effective manner, similar to open-source databases. It inherits RDS capabilities: automatic backups, security, compute scaling, storage auto-scaling, durability, read replicas, fault tolerance, self-healing, DMS, schema conversion, instant crash recovery, and a survivable buffer cache.
2. Aurora was made generally available in July 2015. It is the fastest growing service in AWS history. We wanted to provide compatibility with a popular database system, and we chose MySQL 5.6 InnoDB, so there is no need to change applications. Along with its rapid growth, we have also seen many kinds of customers adopt it for a large variety of use cases, from enterprise applications to web, mobile, IoT, content management, search, BI, gaming, and media. There are several public use cases; talk about Zynga.
3. So here is the Aurora picture. We talked about layers of functionality in a database. We've moved the logging and storage out into a multitenant, scale-out, database-optimized storage service, which I will give you more details on shortly. The SQL and transaction layers are still part of the database. Caching is also moved out. We also have integration with Amazon S3 for continuous backup with 11 9s of durability, and integration with other AWS services to take advantage of the cloud. SQL writes the redo log to the storage tier, which is now smart: it interprets and persists the redo log, generates data pages from it, and backs up to S3. The control plane manages the DB: DynamoDB for volume metadata, SWF for operations like creating and scaling instances, and Route 53 for failover, in a 1-2 minute time frame.
4. Here is an updated slide on Aurora performance: 100,000 writes per second and 500,000 reads (we are doing a little bit above that) on the MySQL SysBench benchmark, which is a standard, using an R3.8XL with 32 cores and 244 GB RAM.
5. You can reproduce this yourself. All you have to do is provision an instance; we have a whitepaper that shows some of the things you should be doing to get high performance. For example, you want to run multiple client machines, you should enable enhanced networking on them, you need to tune your Linux settings to get the throughput you need, and you should run clients and servers in the same AZ. You should then be able to get similar numbers. Performance is obviously lower as the instance types go down, as you would expect.
6. The fact is that benchmarks don't really matter. If any of you were able to see values in your real-world applications that matched the benchmark values from the various database vendors, we wouldn't be looking at a new database. Why? There are a bunch of real-world distortions that make production different from what we talk about as database vendors. For example, requests contend with each other. Metadata rarely fits in the data dictionary cache, so you have to go out to disk. Real-world data almost never fits in the buffer cache, so you have to go out to disk there too. And production databases need to run highly available; you aren't going to run on a single node. So in the next few slides we are going to show you how we are doing with respect to some of these criteria relative to MySQL 5.6.
7. Here we are looking at user connection scaling, going up from 50 to 5,000 connections using Aurora, and as you expect we see increased throughput, because Aurora is very much about generating higher throughput. We are using the OLTP workload from SysBench, which has a fair bit of contention baked in, with 250 tables. If you look at RDS MySQL with 30,000 PIOPS, which is about the biggest box I can put together, despite all the available capacity its throughput starts to decline, which has to do with issues in lock sharding and how the threading model works inside MySQL. So in this case our best numbers are about 8x faster.
8. Let's look at table scaling when the tables no longer fit into cache. As you go up from 10 tables to 10,000 tables, we definitely start seeing a little bit of a penalty in Aurora, going from a peak of 64,000 requests per second on a write-only workload with 1,000 connections down to 54 K. But you can see that MySQL does about 5x worse as you go from 10 to 10,000 tables. What is interesting to call out here is that this has nothing to do with I/O: the numbers, whether you are looking at local SSD, a RAM disk, or a fully provisioned PIOPS instance, are really all about the same. This is a testament to all the work the EBS team has done to ensure they operate at high throughput compared to local disk, but it also shows the contention points in the database tier.
9. Let's look at data that does not fit in cache. Here we are scaling a DB size up from 1 GB, which can certainly fit in most any cache, to 1 TB. Aurora runs at basically the same performance as long as the data fits in cache; then it drops down, as you would expect, once you no longer fit in the cache, by about 4x. What's interesting with MySQL is that it starts declining well ahead of that point, while the data is still fully cached. In this case we are running 67x faster, which is a crazy number, so we decided to run something else: the CloudHarmony TPC-C-like benchmark, which is pretty much like the standard TPC-C benchmark the TPC council runs, and here Aurora runs 136x faster when the data does not fit in cache. And these are transactions on the right and statements on the left, because those are the two standards for these benchmarks.
10. If you want to run highly available or run high-throughput read queries, you will probably use read replicas. MySQL read replicas start to degrade (the lag between the master and slaves goes up) as you increase the updates per second through the system. In Aurora we also use the read replicas as failover targets. Going from 1,000 to 10,000 updates per second, we stay relatively stable at about 5 ms of replica lag, whereas in MySQL we go from a few milliseconds to 300 seconds.
11. For the bulk of this talk we want to talk about HOW. The core of this comes down to improving your algorithms, sharding your locks, and removing contention. That is good, but anyone can do it; it is not architectural work. Once you are done with that, you are at the point where you are hitting I/O. Well-designed databases are all about I/O. Once you move to network-attached storage, it is all about packets per second. You can do somewhere between 1 and 2 million packets per second on high-end hardware like EC2 enhanced networking, but that is still a pretty constrained number. The other thing that is important: if you want to operate at high throughput, you cannot afford system calls or context switches. So a lot of our architectural work has been around reducing I/Os, minimizing network packets, and moving toward asynchronous processing.
12. Here we have a single Aurora instance shown with its storage layer. This is called an Aurora cluster. The cluster consists of one or more Aurora instances and the underlying shared storage layer, which spans three AZs. Here we are showing Aurora with one primary instance only, but you can have multiple replicas (up to 15) in these three AZs. Let's take a look at the storage layer. Aurora maintains six copies of your data across three Availability Zones by default: whether or not you have chosen to have additional Aurora instances, the storage is on a virtual disk that spans three Availability Zones. Because Aurora maintains multiple copies of your data in three Availability Zones, the chance of losing data as a result of a disk failure is minimized. Aurora automatically detects failures in the disk volumes that make up the cluster volume. When a segment of a disk volume fails, Aurora immediately repairs the segment, using the data in the other volumes that make up the cluster volume to make sure the repaired segment is current. As a result, Aurora avoids data loss and reduces the need to perform a point-in-time restore to recover from a disk failure. And by the way, at the storage layer you are only billed for one of those copies, not six: you are charged based on the storage your database consumes at the database layer, not the storage consumed in this virtualized storage layer.
13. For comparison, this is what your cluster would look like with additional Aurora replicas provisioned: here, one in each AZ.
14. Let’s compare and contrast. Here is the I/O traffic in RDS MySQL with a standby instance. You issue a write, it goes to EBS, and EBS issues a write to its mirror asynchronously, in parallel. Once both writes have happened, we synchronously go to another AZ to do another write with DRBD. Then the standby has to issue its own write, which propagates and comes back. The interesting thing to point out here is that steps 1, 3, and 5 are sequential and synchronous. So if there is any network “weather” going on, it is going to amplify both the latency and the jitter; minimally you are looking at 3x, and possibly much more than that. The other thing you see is that there are a lot of different types of write operations: log, binlog, data, the double-write to avoid torn pages, and the FRM files for metadata. What is particularly difficult here is that the data blocks are a good deal larger than the log, about 40x, and you are writing them twice, so that is 80x. That is a big deal, though like any solid database, MySQL does a lot of batching under the covers to mitigate that overhead. So what we did is run a 30-minute write-only SysBench workload with a 100 GB data set, Single-AZ, 30K PIOPS, and we saw 780K transactions and over 7.4 million I/O operations, which is pretty good. That is the work the primary instance is seeing, which is step 1 only; steps 2, 3, 4, and 5 are not reflected. You want to focus on the network throughput of that single instance before comparing it to Aurora.
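To see how chained synchronous hops amplify latency and jitter, here is a small Monte Carlo sketch; every latency figure in it is made up for illustration:

```python
import random

def hop_ms():                      # hypothetical per-hop latency, ms
    return max(random.gauss(1.0, 0.3), 0.0)

def sequential(hops=3):            # steps 1, 3, and 5 chained synchronously
    return sum(hop_ms() for _ in range(hops))

def quorum(n=6, k=4):              # parallel writes, wait for the k-th of n
    return sorted(hop_ms() for _ in range(n))[k - 1]

trials = 100_000
for name, sample in (("3 sequential hops", sequential), ("4-of-6 quorum", quorum)):
    xs = sorted(sample() for _ in range(trials))
    print(f"{name}: p50={xs[trials // 2]:.2f} ms  p99={xs[int(trials * 0.99)]:.2f} ms")
```

The sequential path roughly triples both the median latency and the tail jitter, which is the 3x amplification called out above; the parallel quorum path is insulated from a single slow responder.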
15. Now let’s look at Aurora. In Aurora we are only shipping log records, so when a bunch of log records come to us, we boxcar them together. Same workload: write-only SysBench, 100 GB data set, Single-AZ, 30K PIOPS, 30 minutes. MySQL: 779,664 transactions and 7,387,798 I/O operations. Aurora: 27,378,519 transactions, 35x more, with 158,323 I/O operations, 46x fewer.
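The boxcarring can be sketched as a tiny asynchronous batcher; the names, thresholds, and transport here are illustrative, not Aurora internals:

```python
import asyncio

class Boxcar:
    """Buffer redo records briefly and ship many per network packet."""
    def __init__(self, send, max_records=32, max_delay=0.001):
        self.send = send
        self.buf = []
        self.max_records, self.max_delay = max_records, max_delay
        self._timer = None

    async def add(self, record):
        self.buf.append(record)
        if len(self.buf) >= self.max_records:          # size-based flush
            await self.flush()
        elif self._timer is None:                      # time-based flush
            loop = asyncio.get_running_loop()
            self._timer = loop.call_later(
                self.max_delay, lambda: asyncio.ensure_future(self.flush()))

    async def flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        if self.buf:
            batch, self.buf = self.buf, []
            await self.send(batch)                     # one packet, many records

async def main():
    async def send(batch):
        print(f"shipped {len(batch)} log records in one packet")
    box = Boxcar(send)
    for lsn in range(100):
        await box.add({"lsn": lsn, "payload": b"..."})
    await box.flush()                                  # drain the tail

asyncio.run(main())
```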
16. Let’s talk about the inflow traffic a storage node sees. Only the first step is on the latency path: (1) write the log record and return. Everything else happens asynchronously in the background: (2) sort and group records by block; (3) coalesce records into blocks; (4) generate snapshots for S3; (5) garbage-collect once the data is in S3; (6) peer-to-peer replication to fill in holes; 7.
17. The user only ever sees the commit acknowledgements coming back. Each storage node acknowledges each write. The database determines when it has quorum per write, determines when it has quorum on enough writes to advance the volume durable LSN, checks whether any requests are pending commit acknowledgement, and acks those back. “Group commits” are asynchronous here; they are not managed or issued by the database. In short: receive ACKs back from the storage nodes; consider a write durable when the first 4 of 6 ACK; advance the database durable LSN once there are no holes below it; then ACK all pending commits at or below the new durable LSN.
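Here is a minimal sketch of that bookkeeping, assuming a 4-of-6 write quorum; the class and method names are invented for illustration:

```python
class DurabilityTracker:
    WRITE_QUORUM = 4                       # 4 of 6 copies must ACK

    def __init__(self):
        self.acks = {}                     # lsn -> count of storage-node ACKs
        self.durable_lsn = 0               # highest hole-free, quorum-acked LSN
        self.pending_commits = []          # (commit_lsn, callback), kept sorted

    def on_ack(self, lsn):
        self.acks[lsn] = self.acks.get(lsn, 0) + 1
        # Advance only while the *next* LSN has quorum: a hole (one slow
        # write) blocks everything behind it from becoming durable.
        while self.acks.get(self.durable_lsn + 1, 0) >= self.WRITE_QUORUM:
            self.durable_lsn += 1
        self._ack_commits()

    def on_commit(self, commit_lsn, callback):
        self.pending_commits.append((commit_lsn, callback))
        self.pending_commits.sort()
        self._ack_commits()

    def _ack_commits(self):
        # Group commit: acknowledge every commit at or below the durable LSN.
        while self.pending_commits and self.pending_commits[0][0] <= self.durable_lsn:
            _, cb = self.pending_commits.pop(0)
            cb()

t = DurabilityTracker()
t.on_commit(2, lambda: print("txn with commit LSN 2 acknowledged"))
for _ in range(4): t.on_ack(1)
for _ in range(4): t.on_ack(2)   # durable LSN reaches 2; the commit is acked
```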
18. In Aurora, commits are asynchronous; for locks, we use a growing set of threads; for reads, we’re still working on it. Default MySQL uses one thread per connection. MySQL Enterprise Edition adds a thread pool: it assigns connections to thread groups, one statement executes per group while the remainder queue, and if the running statement goes beyond a stall threshold or is blocked, a second one is allowed to execute. This is useful for short queries with a homogeneous pattern.
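A toy sketch of that admission policy (not the actual MySQL Enterprise thread pool), assuming a hypothetical stall threshold:

```python
import threading

class ThreadGroup:
    STALL_THRESHOLD = 0.06                     # hypothetical, in seconds

    def __init__(self):
        self._slot = threading.Semaphore(1)    # one active statement per group

    def run(self, statement):
        # Wait briefly for the slot; if the current occupant looks stalled
        # past the threshold, run anyway as the "second" statement.
        acquired = self._slot.acquire(timeout=self.STALL_THRESHOLD)
        try:
            statement()
        finally:
            if acquired:
                self._slot.release()

GROUPS = [ThreadGroup() for _ in range(4)]

def execute(connection_id, statement):
    GROUPS[connection_id % len(GROUPS)].run(statement)   # connection -> group
```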
19. Read replicas work. At 2,000 updates per second, MySQL runs about 2 seconds behind; in Aurora you can run 10,000 updates per second and stay about 5 ms behind, roughly 400x better. You can scale together 16 of these instances (one primary plus 15 replicas), and you also get near-instant failover and promotion if your primary goes down.
20. We’ve talked a little about the quorum system, which makes Aurora latency tolerant: we can lose a node, a disk, or an AZ and it will still run. We also run a gossip protocol to fill in holes in replication caused by network weather and similar issues. We continuously back up to S3 with no latency penalty; it is always doing this. We continuously scrub the data blocks, so you know you really have six copies of your data under the covers. This is part of a managed service offering, so there is continuous monitoring in the control plane that automates detecting bad nodes and bad disks; you don’t have to do this yourself. We break storage into 10 GB segments on disk, so if something goes wrong, the blast radius is small; and if there are hot spots, we can kill one of those segments and move it somewhere else. To lose my 4-out-of-6 quorum, I need three separate failures within the time it takes us to repair one, and that is a pretty unfortunate day if it happens. You can keep issuing writes even while, under the covers, we are changing the nodes in the quorum; the standard approach would force a write to stall, and we invested a lot in making quorum membership changes transparent. The volume expands as needed, with 6x replication and gossip communication throughout.
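As a back-of-the-envelope illustration of why three failures inside one repair window is “a pretty unfortunate day,” here is a sketch using a purely hypothetical per-copy failure probability (not an AWS figure):

```python
from math import comb

def p_quorum_loss(p_fail_in_window, copies=6, tolerated=2):
    """P(more than `tolerated` of `copies` fail within one repair window)."""
    return sum(comb(copies, k)
               * p_fail_in_window**k
               * (1 - p_fail_in_window)**(copies - k)
               for k in range(tolerated + 1, copies + 1))

# Assume (hypothetically) a 0.1% chance a given copy fails during one window.
print(f"{p_quorum_loss(0.001):.1e}")   # ~2.0e-08 per protection group per window
```

Small 10 GB segments shrink the repair window itself, which drives this probability down further.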
21. In the traditional approach to crash recovery, you bring the database back up, find the last checkpoint, and replay all the logs forward from that point. The rule of thumb has been about 5 minutes between checkpoints, because that is the trade-off between the amount of memory you can afford, disk latency, IOPS, and so on; it has stayed true for 30 years. Crash recovery in MySQL is single-threaded, so you may have 32 cores but only one core applying the redo log. The other problem is that this data is not in cache, so you are doing I/Os, and nothing guarantees recovery takes 5 minutes; it often goes longer. Aurora, by contrast, is built on a log-structured storage system under the covers. When you ask for a data block from Aurora storage, it fetches the block, checks whether any log records have been issued against it since it was last materialized, and applies them before handing the block to you. The same process that happens for foreground reads happens for recovery: it is a parallel, distributed, asynchronous operation, and you don’t have to do it before startup. The database can just start up; if you then issue an I/O, storage does exactly what it does in the normal course of behavior and applies the outstanding records to the block before returning it.
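Here is a minimal sketch of that apply-on-read idea; the store, the LSN bookkeeping, and the apply functions are all invented for illustration:

```python
class LogStructuredStore:
    def __init__(self):
        self.blocks = {}        # block_id -> (materialized_lsn, block image)
        self.redo = []          # [(lsn, block_id, apply_fn)] in LSN order

    def append_redo(self, lsn, block_id, apply_fn):
        self.redo.append((lsn, block_id, apply_fn))

    def read_block(self, block_id):
        lsn, image = self.blocks.get(block_id, (0, b""))
        # Apply every redo record newer than the stored image, on demand.
        # The same path serves normal reads and crash recovery.
        for rec_lsn, bid, apply_fn in self.redo:
            if bid == block_id and rec_lsn > lsn:
                image, lsn = apply_fn(image), rec_lsn
        self.blocks[block_id] = (lsn, image)     # re-materialize lazily
        return image

store = LogStructuredStore()
store.append_redo(1, "b1", lambda img: img + b"v1;")
store.append_redo(2, "b1", lambda img: img + b"v2;")
print(store.read_block("b1"))    # b"v1;v2;" with no replay phase at startup
```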
22. We moved the buffer cache out of the database process, so the cache stays warm across a restart. If you have to restart the database, say because you ran into an application deadlock and had to bounce things, it is hard to say how long warming the cache back up would take, and until then you are all I/O again. In the Aurora case the cache is always warm, which makes recovery from database failures pretty fast.
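A tiny sketch of the survivable-cache idea, using a manager process that outlives the engine process; this is an analogy, not Aurora’s actual mechanism:

```python
from multiprocessing import Manager, Process

def db_engine(cache):
    # On startup, the engine reattaches to the surviving cache: any page
    # already present is a warm hit with no disk I/O.
    if "page:42" in cache:
        print("warm hit:", cache["page:42"])
    else:
        print("cold miss: reading page 42 from storage")
        cache["page:42"] = b"...page bytes..."

if __name__ == "__main__":
    with Manager() as mgr:           # the cache holder outlives engine restarts
        cache = mgr.dict()
        for restart in range(2):     # simulate a crash/restart of the engine
            p = Process(target=db_engine, args=(cache,))
            p.start()
            p.join()                 # first run: cold miss; second run: warm hit
```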
23. Timeline for traditional MySQL: you get a database failure; detecting it takes a while, because it takes a bit to declare a node bad; then recovery runs; then a DNS cutover moves the IP from the primary to the secondary, which can also take a little while. In Aurora, we compressed failure detection to 15-20 seconds, because the cost of making a mistake isn’t large, and recovery takes between 3 and 20 seconds, depending on instance size, because we have to work through the buffer cache.
24. Start with the slide we discussed earlier, and let’s dive deeper into how this can be configured and how a primary node failure applies to the read replicas. Availability with read replicas: read replicas are already there so that your database can support many read and write operations at the same time, giving you an architecture that does not put all the responsibility on one master node. With Aurora, read replicas also serve as your failover targets when there is a problem with the primary. When the primary instance of an Aurora DB cluster fails, the primary is replaced if there are no read replicas; if there are read replicas, one of them is promoted to primary. When a read replica is promoted, all the recovery steps described earlier (slide 30) happen; once recovery completes, the read replica is now the primary and is now receiving the read and write requests as the primary node of your cluster. When you create your read replicas, you can make them a different size than your primary node. Say your primary runs as an R3.8xlarge to give you the performance you need to support a lot of writes; you might create a read replica as an R3.2xlarge, since it only needs to support reads. With read replicas you can also set a priority that determines which replica gets promoted to primary. So in the scenario above, maybe we end up with three nodes in the cluster: a primary that is an R3.8xlarge, one read replica that is also an R3.8xlarge, and one read replica that is an R3.2xlarge. You assign a priority of 0 to the 8xlarge read replica and a priority of 1 to the 2xlarge, and you place the R3.8xlarge read replica in a separate Availability Zone from the primary for more HA. When the primary fails, the priority dictates that the other 8xlarge is next in line to be promoted. If you ever have two read replicas with the same top priority, the larger of the two instance types takes priority.
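A minimal sketch of that promotion rule as a two-key sort; the size ranking is invented, and tiers follow the 0-is-highest convention from above:

```python
# Partial, illustrative size ranking: bigger number means larger instance.
SIZE_RANK = {"r3.8xlarge": 3, "r3.4xlarge": 2, "r3.2xlarge": 1}

def pick_failover_target(replicas):
    """replicas: dicts like {"id": ..., "tier": 0, "type": "r3.8xlarge"}.
    Lowest tier wins; ties go to the larger instance type."""
    return min(replicas, key=lambda r: (r["tier"], -SIZE_RANK[r["type"]]))

replicas = [
    {"id": "replica-a", "tier": 0, "type": "r3.8xlarge"},   # promoted first
    {"id": "replica-b", "tier": 1, "type": "r3.2xlarge"},
]
print(pick_failover_target(replicas)["id"])   # replica-a
```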