4. Traditional database architecture
Databases are all about I/O.
Design principles for > 40 years:
• Increase I/O bandwidth
• Decrease the number of I/Os!
(Diagram: a monolithic stack – SQL, Transactions, Caching, and Logging on the compute tier, with attached storage below.)
6. Aurora approach: the log is the database
Page version t5 can be created by applying the log records from t1 through t5.
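The idea above can be sketched as materializing a page version by replaying log records. This is an illustrative model only; the record layout and `materialize` function are not Aurora's actual format.

```python
# Hypothetical sketch: a page version at LSN t5 is materialized by
# replaying the log records from t1 through t5 against the page image.
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int          # log sequence number
    offset: int       # byte offset within the page
    data: bytes       # bytes to write at that offset

def materialize(page: bytearray, log: list, up_to_lsn: int) -> bytearray:
    """Apply log records in LSN order up to the requested version."""
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.lsn <= up_to_lsn:
            page[rec.offset:rec.offset + len(rec.data)] = rec.data
    return page

page = bytearray(8)
log = [LogRecord(1, 0, b"ab"), LogRecord(3, 2, b"cd"), LogRecord(5, 4, b"ef")]
materialize(page, log, up_to_lsn=5)
assert bytes(page) == b"abcdef\x00\x00"
```

Because any page version can be rebuilt from the log, the engine never needs to ship full data blocks to storage.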
7. Aurora : Offload checkpointing to the storage fleet
8. Aurora approach: compute & storage separation
Compute & storage have different lifetimes
Compute instances
• fail and are replaced
• are shut down to save cost
• are scaled up/down/out on the basis of load needs
(Diagram: SQL, Transactions, Caching, and Logging on the compute tier; network storage below.)
Storage, on the other hand, has to be long-lived
Decouple compute and storage for scalability, availability, durability
9. Scale-out, distributed architecture
• Purpose-built, log-structured distributed storage system designed for databases
• Storage volume is striped across hundreds of storage nodes distributed over 3 different availability zones
• Six copies of data, two copies in each availability zone, to protect against AZ+1 failures
• Plan to apply the same principles to other layers of the stack
(Diagram: a master and replicas – each running SQL, Transactions, and Caching – across Availability Zones 1–3, over a shared storage volume on storage nodes with SSDs.)
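The 6-copy / 3-AZ quorum math can be checked with a small sketch (illustrative, not Aurora code): with two copies per AZ, losing one full AZ plus one more node still leaves a 3/6 read quorum, so no data is lost and the missing copies can be repaired.

```python
# Illustrative sketch of the 6-copy, 3-AZ quorum scheme:
# write quorum 4/6, read quorum 3/6, two copies per AZ.
COPIES = [("az1", 1), ("az1", 2), ("az2", 1), ("az2", 2), ("az3", 1), ("az3", 2)]
WRITE_QUORUM, READ_QUORUM = 4, 3

def survivors(failed_az: str, extra_failed: tuple) -> int:
    """Copies still alive after losing one whole AZ plus one extra node."""
    return sum(1 for c in COPIES if c[0] != failed_az and c != extra_failed)

# Under every AZ+1 failure, a read quorum survives (data is safe and
# repairable), even though writes must wait for repair to regain 4/6.
for az in ("az1", "az2", "az3"):
    for extra in COPIES:
        if extra[0] != az:
            n = survivors(az, extra)
            assert n >= READ_QUORUM   # reads (and repair) still possible
            assert n < WRITE_QUORUM   # write quorum temporarily lost
```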
12. Aurora MySQL performance
Aurora read/write throughput compared to MySQL 5.6, based on industry-standard benchmarks (MySQL SysBench; R4.16XL: 64 cores / 488 GB RAM).
(Charts: write performance, axis up to 250,000; read performance, axis up to 700,000; Aurora vs. MySQL 5.6.)
13. Speed – how did we achieve this?
DO LESS WORK
• Do fewer I/Os
• Minimize network packets
• Cache prior results
• Offload the database engine
BE MORE EFFICIENT
• Process asynchronously
• Reduce the latency path
• Use lock-free data structures
• Batch operations together
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING IS ALL ABOUT CONTEXT SWITCHES
14. IO Traffic in MySQL
MYSQL WITH REPLICA
Types of write: binlog, data, double-write, log, FRM files
(Diagram: a primary instance in AZ 1 and a replica instance in AZ 2, each writing to Amazon Elastic Block Store (EBS) and its EBS mirror, with backups to Amazon S3; the write path is numbered 1–5.)
IO FLOW
• Issue the write to EBS – EBS issues it to the mirror, ACK when both are done
• Stage the write to the standby instance through DRBD
• Issue the write to EBS on the standby instance
OBSERVATIONS
• Steps 1, 3, and 4 are sequential and synchronous
• This amplifies both latency and jitter
• There are many types of writes for each user operation
• Data blocks have to be written twice to avoid torn writes
PERFORMANCE
• 780K transactions
• 7,388K I/Os per million txns (excludes mirroring and standby)
• Average of 7.4 I/Os per transaction
(30-minute SysBench write-only workload, 100 GB dataset, RDS Multi-AZ, 30K PIOPS)
15. IO Traffic in Aurora – DB Engine
AMAZON AURORA
Type of write: redo log records only
(Diagram: a primary instance in AZ 1 and replica instances in AZ 2 and AZ 3, issuing asynchronous 4/6-quorum distributed writes to the storage fleet, with continuous backup to Amazon S3.)
IO FLOW
• Only write redo log records; all steps are asynchronous
• No data block writes (checkpoint, cache replacement)
• 6X more log writes, but 9X less network traffic
• Tolerant of network and storage outlier latency
• Boxcar redo log records – fully ordered by LSN
• Shuffle to the appropriate segments – partially ordered
• Boxcar to storage nodes and issue writes
OBSERVATIONS / PERFORMANCE
• 27,378K transactions – 35X MORE
• 950K I/Os per 1M txns (6X amplification) – 7.7X LESS
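The "boxcar and shuffle" flow can be sketched as follows. The segment count, record shape, and the modulo page-to-segment mapping are illustrative assumptions, not Aurora's actual placement scheme.

```python
# Hedged sketch: redo records arrive fully ordered by LSN, then are
# shuffled into per-segment boxcars (partially ordered) for the
# storage nodes that own each page.
from collections import defaultdict

NUM_SEGMENTS = 4  # illustrative; a real volume has many segments

def shuffle_to_segments(redo_records):
    """redo_records: list of (lsn, page_id, payload), ordered by LSN."""
    boxcars = defaultdict(list)
    for lsn, page_id, payload in sorted(redo_records):  # fully ordered by LSN
        seg = page_id % NUM_SEGMENTS                    # owning segment (assumed mapping)
        boxcars[seg].append((lsn, page_id, payload))    # partially ordered per segment
    return boxcars  # each boxcar is then sent to its storage nodes

recs = [(10, 7, "a"), (11, 2, "b"), (12, 7, "c"), (13, 0, "d")]
cars = shuffle_to_segments(recs)
assert cars[3] == [(10, 7, "a"), (12, 7, "c")]  # LSN order preserved per segment
```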
16. I/O flow in Amazon Aurora storage node
①Receive log records and add to in-memory queue
and durably persist log records
② ACK to the database
③ Organize records and identify gaps in log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new page versions
⑥ Periodically stage log and new page versions to S3
⑦ Periodically garbage collect old versions
⑧ Periodically validate CRC codes on blocks
(Diagram: the database instance sends log records to the storage node's incoming queue; the node updates the queue, ACKs, appends to the hot log, sorts/groups and coalesces records into data pages, gossips with peer storage nodes, and runs continuous backup to S3, GC, and scrub in the background.)
Note:
• All steps are asynchronous
• Only steps 1 and 2 are in the foreground latency path
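Steps ③–④ above can be sketched as gap detection plus gossip. The function names and the contiguous-LSN assumption are simplifications for illustration, not Aurora's protocol.

```python
# Illustrative sketch: a storage node sorts its received log records,
# finds LSN gaps, and gossips with peers to fill the holes.
def find_gaps(received_lsns, start_lsn):
    """Return LSNs missing from the contiguous sequence starting at start_lsn."""
    have = set(received_lsns)
    high = max(have, default=start_lsn - 1)
    return [lsn for lsn in range(start_lsn, high + 1) if lsn not in have]

def gossip_fill(my_lsns, peer_lsns, start_lsn):
    """Pull any missing records that a peer happens to have."""
    missing = find_gaps(my_lsns, start_lsn)
    return sorted(set(my_lsns) | (set(missing) & set(peer_lsns)))

mine = [1, 2, 5, 6]          # records this node received directly
peer = [1, 2, 3, 4, 5, 6]    # records a peer node holds
assert find_gaps(mine, 1) == [3, 4]
assert gossip_fill(mine, peer, 1) == [1, 2, 3, 4, 5, 6]
```

Because gossip runs in the background, the database never waits on it; only receive-and-ACK (steps ① and ②) sit on the latency path.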
17. Asynchronous Group Commits
(Diagram: transactions T1…Tn issue reads, writes, and commits over time; commit records are assigned LSNs (10, 12, 20, 22, 30, 34, 41, 47, 49, 50) as the LSN grows, and pending commits T1–T8 wait in a commit queue in LSN order until the durable LSN at the head node passes them.)
TRADITIONAL APPROACH
• Maintain a buffer of log records to write out to disk
• Issue the write when the buffer is full or a timeout expires
• The first writer pays a latency penalty when the write rate is low
AMAZON AURORA
• Request I/O with the first write; fill the buffer until the write is picked up
• An individual write is durable when 4 of 6 storage nodes ACK
• Advance the DB durable point up to the earliest pending ACK
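The Aurora-style commit path described above can be sketched as advancing a durable LSN and releasing queued commits behind it. The names (`advance_durable_lsn`, the per-LSN ACK counts) are illustrative, not Aurora's internals.

```python
# Sketch: a commit completes only once the durable LSN, advanced through
# every consecutive LSN with a 4/6 quorum of ACKs, passes its commit LSN.
import heapq

WRITE_QUORUM = 4

def advance_durable_lsn(acks_by_lsn, prev_vdl):
    """Advance the durable point through consecutive quorum-ACKed LSNs."""
    vdl = prev_vdl
    while acks_by_lsn.get(vdl + 1, 0) >= WRITE_QUORUM:
        vdl += 1
    return vdl

def complete_commits(commit_queue, vdl):
    """Pop pending commits (min-heap keyed by commit LSN) up to the durable LSN."""
    done = []
    while commit_queue and commit_queue[0][0] <= vdl:
        done.append(heapq.heappop(commit_queue)[1])
    return done

acks = {1: 6, 2: 5, 3: 4, 4: 2}            # LSN -> storage-node ACK count
queue = [(2, "T1"), (3, "T2"), (4, "T3")]  # (commit LSN, transaction)
heapq.heapify(queue)
vdl = advance_durable_lsn(acks, prev_vdl=0)
assert vdl == 3                            # LSN 4 has only 2 ACKs so far
assert complete_commits(queue, vdl) == ["T1", "T2"]
```

No transaction waits for its own dedicated flush; each simply waits for the durable point to move past its commit LSN.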
18. Adaptive Thread Pool
AURORA THREAD MODEL
• Re-entrant connections are multiplexed to active threads
• Kernel-space epoll() inserts into a latch-free task queue
• Dynamically sized thread pool
• Gracefully handles 5,000+ concurrent client sessions on r3.8xl
MYSQL THREAD MODEL
• Standard MySQL – one thread per connection; doesn’t scale with connection count
• MySQL EE – connections assigned to thread groups; requires careful stall-threshold tuning
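The multiplexing idea can be shown with Python's `selectors` module (which uses epoll on Linux): one event loop serves many connections instead of one thread per connection. A minimal sketch, not Aurora's implementation; the echo "query" handling is a stand-in.

```python
# One event loop multiplexing several client "sessions": the kernel
# reports which sockets are ready, and only those are handled.
import selectors, socket

sel = selectors.DefaultSelector()

def serve_ready(n_events):
    """Handle n ready connections on the single loop, echoing each request."""
    handled = []
    while len(handled) < n_events:
        for key, _ in sel.select(timeout=1):
            data = key.fileobj.recv(4096)
            if data:
                key.fileobj.sendall(data)   # echo back as a trivial "query"
                handled.append(data)
    return handled

# Three concurrent client connections, one loop.
pairs = [socket.socketpair() for _ in range(3)]
for server_side, _ in pairs:
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)

for i, (_, client_side) in enumerate(pairs):
    client_side.sendall(b"req%d" % i)

echoed = serve_ready(3)
assert sorted(echoed) == [b"req0", b"req1", b"req2"]
```

A real pool would hand ready connections to a small set of worker threads sized to the CPU count, rather than handling them inline.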
19. Aurora Lock Management
Needed to support many concurrent sessions and high update throughput.
• Same locking semantics as MySQL
• Concurrent access to lock chains
• Multiple scanners allowed in an individual lock chain
• Lock-free deadlock detection
(Diagram: the MySQL lock manager serializes scan, delete, and insert operations on a lock chain; the Aurora lock manager lets them proceed concurrently.)
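A toy model of a lock chain that admits multiple scanners at once: scanners take shared locks and are mutually compatible, while writers take exclusive locks that conflict. This is a simplification of the semantics above, not Aurora's lock manager.

```python
# Illustrative record-lock chain: shared (scan) locks coexist,
# exclusive (insert/delete) locks conflict with everything.
SHARED, EXCLUSIVE = "S", "X"
COMPATIBLE = {(SHARED, SHARED): True, (SHARED, EXCLUSIVE): False,
              (EXCLUSIVE, SHARED): False, (EXCLUSIVE, EXCLUSIVE): False}

class LockChain:
    def __init__(self):
        self.granted = []  # (txn, mode) pairs currently holding the lock

    def try_lock(self, txn, mode):
        """Grant if compatible with every holder; else the caller must wait."""
        if all(COMPATIBLE[(held, mode)] for _, held in self.granted):
            self.granted.append((txn, mode))
            return True
        return False  # waiter; deadlock detection would run lock-free

chain = LockChain()
assert chain.try_lock("scan-1", SHARED)           # first scanner
assert chain.try_lock("scan-2", SHARED)           # scanners share the chain
assert not chain.try_lock("delete-1", EXCLUSIVE)  # writer must wait
```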
20. Parallel Query Processing
Aurora storage has thousands of CPUs:
• Presents an opportunity to push down and parallelize query processing using the storage fleet
• Moving processing close to the data reduces network traffic and latency
However, there are significant challenges:
• Data stored on a storage node is not range-partitioned – full scans are required
• Data may be in flight
• Read views may not allow viewing the most recent data
• Not all functions can be pushed down to storage nodes
(Diagram: the database node pushes down predicates to the storage nodes and aggregates their results.)
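The pushdown idea can be sketched as follows: each storage node filters and partially aggregates its own rows, and the database node only combines the partials. The row shape and function names are illustrative assumptions.

```python
# Sketch: predicate pushdown plus partial aggregation at storage nodes,
# with the database node combining the small partial results.
def storage_node_scan(rows, predicate):
    """Runs on each storage node: filter locally, return a partial aggregate."""
    matched = [r for r in rows if predicate(r)]
    return len(matched), sum(r["amount"] for r in matched)

def database_node_query(nodes, predicate):
    """Runs on the database node: combine the partial aggregates."""
    partials = [storage_node_scan(rows, predicate) for rows in nodes]
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total

nodes = [
    [{"region": "us", "amount": 10}, {"region": "eu", "amount": 5}],
    [{"region": "us", "amount": 7}],
]
assert database_node_query(nodes, lambda r: r["region"] == "us") == (2, 17)
```

Only two small numbers cross the network per node, instead of every matching row.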
21. CI – Continuous Innovation, Continuous Improvement
• Collaborate with other AWS and EC2 teams
• Operational performance – fleet performance
• Product performance improvement
23. Simplified Storage Management
• Continuous backup
• Automatic storage scaling – auto-incremented in 10 GB units, up to 64 TB+
• Fast database cloning – copy-on-write (COW) pages
• Backtrack – rewind the database without restoring from backups
(Diagram: a production database with multiple clones serving dev/test applications, benchmarks, and production applications.)
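Fast cloning's copy-on-write pages can be sketched as follows: the clone initially shares every page with its parent and copies a page only when it is written. The `Volume` structure is an illustrative simplification, not Aurora's storage format.

```python
# Illustrative copy-on-write cloning: clone() is O(1) because no pages
# are copied up front; a page diverges only when it is first written.
class Volume:
    def __init__(self, pages=None, parent=None):
        self.own = pages or {}      # pages this volume has written itself
        self.parent = parent        # shared, read-only lineage

    def read(self, page_id):
        if page_id in self.own:
            return self.own[page_id]
        return self.parent.read(page_id) if self.parent else None

    def write(self, page_id, data):
        self.own[page_id] = data    # copy-on-write: only now does it diverge

    def clone(self):
        return Volume(parent=self)  # instant: all pages still shared

prod = Volume({1: "alpha", 2: "beta"})
dev = prod.clone()                  # dev/test clone, no data copied
assert dev.read(1) == "alpha"       # reads fall through to the parent
dev.write(1, "changed")             # only page 1 is duplicated
assert dev.read(1) == "changed" and prod.read(1) == "alpha"
```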
24. Simplified DB Node Management – Aurora Serverless
• Starts up on demand, shuts down when not in use
• Scales up/down automatically
• No application impact when scaling
• Pay per second, 1-minute minimum
(Diagram: the application connects through request routers to scalable DB capacity drawn from a warm pool of instances, on top of database storage.)
26. Global replication – faster disaster recovery and enhanced data locality
• Promote a read replica to a master for faster recovery in the event of a disaster
• Bring data close to your customers’ applications in different regions
• Promote to a master for easy migration
27. Global Physical Replication
(Diagram: the primary region runs a primary instance in AZ 1 and replica instances in AZ 2 and AZ 3 over a storage fleet, with async 4/6-quorum writes, continuous backup to Amazon S3, and a replication server fleet; the secondary region mirrors this with replica instances in three AZs, its own storage fleet and S3 backup, and a replication agent. Steps ①–④ are marked on the data path.)
① The primary instance sends log records in parallel to storage nodes, replica instances, and the replication server
② The replication server streams log records to the replication agent in the secondary region
③ The replication agent sends log records in parallel to storage nodes and replica instances
④ The replication server pulls log records from storage nodes to catch up after outages
High throughput: up to 150K writes/sec with negligible performance impact
Low replica lag: < 1 sec cross-region replica lag under heavy load
Fast recovery: < 1 min to accept full read-write workloads after region failure
30. What it takes to be in Aurora
• Background in computer science or engineering
• Experience as developers, software development managers, and product managers (tech)
• Development background in databases, storage, networking, operating systems, or systems development
• Programming languages: C++ (engine, storage); Java/Python/scripting (control plane, infrastructure)
• Managers – clear and concise communication and writing
• Interview process – indexed on both technical strengths and Leadership Principles
31. Customer Obsession
Leaders start with the customer and work backwards.
Ownership
Leaders are owners. They think long term and don’t
sacrifice long-term value for short-term results.
Invent and Simplify
Leaders expect and require innovation and invention from
their teams and always find ways to simplify.
Are Right, A Lot
Leaders are right a lot.
Learn and Be Curious
Leaders are never done learning and always seek to
improve themselves.
Hire and Develop the Best
Leaders raise the performance bar with every hire and
promotion.
Insist on the Highest Standards
Leaders have relentlessly high standards — many people
may think these standards are unreasonably high.
Think Big
Thinking small is a self-fulfilling prophecy.
Bias for Action
Speed matters in business.
Frugality
Accomplish more with less.
Earn Trust
Leaders listen attentively, speak candidly, and treat others
respectfully.
Dive Deep
Leaders operate at all levels, stay connected to the details, audit
frequently, and are skeptical when metrics and anecdote differ.
Have Backbone; Disagree and Commit
Leaders are obligated to respectfully challenge decisions when they
disagree, even when doing so is uncomfortable or exhausting.
Deliver Results
Leaders focus on the key inputs for their business and deliver them
with the right quality and in a timely fashion.
Amazon Leadership Principles
32. Leveraging the cloud ecosystem
• Invoke Lambda events from stored procedures/triggers
• Load data from S3; store snapshots and backups in S3
• Use IAM roles to manage database access control
• Upload system metrics and audit logs to CloudWatch
33. Automate administrative tasks
Customer:
• Schema design
• Query construction
• Query optimization
AWS:
• Automatic failover
• Backup & recovery
• Isolation & security
• Industry compliance
• Push-button scaling
• Automated patching
• Advanced monitoring
• Routine maintenance
AWS takes care of time-consuming database management tasks, freeing customers to focus on their applications and business.
35. Who is moving to Aurora, and why?
Customers using open source engines:
• Higher performance – up to 5x
• Better availability and durability
• Reduced cost – up to 60%
• Easy migration; no application change
Customers using commercial engines:
• One tenth of the cost; no licenses
• Integration with the cloud ecosystem
• Comparable performance and availability
• Migration tooling and services
37. Why work @ Aurora
Customer Impact
Technical Skills Development
Personal Interests
38. A day in my life @ Aurora
Support – Chat, Email, Calls
Collaboration with Global Teams
Impact customer businesses
Use a variety of Cloud services
Design Reviews
Live Deployments
Knowledge Sharing (Aurora, AWS, Amazon)
People (1x1s, promotions, career development, hiring, etc.)
Fun @ Work
39. Career Growth and Development @ Aurora
• You make the career plan that works best for YOU
• Reviews focus on your superpowers and growth opportunities
• Goal setting: we set challenging business and personal goals and are encouraged to achieve (and exceed) them
• Feedback designed for continuous growth
• Countless opportunities within Aurora, AWS, and Amazon
40. “You don’t choose your passions, your passions choose
you. All of us are gifted with certain passions, and the
people who are lucky are the ones who get to follow those
things.”
– Jeff Bezos