Amazon Aurora is a MySQL- and PostgreSQL-compatible managed database created by Amazon Web Services. Beyond being a managed service, it offers better performance, better availability, and some nice extra features. This presentation covers some of the internals that make Aurora possible.
50. Classroom administration, academic communication, quality tools. 100% cloud, easy LMS, and connectable educational content.
The MIS for K-12: an educational ERP.
51. Functionality
Administration Controls
• Reception notices
• Room booking
• Inventory management
• Staff attendance and cover module
• Documentation management
• Grades
• Payments
• EFQM & ISO quality
Services
• Extracurricular activities
• Transport and catering
• Special education
• Flipped classroom
• Individual timetables
• Student documentation
• Pupil census
• General reporting
• Workforce census
Communication
• SMS & email
• Centralized communication
• Internal messenger
• Contacts & calendars
• Integration with Google Mail and Microsoft 365
• Parent portal
• Behaviour management
• Survey, newsletter and website creation
LMS & Assessments
• Complete student profiles
• Collaborative learning
• Recommendations
• Forums
• Projects
• Submission box
• Self-assessment tests
• Rubrics
• Grades and assessment
• Official grade documentation
53. Why AWS?
• Private cloud capacity was never quite right: sometimes too little, most of the time far too much
• Problems with free space on the file system
• Heavy data usage, with no scalable database system
• Frequent errors from doing things "by hand"
• Security
• Complaints because we had to stop our servers
• It was difficult to test in real time
• No global servers
57. Advantages
• Use of CloudWatch
• Alarm system built on CloudWatch
• SES for sending email
• S3 with versioning, keeping files separate from code
• Environments to test new versions are easy to create
58. Other uses of AWS
• AI with SageMaker
• Voice system integration (Lex, Polly)
• Alexa & Chime integration
60. RDS Advantages
• 45 days of RDS snapshots
• Autoscaling of RDS with Aurora
• No downtime when rolling out new versions
• Data encryption & access control
61. RDS evolution (2013-2019)
• Private cloud MySQL: 2 clusters, master-slave (1-1), db.r3.2xlarge
• Private cloud MySQL: 4 clusters, master-slave (1-1), db.r3.2xlarge
• Private cloud MySQL: 4 clusters, master-slave (1-2), db.r3.2xlarge + db.r3.xlarge
• AWS MySQL: 4 clusters, master-slave, db.r3.xlarge
• AWS Aurora: master + 4 slaves, db.r3.2xlarge
• AWS Aurora: master + 2 slaves in an autoscaling model, db.r3.2xlarge
63. Migration
• Lots of planning
• Partner help
• We connected an AWS RDS instance as a slave to the previous private cloud
• Low downtime, caused more by DNS propagation than by the switch from slave to master
64. Aurora Migration
• Lots of planning
• Well advised
• Hard work reviewing the platform to make sure everything was compatible, but 0 issues found
• The whole process was tested beforehand
• The first cluster was very easy to migrate (automated with the AWS service, less than 1 hour)
• Hard to find a window to make the change for the other 3 clusters:
– Expected downtime of 16 hours
– It had to be summer in Spain/Andorra/UK; we also found a suitable week for LATAM (Chile & Colombia)
• Transfers for the other clusters were made with dumps, to be sure everything was right
– Problems with permissions and users
• In the end, the transfer took 8 hours per cluster
(20 sec) You know, when databases first came out, they looked something like this: a monolithic architecture in a single box.
With local storage, we were trading availability and durability to get better performance
(30 seconds) And then we added more such boxes. As you can see, it's the same SQL stack everywhere. Nothing changed!
Moreover, we need heavy-weight distributed consensus for data replication, and these protocols perform poorly because of multiple phases, multiple rounds, sync points, etc.
(2.5 minutes) With Aurora, we made two big contributions:
We pushed the log applicator down to the storage layer => that allows us to construct pages from the logs themselves. This is really cool because we don't have to write full pages anymore. So unlike traditional databases, which write both logs and pages, we just have to write logs. This means we have significantly less network IO and fundamentally less work on the engine: you don't need checkpointing anymore, and you don't need page flushing or cache eviction anymore.
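A minimal Python sketch of that idea (all names such as `LogRecord` and `apply_redo` are illustrative, not Aurora's actual code): the engine only ships small redo records, and storage materializes a page on demand by replaying that page's records in LSN order.

```python
# Toy log applicator: reconstruct a page purely from redo log records,
# instead of ever writing full page images.
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int          # monotonically increasing Log Sequence Number
    page_id: int      # page this record modifies
    offset: int       # byte offset within the page
    data: bytes       # bytes to write at that offset

def apply_redo(page: bytearray, log: list[LogRecord], page_id: int) -> bytearray:
    """Materialize a page by replaying its redo records in LSN order."""
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.page_id == page_id:
            page[rec.offset:rec.offset + len(rec.data)] = rec.data
    return page

# The engine ships only these small records; storage rebuilds the page.
log = [
    LogRecord(lsn=1, page_id=7, offset=0, data=b"hello"),
    LogRecord(lsn=2, page_id=7, offset=5, data=b" world"),
    LogRecord(lsn=3, page_id=9, offset=0, data=b"other page"),
]
page7 = apply_redo(bytearray(16), log, page_id=7)
print(page7[:11].decode())  # hello world
```

Because the page is a deterministic function of its log prefix, storage can rebuild it at any time without the engine ever flushing dirty pages.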
Instead of heavy-weight distributed consensus for data replication, we use a 4/6 write quorum and local tracking. The reason we can avoid distributed consensus is that we exploit the monotonically increasing Log Sequence Number (LSN) assigned by the master, which allows us to order the writes. And so the storage nodes (SNs) just accept the writes. There is no voting involved.
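The 4/6 write quorum can be sketched in a few lines of Python (a toy model, not Aurora's implementation): the master stamps each write with an LSN, sends it to all six storage nodes, and considers it durable once any four acknowledge.

```python
# Toy 4/6 write quorum: nodes just accept and ack; no voting rounds.
NUM_COPIES, WRITE_QUORUM = 6, 4

class StorageNode:
    def __init__(self):
        self.records = {}   # lsn -> payload
        self.up = True

    def accept(self, lsn, payload):
        if not self.up:
            return False            # unreachable node: no ack
        self.records[lsn] = payload # no voting, just accept and ack
        return True

def quorum_write(nodes, lsn, payload):
    acks = sum(node.accept(lsn, payload) for node in nodes)
    return acks >= WRITE_QUORUM     # durable iff at least 4 of 6 acked

nodes = [StorageNode() for _ in range(NUM_COPIES)]
nodes[0].up = nodes[1].up = False          # two copies down: still fine
print(quorum_write(nodes, 1, b"redo-1"))   # True
nodes[2].up = False                        # three copies down: quorum lost
print(quorum_write(nodes, 2, b"redo-2"))   # False
```

Because the LSN already gives a total order to the writes, surviving nodes can later gossip to fill any gaps without a coordination protocol.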
We are going to see both of these in action. As a result, YOU get significantly better write performance.
YOU get read scale-out, because read replicas share the same storage with the master.
YOU get AZ+1 failure tolerance => Aurora stores 6 copies, two per AZ. Even if, on top of a node failure (from background radiation, say), an entire AZ goes down, Aurora can handle it. No problem.
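The AZ+1 arithmetic is easy to check (assuming the 4/6 write quorum from the talk and Aurora's published 3/6 read quorum):

```python
# Back-of-the-envelope check of "AZ+1" tolerance: 6 copies, 2 per AZ.
copies_per_az, num_azs = 2, 3
total = copies_per_az * num_azs            # 6 copies overall
read_quorum, write_quorum = 3, 4

# Lose a whole AZ (2 copies) plus one more node elsewhere:
surviving = total - copies_per_az - 1
print(surviving)                           # 3

assert surviving >= read_quorum            # data is still readable
assert surviving < write_quorum            # writes wait until copies are repaired
```

So after losing an entire AZ plus one extra node, the data is still readable (and the missing copies can be rebuilt from the survivors); only the write quorum is temporarily lost.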
YOU get an Instant database redo recovery because we don't have to explicitly do anything at startup other than doing some math to find out the point at which we crashed.
Overall: You NO longer have to make a trade-off between performance, availability, and durability.
We continuously back up your data to S3.
How does it work?
Aurora divides the database into 10GB segments. We take snapshots of those segments and stream any delta redo logs to S3. On restore, we fetch those snapshots and apply the delta log stream on top, in parallel.
This all happens in the storage layer, so it has no performance impact on the database nodes.
In traditional databases, we have to replay logs since the last checkpoint in a single thread.
With Aurora, these redo logs are already being applied to each segment in parallel, ASYNCHRONOUSLY. At startup, we don't have to do anything other than some math to find out the point at the time of the crash.
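The restore path can be sketched like this (an illustrative toy, not Aurora's code): each segment carries its own snapshot plus a stream of delta records, so segments replay independently and in parallel rather than through one global single-threaded log replay.

```python
# Toy per-segment restore: snapshot + delta replay, run in parallel.
from concurrent.futures import ThreadPoolExecutor

def restore_segment(segment):
    state = dict(segment["snapshot"])              # start from the snapshot
    for lsn, key, value in segment["delta_log"]:   # replay deltas in LSN order
        state[key] = value
    return state

segments = [
    {"snapshot": {"a": 1}, "delta_log": [(10, "a", 2), (11, "b", 3)]},
    {"snapshot": {"x": 9}, "delta_log": [(12, "x", 7)]},
]
with ThreadPoolExecutor() as pool:
    restored = list(pool.map(restore_segment, segments))
print(restored)  # [{'a': 2, 'b': 3}, {'x': 7}]
```

Since no segment waits on any other, restore time scales with the size of one segment's delta log, not the whole database's.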
Now, there are times when we accidentally drop a table or forget a WHERE clause in a DELETE statement.
Backtrack lets you quickly get the data back without fully restoring from backups. It is a relatively quick operation: a couple of minutes versus hours.
In this example, you first backtracked to t1 and let it run some transactions...
Note that you can actually backtrack back and forth to find the right point; it's not a destructive operation.
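Conceptually, Backtrack works because the cluster retains recent change records, so "backtracking" just moves a point-in-time cursor over them. A toy model (class and method names are invented for illustration):

```python
# Toy Backtrack: keep the change stream; rewinding only moves a cursor,
# so it is non-destructive and you can move back and forth.
class BacktrackableTable:
    def __init__(self):
        self.changes = []    # (key, value); value None means "deleted"
        self.cursor = 0      # how much of the change stream is visible

    def write(self, key, value):
        del self.changes[self.cursor:]   # new writes fork off the cursor
        self.changes.append((key, value))
        self.cursor = len(self.changes)

    def backtrack_to(self, point):
        self.cursor = point              # records are kept, nothing is lost

    def snapshot(self):
        state = {}
        for key, value in self.changes[:self.cursor]:
            if value is None:
                state.pop(key, None)     # the delete we may want to undo
            else:
                state[key] = value
        return state

t = BacktrackableTable()
t.write("users", 100)
t1 = t.cursor                  # remember a point in time
t.write("users", None)         # oops: forgot the WHERE clause
print(t.snapshot())            # {}
t.backtrack_to(t1)             # minutes, not hours: no full restore
print(t.snapshot())            # {'users': 100}
t.backtrack_to(t1 + 1)         # and you can move forward again
print(t.snapshot())            # {}
```

The real feature operates on the retained redo stream within a configured backtrack window, but the cursor intuition is the same.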
(1 minute) Let's see what benefits we got out of this.
Here is what the IO profile looks like for Aurora and MySQL.
1. On the left, we have MySQL on EBS. The thing to note is that it has to replicate all kinds of data. With Aurora, on the right, we ONLY have to replicate log records. As a result we do 7.7x less IO and 35x more work, despite the 6x amplification due to 6 copies.
2. The other thing to note is that steps 1, 3, and 4 on the left are synchronous and lead to jitter, but Aurora's 4/6 quorum is much more resilient to tail latency. We will see in a second why that matters for your applications.
(45 sec) So much for OLTP; let's talk about your OLAP queries.
Here are some of the optimizations we did for OLAP...
Batched scans: the idea is to scan tuples in batches from the InnoDB buffer pool, to avoid latching and traversing pages again and again, and to enable JIT optimizations. Mainly for in-memory workloads.
Hash joins: improve equi-join performance. Build a hash table on one side and scan through the other to probe it. There is a lot of complexity around skew and duplicates: you have to minimize the number of passes, not to mention deciding when to choose a hash join over other join operators like index joins or nested-loop joins.
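The build/probe structure is simple to sketch (a minimal in-memory version; real engines add spill-to-disk passes for skew and cost-based operator choice):

```python
# Minimal hash equi-join: build a hash table on the smaller input,
# then stream the larger input and probe.
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    table = defaultdict(list)
    for row in build_rows:               # build phase: smaller input
        table[row[build_key]].append(row)
    out = []
    for row in probe_rows:               # probe phase: stream the big input
        for match in table.get(row[probe_key], []):
            out.append({**match, **row}) # emit the joined row
    return out

customers = [{"cust_id": 1, "name": "Ana"}, {"cust_id": 3, "name": "Joan"}]
orders = [{"cust_id": 1, "total": 30}, {"cust_id": 2, "total": 15}]
print(hash_join(customers, orders, "cust_id", "cust_id"))
# [{'cust_id': 1, 'name': 'Ana', 'total': 30}]
```

Each input is scanned exactly once, which is why hash joins beat nested loops when neither side has a useful index.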
Asynchronous key prefetch: prefetches pages into memory for index joins using BKA (Batched Key Access). Quite useful for non-equi joins or some equi-joins (if one side is small and the big side has a high-cardinality index on the join column). For out-of-cache workloads.
(15 seconds) We ran a TPC-H-like workload.
Here you can see the performance improvement from all those optimizations. As you can see, roughly half the queries are more than 2x faster, with a peak speedup of roughly 18x.
(30 sec) Processing closer to the data significantly reduces data transfer between the head node (HN) and the storage nodes (SNs). As a result, there is significantly less impact on OLTP performance, thanks to reduced buffer pool pollution and reduced network traffic.
We use 150GB just to show the impact, because the 8xlarge buffer pool is around 150GB. So if we bring in pages for OLAP queries, they will evict pages needed by OLTP queries.
(2.5 min) With Aurora multi-master, there is no pessimistic locking, no explicit global ordering, no global commit coordination. The architecture is based on three techniques:
First, it uses optimistic conflict resolution in the storage layer. To understand this better, let's say the orange master runs T1 and the blue master runs T2. If T1 and T2 modify different pages, there is no conflict and hence no synchronization required. However, if T1 and T2 both touch P2, then one of them wins and the other has to retry, based on the quorum. As you can see, that doesn't require any heavy-weight consensus protocol. Again we rely on quorum and local tracking, with partitioned, monotonic LSN sequencing from individual database nodes, to order the writes.
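A toy model of that conflict on P2 (purely illustrative; in the real system each copy accepts the first write it sees for a page version, and whichever master reaches the 4/6 quorum wins):

```python
# Toy optimistic conflict resolution on one page with 6 copies:
# each copy acks its first writer; quorum decides the winner, no voting rounds.
NUM_COPIES, QUORUM = 6, 4

def resolve(arrival_order):
    """arrival_order[i] = which master's write reached copy i first."""
    acks = {"orange": 0, "blue": 0}
    for winner_at_copy in arrival_order:
        acks[winner_at_copy] += 1        # the copy acks its first writer
    return {m: n >= QUORUM for m, n in acks.items()}

# T1 (orange) and T2 (blue) both touch P2; arrival order differs per copy.
outcome = resolve(["orange", "orange", "orange", "orange", "blue", "blue"])
print(outcome)   # {'orange': True, 'blue': False} -> blue retries
```

Note that at most one master can reach 4 of 6 acks, so there is always a unique winner (or neither wins and both retry); that is what makes a consensus round unnecessary.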
Since the logging layer is pushed down, Aurora decouples the transaction layer from the logging layer, and this allows Aurora to separate physical conflicts on pages from logical conflicts between transactions. Transaction conflicts are handled through MVCC, and physical conflicts through optimistic conflict resolution. Moreover, there is no direct coupling in the storage partitions or within the database nodes in the cluster.
Microservices architecture: there are independent, minimal, and resilient services running in the cluster that handle async coordination. Any of them temporarily going down does not impact the whole cluster.
Net-net: Aurora only coordinates when it has to coordinate. Let's see this in action.
The cost of Clickedu for a school has two parts:
Initial deployment (one-time fee)
Annual service fee (for teachers and students)
Every school is made up differently, and we work with you to create an agreement that meets the needs of your school. Ask about our ‘end of school year’ offer and related deployment fee discounts that may apply to you.