How is Amazon Aurora different?
김일호 | Solutions Architect
What is Amazon Aurora?
MySQL-compatible relational database
Performance and availability of commercial databases
Simplicity and cost-effectiveness of open source databases
SQL benchmark results
MySQL SysBench on R3.8XL (32 cores and 244 GB RAM)
WRITE PERFORMANCE: 4 client machines with 1,000 connections each
READ PERFORMANCE: single client machine with 1,600 connections
Reproducing these results
https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf
[Diagram: an Amazon Aurora r3.8xlarge DB instance driven by four R3.8XLARGE SysBench client instances]
1. Create an Amazon VPC (or use an existing one).
2. Create four EC2 R3.8XL client instances to run the SysBench client. All four should be in the same AZ.
3. Enable enhanced networking on your clients.
4. Tune your Linux settings (see the whitepaper).
5. Install SysBench version 0.5.
6. Launch an r3.8xlarge Amazon Aurora DB instance in the same VPC and AZ as your clients.
7. Start your benchmark (a sample invocation is sketched below).
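To make step 7 concrete, here is a hedged sketch of driving a 30-minute SysBench 0.5 write workload from one client machine. The endpoint, credentials, table sizing, and the oltp.lua path are placeholders; the whitepaper linked above is the authority on the exact parameters.

```python
# Hedged sketch only: option values and the oltp.lua path are placeholders;
# see the benchmarking whitepaper for the exact settings used in the results above.
import subprocess

cmd = [
    "sysbench",
    "--test=/usr/share/doc/sysbench/tests/db/oltp.lua",  # OLTP script shipped with SysBench 0.5
    "--mysql-host=<your-aurora-cluster-endpoint>",
    "--mysql-user=admin",
    "--mysql-password=<password>",
    "--mysql-db=sbtest",
    "--oltp-tables-count=250",      # placeholder sizing; see the whitepaper
    "--oltp-table-size=25000",
    "--num-threads=1000",           # 1,000 connections per client machine
    "--max-time=1800",              # 30-minute run
    "--max-requests=0",
    "run",                          # run "prepare" once with the same options first
]
subprocess.run(cmd, check=True)
```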
What makes Aurora fast.
How do we achieve these results?
DO LESS WORK
Do fewer IOs
Minimize network packets
Cache prior results
Offload the database engine
BE MORE EFFICIENT
Process asynchronously
Reduce latency path
Use lock-free data structures
Batch operations together
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES
IO traffic in RDS MySQL
MYSQL WITH STANDBY
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
IO FLOW
Issue write to EBS – EBS issues to mirror, ack when both done
Stage write to standby instance using DRBD
Issue write to EBS on standby instance
OBSERVATIONS
Steps 1, 3, 5 are sequential and synchronous
This amplifies both latency and jitter
Many types of writes for each user operation
Have to write data blocks twice to avoid torn writes
PERFORMANCE
780K transactions
7,388K I/Os per million txns (excludes mirroring, standby)
Average 7.4 I/Os per transaction
30-minute SysBench write-only workload, 100 GB data set, RDS Single-AZ, 30K PIOPS
[Diagram: primary instance in AZ 1 and standby instance in AZ 2, each writing to Amazon Elastic Block Store (EBS) and its EBS mirror, with backup to Amazon S3]
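To spell out the write amplification: 7,388,000 I/Os spread over 1,000,000 transactions works out to roughly 7.4 I/Os per transaction, and that figure still excludes the EBS mirror and the standby's copies.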
IO traffic in Aurora (database)
AMAZON AURORA: ASYNC 4/6 QUORUM DISTRIBUTED WRITES
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
IO FLOW
Boxcar redo log records – fully ordered by LSN
Shuffle to appropriate segments – partially ordered
Boxcar to storage nodes and issue writes
OBSERVATIONS
Only write redo log records; all steps asynchronous
No data block writes (checkpoint, cache replacement)
6X more log writes, but 9X less network traffic
Tolerant of network and storage outlier latency
PERFORMANCE
27,378K transactions – 35X MORE
950K I/Os per 1M txns (6X amplification) – 7.7X LESS
30-minute SysBench write-only workload, 100 GB data set
[Diagram: primary instance and replica instance writing to storage spread across AZ 1, AZ 2, and AZ 3, with continuous backup to Amazon S3]
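For comparison with the RDS MySQL slide: 27,378K versus 780K transactions in the same 30 minutes is roughly 35X more work completed, and 950K versus 7,388K I/Os per million transactions is roughly 7.7X less I/O.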
IO traffic in Aurora (storage node)
[Diagram: log records flow from the primary instance into a storage node's incoming queue, update queue, and hot log; records are sorted, grouped, and coalesced into data blocks; point-in-time snapshots and logs are backed up to S3; GC and scrubbing run continuously; peer storage nodes exchange records via peer-to-peer gossip]
IO FLOW
① Receive record and add to in-memory queue
② Persist record and ACK
③ Organize records and identify gaps in log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new data block versions
⑥ Periodically stage log and new block versions to S3
⑦ Periodically garbage collect old versions
⑧ Periodically validate CRC codes on blocks
OBSERVATIONS
All steps are asynchronous
Only steps 1 and 2 are in foreground latency path
Input queue is 46X less than MySQL (unamplified, per node)
Favor latency-sensitive operations
Use disk space to buffer against spikes in activity
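The sketch below is a toy model of this flow, not Aurora's code; all names are illustrative. It shows the split the slide describes: only receiving a record and persisting/ACKing it sit on the foreground latency path, while sorting, gossip, and coalescing happen later in the background.

```python
# Toy model of the storage-node flow above (names are illustrative, not Aurora code).
from collections import deque

incoming = deque()          # step 1: in-memory incoming queue
hot_log = []                # step 2: persisted log records
data_blocks = {}            # step 5: materialized block versions

def receive_record(rec):
    incoming.append(rec)                           # 1: add to in-memory queue

def persist_and_ack():
    rec = incoming.popleft()
    hot_log.append(rec)                            # 2: persist to the hot log ...
    return "ACK"                                   #    ... and ACK the primary immediately

def background_work(peer_records):
    hot_log.sort(key=lambda r: r["lsn"])           # 3: organize records, spot gaps
    seen = {r["lsn"] for r in hot_log}
    hot_log.extend(r for r in peer_records         # 4: gossip with peers to fill holes
                   if r["lsn"] not in seen)
    for rec in sorted(hot_log, key=lambda r: r["lsn"]):
        data_blocks[rec["block"]] = rec["value"]   # 5: coalesce into new block versions
    # 6-8 (stage log/blocks to S3, garbage-collect old versions, CRC scrubbing)
    # would likewise run periodically, entirely off the latency path.
```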
Asynchronous group commits
[Diagram: transactions T1–Tn issue reads, writes, and commits over time; pending commits (T1–T8) wait in a commit queue in LSN order and are acknowledged as the durable LSN at the head node advances – group commit]
TRADITIONAL APPROACH
Maintain a buffer of log records to write out to disk
Issue write when buffer full or time out waiting for writes
First writer has latency penalty when write rate is low
AMAZON AURORA
Request I/O with first write, fill buffer till write picked up
Individual write durable when 4 of 6 storage nodes ACK
Advance DB durable point up to earliest pending ACK
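A minimal sketch of the two Aurora rules above, with illustrative names and data: a write counts as durable once 4 of 6 storage nodes have ACKed it, and the database's durable point advances only up to the earliest LSN still waiting on its quorum.

```python
# Minimal sketch of the quorum rules above; names and data are illustrative.
NODES, WRITE_QUORUM = 6, 4

acks = {}                              # lsn -> storage-node ACKs received so far

def on_ack(lsn):
    acks[lsn] = acks.get(lsn, 0) + 1

def durable_lsn(pending_lsns):
    """Highest LSN up to which every pending write has reached write quorum."""
    durable = 0
    for lsn in sorted(pending_lsns):
        if acks.get(lsn, 0) >= WRITE_QUORUM:
            durable = lsn              # this write is individually durable
        else:
            break                      # earliest pending ACK caps the durable point
    return durable

pending = [10, 12, 20, 22]
for lsn, n in [(10, 5), (12, 4), (20, 3), (22, 6)]:
    for _ in range(n):
        on_ack(lsn)
print(durable_lsn(pending))            # 12 -- LSN 20 lacks quorum, so 22 must wait
```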
Adaptive thread pool
MYSQL THREAD MODEL
Standard MySQL – one thread per connection
Doesn't scale with connection count
MySQL EE – connections assigned to thread group
Requires careful stall threshold tuning
AURORA THREAD MODEL
§ Re-entrant connections multiplexed to active threads
§ Kernel-space epoll() inserts into latch-free event queue
§ Dynamically sized thread pool
§ Gracefully handles 5,000+ concurrent client sessions on r3.8xl
[Diagram: client connections feed a latch-free task queue via epoll()]
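As a rough illustration of this style of design (event-driven connection handling feeding a bounded worker pool), the Python sketch below uses the standard selectors module, which sits on top of epoll() on Linux. It is a simplified single-loop echo server, not Aurora's thread model; a production implementation would, among other things, mask a connection while a worker owns it.

```python
# Simplified sketch: socket readiness events feed a bounded worker pool, so the
# number of threads no longer grows with the number of client connections.
import selectors, socket
from concurrent.futures import ThreadPoolExecutor

sel = selectors.DefaultSelector()            # epoll-backed on Linux
pool = ThreadPoolExecutor(max_workers=16)    # pool size independent of connection count

def handle(conn):
    data = conn.recv(4096)                   # placeholder for parsing/executing a query
    if data:
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()

def serve():
    server = socket.socket()
    server.bind(("127.0.0.1", 0)); server.listen()   # port 0: kernel picks a free port
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)
    while True:
        for key, _ in sel.select():
            if key.fileobj is server:        # new client connection
                conn, _ = server.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:                            # readable connection: hand it to the pool
                pool.submit(handle, key.fileobj)
```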
IO traffic in Aurora (read replica)
MYSQL READ SCALING
[Diagram: MySQL master (30% read, 70% write) feeds a MySQL replica (30% new reads, 70% write) through single-threaded binlog apply; each instance has its own data volume]
§ Logical: ship SQL statements to replica.
§ Write workload similar on both instances.
§ Independent storage.
§ Can result in data drift between master and replica.
AMAZON AURORA READ SCALING
[Diagram: Aurora master (30% read, 70% write) and Aurora replica (100% new reads) share multi-AZ storage; only page cache updates flow to the replica]
§ Physical: ship redo from master to replica.
§ Replica shares storage. No writes performed.
§ Cached pages have redo applied.
§ Advance read view when all commits seen.
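A small sketch of the physical-replication idea under simplifying assumptions (names are illustrative): the replica applies incoming redo only to pages it already has cached and writes nothing, because the shared storage volume is the source of truth for everything else.

```python
# Illustrative sketch: redo is applied only to pages already in the replica's cache.
def apply_redo_on_replica(page_cache, redo_stream):
    for page_id, apply_fn in redo_stream:
        if page_id in page_cache:                # cached page: keep it current
            page_cache[page_id] = apply_fn(page_cache[page_id])
        # uncached page: nothing to do -- the shared storage volume already has the
        # redo, and a later read will fetch an up-to-date version of the page

cache = {1: "v1", 3: "v3"}
apply_redo_on_replica(cache, [(1, lambda p: "v2"), (2, lambda p: "never applied")])
print(cache)                                     # {1: 'v2', 3: 'v3'}
```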
Improvements over the past few months
BATCH OPERATIONS
Write batch size tuning
Asynchronous send for read/write I/Os
Purge thread performance
Bulk insert performance
OTHER
Failover time reductions
Malloc reduction
System call reductions
Undo slot caching patterns
Cooperative log apply
CUSTOMER FEEDBACK
Binlog and distributed transactions
Lock compression
Read-ahead
LOCK CONTENTION
Hot row contention
Dictionary statistics
Mini-transaction commit code path
Query cache read/write conflicts
Dictionary system mutex
What makes Aurora highly available.
Storage node availability
§ Quorum system for read/write; latency tolerant
§ Peer-to-peer gossip replication to fill in holes
§ Continuous backup to S3 (designed for 11 9s durability)
§ Continuous scrubbing of data blocks
§ Continuous monitoring of nodes and disks for repair
§ 10 GB segments as unit of repair or hotspot rebalance to quickly rebalance load
§ Quorum membership changes do not stall writes
[Diagram: storage nodes spread across AZ 1, AZ 2, and AZ 3, with continuous backup to Amazon S3]
Instant crash recovery
Traditional databases
§ Have to replay logs since the last checkpoint
§ Typically 5 minutes between checkpoints
§ Single-threaded in MySQL; requires a large number of disk accesses
§ Crash at T0 requires a reapplication of the SQL in the redo log since the last checkpoint
Amazon Aurora
§ Underlying storage replays redo records on demand as part of a disk read
§ Parallel, distributed, asynchronous
§ No replay for startup
§ Crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously
[Diagram: checkpointed data plus redo log on the traditional side, with crash point T0 marked on both timelines]
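The difference can be sketched in a few lines (a toy model, not Aurora's storage code): after a crash nothing is replayed up front; redo for a block is applied lazily, in the storage layer, the first time that block is read.

```python
# Toy model of on-demand redo apply: no replay at startup, only at read time.
class Segment:
    def __init__(self):
        self.blocks = {}                 # block_id -> current contents
        self.pending_redo = {}           # block_id -> redo records not yet applied

    def read_block(self, block_id):
        # Apply outstanding redo for just this block, then serve the read.
        for apply_fn in self.pending_redo.pop(block_id, []):
            self.blocks[block_id] = apply_fn(self.blocks.get(block_id))
        return self.blocks.get(block_id)

# After a "crash", redo simply sits with the segment; the first read replays it.
seg = Segment()
seg.pending_redo[7] = [lambda old: (old or 0) + 5, lambda old: old * 2]
print(seg.read_block(7))                 # 10 -- redo applied lazily, per block
```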
Survivable caches
§ We moved the cache out of the database process
§ Cache remains warm in the event of a database restart
§ Lets you resume fully loaded operations much faster
§ Instant crash recovery + survivable cache = quick and easy recovery from DB failures
[Diagram: SQL, transactions, and caching layers; the caching process is outside the DB process and remains warm across a database restart]
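As a toy illustration of the idea (not Aurora's actual mechanism), the sketch below keeps the page cache in its own long-lived process; "restarting" the database worker does not empty it.

```python
# Toy illustration: the page cache lives in a separate, long-lived process, so a
# database-process restart finds it still warm. Not Aurora's actual implementation.
import multiprocessing as mp

def cache_server(conn):
    pages = {}                                   # survives database restarts
    while True:
        msg = conn.recv()
        if msg[0] == "put":
            pages[msg[1]] = msg[2]
        elif msg[0] == "get":
            conn.send(pages.get(msg[1]))
        else:                                    # "stop"
            return

def db_engine(conn, first_boot):
    if first_boot:
        conn.send(("put", 1, "hot page"))        # warm the cache, then "crash"
    else:
        conn.send(("get", 1, None))
        print("after restart, page 1 is still cached:", conn.recv())

if __name__ == "__main__":
    db_end, cache_end = mp.Pipe()
    cache = mp.Process(target=cache_server, args=(cache_end,))
    cache.start()

    for first_boot in (True, False):             # second iteration = restarted DB process
        db = mp.Process(target=db_engine, args=(db_end, first_boot))
        db.start(); db.join()

    db_end.send(("stop",)); cache.join()
```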
Faster, more predictable failover
MYSQL: app running → DB failure → failure detection → DNS propagation → recovery → recovery (15-20 sec)
AURORA WITH MARIADB DRIVER: app running → DB failure → failure detection → DNS propagation → recovery (3-20 sec)
Amazon Aurora Testimonials
§ “Amazon Aurora was able to satisfy all of our scale requirements with no degradation in performance. With Alfresco on Amazon Aurora we scaled to 1 billion documents with a throughput of 3 million per hour, which is 10 times faster than our MySQL environment. It just works!”
- John Newton, Founder and CTO of Alfresco
§ “We ran our compatibility test suites against Amazon Aurora and everything just worked. Amazon Aurora paired with Tableau means data users can take advantage of the 5x throughput Amazon Aurora provides and deliver faster analytic insights throughout their organizations. We look forward to offering our Amazon Aurora Tableau connector.”
- Dan Jewett, Vice President of Product Management at Tableau
§ "기존 RDS는 스토리지의 용량과 IOPS의 필요치를 예측해서 설정해야 했습니다. 하지만
Aurora에서는 이를 예측할 필요 없이 필요한 만큼 사용할 수 있습니다.덕분에 비용 절감은
물론이고, I/O 병목에 대한 걱정을 덜 수 있었습니다."
§ "기존 DB에서 Aurora로 migration하는 데에 어려움이 있었습니다.하지만 최근에 나온 AWS
Database Migration Service를 활용하면 좀 더 쉽게 migration을 진행할 수 있으리라
봅니다."
Thank you!
