Paper_Scalable database logging for multicores

Scalable Database Logging for Multicores
Hyungsoo Jung, et al.
Hanyang University
2017, VLDB

Index
▪ Motivation
▪ Design
▪ Implementation
▪ Other Issues

Motivation: Characteristics of Modern Databases
▪ Modern Databases
▪ Write-ahead Logging (WAL) protocol
▪ ACID properties
▪ Atomicity
▪ Consistency
▪ Isolation
▪ Durability

Motivation: Architectural Issues (1)
▪ Existing architecture relies on WAL protocol
[Figure: the central log buffer in DRAM is flushed to HDD or NVM, which incurs a synchronous I/O delay]

▪ Existing Central log buffer
[Figure sequence: threads T1, T2, T3 hold log records L1, L2, L3. The central log buffer is locked while one record is copied in and unlocked afterwards, so L1 and then L2 enter the buffer one at a time while L3 waits.]
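
The lock-protected central log buffer on these slides can be sketched as follows. This is a minimal illustration, not code from the paper or from any real engine: the class name CentralLogBuffer and its methods are hypothetical, and the buffer is simply a growing byte vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical central log buffer: a single mutex serializes every append,
// so only one thread can copy its log record at a time.
class CentralLogBuffer {
public:
    // Appends one record and returns the byte offset where it was placed.
    std::uint64_t append(const std::string& record) {
        assert(!record.empty());
        std::lock_guard<std::mutex> guard(mutex_);  // Lock
        const std::uint64_t offset = buffer_.size();
        buffer_.insert(buffer_.end(), record.begin(), record.end());
        return offset;                              // UnLock on scope exit
    }

    // Number of bytes buffered so far.
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return buffer_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<char> buffer_;
};
```

Every caller of append waits for the same mutex, which is the serialization point the Lock and UnLock steps show.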

▪ Synchronous I/O delay
[Figure: Thread 1 runs Transaction A. The central log buffer in DRAM is flushed to HDD or NVM, which incurs a synchronous I/O delay.]
1. Buffer the log.
2. Flush the log to storage.
3. Write the data.
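
The three-step order on this slide can be sketched as below. The WalStore struct and its members are hypothetical in-memory stand-ins (plain vectors and a map instead of DRAM buffers and a storage device), used only to show that the data write comes after the log flush.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the log buffer, the durable log, and the data.
struct WalStore {
    std::vector<std::string> log_buffer;      // log records still in DRAM
    std::vector<std::string> durable_log;     // log records on HDD or NVM
    std::map<std::string, std::string> data;  // the data itself

    void commit(const std::string& key, const std::string& value) {
        const std::string record = key + "=" + value;
        // 1. Buffer the log.
        log_buffer.push_back(record);
        // 2. Flush the log to storage. A real system blocks on I/O here.
        durable_log.insert(durable_log.end(),
                           log_buffer.begin(), log_buffer.end());
        log_buffer.clear();
        // 3. Write the data, only after its log record is durable.
        assert(!durable_log.empty() && durable_log.back() == record);
        data[key] = value;
    }
};
```

In this sketch step 2 is an in-memory copy. In a real system it is the point where the committing thread waits for the device.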

Summary
Motivation.
▪ Central log buffer limits the scalability of DB logging on multicore.
→ Parallel logging on multicore
Contribution.
▪ ELEDA (Express Logging Ensuring Durable Atomicity)
▪ Fast, scalable logging method for high-performance transaction systems with guaranteed atomicity and durability.
▪ Uses concurrent data structures that solve the performance bottlenecks in the central log buffer.
▪ Implementation
▪ Plug ELEDA into WiredTiger and Shore-MT and evaluate the performance improvements.
▪ (ex) Transaction throughput improves to more than ~3.9 million Txn/s.

Design: Parallel Logging on Multicore, Grasshopper (1)
▪ Issues on Parallel Logging on Multicore
▪ Guarantee the sequentiality of the logs.
▪ Detect log holes.
▪ Concurrently,
▪ buffering logs.
▪ writing logs to durable storage.

[Figure: threads T1, T2, T3 each obtain their LSN (1, 2, 3) with Fetch_and_Add]
(cf) LSN: Log Sequence Number
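
A minimal sketch of LSN assignment with Fetch_and_Add follows. The LsnAllocator class is hypothetical, and it treats an LSN as a byte offset into the log, which is an assumption: the slide simply numbers the records 1, 2, 3.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical LSN allocator. Assumption: an LSN is a byte offset in the log.
class LsnAllocator {
public:
    // Reserves `size` bytes and returns the LSN where the record starts.
    // fetch_add is atomic, so concurrent callers receive disjoint ranges
    // without taking a lock.
    std::uint64_t reserve(std::uint64_t size) {
        assert(size > 0);
        return next_.fetch_add(size, std::memory_order_relaxed);
    }

    // First LSN that has not been handed out yet.
    std::uint64_t next() const {
        return next_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint64_t> next_{0};
};
```

Reserving a range only fixes where a record will go. The copy into the buffer happens afterwards and can finish in any order, which is where log holes come from.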

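[Figure: T1, T2, T3 obtain LSN 1, 2, 3 with Fetch_and_Add. L1 and L3 are in the buffer but L2 is not yet, which leaves a hole. The SBL marker shows how far the buffer is filled without holes.]
(cf) SBL: sequentially buffered LSN

▪ Concurrently,
▪ buffering logs.
[Figure: with the hole at L2 still open, the flush covers the log up to SBL]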
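
The relation between holes and SBL can be sketched with a small tracker. This is an illustration under assumptions, not the paper's data structure: SblTracker is a hypothetical single-threaded class that uses an ordered map, and LSNs are again taken to be byte offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical tracker for the sequentially buffered LSN (SBL).
// Not thread-safe; it only illustrates the bookkeeping.
class SblTracker {
public:
    // Called after a thread has copied bytes [start, end) into the buffer.
    void mark_buffered(std::uint64_t start, std::uint64_t end) {
        assert(start < end && start >= sbl_);
        pending_[start] = end;
        // Advance SBL over every range that is now contiguous with it.
        auto it = pending_.begin();
        while (it != pending_.end() && it->first == sbl_) {
            sbl_ = it->second;
            it = pending_.erase(it);
        }
    }

    // Every byte below this offset is buffered, so it is safe to flush.
    std::uint64_t sbl() const { return sbl_; }

    // True when some record is buffered beyond a gap that is still open.
    bool has_hole() const { return !pending_.empty(); }

private:
    std::uint64_t sbl_ = 0;
    std::map<std::uint64_t, std::uint64_t> pending_;
};
```

With L1 and L3 buffered, sbl() stays at the end of L1 and has_hole() is true. Once L2 arrives, SBL jumps to the end of L3.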

▪ Concurrently,
▪ buffering logs.
▪ So, design a concurrent data structure that supports
▪ Concurrent buffering and flushing of logs,
▪ Fast log hole detection.

Thread type              | ELEDA-worker        | ELEDA-flusher | Database
Data structure (Global)  | Central log buffer (shared by all thread types)
Data structure (Others)  | - Hopping index (R) | ⋅             | - Hopping index (W)
                         | - C&H-list          |               | - C&H-list
                         | - Min heap          |               |
Operation                | - Tracking holes    | - Flush       | - Copy log to buffer
                         |                     |               | - Garbage collection

▪ ELEDA logging architecture
[Figures: DB thread, Flusher thread, Worker thread]

▪ Grasshopper algorithm

▪ Latency-hiding techniques (asynchronous I/O)

Design: Execution process of ELEDA-based system
[Figure: Thread1, Thread2, Thread3 write log records L1 to L7 across Page 1, Page 2, Page 3, which start at 0 * 4k, 1 * 4k, 2 * 4k. Page 1 and Page 2 are marked hopping, Page 3 is marked crawling.]
[Figure: the same layout with the Hopping Index: [1] 4096, [2] 4096, [3] 4096 / 3]
[Figure: each thread has a hopping list and a crawling list, each with head and tail. Thread 1 holds 1, 4, 7; Thread 2 holds 2, 6; Thread 3 holds 3, 5. The min heap holds 1, 2, 3.]

▪ Tracking LSN holes (= log holes) and flushing SBL
(Threads shown: Flusher, Worker)
1. Get HB by scanning the Hopping index table. HB is 2 in this case.
2. Remove the items related to page number 2 from the c-list and h-list.
3. Rebuild the min heap.
4. Pop the root (7) of the min heap.
5. Then, SBL is 7.
6. Flush LSN 1~7 to storage.
(cf)
- HB: hopping boundary
- SBL: sequentially buffered LSN
Hopping Index
[1] 4096 = DB page size
[2] 4096 = DB page size
[3] 4096 / 3 < DB page size
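
The hop-then-crawl steps above can be sketched as follows. This is a simplified, single-threaded illustration and not the paper's Grasshopper algorithm: Record, kPageSize, and compute_sbl are hypothetical names, records are modelled as non-overlapping byte ranges, and the hopping index is rebuilt from scratch on every call instead of being maintained concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical model: a buffered log record covers bytes [start, end).
struct Record {
    std::uint64_t start;
    std::uint64_t end;
};

constexpr std::uint64_t kPageSize = 4096;  // page size shown on the slides

// Returns SBL as a byte offset: every byte below it has been buffered.
std::uint64_t compute_sbl(const std::vector<Record>& buffered,
                          std::size_t num_pages) {
    // Hopping index: number of bytes buffered in each page.
    std::vector<std::uint64_t> hopping_index(num_pages, 0);
    for (const Record& r : buffered) {
        assert(r.start <= r.end);
        for (std::uint64_t p = r.start / kPageSize;
             p < num_pages && p * kPageSize < r.end; ++p) {
            const std::uint64_t lo = std::max(r.start, p * kPageSize);
            const std::uint64_t hi = std::min(r.end, (p + 1) * kPageSize);
            hopping_index[p] += hi - lo;
        }
    }
    // Hopping: HB is the number of leading pages that are completely full.
    std::uint64_t hb = 0;
    while (hb < num_pages && hopping_index[hb] == kPageSize) ++hb;
    std::uint64_t sbl = hb * kPageSize;

    // Crawling: min heap (ordered by start) over records past the boundary.
    auto later = [](const Record& a, const Record& b) {
        return a.start > b.start;
    };
    std::priority_queue<Record, std::vector<Record>, decltype(later)>
        heap(later);
    for (const Record& r : buffered) {
        if (r.end > sbl) heap.push(r);
    }
    // Pop roots while they are contiguous with SBL; stop at the first hole.
    while (!heap.empty() && heap.top().start <= sbl) {
        sbl = std::max(sbl, heap.top().end);
        heap.pop();
    }
    return sbl;
}
```

When the first two pages are full and one record sits at the start of the third, the function hops over both full pages in one step and then crawls to the end of that record. If a record in the second page is missing, the hop stops after the first page and the crawl stops at the hole.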

[Figure: only 7 (Page 3) is left in the lists and in the min heap, and it is popped. After the pop, the hopping and crawling lists of Thread 1, Thread 2, Thread 3 are empty.]

Implementation
▪ Applying to kernel file system, such as ext4.
▪ Abstraction

Implementation
▪ Shore-MT
▪ Implement ELEDA in Shore-MT with Aether.
(cf) Aether: A Scalable Approach to Logging, R. Johnson et al.
▪ Details
▪ Replace its consolidation array-based logging subsystem.
▪ Modify its flush pipelining implementation for transaction switching.

Other issues (1)
▪ Flush
▪ The I/O unit for flushing is tuned experimentally.
▪ It depends on the characteristics of the application.
▪ Average size of logs
▪ Max concurrency
(cf) 6.5.3 Effects of I/O unit size (64KiB and 512KiB)
▪ Garbage Collection & Callback
▪ GC pointer is exclusively accessed by the owner DB thread.
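
One way to picture the I/O unit is as an upper bound on the size of each write that the flusher issues. The sketch below assumes exactly that. The function plan_flush and its parameters are hypothetical, and how ELEDA actually applies the I/O unit is not stated on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Splits the flushable byte range [flushed, sbl) into writes of at most
// io_unit bytes each. io_unit is the tunable, for example 64 KiB or 512 KiB.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
plan_flush(std::uint64_t flushed, std::uint64_t sbl, std::uint64_t io_unit) {
    assert(io_unit > 0 && flushed <= sbl);
    std::vector<std::pair<std::uint64_t, std::uint64_t>> writes;
    for (std::uint64_t off = flushed; off < sbl; off += io_unit) {
        // Each entry is one write covering bytes [first, second).
        writes.emplace_back(off, std::min(off + io_unit, sbl));
    }
    return writes;
}
```

A larger unit means fewer and bigger writes, and a smaller unit means more and smaller ones. Which works better depends on the average log size and the concurrency, as the slide notes.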

Other issues (2)
▪ Partially sequential implementation
▪ Access of DB threads to Hopping index.
▪ Evaluation
▪ Throughput and Commit latency
▪ Workloads
▪ Key-value
▪ Online transaction processing
▪ with different settings of DB options
▪ CPU utilization and Effects of I/O unit size

Summary
Motivation.
▪ Central log buffer limits the scalability of DB logging on multicore.
→ Parallel logging on multicore using Grasshopper
Contribution.
▪ ELEDA (Express Logging Ensuring Durable Atomicity)
▪ Fast, scalable logging method for high-performance transaction systems with guaranteed atomicity and durability.
▪ Uses concurrent data structures that solve the performance bottlenecks in the central log buffer.
▪ Implementation
▪ Plug ELEDA into WiredTiger and Shore-MT and evaluate the performance improvements.
▪ (ex) Transaction throughput improves to more than ~3.9 million Txn/s.

TODO
▪ Analyze Shore-MT and Aether.
▪ Where can I insert logging and flusher modules?
▪ Design the logging subsystem and flusher modules.
▪ Implement ELEDA in Shore-MT.
▪ Starting point is C&H-list.

Shore-MT and Aether
▪ Shore-MT
▪ Open-source multi-threaded storage manager.
▪ The authors use the EPFL branch of Shore-MT.
▪ Aether
▪ A scalable approach to logging.
▪ Details for implementation
▪ 4.1 Flush Pipelining → modified to ELEDA’s design
▪ A.1 Log buffer design
▪ A.2 Consolidation array → replaced with ELEDA’s design
▪ A.3 Modification to address potential delays caused by the requirement that all threads release their buffers in order
(Note: pseudocode exists.)

Shore-MT
▪ Shore-MT and target for optimization
▪ Open-source multi-threaded storage manager.
▪ The authors use the EPFL branch of Shore-MT.
(cf) https://bitbucket.org/shoremt/shore-mt/src/e832a6a586048ad3f4cdefde30cf96131d4b4525?at=default
▪ Language
▪ C++
▪ Related code in src/sm/log.h and log.cpp
▪ Log manager class log_m

Aether
▪ Aether and TODO
▪ A scalable approach to logging.
▪ Details for implementation
▪ 4.1 Flush Pipelining → Modified to ELEDA’s design
▪ Related code in src/sm/log_core.cpp
▪ Default flusher method
rc_t log_core::flush(lsn_t lsn, bool block)
▪ A.1 Log buffer design
▪ A.2 Consolidation array → Replaced with ELEDA’s design
▪ A.3 Modification to address potential delays caused by the requirement that all threads release their buffers in order
▪ A.4 Difficulty of distributing the log

TODO
▪ Analyze Shore-MT and Aether.
▪ Shore-MT (default) → Aether → ELEDA
: Define which features (e.g. multi logging by DB threads) are implemented in each system.
▪ Find out which parts of ELEDA can be replaced by Aether’s flush pipelining and consolidation array.
▪ Design the logging subsystem and flusher modules.
▪ Implement ELEDA in Shore-MT.
▪ Starting point is C&H-list.

Reference
▪ Johnson, Ryan, et al. "Aether: a scalable approach to logging." Proceedings of the VLDB Endowment 3.1-2 (2010): 681-692.
▪ Shore-MT (source code and docs), https://bitbucket.org/shoremt/
▪ Shore Storage Manager Modules, http://research.cs.wisc.edu/shore-mt/onlinedoc/html/index.html
▪ Implementation notes of Log manager, http://research.cs.wisc.edu/shore-mt/onlinedoc/html/implnotes.html#LOG_M
