SlideShare a Scribd company logo
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
Leif Walsh 
Tokutek, Inc. 
leif@tokutek.com 
@leifwalsh 
November 12, 2014 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 1 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
Background 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 2 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
Data structures: 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 3 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
Data structures: 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 3 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
Data structures: 
Provide retrieval of data. 
Lookup(Key) 
Pred(Key) 
Succ(Key) 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 3 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
Data structures: 
Provide retrieval of data. 
Lookup(Key) 
Pred(Key) 
Succ(Key) 
Dynamic data structures let you change 
the data. 
Insert(Key; Value) 
Delete(Key) 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 3 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
[Aggarwal & Vitter ’88] 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
DAM model 
Problem size N. 
Memory size M. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 4 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
[Aggarwal & Vitter ’88] 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
DAM model 
Problem size N. 
Memory size M. 
Transfer data to/from memory in blocks 
of size B. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 4 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
[Aggarwal & Vitter ’88] 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
DAM model 
Problem size N. 
Memory size M. 
Transfer data to/from memory in blocks 
of size B. 
Efficiency of operations is measured as the 
number of block transfers, a.k.a. IOPS. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 4 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
A B-tree is an external memory data structure: 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
A B-tree is an external memory data structure: 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
A B-tree is an external memory data structure: 
Balanced search tree. 
Fanout of B 
(block size / key size). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
A B-tree is an external memory data structure: 
Balanced search tree. 
Fanout of B 
(block size / key size). 
Internal nodes < M. 
Leaf nodes > M. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
A B-tree is an external memory data structure: 
Balanced search tree. 
Fanout of B 
(block size / key size). 
Internal nodes < M. 
Leaf nodes > M. 
Search: O(logB N) I/Os 
Insert: O(logB N) I/Os 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
. 
. 
. 
. 
. 
. 
. 
[Brodal & Fagerberg ’03] 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 6 / 31
. 
. 
. 
. 
. 
. 
. 
[Brodal & Fagerberg ’03] 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 7 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
OLAP 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 8 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
OLAP: Online Analytical Processing 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 9 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
OLAP: Online Analytical Processing 
Key idea: Analyze data collected in the past. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 9 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
OLAP: Online Analytical Processing 
Key idea: Analyze data collected in the past. 
B-tree inserts are slow, but…logging and sorting are fast. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 9 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Merge sort: 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 10 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Merge sort in external memory: 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Merge sort in external memory: 
Merge sort cost in DAM model is: 
Cost to scan through all the data once. 
Multiplied by the # of levels in the merge tree. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Merge sort in external memory: 
Merge sort cost in DAM model is: 
Cost to scan through all the data once. 
N/B 
Multiplied by the # of levels in the merge tree. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Merge sort in external memory: 
Merge sort cost in DAM model is: 
Cost to scan through all the data once. 
N/B 
Multiplied by the # of levels in the merge tree. 
logM/B N/B 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Merge sort in external memory: 
Merge sort cost in DAM model is: 
Cost to scan through all the data once. 
N/B 
Multiplied by the # of levels in the merge tree. 
logM/B N/B 
O 
( 
N 
B 
logM/B 
N 
B 
) 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Insert N elements into a B-tree: 
O 
( 
N logB 
N 
M 
) 
Merge sort: 
O 
( 
N 
B 
logM/B 
N 
B 
) 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Insert N elements into a B-tree: 
O 
( 
N logB 
N 
M 
) 
Merge sort: 
O 
( 
N 
B 
logM/B 
N 
B 
) 
 2N 
B 
Typically, M/B is large, so only two passes are needed to sort. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Insert N elements into a B-tree: 
O 
( 
N logB 
N 
M 
) 
 N 
Merge sort: 
O 
( 
N 
B 
logM/B 
N 
B 
) 
 2N 
B 
Typically, M/B is large, so only two passes are needed to sort. 
Intuition: Each insert into a B-tree costs 1 seek, while sorting is close to disk bandwidth. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Insert N elements into a B-tree: (assuming 100-1000 byte elements) 
O 
( 
N logB 
N 
M 
) 
 N  10  100kB/s = 100 elements/s 
Merge sort: 
O 
( 
N 
B 
logM/B 
N 
B 
) 
 2N 
B 
Typically, M/B is large, so only two passes are needed to sort. 
Intuition: Each insert into a B-tree costs 1 seek, while sorting is close to disk bandwidth. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
Insert N elements into a B-tree: (assuming 100-1000 byte elements) 
O 
( 
N logB 
N 
M 
) 
 N  10  100kB/s = 100 elements/s 
Merge sort: 
O 
( 
N 
B 
logM/B 
N 
B 
) 
 2N 
B 
 50MB/s = 50k  500k elements/s 
Typically, M/B is large, so only two passes are needed to sort. 
Intuition: Each insert into a B-tree costs 1 seek, while sorting is close to disk bandwidth. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
So, how does OLAP work? 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
So, how does OLAP work? 
Log new data unindexed until you accumulate a lot of it (10% of the data set). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
So, how does OLAP work? 
Log new data unindexed until you accumulate a lot of it (10% of the data set). 
Sort the new data. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
So, how does OLAP work? 
Log new data unindexed until you accumulate a lot of it (10% of the data set). 
Sort the new data. 
Use a merge pass through existing indexes to incorporate new data. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
So, how does OLAP work? 
Log new data unindexed until you accumulate a lot of it (10% of the data set). 
Sort the new data. 
Use a merge pass through existing indexes to incorporate new data. 
Use indexes to do analytics. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
So, how does OLAP work? 
Log new data unindexed until you accumulate a lot of it (10% of the data set). 
Sort the new data. 
Use a merge pass through existing indexes to incorporate new data. 
Use indexes to do analytics. 
Moral: OLAP techniques can handle high insertion volume, but query results are delayed. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #1: OLAP 
So, how does OLAP work? 
Log new data unindexed until you accumulate a lot of it (10% of the data set). 
Sort the new data. 
Use a merge pass through existing indexes to incorporate new data. 
Use indexes to do analytics. 
Moral: OLAP techniques can handle high insertion volume, but query results are delayed. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
LSM-trees 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 14 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
The insight for LSM-trees starts by asking: how can we reduce the queryability delay in OLAP? 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 15 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
The insight for LSM-trees starts by asking: how can we reduce the queryability delay in OLAP? 
The buffer is small, let’s index it! 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 15 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
The insight for LSM-trees starts by asking: how can we reduce the queryability delay in OLAP? 
The buffer is small, let’s index it! 
Inserts go into the “buffer B-tree”. 
When the buffer gets full, we merge it with the “main B-tree”. 
Queries have to touch both trees and merge results, but results are available immediately. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 15 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
The insight for LSM-trees starts by asking: how can we reduce the queryability delay in OLAP? 
The buffer is small, let’s index it! 
Inserts go into the “buffer B-tree”. 
When the buffer gets full, we merge it with the “main B-tree”. 
Queries have to touch both trees and merge results, but results are available immediately. 
(This specific technique (which is not yet an LSM-tree) is used in InnoDB and is called the “change buffer”.) 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 15 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
Why is this fast? 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 16 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
Why is this fast? 
The buffer is in-memory, so inserts are fast. 
When we merge, we put many new elements in each leaf in the main B-tree (this amortizes 
the I/O cost to read the leaf ). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 16 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
Why is this fast? 
The buffer is in-memory, so inserts are fast. 
When we merge, we put many new elements in each leaf in the main B-tree (this amortizes 
the I/O cost to read the leaf ). 
Eventually, we reach a problem: 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 16 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
Why is this fast? 
The buffer is in-memory, so inserts are fast. 
When we merge, we put many new elements in each leaf in the main B-tree (this amortizes 
the I/O cost to read the leaf ). 
Eventually, we reach a problem: 
If the buffer gets too big, inserts get slow. 
If the buffer stays too small, the merge gets inefficient (back to O(N logB N)). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 16 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How can we fix this? 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How can we fix this? More buffering! 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How can we fix this? More buffering! 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How can we fix this? More buffering! 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How can we fix this? More buffering! 
Each level is twice as large as the previous level, for some value of 2. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How can we fix this? More buffering! 
Each level is twice as large as the previous level, for some value of 2. We’ll use 2. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How do queries work? 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How do queries work? 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How do queries work? 
Search cost is: 
logB B + : : : + logB 
N 
8 
+ logB 
N 
4 
+ logB 
N 
2 
+ logB N 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How do queries work? 
Search cost is: 
logB B + : : : + logB 
N 
8 
+ logB 
N 
4 
+ logB 
N 
2 
+ logB N 
= 
1 
log B (1 + : : : + lg(N)  3 + lg(N)  2 + lg(N)  1 + lg(N)) 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How do queries work? 
Search cost is: 
logB B + : : : + logB 
N 
8 
+ logB 
N 
4 
+ logB 
N 
2 
+ logB N 
= 
1 
log B (1 + : : : + lg(N)  3 + lg(N)  2 + lg(N)  1 + lg(N)) 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How do queries work? 
Search cost is: 
logB B + : : : + logB 
N 
8 
+ logB 
N 
4 
+ logB 
N 
2 
+ logB N 
= 
1 
log B (1 + : : : + lg(N)  3 + lg(N)  2 + lg(N)  1 + lg(N)) = O(log N  logB N) 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How much do inserts cost? 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How much do inserts cost? 
Cost to flush a tree Tj of size X is O(X/B). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How much do inserts cost? 
Cost to flush a tree Tj of size X is O(X/B). 
Cost per element to flush Tj is O(1/B). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How much do inserts cost? 
Cost to flush a tree Tj of size X is O(X/B). 
Cost per element to flush Tj is O(1/B). 
Each element moves  log N times. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #2: LSM-trees 
How much do inserts cost? 
Cost to flush a tree Tj of size X is O(X/B). 
Cost per element to flush Tj is O(1/B). 
Each element moves  log N times. 
Total amortized insert cost per element is O 
( 
log N 
B 
) 
. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization in external memory data structures 
Fractal Trees 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 20 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
The pain in LSM-trees is doing a full O(logB N) search in each level. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 21 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
The pain in LSM-trees is doing a full O(logB N) search in each level. 
We use fractional cascading to reduce the search per level to O(1). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 21 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
The pain in LSM-trees is doing a full O(logB N) search in each level. 
We use fractional cascading to reduce the search per level to O(1). 
The idea is that once we’ve searched Ti, we know where the key would be in Ti, and we can use 
that information to guide our search of Ti+1. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 21 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
Add forwarding pointers from leaves in Ti to leaves in Ti+1 (but remove the redundant ones that 
point to the same leaf ): 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 22 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
Add ghost pointers to leaves not pointed to in Ti+1 in leaves in Ti: 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 23 / 31
[Bender, Farach-Colton, Fineman, Fogel, Kuszmaul,  Nelson ’07] 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
Now, after searching Ti for a missing element c, we look left and right for forwarding or ghost 
pointers, and follow them down to look at O(1) leaves in Ti+1. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 24 / 31
[Bender, Farach-Colton, Fineman, Fogel, Kuszmaul,  Nelson ’07] 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
Now, after searching Ti for a missing element c, we look left and right for forwarding or ghost 
pointers, and follow them down to look at O(1) leaves in Ti+1. 
This way, search is only O(logR N) (in our example, R = 2). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 24 / 31
[Bender, Farach-Colton, Fineman, Fogel, Kuszmaul,  Nelson ’07] 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
Now, after searching Ti for a missing element c, we look left and right for forwarding or ghost 
pointers, and follow them down to look at O(1) leaves in Ti+1. 
This way, search is only O(logR N) (in our example, R = 2). 
The internal node structure in each level is now redundant, so we can represent each level as an 
array. This is called a Cache-Oblivious Lookahead Array. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 24 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
Though the amortized analysis says our inserts are fast, when we flush a very large level to the 
next one, we might see a big stall. Concurrent merge algorithms exist, but we can do better. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 25 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
Though the amortized analysis says our inserts are fast, when we flush a very large level to the 
next one, we might see a big stall. Concurrent merge algorithms exist, but we can do better. 
We break each level’s array into chunks that can be flushed independently. Each chunk flushes 
to a localized region of a few chunks in the next level down, found using its forwarding pointers. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 25 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
Though the amortized analysis says our inserts are fast, when we flush a very large level to the 
next one, we might see a big stall. Concurrent merge algorithms exist, but we can do better. 
We break each level’s array into chunks that can be flushed independently. Each chunk flushes 
to a localized region of a few chunks in the next level down, found using its forwarding pointers. 
Now we have a tree again! 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 25 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Write-optimization technique #3: Fractal Trees 
Though the amortized analysis says our inserts are fast, when we flush a very large level to the 
next one, we might see a big stall. Concurrent merge algorithms exist, but we can do better. 
We break each level’s array into chunks that can be flushed independently. Each chunk flushes 
to a localized region of a few chunks in the next level down, found using its forwarding pointers. 
Now we have a tree again! 
As it turns out, this structure makes it easier to manage an LRU-style cache of blocks and is more 
flexible in the face of “hotspot” workloads. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 25 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Results 
Modified B-tree-like dynamic (inserts, updates, deletes) data structure that supports point 
and range queries. 
Inserts ( 
are a factor B/ log B (typically 10-100x in practice) faster than a B-tree: 
O 
log N 
B 
) 
 O 
( 
log N 
log B 
) 
. 
Searches are a factor log B/ log R slower than a B-tree: O 
( 
log N 
log R 
) 
 O 
( 
log N 
log B 
) 
. 
To amortize flush costs over many elements, we want each block we write to be large 
(4MB), much larger than typical B-tree blocks (16KB). These compress well. 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 26 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Applications 
TokuDB for MySQL, TokuMX for MongoDB: 
Faster indexed insertions. 
Hot schema changes. 
Compression. 
Faster replication on secondaries (TokuMX). 
Lower impact migrations (TokuMX). 
Fast (no read before write) updates (in TokuDB, coming soon in TokuMX). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 27 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Applications 
TokuDB for MySQL, TokuMX for MongoDB: 
Faster indexed insertions. 
Hot schema changes. 
Compression. 
Faster replication on secondaries (TokuMX). 
Lower impact migrations (TokuMX). 
Fast (no read before write) updates (in TokuDB, coming soon in TokuMX). 
ACID transactions. 
Concurrency (TokuMX). 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 27 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Benchmarks 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 28 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Benchmarks 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 29 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Benchmarks 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 30 / 31
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
Questions? 
Leif Walsh 
leif@tokutek.com 
@leifwalsh 
Downloads: www.tokutek.com/downloads 
Docs: docs.tokutek.com 
Slides: slidesha.re/1yDBLXQ 
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 31 / 31

More Related Content

Viewers also liked

A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
leifwalsh
 
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction ChallengeESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
Francisco Zamora-Martinez
 
Buffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data ProcessingBuffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data Processing
Milind Gokhale
 
Ch24 efficient algorithms
Ch24 efficient algorithmsCh24 efficient algorithms
Ch24 efficient algorithmsrajatmay1992
 
Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...
Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...
Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...
Francisco Zamora-Martinez
 
A Connectionist approach to Part-Of-Speech Tagging
A Connectionist approach to Part-Of-Speech TaggingA Connectionist approach to Part-Of-Speech Tagging
A Connectionist approach to Part-Of-Speech Tagging
Francisco Zamora-Martinez
 
Fast evaluation of Connectionist Language Models
Fast evaluation of Connectionist Language ModelsFast evaluation of Connectionist Language Models
Fast evaluation of Connectionist Language Models
Francisco Zamora-Martinez
 
PhD defence
PhD defencePhD defence
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
leifwalsh
 
@pospaseis
@pospaseis@pospaseis
Integration of Unsupervised and Supervised Criteria for DNNs Training
Integration of Unsupervised and Supervised Criteria for DNNs TrainingIntegration of Unsupervised and Supervised Criteria for DNNs Training
Integration of Unsupervised and Supervised Criteria for DNNs Training
Francisco Zamora-Martinez
 
Some empirical evaluations of a temperature forecasting module based on Art...
Some empirical evaluations of a temperature forecasting module   based on Art...Some empirical evaluations of a temperature forecasting module   based on Art...
Some empirical evaluations of a temperature forecasting module based on Art...
Francisco Zamora-Martinez
 
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo... Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo...
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
Rogue Wave Software
 
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
Sri Ambati
 
Write optimization in external memory data structures
Write optimization in external memory data structuresWrite optimization in external memory data structures
Write optimization in external memory data structures
leifwalsh
 

Viewers also liked (16)

A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
 
Efficient Sorts
Efficient SortsEfficient Sorts
Efficient Sorts
 
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction ChallengeESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
 
Buffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data ProcessingBuffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data Processing
 
Ch24 efficient algorithms
Ch24 efficient algorithmsCh24 efficient algorithms
Ch24 efficient algorithms
 
Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...
Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...
Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...
 
A Connectionist approach to Part-Of-Speech Tagging
A Connectionist approach to Part-Of-Speech TaggingA Connectionist approach to Part-Of-Speech Tagging
A Connectionist approach to Part-Of-Speech Tagging
 
Fast evaluation of Connectionist Language Models
Fast evaluation of Connectionist Language ModelsFast evaluation of Connectionist Language Models
Fast evaluation of Connectionist Language Models
 
PhD defence
PhD defencePhD defence
PhD defence
 
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
 
@pospaseis
@pospaseis@pospaseis
@pospaseis
 
Integration of Unsupervised and Supervised Criteria for DNNs Training
Integration of Unsupervised and Supervised Criteria for DNNs TrainingIntegration of Unsupervised and Supervised Criteria for DNNs Training
Integration of Unsupervised and Supervised Criteria for DNNs Training
 
Some empirical evaluations of a temperature forecasting module based on Art...
Some empirical evaluations of a temperature forecasting module   based on Art...Some empirical evaluations of a temperature forecasting module   based on Art...
Some empirical evaluations of a temperature forecasting module based on Art...
 
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo... Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo...
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
 
Write optimization in external memory data structures
Write optimization in external memory data structuresWrite optimization in external memory data structures
Write optimization in external memory data structures
 

Similar to Write optimization in external memory data structures

The Level Ancestor Problem simplified
The Level Ancestor Problem simplifiedThe Level Ancestor Problem simplified
The Level Ancestor Problem simplified
leifwalsh
 
Wise Document Translator Report
Wise Document Translator ReportWise Document Translator Report
Wise Document Translator Report
Raouf KESKES
 
Data Management
Data ManagementData Management
Data Management
Emmett Poindexter
 
LoCloud - D2.5: Lightweight Digital Library Prototype (LoCloud Collections Se...
LoCloud - D2.5: Lightweight Digital Library Prototype (LoCloud Collections Se...LoCloud - D2.5: Lightweight Digital Library Prototype (LoCloud Collections Se...
LoCloud - D2.5: Lightweight Digital Library Prototype (LoCloud Collections Se...
locloud
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
my presentation of the paper "FAST'12 NCCloud"
my presentation of the paper "FAST'12 NCCloud"my presentation of the paper "FAST'12 NCCloud"
my presentation of the paper "FAST'12 NCCloud"Shuai Yuan
 
Presentation hybrid store-ewsn-2013
Presentation hybrid store-ewsn-2013Presentation hybrid store-ewsn-2013
Presentation hybrid store-ewsn-2013Baobing Wang
 
Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)
Andy Petrella
 
Storage Systems
Storage SystemsStorage Systems
Storage Systems
David Evans
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
eXascale Infolab
 

Similar to Write optimization in external memory data structures (14)

The Level Ancestor Problem simplified
The Level Ancestor Problem simplifiedThe Level Ancestor Problem simplified
The Level Ancestor Problem simplified
 
Wise Document Translator Report
Wise Document Translator ReportWise Document Translator Report
Wise Document Translator Report
 
Data Management
Data ManagementData Management
Data Management
 
LoCloud - D2.5: Lightweight Digital Library Prototype (LoCloud Collections Se...
LoCloud - D2.5: Lightweight Digital Library Prototype (LoCloud Collections Se...LoCloud - D2.5: Lightweight Digital Library Prototype (LoCloud Collections Se...
LoCloud - D2.5: Lightweight Digital Library Prototype (LoCloud Collections Se...
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
my presentation of the paper "FAST'12 NCCloud"
my presentation of the paper "FAST'12 NCCloud"my presentation of the paper "FAST'12 NCCloud"
my presentation of the paper "FAST'12 NCCloud"
 
Approach
ApproachApproach
Approach
 
Approach
ApproachApproach
Approach
 
Presentation hybrid store-ewsn-2013
Presentation hybrid store-ewsn-2013Presentation hybrid store-ewsn-2013
Presentation hybrid store-ewsn-2013
 
Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)
 
ClusteringDEC
ClusteringDECClusteringDEC
ClusteringDEC
 
Storage Systems
Storage SystemsStorage Systems
Storage Systems
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
 

Recently uploaded

Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 

Recently uploaded (20)

Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 

Write optimization in external memory data structures

  • 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures Leif Walsh Tokutek, Inc. leif@tokutek.com @leifwalsh November 12, 2014 Leif Walsh (Tokutek) Fractal Trees November 12, 2014 1 / 31
  • 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures Background Leif Walsh (Tokutek) Fractal Trees November 12, 2014 2 / 31
  • 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures Data structures: Leif Walsh (Tokutek) Fractal Trees November 12, 2014 3 / 31
  • 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures Data structures: Leif Walsh (Tokutek) Fractal Trees November 12, 2014 3 / 31
  • 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures Data structures: Provide retrieval of data. Lookup(Key) Pred(Key) Succ(Key) Leif Walsh (Tokutek) Fractal Trees November 12, 2014 3 / 31
  • 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures Data structures: Provide retrieval of data. Lookup(Key) Pred(Key) Succ(Key) Dynamic data structures let you change the data. Insert(Key; Value) Delete(Key) Leif Walsh (Tokutek) Fractal Trees November 12, 2014 3 / 31
  • 7. . . . . . . . . . [Aggarwal & Vitter ’88] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures DAM model Problem size N. Memory size M. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 4 / 31
  • 8. . . . . . . . . . [Aggarwal & Vitter ’88] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures DAM model Problem size N. Memory size M. Transfer data to/from memory in blocks of size B. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 4 / 31
  • 9. . . . . . . . . . [Aggarwal & Vitter ’88] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures DAM model Problem size N. Memory size M. Transfer data to/from memory in blocks of size B. Efficiency of operations is measured as the number of block transfers, a.k.a. IOPS. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 4 / 31
  • 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures A B-tree is an external memory data structure: Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
  • 11. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures A B-tree is an external memory data structure: Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
  • 12. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures A B-tree is an external memory data structure: Balanced search tree. Fanout of B (block size / key size). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
  • 13. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures A B-tree is an external memory data structure: Balanced search tree. Fanout of B (block size / key size). Internal nodes < M. Leaf nodes > M. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
  • 14. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures A B-tree is an external memory data structure: Balanced search tree. Fanout of B (block size / key size). Internal nodes < M. Leaf nodes > M. Search: O(logB N) I/Os Insert: O(logB N) I/Os Leif Walsh (Tokutek) Fractal Trees November 12, 2014 5 / 31
  • 15. . . . . . . . [Brodal & Fagerberg ’03] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures Leif Walsh (Tokutek) Fractal Trees November 12, 2014 6 / 31
  • 16. . . . . . . . [Brodal & Fagerberg ’03] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures Leif Walsh (Tokutek) Fractal Trees November 12, 2014 7 / 31
  • 17. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures OLAP Leif Walsh (Tokutek) Fractal Trees November 12, 2014 8 / 31
  • 18. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP OLAP: Online Analytical Processing Leif Walsh (Tokutek) Fractal Trees November 12, 2014 9 / 31
  • 19. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP OLAP: Online Analytical Processing Key idea: Analyze data collected in the past. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 9 / 31
  • 20. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP OLAP: Online Analytical Processing Key idea: Analyze data collected in the past. B-tree inserts are slow, but…logging and sorting are fast. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 9 / 31
  • 21. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Merge sort: Leif Walsh (Tokutek) Fractal Trees November 12, 2014 10 / 31
  • 22. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Merge sort in external memory: Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
  • 23. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Merge sort in external memory: Merge sort cost in DAM model is: Cost to scan through all the data once. Multiplied by the # of levels in the merge tree. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
  • 24. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Merge sort in external memory: Merge sort cost in DAM model is: Cost to scan through all the data once. N/B Multiplied by the # of levels in the merge tree. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
  • 25. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Merge sort in external memory: Merge sort cost in DAM model is: Cost to scan through all the data once. N/B Multiplied by the # of levels in the merge tree. logM/B N/B Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
  • 26. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Merge sort in external memory: Merge sort cost in DAM model is: Cost to scan through all the data once. N/B Multiplied by the # of levels in the merge tree. logM/B N/B O ( N B logM/B N B ) Leif Walsh (Tokutek) Fractal Trees November 12, 2014 11 / 31
  • 27. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Insert N elements into a B-tree: O ( N logB N M ) Merge sort: O ( N B logM/B N B ) Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
  • 28. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Insert N elements into a B-tree: O ( N logB N M ) Merge sort: O ( N B logM/B N B ) 2N B Typically, M/B is large, so only two passes are needed to sort. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
  • 29. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Insert N elements into a B-tree: O ( N logB N M ) N Merge sort: O ( N B logM/B N B ) 2N B Typically, M/B is large, so only two passes are needed to sort. Intuition: Each insert into a B-tree costs 1 seek, while sorting is close to disk bandwidth. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
  • 30. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Insert N elements into a B-tree: (assuming 100-1000 byte elements) O ( N logB N M ) N 10 100kB/s = 100 elements/s Merge sort: O ( N B logM/B N B ) 2N B Typically, M/B is large, so only two passes are needed to sort. Intuition: Each insert into a B-tree costs 1 seek, while sorting is close to disk bandwidth. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
  • 31. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP Insert N elements into a B-tree: (assuming 100-1000 byte elements) O ( N logB N M ) N 10 100kB/s = 100 elements/s Merge sort: O ( N B logM/B N B ) 2N B 50MB/s = 50k 500k elements/s Typically, M/B is large, so only two passes are needed to sort. Intuition: Each insert into a B-tree costs 1 seek, while sorting is close to disk bandwidth. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 12 / 31
  • 32. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP So, how does OLAP work? Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
  • 33. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP So, how does OLAP work? Log new data unindexed until you accumulate a lot of it (10% of the data set). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
  • 34. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP So, how does OLAP work? Log new data unindexed until you accumulate a lot of it (10% of the data set). Sort the new data. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
  • 35. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP So, how does OLAP work? Log new data unindexed until you accumulate a lot of it (10% of the data set). Sort the new data. Use a merge pass through existing indexes to incorporate new data. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
  • 36. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP So, how does OLAP work? Log new data unindexed until you accumulate a lot of it (10% of the data set). Sort the new data. Use a merge pass through existing indexes to incorporate new data. Use indexes to do analytics. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
  • 37. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP So, how does OLAP work? Log new data unindexed until you accumulate a lot of it (10% of the data set). Sort the new data. Use a merge pass through existing indexes to incorporate new data. Use indexes to do analytics. Moral: OLAP techniques can handle high insertion volume, but query results are delayed. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
  • 38. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #1: OLAP So, how does OLAP work? Log new data unindexed until you accumulate a lot of it (10% of the data set). Sort the new data. Use a merge pass through existing indexes to incorporate new data. Use indexes to do analytics. Moral: OLAP techniques can handle high insertion volume, but query results are delayed. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 13 / 31
  • 39. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures LSM-trees Leif Walsh (Tokutek) Fractal Trees November 12, 2014 14 / 31
  • 40. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees The insight for LSM-trees starts by asking: how can we reduce the queryability delay in OLAP? Leif Walsh (Tokutek) Fractal Trees November 12, 2014 15 / 31
  • 41. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees The insight for LSM-trees starts by asking: how can we reduce the queryability delay in OLAP? The buffer is small, let’s index it! Leif Walsh (Tokutek) Fractal Trees November 12, 2014 15 / 31
  • 42. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees The insight for LSM-trees starts by asking: how can we reduce the queryability delay in OLAP? The buffer is small, let’s index it! Inserts go into the “buffer B-tree”. When the buffer gets full, we merge it with the “main B-tree”. Queries have to touch both trees and merge results, but results are available immediately. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 15 / 31
  • 43. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees The insight for LSM-trees starts by asking: how can we reduce the queryability delay in OLAP? The buffer is small, let’s index it! Inserts go into the “buffer B-tree”. When the buffer gets full, we merge it with the “main B-tree”. Queries have to touch both trees and merge results, but results are available immediately. (This specific technique (which is not yet an LSM-tree) is used in InnoDB and is called the “change buffer”.) Leif Walsh (Tokutek) Fractal Trees November 12, 2014 15 / 31
  • 44. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees Why is this fast? Leif Walsh (Tokutek) Fractal Trees November 12, 2014 16 / 31
  • 45. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees Why is this fast? The buffer is in-memory, so inserts are fast. When we merge, we put many new elements in each leaf in the main B-tree (this amortizes the I/O cost to read the leaf ). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 16 / 31
  • 46. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees Why is this fast? The buffer is in-memory, so inserts are fast. When we merge, we put many new elements in each leaf in the main B-tree (this amortizes the I/O cost to read the leaf ). Eventually, we reach a problem: Leif Walsh (Tokutek) Fractal Trees November 12, 2014 16 / 31
  • 47. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees Why is this fast? The buffer is in-memory, so inserts are fast. When we merge, we put many new elements in each leaf in the main B-tree (this amortizes the I/O cost to read the leaf ). Eventually, we reach a problem: If the buffer gets too big, inserts get slow. If the buffer stays too small, the merge gets inefficient (back to O(N logB N)). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 16 / 31
  • 48. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How can we fix this? Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
  • 49. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How can we fix this? More buffering! Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
  • 50. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How can we fix this? More buffering! Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
  • 51. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How can we fix this? More buffering! Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
  • 52. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How can we fix this? More buffering! Each level is twice as large as the previous level, for some value of 2. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
  • 53. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How can we fix this? More buffering! Each level is twice as large as the previous level, for some value of 2. We’ll use 2. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 17 / 31
  • 54. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How do queries work? Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
  • 55. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How do queries work? Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
  • 56. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How do queries work? Search cost is: logB B + : : : + logB N 8 + logB N 4 + logB N 2 + logB N Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
  • 57. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How do queries work? Search cost is: logB B + : : : + logB N 8 + logB N 4 + logB N 2 + logB N = 1 log B (1 + : : : + lg(N) 3 + lg(N) 2 + lg(N) 1 + lg(N)) Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
  • 58. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How do queries work? Search cost is: logB B + : : : + logB N 8 + logB N 4 + logB N 2 + logB N = 1 log B (1 + : : : + lg(N) 3 + lg(N) 2 + lg(N) 1 + lg(N)) Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
  • 59. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How do queries work? Search cost is: logB B + : : : + logB N 8 + logB N 4 + logB N 2 + logB N = 1 log B (1 + : : : + lg(N) 3 + lg(N) 2 + lg(N) 1 + lg(N)) = O(log N logB N) Leif Walsh (Tokutek) Fractal Trees November 12, 2014 18 / 31
  • 60. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How much do inserts cost? Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
  • 61. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How much do inserts cost? Cost to flush a tree Tj of size X is O(X/B). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
  • 62. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How much do inserts cost? Cost to flush a tree Tj of size X is O(X/B). Cost per element to flush Tj is O(1/B). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
  • 63. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How much do inserts cost? Cost to flush a tree Tj of size X is O(X/B). Cost per element to flush Tj is O(1/B). Each element moves log N times. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
  • 64. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #2: LSM-trees How much do inserts cost? Cost to flush a tree Tj of size X is O(X/B). Cost per element to flush Tj is O(1/B). Each element moves log N times. Total amortized insert cost per element is O ( log N B ) . Leif Walsh (Tokutek) Fractal Trees November 12, 2014 19 / 31
  • 65. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization in external memory data structures Fractal Trees Leif Walsh (Tokutek) Fractal Trees November 12, 2014 20 / 31
  • 66. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees The pain in LSM-trees is doing a full O(logB N) search in each level. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 21 / 31
  • 67. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees The pain in LSM-trees is doing a full O(logB N) search in each level. We use fractional cascading to reduce the search per level to O(1). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 21 / 31
  • 68. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees The pain in LSM-trees is doing a full O(logB N) search in each level. We use fractional cascading to reduce the search per level to O(1). The idea is that once we’ve searched Ti, we know where the key would be in Ti, and we can use that information to guide our search of Ti+1. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 21 / 31
  • 69. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees Add forwarding pointers from leaves in Ti to leaves in Ti+1 (but remove the redundant ones that point to the same leaf ): Leif Walsh (Tokutek) Fractal Trees November 12, 2014 22 / 31
  • 70. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees Add ghost pointers to leaves not pointed to in Ti+1 in leaves in Ti: Leif Walsh (Tokutek) Fractal Trees November 12, 2014 23 / 31
  • 71. [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson ’07] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees Now, after searching Ti for a missing element c, we look left and right for forwarding or ghost pointers, and follow them down to look at O(1) leaves in Ti+1. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 24 / 31
  • 72. [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson ’07] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees Now, after searching Ti for a missing element c, we look left and right for forwarding or ghost pointers, and follow them down to look at O(1) leaves in Ti+1. This way, search is only O(logR N) (in our example, R = 2). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 24 / 31
  • 73. [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson ’07] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees Now, after searching Ti for a missing element c, we look left and right for forwarding or ghost pointers, and follow them down to look at O(1) leaves in Ti+1. This way, search is only O(logR N) (in our example, R = 2). The internal node structure in each level is now redundant, so we can represent each level as an array. This is called a Cache-Oblivious Lookahead Array. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 24 / 31
  • 74. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees Though the amortized analysis says our inserts are fast, when we flush a very large level to the next one, we might see a big stall. Concurrent merge algorithms exist, but we can do better. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 25 / 31
  • 75. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees Though the amortized analysis says our inserts are fast, when we flush a very large level to the next one, we might see a big stall. Concurrent merge algorithms exist, but we can do better. We break each level’s array into chunks that can be flushed independently. Each chunk flushes to a localized region of a few chunks in the next level down, found using its forwarding pointers. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 25 / 31
  • 76. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees Though the amortized analysis says our inserts are fast, when we flush a very large level to the next one, we might see a big stall. Concurrent merge algorithms exist, but we can do better. We break each level’s array into chunks that can be flushed independently. Each chunk flushes to a localized region of a few chunks in the next level down, found using its forwarding pointers. Now we have a tree again! Leif Walsh (Tokutek) Fractal Trees November 12, 2014 25 / 31
  • 77. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Write-optimization technique #3: Fractal Trees Though the amortized analysis says our inserts are fast, when we flush a very large level to the next one, we might see a big stall. Concurrent merge algorithms exist, but we can do better. We break each level’s array into chunks that can be flushed independently. Each chunk flushes to a localized region of a few chunks in the next level down, found using its forwarding pointers. Now we have a tree again! As it turns out, this structure makes it easier to manage an LRU-style cache of blocks and is more flexible in the face of “hotspot” workloads. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 25 / 31
  • 78. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Results Modified B-tree-like dynamic (inserts, updates, deletes) data structure that supports point and range queries. Inserts ( are a factor B/ log B (typically 10-100x in practice) faster than a B-tree: O log N B ) O ( log N log B ) . Searches are a factor log B/ log R slower than a B-tree: O ( log N log R ) O ( log N log B ) . To amortize flush costs over many elements, we want each block we write to be large (4MB), much larger than typical B-tree blocks (16KB). These compress well. Leif Walsh (Tokutek) Fractal Trees November 12, 2014 26 / 31
  • 79. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Applications TokuDB for MySQL, TokuMX for MongoDB: Faster indexed insertions. Hot schema changes. Compression. Faster replication on secondaries (TokuMX). Lower impact migrations (TokuMX). Fast (no read before write) updates (in TokuDB, coming soon in TokuMX). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 27 / 31
  • 80. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Applications TokuDB for MySQL, TokuMX for MongoDB: Faster indexed insertions. Hot schema changes. Compression. Faster replication on secondaries (TokuMX). Lower impact migrations (TokuMX). Fast (no read before write) updates (in TokuDB, coming soon in TokuMX). ACID transactions. Concurrency (TokuMX). Leif Walsh (Tokutek) Fractal Trees November 12, 2014 27 / 31
  • 81. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Benchmarks Leif Walsh (Tokutek) Fractal Trees November 12, 2014 28 / 31
  • 82. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Benchmarks Leif Walsh (Tokutek) Fractal Trees November 12, 2014 29 / 31
  • 83. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Benchmarks Leif Walsh (Tokutek) Fractal Trees November 12, 2014 30 / 31
  • 84. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Questions? Leif Walsh leif@tokutek.com @leifwalsh Downloads: www.tokutek.com/downloads Docs: docs.tokutek.com Slides: slidesha.re/1yDBLXQ Leif Walsh (Tokutek) Fractal Trees November 12, 2014 31 / 31