Big problems, Massive data
Stratified B-trees
Versioned dictionaries
•   put(k, ver, data)
•   get(k_start, k_end, ver)
•   clone(v): create a child of v that inherits the latest version of its keys
(version timeline: Monday 12:00 → v10, Monday 16:00 → v11, now → v12)
This talk: a versioned dictionary with fast updates, and optimal space/query/update tradeoffs.

Why?
•   Powerful: cloning, time-travel, cache and space-efficiency, ...
•   Give developers a recent branch of the live dataset
•   Expose different views of the same base dataset (e.g. clone v13 off v12)
•   Run analytics/tests/etc on this clone, without performance impact.
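To make the interface concrete, here is a minimal in-memory model of the put/get/clone semantics described above (a sketch of the API only, not of the stratified B-tree itself; the class name, the parent-pointer version tree and the lookup-by-walking-ancestors rule are illustrative assumptions):

```python
# A minimal in-memory model of the versioned-dictionary API from the slides.
# Illustrative only: it shows put/get/clone semantics, not the on-disk structure.

class VersionedDict:
    def __init__(self):
        self.parent = {0: None}          # version tree: child -> parent
        self.entries = {}                # key -> {version: data}
        self.next_ver = 1

    def clone(self, v):
        """Create a child of v that inherits the latest version of its keys."""
        child = self.next_ver
        self.next_ver += 1
        self.parent[child] = v
        return child

    def put(self, k, ver, data):
        self.entries.setdefault(k, {})[ver] = data

    def _lookup(self, k, ver):
        """Walk up the version tree to the nearest ancestor that wrote k."""
        written = self.entries.get(k, {})
        while ver is not None:
            if ver in written:
                return written[ver]
            ver = self.parent[ver]
        return None

    def get(self, k_start, k_end, ver):
        """Range query: every live (key, data) pair in [k_start, k_end] at ver."""
        out = []
        for k in sorted(self.entries):
            if k_start <= k <= k_end:
                data = self._lookup(k, ver)
                if data is not None:
                    out.append((k, data))
        return out


d = VersionedDict()
d.put("k1", 0, "monday")
v1 = d.clone(0)                  # v1 inherits k1 from version 0
d.put("k1", v1, "tuesday")       # overwrite only in the clone
print(d.get("k0", "k9", 0))      # [('k1', 'monday')]
print(d.get("k0", "k9", v1))     # [('k1', 'tuesday')]
```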
State of the art: copy-on-write
Used in ZFS, WAFL, Btrfs, ... Apply path-copying [DSST] to the B-tree.
Problems:
•   Space blowup: each update may rewrite an entire path
•   Slow updates: as above
•   Needs random IO to scale
•   Concurrency is tricky
A log file system makes updates sequential, but relies on garbage collection (its Achilles heel!).
CoW B-tree [ZFS, WAFL, Btrfs, ...] vs this talk:
•   Update: O(log_B Nv) random IOs (~ log(2^30)/log 10000 = 3 IOs/update) vs O((log Nv)/B) cache-oblivious IOs (~ log(2^30)/10000 = 0.003 IOs/update -- important for flash)
•   Range query (size Z): O(Z/B) random vs O(Z/B) sequential
•   Space: O(N B log_B Nv) vs O(N)
Nv = #keys live (accessible) at version v; B = "block size", say 1MB at 100 bytes/entry = 10000 entries.
Complication: B is asymmetric for flash.
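As a quick sanity check of the back-of-envelope numbers above (assuming base-2 logarithms, 2^30 keys and B = 10000 entries per block, as in the slide):

```python
import math

N = 2 ** 30          # keys
B = 10_000           # entries per block (1MB at ~100 bytes/entry)

cow_ios = math.log2(N) / math.log2(B)   # O(log_B N): height of a B-ary tree
da_ios = math.log2(N) / B               # O((log N)/B): amortized, doubling array

print(f"CoW B-tree: ~{cow_ios:.1f} random IOs per update")   # ~2.3, i.e. ~3
print(f"This talk:  ~{da_ios:.4f} IOs per update")           # ~0.0030
```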
Unversioned Case [Doubling Array]
Doubling Array: inserts
•   Buffer arrays in memory until we have > B of them
•   Example: insert 2, 9, 11, 8; equal-sized sorted arrays are merged as they collide, giving 2 8 9 11, etc...
•   Similar to log-structured merge trees (LSM), cache-oblivious lookahead array (COLA), ...
•   O(log N) "levels"; each element is rewritten once per level, so updates cost O((log N)/B) IOs.
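The merge-on-collision behaviour sketched in these frames fits in a few lines. Below is a minimal, purely in-memory sketch of an unversioned doubling array; the buffering threshold B and the on-disk layout are ignored, and the class and method names are illustrative:

```python
import heapq

class DoublingArray:
    """Minimal in-memory doubling array / COLA sketch: level i holds either
    nothing or one sorted array of 2^i keys; inserts merge upward on collision."""

    def __init__(self):
        self.levels = []            # levels[i] is None or a sorted list of 2^i keys

    def insert(self, key):
        carry = [key]               # sorted array of size 2^i bubbling up the levels
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:          # free slot: park the carry here
                self.levels[i] = carry
                return
            # slot occupied: merge two size-2^i arrays into one of size 2^(i+1)
            carry = list(heapq.merge(self.levels[i], carry))
            self.levels[i] = None
            i += 1


da = DoublingArray()
for k in [2, 9, 11, 8]:
    da.insert(k)
print(da.levels)    # [None, None, [2, 8, 9, 11]] -- as in the slides
```

Each element is merged at most once per level, which is where the O((log N)/B) amortized IO figure in the last bullet comes from once merges are performed with sequential IO.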
Doubling Array: queries
•   Add an index to each array to do lookups
•   query(k) searches each array independently
•   Bloom filters can help exclude arrays from the search...
•   ... but don't help with range queries
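A sketch of that query path, with a toy Bloom filter standing in for the per-array filter and a binary search standing in for the per-array index (all names and parameters here are illustrative):

```python
import bisect
import hashlib

class TinyBloom:
    """Small illustrative Bloom filter (sizes and hash count are not tuned)."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.buf = bytearray(bits // 8)
    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "big") % self.bits
    def add(self, key):
        for p in self._positions(key):
            self.buf[p // 8] |= 1 << (p % 8)
    def might_contain(self, key):
        return all(self.buf[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# one (sorted array, filter) pair per level, as in the doubling array above
levels = []
for arr in ([2, 8, 9, 11], [4, 20]):
    bloom = TinyBloom()
    for k in arr:
        bloom.add(k)
    levels.append((arr, bloom))

def query(k):
    """Point query: search each array independently, skipping filtered-out levels."""
    for arr, bloom in levels:
        if not bloom.might_contain(k):
            continue                      # Bloom filter excludes this array
        i = bisect.bisect_left(arr, k)    # stand-in for the per-array index
        if i < len(arr) and arr[i] == k:
            return True
    return False

def range_query(lo, hi):
    """Range query: the filters do not help; every array must be consulted."""
    out = []
    for arr, _ in levels:
        a = bisect.bisect_left(arr, lo)
        b = bisect.bisect_right(arr, hi)
        out.extend(arr[a:b])
    return sorted(out)

print(query(9), query(5))       # True False
print(range_query(3, 11))       # [4, 8, 9, 11]
```

The point query can skip arrays whose filter answers "definitely absent"; the range query has to consult every array regardless, which is the limitation the last bullet points at.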
Fractional Cascading
•   Use information from the search at level l to help the search at level l+1
•   From each array, sample every 4th element and put a forward pointer to it in the previous level
•   The found entry's surrounding "forward pointers" give bounds for the search in the next array
•   In case you might get unlucky with the sampling...
•   ... add regular "secondary" pointers to the nearest forward pointer above and below
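A small sketch of the forward-pointer idea on two levels. The data, the sampling rate and the function names are illustrative; the secondary pointers from the last bullets are omitted, so the nearest forward pointers are found by scanning, which is exactly the unlucky case those pointers would avoid:

```python
import bisect

level0 = [5, 17, 40, 80]                                    # toy level-l keys
level1 = [1, 3, 8, 12, 20, 26, 31, 44, 52, 60, 71, 90]      # toy level-(l+1) keys
SAMPLE = 4                                                   # sample every 4th element

def build_augmented(own_keys, next_level):
    """Level-l array augmented with forward pointers into the next level.
    Items are (key, kind, idx): kind is 'data' or 'fp'; idx is the position
    of the sampled key in next_level."""
    items = [(k, "data", None) for k in own_keys]
    items += [(next_level[i], "fp", i) for i in range(0, len(next_level), SAMPLE)]
    return sorted(items, key=lambda t: t[0])

aug0 = build_augmented(level0, level1)
keys0 = [t[0] for t in aug0]

def search(key):
    pos = bisect.bisect_left(keys0, key)
    if pos < len(aug0) and aug0[pos][0] == key and aug0[pos][1] == "data":
        return ("level0", key)
    # bracketing forward pointers bound the search range in the next level
    lo = next((t[2] for t in reversed(aug0[:pos]) if t[1] == "fp"), 0)
    hi = next((t[2] + 1 for t in aug0[pos:] if t[1] == "fp"), len(level1))
    i = bisect.bisect_left(level1, key, lo, hi)
    if i < len(level1) and level1[i] == key:
        return ("level1", key)
    return None

print(search(26))   # ('level1', 26) -- only level1[4:9] is searched
print(search(17))   # ('level0', 17)
print(search(2))    # None
```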
Versioned case (sketch)
Adding versions
•   Version 1 writes k1 ... k13; version 2 is a clone that overwrites k6
•   If the layout is good for v1 (its keys laid out contiguously) ... then it's bad for v2
•   If you try to keep all versions of a key close (k6, k6, k6, ... next to each other) ... then it's bad for all versions 2, 3, 4, ...
Density
•   Arrays are tagged with a version set W
•   Example: W = {v1, v2, v3} for an array with entries (k0,v0,x), (k1,v0,x), (k1,v2,x), (k2,v1,x), (k2,v2,x), (k2,v3,x), (k3,v1,x), (k3,v2,x); here live(v1) = live(v2) = live(v3) = 4, so density = 4/8
•   f(A, v) = (#elements in A live at version v) / |A|
•   density(A, W) = min{w in W} f(A, w)
•   We say the array (A, W) is dense if density ≥ 1/5
•   Tradeoff: high density means good range queries, but many duplicates (imagine density 1 and density 1/N)
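The liveness and density definitions above are easy to execute. Below is a small sketch; the version tree and entries are illustrative (not the slide's exact figure), and an entry is counted as live at v if it was written in the nearest ancestor of v that wrote its key:

```python
# Liveness and density of a versioned array, following the definitions above.
# The version tree and entries here are illustrative, not the slide's figure.

parent = {"v0": None, "v1": "v0", "v2": "v1", "v3": "v1"}    # child -> parent

A = [("k0", "v0", "x"), ("k1", "v0", "x"), ("k1", "v2", "x"),
     ("k2", "v1", "x"), ("k2", "v3", "x")]                    # (key, version, data)

def ancestors(v):
    """v and its ancestors, nearest first."""
    while v is not None:
        yield v
        v = parent[v]

def live(A, v):
    """Entries of A live at v: for each key, the entry written in the nearest
    ancestor of v (if any ancestor wrote that key)."""
    by_key = {}
    for key, w, data in A:
        by_key.setdefault(key, {})[w] = (key, w, data)
    out = []
    for key, written in by_key.items():
        for a in ancestors(v):
            if a in written:
                out.append(written[a])
                break
    return out

def f(A, v):
    return len(live(A, v)) / len(A)

def density(A, W):
    return min(f(A, w) for w in W)

W = {"v1", "v2", "v3"}
print({w: len(live(A, w)) for w in sorted(W)})   # {'v1': 3, 'v2': 3, 'v3': 3}
print(density(A, W))                             # 0.6 -- dense, since >= 1/5
```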
Range queries
Theorem 2. A range query at version v costs O(log Nv + Z/B) amortized I/Os.
•   Imagine scanning over each accessible array
•   Density => the optimal O(log Nv + Z/B) bound holds trivially for large ("voluminous") range queries; for much smaller range queries the worst case may be the same as for a point query, so an amortized bound is proved
•   For point queries: amortize the cost l(k, v) of lookup(k, v) over all keys k live at v, i.e. sum_k l(k, v) / Nv
•   Each query examines disjoint regions of the array
•   Density implies the total size examined is O(Nv log Nv)
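Spelling out the last three bullets as formulas (a sketch following the slide's outline; constants and block-boundary effects are glossed over, and the per-level disjointness of version sets from the next slide is assumed):

```latex
% l(k,v) = cost of lookup(k,v);  N_v = number of keys live at version v.
\[
  \text{amortized point-query cost at } v
     \;=\; \frac{1}{N_v}\sum_{k\ \text{live at } v} l(k,v).
\]
% At each of the O(log N_v) levels, a lookup at v touches one array whose
% density is at least 1/5, so that array has at most 5*live(v) <= 5*N_v
% elements.  Different keys examine disjoint regions of each array, hence
\[
  \sum_{k\ \text{live at } v} \bigl(\text{elements examined for } k\bigr)
     \;=\; O\!\bigl(N_v \log N_v\bigr),
\]
% i.e. O(log N_v) examined elements per key on average.
```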
Don't worry, stay dense!
•   Version sets are disjoint at each level -- lookups examine one array per level
•   Merge arrays with intersecting version sets
•   The result of a merge might not be dense
•   Answer: density amplification!
(figure: merge of arrays tagged {1,2}, {2,3}, {1,3}, {1} into {1,2,3}; promote / density amplification / demote; arrays tagged {4})
"Density amplification"
•   Example: the merged array has live(v0) = 2 and density 2/11, so it is not dense
•   Split 1: (A1, {v0, v5}) with live(v0) = 2, live(v5) = 4 has size 4 and density 2/4
•   Split 2: (A2, {v4, v1, v2, v3}) with live(v4) = 2, live(v1) = live(v2) = live(v3) = 3 has size 7 and density 2/7
•   Both splits have size < 8 and density ≥ 1/5, so they can remain at the current level

From the paper: if (A, V) also satisfies (L-live) then every split of it does (since all live elements are included), and likewise for (L-edge). It follows that version splitting (A, V) -- which necessarily has no promotable versions -- results in a set of arrays all of which satisfy all of the L-* conditions necessary to stay at level l.
Lemma 3 (Promotion). The fraction of lead elements over all output arrays after a version split is ≥ 1/39.
Proof (beginning): first, we claim that under the same conditions as the version split lemma, if in addition |A| < 2M and live(v) ≥ M/3 for all v, then the number of output strata is at most 13. Consider the arrays which obey the lead fraction constraint: each has size at least M/3, since at least one version is live in it, and at least half of the array is lead, so at least M/6 lead keys. The total number of lead keys in array A is ≤ 2M, since the array itself is no larger than this; it follows that there can be no more than ...
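To make the example concrete, here is a toy sketch of a version split: split(A, U) keeps the entries of A that are live at some version in U, and a non-dense merged array is partitioned into dense parts. The array, version tree and chosen partition are illustrative; the paper's lead/edge bookkeeping and the greedy child-subset search described in the speaker notes below are omitted:

```python
# Toy version split: split(A, U) keeps the entries of A live at some version
# in U.  The real algorithm also tracks lead/edge conditions and chooses the
# partition greedily over the version tree; here the partition is given.

parent = {"v0": None, "v1": "v0", "v2": "v1", "v3": "v1", "v4": "v0", "v5": "v4"}

def ancestors(v):
    while v is not None:
        yield v
        v = parent[v]

def live(A, v):
    written = {}
    for key, w, data in A:
        written.setdefault(key, {})[w] = (key, w, data)
    out = []
    for key, by_ver in written.items():
        for a in ancestors(v):
            if a in by_ver:
                out.append(by_ver[a])
                break
    return out

def density(A, W):
    return min(len(live(A, w)) for w in W) / len(A)

def split(A, U):
    keep = {e for u in U for e in live(A, u)}
    return [e for e in A if e in keep]

# A merged array that is not dense for its version set W...
A = [("k0", "v0", "x"), ("k0", "v5", "x"),
     ("k1", "v1", "x"), ("k1", "v2", "x"), ("k1", "v3", "x"),
     ("k2", "v1", "x")]
W = {"v1", "v2", "v3", "v5"}
print(density(A, W))                       # 1/6 < 1/5: not dense, so split

# ...but splitting W into two parts gives two dense arrays
for U in ({"v5"}, {"v1", "v2", "v3"}):
    part = split(A, U)
    print(sorted(U), len(part), density(part, U))
    # ['v5'] 1 1.0   and   ['v1', 'v2', 'v3'] 5 0.6
```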
Update bound
Theorem 1. The stratified doubling array performs updates to a leaf version v in cache-oblivious O((log Nv)/B) amortized I/Os.
From the paper: on a snapshot or clone of version v to a new descendant version v', v' is registered for each array A which is currently registered to the parent of v; this does not require any I/Os. For the proof, assume a memory buffer of size at least B (recall that B is not known to the algorithm); then each array involved in a disk merge has size at least B, so a merge of arrays of total size k elements costs O(k/B) I/Os. In the COLA [5], each element exists in exactly one array and may participate in O(log N) merges, which immediately gives the desired amortized bound. In the scheme described here, elements may exist in many arrays and may participate in many merges at the same level (e.g. when an array at level l is version split and some subarrays remain at level l after the version split). Version split lemma (excerpt): there is a version split of (A, V), say (Ai, Vi) for i = 1 ... n, such that each array satisfies (L-dense) and (L-size) for level l, and there is at most one index i for which lead(Ai) < |Ai|/2.
•   Not possible to use the basic amortized method (some elements are in many arrays; some elements are merged many times)
•   Idea: charge the cost of merges/splits to lead elements only
•   (k, v) appears as lead in exactly one array -> always N total lead elements
•   Each lead element receives a charge of $c/B (in the accounting sense) on promotion
•   The total charge for version v is O((log Nv)/B)
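The charging argument in those bullets, written out (a sketch; the constant c and the handling of version splits that keep subarrays at the same level follow the paper, which is only excerpted above):

```latex
% Each pair (k, v) written by an update is the lead element of exactly one
% array.  A lead element is promoted through at most O(log N_v) levels and
% receives a charge of c/B at each promotion, so the charge accumulated on
% behalf of one update is
\[
  O(\log N_v)\cdot\frac{c}{B} \;=\; O\!\left(\frac{\log N_v}{B}\right)
  \quad\text{amortized I/Os per update.}
\]
```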
Does it work?
Insert rate, as a function of dictionary size (plot: inserts per second, log scale from 100 to 1e+06, against keys in millions; Stratified B-tree vs CoW B-tree, roughly 3 orders of magnitude apart).
Range rate, as a function of dictionary size (plot: reads per second, log scale from 10000 to 1e+09, against keys in millions; Stratified B-tree vs CoW B-tree, roughly 1 order of magnitude apart).
bitbucket.org/acunu    www.acunu.com/download
Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.
Speaker notes:
  • On copy-on-write: if you want to do fast updates, the CoW technique cannot help -- CoW is built around the assumption that every update can do a lookup and update reference counts.
  • The crucial notion is density. A versioned array, a version tree and its layout on disk: versions v1, v2, v3 are tagged, so dark entries are lead entries. The entry (k0, v0, x) is written in v0, so it is not a lead entry, but it is live at v1, v2 and v3. Similarly, (k1, v0, x) is live at v1 and v3 (since it was not overwritten at v1) but not at v2. The live counts are: live(v1) = 4, live(v2) = 4, live(v3) = 4, so density = 4/8. In practice, the on-disk layout can be compressed by writing the key once for all the versions, and by other well-known techniques.
  • Example of density amplification. The merged array has density $\frac{2}{11} < \frac{1}{5}$, so it is not dense. We find a split into two parts: the first split $(A_{1},\{v_{0},v_{5}\})$ has size 4 and density $\frac{1}{2}$. The second split $(A_{2},\{v_{4}, v_{1}, v_{2},v_{3}\})$ has size 7 and density $\frac{2}{7}$. Both splits have size $<8$ and density $\ge \frac{1}{5}$, so they can remain at the current level. We start at the root version and greedily search for a version $v$ and some subset of its children whose split arrays can be merged into one dense array at level $l$. More precisely, letting $\mathcal{U}=\bigcup_{i} \mathcal{W'}[v_{i}]$, we search for a subset of $v$'s children $\{v_{i}\}$ such that $|\mathrm{split}(\mathcal{A'},\mathcal{U})| < 2^{l+1}$. If no such set exists at $v$, we recurse into the child $v_{i}$ maximizing $|\mathrm{split}(\mathcal{A'}, \mathcal{W'}[v_{i}])|$. It is possible to show that this always finds a dense split. Once such a set $\mathcal{U}$ is identified, the corresponding array is written out, and we recurse on the remainder $\mathrm{split}(\mathcal{A'}, \mathcal{W'} \setminus \mathcal{U})$.
  • The plot shows range query performance (elements/s extracted using range queries of size 1000). The CoW B-tree is limited by random IO here ((100/s * 32KB) / (200 bytes/key) = 16384 keys/s), but the Stratified B-tree is CPU-bound (the OCaml implementation is single-threaded). Preliminary performance results from a highly concurrent in-kernel implementation suggest that well over 500k updates/s are possible with 16 cores.