I/O Efficient Algorithms and Data Structures 3-4 (Lecture by Lars Arge)

1,429 views

Published on

In many modern applications that deal with massive data sets, communication between internal and external memory, and not actual computation time, is the bottleneck in the computation. This is due to the huge difference in access time of fast internal memory and slower external memory such as disks. In order to amortize this time over a large amount of data, disks typically read or write large blocks of contiguous data at once. This means that it is important to design algorithms with a high degree of locality in their disk access pattern, that is, algorithms where data accessed close in time is also stored close on disk. Such algorithms take advantage of block transfers by amortizing the large access time over a large number of accesses. In the area of I/O-efficient algorithms the main goal is to develop algorithms that minimize the number of block transfers (I/Os) used to solve a given problem. In these four lectures, we will cover fundamental I/O-efficient algorithms and data structures, such as sorting algorithms, search trees and priority queues. We will also discuss geometric problems, as well as geometric data structures for various range searching problems.

Published in: Technology, Education
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,429
On SlideShare
0
From Embeds
0
Number of Embeds
297
Actions
Shares
0
Downloads
30
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

I/O Efficient Algorithms and Data Structures 3-4 (Lecture by Lars Arge)

  1. 1. I/O-efficient Algorithms and Data Structures Lars Arge Professor and center director Aarhus University ALMADA summer school August 5, 2013
  2. 2. Lars Arge 2 I/O-Model • Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory T = # output size in searching problem • We often assume that M>B2 • I/O: Movement of block between memory and disk D P M Block I/O I/O-efficient Algorithms and Data Structures
  3. 3. Lars Arge 3 Fundamental Bounds Internal External • Scanning: N • Sorting: N log N • Permuting • Searching: • Note: – Linear I/O: O(N/B) – Permuting not linear – Permuting and sorting bounds are equal in all practical cases – B factor VERY important: – Cannot sort optimally with search tree NBlog B N B N B Mlog B N NB N B N B N B Mlog }log,min{ B N B N B MNN N2log I/O-efficient Algorithms and Data Structures
  4. 4. Lars Arge 4 External Sort • Merge sort: – Create N/M memory sized sorted runs – Merge runs together M/B at a time phases using I/Os each • Distribution sort similar (but harder – split elements) )( B NO)(log M N B MO I/O-efficient Algorithms and Data Structures
  5. 5. Lower bounds • Permuting N elements according to a given permutation takes I/Os in “indivisibility” model • Sorting N elements takes I/Os in comparison model – Proved using counting arguments Lars Arge 5 I/O-efficient Algorithms and Data Structures })log,(min{ B N B N B MN )log( B N B N B M !)!())(( NBN B N B M t
  6. 6. Lars Arge 6 Cache-Oblivious Model • N, B, and M as in I/O-model – Assumed that M>B2 (tall cache assumption) • M and B not used in algorithm description • Block transfers by optimal paging strategy Analyze in two-level model Efficient on one level, efficient on all levels (of fully associative hierarchy using LRU) I/O-efficient Algorithms and Data Structures
  7. 7. Lars Arge 7 Cache-Oblivious Distribution Sort • Break into subarrays of size – Recursively sort each subarray • Distribute into buckets of size ~ – Recursively sort each bucket • Distribution performed in I/Os = N I/O-efficient Algorithms and Data Structures )log( B N B N B MO N N N MNO MNONTN NT B N B N if)( if)()(2 )( )( B N O
  8. 8. Lars Arge 8 Sorting summary • External merge or distribution sort takes I/Os – Merge-sort based on M/B-way merging – Distribution sort based on -way distribution and (complicated) split elements finding – Also cache-oblivious using -way distribution • Optimal in comparison model • lower bound in stronger indivisibility model – Holds even for permuting )log( B N B N B MO })log,(min{ B N B N B MN B M I/O-efficient Algorithms and Data Structures N
  9. 9. Distribution sweeping • Combination of distribution and plane sweep – Used to solve batched geometric problems • General idea: – Divide plane into M/B slabs with O(N/(M/B)) endpoints each – Sweep plane top-down while finding solutions involving objects in different slabs – Distribute data to M/B slabs recourse in each slab • Examples: – Orthogonal line segment intersection – Rectangle intersection – Homework Lars Arge 9 I/O-efficient Algorithms and Data Structures
  10. 10. Outline • Thursday: – Sorting upper bound (merge- and distribution-sort) – Sorting (permuting) lower bound – Cache-oblivious sorting – Batched geometric problems: Distribution sweeping • Today: – Searching (B-trees) – Batched searching (Buffer-trees) – I/O-efficient priority queues – I/O-efficient list ranking Lars Arge 10 I/O-efficient Algorithms and Data Structures
  11. 11. Lars Arge 11 – If nodes stored arbitrarily on disk Search in I/Os Rangesearch in I/Os • Binary search tree: – Standard method for search among N elements – We assume elements in leaves – Search traces at least one root-leaf path External Search Trees )(log2 NO )(log2 N )(log2 TNO I/O-efficient Algorithms and Data Structures
  12. 12. Lars Arge 12 External Search Trees • BFS blocking: – Block height – Output elements blocked Rangesearch in I/Os • Optimal: O(N/B) space and query )(log2 B )(B )(log)(log/)(log 22 NOBONO B )(log B T B N )(log B T B N I/O-efficient Algorithms and Data Structures
  13. 13. Lars Arge 13 Cache-Oblivious Search Tree • “Van Emde Boas” recursive layout: NBlog B1 B A B2 N Nlog2 1 Nlog2 1 Recursive memory layout: A B1 B2 BN I/O-efficient Algorithms and Data Structures
  14. 14. Lars Arge 14 Cache-Oblivious Search Tree • Search analysis: – Conceptually stop recursion when recursive subtrees have size and each recursive subtree fits in a block NBlog – Search path “hits” blocks)(log)(log/log NOBN BB B B B B B B )(logB I/O-efficient Algorithms and Data Structures
  15. 15. Lars Arge 15 • Maintaining BFS blocking during updates? – Balance normally maintained in search trees using rotations • Seems very difficult to maintain BFS blocking during rotation – Also need to make sure output (leaves) is blocked! Dynamic External Search Trees? x y x y I/O-efficient Algorithms and Data Structures
  16. 16. Lars Arge 16 B-trees • BFS-blocking naturally corresponds to tree with fan-out • B-trees balanced by allowing node degree to vary – Rebalancing performed by splitting and merging nodes )(B I/O-efficient Algorithms and Data Structures
  17. 17. Lars Arge 17 • (a,b)-tree uses linear space and has height Choosing a,b = each node/leaf stored in one disk block space and query (a,b)-tree • T is an (a,b)-tree (a≥2 and b≥2a-1) – All leaves on the same level and contain between a and b elements – Except for the root, all nodes have degree between a and b – Root has degree between 2 and b )(log NO a )(log B T B N )(B tree I/O-efficient Algorithms and Data Structures
  18. 18. Lars Arge 18 (a,b)-Tree Insert • Insert: Search and insert element in leaf v DO v has b+1 elements/children Split v: make nodes v’ and v’’ with and elements insert element (ref) in parent(v) (make new root if necessary) v=parent(v) • Insert touch nodes bb 2 1 ab 2 1 )(log Na v v’ v’’ 2 1b 2 1b 1b I/O-efficient Algorithms and Data Structures
  19. 19. Lars Arge 19 (2,4)-Tree Insert I/O-efficient Algorithms and Data Structures
  20. 20. 20 (a,b)-Tree Delete • Delete: Search and delete element from leaf v DO v has a-1 elements/children Fuse v with sibling v’: move children of v’ to v delete element (ref) from parent(v) (delete root if necessary) If v has >b (and ≤ a+b-1<2b) children split v v=parent(v) • Delete touch nodes)(log NO a v v 1a 12a I/O-efficient Algorithms and Data Structures Lars Arge
  21. 21. Lars Arge 21 (2,4)-Tree Delete I/O-efficient Algorithms and Data Structures
  22. 22. Lars Arge 22 • (a,b)-tree properties: – If b=2a-1 every update can cause many rebalancing operations – If b≥2a update only cause O(1) rebalancing operations amortized – If b>2a only rebalancing operations amortized * Both somewhat hard to show – If b=4a easy to show that update causes rebalance operations amortized * After split during insert a leaf contains 4a/2=2a elements * After fuse during delete a leaf contains between 2a and 5a elements (split if more than 3a between 3/2a and 5/2a) (a,b)-Tree )()( 11 2 aa OO b )log(1 NO aa insert delete (2,3)-tree I/O-efficient Algorithms and Data Structures
  23. 23. Lars Arge 23 B-tree Summary • B-trees: (a,b)-trees with a,b = – O(N/B) space – O(logB N+T/B) query – O(logB N) update • B-trees with elements in the leaves sometimes called B+-tree • Construction in I/Os – Sort elements and construct leaves – Build tree level-by-level bottom-up )(B )log( B N B N B MO I/O-efficient Algorithms and Data Structures
  24. 24. Lars Arge 24 Generalized B-tree • B-tree with branching parameter b and leaf parameter k (b,k≥8) – All leaves on same level and contain between 1/4k and k elements – Except for the root, all nodes have degree between 1/4b and b – Root has degree between 2 and b • B-tree with leaf parameter – O(N/B) space – Height – amortized leaf rebalance operations – amortized internal node rebalance operations )(log B N bO )(1 kO )log( 1 B N bkb O )(Bk I/O-efficient Algorithms and Data Structures
  25. 25. B-tree Sorting • Normally we can sort N elements in O(N log N) time with search tree – Insert all elements one-by-one (construct tree) – Output in sorted order using in-order traversal • Same algorithm using B-tree use I/Os – A factor of non-optimal • We would like to have dynamic data structure to use in algorithms I/O operations )log( NNO B )( log log B B M BO )log( B N BMB N O )log( 1 B N BMB O I/O-efficient Algorithms and Data Structures 25Lars Arge
  26. 26. 26 • Main idea: Logically group nodes together and add buffers – Insertions done in a “lazy” way – elements inserted in buffers – When a buffer runs full elements are pushed one level down – Buffer-emptying in O(M/B) I/Os every block touched constant number of times on each level inserting N elements (N/B blocks) costs I/Os)log( B N BMB N O Buffer-tree Technique B B M elements fan-out M/B )(log B N BMO I/O-efficient Algorithms and Data Structures Lars Arge
  27. 27. Lars Arge 27 • Definition: – B-tree with branching parameter and leaf parameter B – Size M buffer in each internal node • Updates: – Add time-stamp to insert/delete element – Collect B elements in memory before inserting in root buffer – Perform buffer-emptying when buffer runs full Basic Buffer-tree B M $m$ blocks M B M B M ...4 1 B I/O-efficient Algorithms and Data Structures
  28. 28. Lars Arge 28 Basic Buffer-tree • Note: – Buffer can be larger than M during recursive buffer-emptying * Elements distributed in sorted order at most M elements in buffer unsorted – Rebalancing needed when “leaf-node” buffer emptied * Leaf-node buffer-emptying only performed after all full internal node buffers are emptied $m$ blocks M B M B M ...4 1 B I/O-efficient Algorithms and Data Structures
  29. 29. Lars Arge 29 Basic Buffer-tree • Internal node buffer-empty: – Load first M (unsorted) elements into memory and sort them – Merge elements in memory with rest of (already sorted) elements – Scan through sorted list while * Removing “matching” insert/deletes * Distribute elements to child buffers – Recursively empty full child buffers • Emptying buffer of size X takes O(X/B+M/B)=O(X/B) I/Os $m$ blocks M B M B M ...4 1 I/O-efficient Algorithms and Data Structures
  30. 30. Lars Arge 30 Basic Buffer-tree • Buffer-empty of leaf node with K elements in leaves – Sort buffer as previously – Merge buffer elements with elements in leaves – Remove “matching” insert/deletes obtaining K’ elements – If K’<K then * Add K-K’ “dummy” elements and insert in “dummy” leaves Otherwise * Place K elements in leaves * Repeatedly insert block of elements in leaves and rebalance • Delete dummy leaves and rebalance when all full buffers emptied K I/O-efficient Algorithms and Data Structures
  31. 31. Lars Arge 31 Basic Buffer-tree • Invariant: Buffers of nodes on path from root to emptied leaf-node are empty • Insert rebalancing (splits) performed as in normal B-tree • Delete rebalancing: v’ buffer emptied before fuse of v – Necessary buffer emptyings performed before next dummy- block delete – Invariant maintained v vv’ v v’ v’’ I/O-efficient Algorithms and Data Structures
  32. 32. Lars Arge 32 Basic Buffer-tree • Analysis: – Not counting rebalancing, a buffer-emptying of node with X ≥ M elements (full) takes O(X/B) I/Os total full node emptying cost I/Os – Delete rebalancing buffer-emptying (non-full) takes O(M/B) I/Os cost of one split/fuse O(M/B) I/Os – During N updates * O(N/B) leaf split/fuse * internal node split/fuse Total cost of N operations: I/Os )log( B N B M B M B N O )log( B N B N B MO )log( B N B N B MO I/O-efficient Algorithms and Data Structures
  33. 33. Lars Arge 33 Basic Buffer-tree • Emptying all buffers after N insertions: Perform buffer-emptying on all nodes in BFS-order resulting full-buffer emptyings cost I/Os empty non-full buffers using O(M/B) O(N/B) I/Os • N elements can be sorted using buffer tree in I/Os )log( B N B N B MO )( B M B N O $m$ blocks M B M B M ...4 1 B )log( B N B N B MO I/O-efficient Algorithms and Data Structures
  34. 34. Lars Arge 34 • Batching of operations on B-tree using M-sized buffers – I/O updates amortized – All buffers emptied in I/Os • One-dim. rangesearch operations can also be supported in I/Os amortized – Search elements handle lazily like updates – All elements in relevant sub-trees reported during buffer-emptying – Buffer-emptying in O(X/B+T’/B), where T’ is reported elements Buffer-tree Summary )log( 1 B N BMB O )log( 1 B T B N BMB O $m$ blocks )log( B N B N B MO I/O-efficient Algorithms and Data Structures
  35. 35. Lars Arge 35 Orthogonal Line Segment Intersection • Sweep plane top-down while maintaining search tree T on vertical segments crossing sweep line (by x-coordinates) – Top endpoint of vertical segment: Insert in T – Bottom endpoint of vertical segment: Delete from T – Horizontal segment: Perform range query with x-interval on T I/Os using buffer-tree I/O-efficient Algorithms and Data Structures )log( B T B N B N B MO
  36. 36. Lars Arge 36 • Basic buffer tree can be used in external priority queue • To delete minimal element: – Empty all buffers on leftmost path – Delete elements in leftmost leaf and keep in memory – Deletion of next M minimal elements free – Inserted elements checked against minimal elements in memory • I/Os every O(M) delete amortized Buffered Priority Queue )log( B N BMB M O M4 1 )log( 1 B N BMB O )( B M B I/O-efficient Algorithms and Data Structures
  37. 37. Lars Arge 37 List Ranking • Problem: – Given N-vertex linked list stored in array – Compute rank (number in list) of each vertex • One of the simplest graph problem one can think of • Straightforward O(N) internal algorithm – Also uses O(N) I/Os in external memory • Much harder to get external algorithm 3 4 5 9 68 27 101 5 2 6 73 4 108 9 )log( B N BMB N O I/O-efficient Algorithms and Data Structures
  38. 38. Lars Arge 38 List Ranking • We will solve more general problem: – Given N-vertex linked list with edge-weights stored in array – Compute sum of weights (rank) from start for each vertex • List ranking: All edge weights one • Note: Weight stored in array entry together with edge (next vertex) 1 5 2 6 73 4 108 9 1 1 11 1 1 1 1 1 1 I/O-efficient Algorithms and Data Structures
  39. 39. Lars Arge 39 List Ranking • Algorithm: 1. Find and mark independent set of vertices 2. “Bridge-out” independent set: Add new edges 3. Recursively rank resulting list 4. “Bridge-in” independent set: Compute rank of independent set • Step 1, 2 and 4 in I/Os • Independent set of size αN for 0 < α ≤ 1 I/Os 11 111 1 1 11 1 2 2 2 1 3 4 6 8 9 102 5 7 )log( B N BMB N O )log()log())1(()( B N BMB N B N BMB N OONTNT I/O-efficient Algorithms and Data Structures
  40. 40. Lars Arge 40 List Ranking: Bridge-out/in • Obtain information (edge or rang) of successor – Make copy of original list – Sort original list by successor id – Scan original and copy together to obtain successor information – Sort modified original list by id I/Os 11 2 3 4 5 9 68 27 102 3 4 95 86 7 103 4 5 9 68 27 103 4 8 9 627 10 )log( B N BMB N O I/O-efficient Algorithms and Data Structures
  41. 41. Lars Arge 41 List Ranking: Independent Set • Easy to design randomized algorithm: – Scan list and flip a coin for each vertex – Independent set is vertices with head and successor with tails Independent set of expected size N/4 • Deterministic algorithm: – 3-color vertices (no vertex same color as predecessor/successor) – Independent set is vertices with most popular color Independent set of size at least N/3 • 3-coloring I/O algorithm)log( B N BMB N O )log( B N BMB N O )log( B N BMB N O 3 4 5 9 68 27 10 I/O-efficient Algorithms and Data Structures
  42. 42. Lars Arge 42 List Ranking: 3-coloring • Algorithm: – Consider forward and backward lists (heads/tails in two lists) – Color forward lists (except tail) alternately red and blue – Color backward lists (except tail) alternately green and blue 3-coloring 3 4 5 9 68 27 10 I/O-efficient Algorithms and Data Structures
  43. 43. Lars Arge 43 List Ranking: Forward List Coloring • Identify heads and tails • For each head, insert red element in priority-queue (priority=position) • Repeatedly: – Extract minimal element from queue – Access and color corresponding element in list – Insert opposite color element corresponding to successor in queue • Scan of list • O(N) priority-queue operations I/Os `3 4 5 9 68 27 10 )log( B N BMB N O I/O-efficient Algorithms and Data Structures
  44. 44. Lars Arge 44 List Ranking Summary • Simplest graph problem: Traverse linked list • Very easy O(N) algorithm in internal memory • Much more difficult external memory – Finding independent set via 3-coloring – Bridging vertices in/out • Permuting bound best possible – Also true for other graph problems )log( B N BMB N O })log,(min{ B N B N B MNO 3 4 5 9 68 27 101 5 2 6 73 4 108 9 I/O-efficient Algorithms and Data Structures
  45. 45. Lars Arge 45 List Ranking Summary • External list ranking algorithm similar to PRAM algorithm – Sometimes external algorithms by “PRAM algorithm simulation” • Forward list coloring algorithm example of “time forward processing” – Use external priority-queue to send information “forward in time” to vertices to be processed later • PRAM algorithm ideas and time forward processing used in many graph algorithms 3 4 5 9 68 27 10 I/O-efficient Algorithms and Data Structures
  46. 46. References • External Memory Geometric Data Structures (section 1-5) Lecture notes by L. Arge • I/O-Efficient Graph Algorithms (section 1-2) Lecture notes by N. Zeh • Cache-Oblivious Algorithms. M. Frigo, C. Leiserson, H. Prokop, and S. Ramachandran. Proc. FOCS '99. Lars Arge 46 I/O-efficient Algorithms and Data Structures
  47. 47. Summary • We discussed – Sorting – Hardness of permuting (sorting) – Batched geometric problems – Searching: B-trees, buffer-trees, priority-queue – Graph algorithms: List-ranking – Cache-oblivious algorithms • Important techniques – Multiway merging/distribution – Distribution sweeping – Multiway-trees, buffered data structures – Time-forward processing (and PRAM simulation) Lars Arge 47 I/O-efficient Algorithms and Data Structures
  48. 48. Thanks large@cs.au.dk www.madalgo.au.dk

×