
# Purely Functional Data Structures for On-Line LCA

This talk improves the known asymptotic complexity of online lowest common ancestor search from O(h) to O(log h), opening the door to new uses in distributed computing and version control.


1. Purely Functional Data Structures for On-line LCA (Edward Kmett)
2. Overview
    - The Lowest Common Ancestor (LCA) Problem
    - Tarjan’s Off-line LCA
    - Off-line Tree-Like LCA
    - Off-line Range-Min LCA
    - Naïve On-line LCA
    - Data Structures from Number Systems
    - Skew-Binary Random Access Lists
    - Skew-Binary On-line LCA
3. The Lowest Common Ancestor Problem

    Given a tree and two nodes in the tree, find the lowest entry in the tree that is an ancestor to both.

    [Figure: an example tree with nodes labelled A through J.]
4. The Lowest Common Ancestor Problem

    Applications:
    - Computing Dominators in Flow Graphs
    - Three-Way Merge Algorithms in Revision Control
    - Common Word Roots/Suffixes
    - Range-Min Query (RMQ) problems
    - Computing Distance in a Tree
    - …
5. The Lowest Common Ancestor Problem

    First formalized by Aho, Hopcroft, and Ullman in 1973. They provided ephemeral on-line and off-line versions of the problem in terms of two operations, with their off-line version of the algorithm requiring O(n log*(n)) steps and their on-line version requiring O(n log n) steps.

    Research has largely focused on the off-line versions of this problem, where you are given the entire tree a priori.
6. cons, link, or grow?

    The original formulation of LCA was in terms of two operations: link x y, which grafts an unattached tree x on as a child of y, and lca x y, which computes the lowest common ancestor of x and y.

    Alternatively, we can work with lca x y and cons a y, which returns a new extended version of the path y grown downward with the globally unique node ID a.

    We can also replace cons a y with a monadic grow y, which tracks the variable supply internally. Using a concurrent variable supply, like the one provided by the concurrent-supply package, lets you grow the tree in parallel.
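
The difference between cons and grow can be sketched on plain lists. This is only an illustration under assumptions: paths are ordinary lists of Int IDs, and Supply, newSupply and freshId are hypothetical stand-ins built on an IORef, not the concurrent-supply API. Unlike the monadic grow y of the slide, this version takes the supply as an explicit argument.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Hypothetical ID supply (an assumption, not the concurrent-supply package).
newtype Supply = Supply (IORef Int)

newSupply :: IO Supply
newSupply = Supply <$> newIORef 0

-- Hand out the next unused ID and advance the counter atomically.
freshId :: Supply -> IO Int
freshId (Supply ref) = atomicModifyIORef' ref (\n -> (n + 1, n))

-- Pure interface: the caller must supply a globally unique ID.
cons :: Int -> [Int] -> [Int]
cons = (:)

-- Effectful interface: the ID is drawn from the supply.
grow :: Supply -> [Int] -> IO [Int]
grow supply path = do
  i <- freshId supply
  pure (cons i path)

main :: IO ()
main = do
  s     <- newSupply
  root  <- grow s []
  left  <- grow s root
  right <- grow s root
  print (root, left, right)  -- ([0],[1,0],[2,0])
```

Because grow never asks the caller for an ID, two branches grown from the same parent cannot collide.
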
7. Tarjan’s Off-line LCA

    In 1979, Robert Tarjan found a way to compute a predetermined set of distinct LCA queries at the same time, given the complete tree, by creatively using disjoint-set forests in O(nα(n)). (This is a stronger condition than the usual off-line problem statement.)

        function TarjanOLCA(u)
            MakeSet(u);
            u.ancestor := u;
            for each v in u.children do
                TarjanOLCA(v);
                Union(u,v);
                Find(u).ancestor := u;
            u.colour := black;
            for each v such that {u,v} in P do
                if v.colour == black
                    print "The LCA of " + u + " and " + v + " is " + Find(v).ancestor;
8. Tarjan’s Off-line LCA

    In 1983, Harold Gabow and Robert Tarjan improved the asymptotics of the preceding algorithm to O(n) by noting special-case opportunities not available in general-purpose disjoint-set forest problems.
9. Tree-Like Off-line LCA

    In 1984, Dov Harel and Robert Tarjan provided the first asymptotically optimal off-line solution, which converts the tree in O(n) into a structure that can be queried in O(1).

    In 1988, Baruch Schieber and Uzi Vishkin simplified that structure by building arbitrary-fanout trees out of paths and binary trees, and providing fast indexing into each case.
10. Range-Min Off-line LCA

    In 1993, Omer Berkman and Uzi Vishkin found another conversion with the same O(n) preprocessing, using an Euler tour to convert the tree structure into a Range-Min structure that can be queried in O(1) time.

    This was improved in 2000 by Michael Bender and Martin Farach-Colton.

    Alstrup, Gavoille, Kaplan and Rauhe focused on distributing this algorithm.

    Fischer and Heun reduced the memory requirements, but also showed that logarithmically slower RMQ algorithms are often faster at the common problem sizes of today!
11. Backup Plans
12. Naïve On-line LCA

    Build paths as lists of node IDs, using cons as you go.

        x = [5,4,3,2,1] :# 5
        y = [6,3,2,1] :# 4

    To compute lca x y, first cut both lists to have the same length.

        x' = [4,3,2,1], y' = [6,3,2,1], len = 4

    Then keep dropping elements from both until the IDs match.

        lca x y = [3,2,1] :# 3
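
The two steps above can be sketched on plain Haskell lists. This is a minimal illustration, assuming Int node IDs and paths written from a node up to the root; the names NaivePath, keepLast and naiveLca are made up for this sketch, and the length is recomputed rather than cached as the :# annotation suggests.

```haskell
-- Assumption: a path lists node IDs from the node itself up to the root.
type NaivePath = [Int]

-- O(h): keep only the k elements closest to the root.
keepLast :: Int -> NaivePath -> NaivePath
keepLast k xs = drop (length xs - k) xs

-- O(h): trim both paths to a common length, then drop heads in lock-step
-- until the IDs agree. What remains is the path of the common ancestor.
naiveLca :: NaivePath -> NaivePath -> NaivePath
naiveLca xs ys = go (keepLast n xs) (keepLast n ys)
  where
    n = min (length xs) (length ys)
    go l@(a : ar) (b : br)
      | a == b    = l
      | otherwise = go ar br
    go _ _ = []

main :: IO ()
main = print (naiveLca [5, 4, 3, 2, 1] [6, 3, 2, 1])  -- [3,2,1]
```
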
13. Naïve On-line LCA
    - No preprocessing step.
    - O(h) LCA query time, where h is the length of the path.
    - O(1) to extend a path.
    - No need to store the entire tree, just the paths you are currently using. This helps with distribution and parallelization.
    - As an on-line algorithm, the tree can grow without requiring costly recalculations.
14. Naïve On-line LCA

    To go faster we’d need to extract a common suffix in sublinear time.

    Very Well…
15. Data Structures from Number Systems

    We are already familiar with at least one data structure derived from a number system.

        data Nat = Zero | Succ Nat
        data List a = Nil | Cons a (List a)

    O(1) succ grants us O(1) cons.
16. Binary Random-Access Lists

    We could construct a data structure from binary numbers as well, where you have a linked list of “flags” with 2^n elements in them.

    However, adding 1 to a binary number can affect all log n digits in the number, yielding O(log n) cons.
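
The carry cascade can be made visible with a small counter. This is an illustrative sketch only: Bit and incBinary are names invented here, and the number is stored least significant digit first.

```haskell
-- Binary digits, least significant first (an assumption of this sketch).
data Bit = O | I deriving (Eq, Show)

-- Increment, also reporting how many digits had to be touched.
incBinary :: [Bit] -> ([Bit], Int)
incBinary []       = ([I], 1)
incBinary (O : ds) = (I : ds, 1)
incBinary (I : ds) =
  let (ds', k) = incBinary ds  -- the carry ripples into the next digit
  in  (O : ds', k + 1)

main :: IO ()
main = do
  print (incBinary [O, I, I])     -- 6 + 1: ([I,I,I],1)
  print (incBinary [I, I, I, O])  -- 7 + 1: ([O,O,O,I],4)
```

Each touched digit corresponds to a tree that a binary random-access list would have to relink during cons.
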
17. Skew-Binary Numbers

    The nth digit has value 2^(n+1)-1, and each digit has a value of 0, 1, or 2.

    We only allow a single 2 in the number, which must be the first non-zero digit.

    Every natural number can be uniquely represented by this scheme.

    succ is an O(1) operation.

    There are 2^(n+1)-1 nodes in a complete tree of height n.

    Digit weights (top row) and the skew-binary representations of 0 through 15:

        15  7  3  1
                  0
                  1
                  2
               1  0
               1  1
               1  2
               2  0
            1  0  0
            1  0  1
            1  0  2
            1  1  0
            1  1  1
            1  1  2
            1  2  0
            2  0  0
         1  0  0  0
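
To see why succ is O(1), here is a sketch that assumes a sparse representation: the number is stored as the weights of its non-zero digits, smallest first, with a digit 2 appearing as the same weight twice. The names Skew, inc, value and digits are chosen for this illustration.

```haskell
-- Sparse skew-binary number: weights of non-zero digits, smallest first.
-- A digit 2 shows up as a repeated weight, and only at the front.
type Skew = [Int]

-- O(1): looks at no more than the first two weights.
inc :: Skew -> Skew
inc (w1 : w2 : ws) | w1 == w2 = (1 + w1 + w2) : ws
inc ws = 1 : ws

-- The natural number denoted by the representation.
value :: Skew -> Int
value = sum

-- Dense digit string, most significant digit first.
digits :: Skew -> String
digits [] = "0"
digits ws = concatMap (show . count) (reverse (takeWhile (<= maximum ws) weights))
  where
    weights = [2 ^ i - 1 | i <- [1 :: Int ..]]  -- 1, 3, 7, 15, ...
    count w = length (filter (== w) ws)

main :: IO ()
main = mapM_ (putStrLn . digits) (take 16 (iterate inc []))
```

Running main prints the sixteen representations from 0 to 1000, one per line, in the same order as the slide's table.
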
18. Skew-Binary Random Access Lists

    We store a linked list of complete trees, where we are allowed to have two trees of the same size at the front of the list, but after that all trees are of strictly increasing height.

        import Prelude hiding (length)

        data Tree a = Tip a | Bin a (Tree a) (Tree a)

        -- Cons caches the total length, then the size of the first tree.
        data Path a = Nil | Cons !Int !Int (Tree a) (Path a)

        length :: Path a -> Int
        length Nil = 0
        length (Cons n _ _ _) = n

    I call these random-access lists a Path here, because of our use case.
19. Skew-Binary On-line LCA

    Naïve On-line LCA:
    - Build paths as lists of node IDs, using cons as you go.
    - To compute lca x y, first cut both lists to have the same length.
    - Then keep dropping elements until the IDs match.

    [Animation, slides 19 to 28: the same text is repeated while a path grows from empty to the eight elements 8, 7, …, 1, one cons at a time. At six elements it consists of two three-node trees (6 over 5 and 4, then 3 over 2 and 1); the seventh cons merges them into a single seven-node tree rooted at 7, and the eighth adds a single node 8 in front of it.]
29. Skew-Binary On-line LCA

        -- O(1)
        cons :: a -> Path a -> Path a
        cons a (Cons n w t (Cons _ w' t2 ts))
          | w == w' = Cons (n + 1) (2 * w + 1) (Bin a t t2) ts  -- merge two equal-sized trees
        cons a ts = Cons (length ts + 1) 1 (Tip a) ts           -- otherwise add a single node
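
To watch cons reshape the spine, the definitions from the slides can be put in one file with a few inspection helpers. This is a sketch: fromList appears by name on a later slide, but its definition here, along with toList and weights, is an assumption made for illustration.

```haskell
import Prelude hiding (length)

data Tree a = Tip a | Bin a (Tree a) (Tree a)
data Path a = Nil | Cons !Int !Int (Tree a) (Path a)

length :: Path a -> Int
length Nil = 0
length (Cons n _ _ _) = n

cons :: a -> Path a -> Path a
cons a (Cons n w t (Cons _ w' t2 ts))
  | w == w' = Cons (n + 1) (2 * w + 1) (Bin a t t2) ts
cons a ts = Cons (length ts + 1) 1 (Tip a) ts

-- Assumed helper: build a path whose head is the first list element.
fromList :: [a] -> Path a
fromList = foldr cons Nil

-- Assumed helper: read the elements back, newest first.
toList :: Path a -> [a]
toList Nil = []
toList (Cons _ _ t ts) = flatten t ++ toList ts
  where
    flatten (Tip a)     = [a]
    flatten (Bin a l r) = a : flatten l ++ flatten r

-- Assumed helper: sizes of the trees along the spine.
weights :: Path a -> [Int]
weights Nil = []
weights (Cons _ w _ ts) = w : weights ts

main :: IO ()
main = mapM_ (\n -> print (weights (fromList [n, n - 1 .. 1 :: Int]))) [1 .. 8]
-- prints [1], [1,1], [3], [1,3], [1,1,3], [3,3], [7], [1,7], one per line
```

The printed weights are the non-zero skew-binary digits of 1 through 8, which is the correspondence between the number system and the list shape.
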
30. Skew-Binary On-line LCA

        lca :: Eq a => Path a -> Path a -> Path a
        lca xs ys = case compare nxs nys of
            LT -> lca' xs (keep nxs ys)  -- ys is longer: trim it
            EQ -> lca' xs ys
            GT -> lca' (keep nys xs) ys  -- xs is longer: trim it
          where
            nxs = length xs
            nys = length ys
31. Skew-Binary Keep

    O(log (h - k)) to keep the top k elements of a path of height h.

        keep 2 (fromList [6,5,4,3,2,1]) = keep 2 (fromList [3,2,1])

    [Animation, slides 31 to 33: the six-element path drawn as two three-node trees, 6 over 5 and 4, then 3 over 2 and 1; the first tree is discarded whole.]
34. Skew-Binary Keep

    O(log (h - k)) to keep the top k elements of a path of height h.

        keep :: Int -> Path a -> Path a
        keep _ Nil = Nil
        keep k xs@(Cons n w t ts)
          | k >= n    = xs
          | otherwise = case compare k (n - w) of
              GT -> keepT (k - n + w) w t ts  -- the cut falls inside the first tree
              EQ -> ts
              LT -> keep k ts

        consT :: Int -> Tree a -> Path a -> Path a
        consT w t ts = Cons (w + length ts) w t ts

        keepT :: Int -> Int -> Tree a -> Path a -> Path a
        keepT n w (Bin _ l r) ts = case compare n w2 of
            LT              -> keepT n w2 r ts
            EQ              -> consT w2 r ts
            GT | n == w - 1 -> consT w2 l (consT w2 r ts)
               | otherwise  -> keepT (n - w2) w2 l (consT w2 r ts)
          where w2 = div w 2
        keepT _ _ _ ts = ts
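
A quick way to gain confidence in keep is to run it next to a list model. The sketch below assembles the slide's definitions into one file; fromList and toList are assumed helpers (their definitions are not shown in the talk), and the model is simply dropping elements from the front of a list.

```haskell
import Prelude hiding (length)

data Tree a = Tip a | Bin a (Tree a) (Tree a)
data Path a = Nil | Cons !Int !Int (Tree a) (Path a)

length :: Path a -> Int
length Nil = 0
length (Cons n _ _ _) = n

cons :: a -> Path a -> Path a
cons a (Cons n w t (Cons _ w' t2 ts))
  | w == w' = Cons (n + 1) (2 * w + 1) (Bin a t t2) ts
cons a ts = Cons (length ts + 1) 1 (Tip a) ts

consT :: Int -> Tree a -> Path a -> Path a
consT w t ts = Cons (w + length ts) w t ts

keep :: Int -> Path a -> Path a
keep _ Nil = Nil
keep k xs@(Cons n w t ts)
  | k >= n    = xs
  | otherwise = case compare k (n - w) of
      GT -> keepT (k - n + w) w t ts
      EQ -> ts
      LT -> keep k ts

keepT :: Int -> Int -> Tree a -> Path a -> Path a
keepT n w (Bin _ l r) ts = case compare n w2 of
    LT              -> keepT n w2 r ts
    EQ              -> consT w2 r ts
    GT | n == w - 1 -> consT w2 l (consT w2 r ts)
       | otherwise  -> keepT (n - w2) w2 l (consT w2 r ts)
  where w2 = div w 2
keepT _ _ _ ts = ts

-- Assumed helpers for building and inspecting paths.
fromList :: [a] -> Path a
fromList = foldr cons Nil

toList :: Path a -> [a]
toList Nil = []
toList (Cons _ _ t ts) = flatten t ++ toList ts
  where
    flatten (Tip a)     = [a]
    flatten (Bin a l r) = a : flatten l ++ flatten r

main :: IO ()
main = do
  let p = fromList [6, 5, 4, 3, 2, 1 :: Int]
  print (toList (keep 2 p))  -- [2,1]
  print (toList (keep 4 p))  -- [4,3,2,1]
```

On a list of length n the model is drop (n - k), so keep k retains the k elements nearest the root.
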
36. Comparing Node IDs

    We can check to see if two paths have the same head or are both empty in O(1).

        infix 4 ~=
        (~=) :: Eq a => Path a -> Path a -> Bool
        Nil ~= Nil = True
        Cons _ _ s _ ~= Cons _ _ t _ = sameT s t
        _ ~= _ = False

        sameT :: Eq a => Tree a -> Tree a -> Bool
        sameT xs ys = root xs == root ys

        root :: Tree a -> a
        root (Tip a) = a
        root (Bin a _ _) = a
37. Monotonicity

    We can modify the algorithm for keep into an algorithm that takes any monotone predicate that only transitions from False to True once during the walk up the path, and yields a result in O(log h).

    We have exactly one shape for a given number of elements, so we can walk the spines of the two random-access lists at the same time in lock-step. This lets us modify this algorithm to work with a pair of paths, because the shapes agree.

    (~=) is monotone, given globally unique IDs.
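
The monotonicity claim can be checked on a list model. This sketch assumes paths are plain lists of IDs, newest first and already trimmed to the same length; agreement and monotone are names invented for the illustration.

```haskell
-- Compare two equal-length paths position by position, newest node first.
agreement :: Eq a => [a] -> [a] -> [Bool]
agreement = zipWith (==)

-- Monotone here means: once the answer is True, it stays True.
monotone :: [Bool] -> Bool
monotone = and . dropWhile not

main :: IO ()
main = do
  let x' = [4, 3, 2, 1 :: Int]
      y' = [6, 3, 2, 1]
  print (agreement x' y')             -- [False,True,True,True]
  print (monotone (agreement x' y'))  -- True
  -- If IDs are reused (7 appears under two different parents),
  -- the single False-to-True transition is no longer guaranteed.
  print (monotone (agreement [7, 3, 9, 1 :: Int] [7, 4, 8, 1]))  -- False
```

With unique IDs, equal nodes at one depth imply equal ancestors all the way up, which is what lets a logarithmic search find the single transition point.
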
38. Finding the Match

    lca' requires the invariant that both paths have the same length. This is provided by the fact that lca, shown earlier, trims the lists first.

        lca' :: Eq a => Path a -> Path a -> Path a
        lca' h@(Cons _ w x xs) (Cons _ _ y ys)
          | sameT x y = h
          | xs ~= ys  = lcaT w x y xs   -- the match lies inside this tree
          | otherwise = lca' xs ys
        lca' _ _ = Nil

        lcaT :: Eq a => Int -> Tree a -> Tree a -> Path a -> Path a
        lcaT w (Bin _ la ra) (Bin _ lb rb) ts
          | sameT la lb = consT w2 la (consT w2 ra ts)
          | sameT ra rb = lcaT w2 la lb (consT w2 ra ts)  -- ra has size w2
          | otherwise   = lcaT w2 ra rb ts
          where w2 = div w 2
        lcaT _ _ _ ts = ts
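
The search can be exercised end to end on equal-length paths. The sketch below collects the definitions it needs into one file, under two assumptions: toList is an inspection helper not shown in the talk, and the right subtree is re-attached with weight w2 (its actual size) so that the cached lengths stay consistent.

```haskell
import Prelude hiding (length)

data Tree a = Tip a | Bin a (Tree a) (Tree a)
data Path a = Nil | Cons !Int !Int (Tree a) (Path a)

length :: Path a -> Int
length Nil = 0
length (Cons n _ _ _) = n

cons :: a -> Path a -> Path a
cons a (Cons n w t (Cons _ w' t2 ts))
  | w == w' = Cons (n + 1) (2 * w + 1) (Bin a t t2) ts
cons a ts = Cons (length ts + 1) 1 (Tip a) ts

consT :: Int -> Tree a -> Path a -> Path a
consT w t ts = Cons (w + length ts) w t ts

root :: Tree a -> a
root (Tip a) = a
root (Bin a _ _) = a

sameT :: Eq a => Tree a -> Tree a -> Bool
sameT s t = root s == root t

infix 4 ~=
(~=) :: Eq a => Path a -> Path a -> Bool
Nil ~= Nil = True
Cons _ _ s _ ~= Cons _ _ t _ = sameT s t
_ ~= _ = False

-- Precondition: both paths have the same length.
lca' :: Eq a => Path a -> Path a -> Path a
lca' h@(Cons _ w x xs) (Cons _ _ y ys)
  | sameT x y = h
  | xs ~= ys  = lcaT w x y xs
  | otherwise = lca' xs ys
lca' _ _ = Nil

lcaT :: Eq a => Int -> Tree a -> Tree a -> Path a -> Path a
lcaT w (Bin _ la ra) (Bin _ lb rb) ts
  | sameT la lb = consT w2 la (consT w2 ra ts)
  | sameT ra rb = lcaT w2 la lb (consT w2 ra ts)  -- assumption: weight w2
  | otherwise   = lcaT w2 ra rb ts
  where w2 = div w 2
lcaT _ _ _ ts = ts

-- Assumed helpers for building and inspecting paths.
fromList :: [a] -> Path a
fromList = foldr cons Nil

toList :: Path a -> [a]
toList Nil = []
toList (Cons _ _ t ts) = flatten t ++ toList ts
  where
    flatten (Tip a)     = [a]
    flatten (Bin a l r) = a : flatten l ++ flatten r

main :: IO ()
main = do
  let shared = fromList [3, 2, 1 :: Int]
      x'     = cons 4 shared  -- [4,3,2,1]
      y'     = cons 6 shared  -- [6,3,2,1]
  print (toList (lca' x' y'))  -- [3,2,1]
```

The example uses the already trimmed paths x' and y' from the naïve walkthrough, so the equal-length invariant holds without calling keep.
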
39. Skew-Binary On-line LCA

    Naïve On-line LCA:
    - Build paths as lists of node IDs, using cons as you go. O(1)
    - To compute lca x y, first cut both lists to have the same length. O(h)
    - Then keep dropping elements until the IDs match. O(h)

    Skew-Binary On-line LCA:
    - Build paths as lists of node IDs, using cons as you go. O(1)
    - To compute lca x y, first cut both lists to have the same length. O(log h)
    - Then keep dropping elements until the IDs match. O(log h)
40. Skew-Binary On-line LCA
    - No preprocessing step.
    - O(log h) LCA query time, where h is the length of the path.
    - O(1) to extend a path.
    - No need to store the entire tree, just the paths you are currently using. This helps with distribution and parallelization when working on large trees.
    - As an on-line algorithm, the tree can grow without requiring costly recalculations.
    - Preserves all of the benefits of the naïve algorithm, while drastically reducing the costs.
41. Now What?

    We found that skew-binary random access lists can be used to accelerate the naïve on-line LCA algorithm while retaining its desirable properties.

    You can install a working version of this algorithm from Hackage:

        cabal install lca

    Next time I’ll talk about the applications of this algorithm to a “revision control” monad, which can be used for parallel and incremental computation in Haskell.

    I am working with Daniel Peebles on a proof of correctness and asymptotic performance in Agda.
42. Any Questions?