Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Csr2011 june16 17_00_lohrey

302 views

Published on

Published in: Technology, Education
  • Be the first to comment

Csr2011 june16 17_00_lohrey

  1. 1. Compressed Membership in Automata with Compressed Labels Markus Lohrey Christian Mathissen University of Leipzig June 16, 2011Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 1 / 12
  2. 2. Motivation Try to develop algorithms that directly work on compressed data.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
  3. 3. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
  4. 4. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed stringsLohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
  5. 5. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed strings Applications: ◮ all domains, where massive string data arise and have to be processed, e.g. bioinformaticsLohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
  6. 6. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed strings Applications: ◮ all domains, where massive string data arise and have to be processed, e.g. bioinformatics ◮ large (and highly compressible) strings often occur as intermediate data structures (e.g. in computational group theory, program analysis, verification).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
  7. 7. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
  8. 8. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
  9. 9. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i .Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
  10. 10. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
  11. 11. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
  12. 12. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
  13. 13. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n. One may have |val(A)| = 2n .Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
  14. 14. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of definitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n. One may have |val(A)| = 2n . Thus, an SLP A can be seen as a compressed representation of val(A).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
  15. 15. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
  16. 16. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab:Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
  17. 17. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaabLohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
  18. 18. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter):Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
  19. 19. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter): ◮ From an SLP A one can compute in polynomial time LZ77(val(A)).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
  20. 20. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter): ◮ From an SLP A one can compute in polynomial time LZ77(val(A)). ◮ From LZ77(w ) one can compute in polynomial time an SLP A with val(A) = w .Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
  21. 21. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings:Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12
  22. 22. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)?Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12
  23. 23. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)? Note: The decompress-and-compare strategy does not work here. We cannot compute val(A) and val(B) in polynomial time.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12
  24. 24. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)? Note: The decompress-and-compare strategy does not work here. We cannot compute val(A) and val(B) in polynomial time. Plandowski’s algorithm uses combinatorics on words, in particular the Periodicity-Lemma of Fine and Wilf.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12
  25. 25. Improvements of Plandowski’s result Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara, Takeda (mid 90’s) The following problem can be solved in polynomial time (fully compressed pattern matching): INPUT: SLPs P, T QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ?Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 6 / 12
  26. 26. Improvements of Plandowski’s result Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara, Takeda (mid 90’s) The following problem can be solved in polynomial time (fully compressed pattern matching): INPUT: SLPs P, T QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ? The best known algorithm has a running time of O(|P| · |T|2 ) (Lifshits 2006).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 6 / 12
  27. 27. Generalization: Compressed automata A compressed automata is an ordinary finite automaton, where every transition is labelled with an SLP.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 7 / 12
  28. 28. Generalization: Compressed automata A compressed automata is an ordinary finite automaton, where every transition is labelled with an SLP. Example: The compressed automaton Σ Σ q A q accepts all words that have val(A) as a factor.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 7 / 12
  29. 29. Generalization: Compressed automata A compressed automata is an ordinary finite automaton, where every transition is labelled with an SLP. Example: The compressed automaton Σ Σ q A q accepts all words that have val(A) as a factor. The size |A| of the compressed automaton A (with the set ∆ of transition triples) is |A| = |A|. (p,A,q)∈∆Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 7 / 12
  30. 30. Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)?Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 8 / 12
  31. 31. Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)? Plandowski, Rytter 1999 ◮ Compressed membership for compressed automata belongs to PSPACE. ◮ Compressed membership for compressed automata over a unary alphabet is NP-complete.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 8 / 12
  32. 32. Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)? Plandowski, Rytter 1999 ◮ Compressed membership for compressed automata belongs to PSPACE. ◮ Compressed membership for compressed automata over a unary alphabet is NP-complete. Plandowski and Rytter conjectured that compressed membership for compressed automata is NP-complete (for every alphabet size).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 8 / 12
  33. 33. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
  34. 34. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
  35. 35. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
  36. 36. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab. Then period(w ) = 3 and order(w ) = 2.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
  37. 37. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab. Then period(w ) = 3 and order(w ) = 2. For a compressed automaton A, let: order(A) = max{order(val(A)) | A occurs as a label in A} period(A) = max{period(val(A)) | A occurs as a label in A}Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
  38. 38. First main result Theorem 1 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 10 / 12
  39. 39. First main result Theorem 1 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A). We use the following combinatorial fact: Let u, v ∈ Σ∗ and let p be a position in v . Then there exist at most order(u) many occurrences of u in v that “touch” position p.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 10 / 12
  40. 40. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12
  41. 41. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12
  42. 42. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata). A word w = an has period 1!Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12
  43. 43. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata). A word w = an has period 1! The theorem is proven by a reduction to the case of a unary alphabet.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12
  44. 44. Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 12 / 12
  45. 45. Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete. Conjecture 2 If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there exists an accepting run of A on val(B) (viewed as a word over the set of transition triples of A), which can be generated by an SLP of size poly(|B|, |A|).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 12 / 12
  46. 46. Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete. Conjecture 2 If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there exists an accepting run of A on val(B) (viewed as a word over the set of transition triples of A), which can be generated by an SLP of size poly(|B|, |A|). Conjecture 2 implies Conjecture 1: ◮ Guess nondeterministically an SLP C of polynomial size. ◮ Check (in deterministic polynomial time), whether val(C) is an accepting run of A on val(B).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 12 / 12

×