Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Like this presentation? Why not share!

265 views

208 views

208 views

Published on

No Downloads

Total views

265

On SlideShare

0

From Embeds

0

Number of Embeds

21

Shares

0

Downloads

1

Comments

0

Likes

1

No embeds

No notes for slide

- 1. Compressed Membership in Automata with Compressed Labels Markus Lohrey Christian Mathissen University of Leipzig June 16, 2011Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 1 / 12
- 2. Motivation Try to develop algorithms that directly work on compressed data.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
- 3. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
- 4. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed stringsLohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
- 5. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed strings Applications: ◮ all domains, where massive string data arise and have to be processed, e.g. bioinformaticsLohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
- 6. Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed strings Applications: ◮ all domains, where massive string data arise and have to be processed, e.g. bioinformatics ◮ large (and highly compressible) strings often occur as intermediate data structures (e.g. in computational group theory, program analysis, veriﬁcation).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 2 / 12
- 7. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
- 8. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
- 9. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i .Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
- 10. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
- 11. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
- 12. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
- 13. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n. One may have |val(A)| = 2n .Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
- 14. Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n. One may have |val(A)| = 2n . Thus, an SLP A can be seen as a compressed representation of val(A).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 3 / 12
- 15. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
- 16. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab:Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
- 17. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaabLohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
- 18. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter):Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
- 19. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter): ◮ From an SLP A one can compute in polynomial time LZ77(val(A)).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
- 20. Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter): ◮ From an SLP A one can compute in polynomial time LZ77(val(A)). ◮ From LZ77(w ) one can compute in polynomial time an SLP A with val(A) = w .Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 4 / 12
- 21. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings:Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12
- 22. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)?Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12
- 23. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)? Note: The decompress-and-compare strategy does not work here. We cannot compute val(A) and val(B) in polynomial time.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12
- 24. Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)? Note: The decompress-and-compare strategy does not work here. We cannot compute val(A) and val(B) in polynomial time. Plandowski’s algorithm uses combinatorics on words, in particular the Periodicity-Lemma of Fine and Wilf.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 5 / 12
- 25. Improvements of Plandowski’s result Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara, Takeda (mid 90’s) The following problem can be solved in polynomial time (fully compressed pattern matching): INPUT: SLPs P, T QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ?Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 6 / 12
- 26. Improvements of Plandowski’s result Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara, Takeda (mid 90’s) The following problem can be solved in polynomial time (fully compressed pattern matching): INPUT: SLPs P, T QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ? The best known algorithm has a running time of O(|P| · |T|2 ) (Lifshits 2006).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 6 / 12
- 27. Generalization: Compressed automata A compressed automata is an ordinary ﬁnite automaton, where every transition is labelled with an SLP.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 7 / 12
- 28. Generalization: Compressed automata A compressed automata is an ordinary ﬁnite automaton, where every transition is labelled with an SLP. Example: The compressed automaton Σ Σ q A q accepts all words that have val(A) as a factor.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 7 / 12
- 29. Generalization: Compressed automata A compressed automata is an ordinary ﬁnite automaton, where every transition is labelled with an SLP. Example: The compressed automaton Σ Σ q A q accepts all words that have val(A) as a factor. The size |A| of the compressed automaton A (with the set ∆ of transition triples) is |A| = |A|. (p,A,q)∈∆Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 7 / 12
- 30. Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)?Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 8 / 12
- 31. Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)? Plandowski, Rytter 1999 ◮ Compressed membership for compressed automata belongs to PSPACE. ◮ Compressed membership for compressed automata over a unary alphabet is NP-complete.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 8 / 12
- 32. Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)? Plandowski, Rytter 1999 ◮ Compressed membership for compressed automata belongs to PSPACE. ◮ Compressed membership for compressed automata over a unary alphabet is NP-complete. Plandowski and Rytter conjectured that compressed membership for compressed automata is NP-complete (for every alphabet size).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 8 / 12
- 33. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
- 34. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
- 35. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
- 36. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab. Then period(w ) = 3 and order(w ) = 2.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
- 37. Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab. Then period(w ) = 3 and order(w ) = 2. For a compressed automaton A, let: order(A) = max{order(val(A)) | A occurs as a label in A} period(A) = max{period(val(A)) | A occurs as a label in A}Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 9 / 12
- 38. First main result Theorem 1 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 10 / 12
- 39. First main result Theorem 1 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A). We use the following combinatorial fact: Let u, v ∈ Σ∗ and let p be a position in v . Then there exist at most order(u) many occurrences of u in v that “touch” position p.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 10 / 12
- 40. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12
- 41. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12
- 42. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata). A word w = an has period 1!Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12
- 43. Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata). A word w = an has period 1! The theorem is proven by a reduction to the case of a unary alphabet.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 11 / 12
- 44. Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 12 / 12
- 45. Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete. Conjecture 2 If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there exists an accepting run of A on val(B) (viewed as a word over the set of transition triples of A), which can be generated by an SLP of size poly(|B|, |A|).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 12 / 12
- 46. Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete. Conjecture 2 If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there exists an accepting run of A on val(B) (viewed as a word over the set of transition triples of A), which can be generated by an SLP of size poly(|B|, |A|). Conjecture 2 implies Conjecture 1: ◮ Guess nondeterministically an SLP C of polynomial size. ◮ Check (in deterministic polynomial time), whether val(C) is an accepting run of A on val(B).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 16, 2011 12 / 12

No public clipboards found for this slide

×
### Save the most important slides with Clipping

Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.

Be the first to comment