1.
Compressed Membership in Automata with Compressed Labels Markus Lohrey Christian Mathissen University of Leipzig June 13, 2011Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 1 / 12
2.
Motivation Try to develop algorithms that directly work on compressed data.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
3.
Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
4.
Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed stringsLohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
5.
Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed strings Applications: ◮ all domains, where massive string data arise and have to be processed, e.g. bioinformaticsLohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
6.
Motivation Try to develop algorithms that directly work on compressed data. Goal: Beat straightforward decompress and manipulate strategy. In this talk: focus on compressed strings Applications: ◮ all domains, where massive string data arise and have to be processed, e.g. bioinformatics ◮ large (and highly compressible) strings often occur as intermediate data structures (e.g. in computational group theory, program analysis, veriﬁcation).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 2 / 12
7.
Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
8.
Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
9.
Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i .Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
10.
Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
11.
Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
12.
Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
13.
Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n. One may have |val(A)| = 2n .Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
14.
Straight-line programs Dictionary-based compression algorithms (LZ77, LZ78) exploit text repetition. Straight-line programs are a general representation for compressed strings, which covers most dictionary-based algorithms. A straight-line program (SLP) over the alphabet Γ is a sequence of deﬁnitions A = (Ai := αi )1≤i ≤n , where either αi ∈ Γ or αi = Aj Ak for some j, k < i . Alternatively: a context-free grammar that generates exactly one string. We write val(A) for the unique word generated by A. The size of an SLP A = (Ai := αi )1≤i ≤n is |A| = n. One may have |val(A)| = 2n . Thus, an SLP A can be seen as a compressed representation of val(A).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 3 / 12
15.
Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
16.
Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab:Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
17.
Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaabLohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
18.
Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter):Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
19.
Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter): ◮ From an SLP A one can compute in polynomial time LZ77(val(A)).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
20.
Straight-line programs Example: A = (A1 := b, A2 := a, Ai := Ai −1 Ai −2 for 3 ≤ i ≤ 7). Then val(A) = abaababaabaab: A3 = A2 A1 = ab A4 = A3 A2 = aba A5 = A4 A3 = abaab A6 = A5 A4 = abaababa A7 = A6 A5 = abaababaabaab Relationship to dictionary-based compression (Rytter): ◮ From an SLP A one can compute in polynomial time LZ77(val(A)). ◮ From LZ77(w ) one can compute in polynomial time an SLP A with val(A) = w .Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 4 / 12
21.
Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings:Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
22.
Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)?Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
23.
Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)? Note: The decompress-and-compare strategy does not work here. We cannot compute val(A) and val(B) in polynomial time.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
24.
Algorithms on SLP-compressed strings The mother of all algorithms on SLP -compressed strings: Plandowski 1994 The following problem can be solved in polynomial time: INPUT: SLPs A, B QUESTION: val(A) = val(B)? Note: The decompress-and-compare strategy does not work here. We cannot compute val(A) and val(B) in polynomial time. Plandowski’s algorithm uses combinatorics on words, in particular the Periodicity-Lemma of Fine and Wilf.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 5 / 12
25.
Improvements of Plandowski’s result Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara, Takeda (mid 90’s) The following problem can be solved in polynomial time (fully compressed pattern matching): INPUT: SLPs P, T QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ?Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 6 / 12
26.
Improvements of Plandowski’s result Gasieniec, Karpinski, Miyazaki, Plandowski, Rytter, Shinohara, Takeda (mid 90’s) The following problem can be solved in polynomial time (fully compressed pattern matching): INPUT: SLPs P, T QUESTION: Is val(P) a factor of val(T), i.e., ∃x, y : val(T) = x val(P) y ? The best known algorithm has a running time of O(|P| · |T|2 ) (Lifshits 2006).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 6 / 12
27.
Generalization: Compressed automata A compressed automata is an ordinary ﬁnite automaton, where every transition is labelled with an SLP.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 7 / 12
28.
Generalization: Compressed automata A compressed automata is an ordinary ﬁnite automaton, where every transition is labelled with an SLP. Example: The compressed automaton Σ Σ q A q accepts all words that have val(A) as a factor.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 7 / 12
29.
Generalization: Compressed automata A compressed automata is an ordinary ﬁnite automaton, where every transition is labelled with an SLP. Example: The compressed automaton Σ Σ q A q accepts all words that have val(A) as a factor. The size |A| of the compressed automaton A (with the set ∆ of transition triples) is |A| = |A|. (p,A,q)∈∆Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 7 / 12
30.
Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)?Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 8 / 12
31.
Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)? Plandowski, Rytter 1999 ◮ Compressed membership for compressed automata belongs to PSPACE. ◮ Compressed membership for compressed automata over a unary alphabet is NP-complete.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 8 / 12
32.
Compressed membership for compressed automata Compressed membership for compressed automata: INPUT: A compressed automaton A and an SLP B. QUESTION: val(B) ∈ L(A)? Plandowski, Rytter 1999 ◮ Compressed membership for compressed automata belongs to PSPACE. ◮ Compressed membership for compressed automata over a unary alphabet is NP-complete. Plandowski and Rytter conjectured that compressed membership for compressed automata is NP-complete (for every alphabet size).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 8 / 12
33.
Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
34.
Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
35.
Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
36.
Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab. Then period(w ) = 3 and order(w ) = 2.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
37.
Some combinatorics on words The period of the word w ∈ Σ∗ is the smallest number p such that w [k + p] = w [k] for all 1 ≤ k ≤ |w | − p (period(w ) = |w | if such a p does not exist). |w | Let order(w ) = ⌊ period(w ) ⌋. Example: Let w = abbabbab = (abb)2 ab. Then period(w ) = 3 and order(w ) = 2. For a compressed automaton A, let: order(A) = max{order(val(A)) | A occurs as a label in A} period(A) = max{period(val(A)) | A occurs as a label in A}Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 9 / 12
38.
First main result Theorem 1 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 10 / 12
39.
First main result Theorem 1 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) in time polynomial in (i) |A|, (ii) |B|, and (iii) order(A). We use the following combinatorial fact: Let u, v ∈ Σ∗ and let p be a position in v . Then there exist at most period(u) many occurrences of u in v that “touch” position p.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 10 / 12
40.
Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
41.
Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
42.
Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata). A word w = an has period 1!Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
43.
Second main result Theorem 2 For a compressed automaton A and an SLP B, we can check val(B) ∈ L(A) nondeterministically in time polynomial in (i) |A|, (ii) |B|, and (iii) period(A). Generalizes the result of Plandowski & Rytter for a unary alphabet (NP-completeness of compressed membership for compressed automata). A word w = an has period 1! The theorem is proven by a reduction to the case of a unary alphabet.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 11 / 12
44.
Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete.Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 12 / 12
45.
Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete. Conjecture 2 If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there exists an accepting run of A on val(B) (viewed as a word over the set of transition triples of A), which can be generated by an SLP of size poly(|B|, |A|).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 12 / 12
46.
Open problems and a conjecture Conjecture 1 Compressed membership for compressed automata is NP-complete. Conjecture 2 If val(B) ∈ L(A) for an SLP B and a compressed automaton A, then there exists an accepting run of A on val(B) (viewed as a word over the set of transition triples of A), which can be generated by an SLP of size poly(|B|, |A|). Conjecture 2 implies Conjecture 1: ◮ Guess nondeterministically an SLP C of polynomial size. ◮ Check (in deterministic polynomial time), whether val(C) is an accepting run of A on val(B).Lohrey, Mathissen (University of Leipzig) Compressed Membership June 13, 2011 12 / 12
Be the first to comment