Introduction The Smallest Grammar Problem Algorithms and Hardness for Compressed Problems
Algorithmics on SLP-compressed strings
Antonis Antonopoulos
CoReLab
National Technical University of Athens
Introduction
In many areas, large string data must not only be stored
in compressed form but also be processed and analyzed.
Goal: design algorithms that operate directly on the
compressed data.
The decompress-and-solve strategy is expensive, because the
decompressed data can be far larger than its compressed form.
A compressed representation of a string makes
regularities in the string explicit. These regularities may
be exploited in a second phase for speeding up an
algorithm.
So, we need a mathematical model for compressed
representations of strings.
Such a model should have two properties:
Cover many compression schemes used in practice
Be mathematically easy to handle
Straight Line Programs
Definition (Straight Line Programs)
A Straight Line Program (SLP) over the terminal alphabet Σ is
a context-free grammar A = (V, Σ, S, P) with start symbol S ∈ V
and productions P ⊆ V × (V ∪ Σ)∗ such that:
1 For every X ∈ V there exists exactly one production of the
form (X, α) ∈ P.
2 The relation {(X, Y) | (X, α) ∈ P, Y ∈ alph(α)} is acyclic.
Here alph(s) denotes the set of symbols occurring in s.
The size of an SLP is $|A| = \sum_{(X,\alpha) \in P} |\alpha|$.
By conditions 1 and 2, A generates exactly one string, which
is denoted eval(A).
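The definition can be made concrete with a small sketch. It assumes an SLP is stored as a Python dict from nonterminals to right-hand sides, so that condition 1 holds by construction, and that a symbol is a nonterminal exactly when it is a key of the dict. The names slp_size, check_acyclic, slp_eval and the toy grammar are illustrative and not part of the slides.

```python
def slp_size(productions):
    """Size |A|: total length of all right-hand sides."""
    return sum(len(rhs) for rhs in productions.values())


def check_acyclic(productions):
    """Raise ValueError if some nonterminal depends on itself (condition 2)."""
    IN_PROGRESS, DONE = 1, 2
    state = {}

    def visit(x):
        if state.get(x) == DONE:
            return
        if state.get(x) == IN_PROGRESS:
            raise ValueError(f"cyclic dependency through {x!r}")
        state[x] = IN_PROGRESS
        for sym in productions[x]:
            if sym in productions:  # only nonterminals have productions
                visit(sym)
        state[x] = DONE

    for x in productions:
        visit(x)


def slp_eval(productions, start):
    """Return eval(A), the unique string derived from the start symbol."""
    check_acyclic(productions)
    memo = {}

    def expand(sym):
        if sym not in productions:
            return sym  # terminal symbol
        if sym not in memo:
            memo[sym] = "".join(expand(s) for s in productions[sym])
        return memo[sym]

    return expand(start)


# Hypothetical toy SLP: S -> X X b, X -> a b
toy = {"S": ["X", "X", "b"], "X": ["a", "b"]}
print(slp_eval(toy, "S"))  # ababb
print(slp_size(toy))       # 5
```

Full expansion is only sensible for small examples, since eval(A) can be exponentially longer than |A|.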
Example (Fibonacci Words)
Let A be the SLP over the terminal alphabet {a, b} with the
following productions:
A_1 → b
A_2 → a
A_i → A_{i−1} A_{i−2}, for 3 ≤ i ≤ 7
The start symbol is A_7.
Then eval(A) = abaababaabaab, the 7th Fibonacci word.
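The example can be checked mechanically. The sketch below builds the Fibonacci productions for a given index and expands the start symbol; the dict representation and the names fibonacci_slp and expand are assumptions made for illustration.

```python
def fibonacci_slp(n):
    """Productions A1 -> b, A2 -> a, Ai -> A(i-1) A(i-2) for 3 <= i <= n."""
    productions = {"A1": ["b"], "A2": ["a"]}
    for i in range(3, n + 1):
        productions[f"A{i}"] = [f"A{i - 1}", f"A{i - 2}"]
    return productions


def expand(productions, start):
    """Decompress the string derived from `start` (memoized per nonterminal)."""
    memo = {}

    def go(sym):
        if sym not in productions:
            return sym  # terminal symbol
        if sym not in memo:
            memo[sym] = "".join(go(s) for s in productions[sym])
        return memo[sym]

    return go(start)


slp = fibonacci_slp(7)
word = expand(slp, "A7")
print(word)                                   # abaababaabaab
print(len(word))                              # 13
print(sum(len(rhs) for rhs in slp.values()))  # 12
```

The size of the SLP grows linearly in the index, while the length of the generated word grows like the Fibonacci numbers.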
SLPs can capture many of the usual compression methods, in
particular dictionary-based schemes such as LZ77. For example:
Theorem
From the LZ77-factorization of a given string w ∈ Σ∗, we can
compute an SLP of size $O\left(\log\left(\frac{|w|}{m}\right) \cdot m\right)$ for w in time
$O\left(\log\left(\frac{|w|}{m}\right) \cdot m\right)$, where m is the number of factors in the
LZ77-factorization of w.
Also, given an SLP A, we can easily design polynomial-time
algorithms that compute |eval(A)| and eval(A)[i].
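One way to obtain |eval(A)| and eval(A)[i] without decompressing is sketched below: compute the length of every nonterminal bottom-up, then walk down from the start symbol, skipping whole subtrees. The dict representation and the names slp_lengths and slp_char_at are assumptions; positions are 1-based as in the slides.

```python
def slp_lengths(productions):
    """Return {nonterminal: |eval(nonterminal)|} without expanding any string."""
    lengths = {}

    def length(sym):
        if sym not in productions:
            return 1  # terminal symbol
        if sym not in lengths:
            lengths[sym] = sum(length(s) for s in productions[sym])
        return lengths[sym]

    for x in productions:
        length(x)
    return lengths


def slp_char_at(productions, start, i):
    """Return eval(start)[i] for a 1-based position i."""
    lengths = slp_lengths(productions)
    if not 1 <= i <= lengths[start]:
        raise IndexError("position out of range")
    sym = start
    while sym in productions:
        for child in productions[sym]:
            n = lengths.get(child, 1)  # terminals have length 1
            if i <= n:
                sym = child            # position i lies inside this child
                break
            i -= n                     # skip the child entirely
    return sym


def fibonacci_slp(n):
    productions = {"A1": ["b"], "A2": ["a"]}
    for k in range(3, n + 1):
        productions[f"A{k}"] = [f"A{k - 1}", f"A{k - 2}"]
    return productions


fib7 = fibonacci_slp(7)
print(slp_lengths(fib7)["A7"])     # 13
print(slp_char_at(fib7, "A7", 5))  # b

# The same code works when the word is far too long to write out.
fib100 = fibonacci_slp(100)
print(slp_lengths(fib100)["A100"])
print(slp_char_at(fib100, "A100", 10**18))
```

Each query visits at most one path of the derivation tree, so its cost depends on the SLP and not on the length of the decompressed string.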
The Smallest Grammar Problem
Given a string, what is the smallest SLP for it?
This is a decidable variant of Kolmogorov complexity.
Let opt(w) be the size of a minimal SLP for w, that is, an SLP
A with eval(A) = w and |A| = opt(w), so that every SLP B with
eval(B) = w satisfies |B| ≥ |A|.
Definition (Smallest Grammar Problem)
Given a string w, compute a minimal SLP for w.
Theorem (Approximation of an SLP)
There is an O(log |Σ| · n)-time algorithm that computes for a
given word w ∈ Σ∗ of length n an SLP A such that eval(A) = w
and $|A| \le O\left(\log\left(\frac{n}{\mathrm{opt}(w)}\right)\right) \cdot \mathrm{opt}(w)$.
Theorem
Unless P = NP, there is no polynomial-time algorithm with the
following properties:
The input consists of a string w over some alphabet Σ.
The output is an SLP A such that eval(A) = w and
$|A| \le \frac{8569}{8568} \cdot \mathrm{opt}(w)$.
The proof uses a reduction from the vertex cover problem
for graphs with maximum degree 3, which is hard to
approximate below a ratio of $\frac{145}{144}$, unless P = NP.
Algorithms and Hardness for Compressed Problems
Compressed Equality Checking
Definition (Compressed Equality Checking)
Given two SLPs A and B, is eval(A) = eval(B)?
The algorithms for equality checking use combinatorial
properties of strings, such as the periodicity lemma.
Some results on sequential and parallel models:
Theorem
Compressed Equality Checking can be solved in time
$O\left((|A| + |B|)^2\right)$.
Theorem
Compressed Equality Checking belongs to $\mathsf{coRNC}^2$.
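To illustrate how equality can be tested without decompressing, here is a sketch of a randomized fingerprint test. It is neither the quadratic-time algorithm nor the coRNC² algorithm from the theorems above; it swaps in a standard Karp-Rabin style polynomial hash, computed bottom-up over the SLP. Equal strings always pass; distinct strings of length n pass with probability at most about n / PRIME, so for very long strings a larger modulus would be needed. All names, the dict representation and the choice of modulus are assumptions.

```python
import random

PRIME = (1 << 61) - 1  # the Mersenne prime 2^61 - 1 (assumed large enough)


def fingerprint(productions, start, base):
    """Return (length, hash) of eval(start); terminals are single characters."""
    memo = {}

    def go(sym):
        if sym not in productions:
            return 1, ord(sym) % PRIME
        if sym not in memo:
            length, h = 0, 0
            for child in productions[sym]:
                n, hc = go(child)
                # hash(uv) = hash(u) * base^|v| + hash(v)  (mod PRIME)
                h = (h * pow(base, n, PRIME) + hc) % PRIME
                length += n
            memo[sym] = (length, h)
        return memo[sym]

    return go(start)


def probably_equal(slp_a, start_a, slp_b, start_b, rng=random):
    """One-sided test: False is always correct, True is correct with high probability."""
    base = rng.randrange(1, PRIME)
    return fingerprint(slp_a, start_a, base) == fingerprint(slp_b, start_b, base)


# Two different SLPs for the 7th Fibonacci word.
fib = {"A1": ["b"], "A2": ["a"]}
for i in range(3, 8):
    fib[f"A{i}"] = [f"A{i - 1}", f"A{i - 2}"]
flat = {"S": list("abaababaabaab")}

print(probably_equal(fib, "A7", flat, "S"))  # True: equal strings always agree
```

The one-sided error matches the flavor of the coRNC² result: a "not equal" answer is always right, while an "equal" answer may be wrong with small probability.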
Compressed Hamming Distance
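Let dH(a, b) denote the Hamming distance of two strings
a, b ∈ Σ∗ of equal length, that is, the number of positions
in which a and b differ.
Theorem
Computing the function dH(eval(A), eval(B)), where A and B
are SLPs, is #P-complete.
Proof:
Membership in #P: let T_y and S_y denote the y-th symbols
of the two texts, and let
$$G(S, T, y) = \begin{cases} \text{“yes”} & \text{if } T_y \neq S_y \\ \text{“no”} & \text{otherwise} \end{cases}$$
Then G ∈ FP, and dH(T, S) is the number of positions y
with G(S, T, y) = “yes”.
Hardness: we reduce the #P-complete problem #SUBSET SUM
to dH using a 1-Turing reduction.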
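#SUBSET SUM asks, given W = {w_1, . . . , w_n} and t in binary,
for the number of subsets of W whose elements sum up to
t, that is, the number of x ∈ {0, 1}^n for which
x · (w_1, . . . , w_n) = x · w = t.
Let $s = \sum_{i=1}^{n} w_i$.
Consider the texts:
$T = \left(0^{t}\, 1\, 0^{s-t}\right)^{2^n}$
$S = \bigcirc_{x \in \{0,1\}^n} \left(0^{x \cdot w}\, 1\, 0^{s - x \cdot w}\right)$
where ⃝ denotes concatenation over all x ∈ {0, 1}^n in a fixed order.
Every block has length s + 1. The block for x in S differs from
the corresponding block of T in exactly two positions if
x · w ≠ t, and the two blocks are equal otherwise.
Hence dH(T, S) is exactly twice the number of subsets of W
whose elements do not sum up to t.
We can easily construct an SLP B such that eval(B) = T.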
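Consider the SLP A with the following rules, where
s_k = w_1 + · · · + w_k:
$A_1 \to 1\, 0^{s + w_1}\, 1$
$A_{k+1} \to A_k\, 0^{s - s_k + w_{k+1}}\, A_k \quad (1 \le k \le n - 1)$
The start symbol is A_n. Using induction, we can prove that
$\mathrm{eval}(A) = \bigcirc_{x \in \{0,1\}^n} \left(0^{x \cdot w}\, 1\, 0^{s - x \cdot w}\right) = S$
The size of A is polynomial in the length of the binary
encoding of w, because each block 0^m can be generated by
additional rules of total size O(log m).
Thus we can compute the answer to #SUBSET SUM as
$2^n - \frac{1}{2}\, d_H(T, S)$.
□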
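The reduction can be checked on tiny instances. The sketch below builds SLPs for S and T as in the proof, generating each block 0^m by doubling rules, and then decompresses both texts to compare the formula with a brute-force count. Decompression is used here only for checking; the reduction itself would pass the two SLPs to a dH oracle. All function names and the rule naming scheme (A, B, Z) are assumptions, and the sketch requires n ≥ 1 and 0 ≤ t ≤ s.

```python
from itertools import product


def zero_block(rules, m):
    """Return symbols deriving 0^m, adding rules Z0 -> 0, Zj -> Z(j-1) Z(j-1)."""
    symbols, j = [], 0
    while m >> j:
        name = f"Z{j}"
        if name not in rules:
            rules[name] = ["0"] if j == 0 else [f"Z{j - 1}", f"Z{j - 1}"]
        if (m >> j) & 1:
            symbols.append(name)  # Zj derives 0^(2^j)
        j += 1
    return symbols


def build_S(w):
    """SLP with A1 -> 1 0^(s+w1) 1 and A(k+1) -> Ak 0^(s-sk+w(k+1)) Ak."""
    rules, s, s_k = {}, sum(w), w[0]
    rules["A1"] = ["1"] + zero_block(rules, s + w[0]) + ["1"]
    for k in range(1, len(w)):
        gap = zero_block(rules, s - s_k + w[k])
        rules[f"A{k + 1}"] = [f"A{k}"] + gap + [f"A{k}"]
        s_k += w[k]
    return rules, f"A{len(w)}"


def build_T(w, t):
    """SLP for (0^t 1 0^(s-t))^(2^n) using n doubling rules."""
    rules, s = {}, sum(w)
    rules["B0"] = zero_block(rules, t) + ["1"] + zero_block(rules, s - t)
    for j in range(1, len(w) + 1):
        rules[f"B{j}"] = [f"B{j - 1}", f"B{j - 1}"]
    return rules, f"B{len(w)}"


def expand(rules, start):
    memo = {}

    def go(sym):
        if sym not in rules:
            return sym
        if sym not in memo:
            memo[sym] = "".join(go(c) for c in rules[sym])
        return memo[sym]

    return go(start)


def count_via_hamming(w, t):
    """Number of subsets summing to t, computed as 2^n - dH(T, S) / 2."""
    S, T = expand(*build_S(w)), expand(*build_T(w, t))
    assert len(S) == len(T)
    d = sum(a != b for a, b in zip(S, T))
    return 2 ** len(w) - d // 2


def count_brute_force(w, t):
    return sum(1 for x in product((0, 1), repeat=len(w))
               if sum(xi * wi for xi, wi in zip(x, w)) == t)


print(count_via_hamming([1, 2, 3], 3), count_brute_force([1, 2, 3], 3))  # 2 2
```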
Fully Compressed Pattern Matching
In its most general form, the Pattern Matching Problem
asks, for given strings T and P, whether P is a factor of T.
There are many linear-time algorithms for the uncompressed
case (e.g. Knuth-Morris-Pratt).
Definition (Fully Compressed Pattern Matching)
Given two SLPs P and T, is eval(P) a factor of eval(T)?
An important observation that underlies most algorithms is
that if a pattern p is a factor of eval(T), then there exists a
production X → YZ in T such that p has an
occurrence in eval_T(X) = eval_T(Y) eval_T(Z) that touches
the cut between eval_T(Y) and eval_T(Z).
It is a consequence of the Periodicity Lemma (Fine,
Wilf, 1965) that the set of all starting positions of the
occurrences of p in eval_T(X) that touch the cut of X forms
an arithmetic progression.
Lifshits' algorithm, for example, computes for every
nonterminal A of the pattern SLP P and each nonterminal
X of the text SLP T the arithmetic progression
corresponding to the occurrences of eval_P(A) in eval_T(X)
that touch the cut of X.
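For reference, the Periodicity Lemma is commonly stated as follows. This is a standard formulation added for convenience, not text from the slides.

```latex
\textbf{Lemma (Fine, Wilf).}
Let $p$ and $q$ be periods of a string $w$, that is,
$w[i] = w[i+p]$ and $w[j] = w[j+q]$ whenever both sides are defined.
If $|w| \ge p + q - \gcd(p, q)$, then $\gcd(p, q)$ is also a period of $w$.
```

Roughly, two overlapping occurrences of a pattern whose starting positions differ by d show that d is a period of the pattern, which is why the lemma constrains the occurrences that touch a common cut.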
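These arithmetic progressions can be computed
bottom-up, resulting in an overall time bound of $O\left(|P|^2 |T|\right)$.
Jeż's algorithm beats that time bound with
$O\left((|T| + |P|) \log(|\mathrm{eval}(P)|) \log(|P| + |T|)\right) \subseteq O\left(n^2 \log n\right)$,
where n = |P| + |T|, since log |eval(P)| ∈ O(|P|).
For uncompressed patterns p, we can achieve a bound of
O(|p| · |T|), where |T| is the size of the text SLP.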
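For an uncompressed pattern, the cut observation already yields a simple algorithm, sketched below under stated assumptions. Every production is assumed to have the form X → a or X → YZ. For each nonterminal, the sketch keeps only the first and last |p| − 1 symbols of its expansion and searches the pattern in the short window around each cut. The function name and the dict encoding are illustrative, and Python's substring test stands in for a linear-time matcher such as Knuth-Morris-Pratt.

```python
def slp_contains(productions, start, pattern):
    """Decide whether `pattern` is a factor of eval(start).

    `productions` maps a nonterminal to a terminal character ("X": "a")
    or to a pair of nonterminals ("X": ("Y", "Z")).
    """
    m = len(pattern)
    if m == 0:
        return True
    k = m - 1            # at most k symbols of an occurrence lie on each side of a cut
    pre, suf = {}, {}    # first / last min(k, length) symbols of each expansion
    found = False

    def visit(x):
        nonlocal found
        if x in pre:
            return
        rhs = productions[x]
        if isinstance(rhs, str):             # X -> a
            pre[x] = suf[x] = rhs[:k]
            if rhs == pattern:
                found = True
            return
        y, z = rhs                           # X -> Y Z
        visit(y)
        visit(z)
        if pattern in suf[y] + pre[z]:       # occurrence touching the cut of X
            found = True
        pre[x] = (pre[y] + pre[z])[:k]
        tail = suf[y] + suf[z]
        suf[x] = tail[max(0, len(tail) - k):]

    visit(start)
    return found


# Fibonacci SLP in the assumed normal form; eval(A7) = abaababaabaab.
fib = {"A1": "b", "A2": "a"}
for i in range(3, 8):
    fib[f"A{i}"] = (f"A{i - 1}", f"A{i - 2}")

print(slp_contains(fib, "A7", "baab"))  # True
print(slp_contains(fib, "A7", "bb"))    # False
```

Each production contributes a window of fewer than 2|p| symbols, so with a linear-time matcher the total work is O(|p|) per production.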
Subsequence Problems
In many applications, especially in Computational
Biology, approximate occurrences of a pattern are more
relevant than exact matches.
Subsequence problems provide very useful similarity
measures between sequences.
Definition (Fully Compressed Subsequence Problem)
Given two SLPs P and T, is eval(P) a subsequence of eval(T)?
Theorem
The Fully Compressed Subsequence Problem is in PSPACE and
it is PP-hard.
Querying Compressed Texts
Definition (Compressed Querying)
Given an SLP A, a binary-coded number i, 1 ≤ i ≤ |eval(A)|
and a symbol a ∈ Σ, does eval(A)[i] = a hold?
Theorem
Compressed Querying is P-complete.
The proof of this result is a logspace
reduction from the P-complete problem “super-increasing
subset sum”.
References I
Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy,
Manoj Prabhakaran, Amit Sahai, and Abhi Shelat.
The smallest grammar problem.
IEEE Trans. Information Theory, 51(7):2554–2576, 2005.
Dan Gusfield.
Algorithms on Strings, Trees, and Sequences: Computer
Science and Computational Biology.
Cambridge University Press, New York, NY, USA, 1997.
Yury Lifshits.
Processing compressed texts: A tractability border.
In Combinatorial Pattern Matching, 18th Annual
Symposium, CPM 2007, London, Canada, July 9-11, 2007,
Proceedings, pages 228–240, 2007.
References II
Markus Lohrey.
Word problems and membership problems on
compressed words.
SIAM J. Comput., 35(5):1210–1240, 2006.
Markus Lohrey.
Algorithmics on SLP-compressed strings: A survey.
Groups Complexity Cryptology, 4(2):241–299, 2012.
Markus Lohrey.
Equality testing of compressed strings.
In Combinatorics on Words - 10th International
Conference, WORDS 2015, Kiel, Germany, September
14-17, 2015, Proceedings, pages 14–26, 2015.

More Related Content

What's hot

Richard Everitt's slides
Richard Everitt's slidesRichard Everitt's slides
Richard Everitt's slides
Christian Robert
 
Jere Koskela slides
Jere Koskela slidesJere Koskela slides
Jere Koskela slides
Christian Robert
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdf
Hyungjoo Cho
 
Chemistry Assignment Help
Chemistry Assignment Help Chemistry Assignment Help
Chemistry Assignment Help
Edu Assignment Help
 
Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010
Christian Robert
 
Algorithms and Complexity: Cryptography Theory
Algorithms and Complexity: Cryptography TheoryAlgorithms and Complexity: Cryptography Theory
Algorithms and Complexity: Cryptography Theory
Alex Prut
 
Harmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningHarmonic Analysis and Deep Learning
Harmonic Analysis and Deep Learning
Sungbin Lim
 
Intro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVMIntro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVM
NYC Predictive Analytics
 
Computability - Tractable, Intractable and Non-computable Function
Computability - Tractable, Intractable and Non-computable FunctionComputability - Tractable, Intractable and Non-computable Function
Computability - Tractable, Intractable and Non-computable Function
Reggie Niccolo Santos
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
Christian Robert
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
Chapter 23 aoa
Chapter 23 aoaChapter 23 aoa
Chapter 23 aoa
Hanif Durad
 
Chapter 26 aoa
Chapter 26 aoaChapter 26 aoa
Chapter 26 aoa
Hanif Durad
 
Matrix calculus
Matrix calculusMatrix calculus
Matrix calculus
Sungbin Lim
 
Chapter 25 aoa
Chapter 25 aoaChapter 25 aoa
Chapter 25 aoa
Hanif Durad
 
Quantum Algorithms and Lower Bounds in Continuous Time
Quantum Algorithms and Lower Bounds in Continuous TimeQuantum Algorithms and Lower Bounds in Continuous Time
Quantum Algorithms and Lower Bounds in Continuous Time
David Yonge-Mallo
 

What's hot (20)

Richard Everitt's slides
Richard Everitt's slidesRichard Everitt's slides
Richard Everitt's slides
 
Jere Koskela slides
Jere Koskela slidesJere Koskela slides
Jere Koskela slides
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdf
 
Chemistry Assignment Help
Chemistry Assignment Help Chemistry Assignment Help
Chemistry Assignment Help
 
Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010
 
Algorithms and Complexity: Cryptography Theory
Algorithms and Complexity: Cryptography TheoryAlgorithms and Complexity: Cryptography Theory
Algorithms and Complexity: Cryptography Theory
 
Harmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningHarmonic Analysis and Deep Learning
Harmonic Analysis and Deep Learning
 
Intro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVMIntro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVM
 
Computability - Tractable, Intractable and Non-computable Function
Computability - Tractable, Intractable and Non-computable FunctionComputability - Tractable, Intractable and Non-computable Function
Computability - Tractable, Intractable and Non-computable Function
 
Assignment 2 solution acs
Assignment 2 solution acsAssignment 2 solution acs
Assignment 2 solution acs
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Back tracking
Back trackingBack tracking
Back tracking
 
Np cooks theorem
Np cooks theoremNp cooks theorem
Np cooks theorem
 
Chapter 23 aoa
Chapter 23 aoaChapter 23 aoa
Chapter 23 aoa
 
Chapter 26 aoa
Chapter 26 aoaChapter 26 aoa
Chapter 26 aoa
 
Matrix calculus
Matrix calculusMatrix calculus
Matrix calculus
 
Chapter 25 aoa
Chapter 25 aoaChapter 25 aoa
Chapter 25 aoa
 
Chapter 16
Chapter 16Chapter 16
Chapter 16
 
Quantum Algorithms and Lower Bounds in Continuous Time
Quantum Algorithms and Lower Bounds in Continuous TimeQuantum Algorithms and Lower Bounds in Continuous Time
Quantum Algorithms and Lower Bounds in Continuous Time
 

Similar to Algorithmics on SLP-compressed strings

Algorithms on Strings
Algorithms on StringsAlgorithms on Strings
Algorithms on Strings
Michael Soltys
 
Theory of computing
Theory of computingTheory of computing
Theory of computing
Bipul Roy Bpl
 
Overview_of_a_computational_model_-_Languages_1.ppt
Overview_of_a_computational_model_-_Languages_1.pptOverview_of_a_computational_model_-_Languages_1.ppt
Overview_of_a_computational_model_-_Languages_1.ppt
SanthoshS508159
 
Automata
AutomataAutomata
Automata
Gaditek
 
Automata
AutomataAutomata
Automata
Gaditek
 
1 introduction
1 introduction1 introduction
1 introduction
parmeet834
 
01.ppt
01.ppt01.ppt
01.ppt
EyobYotor
 
Design and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment HelpDesign and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment Help
Programming Homework Help
 
Design and Analysis of Algorithms Exam Help
Design and Analysis of Algorithms Exam HelpDesign and Analysis of Algorithms Exam Help
Design and Analysis of Algorithms Exam Help
Programming Exam Help
 
How Hard Can a Problem Be ?
How Hard Can a Problem Be ?How Hard Can a Problem Be ?
How Hard Can a Problem Be ?
Ahmed Saeed
 
Csr2011 june14 11_00_aaronson
Csr2011 june14 11_00_aaronsonCsr2011 june14 11_00_aaronson
Csr2011 june14 11_00_aaronson
CSR2011
 
Module 1 TOC.pptx
Module 1 TOC.pptxModule 1 TOC.pptx
Module 1 TOC.pptx
MohitJain21BCE1523
 
Introduction (1).pdf
Introduction (1).pdfIntroduction (1).pdf
Introduction (1).pdf
ShivareddyGangam
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
vsssuresh
 
Theory of Computation "Chapter 1, introduction"
Theory of Computation "Chapter 1, introduction"Theory of Computation "Chapter 1, introduction"
Theory of Computation "Chapter 1, introduction"
Ra'Fat Al-Msie'deen
 
Average Polynomial Time Complexity Of Some NP-Complete Problems
Average Polynomial Time Complexity Of Some NP-Complete ProblemsAverage Polynomial Time Complexity Of Some NP-Complete Problems
Average Polynomial Time Complexity Of Some NP-Complete Problems
Audrey Britton
 
Introduction to the computing theory in automata
Introduction to the computing theory in automataIntroduction to the computing theory in automata
Introduction to the computing theory in automata
AbubakarSadiq69
 
Unit -I Toc.pptx
Unit -I Toc.pptxUnit -I Toc.pptx
Unit -I Toc.pptx
viswanath kani
 
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurEnd semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Vivekananda Samiti
 

Similar to Algorithmics on SLP-compressed strings (20)

Algorithms on Strings
Algorithms on StringsAlgorithms on Strings
Algorithms on Strings
 
Mod1.pdf
Mod1.pdfMod1.pdf
Mod1.pdf
 
Theory of computing
Theory of computingTheory of computing
Theory of computing
 
Overview_of_a_computational_model_-_Languages_1.ppt
Overview_of_a_computational_model_-_Languages_1.pptOverview_of_a_computational_model_-_Languages_1.ppt
Overview_of_a_computational_model_-_Languages_1.ppt
 
Automata
AutomataAutomata
Automata
 
Automata
AutomataAutomata
Automata
 
1 introduction
1 introduction1 introduction
1 introduction
 
01.ppt
01.ppt01.ppt
01.ppt
 
Design and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment HelpDesign and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment Help
 
Design and Analysis of Algorithms Exam Help
Design and Analysis of Algorithms Exam HelpDesign and Analysis of Algorithms Exam Help
Design and Analysis of Algorithms Exam Help
 
How Hard Can a Problem Be ?
How Hard Can a Problem Be ?How Hard Can a Problem Be ?
How Hard Can a Problem Be ?
 
Csr2011 june14 11_00_aaronson
Csr2011 june14 11_00_aaronsonCsr2011 june14 11_00_aaronson
Csr2011 june14 11_00_aaronson
 
Module 1 TOC.pptx
Module 1 TOC.pptxModule 1 TOC.pptx
Module 1 TOC.pptx
 
Introduction (1).pdf
Introduction (1).pdfIntroduction (1).pdf
Introduction (1).pdf
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
 
Theory of Computation "Chapter 1, introduction"
Theory of Computation "Chapter 1, introduction"Theory of Computation "Chapter 1, introduction"
Theory of Computation "Chapter 1, introduction"
 
Average Polynomial Time Complexity Of Some NP-Complete Problems
Average Polynomial Time Complexity Of Some NP-Complete ProblemsAverage Polynomial Time Complexity Of Some NP-Complete Problems
Average Polynomial Time Complexity Of Some NP-Complete Problems
 
Introduction to the computing theory in automata
Introduction to the computing theory in automataIntroduction to the computing theory in automata
Introduction to the computing theory in automata
 
Unit -I Toc.pptx
Unit -I Toc.pptxUnit -I Toc.pptx
Unit -I Toc.pptx
 
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurEnd semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
 

Recently uploaded

Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
AADYARAJPANDEY1
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SELF-EXPLANATORY
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
IvanMallco1
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
muralinath2
 

Recently uploaded (20)

Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 

Algorithmics on SLP-compressed strings

  • 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Introduction The Smallest Grammar Problem Algorithms and Hardness for Compressed Problems Algorithmics on SLP-compressed strings Antonis Antonopoulos CoReLab National Technical University of Athens
  • 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Introduction The Smallest Grammar Problem Algorithms and Hardness for Compressed Problems Introduction Introduction In many areas, large string data have to be not only stored in compressed form, but the initial data has to be processed and analyzed as well. Design of algorithms that operate directly on the compressed data. Decompress-and-solve strategy needs many resources.
  • 3. Introduction. A compressed representation of a string makes regularities in the string explicit. These regularities may be exploited in a second phase to speed up an algorithm. We therefore need a mathematical model for compressed representations of strings.
  • 4. Introduction (continued). A mathematical model for compressed representations of strings should have two properties: (1) it should cover many compression schemes used in practice; (2) it should be mathematically easy to handle.
  • 5. Straight Line Programs. Definition (Straight Line Program): a Straight Line Program (SLP) over the terminal alphabet $\Sigma$ is a context-free grammar $\mathbb{A} = (V, \Sigma, S, P)$ with $P \subseteq V \times (V \cup \Sigma)^*$ such that: (1) for every $A \in V$ there exists exactly one production of the form $(A, \alpha) \in P$; (2) the relation $\{(A, B) \mid (A, \alpha) \in P,\ B \in \mathrm{alph}(\alpha)\}$ is acyclic. Here $\mathrm{alph}(s)$ denotes the set of symbols occurring in $s$. The size of an SLP is $|\mathbb{A}| = \sum_{(A,\alpha) \in P} |\alpha|$. An SLP generates exactly one string (its language is a singleton), and this string is denoted $\mathrm{eval}(\mathbb{A})$.
  • 6. Straight Line Programs. Example (Fibonacci words): let $\mathbb{A}$ be the SLP over the terminal alphabet $\{a, b\}$ with the productions $A_1 \to b$, $A_2 \to a$, and $A_i \to A_{i-1} A_{i-2}$ for $3 \le i \le 7$. The start symbol is $A_7$. Then $\mathrm{eval}(\mathbb{A}) = abaababaabaab$, the 7th Fibonacci word.
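
The Fibonacci example can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical encoding that is not part of the slides: the SLP is a dict from nonterminal names to lists of symbols, and any symbol that is not a key of the dict is treated as a terminal. The names `expand` and `slp_size` are illustrative only.

```python
def expand(rules, symbol):
    """Return the string derived from `symbol` by naive full decompression."""
    if symbol not in rules:          # not a nonterminal, so it is a terminal letter
        return symbol
    return "".join(expand(rules, s) for s in rules[symbol])


def slp_size(rules):
    """Size of the SLP: the total length of all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())


# Fibonacci SLP from the example: A1 -> b, A2 -> a, Ai -> A(i-1) A(i-2).
fib = {"A1": ["b"], "A2": ["a"]}
for i in range(3, 8):
    fib[f"A{i}"] = [f"A{i-1}", f"A{i-2}"]

print(expand(fib, "A7"))   # abaababaabaab
print(slp_size(fib))       # 12
```

Full expansion is only practical for small examples, since the derived string can be exponentially longer than the grammar.
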
  • 7. Straight Line Programs. SLPs can capture all the usual compression methods. For example: Theorem. From the LZ77-factorization of a given string $w \in \Sigma^*$, we can compute an SLP of size $O(\log(|w|/m) \cdot m)$ for $w$ in time $O(\log(|w|/m) \cdot m)$, where $m$ is the number of factors in the LZ77-factorization of $w$. Also, given an SLP $\mathbb{A}$, we can easily design polynomial-time algorithms to compute $|\mathrm{eval}(\mathbb{A})|$ and $\mathrm{eval}(\mathbb{A})[i]$.
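
The last remark, that the length of the generated string and its $i$-th symbol can be computed without decompressing, can be sketched as follows. This is an illustrative Python sketch under an assumed encoding (a dict from nonterminal names to lists of symbols, where symbols that are not keys are terminals); `make_length` and `char_at` are hypothetical names, not taken from the slides.

```python
def make_length(rules):
    """Return a memoised function computing |eval(X)| for any symbol X."""
    memo = {}

    def length(sym):
        if sym not in rules:              # terminal letter
            return 1
        if sym not in memo:
            memo[sym] = sum(length(s) for s in rules[sym])
        return memo[sym]

    return length


def char_at(rules, start, i):
    """Return the i-th symbol (1-based) of eval(start) without expanding it."""
    length = make_length(rules)
    if not 1 <= i <= length(start):
        raise IndexError("position out of range")
    sym = start
    while sym in rules:
        # Descend into the child whose derived string covers position i.
        for child in rules[sym]:
            n = length(child)
            if i <= n:
                sym = child
                break
            i -= n
    return sym


# Doubling SLP: D0 -> a b, Dk -> D(k-1) D(k-1); eval(D60) = (ab)^(2^60).
doubling = {"D0": ["a", "b"]}
for k in range(1, 61):
    doubling[f"D{k}"] = [f"D{k-1}", f"D{k-1}"]

print(make_length(doubling)("D60") == 2 ** 61)   # True
print(char_at(doubling, "D60", 2 ** 60 + 1))     # a
```

Lengths are stored as arbitrary-precision integers, which matters because they can be exponential in the size of the SLP.
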
  • 8. The Smallest Grammar Problem. Given a string, what is the smallest SLP for it? This is a decidable variant of Kolmogorov complexity. Let $\mathrm{opt}(w)$ be the size of a minimal SLP for $w$, that is, of an SLP $\mathbb{A}$ with $\mathrm{eval}(\mathbb{A}) = w$ such that $|\mathbb{B}| \ge |\mathbb{A}|$ for every SLP $\mathbb{B}$ with $\mathrm{eval}(\mathbb{B}) = w$. Definition (Smallest Grammar Problem): given a string $w$, compute a minimal SLP for $w$. Theorem (Approximation of an SLP): there is an $O(\log|\Sigma| \cdot n)$-time algorithm that computes, for a given word $w \in \Sigma^*$ of length $n$, an SLP $\mathbb{A}$ such that $\mathrm{eval}(\mathbb{A}) = w$ and $|\mathbb{A}| \le O(\log(n/\mathrm{opt}(w))) \cdot \mathrm{opt}(w)$.
  • 9. The Smallest Grammar Problem. Theorem. Unless P = NP, there is no polynomial-time algorithm with the following properties: the input is a string $w$ over some alphabet $\Sigma$, and the output is an SLP $\mathbb{A}$ such that $\mathrm{eval}(\mathbb{A}) = w$ and $|\mathbb{A}| \le \frac{8569}{8568} \cdot \mathrm{opt}(w)$. The proof uses a reduction from the vertex cover problem for graphs of maximum degree 3, which is hard to approximate below a ratio of $\frac{145}{144}$ unless P = NP.
  • 10. Compressed Equality Checking. Definition (Compressed Equality Checking): given two SLPs $\mathbb{A}$ and $\mathbb{B}$, is $\mathrm{eval}(\mathbb{A}) = \mathrm{eval}(\mathbb{B})$? The algorithms for equality checking use combinatorial properties of strings, such as the periodicity lemma. Some results on sequential and parallel models: Theorem. Compressed Equality Checking can be solved in time $O((|\mathbb{A}| + |\mathbb{B}|)^2)$. Theorem. Compressed Equality Checking belongs to $\mathrm{coRNC}^2$.
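
To give a feel for how equality can be tested without decompressing, here is a short Python sketch of a different, simpler technique than the ones behind the two theorems: Karp-Rabin-style polynomial fingerprints computed bottom-up over the grammar. It is a randomized check with one-sided error (equal strings always compare equal; distinct strings may collide with small probability), not the deterministic quadratic algorithm. The dict encoding, the modulus `MOD`, and the names `fingerprint` and `probably_equal` are assumptions made for illustration; terminals are assumed to be single characters.

```python
import random

MOD = (1 << 61) - 1   # a Mersenne prime used as the fingerprint modulus


def fingerprint(rules, start, base):
    """Return (length, hash) of eval(start) under a polynomial hash modulo MOD."""
    memo = {}

    def visit(sym):
        if sym not in rules:                      # terminal letter
            return 1, ord(sym) % MOD
        if sym not in memo:
            n, h = 0, 0
            for child in rules[sym]:
                cn, ch = visit(child)
                # Appending a block of length cn shifts the old hash by base**cn.
                h = (h * pow(base, cn, MOD) + ch) % MOD
                n += cn
            memo[sym] = (n, h)
        return memo[sym]

    return visit(start)


def probably_equal(rules_a, start_a, rules_b, start_b, base=None):
    """Compare two SLP-compressed strings by length and fingerprint."""
    if base is None:
        base = random.randrange(2, MOD - 1)       # random evaluation point
    return fingerprint(rules_a, start_a, base) == fingerprint(rules_b, start_b, base)


# Two different grammars for "abab", and one grammar for "abba".
g1 = {"S": ["X", "X"], "X": ["a", "b"]}
g2 = {"S": ["P", "b"], "P": ["Q", "a"], "Q": ["a", "b"]}
g3 = {"S": ["a", "b", "b", "a"]}

print(probably_equal(g1, "S", g2, "S"))   # True
```

Two distinct strings of length $n$ collide for fewer than $n$ choices of the base, so for strings whose length approaches the modulus a larger prime would be needed.
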
  • 11. Compressed Hamming Distance. Let $d_H(a, b)$ denote the Hamming distance of two strings $a, b \in \Sigma^*$ of equal length (the number of positions at which $a$ and $b$ differ). Theorem. The function $d_H(\mathrm{eval}(\mathbb{A}), \mathrm{eval}(\mathbb{B}))$, where $\mathbb{A}$, $\mathbb{B}$ are SLPs, is #P-complete. Proof: let $G(S, T, y) = \text{“yes”}$ if $T_y \neq S_y$ and $G(S, T, y) = \text{“no”}$ otherwise. Then $G \in \mathrm{FP}$, and $d_H$ counts the positions $y$ for which $G$ answers “yes”, which gives membership in #P. For hardness, we reduce the #P-complete problem #SUBSET SUM to $d_H$ using a 1-Turing reduction.
  • 12. Compressed Hamming Distance. Proof (continued): #SUBSET SUM asks, given $W = \{w_1, \dots, w_n\}$ and $t$ in binary, for the number of subsets of $W$ whose elements sum to $t$, that is, the number of $x \in \{0, 1\}^n$ for which $x \cdot (w_1, \dots, w_n) = x \cdot w = t$. Let $s = \sum_{i=1}^{n} w_i$. Consider the texts $T = (0^{t}\,1\,0^{s-t})^{2^n}$ and $S = \bigodot_{x \in \{0,1\}^n} (0^{x \cdot w}\,1\,0^{s - x \cdot w})$, where $\bigodot$ denotes concatenation. Notice that $d_H(T, S)$ is exactly twice the number of subsets of $W$ whose sum is not equal to $t$, since each block with $x \cdot w \neq t$ differs from the corresponding block of $T$ in exactly two positions. We can easily construct an SLP $\mathbb{B}$ such that $\mathrm{eval}(\mathbb{B}) = T$.
  • 13. Compressed Hamming Distance. Proof (continued): consider the SLP $\mathbb{A}$ with the rules $A_1 \to 1\,0^{s+w_1}\,1$ and $A_{k+1} \to A_k\,0^{s - s_k + w_{k+1}}\,A_k$ for $1 \le k \le n-1$, where $s_k = w_1 + \dots + w_k$ and the start symbol is $A_n$. Using induction, we can prove that $\mathrm{eval}(\mathbb{A}) = \bigodot_{x \in \{0,1\}^n} (0^{x \cdot w}\,1\,0^{s - x \cdot w}) = S$. The size of $\mathbb{A}$ is polynomial in the length of the binary encoding of $w$, because each block of zeros can itself be generated by a small SLP. Thus we can compute the answer to #SUBSET SUM as $2^n - \frac{1}{2}\, d_H(T, S)$. □
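
The reduction can be sanity-checked on small instances by expanding the strings explicitly. The Python sketch below follows the rules for $A_1$ and $A_{k+1}$ literally and compares $2^n - \frac{1}{2} d_H(T, S)$ with a brute-force count. Two assumptions are made that the slide does not state: $s_k$ is taken to be the prefix sum $w_1 + \dots + w_k$, and $0 \le t \le s$. The function names are illustrative, and the strings are built uncompressed, so this only works for tiny inputs.

```python
from itertools import product


def build_S(w):
    """Expand A_n from A_1 -> 1 0^(s+w_1) 1 and A_{k+1} -> A_k 0^(s-s_k+w_{k+1}) A_k."""
    s = sum(w)
    a = "1" + "0" * (s + w[0]) + "1"          # A_1
    s_k = w[0]                                # assumed: s_k = w_1 + ... + w_k
    for k in range(1, len(w)):
        a = a + "0" * (s - s_k + w[k]) + a    # A_{k+1}
        s_k += w[k]
    return a


def build_T(w, t):
    """T = (0^t 1 0^(s-t)) repeated 2^n times; requires 0 <= t <= s."""
    s = sum(w)
    return ("0" * t + "1" + "0" * (s - t)) * 2 ** len(w)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))


def count_via_hamming(w, t):
    """Number of subsets summing to t, recovered as 2^n - d_H(T, S) / 2."""
    return 2 ** len(w) - hamming(build_T(w, t), build_S(w)) // 2


def count_brute_force(w, t):
    """Reference count by enumerating all x in {0,1}^n."""
    return sum(
        1
        for x in product((0, 1), repeat=len(w))
        if sum(xi * wi for xi, wi in zip(x, w)) == t
    )


print(count_via_hamming([1, 2, 3], 3), count_brute_force([1, 2, 3], 3))   # 2 2
```
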
  • 14. Fully Compressed Pattern Matching. In its most general form, the Pattern Matching Problem asks, for given strings $T$ and $P$, whether $P$ is a factor of $T$. Many linear-time algorithms exist for the uncompressed case (e.g. Knuth-Morris-Pratt). Definition (Fully Compressed Pattern Matching): given two SLPs $\mathbb{P}$ and $\mathbb{T}$, is $\mathrm{eval}(\mathbb{P})$ a factor of $\mathrm{eval}(\mathbb{T})$? An important observation that underlies most algorithms is that if a pattern $p$ is a factor of $\mathrm{eval}(\mathbb{T})$, then there exists a production $X \to YZ$ in $\mathbb{T}$ such that $p$ has an occurrence in $\mathrm{eval}_{\mathbb{T}}(X) = \mathrm{eval}_{\mathbb{T}}(Y)\,\mathrm{eval}_{\mathbb{T}}(Z)$ that touches the cut between $\mathrm{eval}_{\mathbb{T}}(Y)$ and $\mathrm{eval}_{\mathbb{T}}(Z)$.
  • 15. Fully Compressed Pattern Matching. It is a consequence of the Periodicity Lemma (Fine, Wilf, 1965) that the set of all starting positions of the occurrences of $p$ in $\mathrm{eval}_{\mathbb{T}}(X)$ that touch the cut of $X$ forms an arithmetic progression. Lifshits' algorithm, for example, computes for every nonterminal $A$ of the pattern SLP $\mathbb{P}$ and each nonterminal $X$ of the text SLP $\mathbb{T}$ the arithmetic progression corresponding to the occurrences of $\mathrm{eval}_{\mathbb{P}}(A)$ in $\mathrm{eval}_{\mathbb{T}}(X)$ that touch the cut of $X$.
  • 16. Fully Compressed Pattern Matching. These arithmetic progressions can be computed bottom-up, resulting in an overall time bound of $O(|\mathbb{P}|^2 |\mathbb{T}|)$. Jeż's algorithm beats that time bound with $O((|\mathbb{T}| + |\mathbb{P}|) \log(|\mathrm{eval}(\mathbb{P})|) \log(|\mathbb{P}| + |\mathbb{T}|)) = O(n^2 \log n)$, where $n = |\mathbb{P}| + |\mathbb{T}|$. For an uncompressed pattern $p$, we can achieve a bound of $O(|p| \cdot |\mathbb{T}|)$.
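
For the case of an uncompressed pattern, the cut observation leads to a simple bottom-up procedure, sketched below in Python. This is an illustration under stated assumptions rather than any specific published algorithm: the text SLP is assumed to be in Chomsky normal form, encoded as a dict mapping each nonterminal either to a single terminal character or to a pair of nonterminals, and `occurs` is a hypothetical name. For each nonterminal it keeps only the first and last $|p| - 1$ characters of the derived string, which is enough to detect every occurrence that touches a cut.

```python
def occurs(rules, start, p):
    """Decide whether the plain string p is a factor of eval(start)."""
    k = len(p) - 1
    if k < 0:
        return True                       # the empty pattern occurs everywhere
    memo = {}                             # X -> (prefix, suffix, found)

    def tail(s):
        return s[max(0, len(s) - k):]     # last k characters (all of s if shorter)

    def visit(x):
        if x in memo:
            return memo[x]
        rhs = rules[x]
        if isinstance(rhs, str):          # X -> a
            info = (rhs[:k], tail(rhs), rhs == p)
        else:                             # X -> Y Z
            py, sy, fy = visit(rhs[0])
            pz, sz, fz = visit(rhs[1])
            # An occurrence not inside Y or Z touches the cut, so it lies in
            # the last k characters of Y followed by the first k of Z.
            found = fy or fz or p in (sy + pz)
            info = ((py + pz)[:k], tail(sy + sz), found)
        memo[x] = info
        return info

    return visit(start)[2]


# Fibonacci SLP in Chomsky normal form; eval(A7) = abaababaabaab.
fib = {"A1": "b", "A2": "a"}
for i in range(3, 8):
    fib[f"A{i}"] = (f"A{i-1}", f"A{i-2}")

print(occurs(fib, "A7", "baab"))   # True
print(occurs(fib, "A7", "bb"))     # False
```

Each nonterminal is processed once on strings of length at most $2(|p| - 1)$, so with a linear-time substring search the total work is proportional to $|p|$ times the number of nonterminals. The recursion depth equals the depth of the grammar, so very deep grammars would need an iterative traversal.
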
  • 17. Subsequence Problems. In many applications, especially in Computational Biology, approximate occurrences of a pattern are more relevant than exact matches. Subsequence problems provide very useful similarity measures between sequences. Definition (Fully Compressed Subsequence Problem): given two SLPs $\mathbb{P}$ and $\mathbb{T}$, is $\mathrm{eval}(\mathbb{P})$ a subsequence of $\mathrm{eval}(\mathbb{T})$? Theorem. The Fully Compressed Subsequence Problem is in PSPACE and it is PP-hard.
  • 18. Querying Compressed Texts. Definition (Compressed Querying): given an SLP $\mathbb{A}$, a binary-coded number $i$ with $1 \le i \le |\mathrm{eval}(\mathbb{A})|$, and a symbol $a \in \Sigma$, does $\mathrm{eval}(\mathbb{A})[i] = a$ hold? Theorem. Compressed Querying is P-complete. The proof is a logspace reduction from the P-complete problem “super increasing subset sum”.
  • 19. References I.
     Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. IEEE Trans. Information Theory, 51(7):2554–2576, 2005.
     Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, NY, USA, 1997.
     Yury Lifshits. Processing compressed texts: A tractability border. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007, Proceedings, pages 228–240, 2007.
  • 20. References II.
     Markus Lohrey. Word problems and membership problems on compressed words. SIAM J. Comput., 35(5):1210–1240, 2006.
     Markus Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups Complexity Cryptology, 4(2):241–299, 2012.
     Markus Lohrey. Equality testing of compressed strings. In Combinatorics on Words - 10th International Conference, WORDS 2015, Kiel, Germany, September 14-17, 2015, Proceedings, pages 14–26, 2015.