Ultra-efficient algorithms for testing well-parenthesised expressions
Tatiana Starikovskaya (ENS Paris)
Joint work with Eldar Fischer (Technion) and Frédéric Magniez (Paris-Diderot)
WiMLDS Paris, November 24, 2017
Pattern matching: you use it every time you search for something
More general: algorithms on strings (= sequences of characters)
My research area
Applications
• Bioinformatics
• Information Retrieval
• …
Classical approaches
• We can read the whole input
• We can afford to store linear-space data structures
In the Big Data world, we must do better!
My research area
Streaming algorithms
We receive the input as a stream, and must
process it on-the-fly, without storing it
Property testing algorithms
We must decide if the input has a property P,
but we can read only a small part of the input
We need efficient algorithms for string processing!
Property testers
Wait a second! How can we make the decision without reading the whole input?
Well, in general, we cannot…
For example, we cannot tell whether the input is well-parenthesised by reading just a small fraction of it
Task: We must decide if the input has a property
P, but we can read only a small part of the input
Objective: Save time
()(()())()
()(()()(()
The queried parentheses are identical
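For contrast with the property-testing setting, here is a minimal sketch of the classical check that reads every character. It is a standard stack-based scan, not something taken from the talk, and the names `is_well_parenthesised`, `a`, `b` and `differing` are illustrative. On the two strings above it shows that a single position decides the answer, so an algorithm that never queries that position cannot tell them apart.

```python
# Classical linear-time check: reads the whole input (illustrative sketch).
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_well_parenthesised(s: str) -> bool:
    """Return True iff s is a well-parenthesised string over (), [], {}."""
    stack = []
    for ch in s:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closing parenthesis must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            raise ValueError(f"not a parenthesis: {ch!r}")
    return not stack


a = "()(()())()"
b = "()(()()(()"
# Positions (0-based) where the two example strings differ.
differing = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Here `differing` contains one index only, yet the check accepts `a` and rejects `b`.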
Property testers
We must
1. accept, if the input has the property P
2. reject, if the input is far from having the property
3. accept or reject otherwise
Far = we must fix at least εn characters of the input so that the property is satisfied
The output must be correct with probability at least 2/3
()(()())()
()(()()(((
()(()()(()
ε = 0.2, n = 10, εn = 2
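The example can be checked with a small sketch, under two assumptions that are mine rather than the talk's: "fixing" a character means substituting it, and there is a single parenthesis type (m = 1). In that case the number of substitutions needed is given by the well-known bracket-reversal count: cancel matched pairs, then repair the leftover closing and opening parentheses two at a time. The names `excess`, `distance_to_d1` and `is_eps_far` are illustrative.

```python
def excess(s: str) -> tuple[int, int]:
    """Return (unmatched closing, unmatched opening) counts of s over '(' and ')'."""
    closing = opening = 0
    for ch in s:
        if ch == "(":
            opening += 1
        elif ch == ")":
            if opening > 0:
                opening -= 1  # matches an earlier unmatched '('
            else:
                closing += 1
        else:
            raise ValueError(f"expected '(' or ')', got {ch!r}")
    return closing, opening


def distance_to_d1(s: str) -> int:
    """Minimum number of substitutions turning s into a balanced string (m = 1)."""
    if len(s) % 2:
        raise ValueError("odd length: substitutions alone cannot balance s")
    closing, opening = excess(s)
    # Leftover is ')' * closing + '(' * opening; each half is fixed pairwise.
    return (closing + 1) // 2 + (opening + 1) // 2


def is_eps_far(s: str, eps: float) -> bool:
    """True iff at least eps * n characters must be substituted (assumed notion of 'far')."""
    return distance_to_d1(s) >= eps * len(s)
```

With ε = 0.2 and n = 10, the string `()(()()(((` needs 2 substitutions and so counts as far, while `()(()()(()` needs only 1, so a tester may answer either way on it.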
Well-parenthesised expressions
Dm = well-balanced strings on parentheses of m types
Task: develop a property tester that decides whether
the input is in Dm
()([]())[]([]) ()(([][)()((([]
1. It accepts all inputs that are in Dm with
probability at least 2/3
2. It rejects all inputs that are ε-far from Dm
with probability at least 2/3
Time = number of read characters!
Why is it interesting?
Simplicity: simplest context-free language
Universality: any context-free language can be expressed through it (Chomsky-Schützenberger theorem)
Practicality: processing of semi-structured documents
• Visibly pushdown languages
• Nested strings
Dm = well-balanced strings on parentheses of m types
What do we know
()(()())()
()(()()(((
()([]())([])
()(([)()(([]
m = 1: T = const. [Alon et al.’01]
m ≥ 2: $c\,n^{1/11} < T < C\,n^{2/3}$ [Parnas et al.’03]
m ≥ 2: $c\,n^{1/5} < T < C\,n^{2/5+\delta}$ [NEW!]
New tester for Dm-membership
Dm = well-balanced strings on parentheses of m types
Hmmm… does not look like a simple property to test!
Let’s start with a property tester for strawberries
()({()})([]){((([]())([])([{}]())}([])))
red
sweet
yellow seeds
simple properties, easy to test!
New tester for Dm-membership
If we replace all opening parentheses with (, and all closing
parentheses with ), the resulting string must be in D1
And we know how to test in O(1) time [Alon et al.’01]!
()({()})([]){((([]())([])([{}]()))([]))}
becomes
()((()))(())((((()())(())((())()))(())))
Not sufficient: ()({{)}) becomes ()((()))
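The necessary condition above can be sketched as follows. This version reads the whole string, so it only illustrates the reduction to D1 and not the O(1)-query test of Alon et al.; the names `erase_types` and `in_dyck` are illustrative.

```python
CLOSER_OF = {"(": ")", "[": "]", "{": "}"}
OPENER_OF = {closer: opener for opener, closer in CLOSER_OF.items()}


def erase_types(s: str) -> str:
    """Replace every opening parenthesis with '(' and every closing one with ')'."""
    out = []
    for ch in s:
        if ch in CLOSER_OF:
            out.append("(")
        elif ch in OPENER_OF:
            out.append(")")
        else:
            raise ValueError(f"not a parenthesis: {ch!r}")
    return "".join(out)


def in_dyck(s: str) -> bool:
    """Exact membership check for well-balanced strings (reads all of s)."""
    stack = []
    for ch in s:
        if ch in CLOSER_OF:
            stack.append(ch)
        elif ch in OPENER_OF:
            if not stack or stack.pop() != OPENER_OF[ch]:
                return False
        else:
            raise ValueError(f"not a parenthesis: {ch!r}")
    return not stack


bad = "()({{)})"
erased = erase_types(bad)
```

The string `bad` is not well balanced, yet `erased` is in D1, which is why this condition is necessary but not sufficient.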
New tester for Dm-membership
Each block must be Dm-consistent, i.e. a substring of a string in Dm
We test that the blocks are Dm-consistent by running our Dm-test in a recursive fashion
()({()})([ | ]){((([]() | )([])([{}] | ()))([]))}
Each block has length $b = n^{4/5}$
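To make the notion of Dm-consistency concrete, here is a sketch of an exact check that reads a whole block. It is not the recursive tester from the talk: it only spells out the definition, using the fact that a block is a substring of some well-balanced string exactly when cancelling matched pairs never meets a type mismatch. The names `split_into_blocks` and `is_consistent` are illustrative, and the block length 10 below mirrors the picture rather than b = n^(4/5).

```python
CLOSER_OF = {"(": ")", "[": "]", "{": "}"}
OPENER_OF = {closer: opener for opener, closer in CLOSER_OF.items()}


def split_into_blocks(s: str, b: int) -> list[str]:
    """Cut s into consecutive blocks of length b (the last one may be shorter)."""
    if b <= 0:
        raise ValueError("block length must be positive")
    return [s[i:i + b] for i in range(0, len(s), b)]


def is_consistent(block: str) -> bool:
    """True iff block is a substring of some well-balanced string."""
    stack = []
    for ch in block:
        if ch in CLOSER_OF:
            stack.append(ch)
        elif ch in OPENER_OF:
            # With an empty stack the closer is matched outside the block: fine.
            if stack and stack.pop() != OPENER_OF[ch]:
                return False
        else:
            raise ValueError(f"not a parenthesis: {ch!r}")
    return True  # leftover openers may be matched to the right of the block


text = "()({()})([]){((([]())([])([{}]()))([]))}"
blocks = split_into_blocks(text, 10)
```

Every block of `text` passes the check, while a block such as `{)` does not, because `)` would have to close `{`.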
New tester for Dm-membership
We have checked that the string is good locally, but can we
guarantee that it is good globally?
New tester for Dm-membership
Approximate matching graph: nodes = blocks; there is an edge (B1, B2) if many excess parentheses in block B1 must be matched with excess parentheses in block B2
New tester for Dm-membership
1. Build an approximate matching graph
2. Run a recursive inter-block matching procedure
()({()})([ ]){((([]() )([])([{}] ())}([])))
S = ]){((([]()
S w/o types = ))((((()()
S w/o types, padded to a string in D1 = (())((((()()))))
e1(S) = 2
e0(S) = 4
e1(S) - excess closing parentheses
e0(S) - excess opening parentheses
$T_1, T_2, \ldots, T_{n/b}$ - blocks of the input
Number of parentheses in $T_i$ that must be matched with parentheses in $T_j$:
$\min(e_0(T_i),\ e_1(T_{i+1} T_{i+2} \cdots T_j)) - e_1(T_{i+1} T_{i+2} \cdots T_{j-1})$
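The formula can be evaluated exactly on small inputs with the sketch below. It works on strings with types already erased and reads every character, so it illustrates what the matching graph approximates rather than how the tester computes it. Clamping the result at zero is my assumption: as written, the expression becomes negative once the blocks between $T_i$ and $T_j$ already use up all excess openings of $T_i$. The names `excess` and `must_match` are illustrative, and block indices are 0-based.

```python
def excess(s: str) -> tuple[int, int]:
    """Return (e0, e1): excess opening and excess closing parentheses of s."""
    e0 = e1 = 0
    for ch in s:
        if ch == "(":
            e0 += 1
        elif ch == ")":
            if e0 > 0:
                e0 -= 1  # matched inside s
            else:
                e1 += 1
        else:
            raise ValueError(f"expected '(' or ')', got {ch!r}")
    return e0, e1


def must_match(blocks: list[str], i: int, j: int) -> int:
    """Openings of blocks[i] that must be matched with closings of blocks[j], i < j."""
    if not 0 <= i < j < len(blocks):
        raise ValueError("need 0 <= i < j < len(blocks)")
    e0_i, _ = excess(blocks[i])
    between = "".join(blocks[i + 1:j])      # T_{i+1} ... T_{j-1}
    _, e1_between = excess(between)
    _, e1_upto_j = excess(between + blocks[j])  # T_{i+1} ... T_j
    # max(0, ...) is an added safeguard, not part of the slide's formula.
    return max(0, min(e0_i, e1_upto_j) - e1_between)


example_blocks = ["((((", "()))", "))"]
```

For `example_blocks`, two of the four openings of the first block are closed in the second block and the other two in the third.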
Observation: $e_1(S) = \max_{S' \text{ prefix of } S}\,(n_1(S') - n_0(S'))$
$n_1(S')$ = number of closing parentheses in $S'$
$n_0(S')$ = number of opening parentheses in $S'$
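A short sketch of the observation, assuming the empty prefix is included in the maximum (so the value is never negative) and that the string contains only `(` and `)`. The cross-check against pair cancellation is my addition; the names `e1_by_prefixes` and `e1_by_cancellation` are illustrative.

```python
def e1_by_prefixes(s: str) -> int:
    """Excess closing parentheses as max over prefixes of n1 - n0."""
    best = 0   # value for the empty prefix
    diff = 0   # n1(S') - n0(S') for the current prefix S'
    for ch in s:
        if ch == ")":
            diff += 1
        elif ch == "(":
            diff -= 1
        else:
            raise ValueError(f"expected '(' or ')', got {ch!r}")
        best = max(best, diff)
    return best


def e1_by_cancellation(s: str) -> int:
    """Reference value: count closings left unmatched after cancelling pairs."""
    unmatched_open = unmatched_close = 0
    for ch in s:
        if ch == "(":
            unmatched_open += 1
        elif unmatched_open > 0:
            unmatched_open -= 1
        else:
            unmatched_close += 1
    return unmatched_close
```

The prefix form matters for the tester because it expresses the excess through counts of closing and opening parentheses in substrings, which can be estimated by sampling.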
Lemma By querying $x^2/\Delta^2$ positions of a string S of length x, we can compute a Δ-additive approximation of $n_1(S')$ for any substring S’ of S correctly w.h.p.
Proof
Query $x^2/\Delta^2$ positions of S uniformly at random
If $|S'| \le \Delta$, output Δ
Otherwise, $|S'| = y\Delta$, where $y > 1$
S’ contains $\approx yx/\Delta$ of the queried positions
Proof (cont.)
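$X_i = 1$ iff the i-th queried position is a closing parenthesis
$E\left[\frac{\Delta^2}{x} \sum_i X_i\right] = \frac{\Delta^2}{x} \cdot n_1(S') \cdot \frac{yx/\Delta}{y\Delta} = n_1(S')$
By additive Chernoff bound,
$P\left[\,\left|\frac{\Delta^2}{x} \sum_i X_i - n_1(S')\right| > \Delta\right] < 2e^{-2}$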
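The estimator from the proof can be sketched as below, under a few assumptions of mine: positions are sampled with replacement, a sample counts towards a substring only if it falls inside it and holds `)`, and the scale factor is written as x/q so that rounding the number of queries q up to an integer keeps the estimate unbiased (for q = x²/Δ² this equals Δ²/x). The class name `ClosingCounter` and its methods are hypothetical.

```python
import math
import random
from bisect import bisect_left


class ClosingCounter:
    """Estimates the number of ')' in substrings of s from one random sample."""

    def __init__(self, s: str, delta: float, rng: random.Random):
        if not s or delta <= 0:
            raise ValueError("need a non-empty string and delta > 0")
        self.x = len(s)
        self.delta = delta
        self.q = math.ceil(self.x * self.x / (delta * delta))  # number of queries
        sampled = [rng.randrange(self.x) for _ in range(self.q)]
        # Only the sampled positions of s are read; keep those holding ')'.
        self.closing_hits = sorted(p for p in sampled if s[p] == ")")

    def estimate(self, lo: int, hi: int) -> float:
        """Estimate the number of ')' in s[lo:hi]."""
        if not 0 <= lo <= hi <= self.x:
            raise ValueError("need 0 <= lo <= hi <= len(s)")
        if hi - lo <= self.delta:
            return self.delta  # any value in [0, delta] is delta-close
        hits = bisect_left(self.closing_hits, hi) - bisect_left(self.closing_hits, lo)
        return self.x * hits / self.q


counter = ClosingCounter("()" * 500, delta=50, rng=random.Random(0))
whole = counter.estimate(0, 1000)
part = counter.estimate(200, 700)
```

The same sample answers queries for every substring, which is what lets one set of queries serve all block ranges at once.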
New tester for Dm-membership
1. Build an approximate matching graph
2. Run a recursive inter-block matching procedure
If we replace all opening parentheses with (, and all
closing parentheses with ), the resulting string ∈ D1
Test that the blocks are Dm-consistent by running
the test in a recursive fashion
Complexity: $O(n^{2/5})$
Costs annotated on the slide: $O(1)$, $O(\sqrt{b})$, $O(n^2/b^2)$, where $b = n^{4/5}$
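The exponent arithmetic behind this bound can be checked with exact fractions. The sketch assumes only the block length b = n^(4/5) and the two b-dependent cost terms shown on the slide; which step of the tester each term belongs to is not stated in the transcript. Variable names are illustrative.

```python
from fractions import Fraction

# Exponents of n, with b = n^(4/5).
block_exp = Fraction(4, 5)

# sqrt(b) = n^(block_exp / 2)
sqrt_b_exp = block_exp / 2

# n^2 / b^2 = n^(2 - 2 * block_exp)
sampling_exp = 2 - 2 * block_exp

# Block-length exponent at which the two terms balance: e / 2 = 2 - 2e.
balanced_exp = Fraction(2) / (Fraction(1, 2) + 2)
```

Both terms come out as n^(2/5), and 4/5 is exactly the exponent at which they are equal.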
Take-home message
• Streaming or property testing settings
• We have new, ultra-efficient algorithms for string processing
• It is enough to use polylogarithmic space, or to read a constant number of data items of the input, to solve a problem with good guarantees
Questions? Comments?

Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 

Recently uploaded (20)

Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 

Ultra-efficient algorithms for testing well-parenthesised expressions by Tatiana Starikovskaya

  • 1. Ultra-efficient algorithms for testing well-parenthesised expressions Tatiana Starikovskaya (ENS Paris) Joint work with Eldar Fischer (Technion) and Frédéric Magniez (Paris-Diderot) WiMLDS Paris, November 24, 2017
  • 2. Pattern matching: you use it every time you search for something More general: algorithms on strings (= sequences of characters) My research area
  • 3. My research area Applications • Bioinformatics • Information Retrieval • … Classical approaches • We can read the whole input • We can afford to store linear-space data structures In the Big Data world, we must do better!
  • 4. My research area Streaming algorithms: We receive the input as a stream, and must process it on-the-fly, without storing it. Property testing algorithms: We must decide if the input has a property P, but we can read only a small part of the input. We need efficient algorithms for string processing!
  • 5. Property testers Wait a second! How can we make the decision without reading the whole input? Well, in general, we cannot… For example, we cannot say if the input is well-parenthesised by reading just a small fraction of it: ()(()())() and ()(()()(() look the same when the queried parentheses are identical. Task: We must decide if the input has a property P, but we can read only a small part of the input. Objective: Save time
  • 6. Property testers We must 1. accept, if the input has the property P 2. reject, if the input is far from having the property 3. accept or reject otherwise. Far = we must fix at least εn characters of the input so that the property is satisfied. The output must be correct with probability at least 2/3. Task: We must decide if the input has a property P, but we can read only a small part of the input. Objective: Save time. Example: ()(()())() ()(()()((( ()(()()(() with ε = 0.2, n = 10, εn = 2
  • 7. Well-parenthesised expressions Dm = well-balanced strings on parentheses of m types Task: develop a property tester that decides whether the input is in Dm ()([]())[]([]) ()(([][)()((([] 1. It accepts all inputs that are in Dm with probability at least 2/3 2. It rejects all inputs that are ε-far from Dm with probability at least 2/3 Time = number of read characters!
  • 8. Simplicity: simplest context-free language Universality: any context-free language can be expressed through it (Chomsky-Schützenberger theorem) Practicality: processing of semi-structured documents • Visibly pushdown languages • Nested strings Why is it interesting? Dm = well-balanced strings on parentheses of m types
  • 9. What do we know (time T of testing Dm-membership). m = 1: const. (Alon et al.’01), examples ()(()())() ()(()()(((. m ≥ 2: $c\,n^{1/11} < T < C\,n^{2/3}$ (Parnas et al.’03), examples ()([]())([]) ()(([)()(([]. NEW! $c\,n^{1/5} < T < C\,n^{2/5+\delta}$. Dm = well-balanced strings on parentheses of m types
  • 10. New tester for Dm-membership Dm = well-balanced strings on parentheses of m types Hmmm… does not look like a simple property to test! Let’s start with a property tester for strawberries ()({()})([]){((([]())([])([{}]())}([]))) red sweet yellow seeds simple properties, easy to test! ?
  • 11. New tester for Dm-membership Dm = well-balanced strings on parentheses of m types. If we replace all opening parentheses with (, and all closing parentheses with ), the resulting string must be in D1: ()({()})([]){((([]())([])([{}]()))([]))} becomes ()((()))(())((((()())(())((())()))(()))). And we know how to test in O(1) time [Alon et al.’01]! Not sufficient: ()({{)}) becomes ()((()))
  • 12. New tester for Dm-membership Dm = well-balanced strings on parentheses of m types. Each block is Dm-consistent = is a substring of a string in Dm. We test that the blocks are Dm-consistent by running our Dm-test in a recursive fashion. Blocks of length $b = n^{4/5}$: ()({()})([ ]){((([]() )([])([{}] ()))([]))}
  • 13. New tester for Dm-membership Dm = well-balanced strings on parentheses of m types. We have checked that the string is good locally, but can we guarantee that it is good globally? Blocks of length $b = n^{4/5}$: ()({()})([ ]){((([]() )([])([{}] ()))([]))}
  • 14. New tester for Dm-membership Dm = well-balanced strings on parentheses of m types. Approximate matching graph: nodes = blocks, edge (B1,B2) = many excess parentheses in block B1 must be matched with excess parentheses in block B2. Blocks of length $b = n^{4/5}$: ()({()})([ ]){((([]() )([])([{}] ()))([]))}
  • 15. New tester for Dm-membership Dm = well-balanced strings on parentheses of m types. 1. Build an approximate matching graph 2. Run a recursive inter-block matching procedure. Blocks of length $b = n^{4/5}$: ()({()})([ ]){((([]() )([])([{}] ()))([]))}
  • 16. 1. Build an approximate matching graph 2. Run a recursive inter-block matching procedure. Input: ()({()})([ ]){((([]() )([])([{}] ())}([]))). Example block: S = ]){((([](), S w/o types = ))((((()(), completed to a string in D1: (())((((()())))), so $e_1(S) = 2$, $e_0(S) = 4$. $e_1(S)$ - excess closing parentheses, $e_0(S)$ - excess opening parentheses. $T_1, T_2, \dots, T_{n/b}$ - blocks of the input. Parentheses in $T_i$ that must be matched with parentheses in $T_j$: $\min(e_0(T_i), e_1(T_{i+1}T_{i+2}\dots T_j)) - e_1(T_{i+1}T_{i+2}\dots T_{j-1})$
  • 17. 1. Build an approximate matching graph 2. Run a recursive inter-block matching procedure. Example block: S = ]){((([](), S w/o types = ))((((()(), completed to a string in D1: (())((((()())))), so $e_1(S) = 2$, $e_0(S) = 4$. Observation: $e_1(S) = \max_{S' \text{ prefix of } S} (n_1(S') - n_0(S'))$, where $n_1(S')$ = |closing parentheses in S'| and $n_0(S')$ = |opening parentheses in S'|. Lemma: By querying $x^2/\Delta^2$ positions of a string S of length x, we can compute a Δ-additive approximation of $n_1(S')$ for any substring S' of S correctly w.h.p.
  • 18. 1. Build an approximate matching graph 2. Run a recursive inter-block matching procedure. Lemma: By querying $x^2/\Delta^2$ positions of a string S of length x, we can compute a Δ-additive approximation of $n_1(S')$ for any substring S' of S correctly w.h.p. Proof: Query $x^2/\Delta^2$ positions of S uniformly at random. If $|S'| \le \Delta$, output Δ. Otherwise, $|S'| = y\Delta$, where $y > 1$, and S' contains $\sim yx/\Delta$ of the queried positions
  • 19. 1. Build an approximate matching graph 2. Run a recursive inter-block matching procedure. Lemma: By querying $x^2/\Delta^2$ positions of a string S of length x, we can compute a Δ-additive approximation of $n_1(S')$ for any substring S' of S correctly w.h.p. Proof (cont.): $X_i = 1$ iff the i-th queried position is a closing parenthesis. $E[(\Delta^2/x) \cdot \sum X_i] = (\Delta^2/x) \cdot n_1(S') \cdot (yx/\Delta)/(y\Delta) = n_1(S')$. By additive Chernoff bound, $P[\,|(\Delta^2/x) \cdot \sum X_i - n_1(S')| > \Delta\,] < 2e^{-2}$
  • 20. New tester for Dm-membership 1. Build an approximate matching graph 2. Run a recursive inter-block matching procedure. If we replace all opening parentheses with (, and all closing parentheses with ), the resulting string ∈ D1. Test that the blocks are Dm-consistent by running the test in a recursive fashion. Blocks of length $b = n^{4/5}$: ()({()})([ ]){((([]() )([])([{}] ())}([]))). Costs shown on the slide: O(1), $O(\sqrt{b})$, $O(n^2/b^2)$. Complexity: $O(n^{2/5})$
  • 21. Take-home message • Streaming or property testing settings • We have new, ultra-efficient algorithms for string processing • It is enough to use polylogarithmic space or to read a constant number of data items in the input to solve a problem with good guarantees. Questions? Comments?
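
The building blocks named on slides 11 and 16 to 19 can be sketched in Python. This is a minimal illustration, not the authors' tester: all function names are hypothetical, the input is assumed to contain only the characters ()[]{}, and `in_dm` is the classical stack check that reads the whole input, included only as an exact baseline. `erase_types` is the reduction to D1 from slide 11, `excess` computes e1(S) and e0(S) using the prefix-maximum observation from slide 17, and `sample_positions` with `estimate_closing` follow the sampling idea of the lemma (query about x²/Δ² random positions once, then rescale the count of closing parentheses that fall inside any substring).

```python
import math
import random

OPENING = "([{"
PAIRS = {")": "(", "]": "[", "}": "{"}


def in_dm(s):
    """Exact Dm-membership with a stack. Reads all of s: a baseline, not a tester."""
    stack = []
    for c in s:
        if c in OPENING:
            stack.append(c)
        elif not stack or stack.pop() != PAIRS[c]:
            return False
    return not stack


def erase_types(s):
    """Replace every opening parenthesis with '(' and every closing one with ')'."""
    return "".join("(" if c in OPENING else ")" for c in s)


def excess(s):
    """Return (e1, e0): excess closing and excess opening parentheses, ignoring types."""
    balance = 0  # number of opening minus number of closing parentheses so far
    e1 = 0       # max over prefixes of (closing - opening), as in the observation
    for c in s:
        balance += 1 if c in OPENING else -1
        e1 = max(e1, -balance)
    return e1, balance + e1


def sample_positions(s, delta, rng):
    """Query about x^2/delta^2 positions of s uniformly at random (with repetition)."""
    x = len(s)
    k = math.ceil((x / delta) ** 2)
    return [(p, s[p]) for p in (rng.randrange(x) for _ in range(k))]


def estimate_closing(sample, x, lo, hi):
    """Estimate the number of closing parentheses in s[lo:hi] from the sample.

    The count of sampled closing parentheses inside [lo, hi) is rescaled by
    x / (number of queries), which equals delta^2 / x when k = x^2 / delta^2.
    """
    hits = sum(1 for p, c in sample if lo <= p < hi and c not in OPENING)
    return hits * x / len(sample)
```

For the block S = ]){((([]() from slide 16, `excess(S)` returns `(2, 4)`, matching e1(S) = 2 and e0(S) = 4. The string ()({{)}) from slide 11 shows why the D1 test alone is not sufficient: `erase_types` maps it to ()((())), which has no excess parentheses, while `in_dm` rejects the original.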