SlideShare a Scribd company logo
A High-Performance Approach to String
Similarity using Most Frequent K Characters
”Correctness and Completeness”
Andre Valdestilhas and Axel-Cyrille Ngonga Ngomo
Universit¨at Leipzig, AKSW {lastname}@informatik.uni-leipzig.de
1 Correctness and Completeness
In this section, we prove formally that our MFKC does so on any pair of datasets.
We achieve this goal by proving that our MFKC is both correct and complete,
where:
– We say that an approach is correct if the output O it returns is such that
O ⊆ R(Ds, Dt, σ, θ).
– Approaches are said to be complete if their output O is a superset of R(Ds, Dt, σ, θ),
i.e., O ⊇ R(Ds, Dt, σ, θ).
Our MFKC consists of three nested filters, each of which creates a subset of
pairs.
A ⊆ L ⊆ N ⊆ Ds × Dt (1)
For the purpose of clearness, we name each filtering rule:
R1 ⇔
h1(s, k)k + |t|
|s| + |t|
< θ
R2 ⇔ |h(s, k) ∩ h(t, k)| = 0
R3 ⇔ σ(s, t) ≥ θ
Each subset of our MFKC can be redefined as follows:
N = { s, t /∈ Ds × Dt : R1} (2)
L = { s, t ∈ Ds × Dt : R1 ∧ R2} (3)
A = { s, t ∈ Ds × Dt : R1 ∧ R2 ∧ R3} (4)
We then introduce A∗
as the set of pairs whose similarity score is more or
equal than the threshold θ.
A∗
= { s, t ∈ Ds × Dt : σ(s, t) ≥ θ} = { s, t ∈ Ds × Dt : R3} (5)
Lemma 1. Our MFKC is correct and complete.
Proof. This lemma is equivalent to showing that A = A∗
. Let us consider all the
pairs in A. Our MFKC’s correctness follows directly from eq. (4). All we need
to show now is our MFKC’s completeness, i.e. that none of pairs discarded by
the filters actually belongs to A∗
.
Assuming that the hashes are sorted in a descending way according the fre-
quencies of the characters, therefore the first element of each hash has the highest
frequency. Therefore, once we have
h1(s, k)k + |t|
|s| + |t|
< θ,
the pair of strings s and t can be discarded without calculating the entire simi-
larity.
When the rule R3 applies, we have σ(s, t) < θ.
Which leads to R3 ⇒ R1 thus, set A can be rewritten as:
A = { s, t ∈ Ds × Dt : R2 ∧ R3} (6)
We are given two strings s and t and the respective hashes h(s, k) and h(t, k),
the intersection between the characters of these two hashes is a empty set. There-
fore, there is no character matching, which implies that s and t cannot be con-
sidered to have a similarity score greater or equal the threshold θ:
h(s, k) ∩ h(t, k) = ∅ ⇒ σ(s, t) = 0
When the rule R3 applies, we have σ(s, t) < θ.
Which leads to R3 ⇒ R2 thus, set A can be rewritten as:
A = { s, t ∈ Ds × Dt : R3} (7)
which is the definition of A∗
eq. (5). Thus, A = A∗
.

More Related Content

What's hot

Quick sort
Quick sortQuick sort
Quick sort
AreenGaur
 
Knuth morris pratt string matching algo
Knuth morris pratt string matching algoKnuth morris pratt string matching algo
Knuth morris pratt string matching algo
sabiya sabiya
 
Chapter -3 Logarithms
Chapter -3 LogarithmsChapter -3 Logarithms
Chapter -3 Logarithms
GpmMaths
 
Rabin Karp Algorithm
Rabin Karp AlgorithmRabin Karp Algorithm
Rabin Karp Algorithm
Sohail Ahmed
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
Mahdi Esmailoghli
 
Find Transitive Closure Using Floyd-Warshall Algorithm
Find Transitive Closure Using Floyd-Warshall AlgorithmFind Transitive Closure Using Floyd-Warshall Algorithm
Find Transitive Closure Using Floyd-Warshall Algorithm
Rajib Roy
 
String matching Algorithm by Foysal
String matching Algorithm by FoysalString matching Algorithm by Foysal
String matching Algorithm by Foysal
Foysal Mahmud
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
Dr Shashikant Athawale
 
String matching algorithm
String matching algorithmString matching algorithm
String matching algorithm
Alokeparna Choudhury
 
The gamma function
The gamma functionThe gamma function
The gamma function
Fatemeh Habibi
 
String matching algorithms-pattern matching.
String matching algorithms-pattern matching.String matching algorithms-pattern matching.
String matching algorithms-pattern matching.
Swapan Shakhari
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
NETAJI SUBHASH ENGINEERING COLLEGE , KOLKATA
 
25 String Matching
25 String Matching25 String Matching
25 String Matching
Andres Mendez-Vazquez
 
KMP Pattern Matching algorithm
KMP Pattern Matching algorithmKMP Pattern Matching algorithm
KMP Pattern Matching algorithm
Kamal Nayan
 
Inverse laplacetransform
Inverse laplacetransformInverse laplacetransform
Inverse laplacetransform
Tarun Gehlot
 
Naive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer ScienceNaive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer Science
Transweb Global Inc
 
Kmp
KmpKmp
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.
Malek Sumaiya
 
String searching
String searching String searching
String searching
thinkphp
 
Chpt9 patternmatching
Chpt9 patternmatchingChpt9 patternmatching
Chpt9 patternmatching
dbhanumahesh
 

What's hot (20)

Quick sort
Quick sortQuick sort
Quick sort
 
Knuth morris pratt string matching algo
Knuth morris pratt string matching algoKnuth morris pratt string matching algo
Knuth morris pratt string matching algo
 
Chapter -3 Logarithms
Chapter -3 LogarithmsChapter -3 Logarithms
Chapter -3 Logarithms
 
Rabin Karp Algorithm
Rabin Karp AlgorithmRabin Karp Algorithm
Rabin Karp Algorithm
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
Find Transitive Closure Using Floyd-Warshall Algorithm
Find Transitive Closure Using Floyd-Warshall AlgorithmFind Transitive Closure Using Floyd-Warshall Algorithm
Find Transitive Closure Using Floyd-Warshall Algorithm
 
String matching Algorithm by Foysal
String matching Algorithm by FoysalString matching Algorithm by Foysal
String matching Algorithm by Foysal
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
String matching algorithm
String matching algorithmString matching algorithm
String matching algorithm
 
The gamma function
The gamma functionThe gamma function
The gamma function
 
String matching algorithms-pattern matching.
String matching algorithms-pattern matching.String matching algorithms-pattern matching.
String matching algorithms-pattern matching.
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
 
25 String Matching
25 String Matching25 String Matching
25 String Matching
 
KMP Pattern Matching algorithm
KMP Pattern Matching algorithmKMP Pattern Matching algorithm
KMP Pattern Matching algorithm
 
Inverse laplacetransform
Inverse laplacetransformInverse laplacetransform
Inverse laplacetransform
 
Naive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer ScienceNaive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer Science
 
Kmp
KmpKmp
Kmp
 
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.
 
String searching
String searching String searching
String searching
 
Chpt9 patternmatching
Chpt9 patternmatchingChpt9 patternmatching
Chpt9 patternmatching
 

Viewers also liked

Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Christophe Debruyne
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Muhammad Saleem
 
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic MinerAutomatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
Francesco Osborne
 
Towards maintainable constraint validation and repair for taxonomies: The Poo...
Towards maintainable constraint validation and repair for taxonomies: The Poo...Towards maintainable constraint validation and repair for taxonomies: The Poo...
Towards maintainable constraint validation and repair for taxonomies: The Poo...
Monika Solanki
 
Knowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with FramesKnowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with Frames
Valentina Presutti
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Ivan Ermilov
 
Interoperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT worldInteroperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT world
Monika Solanki
 
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
SisInfLab-SWoT @Politecnico di Bari
 
Enabling combined Software and Data engineering at Web-scale
Enabling combined Software and Data engineering at Web-scaleEnabling combined Software and Data engineering at Web-scale
Enabling combined Software and Data engineering at Web-scale
Monika Solanki
 
A replication study of the top performing systems in SemEval twitter sentimen...
A replication study of the top performing systems in SemEval twitter sentimen...A replication study of the top performing systems in SemEval twitter sentimen...
A replication study of the top performing systems in SemEval twitter sentimen...
Raphael Troncy
 
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Chris Bizer
 
DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13
Bradley Allen
 
Analytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionAnalytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolution
Deloitte United States
 
26 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 201826 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 2018
Brian Solis
 

Viewers also liked (14)

Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
 
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic MinerAutomatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
 
Towards maintainable constraint validation and repair for taxonomies: The Poo...
Towards maintainable constraint validation and repair for taxonomies: The Poo...Towards maintainable constraint validation and repair for taxonomies: The Poo...
Towards maintainable constraint validation and repair for taxonomies: The Poo...
 
Knowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with FramesKnowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with Frames
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
 
Interoperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT worldInteroperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT world
 
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
 
Enabling combined Software and Data engineering at Web-scale
Enabling combined Software and Data engineering at Web-scaleEnabling combined Software and Data engineering at Web-scale
Enabling combined Software and Data engineering at Web-scale
 
A replication study of the top performing systems in SemEval twitter sentimen...
A replication study of the top performing systems in SemEval twitter sentimen...A replication study of the top performing systems in SemEval twitter sentimen...
A replication study of the top performing systems in SemEval twitter sentimen...
 
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
 
DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13
 
Analytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionAnalytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolution
 
26 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 201826 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 2018
 

Similar to Iswc 2016 completeness correctude

Bag of Pursuits and Neural Gas for Improved Sparse Codin
Bag of Pursuits and Neural Gas for Improved Sparse CodinBag of Pursuits and Neural Gas for Improved Sparse Codin
Bag of Pursuits and Neural Gas for Improved Sparse Codin
Karlos Svoboda
 
MC0082 –Theory of Computer Science
MC0082 –Theory of Computer ScienceMC0082 –Theory of Computer Science
MC0082 –Theory of Computer Science
Aravind NC
 
International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)
inventionjournals
 
Regular expression
Regular expressionRegular expression
Regular expression
Rajon
 
leanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL ReasonerleanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL Reasoner
Adriano Melo
 
Common fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spacesCommon fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spaces
Alexander Decker
 
Improved helsgaun
Improved helsgaunImproved helsgaun
Improved helsgaun
Kaal Nath
 
Unequal-Cost Prefix-Free Codes
Unequal-Cost Prefix-Free CodesUnequal-Cost Prefix-Free Codes
Unequal-Cost Prefix-Free Codes
Daniel Bulhosa Solórzano
 
Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.
Alexander Decker
 
04_AJMS_254_19.pdf
04_AJMS_254_19.pdf04_AJMS_254_19.pdf
04_AJMS_254_19.pdf
BRNSS Publication Hub
 
Lec12
Lec12Lec12
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet SeriesOn Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
IOSR Journals
 
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
BRNSS Publication Hub
 
Lecture10 Signal and Systems
Lecture10 Signal and SystemsLecture10 Signal and Systems
Lecture10 Signal and Systems
babak danyal
 
Programming Exam Help
Programming Exam Help Programming Exam Help
Programming Exam Help
Programming Exam Help
 
Query optimization in database
Query optimization in databaseQuery optimization in database
Query optimization in database
Rakesh Kumar
 
Design and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment HelpDesign and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment Help
Programming Homework Help
 
Master Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural NetworksMaster Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural Networks
Alina Leidinger
 
Design and analysis of ra sort
Design and analysis of ra sortDesign and analysis of ra sort
Design and analysis of ra sort
ijfcstjournal
 
20070823
2007082320070823
20070823
neostar
 

Similar to Iswc 2016 completeness correctude (20)

Bag of Pursuits and Neural Gas for Improved Sparse Codin
Bag of Pursuits and Neural Gas for Improved Sparse CodinBag of Pursuits and Neural Gas for Improved Sparse Codin
Bag of Pursuits and Neural Gas for Improved Sparse Codin
 
MC0082 –Theory of Computer Science
MC0082 –Theory of Computer ScienceMC0082 –Theory of Computer Science
MC0082 –Theory of Computer Science
 
International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)
 
Regular expression
Regular expressionRegular expression
Regular expression
 
leanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL ReasonerleanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL Reasoner
 
Common fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spacesCommon fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spaces
 
Improved helsgaun
Improved helsgaunImproved helsgaun
Improved helsgaun
 
Unequal-Cost Prefix-Free Codes
Unequal-Cost Prefix-Free CodesUnequal-Cost Prefix-Free Codes
Unequal-Cost Prefix-Free Codes
 
Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.
 
04_AJMS_254_19.pdf
04_AJMS_254_19.pdf04_AJMS_254_19.pdf
04_AJMS_254_19.pdf
 
Lec12
Lec12Lec12
Lec12
 
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet SeriesOn Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
 
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
 
Lecture10 Signal and Systems
Lecture10 Signal and SystemsLecture10 Signal and Systems
Lecture10 Signal and Systems
 
Programming Exam Help
Programming Exam Help Programming Exam Help
Programming Exam Help
 
Query optimization in database
Query optimization in databaseQuery optimization in database
Query optimization in database
 
Design and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment HelpDesign and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment Help
 
Master Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural NetworksMaster Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural Networks
 
Design and analysis of ra sort
Design and analysis of ra sortDesign and analysis of ra sort
Design and analysis of ra sort
 
20070823
2007082320070823
20070823
 

More from André Valdestilhas

NaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfNaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdf
André Valdestilhas
 
Presentation MSE 2022
Presentation MSE 2022Presentation MSE 2022
Presentation MSE 2022
André Valdestilhas
 
More Complete Resultset Retrieval from Large Heterogeneous RDF Sources
More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesMore Complete Resultset Retrieval from Large Heterogeneous RDF Sources
More Complete Resultset Retrieval from Large Heterogeneous RDF Sources
André Valdestilhas
 
Eswc2018 wimu slides
Eswc2018 wimu slidesEswc2018 wimu slides
Eswc2018 wimu slides
André Valdestilhas
 
Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017
André Valdestilhas
 
WASOTA (poster)
WASOTA (poster)WASOTA (poster)
WASOTA (poster)
André Valdestilhas
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
André Valdestilhas
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
André Valdestilhas
 
Emotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesEmotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resources
André Valdestilhas
 
Using semiotic profile
Using semiotic profileUsing semiotic profile
Using semiotic profile
André Valdestilhas
 
Emotion-oriented computing: Possible uses and applications
Emotion-oriented computing: Possible uses and  applicationsEmotion-oriented computing: Possible uses and  applications
Emotion-oriented computing: Possible uses and applications
André Valdestilhas
 
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
André Valdestilhas
 

More from André Valdestilhas (12)

NaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfNaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdf
 
Presentation MSE 2022
Presentation MSE 2022Presentation MSE 2022
Presentation MSE 2022
 
More Complete Resultset Retrieval from Large Heterogeneous RDF Sources
More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesMore Complete Resultset Retrieval from Large Heterogeneous RDF Sources
More Complete Resultset Retrieval from Large Heterogeneous RDF Sources
 
Eswc2018 wimu slides
Eswc2018 wimu slidesEswc2018 wimu slides
Eswc2018 wimu slides
 
Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017
 
WASOTA (poster)
WASOTA (poster)WASOTA (poster)
WASOTA (poster)
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
 
Emotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesEmotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resources
 
Using semiotic profile
Using semiotic profileUsing semiotic profile
Using semiotic profile
 
Emotion-oriented computing: Possible uses and applications
Emotion-oriented computing: Possible uses and  applicationsEmotion-oriented computing: Possible uses and  applications
Emotion-oriented computing: Possible uses and applications
 
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
 

Recently uploaded

Post translation modification by Suyash Garg
Post translation modification by Suyash GargPost translation modification by Suyash Garg
Post translation modification by Suyash Garg
suyashempire
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxTOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
shubhijain836
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
Shekar Boddu
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
Frédéric Baudron
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
Sérgio Sacani
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Sérgio Sacani
 
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
frank0071
 
Clinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdfClinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdf
RAYMUNDONAVARROCORON
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
ABHISHEK SONI NIMT INSTITUTE OF MEDICAL AND PARAMEDCIAL SCIENCES , GOVT PG COLLEGE NOIDA
 
Lattice Defects in ionic solid compound.pptx
Lattice Defects in ionic solid compound.pptxLattice Defects in ionic solid compound.pptx
Lattice Defects in ionic solid compound.pptx
DrRajeshDas
 
一比一原版美国佩斯大学毕业证如何办理
一比一原版美国佩斯大学毕业证如何办理一比一原版美国佩斯大学毕业证如何办理
一比一原版美国佩斯大学毕业证如何办理
gyhwyo
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSJAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
Sérgio Sacani
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
Physiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptxPhysiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptx
fatima132662
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
Ritik83251
 
Nutaceuticsls herbal drug technology CVS, cancer.pptx
Nutaceuticsls herbal drug technology CVS, cancer.pptxNutaceuticsls herbal drug technology CVS, cancer.pptx
Nutaceuticsls herbal drug technology CVS, cancer.pptx
vimalveerammal
 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Sérgio Sacani
 

Recently uploaded (20)

Post translation modification by Suyash Garg
Post translation modification by Suyash GargPost translation modification by Suyash Garg
Post translation modification by Suyash Garg
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxTOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
 
Clinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdfClinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdf
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
 
Lattice Defects in ionic solid compound.pptx
Lattice Defects in ionic solid compound.pptxLattice Defects in ionic solid compound.pptx
Lattice Defects in ionic solid compound.pptx
 
一比一原版美国佩斯大学毕业证如何办理
一比一原版美国佩斯大学毕业证如何办理一比一原版美国佩斯大学毕业证如何办理
一比一原版美国佩斯大学毕业证如何办理
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSJAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
Physiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptxPhysiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptx
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
 
Nutaceuticsls herbal drug technology CVS, cancer.pptx
Nutaceuticsls herbal drug technology CVS, cancer.pptxNutaceuticsls herbal drug technology CVS, cancer.pptx
Nutaceuticsls herbal drug technology CVS, cancer.pptx
 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
 

Iswc 2016 completeness correctude

  • 1. A High-Performance Approach to String Similarity using Most Frequent K Characters ”Correctness and Completeness” Andre Valdestilhas and Axel-Cyrille Ngonga Ngomo Universit¨at Leipzig, AKSW {lastname}@informatik.uni-leipzig.de 1 Correctness and Completeness In this section, we prove formally that our MFKC does so on any pair of datasets. We achieve this goal by proving that our MFKC is both correct and complete, where: – We say that an approach is correct if the output O it returns is such that O ⊆ R(Ds, Dt, σ, θ). – Approaches are said to be complete if their output O is a superset of R(Ds, Dt, σ, θ), i.e., O ⊇ R(Ds, Dt, σ, θ). Our MFKC consists of three nested filters, each of which creates a subset of pairs. A ⊆ L ⊆ N ⊆ Ds × Dt (1) For the purpose of clearness, we name each filtering rule: R1 ⇔ h1(s, k)k + |t| |s| + |t| < θ R2 ⇔ |h(s, k) ∩ h(t, k)| = 0 R3 ⇔ σ(s, t) ≥ θ Each subset of our MFKC can be redefined as follows: N = { s, t /∈ Ds × Dt : R1} (2) L = { s, t ∈ Ds × Dt : R1 ∧ R2} (3) A = { s, t ∈ Ds × Dt : R1 ∧ R2 ∧ R3} (4) We then introduce A∗ as the set of pairs whose similarity score is more or equal than the threshold θ. A∗ = { s, t ∈ Ds × Dt : σ(s, t) ≥ θ} = { s, t ∈ Ds × Dt : R3} (5)
  • 2. Lemma 1. Our MFKC is correct and complete. Proof. This lemma is equivalent to showing that A = A∗ . Let us consider all the pairs in A. Our MFKC’s correctness follows directly from eq. (4). All we need to show now is our MFKC’s completeness, i.e. that none of pairs discarded by the filters actually belongs to A∗ . Assuming that the hashes are sorted in a descending way according the fre- quencies of the characters, therefore the first element of each hash has the highest frequency. Therefore, once we have h1(s, k)k + |t| |s| + |t| < θ, the pair of strings s and t can be discarded without calculating the entire simi- larity. When the rule R3 applies, we have σ(s, t) < θ. Which leads to R3 ⇒ R1 thus, set A can be rewritten as: A = { s, t ∈ Ds × Dt : R2 ∧ R3} (6) We are given two strings s and t and the respective hashes h(s, k) and h(t, k), the intersection between the characters of these two hashes is a empty set. There- fore, there is no character matching, which implies that s and t cannot be con- sidered to have a similarity score greater or equal the threshold θ: h(s, k) ∩ h(t, k) = ∅ ⇒ σ(s, t) = 0 When the rule R3 applies, we have σ(s, t) < θ. Which leads to R3 ⇒ R2 thus, set A can be rewritten as: A = { s, t ∈ Ds × Dt : R3} (7) which is the definition of A∗ eq. (5). Thus, A = A∗ .