Property Matching and Query Expansion on
Linked Data Using Kullback-Leibler Divergence
Sean Golliher, Nathan Fortier, Logan Perreault

December 12, 2013

1 / 25
Property Matching Problem

Databases with different properties:

2 / 25
def: Query Expansion

Query expansion (QE) is the process of reformulating a seed
query to improve retrieval performance in information retrieval
operations.

3 / 25
Societal Cloud

4 / 25
Cloud Diagram (TRIZ Problem Solving)

5 / 25
Cloud Diagram Broken

6 / 25
Property Matching Problem

How do we find all actors in both databases?
Don’t want to manually inspect all databases
Can we use SPARQL query language to infer across all datasets?
SELECT ?p
WHERE { ?s ?p ?o }
Can only match total sizes of returned triple sets

7 / 25
Original Bayesian Approach

Problems with Bayesian Approach
Had to create, and track, a large vocabulary for training
Smoothing issues with very sparse text
Underflow issues – small confidence values
Complexity of likelihood was growing:
n different features in feature set X and c classes + tunable parameters.

8 / 25
KL-Divergence

Original paper from 1951 entitled “On Information and Sufficiency”
Also referred to as “relative entropy”
A system gains entropy when it moves to a state with more possible
arrangements. For example, a liquid to a gas.
Used in a 2003 paper for text categorization:
“Using KL-Distance for Text Categorization”
Elegant and efficient method for plagiarism detection

9 / 25
KL-Divergence

Measure of divergence of information between two distributions:

$$D(P \,\|\, Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$

Not symmetric
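
The definition above can be checked numerically. The following is a minimal sketch, assuming two hypothetical distributions `P` and `Q` over the same three outcomes and the natural logarithm; the function name `kl_divergence` is illustrative and not taken from the slides.

```python
import math

def kl_divergence(p, q):
    """D(P || Q) = sum over x of P(x) * log(P(x) / Q(x)), natural log.

    Terms with P(x) == 0 contribute 0 by convention; Q(x) must be > 0
    wherever P(x) > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Hypothetical distributions over the same three outcomes.
P = {"a": 0.5, "b": 0.4, "c": 0.1}
Q = {"a": 0.3, "b": 0.3, "c": 0.4}

d_pq = kl_divergence(P, Q)
d_qp = kl_divergence(Q, P)
print(d_pq, d_qp)  # the two directions give different values
```

Swapping the arguments changes the result, which is what "not symmetric" means here.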

10 / 25
KL-Divergence Example

11 / 25
KL-Divergence Example

Table : Generic Vocabularies Generated by Fixing on Predicates

d1

d2

d3

subject1
object1
object2
subject2
object3
object3

subject3
object4

subject1
object1
object2
subject4
object3

subject2
object3

ex: $$D(d_1 \,\|\, d_2) = \frac{1}{5}\log\frac{1/5}{0} + \frac{1}{5}\log\frac{1/5}{0} + \dots + \frac{2}{5}\log\frac{2/5}{1/4}$$
tf(subject1) is 1/5 in d1 and 0 in d2 – using value for now
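
The zero denominators in this example make the raw divergence undefined, so some substitute value is needed for terms missing from d2. The sketch below is one way to reproduce the arithmetic. The two documents are hypothetical, chosen only to match the fractions quoted above (1/5, 0, 2/5 and 1/4), and `EPSILON` is an assumed placeholder because the slide does not give the value it uses.

```python
import math
from collections import Counter

EPSILON = 1e-6  # assumed stand-in for unseen terms; the slide leaves the value unspecified

def term_probs(doc):
    """Relative term frequencies of a document given as a list of terms."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def kl_with_floor(doc_p, doc_q, eps=EPSILON):
    """D(p || q) where terms missing from q get probability eps (q is not renormalized)."""
    p, q = term_probs(doc_p), term_probs(doc_q)
    return sum(px * math.log(px / q.get(t, eps)) for t, px in p.items())

# Hypothetical documents consistent with the fractions in the example.
d1 = ["subject1", "object1", "object2", "object3", "object3"]
d2 = ["subject2", "object3", "subject3", "object4"]

p1 = term_probs(d1)
p2 = term_probs(d2)
print(p1["subject1"], p2.get("subject1", 0))  # 0.2 and 0
print(kl_with_floor(d1, d2))
```

Smaller values of `EPSILON` make every unseen term more expensive, so the choice directly scales the result.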

12 / 25
Algorithm Description

13 / 25
Formal Problem Statement

Given:
Two databases DB1 and DB2
A predicate p1 ∈ DB1
An object type S1 such that some triple “s p1 o” exists in DB1,
where s ∈ S1

Find predicate p2 in DB2 where p2 is equivalent to p1

14 / 25
High Level Description

Create a document d1 containing labels of all objects linked
by p1
Find an object type S2 ∈ DB2 where S1 is equivalent to S2
For each predicate p2 used by S2 create a document d2
containing labels of all objects linked by p2
Remove stop words and language tags from d1 and d2
For each document compute the normalized KL-Divergence,
KLD*(d1, d2)
Return predicate corresponding to the document with the
lowest KL-Divergence

15 / 25
Algorithm 1 FindPredicate(DB1, DB2, p1, S1)
Create document d1 containing labels of all objects linked by p1
Find an object type S2 ∈ DB2 where S1 is equivalent to S2
for each predicate pi used by S2 do
    Create document di containing labels of all objects linked by pi
end for
Remove stop words and language tags from d1 and each di
min ← 1
for each predicate pi used by S2 do
    k ← KLD*(d1, di)
    if k < min then
        min ← k
        pmap ← pi
    end if
end for
return pmap
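
The selection loop in Algorithm 1 can be sketched in Python. This is an illustration under several assumptions, not the authors' implementation: the step that finds S2 is taken as already done, so the function receives the labels for p1 and a dictionary of labels per candidate predicate. The stop-word list, the `@en`-style tag stripping, the `eps` value for unseen terms, and the reading of KLD(d, 0) as divergence from an empty document are all hypothetical choices.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "and"}  # hypothetical stop-word list

def clean(labels):
    """Strip language tags such as '@en', lowercase, split into terms, drop stop words."""
    terms = []
    for label in labels:
        text = label.split("@")[0].lower()
        terms.extend(t for t in text.split() if t not in STOP_WORDS)
    return terms

def normalized_kld(doc_i, doc_j, eps=1e-6):
    """Compact KLD*(d_i, d_j); eps for unseen terms is an assumed value."""
    def dist(doc):
        counts = Counter(doc)
        n = sum(counts.values())
        return {t: c / n for t, c in counts.items()}

    def kld(p, q):
        return sum(
            (p.get(t, eps) - q.get(t, eps)) * math.log(p.get(t, eps) / q.get(t, eps))
            for t in set(p) | set(q)
        )

    p, q = dist(doc_i), dist(doc_j)
    return kld(p, q) / kld(p, {})  # assumes doc_i is non-empty

def find_predicate(labels_p1, labels_by_predicate, threshold=1.0):
    """Return the predicate whose label document scores lowest, or None if none beats threshold."""
    d1 = clean(labels_p1)
    best, p_map = threshold, None  # mirrors "min <- 1" in Algorithm 1
    for predicate, labels in labels_by_predicate.items():
        k = normalized_kld(d1, clean(labels))
        if k < best:
            best, p_map = k, predicate
    return p_map

# Hypothetical data: labels reached via a "starring" predicate in DB1,
# and labels reached via each predicate used by the matching type in DB2.
db1_starring = ["Tom Hanks@en", "Meg Ryan@en", "Tom Hanks@en"]
db2_candidates = {
    "actor": ["Meg Ryan@en", "Tom Hanks@en"],
    "genre": ["Comedy@en", "Drama@en"],
    "director": ["Nora Ephron@en"],
}
print(find_predicate(db1_starring, db2_candidates))  # actor
```

Under these assumptions a candidate that shares no terms with d1 scores above 1, so with the initial minimum of 1 the function returns `None` when nothing overlaps.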

16 / 25
Computing KL-Divergence
KL-Divergence is computed as

$$KLD(d_i, d_j) = \sum_{k \in V} \big(P(t_k, d_i) - P(t_k, d_j)\big) \times \log \frac{P(t_k, d_i)}{P(t_k, d_j)} \qquad (1)$$

Where

$$P(t_k, d_i) = \frac{tf(t_k, d_i)}{\sum_{x \in d_i} tf(t_x, d_i)} \qquad (2)$$

If tk does not occur in di then P(tk , di ) ←
KL-Divergence is then normalized as follows:
$$KLD^{*}(d_i, d_j) = \frac{KLD(d_i, d_j)}{KLD(d_i, 0)} \qquad (3)$$
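
Equations (1) to (3) translate fairly directly into code. The sketch below makes some assumptions: `EPSILON` stands in for the unseen-term probability, whose value is not given here; the vocabulary V is taken as the union of the two documents' terms; KLD(d_i, 0) is read as the divergence between d_i and an empty document; and tf is a plain exact-match count.

```python
import math
from collections import Counter

EPSILON = 1e-6  # assumed back-off probability; not given in the source

def prob(term, doc_counts, total, eps=EPSILON):
    """Equation (2): relative term frequency, or eps when the term is absent."""
    count = doc_counts.get(term, 0)
    return count / total if count else eps

def kld(doc_i, doc_j, eps=EPSILON):
    """Equation (1): sum of (P_i - P_j) * log(P_i / P_j) over the joint vocabulary."""
    ci, cj = Counter(doc_i), Counter(doc_j)
    ni, nj = sum(ci.values()), sum(cj.values())
    total = 0.0
    for t in set(ci) | set(cj):
        pi, pj = prob(t, ci, ni, eps), prob(t, cj, nj, eps)
        total += (pi - pj) * math.log(pi / pj)
    return total

def kld_normalized(doc_i, doc_j, eps=EPSILON):
    """Equation (3): KLD(d_i, d_j) / KLD(d_i, 0); assumes doc_i is non-empty."""
    return kld(doc_i, doc_j, eps) / kld(doc_i, [], eps)

# Hypothetical term lists.
a = ["tom", "hanks", "meg", "ryan"]
b = ["tom", "hanks", "meg", "ryan"]
c = ["drama", "comedy"]
print(kld_normalized(a, b))  # identical documents
print(kld_normalized(a, c))  # disjoint documents score higher
```

Unlike the basic definition of D(P ‖ Q), the form in equation (1) gives the same value when the two documents are swapped, because both factors of each term change sign together.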

17 / 25
Algorithm 2 tf(tk, di)
tf ← 0
for each term tx in di do
    if sim(tk, tx) > τ then
        tf ← tf + 1
    end if
end for
return tf
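
Algorithm 2 counts near matches rather than exact matches. The slides do not say which similarity function `sim` is, so the sketch below substitutes the ratio from Python's standard `difflib.SequenceMatcher` purely as an assumption, with a hypothetical default threshold `tau=0.8`.

```python
from difflib import SequenceMatcher

def sim(a, b):
    """Assumed similarity in [0, 1]; the slides do not define sim."""
    return SequenceMatcher(None, a, b).ratio()

def tf(term, doc, tau=0.8):
    """Count the terms in doc whose similarity to term exceeds tau."""
    return sum(1 for t in doc if sim(term, t) > tau)

# Hypothetical document: "hank" is close enough to "hanks" to be counted.
doc = ["hanks", "hank", "ryan", "hanks"]
print(tf("hanks", doc))            # 3
print(tf("hanks", doc, tau=0.95))  # 2, only the exact matches remain
```

Raising `tau` toward 1 moves this back to an exact-match term frequency, while lowering it merges more spelling variants into one count.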

18 / 25
Experimental Results

19 / 25
Experimental Results

20 / 25
Experimental Results

21 / 25
Experimental Results

22 / 25
Experimental Results

23 / 25
Experimental Results

24 / 25
Questions?

25 / 25

Property Matching and Query Expansion on Linked Data Using Kullback-Leibler Divergence

  • 1. Property Matching and Query Expansion on Linked Data Using Kullback-Leibler Divergence. Sean Golliher, Nathan Fortier, Logan Perreault. December 12, 2013.
  • 2. Property Matching Problem: Databases with different properties.
  • 3. Definition: Query Expansion. Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations.
  • 5. Cloud Diagram (TRIZ Problem Solving)
  • 7. Property Matching Problem: How do we find all actors in both databases? We don't want to manually inspect all databases. Can we use the SPARQL query language to infer across all datasets? SELECT ?p WHERE { s ?p o }. This can only match total sizes of returned triple sets.
  • 8. Original Bayesian Approach: Problems with the Bayesian approach: had to create, and track, a large vocabulary for training; smoothing issues with very sparse text; underflow issues (small confidence values); complexity of the likelihood was growing: n different features in feature set X and c classes, plus tunable parameters.
  • 9. KL-Divergence: Original paper from 1951, entitled “On Information and Sufficiency”. Also referred to as “relative entropy”. A system gains entropy when it moves to a state with more possible arrangements, for example, a liquid to a gas. Used in a 2003 paper for text categorization: “Using KL-Distance for Text Categorization”. Elegant and efficient method for plagiarism detection.
  • 10. KL-Divergence: Measure of divergence of information between two distributions: $D(P \parallel Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$. Not symmetric.
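The definition on slide 10 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `kl_divergence` and the two toy distributions are assumptions made for the example, and the sketch assumes Q(x) > 0 wherever P(x) > 0.

```python
import math

def kl_divergence(p, q):
    """D(P || Q) = sum over x of P(x) * log(P(x) / Q(x)).

    p and q are dicts mapping outcomes to probabilities.
    Terms with P(x) == 0 contribute nothing to the sum.
    Assumes Q(x) > 0 wherever P(x) > 0.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        total += px * math.log(px / q[x])
    return total

# Two made-up distributions over the same two outcomes.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
print(d_pq, d_qp)  # the two directions give different values
```

Swapping the arguments changes the result, which is the "not symmetric" property noted on the slide.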
  • 12. KL-Divergence Example: Table: Generic Vocabularies Generated by Fixing on Predicates, with columns d1, d2, d3 and entries subject1 object1 object2 subject2 object3 object3 subject3 object4 subject1 object1 object2 subject4 object3 subject2 object3. Example: $D(d_1 \parallel d_2) = \frac{1}{5} \log \frac{1/5}{0} + \frac{1}{5} \log \frac{1/5}{0} + \dots + \frac{2}{5} \log \frac{2/5}{1/4}$. tf(subject1) is 1/5 in d1 and 0 in d2; a small ε value is used in place of 0 for now.
  • 14. Formal Problem Statement: Given: two databases DB1 and DB2; a predicate p1 ∈ DB1; an object type S1 where some triple “s p1 o” exists in DB1 with s ∈ S1. Find the predicate p2 in DB2 that is equivalent to p1.
  • 15. High Level Description: Create a document d1 containing labels of all objects linked by p1. Find an object type S2 ∈ DB2 where S1 is equivalent to S2. For each predicate p2 used by S2, create a document d2 containing labels of all objects linked by p2. Remove stop words and language tags from d1 and d2. For each document d2, compute the normalized KL-Divergence, KLD∗(d1, d2). Return the predicate corresponding to the document with the lowest KL-Divergence.
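The document-building and cleaning steps above might look roughly like the following sketch. Everything here is an assumption for illustration: the slides do not give a stop-word list, a tokenizer, or the exact label format, so the sketch assumes RDF-style literals such as `"Tom Hanks"@en`, a tiny hand-picked stop-word set, and hypothetical helper names `label_to_terms` and `build_document`.

```python
import re

# Illustrative stop-word set only; the slides do not specify one.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "on", "for", "to"}

# Matches a trailing language tag such as @en or @en-US.
LANG_TAG = re.compile(r"@[A-Za-z]+(?:-[A-Za-z0-9]+)*$")

def label_to_terms(label):
    """Strip the language tag and quotes, lowercase, drop stop words."""
    text = LANG_TAG.sub("", label.strip())
    text = text.strip('"')
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_document(labels):
    """Concatenate the cleaned terms of all labels into one document."""
    terms = []
    for label in labels:
        terms.extend(label_to_terms(label))
    return terms

doc = build_document(['"The Lord of the Rings"@en', '"Tom Hanks"@en-US'])
print(doc)  # ['lord', 'rings', 'tom', 'hanks']
```

A document is represented here as a plain list of terms, so repeated terms keep their multiplicity for later term-frequency counts.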
  • 16. Algorithm 1: FindPredicate(DB1, DB2, p1, S1)
      Create document d1 containing labels of all objects linked by p1
      Find an object type S2 ∈ DB2 where S1 is equivalent to S2
      for each predicate pi used by S2 do
        Create document di containing labels of all objects linked by pi
      end for
      Remove stop words and language tags from d1 and each di
      min ← 1
      for each predicate pi used by S2 do
        k ← KLD∗(d1, di)
        if k < min then
          min ← k
          pmap ← pi
        end if
      end for
      return pmap
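The selection loop of Algorithm 1 can be sketched in Python as below. This is a hedged illustration rather than the authors' implementation: it assumes the candidate documents have already been built and cleaned, takes the divergence as a caller-supplied function standing in for KLD∗, and uses a made-up set-overlap score (`toy_divergence`) and hypothetical predicate names (`ex:starring`, `ex:director`) only to make the example runnable.

```python
def find_predicate(d1, candidate_docs, divergence):
    """Return the predicate whose document diverges least from d1.

    candidate_docs maps each predicate p_i used by S2 to its document d_i.
    divergence is any callable (d1, d_i) -> number, standing in for KLD*.
    Returns None when there are no candidates.
    """
    best_predicate = None
    best_score = float("inf")
    for predicate, doc in candidate_docs.items():
        score = divergence(d1, doc)
        if score < best_score:
            best_score = score
            best_predicate = predicate
    return best_predicate

def toy_divergence(a, b):
    """Stand-in score for the demo: 1 minus the Jaccard overlap of terms."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)

d1 = ["tom", "hanks", "meg", "ryan"]
candidates = {
    "ex:starring": ["tom", "hanks", "meg", "ryan", "bill"],
    "ex:director": ["nora", "ephron"],
}
best = find_predicate(d1, candidates, toy_divergence)
print(best)  # ex:starring
```

One design choice differs from the pseudocode: the running minimum starts at infinity instead of 1, so a predicate is still returned if every candidate scores 1 or higher.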
  • 17. Computing KL-Divergence: KL-Divergence is computed as
      $$KLD(d_i, d_j) = \sum_{k \in V} \left( P(t_k, d_i) - P(t_k, d_j) \right) \times \log \frac{P(t_k, d_i)}{P(t_k, d_j)} \quad (1)$$
    where
      $$P(t_k, d_i) = \frac{tf(t_k, d_i)}{\sum_{x \in d_i} tf(t_x, d_i)} \quad (2)$$
    If $t_k$ does not occur in $d_i$ then $P(t_k, d_i) \leftarrow \epsilon$. KL-Divergence is then normalized as follows:
      $$KLD^{*}(d_i, d_j) = \frac{KLD(d_i, d_j)}{KLD(d_i, 0)} \quad (3)$$
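Equations (1) to (3) can be sketched in Python as follows. Several points are assumptions, since the slide does not state them: the smoothing value `EPSILON = 1e-6`, the vocabulary V taken as the union of the terms in the two documents, exact string matching for term counts, and the reading of KLD(d_i, 0) as the divergence from an empty document in which every term has the smoothing probability. The function names are hypothetical.

```python
import math
from collections import Counter

EPSILON = 1e-6  # assumed smoothing value; the slides do not give one

def term_probabilities(doc, vocabulary, epsilon=EPSILON):
    """Equation (2): relative term frequency, or epsilon for absent terms."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {t: (counts[t] / total if counts[t] > 0 else epsilon)
            for t in vocabulary}

def kld(doc_i, doc_j, epsilon=EPSILON):
    """Equation (1), summed over the union vocabulary of both documents."""
    vocabulary = set(doc_i) | set(doc_j)
    p_i = term_probabilities(doc_i, vocabulary, epsilon)
    p_j = term_probabilities(doc_j, vocabulary, epsilon)
    return sum((p_i[t] - p_j[t]) * math.log(p_i[t] / p_j[t])
               for t in vocabulary)

def normalized_kld(doc_i, doc_j, epsilon=EPSILON):
    """Equation (3): divide by the divergence from an empty document."""
    if not doc_i:
        raise ValueError("doc_i must contain at least one term")
    baseline = kld(doc_i, [], epsilon)
    return kld(doc_i, doc_j, epsilon) / baseline

d_a = ["tom", "hanks", "meg", "ryan"]
d_b = ["tom", "hanks", "bill", "murray"]
d_c = ["nora", "ephron"]

print(normalized_kld(d_a, d_a))  # identical documents
print(normalized_kld(d_a, d_b))  # partial overlap
print(normalized_kld(d_a, d_c))  # no overlap
```

Because each term in equation (1) multiplies a difference by the log of the corresponding ratio, swapping d_i and d_j leaves KLD unchanged, so this form is symmetric, unlike D(P‖Q) on slide 10. Under these assumptions the normalized score for fully disjoint documents comes out above 1.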
  • 18. Algorithm 2: tf(tk, di)
      tf ← 0
      for each term tx in di do
        if sim(tk, tx) > τ then
          tf ← tf + 1
        end if
      end for
      return tf
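Algorithm 2 counts a term as occurring whenever it is similar enough to a term in the document, rather than requiring an exact match. A minimal Python sketch follows. The slide does not say which similarity function `sim` or threshold τ is used, so the sketch assumes the character-based ratio from the standard library's `difflib.SequenceMatcher` and a threshold of 0.8, both chosen only for illustration.

```python
from difflib import SequenceMatcher

def sim(a, b):
    """Assumed similarity measure: difflib's ratio, in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def tf(term, doc, threshold=0.8):
    """Count the terms in doc whose similarity to term exceeds threshold."""
    count = 0
    for other in doc:
        if sim(term, other) > threshold:
            count += 1
    return count

doc = ["hanks", "hank", "ryan", "hanks"]
print(tf("hanks", doc))                   # near matches are counted too
print(tf("hanks", doc, threshold=0.95))   # only the exact matches remain
```

Raising the threshold toward 1 makes the count approach an exact-match term frequency, while lowering it merges more spelling variants into one term.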