Property Matching and Query Expansion on Linked Data Using Kullback-Leibler Divergence


This document presents an algorithm for matching properties between linked databases using Kullback-Leibler divergence (KL-Divergence). It first creates documents representing the distributions of objects linked to properties in each database. It then computes the normalized KL-Divergence between all document pairs to identify the most similar properties. The property with the lowest KL-Divergence score to a given property is returned as its match. Experimental results on real linked datasets found the algorithm could accurately match properties over 90% of the time.


Property Matching and Query Expansion on
Linked Data Using Kullback-Leibler Divergence
Sean Golliher, Nathan Fortier, Logan Perreault

December 12, 2013


Property Matching Problem

Databases with different properties:

def: Query Expansion

Query expansion (QE) is the process of reformulating a seed
query to improve retrieval performance in information retrieval
operations.


Cloud Diagram (TRIZ Problem Solving)


Property Matching Problem

How do we find all actors in both databases?
Don’t want to manually inspect all databases
Can we use the SPARQL query language to infer across all datasets?
SELECT ?p
WHERE { s ?p o }
Can only match total sizes of returned triple sets

Original Bayesian Approach

Problems with Bayesian Approach
Had to create, and track, a large vocabulary for training
Smoothing issues with very sparse text
Underflow issues (small confidence values)
Complexity of likelihood was growing:
n different features in feature set X and c classes + tunable parameters.

KL-Divergence

Original paper from 1951 entitled “On Information and Sufficiency”
Also referred to as “relative entropy”
A system gains entropy when it moves to a state with more possible arrangements, for example, a liquid becoming a gas.
Used in a paper from 2003 for text categorization: “Using KL-Distance for Text Categorization”
Elegant and efficient method for plagiarism detection

KL-Divergence

Measure of divergence of information between two distributions:

$$D(P \,\|\, Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$

Not symmetric
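The definition of D(P ‖ Q) and its lack of symmetry can be checked numerically. The following is a minimal sketch, not taken from the slides: the function name `kl_divergence`, the natural logarithm, and the two toy distributions `p` and `q` are all assumptions chosen for illustration.

```python
import math

def kl_divergence(p, q):
    """D(P || Q) = sum over x of P(x) * log(P(x) / Q(x)), using the natural log.

    p and q map outcomes to probabilities. Terms with P(x) == 0 contribute 0;
    Q(x) must be positive wherever P(x) is positive.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        total += px * math.log(px / q[x])
    return total

# Two toy distributions over the same two outcomes (illustrative values only).
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}

d_pq = kl_divergence(p, q)  # divergence of Q from P
d_qp = kl_divergence(q, p)  # divergence of P from Q; differs from d_pq
```

Swapping the arguments changes the result, which is why the measure is called a divergence and not a distance.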

KL-Divergence Example

Table: Generic Vocabularies Generated by Fixing on Predicates (columns d1, d2, d3; the row layout was lost, so the cell groups are listed in their original order)

subject1 object1 object2 subject2 object3 object3
subject3 object4
subject1 object1 object2 subject4 object3
subject2 object3

ex: $$D(d_1 \,\|\, d_2) = \frac{1}{5}\log\frac{1/5}{0} + \frac{1}{5}\log\frac{1/5}{0} + \dots + \frac{2}{5}\log\frac{2/5}{1/4}$$

tf(subject1) is 1/5 in d1 and 0 in d2; using [symbol missing in source] value for now

Formal Problem Statement

Given:
Two databases DB1 and DB2
A predicate p1 ∈ DB1
An object type S1 where some triple “s p1 o” exists in DB1 with s ∈ S1

Find predicate p2 in DB2 where p2 is equivalent to p1

High Level Description

Create a document d1 containing labels of all objects linked by p1
Find an object type S2 ∈ DB2 where S1 is equivalent to S2
For each predicate p2 used by S2, create a document d2 containing labels of all objects linked by p2
Remove stop words and language tags from d1 and d2
For each document d2, compute the normalized KL-Divergence, KLD*(d1, d2)
Return the predicate corresponding to the document with the lowest KL-Divergence
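The document-building and cleaning steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: triples are plain `(subject, predicate, object)` string tuples, object labels look like RDF literals with an optional trailing language tag such as `@en`, and `STOP_WORDS`, `LANG_TAG`, and `build_document` are hypothetical names. The toy triples are made up for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "on"}

# Matches a trailing RDF language tag such as @en or @en-US.
LANG_TAG = re.compile(r"@[A-Za-z]+(?:-[A-Za-z0-9]+)*$")

def build_document(triples, predicate):
    """Collect lower-cased label tokens of all objects linked by `predicate`."""
    tokens = []
    for s, p, o in triples:
        if p != predicate:
            continue
        # Drop the language tag, then the surrounding quotes of the literal.
        label = LANG_TAG.sub("", o).strip('"')
        for tok in re.findall(r"[a-z0-9]+", label.lower()):
            if tok not in STOP_WORDS:
                tokens.append(tok)
    return tokens

# Toy triples, invented for illustration.
triples = [
    ("film1", "starring", '"Harrison Ford"@en'),
    ("film1", "director", '"George Lucas"@en'),
    ("film2", "starring", '"The Rock"@en'),
]
doc = build_document(triples, "starring")
```

Here `doc` holds only the tokens of objects linked by `starring`, with the language tags and the stop word "the" removed.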

Algorithm 1 FindPredicate(DB1, DB2, p1, S1)
  Create document d1 containing labels of all objects linked by p1
  Find an object type S2 ∈ DB2 where S1 is equivalent to S2
  for each predicate pi used by S2 do
    Create document di containing labels of all objects linked by pi
  end for
  Remove stop words and language tags from d1 and each di
  min ← 1
  for each predicate pi used by S2 do
    k ← KLD*(d1, di)
    if k < min then
      min ← k
      pmap ← pi
    end if
  end for
  return pmap
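The selection loop of Algorithm 1 can be sketched in Python. This is a hedged illustration: documents are assumed to be prebuilt token lists, the scoring function is passed in as `kld_star`, and `toy_score` is only a stand-in (1 minus Jaccard overlap) so the block runs on its own. It is not the normalized KL-Divergence from the slides. Returning `None` when no candidate scores below 1 is also an assumption, since the pseudocode leaves that case open.

```python
def find_predicate(d1, candidate_docs, kld_star):
    """Return the candidate predicate whose document scores lowest against d1.

    d1: list of tokens for the source predicate p1.
    candidate_docs: dict mapping each predicate p_i used by S2 to its token list d_i.
    kld_star: callable (d1, d_i) -> normalized divergence.
    """
    best = 1.0     # min <- 1, as in Algorithm 1
    p_map = None   # stays None when no candidate scores below 1 (assumption)
    for p_i, d_i in candidate_docs.items():
        k = kld_star(d1, d_i)
        if k < best:
            best = k
            p_map = p_i
    return p_map

def toy_score(a, b):
    """Stand-in for KLD*: 1 minus the Jaccard overlap of the two token sets."""
    sa, sb = set(a), set(b)
    if not (sa | sb):
        return 1.0
    return 1.0 - len(sa & sb) / len(sa | sb)

# Toy documents, invented for illustration.
d1 = ["harrison", "ford", "carrie", "fisher"]
candidates = {
    "actor": ["harrison", "ford", "mark", "hamill"],
    "director": ["george", "lucas"],
}
match = find_predicate(d1, candidates, toy_score)
```

With these toy inputs the `actor` document shares tokens with `d1` and the `director` document does not, so `match` is `"actor"`.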

Computing KL-Divergence
KL-Divergence is computed as

$$KLD(d_i, d_j) = \sum_{k \in V} \bigl(P(t_k, d_i) - P(t_k, d_j)\bigr) \times \log \frac{P(t_k, d_i)}{P(t_k, d_j)} \tag{1}$$

where

$$P(t_k, d_i) = \frac{tf(t_k, d_i)}{\sum_{x \in d_i} tf(t_x, d_i)} \tag{2}$$

If $t_k$ does not occur in $d_i$, then $P(t_k, d_i) \leftarrow$ [symbol missing in source].

KL-Divergence is then normalized as follows:

$$KLD^{*}(d_i, d_j) = \frac{KLD(d_i, d_j)}{KLD(d_i, 0)} \tag{3}$$
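Equations (1) to (3) can be sketched in Python as follows. Several choices here are assumptions and not from the slides: the fallback probability `EPSILON` (the slide's value did not survive extraction), taking the vocabulary V as the union of the two documents' terms, exact term matching for tf, reading the 0 in equation (3) as an empty document, and using the natural logarithm. The probabilities are not renormalized after the fallback is applied, so this is a simplification.

```python
import math
from collections import Counter

EPSILON = 1e-6  # assumed fallback probability for terms absent from a document

def prob(term, counts, total):
    """Equation (2): relative term frequency, or EPSILON for an absent term."""
    tf = counts.get(term, 0)
    return tf / total if tf > 0 else EPSILON

def kld(d_i, d_j, vocab):
    """Equation (1): sum over the vocabulary of (P_i - P_j) * log(P_i / P_j)."""
    c_i, c_j = Counter(d_i), Counter(d_j)
    n_i, n_j = sum(c_i.values()), sum(c_j.values())
    total = 0.0
    for t in vocab:
        p_i = prob(t, c_i, n_i)
        p_j = prob(t, c_j, n_j)
        total += (p_i - p_j) * math.log(p_i / p_j)
    return total

def kld_star(d_i, d_j):
    """Equation (3): KLD(d_i, d_j) divided by KLD(d_i, empty document).

    d_i must be non-empty, otherwise the denominator is zero.
    """
    vocab = set(d_i) | set(d_j)
    return kld(d_i, d_j, vocab) / kld(d_i, [], vocab)

d1 = ["a", "b", "b"]
same = kld_star(d1, ["a", "b", "b"])     # identical term distribution
partial = kld_star(d1, ["a", "b", "c"])  # partly overlapping terms
disjoint = kld_star(d1, ["c", "d"])      # no shared terms
```

In this sketch an identical document scores 0 and a partly overlapping one scores between 0 and 1. A fully disjoint document scores above 1, so under the `min ← 1` initialization in Algorithm 1 it would never be selected.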

Algorithm 2 tf(tk, di)
  tf ← 0
  for each term tx in di do
    if sim(tk, tx) > τ then
      tf ← tf + 1
    end if
  end for
  return tf
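Algorithm 2 can be sketched in Python as below. The slides do not define `sim` or give a value for τ, so both are assumptions here: `sim` uses the ratio from the standard library's `difflib.SequenceMatcher`, and the default threshold `tau=0.8` is an arbitrary illustrative choice.

```python
from difflib import SequenceMatcher

def sim(a, b):
    """Assumed similarity measure: difflib's ratio, a value in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def tf(t_k, d_i, tau=0.8):
    """Count the terms t_x in d_i with sim(t_k, t_x) > tau (Algorithm 2)."""
    count = 0
    for t_x in d_i:
        if sim(t_k, t_x) > tau:
            count += 1
    return count

# Toy document with one misspelled variant, invented for illustration.
doc = ["harrison", "harison", "ford", "fisher"]
fuzzy = tf("harrison", doc)             # counts the exact term and the near match
exact = tf("harrison", doc, tau=0.999)  # only the exact term passes
```

Counting near matches lets spelling variants of the same label contribute to a single term frequency, at the cost of one similarity comparison per term in the document.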

