Property Matching and Query Expansion on Linked Data Using Kullback-Leibler Divergence

Slides for our final paper

  1. 1. Property Matching and Query Expansion on Linked Data Using Kullback-Leibler Divergence Sean Golliher, Nathan Fortier, Logan Perreault December 12, 2013 1 / 25
  2. 2. Property Matching Problem Databases with different properties: 2 / 25
  3. 3. def: Query Expansion Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations. 3 / 25
  4. 4. Societal Cloud 4 / 25
  5. 5. Cloud Diagram (TRIZ Problem Solving) 5 / 25
  6. 6. Cloud Diagram Broken 6 / 25
  7. 7. Property Matching Problem How do we find all actors in both databases? We don’t want to manually inspect every database. Can we use the SPARQL query language to infer matches across all datasets? SELECT ?p WHERE { s ?p o } This can only match the total sizes of the returned triple sets 7 / 25
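     Not from the slides: a minimal sketch of the kind of probe query this slide alludes to, written in Python with the SPARQLWrapper library. The DBpedia endpoint and the Actor class are illustrative stand-ins for whichever dataset and object type are being compared.

        # Sketch: list the predicates used by resources of one type in a single
        # endpoint. The endpoint URL and the Actor class are illustrative choices.
        from SPARQLWrapper import SPARQLWrapper, JSON

        sparql = SPARQLWrapper("https://dbpedia.org/sparql")
        sparql.setQuery("""
            SELECT DISTINCT ?p WHERE {
                ?s a <http://dbpedia.org/ontology/Actor> .
                ?s ?p ?o .
            } LIMIT 100
        """)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()
        for row in results["results"]["bindings"]:
            print(row["p"]["value"])

     As the slide notes, listing predicates this way still leaves the matching problem open: two endpoints can use entirely different predicate names for the same relation.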
  8. 8. Original Bayesian Approach Problems with the Bayesian approach: had to create and track a large vocabulary for training; smoothing issues with very sparse text; underflow issues from very small confidence values; and the complexity of the likelihood kept growing with the n features in feature set X, the c classes, and additional tunable parameters. 8 / 25
  9. 9. KL-Divergence Original paper from 1951 entitled “On Information and Sufficiency” Also referred to as “relative entropy” A system gains entropy when it moves to a state with more possible arrangements, for example a liquid becoming a gas. Used in a 2003 paper on text categorization, “Using KL-Distance for Text Categorization” An elegant and efficient method for plagiarism detection 9 / 25
  10. 10. KL-Divergence Measure of the divergence of information between two distributions: D(P \| Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)} Not symmetric 10 / 25
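     A quick numeric illustration of the definition (not from the slides), in plain Python; the two distributions P and Q are made-up examples.

        import math

        def kl_divergence(p, q):
            # D(P || Q) = sum over x of P(x) * log(P(x) / Q(x));
            # assumes q[x] > 0 wherever p[x] > 0
            return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

        P = {"a": 0.5, "b": 0.3, "c": 0.2}
        Q = {"a": 0.4, "b": 0.4, "c": 0.2}
        print(kl_divergence(P, Q))  # differs from kl_divergence(Q, P): not symmetric
        print(kl_divergence(Q, P))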
  11. 11. KL-Divergence Example 11 / 25
  12. 12. KL-Divergence Example Table: Generic Vocabularies Generated by Fixing on Predicates
         d1         d2         d3
         subject1   object1    object2
         subject2   object3    object3
         subject3   object4    subject1
         object1    object2    subject4
         object3    subject2   object3
     ex: D(d_1 \| d_2) = \frac{1}{5}\log\frac{1/5}{0} + \frac{1}{5}\log\frac{1/5}{0} + \dots + \frac{2}{5}\log\frac{2/5}{1/4}
     tf(subject1) is 1/5 in d1 and 0 in d2 – a small value is substituted for the zero for now 12 / 25
  13. 13. Algorithm Description 13 / 25
  14. 14. Formal Problem Statement Given: Two databases DB1 and DB2 A predicate p1 ∈ DB1 An object type S1 where some triple “s p1 o” exists in DB1 where s ∈ S1 Find a predicate p2 in DB2 where p2 is equivalent to p1 14 / 25
  15. 15. High Level Description Create a document d1 containing labels of all objects linked by p1 Find an object type S2 ∈ DB2 where S1 is equivalent to S2 For each predicate p2 used by S2 create a document d2 containing labels of all objects linked by p2 Remove stop words and language tags from d1 and d2 For each document compute the normalized KL-Divergence, KLD*(d1, d2) Return the predicate corresponding to the document with the lowest KL-Divergence 15 / 25
  16. 16. Algorithm 1 FindPredicate(DB1, DB2, p1, S1)
     Create document d1 containing labels of all objects linked by p1
     Find an object type S2 ∈ DB2 where S1 is equivalent to S2
     for each predicate p2 used by S2 do
         Create document d2 containing labels of all objects linked by p2
     end for
     Remove stop words and language tags from d1 and d2
     min ← 1
     for each predicate pi used by S2 do
         k ← KLD*(d1, di)
         if k < min then
             min ← k
             pmap ← pi
         end if
     end for
     return pmap
     16 / 25
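     A sketch of Algorithm 1 in Python, under the assumption that the helper functions (labels_for_predicate, find_equivalent_type, preprocess, predicates_used_by, kld_star) exist; they stand in for the steps the slides describe but do not spell out.

        def find_predicate(db1, db2, p1, s1):
            # d1: labels of all objects linked by p1 in DB1, with stop words
            # and language tags removed
            d1 = preprocess(labels_for_predicate(db1, p1))

            # S2: the object type in DB2 judged equivalent to S1
            s2 = find_equivalent_type(db2, s1)

            best, p_map = float("inf"), None  # the slides start at 1, since KLD* is normalized
            for p2 in predicates_used_by(db2, s2):
                d2 = preprocess(labels_for_predicate(db2, p2))
                k = kld_star(d1, d2)          # normalized KL-divergence, eq. (3)
                if k < best:
                    best, p_map = k, p2
            return p_map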
  17. 17. Computing KL-Divergence KL-Divergence is computed as
     KLD(d_i, d_j) = \sum_{t_k \in V} (P(t_k, d_i) - P(t_k, d_j)) \times \log \frac{P(t_k, d_i)}{P(t_k, d_j)}   (1)
     where
     P(t_k, d_i) = \frac{tf(t_k, d_i)}{\sum_{t_x \in d_i} tf(t_x, d_i)}   (2)
     If t_k does not occur in d_i then P(t_k, d_i) is set to a small back-off value. KL-Divergence is then normalized as follows:
     KLD^*(d_i, d_j) = \frac{KLD(d_i, d_j)}{KLD(d_i, 0)}   (3)
     17 / 25
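     A sketch of equations (1)-(3) in Python. Documents are plain lists of terms, and EPSILON is an assumed back-off probability for terms that do not occur in a document; the slides do not fix its exact value, and "KLD(di, 0)" is read here as the divergence from an empty document.

        import math
        from collections import Counter

        EPSILON = 1e-6  # assumed back-off value; the slides leave it unspecified

        def prob(term, counts, doc_len):
            # Eq. (2): relative frequency of the term in the document, with back-off
            tf = counts.get(term, 0)
            return tf / doc_len if tf else EPSILON

        def kld(di, dj, vocab):
            # Eq. (1): sum over the vocabulary of (P(t,di) - P(t,dj)) * log(P(t,di)/P(t,dj))
            ci, cj = Counter(di), Counter(dj)
            return sum((prob(t, ci, len(di)) - prob(t, cj, len(dj)))
                       * math.log(prob(t, ci, len(di)) / prob(t, cj, len(dj)))
                       for t in vocab)

        def kld_star(di, dj, vocab):
            # Eq. (3): normalize by the divergence from an empty document
            return kld(di, dj, vocab) / kld(di, [], vocab)

        d1 = ["alice", "bob", "carol", "dave", "eve"]
        d2 = ["alice", "bob", "bob", "frank", "grace"]
        print(kld_star(d1, d2, set(d1) | set(d2)))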
  18. 18. Algorithm 2 tf(tk, di)
     tf ← 0
     for each term tx in di do
         if sim(tk, tx) > τ then
             tf ← tf + 1
         end if
     end for
     return tf
     18 / 25
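     A sketch of Algorithm 2; difflib's SequenceMatcher ratio is used here as a stand-in for the unspecified similarity measure sim(tk, tx), and TAU = 0.9 is an illustrative threshold.

        from difflib import SequenceMatcher

        TAU = 0.9  # illustrative threshold; the slides only call it tau

        def sim(a, b):
            # stand-in string similarity; the slides do not pin down sim(tk, tx)
            return SequenceMatcher(None, a, b).ratio()

        def tf(tk, di):
            # Algorithm 2: count the terms in di that are similar enough to tk
            return sum(1 for tx in di if sim(tk, tx) > TAU)

        print(tf("actor", ["actor", "actors", "director"]))  # 2: "actors" also matches at TAU = 0.9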
  19. 19. Experimental Results 19 / 25
  20. 20. Experimental Results 20 / 25
  21. 21. Experimental Results 21 / 25
  22. 22. Experimental Results 22 / 25
  23. 23. Experimental Results 23 / 25
  24. 24. Experimental Results 24 / 25
  25. 25. Questions? 25 / 25
