Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations


  1. Institute for Web Science & Technologies – WeST. Workshop Web of Linked Entities (WoLE 2012) at ISWC 2012, Sunday, 11 November 2012. Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations. Christian Hachenberg and Thomas Gottron
  2. Mapping Documents to Entities: dbpedia.org:Rob_Roy_(film)
  3. Mapping Entities to Documents: dbpedia.org:Rob_Roy_(film). Align entities in KB with public documents • Publish knowledge base • Propagate changes • Human readable representation
  4. Task Definition. [Diagram of three entities: George Lucas (type: director, dbpedia:George_Lucas); Star Wars IV: A New Hope (type: movie, dbpedia:Star_Wars_Episode_IV:_A_New_Hope), marked "???"; Harrison Ford (type: actor, dbpedia:Harrison_Ford).] 3 types of information: • Labels • Link structure • Types
  5. Label Search (using Web Search Engine). [Diagram: the same three entities (George Lucas, Star Wars IV: A New Hope, Harrison Ford), with candidate documents labelled SW4.] Implementation: • Bing
  6. Exploiting Link Structure. [Diagram: the same three entities, with candidate documents labelled GL, SW4 and HF.] Implementation: • In-degree • PageRank • HITS
  7. Type Filtering. [Diagram of three entities of type movie: Gran Torino (dbpedia:Gran_Torino_(film)), Star Wars IV: A New Hope (dbpedia:Star_Wars_Episode_IV:_A_New_Hope) and Rob Roy (dbpedia:Rob_Roy_(film)), with candidate documents labelled GT, SW4 and RR.] Implementation: • Borda Count for domain ranking
  8. Experimental Setup • 100 entities – 4 domains (cities, companies, persons, movies) – Stratified by small, medium and large representation on the web – Complete network of linked entities • Application of label search and link structure approaches • Type filtering as post-process • User evaluation (Cranfield setup, pooling) – Graded relevance judgements – High juror agreement (Krippendorff's alpha > 0.67)
  9. Evaluation Metrics
  10. Evaluation: Results. Statistically significant, p=0.05
  11. Evaluation: Results (Domain, Stratum)
  12. Evaluation: Results (Filtering)
  13. Conclusions and Next Steps. Novel task: Mapping entities to public web URLs – Evaluated 9 link analysis and web search methods (+1 post-processing using Borda counts) – Best methods: Label Search and Focussed HITS • Semantic typing boosts all results. Next steps: Investigate domain-dependent performance of methods
  14. Thank you! Contact: WeST – Institute for Web Science and Technologies, Universität Koblenz-Landau, gottron@uni-koblenz.de
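
The slides name "Borda Count for domain ranking" as the type-filtering step but do not spell out the procedure. The following is a minimal sketch of one plausible reading, not the authors' implementation: for entities of the same type, each entity's ranked candidate URLs cast Borda votes for their web domains, and the aggregated domain scores are then used to re-rank each entity's candidates. The function names, the `.example` URLs and the scoring details (points per position, first occurrence of a domain only) are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Return the lower-cased host part of a URL."""
    return urlparse(url).netloc.lower()


def borda_domain_scores(rankings):
    """Aggregate per-entity URL rankings into one Borda score per domain.

    rankings: dict mapping entity -> list of candidate URLs, best first.
    Within one list of n distinct domains, the best domain earns n - 1
    points and the last one earns 0 (assumed scoring scheme).
    """
    scores = defaultdict(int)
    for urls in rankings.values():
        # Keep only the first (best) position of each domain in this list.
        domains = list(dict.fromkeys(domain_of(u) for u in urls))
        n = len(domains)
        for position, domain in enumerate(domains):
            scores[domain] += n - 1 - position
    return dict(scores)


def rerank_by_domain(urls, scores):
    """Stable re-ranking: higher-scoring domains first, ties keep their order."""
    return sorted(urls, key=lambda u: -scores.get(domain_of(u), 0))


# Hypothetical candidate lists for three entities of type "movie".
rankings = {
    "dbpedia:Gran_Torino_(film)": [
        "http://fans.example/gran-torino",
        "http://movies.example/gt",
        "http://news.example/gt",
    ],
    "dbpedia:Rob_Roy_(film)": [
        "http://movies.example/rr",
        "http://shop.example/rr",
    ],
    "dbpedia:Star_Wars_Episode_IV:_A_New_Hope": [
        "http://shop.example/sw4",
        "http://movies.example/sw4",
        "http://fans.example/sw4",
    ],
}

scores = borda_domain_scores(rankings)
reranked = rerank_by_domain(
    rankings["dbpedia:Star_Wars_Episode_IV:_A_New_Hope"], scores
)
```

In this toy data the domain that appears near the top for every movie collects the most points, so its page moves to the front of the Star Wars list even though a different site ranked first there originally.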

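The link-structure approach is described only by the names of its scoring methods (in-degree, PageRank, HITS). As a rough illustration, the sketch below computes in-degree and plain HITS authority scores on a small hyperlink graph among candidate pages. It assumes the graph is given as an adjacency dict, uses made-up page names, and implements textbook HITS rather than the "Focussed HITS" variant mentioned in the conclusions.

```python
from math import sqrt


def all_nodes(graph):
    """Collect every page that appears as a link source or target."""
    return set(graph) | {t for targets in graph.values() for t in targets}


def in_degree(graph):
    """Count, for each page, how many distinct pages link to it."""
    degree = {node: 0 for node in all_nodes(graph)}
    for targets in graph.values():
        for target in set(targets):
            degree[target] += 1
    return degree


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm else dict(scores)


def hits(graph, iterations=50):
    """Plain HITS by power iteration; returns (hub, authority) score dicts."""
    nodes = all_nodes(graph)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for source, targets in graph.items():
            for target in set(targets):
                new_auth[target] += hub[source]
        # Hub score of a page: sum of the authorities of pages it links to.
        new_hub = {
            n: sum(new_auth[t] for t in set(graph.get(n, ()))) for n in nodes
        }
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth


# Hypothetical candidate pages for George Lucas (gl), Harrison Ford (hf)
# and Star Wars IV (sw4); an edge means "page links to page".
graph = {
    "gl.example/bio": ["sw4.example/film"],
    "hf.example/bio": ["sw4.example/film"],
    "sw4.example/film": ["gl.example/bio"],
    "sw4.example/other": [],
}

degree = in_degree(graph)
hub, auth = hits(graph)
best_sw4 = max(
    ["sw4.example/film", "sw4.example/other"], key=lambda page: auth[page]
)
```

Here the candidate for the movie that is linked from the pages of related entities receives the higher authority score, which is the intuition behind using link structure to pick among candidates for the same entity.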