URI Disambiguation in the Context of Linked Data

  • Named graphs cannot be made in RDF, outside a framework. How to decide which graph data comes from?
  • Explain more, slow down. We thought Tom Anderson was being funded by NSF.
  • URI Disambiguation in the Context of Linked Data

    1. URI Disambiguation in the Context of Linked Data
       - http://dbpedia.org/resource/Tim_Berners-Lee
       - http://dbpedia.org/resource/Spain
       - http://acm.rkbexplorer.com/id/resource-P112732
       - http://sws.geonames.org/2510769
       - http://acm.rkbexplorer.com/id/person-282197
       - http://id.ecs.soton.ac.uk/person/7113
       - http://www.w3.org/People/Berners-Lee/card#i
       - http://id.ecs.soton.ac.uk/person/21
       - http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007
       - http://citeseer.rkbexplorer.com/id/resource-CSP109020
       - http://southampton.rkbexplorer.com/id/person-00021
       - http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain
    2. Presentation Outline (LDOW2008 - Beijing, China)
       - Linked Data Repositories
       - Coreference on the Semantic Web
       - Author Disambiguation
       - DBLP Linked Data
       - DBLP Author Disambiguation
       - Disambiguation Results
       - DBpedia
       - Possible Solutions
       - Summary
    3. RKBexplorer.com
       - Contains URIs for more than 10 million entities
       - Over 25 Linked Data sites, including DBLP
       - Data relating to people, projects, papers and institutions
       - A single entity has a number of URIs (even within the same repository)
       - Entities are linked using CRSes
    4. Linked Data Repositories
       - Existing databases on the Web are being exposed as Linked Data (D2R, Virtuoso)
       - Databases contain inconsistencies and require constant curation
       - Datasets such as Wikipedia are being continually checked and updated, especially in the case of disambiguation (WikiProject_Disambiguation)
       - Linked Data repositories should also provide consistent data
    5. Disambiguation on the Semantic Web
       - Coreference on the Semantic Web is defined as the situation where two or more URIs are used for a single non-information resource
       - URI usage can change with context
       - Non-information resource equality is hard to define precisely
       - Examples:
         - ‘Hugh Glaser’ at Southampton vs. ‘Hugh Glaser’ at Imperial
         - ‘Harry Potter and the Order of the Phoenix’ in Hardback (ISBN 978-0747561071) vs. Softback (ISBN 978-0747551003)
    6. URI Multiplicity
       URIs for ‘Spain’:
       - http://dbpedia.org/resource/Spain
       - http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain
       - http://sws.geonames.org/2510769
       - http://www4.wiwiss.fu-berlin.de/eurostat/resource/countries/Espa%C3%B1a
       URIs for ‘Hugh Glaser’:
       - http://acm.rkbexplorer.com/id/resource-P112732
       - http://citeseer.rkbexplorer.com/id/resource-CSP109020
       - http://citeseer.rkbexplorer.com/id/resource-CSP109013
       - http://citeseer.rkbexplorer.com/id/resource-CSP109011
       - http://citeseer.rkbexplorer.com/id/resource-CSP109002
       - http://dblp.rkbexplorer.com/id/resource-27de9959
       - http://europa.eu/People/#person-0ff816fa
       - http://resist.ecs.soton.ac.uk/wiki/User:hugh_glaser
       - http://id.ecs.soton.ac.uk/people/21
    7. Author Disambiguation
       - A known problem in the Information Science field
       - How to determine that Hugh Glaser / H. Glaser / Glaser, H. are the same person?
       - How to determine that Tom Anderson – Newcastle University and Tom Anderson – University of Washington are different people?
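
The two questions on this slide can be made concrete with a small sketch. It is not the authors' method: `name_key` is a hypothetical helper that reduces a name to (surname, first initial), a deliberately naive form of name equivalence. It collapses the three ‘Hugh Glaser’ spellings into one key, but it also gives both Tom Andersons the same key, which is exactly why string matching alone cannot settle identity.

```python
def name_key(name: str) -> tuple:
    """Reduce a personal name to (surname, first initial), lower-cased.

    Hypothetical helper for illustration only; real name matching needs
    far more care (middle names, particles, transliteration, ...).
    """
    name = name.strip()
    if "," in name:
        # "Glaser, H." -> surname before the comma, given names after it
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        # "Hugh Glaser" / "H. Glaser" -> last token is taken as the surname
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname.lower(), given[:1].lower()


variants = ["Hugh Glaser", "H. Glaser", "Glaser, H."]
keys = {name_key(v) for v in variants}  # all three spellings share one key

# Both Tom Andersons (Newcastle, Washington) would also share one key,
# so additional evidence is needed to tell them apart.
tom_key = name_key("Tom Anderson")
```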
    8. Existing Approaches
       String Metrics
       - Name Equivalence identification
       - Record Linkage
       - Citation Matching
       Web Assisted
       - Look up publications on author’s home page
       - Use search engine results on publication title
       Machine Learning
       - k-way spectral clustering
       - Use author name, co-author frequency and publication venue
    9. DBLP Linked Data
       - Converted from an XML dump of the DBLP database
       - 950 000 publications
       - 540 000 authors
       - 28 million triples
       - Updated weekly
       - Linked to other datasets including RDF Book Mashup and RKBExplorer.com
    10. DBLP Author Disambiguation
        - 49 names - 10 most common English surnames with 5 common first names
        - Authors disambiguated by looking at homepage, web publication, search engine results and institution
        - When in doubt, authors assumed to be the same if:
          - The co-authors of any publication are the same
          - The publication venue was the same
          - The area of research was the same
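
The "when in doubt" rule can be sketched as a predicate over two publication records. This is only an illustration under stated assumptions: the slide does not say whether one matching criterion is enough or all three are required, so the sketch assumes any single match suffices. The `Publication` type and the sample records are hypothetical and do not come from DBLP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical minimal record; the fields mirror the three criteria."""
    coauthors: frozenset
    venue: str
    area: str


def assumed_same_author(a: Publication, b: Publication) -> bool:
    # Assumption: any one shared signal is treated as sufficient evidence.
    shared_coauthor = bool(a.coauthors & b.coauthors)
    return shared_coauthor or a.venue == b.venue or a.area == b.area


# Made-up records for illustration only.
p1 = Publication(frozenset({"A. Example"}), "Venue X", "dependability")
p2 = Publication(frozenset({"A. Example", "B. Sample"}), "Venue Y", "networking")
p3 = Publication(frozenset({"C. Other"}), "Venue Z", "computer vision")

same_12 = assumed_same_author(p1, p2)  # share a co-author
same_13 = assumed_same_author(p1, p3)  # nothing in common
```

Requiring all three criteria instead would be a one-word change (`and` in place of `or`) and would trade wrongly merged authors for wrongly split ones.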
    11. It’s all about Identity
        Tom Anderson – http://www4.wiwiss.fu-berlin.de/dblp/resource/person/109074
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/dac/MorettiHNCKABDF01>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ftcs/SaeedLA91>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ftrtft/LemosSA92>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/hybrid/AndersonLFS92>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/iccbss/AndersonFRR03>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/iciap/TruccoARI05>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/icnp/ElySWSA01>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ifip/AndersonRR04>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/sc/BorchersASW95>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/seaai/AndersonH98>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/srds/Anderson86>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/words/AndersonFRR05>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/bell/LiuBFSRA04>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/cj/LemosSA92>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/Anderson01>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/Anderson03>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/ZorianASTI96>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/software/LemosSA95>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/ton/SavageWKA01>
        - is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/tse/AndersonBHM85>
        - is dblp:editor of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/sigcomm/2006>
        Vice President, O-in Design Automation Inc., USA
        Professor, University of Newcastle
        Professor, Heriot Watt University
        University of Washington
        University of California, Berkeley
        Tom Andersen - University of Denmark
        Lucent Technologies, Illinois
    12. DBLP Author Disambiguation Results
        - 92% of authors with common names had publications incorrectly merged
        - Worst case - 15 different authors with 1 URI
        - Many authors who are the same have publications under different names (Cliff Jones, C.B. Jones)
        - Inconsistency in data means inconsistency with linked data
        - It is incorrect to use owl:sameAs to link different authors who have the same URI
    13. DBpedia
        - DBpedia 3.0 improves disambiguation management by including the ‘disambiguates’ property
        - owl:sameAs linkage is still inconsistent:
            <http://dbpedia.org/resource/Welsh> owl:sameAs
                <http://sw.cyc.com/2006/07/27/cyc/EthnicGroupOfWelsh> ,
                <http://sw.cyc.com/2006/07/27/cyc/Welsh-TheWord> ,
                <http://sw.cyc.com/2006/07/27/cyc/WelshLanguage> ,
                <http://sw.cyc.com/2006/07/27/cyc/Welshing-Cheating> .
            <http://dbpedia.org/resource/H.P._Lovecraft> owl:sameAs
                <http://sw.cyc.com/2006/07/27/cyc/HPLovecraft-Author> ,
                <http://zitgist.com/music/artist/8047a401-5ca7-48dd-9d7c-2d2b822e51e6> .
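
Why the ‘Welsh’ links are a problem follows from the semantics of owl:sameAs, which is symmetric and transitive: a reasoner that takes the links at face value must also conclude that the ethnic group, the word, the language and the act of cheating are all the same thing. The sketch below computes those equivalence classes with a small union-find. The function name `same_as_closure` is an illustrative assumption, not part of any RDF library.

```python
from collections import defaultdict


def same_as_closure(pairs):
    """Group URIs into equivalence classes implied by sameAs pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())


CYC = "http://sw.cyc.com/2006/07/27/cyc/"
WELSH = "http://dbpedia.org/resource/Welsh"
pairs = [
    (WELSH, CYC + name)
    for name in ("EthnicGroupOfWelsh", "Welsh-TheWord",
                 "WelshLanguage", "Welshing-Cheating")
]

classes = same_as_closure(pairs)
# One class of five URIs: the language and "cheating" end up equated.
```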
    14. Possible Solutions
        CRS: Consistent Reference Service
        - Groups similar URIs into ‘bundles’
        - Bundles can be made according to context
        - Each KB can have one or more CRSes
        OKKAM
        - Coming up soon!
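
The bundle idea can be sketched as a lookup table keyed by context. This is a toy model under stated assumptions, not the actual CRS interface: the class `BundleStore`, its methods and the context name `"person"` are all hypothetical. The point it illustrates is that equivalence is asserted per context and kept outside the data, so no global owl:sameAs statement is needed.

```python
from collections import defaultdict


class BundleStore:
    """Hypothetical context-scoped store of URI bundles."""

    def __init__(self):
        self._bundles = defaultdict(list)  # context -> list of frozensets

    def add_bundle(self, context, uris):
        self._bundles[context].append(frozenset(uris))

    def equivalents(self, uri, context):
        """Return the bundle containing `uri` in `context`, else just `uri`."""
        for bundle in self._bundles.get(context, []):
            if uri in bundle:
                return set(bundle)
        return {uri}


# Three of the ‘Hugh Glaser’ URIs from the URI Multiplicity slide.
ACM = "http://acm.rkbexplorer.com/id/resource-P112732"
DBLP = "http://dblp.rkbexplorer.com/id/resource-27de9959"
ECS = "http://id.ecs.soton.ac.uk/people/21"

store = BundleStore()
store.add_bundle("person", [ACM, DBLP, ECS])

in_person_context = store.equivalents(ACM, "person")  # all three URIs
in_other_context = store.equivalents(ACM, "strict")   # only ACM itself
```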
    15. Summary
        - Linked Data providers need to think about data consistency in the same way as database providers
        - Failure to manage coreference within datasets leads to incorrect linkage with other datasets
        - The network effect of the Web of Data means coreference needs to be even more carefully managed than in the Web of Documents
        - Systems are being developed to help manage coreference; the community needs to decide how to handle the problem
    16. Questions?
        Further questions: {a.o.jaffri, hg, icm}@ecs.soton.ac.uk
