URI Disambiguation in the Context of Linked Data

Example URIs on the title slide:
- http://dbpedia.org/resource/Tim_Berners-Lee
- http://dbpedia.org/resource/Spain
- http://acm.rkbexplorer.com/id/resource-P112732
- http://sws.geonames.org/2510769
- http://acm.rkbexplorer.com/id/person-282197
- http://id.ecs.soton.ac.uk/person/7113
- http://www.w3.org/People/Berners-Lee/card#i
- http://id.ecs.soton.ac.uk/person/21
- http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007
- http://citeseer.rkbexplorer.com/id/resource-CSP109020
- http://southampton.rkbexplorer.com/id/person-00021
- http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain

Presentation Outline (LDOW2008 - Beijing, China)
- Linked Data Repositories
- Coreference on the Semantic Web
- Author Disambiguation
- DBLP Linked Data
- DBLP Author Disambiguation
- Disambiguation Results
- DBpedia
- Possible Solutions
- Summary

RKBexplorer.com
- Contains URIs for more than 10 million entities
- Over 25 Linked Data sites (DBLP among them), including data relating to people, projects, papers and institutions
- A single entity has a number of URIs (even within the same repository)
- Entities are linked using Consistent Reference Services (CRSes)

Linked Data Repositories
- Existing databases on the Web are being exposed as Linked Data (e.g. with D2R or Virtuoso)
- Databases contain inconsistencies and require constant curation
- Datasets such as Wikipedia are continually checked and updated, especially with respect to disambiguation (WikiProject_Disambiguation)
- Linked Data repositories should also provide consistent data

Disambiguation on the Semantic Web
- Coreference on the Semantic Web is the situation where two or more URIs are used for a single non-information resource
- URI usage can change with context
- Equality of non-information resources is hard to define precisely
- Examples:
  - ‘Hugh Glaser’ at Southampton vs. ‘Hugh Glaser’ at Imperial
  - ‘Harry Potter and the Order of the Phoenix’ in hardback vs. softback (ISBNs 978-0747561071 and 978-0747551003)

URI Multiplicity
URIs for ‘Spain’:
- http://dbpedia.org/resource/Spain
- http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain
- http://sws.geonames.org/2510769
- http://www4.wiwiss.fu-berlin.de/eurostat/resource/countries/Espa%C3%B1a
URIs for ‘Hugh Glaser’:
- http://acm.rkbexplorer.com/id/resource-P112732
- http://citeseer.rkbexplorer.com/id/resource-CSP109020
- http://citeseer.rkbexplorer.com/id/resource-CSP109013
- http://citeseer.rkbexplorer.com/id/resource-CSP109011
- http://citeseer.rkbexplorer.com/id/resource-CSP109002
- http://dblp.rkbexplorer.com/id/resource-27de9959
- http://europa.eu/People/#person-0ff816fa
- http://resist.ecs.soton.ac.uk/wiki/User:hugh_glaser
- http://id.ecs.soton.ac.uk/people/21

Author Disambiguation
- A known problem in the Information Science field
- How do we determine that Hugh Glaser / H. Glaser / Glaser, H. are the same person?
- How do we determine that Tom Anderson (Newcastle University) and Tom Anderson (University of Washington) are different people?

Existing Approaches
String metrics
- Name equivalence identification
- Record linkage
- Citation matching
Web-assisted
- Look up publications on the author’s home page
- Use search engine results on the publication title
Machine learning
- k-way spectral clustering
- Use author name, co-author frequency and publication venue

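The following is a minimal sketch of the string-metric idea. It assumes a crude "surname plus first initial" key and does not implement any of the published name-equivalence or record-linkage methods listed above. `name_key` is a hypothetical helper. The sketch shows both sides of the author disambiguation problem. The Hugh Glaser variants (and Cliff Jones / C.B. Jones) collapse to one key. However, two different people with an identical name string always share a key.

```python
import re
import unicodedata


def name_key(raw: str) -> str:
    """Reduce a personal name to a 'surname initial' key (hypothetical helper)."""
    # Strip accents so that e.g. 'é' compares equal to 'e'.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(c for c in text if not unicodedata.combining(c))
    if "," in text:
        # 'Glaser, H.': surname before the comma, given names after it
        surname, given = (part.strip() for part in text.split(",", 1))
    else:
        # 'Hugh Glaser' / 'H. Glaser': surname is the last token
        tokens = text.split()
        surname, given = tokens[-1], " ".join(tokens[:-1])
    initial = re.sub(r"[^A-Za-z]", "", given)[:1]
    surname = re.sub(r"[^A-Za-z-]", "", surname)
    return f"{surname.lower()} {initial.lower()}".strip()


# Name variants of one person map to the same key: glaser h glaser h glaser h
print(name_key("Hugh Glaser"), name_key("H. Glaser"), name_key("Glaser, H."))
# Also True for a variant pair mentioned later in the talk.
print(name_key("Cliff Jones") == name_key("C.B. Jones"))
# Identical strings (the two Tom Andersons) can never be separated this way.
```

A key like this is only useful for blocking, that is, for choosing which records to compare. Telling homonyms apart requires extra evidence such as co-authors, venues or home pages.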
DBLP Linked Data
- Converted from an XML dump of the DBLP database
- 950,000 publications
- 540,000 authors
- 28 million triples
- Updated weekly
- Linked to other datasets, including the RDF Book Mashup and RKBExplorer.com

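The sketch below shows how a naive XML-to-RDF conversion bakes coreference errors into the data. It assumes person URIs are minted from the author name string alone. The record and person URI patterns are copied from the slides, but the numeric IDs, the sample records and `to_ntriples` are hypothetical. This is not the actual converter used for the DBLP dataset.

```python
import xml.etree.ElementTree as ET

# URI patterns copied from the slides; the numeric person IDs minted here
# are hypothetical and will not match the real dataset.
RECORD_BASE = "http://www4.wiwiss.fu-berlin.de/dblp/resource/record/"
PERSON_BASE = "http://www4.wiwiss.fu-berlin.de/dblp/resource/person/"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def to_ntriples(dblp_xml: str) -> list[str]:
    """Emit <record> dc:creator <person> triples, one person URI per name string."""
    person_ids: dict[str, int] = {}
    triples = []
    for record in ET.fromstring(dblp_xml):
        record_uri = RECORD_BASE + record.get("key", "")
        for author in record.findall("author"):
            name = (author.text or "").strip()
            # Same string -> same URI: homonyms get merged, spelling variants get split.
            pid = person_ids.setdefault(name, len(person_ids) + 1)
            triples.append(f"<{record_uri}> <{DC_CREATOR}> <{PERSON_BASE}{pid}> .")
    return triples


# Hypothetical sample records in the shape of the DBLP dump.
SAMPLE = """<dblp>
  <inproceedings key="conf/example/A01"><author>Tom Anderson</author></inproceedings>
  <article key="journals/example/B02"><author>Tom Anderson</author></article>
  <article key="journals/example/C03"><author>T. Anderson</author></article>
</dblp>"""

for triple in to_ntriples(SAMPLE):
    print(triple)
```

The first two records receive the same person URI whether or not they were written by the same Tom Anderson, while "T. Anderson" gets a separate URI. Note that the real DBLP dump declares character entities in its DTD, which `xml.etree` does not load, so a full converter would probably need a DTD-aware parser.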
DBLP Author Disambiguation
- 49 names: the 10 most common English surnames combined with 5 common first names
- Authors were disambiguated by looking at home pages, web publications, search engine results and institutions
- When in doubt, authors were assumed to be the same if:
  - the co-authors of any publication are the same
  - the publication venue was the same
  - the area of research was the same

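The "when in doubt" rules can be sketched as a clustering step over the publications that share one name. This rests on several assumptions. "Co-authors are the same" is read as "share at least one co-author". The three criteria are counted against a configurable threshold (`min_signals=1` means any one criterion suffices). Links are treated as transitive and resolved with union-find. `Pub`, `cluster` and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Pub:
    key: str
    coauthors: frozenset
    venue: str
    area: str


def _find(parent: list[int], i: int) -> int:
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster(pubs: list[Pub], min_signals: int = 1) -> list[set[str]]:
    """Split one name's publications into presumed distinct authors.

    Two publications are linked when at least `min_signals` of the three
    heuristics hold: a shared co-author, the same venue, the same area.
    Links are transitive, so clusters are connected components.
    """
    parent = list(range(len(pubs)))
    for i, j in combinations(range(len(pubs)), 2):
        a, b = pubs[i], pubs[j]
        signals = bool(a.coauthors & b.coauthors) + (a.venue == b.venue) + (a.area == b.area)
        if signals >= min_signals:
            parent[_find(parent, i)] = _find(parent, j)
    groups: dict[int, set[str]] = {}
    for i, pub in enumerate(pubs):
        groups.setdefault(_find(parent, i), set()).add(pub.key)
    return list(groups.values())


# Hypothetical publications that all carry the same author name.
pubs = [
    Pub("p1", frozenset({"Alice"}), "Venue-X", "fault tolerance"),
    Pub("p2", frozenset({"Alice", "Bob"}), "Venue-Y", "networking"),
    Pub("p3", frozenset({"Carol"}), "Venue-Z", "chip design"),
]
print(cluster(pubs))                 # p1 and p2 share a co-author, p3 stands alone
print(cluster(pubs, min_signals=2))  # stricter reading: three separate clusters
```

Because links are transitive, a single loose criterion such as "same venue" can chain unrelated people together. That is one reason the threshold is left as a parameter here.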
It’s All About Identity
Tom Anderson: http://www4.wiwiss.fu-berlin.de/dblp/resource/person/109074
is dc:creator of:
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/dac/MorettiHNCKABDF01>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ftcs/SaeedLA91>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ftrtft/LemosSA92>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/hybrid/AndersonLFS92>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/iccbss/AndersonFRR03>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/iciap/TruccoARI05>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/icnp/ElySWSA01>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ifip/AndersonRR04>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/sc/BorchersASW95>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/seaai/AndersonH98>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/srds/Anderson86>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/words/AndersonFRR05>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/bell/LiuBFSRA04>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/cj/LemosSA92>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/Anderson01>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/Anderson03>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/ZorianASTI96>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/software/LemosSA95>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/ton/SavageWKA01>
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/tse/AndersonBHM85>
is dblp:editor of:
- <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/sigcomm/2006>
Affiliations found for the authors behind this URI:
- Vice President, O-in Design Automation Inc., USA
- Professor, University of Newcastle
- Professor, Heriot-Watt University
- University of Washington
- University of California, Berkeley
- Tom Andersen, University of Denmark
- Lucent Technologies, Illinois

DBLP Author Disambiguation Results
- 92% of authors with common names had publications incorrectly merged
- Worst case: 15 different authors sharing 1 URI
- Many authors who are the same person have publications under different names (Cliff Jones, C.B. Jones)
- Inconsistency in the data means inconsistency in the Linked Data
- When different authors share the same URI, it is incorrect to link that URI to other datasets with owl:sameAs

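The next sketch shows one plausible way to compute figures like "92% incorrectly merged" and "15 authors with 1 URI". It compares each URI against hand-checked ground truth. It is an assumption about the procedure, not necessarily the authors' exact method. `merge_stats` and the sample tuples are hypothetical.

```python
from collections import defaultdict


def merge_stats(assignments):
    """Score a URI assignment against hand-checked ground truth.

    `assignments` holds (publication, true_author, assigned_uri) tuples.
    Returns the fraction of true authors whose URI also carries another
    author's publications, and the largest number of true authors behind one URI.
    """
    authors_per_uri = defaultdict(set)
    uris_per_author = defaultdict(set)
    for _pub, author, uri in assignments:
        authors_per_uri[uri].add(author)
        uris_per_author[author].add(uri)
    merged = {
        author
        for author, uris in uris_per_author.items()
        if any(len(authors_per_uri[uri]) > 1 for uri in uris)
    }
    worst = max(len(authors) for authors in authors_per_uri.values())
    return len(merged) / len(uris_per_author), worst


# Hypothetical sample: two Tom Andersons share one URI, one author is alone.
sample = [
    ("pub-1", "Tom Anderson (Newcastle)", "dblp:person/109074"),
    ("pub-2", "Tom Anderson (Washington)", "dblp:person/109074"),
    ("pub-3", "Hugh Glaser (Southampton)", "dblp:person/A"),
]
rate, worst = merge_stats(sample)
# prints: 67% of authors merged; worst case 2 authors per URI
print(f"{rate:.0%} of authors merged; worst case {worst} authors per URI")
```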
DBpedia
- DBpedia 3.0 improves disambiguation management by including the ‘disambiguates’ property
- owl:sameAs linkage is still inconsistent:

    <http://dbpedia.org/resource/Welsh> owl:sameAs
        <http://sw.cyc.com/2006/07/27/cyc/EthnicGroupOfWelsh> ,
        <http://sw.cyc.com/2006/07/27/cyc/Welsh-TheWord> ,
        <http://sw.cyc.com/2006/07/27/cyc/WelshLanguage> ,
        <http://sw.cyc.com/2006/07/27/cyc/Welshing-Cheating> .

    <http://dbpedia.org/resource/H.P._Lovecraft> owl:sameAs
        <http://sw.cyc.com/2006/07/27/cyc/HPLovecraft-Author> ,
        <http://zitgist.com/music/artist/8047a401-5ca7-48dd-9d7c-2d2b822e51e6> .

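These links are more harmful than they may look. owl:sameAs is symmetric and transitive, so a reasoner that follows them concludes that the four Cyc concepts are all the same resource. For example, it would equate the Welsh language with "Welshing" (cheating). The sketch below computes the equivalence classes implied by the DBpedia links above using union-find. The function name `same_as_classes` is hypothetical.

```python
def same_as_classes(pairs):
    """Group URIs into the equivalence classes implied by owl:sameAs.

    owl:sameAs is reflexive, symmetric and transitive, so the classes are the
    connected components of the sameAs graph (computed here with union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    classes = {}
    for uri in list(parent):
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


CYC = "http://sw.cyc.com/2006/07/27/cyc/"
WELSH = "http://dbpedia.org/resource/Welsh"
LOVECRAFT = "http://dbpedia.org/resource/H.P._Lovecraft"
pairs = [(WELSH, CYC + local) for local in
         ("EthnicGroupOfWelsh", "Welsh-TheWord", "WelshLanguage", "Welshing-Cheating")]
pairs += [
    (LOVECRAFT, CYC + "HPLovecraft-Author"),
    (LOVECRAFT, "http://zitgist.com/music/artist/8047a401-5ca7-48dd-9d7c-2d2b822e51e6"),
]

for cls in same_as_classes(pairs):
    print(len(cls), "URIs treated as one resource:", sorted(cls))
```

The result is two classes. One contains five URIs, including both "WelshLanguage" and "Welshing-Cheating". The other contains three URIs, merging the author with a music artist. This is the network effect the summary warns about: each additional wrong link enlarges a whole class, not a single pair.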
Possible Solutions
CRS: Consistent Reference Service
- Groups similar URIs into ‘bundles’
- Bundles can be made according to context
- Each knowledge base (KB) can have one or more CRSes
OKKAM
- Coming up soon!

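Below is a toy model of the CRS idea. URIs are grouped into bundles, and bundles are kept separately per context, so the same two URIs can be co-referent in one context and distinct in another. The class and method names are hypothetical. This is not RKBExplorer's actual CRS interface or data model. The `urn:isbn:` identifiers for the two Harry Potter ISBNs are used only for illustration.

```python
from collections import defaultdict


class ConsistentReferenceService:
    """Toy CRS: bundles of co-referent URIs, kept separately for each context."""

    def __init__(self):
        # context -> URI -> the bundle (a set shared by all its members)
        self._index = defaultdict(dict)

    def bundle(self, context, *uris):
        """Put `uris` in one bundle for `context`, absorbing any existing bundles."""
        index = self._index[context]
        merged = set(uris)
        for uri in uris:
            merged |= index.get(uri, set())
        for uri in merged:
            index[uri] = merged
        return set(merged)

    def coreferents(self, context, uri):
        """URIs equivalent to `uri` in `context` (just `uri` if it is unbundled)."""
        return set(self._index[context].get(uri, {uri}))


crs = ConsistentReferenceService()
crs.bundle("country",
           "http://dbpedia.org/resource/Spain",
           "http://sws.geonames.org/2510769",
           "http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain")

# The two Harry Potter ISBNs: the same work, but different editions.
ISBN_A, ISBN_B = "urn:isbn:978-0747561071", "urn:isbn:978-0747551003"
crs.bundle("work", ISBN_A, ISBN_B)
# No "edition" bundle is created, so in that context they stay distinct.
print(crs.coreferents("work", ISBN_A))
print(crs.coreferents("edition", ISBN_A))
```

Unlike a global owl:sameAs assertion, a bundle here only holds in the context in which it was made. This is one way to model "URI usage can change with context".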
Summary
- Linked Data providers need to think about data consistency in the same way as database providers
- Failure to manage coreference within datasets leads to incorrect linkage with other datasets
- Because of the network effect of the Web of Data, coreference needs to be managed even more carefully than in the Web of Documents
- Systems are being developed to help manage coreference, but the community needs to decide how to handle the problem

Questions?
Further questions: {a.o.jaffri, hg, icm}@ecs.soton.ac.uk

Editor's Notes

  1. Named graphs cannot be made in RDF, only outside the framework. How do we decide which graph the data comes from?
  2. Explain more, slow down. We thought Tom Anderson was being funded by NSF.