Linked Open Data

Researcher (tenure track) at Centrum Wiskunde & Informatica
May. 20, 2016

Linked Open Data

  1. Linked Open Data. SIKS course on Data Science, May 20, 2016, Vught. Laura Hollink
  2. Why do we create and use Linked Open Data? Example questions from the humanities and social sciences: How did the debate about the financial crisis in Greece develop?
  3. Searching the proceedings of the EU Parliament. [Chart: mentions of "Greece" in the plenary meetings of the European Parliament; x-axis: year (1999 to 2013), y-axis: number of mentions (0 to 200).]
  4. Searching through newspaper archives. Mentions of “Griekenland” (Dutch for “Greece”) in the Dutch newspaper the Telegraaf.
  6. Search volumes on a search engine Query = “Greece” http://www.google.com/trends We need access to data. Analysing them gives us some useful insight. But to answer the question properly we would need to combine sources and do more complex queries.
  7. Why do we create and use Linked Open Data? Example question 2: Which political debate in the post-war period has attracted the most media attention?
  9. “De Indonesische Quaestie” (“The Indonesian Question”). To answer this question we need to go through all newspaper articles about all political debates, so we need access to combined data sources and we need structured queries.
  11. Why do we create and use Linked Open Data? Example question 3: What are the differences between different media? Example question 4: Has the coverage changed over time?
  13. Research goals and research questions Our goal is to build an infrastructure to answer these kinds of questions. 1. How do we automatically link heterogeneous datasets? 2. How do we interpret links between datasets of different quality and certainty? 3. What can we conclude from usage statistics on these datasets? 4. Can we design interfaces that allow scholars to study the datasets • including the links between them? • while assessing the reliability of the findings? Data Science - Big Data - Linked Open Data
  14. Table of Contents 1. What is Linked Open Data (LOD) 2. Creating LOD 1. How to discover links 2. How to represent links on the Web 3. How to evaluate links 3. Access to LOD (from both the server and the client perspective)
  17. What is Linked Open Data? A method of publishing structured data on the Web in such a way that it can be linked and queried by computers as well as humans.
  19. The Web of Documents • Documents identified by URIs (html, pdf, images, movies, etc.) • with structured information for humans (tables, headers) and • with hyperlinks between them • The data is not machine readable, meant for humans • structure is implicit (what do the columns of a table mean?) • links are not typed (what is the relation between two documents?)
  21. The Web of Data • Everything identified by URIs (not just documents, but also classes, instances, relations/links) • The data is machine readable: • in formal languages (RDF, RDFS, OWL, SKOS) • which enable machines to do reasoning, i.e. infer new statements from inserted statements.
  24. Compared to a database table… Table row: Thing = Amsterdam, Type = City, Population = 1364422, Airport = Schiphol. The same information as a graph: Amsterdam is a City; Amsterdam has population “1364422”; Amsterdam has airport Schiphol. Differences: • Statements can be distributed over the web • Non-unique naming assumption • Open World assumption • Everyone can say anything about anything
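
The step from a table row to separate statements can be sketched in Python. This is only an illustration: the `row_to_triples` helper is hypothetical, and the column names stand in for real predicates, which on the Web of Data would be URIs.

```python
# Hypothetical helper: turn one table row into (subject, predicate, object) triples.
def row_to_triples(row, key="Thing"):
    subject = row[key]
    # One statement per non-empty cell; the key column supplies the subject.
    return [(subject, column, value)
            for column, value in row.items()
            if column != key and value is not None]

row = {"Thing": "Amsterdam", "Type": "City",
       "Population": "1364422", "Airport": "Schiphol"}
triples = row_to_triples(row)
for triple in triples:
    print(triple)
```

Because every statement stands on its own, the three triples could be published by three different sources, which is the first difference listed on the slide.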
  25. Examples of URIs on the Web of Data • documents: • http://vu.nl/index.html • http://example.org/cities#Leuven • real world objects (a book in the library, a person) • isbn://5031-4444-333 • http://eyaloren.org/foaf.rdf#me • concepts: • http://cyc.org/concept/Mammal • http://cyc.org/concept/Dog • www.w3.org/2006/03/wn/wn20/instances/synset-anniversary-noun-1 • relations: • http://purl.org/linkedpolitics/vocabulary/speaker
  28. RDF (the basics) • A W3C recommendation to describe resources on the Web of Data, called “Resource Description Framework” • See https://www.w3.org/RDF/ • RDF data model: triples! RDF example in Turtle syntax: <bob#me> a foaf:Person ; foaf:knows <alice#me> ; schema:birthDate "1990-07-04"^^xsd:date ; foaf:topic_interest wd:Q12418 .
  31. Vocabulary definition and reasoning with RDFS. [Diagram: classes A, B and C at the ontology / vocabulary / schema level; instance r at the data level.] Rule 1: IF B rdfs:subClassOf A and C rdfs:subClassOf B THEN C rdfs:subClassOf A. Rule 2: IF C rdfs:subClassOf B and r rdf:type C THEN r rdf:type B.
  34. Vocabulary definition and reasoning with RDFS. Rule 1: IF B rdfs:subClassOf A and C rdfs:subClassOf B THEN C rdfs:subClassOf A. Rule 2: IF B rdfs:subClassOf A and r rdf:type B THEN r rdf:type A. Example: <bob#me> rdf:type foaf:Person . foaf:Person rdfs:subClassOf foaf:Agent . Inferred: <bob#me> a foaf:Agent . This follows from the standard meaning of rdfs:subClassOf.
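
The two subclass rules can be tried out with a few lines of Python. This is a minimal sketch, not a real RDFS reasoner: triples are plain tuples of strings, and the function name `rdfs_subclass_closure` is made up for this example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_subclass_closure(triples):
    """Apply the two subclass rules repeatedly until nothing new is inferred."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in inferred:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in inferred:
                # Rule 1: s2 subClassOf s, s subClassOf o => s2 subClassOf o
                if p2 == SUBCLASS and o2 == s:
                    new.add((s2, SUBCLASS, o))
                # Rule 2: s2 type s, s subClassOf o => s2 type o
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

data = {
    ("<bob#me>", RDF_TYPE, "foaf:Person"),
    ("foaf:Person", SUBCLASS, "foaf:Agent"),
}
closure = rdfs_subclass_closure(data)
print(("<bob#me>", RDF_TYPE, "foaf:Agent") in closure)  # True
```

The loop stops when a full pass adds no new triple, so chains of subclasses of any length are handled.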
  37. Vocabulary definition and reasoning with RDFS. Rule: IF p rdfs:range R and A p B THEN B rdf:type R. Example: <bob#me> foaf:knows <alice#me> . foaf:knows rdfs:range foaf:Person . Inferred: <alice#me> rdf:type foaf:Person . This follows from the standard meaning of rdfs:range.
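
The range rule can be sketched the same way. Again this is an illustration on tuples of strings under the assumption that a single pass is enough; `apply_range_rule` is a name invented here.

```python
def apply_range_rule(triples):
    """IF p rdfs:range R and A p B THEN B rdf:type R (single pass)."""
    # Collect every (property, class) pair declared with rdfs:range.
    ranges = {(s, o) for (s, p, o) in triples if p == "rdfs:range"}
    inferred = set(triples)
    for (prop, cls) in ranges:
        for (s, p, o) in triples:
            if p == prop:
                # The object of the property is an instance of the range class.
                inferred.add((o, "rdf:type", cls))
    return inferred

data = {
    ("<bob#me>", "foaf:knows", "<alice#me>"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),
}
result = apply_range_rule(data)
print(("<alice#me>", "rdf:type", "foaf:Person") in result)  # True
```

Note that the rule adds a statement rather than rejecting data: under the Open World assumption a range declaration is used to infer types, not to validate them.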
  48. SPARQL (the basics) • A W3C recommendation for querying RDF graphs, called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/sparql11-query/ Data: :JamesDean :playedIn :Giant . Query: :JamesDean :playedIn ?what . Answer: :Giant Query: ?who :playedIn :Giant . Answer: :JamesDean Query: :JamesDean ?what :Giant . Answer: :playedIn
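
The idea behind these three queries, a triple pattern in which any position may be a variable, can be imitated in Python. This sketch covers only a single triple pattern and is not a SPARQL implementation; the `match` function is hypothetical.

```python
def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable matches anything, but must stay consistent.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant that does not match
        else:
            results.append(binding)
    return results

data = [(":JamesDean", ":playedIn", ":Giant")]
print(match((":JamesDean", ":playedIn", "?what"), data))  # [{'?what': ':Giant'}]
print(match(("?who", ":playedIn", ":Giant"), data))       # [{'?who': ':JamesDean'}]
print(match((":JamesDean", "?what", ":Giant"), data))     # [{'?what': ':playedIn'}]
```

A full SPARQL query combines several such patterns and joins their bindings on shared variables.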
  49. Linked Open Data A method of publishing on the Web of Data: openly available, in RDF, with links to other datasets. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  51. Creating Linked Open Data in the Talk of Europe project: Discovering links, knowledge representation
  53. The European Parliament as Linked Open Data. Laura Hollink (Centrum Wiskunde & Informatica, Amsterdam), Astrid van Aggelen (VU University Amsterdam), Martijn Kleppe (Erasmus University Rotterdam), Henri Beunders (Erasmus University Rotterdam), Jill Briggeman (Erasmus University Rotterdam), Max Kemman (University of Luxembourg)
  54. Talk of Europe goals • To publish the entire plenary debates of the European Parliament as Linked Open Data • To improve access to the data • To enable large scale analysis across time spans. ‣To residents of the European Union access to the proceedings of the European parliament is a formal right. A. van Aggelen, L. Hollink, M. Kemman, M. Kleppe & H. Beunders. The debates of the European Parliament as Linked Open Data. Semantic Web Journal. In press, 2016.
  57. 1. Data in RDF 14M RDF statements about the 30K speeches in 23 languages by 3K speakers in 1K session days that were held in the EU parliament between 1999 and 2014
  58. 2. Links to external datasets
  61. Example 1: speeches that contain a certain keyword Query: all speeches that contain the phrase “open data” …. So let us go for open data, let us go for utilisation of all the instruments available to that end! ….. …. but there too governments are encouraging the use of open data to increase transparency, accountability and citizen participation …. …. We already have many open data projects in the Member States and local authorities…..
  62. Example 2: speeches that contain a certain keyword by date. [Chart: mentions of "Slovenia" in the plenary meetings of the European Parliament; x-axis: year (1999 to 2013), y-axis: number of mentions (0 to 100).]
  64. Example 2: speeches that contain a certain keyword by date. [Chart: mentions of 'human rights' by date; x-axis: year (1999 to 2013), y-axis: frequency (0 to 800).]
  65. Example 3: speeches that contain a certain keyword by country. [Chart: mentions of 'human rights' by country (AT, BE, BG, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IT, LT, LU, LV, MT, NL, PL, PT, RO, SE, SI, SK); scale 0 to 7000.]
  66. Example 4: the number of speeches per EU country SELECT ?c (COUNT(?c) as ?count) WHERE { ?x rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/Speech>. ?x <http://purl.org/linkedpolitics/vocabulary#speaker> ?p. ?p <http://purl.org/linkedpolitics/vocabulary#countryOfRepresentation> ?c } GROUP BY ?c LIMIT 50
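
What the query above computes can be mirrored on a handful of in-memory triples. The data below is made up for illustration and does not come from the dataset; the shortened names (`lpv:speaker`, `Speech`, `mep1`) are placeholders, and the sketch assumes each speaker represents exactly one country.

```python
from collections import Counter

SPEAKER = "lpv:speaker"
COUNTRY = "lpv:countryOfRepresentation"

# Made-up triples for illustration only.
triples = [
    ("speech1", "rdf:type", "Speech"), ("speech1", SPEAKER, "mep1"),
    ("speech2", "rdf:type", "Speech"), ("speech2", SPEAKER, "mep1"),
    ("speech3", "rdf:type", "Speech"), ("speech3", SPEAKER, "mep2"),
    ("mep1", COUNTRY, "NL"), ("mep2", COUNTRY, "DE"),
]

def speeches_per_country(triples):
    """Join speech -> speaker -> country, then count per country (GROUP BY)."""
    speeches = {s for (s, p, o) in triples if p == "rdf:type" and o == "Speech"}
    country_of = {s: o for (s, p, o) in triples if p == COUNTRY}
    counts = Counter()
    for (s, p, o) in triples:
        if p == SPEAKER and s in speeches and o in country_of:
            counts[country_of[o]] += 1
    return counts

counts = speeches_per_country(triples)
print(counts)
```

The three comprehension and loop steps correspond to the three triple patterns in the WHERE clause; the `Counter` plays the role of `GROUP BY ?c` with `COUNT`.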
  72. Example 5: background info about the MEPs • MEPs that were not born in Europe. Members of Parliament Integrate data from the EU parliament with external datasets
  76. Linking Members of Parliament to Wikipedia / DBpedia • String matching is the most important feature in the linking process. • “nearly all [alignment systems] use a string similarity metric” [12] • Stop-word removal and stemming are not helpful, nor is using WordNet synonyms. [12] Example target: http://www.dbpedia.org/resource/Judith_Sargentini [12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013.
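
A minimal sketch of name matching by string similarity, using `difflib.SequenceMatcher` from the Python standard library. The choice of metric, the normalisation and the 0.9 threshold are assumptions made for this example, not the settings used in Talk of Europe; the candidate list is illustrative.

```python
from difflib import SequenceMatcher

def normalise(name):
    """Lower-case, turn underscores into spaces, collapse whitespace."""
    return " ".join(name.lower().replace("_", " ").split())

def similarity(a, b):
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def best_match(name, candidates, threshold=0.9):
    """Return the most similar candidate, or None if none is similar enough."""
    scored = [(similarity(name, c), c) for c in candidates]
    score, candidate = max(scored)
    return candidate if score >= threshold else None

# Illustrative candidate labels (last path segment of a DBpedia resource URI).
candidates = ["Judith_Sargentini", "Martin_Schulz", "Catherine_Stihler"]
print(best_match("Judith SARGENTINI", candidates))  # Judith_Sargentini
```

Returning `None` below the threshold matters in practice: a missing link is usually less harmful than a wrong one.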
  81. How to relate a speech to a speaker and party? First attempt: lp:eu/plenary/2009-10-21/Speech_140 lpv:speaker lp:EUmember_1023 . lp:EUmember_1023 lpv:hasParty lp:EUParty/SomeParty . Why is this not a good solution? 1. A person might be a member of more than one party (at different times). 2. Since there is no link between a speech and a party, queries for all speeches spoken by the members of a certain party become very complicated.
  85. How to relate a speech to a speaker and party? [Diagram, as triples:] lp:eu/plenary/2009-10-21/Speech_140 lpv:speaker lp:EUmember_1023 ; lpv:spokenAs lp:political-Function101 , lp:political-Function102 . lp:EUmember_1023 lpv:politicalFunction lp:political-Function101 , lp:political-Function102 . lp:political-Function101 rdf:type lpv:PoliticalFunction ; lpv:institution lp:EUParty/NI ; lpv:role lp:Role/member ; lpv:beginning "20071114"^^xsd:date ; lpv:end "20111126"^^xsd:date . lp:political-Function102 rdf:type lpv:PoliticalFunction ; lpv:institution lp:EUCommittee/Committee_on_Legal_Affairs ; lpv:role lp:Role/substitute ; lpv:beginning "20090716"^^xsd:date ; lpv:end "20111126"^^xsd:date . Note: this is a common “design pattern” referred to as n-ary relations or relations as classes.
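
The n-ary pattern turns a membership into an object of its own, with a period attached. A rough Python analogue, assuming the dates and institutions can be read off the diagram as far as this transcript allows; the class and function names are invented for the sketch.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class PoliticalFunction:
    """One membership as a first-class object (the 'relation as class')."""
    member: str
    institution: str
    role: str
    beginning: date
    end: date

def functions_on(functions, member, day):
    """All functions a member held on a given day."""
    return [f for f in functions
            if f.member == member and f.beginning <= day <= f.end]

functions = [
    PoliticalFunction("EUmember_1023", "EUParty/NI", "member",
                      date(2007, 11, 14), date(2011, 11, 26)),
    PoliticalFunction("EUmember_1023", "EUCommittee/Committee_on_Legal_Affairs",
                      "substitute", date(2009, 7, 16), date(2011, 11, 26)),
]

# Speech_140 was held on 2009-10-21; these are the functions it was "spoken as".
spoken_as = functions_on(functions, "EUmember_1023", date(2009, 10, 21))
for f in spoken_as:
    print(f.institution, f.role)
```

Storing the result as a direct link from the speech (lpv:spokenAs in the diagram) means a query for all speeches by members of a party no longer has to compare dates.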
  86. Intermezzo: one-question quiz. Reasoning on the Web of Data. Graph: <http://purl.org/linkedpolitics/EUmember_4545> foaf:name "Catherine Stihler" ; :memberOf <http://purl.org/linkedpolitics/EUParty/PES> , <http://dbpedia.org/resource/Party_of_European_Socialists> , <http://dbpedia.org/resource/Progressive_Alliance_of_Socialists_and_Democrats> . Question: What can we conclude from this graph? A. Stihler is a member of exactly 3 parties B. Stihler is a member of at least 3 parties C. Stihler is a member of at most 3 parties D. None of the above E. All of the above F. Other, namely …
  87. Creating Linked Open Data in the PoliMedia project: Discovering links, knowledge representation, evaluation
  89. Linking government data to news data
  90. Which political debate in the post-war period has attracted most media attention? What are the differences between different media? Has the coverage changed over time?
  91. Transcriptions of all 9,294 meetings of the Dutch parliament between 1945 and 1995, consisting of 1,208,903 speeches. Roughly 1.8 million news bulletins between 1937 and 1984 (we only use 1945-1995). Archives of hundreds of newspapers, with tens of millions of articles, between 1618 and 1995 (we only use 1945-1995). Transcriptions of all meetings of the European Parliament between 1999 and 2014.
  92. Links in PoliMedia: about 3 million links
  93. Discovering links between politics and news. [Pipeline diagram: debates -> detect topics in speeches and detect named entities in speeches -> topics, named entities, name of speaker, date of debate -> create queries -> search newspaper archive -> candidate articles -> rank candidate articles -> links between speeches and articles.]
  95. Step 2: generate links. [Same pipeline diagram as on the previous slide.] Intuition 1: the name of the speaker should appear in the article and the article should be published within a week of the debate. Intuition 2: the more the article and the speech overlap in terms of topics and named entities, the more they are related.
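
The two intuitions map onto a filter followed by a ranking. The sketch below is an assumption-laden illustration, not the PoliMedia implementation: it reads "within a week" as up to seven days after the debate, uses Jaccard overlap as the similarity, and works on invented records.

```python
from datetime import date, timedelta

def is_candidate(speech, article, window_days=7):
    """Intuition 1: speaker named in the article, published within a week."""
    delta = article["published"] - speech["date"]
    in_window = timedelta(0) <= delta <= timedelta(days=window_days)
    return in_window and speech["speaker"].lower() in article["text"].lower()

def overlap_score(speech, article):
    """Intuition 2: Jaccard overlap of topics and named entities."""
    a, b = set(speech["terms"]), set(article["terms"])
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_candidates(speech, articles):
    candidates = [art for art in articles if is_candidate(speech, art)]
    return sorted(candidates, key=lambda art: overlap_score(speech, art),
                  reverse=True)

# Invented example records.
speech = {"speaker": "Jansen", "date": date(1949, 12, 1),
          "terms": {"budget", "taxes", "Jansen"}}
articles = [
    {"id": "a1", "published": date(1949, 12, 2),
     "text": "Jansen defended the budget", "terms": {"budget", "Jansen"}},
    {"id": "a2", "published": date(1949, 12, 3),
     "text": "Jansen on the budget and taxes",
     "terms": {"budget", "taxes", "Jansen"}},
    {"id": "a3", "published": date(1949, 12, 20),
     "text": "Jansen looks back", "terms": {"budget", "Jansen"}},
    {"id": "a4", "published": date(1949, 12, 2),
     "text": "Weather report", "terms": {"weather"}},
]
ranked = [art["id"] for art in rank_candidates(speech, articles)]
print(ranked)  # ['a2', 'a1']
```

Article a3 is dropped because it falls outside the window and a4 because the speaker is not mentioned; the remaining two are ordered by overlap.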
  98. Representation of links • Note: this is another example of the “design pattern” referred to as n-ary relations or relations as classes! • It allows us to save provenance information about the statements we create. [Diagram: instead of a single :isAbout statement between :speech123 and :newsArticle456, a link resource :link001 connects :speech123 and :newsArticle456 and carries a link type (:quotes), :creationDate 01-02-2013, :madeBy :PoliMedia_Linking_Engine, and :concept1 and :concept2.]
  102. Evaluation of links 1. Manually rating (a sample of) links • relatively cheap and easy to interpret • only precision, no recall 2. Comparison to a reference linkset • precision and recall • used in OAEI on the SEALS platform • more expensive if a reference alignment has to be created (but: crowd sourcing!) 3. End-to-end evaluation (a.k.a. evaluating an application that uses the mappings) • arguably the best method! • need to have access to an application + users
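
For the second method, precision and recall against a reference linkset reduce to set arithmetic. A minimal sketch with invented link pairs:

```python
def precision_recall(found, reference):
    """Compare a set of generated links with a reference linkset."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall

# Invented (speech, article) pairs for illustration.
found = {("s1", "a1"), ("s1", "a2"), ("s2", "a3"), ("s3", "a9")}
reference = {("s1", "a1"), ("s1", "a2"), ("s2", "a3"),
             ("s2", "a4"), ("s3", "a5")}

precision, recall = precision_recall(found, reference)
print(precision, recall)  # 0.75 0.6
```

This also shows why the first method gives precision only: rating a sample of generated links says nothing about the links in the reference set that were never generated.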
  104. Evaluation of links: beyond precision / recall. Generalized precision and generalized recall • Instead of a binary classification into correct/incorrect mappings, take into account how wrong a link is, using a formula (not preserved in this transcript) in which r(a) is the semantic distance between correspondence a and correspondence a’ in the reference alignment, and A is the number of correspondences. Laura Hollink, Mark van Assem, Shenghui Wang, Antoine Isaac, Guus Schreiber. Two Variations on Ontology Alignment Evaluation: Methodological Issues. ESWC 2008.
  107. Evaluation of links in PoliMedia. How good are the links? • We ask 2 raters to manually score pairs of newspaper articles and speeches. • A pilot study showed that we needed more than a 2-point scale. • Inter-rater agreement: 0.5 (acceptable, but not high). • Precision: 80%. How many links did we miss? • We ask the raters to manually search the KB archives for related articles. • Recall: 62%. [Chart: Setting 1: 0.48, Setting 2: 0.62, Setting 3: 0.8.]
  108. DEMO - PoliMedia search application
  109. Online database: “SPARQL endpoint” • A service to query a knowledge base using the SPARQL query language. “All speeches with more than 60 associated news items.”
  110. Access to Linked Open Data: how to serve and how to consume Linked Open Data
  113. Access to LOD: 1. download a data dump From server logs we know the query -some context of the requested URIs -variable names (?)
  120. Access to LOD 2: follow-your-nose. [Diagram, as triples, revealed step by step:] lp:eu/plenary/2013-11-20/AgendaItem_6 dc:title "Award of the Sakharov Prize (formal sitting)."@en ; dc:hasPart lp:eu/plenary/2013-11-20/Speech_103 , lp:eu/plenary/2013-11-20/Speech_104 . lp:eu/plenary/2013-11-20/Speech_103 lpv:spokenText "...the fittest need to struggle for the survival of the weak.[...]"@en ; lpv:speaker lp:Speaker_Malala_Yousafzai ; lpv:hasSubsequent lp:eu/plenary/2013-11-20/Speech_104 . lp:eu/plenary/2013-11-20/Speech_104 lpv:spokenText "...Ich glaube, das war ein außergewöhnlicher Moment für uns alle hier in diesem Parlament[...]"@en ; lpv:speaker lp:Martin_Schulz . lp:Martin_Schulz owl:sameAs http://dbpedia.org/resource/Martin_Schulz . On the DBpedia side, Martin_Schulz has dbp:children "2" and is linked to dbc:Officiers_of_the_Légion_d'honneur.
  121. Access to LOD 2: follow-your-nose lp:eu/plenary/ 2013-11-20/AgendaItem_6 lp:eu/plenary/ 2013-11-20/Speech_103 "Award of the Sakharov Prize (formal sitting)."@en dc:title dc:hasPart lp:eu/plenary/ 2013-11-20/AgendaItem_6 lp:eu/plenary/2013-11-20/ Speech_104 dc:hasPart lp:eu/plenary/ 2013-11-20/Speech_103 ...the fittest need to struggle for the survival of the weak.[...]"@en lpv:spokenText lpv:speaker lp:Speaker_Malala_Yousafzai "Award of the Sakharov Prize (formal sitting)."@en dc:title dc:hasPart lp:eu/plenary/ 2013-11-20/AgendaItem_6 lp:eu/plenary/2013-11-20/ Speech_104 lpv:has Subsequent dc:hasPart lp:eu/plenary/ 2013-11-20/Speech_103 ...the fittest need to struggle for the survival of the weak.[...]"@en lpv:spokenText lpv:speaker lp:Speaker_Malala_Yousafzai "Award of the Sakharov Prize (formal sitting)."@en dc:title dc:hasPart lp:eu/plenary/ 2013-11-20/AgendaItem_6 lp:eu/plenary/2013-11-20/ Speech_104 lpv:has Subsequent ...Ich glaube, das war ein außergewöhnlicher Moment für uns alle hier in diesem Parlament[...]"@en lpv:spokenText lpv:speaker dc:hasPart lp:Martin_Schulz lp:eu/plenary/ 2013-11-20/Speech_103 ...the fittest need to struggle for the survival of the weak.[...]"@en lpv:spokenText lpv:speaker lp:Speaker_Malala_Yousafzai "Award of the Sakharov Prize (formal sitting)."@en dc:title dc:hasPart lp:eu/plenary/ 2013-11-20/AgendaItem_6 lp:eu/plenary/2013-11-20/ Speech_104 lpv:has Subsequent ...Ich glaube, das war ein außergewöhnlicher Moment für uns alle hier in diesem Parlament[...]"@en lpv:spokenText lpv:speaker owl:sameAs http:://dbpedia.org/ resource/Martin_Schulz dc:hasPart lp:Martin_Schulz lp:eu/plenary/ 2013-11-20/Speech_103 ...the fittest need to struggle for the survival of the weak.[...]"@en lpv:spokenText lpv:speaker lp:Speaker_Malala_Yousafzai "Award of the Sakharov Prize (formal sitting)."@en dc:title dc:hasPart lp:eu/plenary/ 2013-11-20/AgendaItem_6 lp:eu/plenary/2013-11-20/ Speech_104 lpv:has Subsequent ...Ich glaube, das war ein 
außergewöhnlicher Moment für uns alle hier in diesem Parlament[...]"@en lpv:spokenText lpv:speaker owl:sameAs http:://dbpedia.org/ resource/Martin_Schulz dc:hasPart lp:Martin_Schulz dbp:children "2" lpv:speaker dbc:Officiers_of_the_Légion_d'honneur From server logs we know the requested URI: GET /Martin_Schulz HTTP/1.0 Accept: application/rdf+xml
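A follow-your-nose client produces exactly this kind of log entry: it dereferences a URI and asks for RDF through the Accept header. The following is a minimal sketch using only the Python standard library. The helper names build_rdf_request and dereference are illustrative assumptions, not part of the Talk of Europe tooling, and the URI is the DBpedia resource from the owl:sameAs link.

```python
from urllib.request import Request, urlopen


def build_rdf_request(uri: str, accept: str = "application/rdf+xml") -> Request:
    """Build a GET request that asks the server for an RDF serialisation."""
    return Request(uri, headers={"Accept": accept})


def dereference(uri: str, timeout: float = 10.0) -> bytes:
    """Fetch the RDF description of a resource (performs a network call)."""
    with urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read()


# Building the request does not touch the network; only dereference() does.
req = build_rdf_request("http://dbpedia.org/resource/Martin_Schulz")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Calling dereference on each URI found in the returned RDF, and repeating, is the follow-your-nose traversal; every step leaves one such request in the server log.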
  122. Access to LOD: 3. SPARQL. Count the agenda items in which at least one MEP from France spoke.
  SELECT (COUNT(DISTINCT ?ai) AS ?count)
  WHERE {
    ?ai rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/AgendaItem> .
    ?ai dcterms:hasPart ?speech .
    ?speech lpv:speaker ?speaker .
    ?speaker lpv:countryOfRepresentation ?country .
    ?country rdfs:label ?label .
    FILTER(?label = "France"@en)
  }
  123. Access to LOD: 3. SPARQL. From server logs we know the query: • some context of the requested URIs • variable names (?)
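To make concrete what a SPARQL log entry reveals, here is a small sketch that encodes a query the way it would appear in a GET request line and then recovers the query text and its variable names. The /sparql path, the shortened query, and all function names are assumptions for illustration; the regular expression is a rough heuristic, not a SPARQL parser.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Shortened version of the query on the previous slide (illustrative only).
QUERY = (
    "SELECT (COUNT(DISTINCT ?ai) AS ?count) WHERE { "
    "?ai dcterms:hasPart ?speech . ?speech lpv:speaker ?speaker . }"
)


def to_request_line(query: str, path: str = "/sparql") -> str:
    """Encode a query as the request line a server would log (path is assumed)."""
    return f"GET {path}?{urlencode({'query': query})} HTTP/1.1"


def query_from_request_line(request_line: str) -> str:
    """Recover the SPARQL query text from a logged request line."""
    _method, target, _version = request_line.split(" ")
    return parse_qs(urlsplit(target).query)["query"][0]


def variable_names(query: str) -> list:
    """Heuristically list the distinct variable names used in a query."""
    return sorted(set(re.findall(r"[?$]([A-Za-z_][A-Za-z0-9_]*)", query)))


logged = to_request_line(QUERY)
recovered = query_from_request_line(logged)
print(variable_names(recovered))
```

The variable names are chosen by whoever wrote the query, which is why the slide marks them with a question mark: they may hint at intent, but nothing guarantees it.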
  124. Access to LOD: 4. Linked Data Fragments xxx.xxx.xxx.xxx - - [17/Oct/2014:07:43:02 +0000] 
 "GET /2014/en?subject=&predicate=&object=dbpedia%3AAustin HTTP/1.1" 200 1309 "http://fragments.dbpedia.org/2014/en" …
  125. Access to LOD: 4. Linked Data Fragments (continued). From server logs we know the triple patterns that were requested: • some context of the requested URIs • variable names (?)
  126. What do we know about usage of Linked Open Data?
  128. Linked Open Data query log analysis?
  1. Yearly datasets of server logs released for research purposes, 2011-2016. Luczak-Roesch, Markus, Aljaloud, Saud, Berendt, Bettina and Hollink, Laura (2016) USEWOD 2016 Research Dataset. doi:10.5258/SOTON/385344
  2. Yearly workshops for researchers on Usage Data and the Web of Data (USEWOD), 2011-2016. Laura Hollink, Markus Luczak-Roesch, Bettina Berendt, et al. http://usewod.org/
  129. Linked Open Data query log analysis? (continued) Licensing + Anonymization: replace all IPs with a country code and an identifier
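One way to picture this anonymization step is sketched below. The slide does not specify the actual scheme, so the output format ("NL-1"), the sequential identifiers, and the toy country table standing in for a real GeoIP lookup are all assumptions made for illustration.

```python
from typing import Callable, Dict


def make_anonymizer(country_of: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function mapping an IP to '<country code>-<identifier>'.

    The same IP always receives the same identifier, so sessions can still
    be grouped after the address itself has been dropped.
    """
    ids: Dict[str, int] = {}

    def anonymize(ip: str) -> str:
        if ip not in ids:
            ids[ip] = len(ids) + 1  # assumed scheme: sequential identifiers
        return f"{country_of(ip)}-{ids[ip]}"

    return anonymize


# Hypothetical lookup table; a real pipeline would query a GeoIP database.
TOY_COUNTRIES = {"192.0.2.10": "NL", "198.51.100.7": "DE"}
anonymize = make_anonymizer(lambda ip: TOY_COUNTRIES.get(ip, "ZZ"))

first = anonymize("192.0.2.10")
second = anonymize("198.51.100.7")
again = anonymize("192.0.2.10")
print(first, second, again)
```

Keeping the country code preserves coarse geographic information for analysis, while the identifier keeps requests from one client linkable without exposing the address.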
  130. What has been found so far?
  • Efficient index generation [1]
  • Caching [2]
  • Auto-completion [3]
  • Hardware scaling at peak times [4]
  • Modularisation of data [4]
  Issues:
  • What is the difference between queries by machines and humans? [5]
  • What is the meaning of repeated queries by tools? Bots?
  • A lot of the usage is invisible due to data dump downloads [6]
  References:
  [1] Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. (2011). An empirical study of real-world SPARQL queries. USEWOD2011
  [2] Lorey, J., & Naumann, F. (2013). Caching and prefetching strategies for SPARQL queries. USEWOD2013
  [3] Kramer, K., Dividino, R. Q., & Gröner, G. (2013). SPACE: SPARQL Index for Efficient Autocompletion. ISWC2013 (Posters & Demos)
  [4] Luczak-Rösch, M., & Bischoff, M. (2011). Statistical analysis of web of data usage. EvoDyn2011
  [5] Rietveld, L., & Hoekstra, R. (2014). Man vs. Machine: Differences in SPARQL Queries. USEWOD2014
  [6] Huelss, J., & Paulheim, H. (2015). What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015
  131. Reflection: to what extent can we now answer these questions? How did the debate about the financial crisis in Greece develop? Which political event has attracted most media attention? What are the differences between different media? Has the coverage changed over time?
  132. Reflection: to what extent can we now answer these questions? Yes, but: • What is the influence of the selection of newspapers available at the National Library? • What was the quality of the digitisation process (OCR)? • How good is our linking approach (based on automatically detected entities and topics)? ➡ How to handle these uncertainties is one of our research questions! We call this Tool Criticism.
  133. Resources:
  PoliMedia demo: http://polimedia.nl/
  PoliMedia project video: https://youtu.be/u24oRCj7xrQ
  Talk of Europe project: http://talkofeurope.eu/
  Talk of Europe data: purl.org/linkedpolitics
  Talk of Europe project video: https://youtu.be/GxA53gkCe0o
  USEWOD workshop: http://usewod.org/
  My website: http://homepages.cwi.nl/~hollink/
  I’d be happy to answer your questions!