The Linked Data Lifecycle

ICCL Summer 2013 (http://www.computational-logic.org/content/events/iccl-ss-2013/) lectures on the Linked Data Life-Cycle

Published in: Education, Technology

  1. 1. The Linked Data Life-Cycle. Jens Lehmann, Quan Nguyen, Sebastian Hellmann, Claus Stadler, Lorenz Bühmann; contributors: Sören Auer, Anja Jentzsch, Christina Unger, Richard Cyganiak, Dimitris Kontokostas, Daniel Gerber, Axel Ngonga. 2013-08-23. Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 1 / 252
  2. 2. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Quality Analysis; Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle diagram with the stages Interlinking/Fusing, Manual revision/Authoring, Classification/Enrichment, Storage/Querying, Evolution/Repair, Extraction, Search/Browsing/Exploration]
  3. 3. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Quality Analysis; Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle diagram]
  4. 4. The Linked Data Principles: The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. Linked Data principles: 1 Use URIs as names for things. 2 Use HTTP URIs, so that people can look up those names. 3 When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). 4 Include links to other URIs, so that they can discover more things.
  5. 5. LOD Cloud
  6. 6. Linked Data Principles Detailed: 1 + 2. 1 URI references identify not just Web documents and digital content, but also real-world objects and abstract concepts (tangible things: people, places; abstract things: the relationship type of knowing somebody). 2 HTTP URIs enable re-use of the Web architecture: Linked Data gives emphasis to the Web in Semantic Web; resource dereferencing; re-use of standard tools for security, load-balancing etc.
  9. 9. Principles Detailed: 3 Content Negotiation. Humans and machines should be able to retrieve appropriate representations of resources: HTML for humans, RDF for machines. This is achievable using an HTTP mechanism called content negotiation. Basic idea: an HTTP client sends HTTP headers with each request to indicate what kinds of documents it prefers; servers can inspect the headers and select an appropriate response. Two strategies: 303 URIs and hash URIs. Both ensure that objects and the documents that describe them are not confused, and that humans and machines can retrieve appropriate representations.
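The content negotiation idea can be sketched in a few lines of Python. This is a minimal illustration, not the behaviour of any particular server: the function names `parse_accept` and `negotiate` and the list `AVAILABLE` are assumptions made for this example, and the sketch ignores partial wildcards such as `text/*`.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality value when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Representations this hypothetical server can produce
AVAILABLE = ["text/html", "application/rdf+xml"]


def negotiate(accept_header, available=AVAILABLE):
    """Pick the representation the client prefers, or None if none fits."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
    return None


print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"))
print(negotiate("application/rdf+xml;q=0.9, text/html;q=0.5"))
```

A browser-style header selects the HTML representation, while a client that ranks `application/rdf+xml` highest receives RDF.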
  10. 10. 303 URIs: 303 redirect: instead of sending the object itself over the network, the server responds to the client with the HTTP response code 303 See Other and the URI of a Web document which describes the real-world object. Second step: the client dereferences the new URI and gets a Web document describing the real-world object.
  11. 11. Hash URIs: The hash URI strategy builds on the characteristic that URIs may contain a special part (the fragment identifier) separated from their base part by a hash symbol (#). The HTTP protocol requires the fragment part to be stripped off before requesting the URI from the server → a URI that includes a hash cannot be retrieved directly and therefore does not necessarily identify a Web document.
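The fragment stripping described for hash URIs can be observed with Python's standard library. The sketch below is only an illustration; the `http://example.org/vocab` URIs and the helper name `request_target` are assumptions for this example.

```python
from urllib.parse import urldefrag


def request_target(uri):
    """Split a URI into the part sent to the server and the fragment kept by the client."""
    base, fragment = urldefrag(uri)
    return base, fragment


# Two hash URIs naming different things share one retrievable document.
knows = request_target("http://example.org/vocab#knows")
person = request_target("http://example.org/vocab#Person")
print(knows)
print(person)
```

Both URIs lead to a request for the same document, which is why all descriptions that share the non-fragment part are returned together.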
  12. 12. Hash versus 303. Hash URIs: (+) reduced number of necessary HTTP round-trips → reduces access latency; (-) descriptions of all resources sharing the same non-fragment URI part are always returned to the client together → can lead to large amounts of data being unnecessarily transmitted to the client. 303 URIs: (+) flexible because the redirection target can be configured separately for each resource (usually points to a single document for each resource, but could also summarise several resources); (-) requires two HTTP requests to retrieve a single description of a real-world object.
  13. 13. Principles Detailed: 4 Links. If an RDF triple connects URIs in different namespaces/datasets, it is called a link (no unique syntactical definition of link exists). Basic idea of Linked Data: apply the general hyperlink-based architecture of the World Wide Web to the task of sharing structured data on a global scale. Research challenge: efficient creation of links with high precision and recall.
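Since no unique syntactical definition of a link exists, any test for "is this triple a link?" is a heuristic. The sketch below uses one possible heuristic, assumed for this example only: treat the host name of a URI as a proxy for its dataset. The names `dataset_of` and `is_link` and the `example.org` URI are hypothetical.

```python
from urllib.parse import urlparse


def dataset_of(uri):
    """Use the host name as a rough stand-in for the dataset a URI belongs to."""
    return urlparse(uri).netloc


def is_link(triple):
    """True if subject and object are URIs from different datasets (heuristic)."""
    subject, _predicate, obj = triple
    if not obj.startswith(("http://", "https://")):
        return False  # literal objects cannot be links
    return dataset_of(subject) != dataset_of(obj)


internal = ("http://dbpedia.org/resource/Leipzig",
            "http://dbpedia.org/ontology/country",
            "http://dbpedia.org/resource/Germany")
external = ("http://dbpedia.org/resource/Leipzig",
            "http://www.w3.org/2002/07/owl#sameAs",
            "http://example.org/place/Leipzig")
print(is_link(internal), is_link(external))
```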
  15. 15. Why Linked Data? Problem: try to search for these things on the current Web: apartments near German-Russian bilingual childcare in Leipzig; ERP service providers with offices in Vienna and London; researchers working on multimedia topics in Eastern Europe. The information is available on the Web, but opaque to current Web search. Solution: complement text on Web pages with structured linked open data and intelligently combine/integrate such structured information from different sources.
  16. 16. How to get there?
  17. 17. Tim Berners-Lee's 5-star plan for an open web of data: make data available on the Web under an open license; make it available as structured data; use a non-proprietary format; use URIs to identify things; link your data to other people's data to provide context.
  18. 18. The 0th star: Data catalog with good metadata. Make your data findable.
  19. 19. Data on the Web, Open License
  20. 20. Data on the Web, Open License. Open vs. closed: data used to be closed by default. In the future, it may be open by default.
  21. 21. Data on the Web, Open License. Publishers: sharing data to make it more visible.
  22. 22. Data on the Web, Open License. E-Commerce: data sharing for increasing traffic.
  23. 23. Data on the Web, Open License. Community: collaboratively created databases.
  24. 24. Good reasons against opening data: privacy; competitive advantage; producing data and charging for it as a business model; can't get a license from upstream.
  25. 25. Structured Data. Enabling re-use: delivering data to end users in different forms; combining data with other data; 3rd-party analysis of data.
  26. 26. Structured Data Formats. Good for re-use / structured: MS Excel, CSV, XML, JSON, Microdata. Not so good for re-use: pure websites, MS Word. Bad for re-use: PDF. Really bad for re-use: only charts/maps without numbers.
  28. 28. Non-Proprietary Formats. Specialist tools often have specialist formats (geospatial tools, statistics packages, etc.): few people have the tools; expensive; difficult to re-use. Non-proprietary: CSV (dead simple), XML, JSON, RDF (good for 4+5 stars).
  29. 29. URIs as Identifiers
  30. 30. URIs as Identifiers
  31. 31. URIs as Identifiers. URI design: prefer stable, implementation-independent URIs.
  32. 32. URIs as Identifiers. Turning local identifiers into URIs. Why? Make them globally unique; clarify authority; make them resolvable; make them linkable.
  33. 33. Links to Other Data: Hyperlinks are the soul of the Web. The Web of Data is no different.
  36. 36. Summary. Linked Data Principles: 1 Use URIs to name things (not only documents, but also people, locations, concepts, etc.). 2 To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs. 3 When someone looks up a URI, provide useful information (structured data in RDF, SPARQL). 4 Include links to other URIs, allowing agents to discover more things. 5-Star-Data: five-star plan for realising an emerging web of data, dataset by dataset. 2 stars: re-usable data; 3 stars: open standards; 4+5 stars: connect data silos.
  37. 37. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Quality Analysis; Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle diagram]
  38. 38. DBpedia: Community effort to extract structured information from Wikipedia and to make this information available on the Web. Allows asking sophisticated queries against Wikipedia and linking other data sets on the Web to Wikipedia data. Semi-structured wiki markup → structured information.
  39. 39. Wikipedia Limitations. Simple questions that are hard to answer with Wikipedia: What do Innsbruck and Leipzig have in common? Who are the mayors of central European towns elevated more than 1000 m? Which movies star both Brad Pitt and Angelina Jolie? All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants.
  40. 40. Structure in Wikipedia: title; abstract; infoboxes; geo-coordinates; categories; images; links (to other language versions, to other Wikipedia pages, to the Web); redirects; disambiguation; ...
  41. 41. DBpedia Information Extraction Framework (DIEF): started in 2007; hosted on Sourceforge and Github; initially written in PHP, but fully re-written in Scala and Java; around 40 contributors (see https://www.ohloh.net/p/dbpedia for a detailed overview); can potentially be adapted to other MediaWikis, currently Wiktionary: http://wiktionary.dbpedia.org
  42. 42. DIEF - Overview
  43. 43. DIEF - Raw Infobox Extractor
      WikiText syntax:
        {{Infobox Korean settlement
        |title = Busan Metropolitan City
        ...
        |area_km2 = 763.46
        |pop = 3635389
        |region = [[Yeongnam]]
        }}
      RDF serialization:
        dbp:Busan dbp:title "Busan Metropolitan City"
        dbp:Busan dbp:area_km2 "763.46"^^xsd:float
        dbp:Busan dbp:pop "3635389"^^xsd:int
        dbp:Busan dbp:region dbp:Yeongnam
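The step from infobox markup to triples can be illustrated with a small Python sketch. This is not the DIEF implementation (which is written in Scala and Java); the function `extract_infobox`, its regular expressions and the simple datatype guessing are assumptions made for this example only.

```python
import re

WIKITEXT = """{{Infobox Korean settlement
|title = Busan Metropolitan City
|area_km2 = 763.46
|pop = 3635389
|region = [[Yeongnam]]
}}"""


def extract_infobox(page, wikitext):
    """Turn '|key = value' infobox lines into (subject, predicate, object) strings."""
    triples = []
    for line in wikitext.splitlines():
        m = re.match(r"\s*\|\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if not m:
            continue  # skip the template header, the closing braces, blank lines
        key, raw = m.groups()
        link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", raw)
        if link:
            # a wiki link becomes a resource reference
            obj = "dbp:" + link.group(1).replace(" ", "_")
        elif re.fullmatch(r"-?\d+", raw):
            obj = f'"{raw}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", raw):
            obj = f'"{raw}"^^xsd:float'
        else:
            obj = f'"{raw}"'  # plain string literal
        triples.append((f"dbp:{page}", f"dbp:{key}", obj))
    return triples


for triple in extract_infobox("Busan", WIKITEXT):
    print(" ".join(triple))
```

The raw extractor keeps the infobox keys as property names, which is why differently spelled keys end up as different properties.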
  44. 44. DIEF - Raw Infobox Extractor/Diversity
  45. 45. DIEF - Raw Infobox Extractor/Diversity
  46. 46. DIEF - Mapping-Based Infobox Extractor. Cleaner data: combine what belongs together (birth_place, birthplace); separate what is different (bornIn, birthplace); correct handling of datatypes. Mappings Wiki (http://mappings.dbpedia.org): everybody can contribute new mappings or improve existing ones (≈ 170 editors).
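The idea of combining infobox keys that belong together can be sketched with a lookup table. The table `MAPPING` below is hypothetical and hand-written for this example; the real mappings are maintained in the mappings wiki and are considerably richer (templates, datatypes, classes).

```python
# Hypothetical mapping from raw infobox keys to ontology properties
MAPPING = {
    "birth_place": "dbo:birthPlace",
    "birthplace": "dbo:birthPlace",
}


def map_properties(raw_pairs, mapping=MAPPING):
    """Split raw (key, value) pairs into mapped and unmapped ones."""
    mapped, unmapped = [], []
    for key, value in raw_pairs:
        if key in mapping:
            mapped.append((mapping[key], value))
        else:
            unmapped.append((key, value))  # kept separate, no mapping defined
    return mapped, unmapped


mapped, unmapped = map_properties([
    ("birth_place", "Leipzig"),
    ("birthplace", "Dresden"),
    ("bornIn", "1970"),
])
print(mapped)
print(unmapped)
```

Both spellings end up on the same ontology property, while `bornIn` stays apart because the table deliberately does not merge it.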
  47. 47. DIEF - Mapping-Based Infobox Extractor
  48. 48. URI/IRI schemes: http://{lang.}dbpedia.org is the main domain. For every article there exists a DBpedia resource of the form http://{lang.}dbpedia.org/resource/{ArticleName}. Properties from the raw infobox extractor use the http://{lang.}dbpedia.org/property/ namespace. The ontology is global for all languages and lives under the http://dbpedia.org/ontology/ namespace. Note that for English no language code is used: http://dbpedia.org as main domain; http://dbpedia.org/resource/{title} for articles; http://dbpedia.org/property/{title} for properties.
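The URI scheme above translates directly into a few helper functions. The helper names are made up for this sketch, and replacing spaces with underscores follows the Wikipedia title convention, which is an assumption here; percent-encoding of special characters is not handled.

```python
def dbpedia_domain(lang="en"):
    """English uses no language code; other editions use a language subdomain."""
    return "http://dbpedia.org" if lang == "en" else f"http://{lang}.dbpedia.org"


def resource_uri(title, lang="en"):
    return f"{dbpedia_domain(lang)}/resource/{title.replace(' ', '_')}"


def property_uri(name, lang="en"):
    return f"{dbpedia_domain(lang)}/property/{name}"


def ontology_uri(name):
    """The ontology namespace is global, independent of the language edition."""
    return f"http://dbpedia.org/ontology/{name}"


print(resource_uri("Dresden"))
print(resource_uri("Dresden", lang="de"))
print(property_uri("area_km2", lang="ko"))
print(ontology_uri("Country"))
```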
  49. 49. Linked Data Publication via 303 Redirects: http://dbpedia.org/resource/Dresden - URI of the city of Dresden; http://dbpedia.org/page/Dresden - information resource describing the city of Dresden in HTML format; http://dbpedia.org/data/Dresden - information resource describing the city of Dresden in RDF/XML format; further formats are supported, e.g. http://dbpedia.org/data/Dresden.n3 for N3.
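The redirect pattern above can be simulated without any network access. The function `dereference` below is a hypothetical model of what a server could answer for a /resource/ URI; it is not a description of the actual DBpedia server configuration, and the substring check on the Accept header is a deliberate simplification.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def dereference(uri, accept="text/html"):
    """Return (status, headers) that a server might send for a /resource/ URI."""
    if not uri.startswith(RESOURCE_PREFIX):
        return 404, {}
    name = uri[len(RESOURCE_PREFIX):]
    wants_rdf = "application/rdf+xml" in accept
    base = "http://dbpedia.org/data/" if wants_rdf else "http://dbpedia.org/page/"
    # 303 See Other: the thing itself is not sent, only a pointer to a document about it
    return 303, {"Location": base + name}


print(dereference("http://dbpedia.org/resource/Dresden"))
print(dereference("http://dbpedia.org/resource/Dresden", accept="application/rdf+xml"))
```

The client then issues a second request to the `Location` target, which is the extra round-trip counted against 303 URIs.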
  50. 50. DBpedia Links (data set, predicate: count, tool): Amsterdam Museum, owl:sameAs: 627, S; BBC Wildlife Finder, owl:sameAs: 444, S; Book Mashup, rdf:type and owl:sameAs: 9 100; Bricklink, dc:publisher: 10 100; CORDIS, owl:sameAs: 314, S; Dailymed, owl:sameAs: 894, S; DBLP Bibliography, owl:sameAs: 196, S; DBTune, owl:sameAs: 838, S; Diseasome, owl:sameAs: 2 300, S; Drugbank, owl:sameAs: 4 800, S; EUNIS, owl:sameAs: 3 100, S; Eurostat (Linked Stats), owl:sameAs: 253, S; Eurostat (WBSG), owl:sameAs: 137; CIA World Factbook, owl:sameAs: 545, S.
  51. 51. DBpedia Links (data set, predicate: count, tool): flickr wrappr, dbp:hasPhotoCollection: 3 800 000, C; Freebase, owl:sameAs: 3 600 000, C; GADM, owl:sameAs: 1 900; GeoNames, owl:sameAs: 86 500, S; GeoSpecies, owl:sameAs: 16 000, S; GHO, owl:sameAs: 196, L; Project Gutenberg, owl:sameAs: 2 500, S; Italian Public Schools, owl:sameAs: 5 800, S; LinkedGeoData, owl:sameAs: 103 600, S; LinkedMDB, owl:sameAs: 13 800, S; MusicBrainz, owl:sameAs: 23 000; New York Times, owl:sameAs: 9 700; OpenCyc, owl:sameAs: 27 100, C; OpenEI (Open Energy), owl:sameAs: 678, S.
  52. 52. DBpedia Links (data set, predicate: count): Revyu, owl:sameAs: 6; Sider, owl:sameAs: 2 000; TCMGeneDIT, owl:sameAs: 904; UMBEL, rdf:type: 896 400; US Census, owl:sameAs: 12 600; WikiCompany, owl:sameAs: 8 300; WordNet, dbp:wordnet_type: 467 100; YAGO2, rdf:type: 18 100 000; Sum: 27 211 732. Tool key: S: Silk, L: LIMES, C: custom script, missing: no regeneration. [One row on this slide carries the tool marker S; the row cannot be identified from the transcript.]
  53. 53. DBpedia Links - Query Example: compare funding per year (from FTS) and country with the gross domestic product of a country (from DBpedia).
      SELECT * {
        { SELECT ?ftsyear ?dbpcountry (SUM(?amount) AS ?funding) {
            ?com rdf:type fts-o:Commitment .
            ?com fts-o:year ?year .
            ?year rdfs:label ?ftsyear .
            ?com fts-o:benefit ?benefit .
            ?benefit fts-o:detailAmount ?amount .
            ?benefit fts-o:beneficiary ?beneficiary .
            ?beneficiary fts-o:country ?ftscountry .
            ?ftscountry owl:sameAs ?dbpcountry .
        } }
        { SELECT ?dbpcountry ?gdpyear ?gdpnominal {
            ?dbpcountry rdf:type dbo:Country .
            ?dbpcountry dbp:gdpNominal ?gdpnominal .
            ?dbpcountry dbp:gdpNominalYear ?gdpyear .
        } }
        FILTER (?ftsyear = str(?gdpyear))
      }
  54. 54. Infrastructure. DBpedia has two extraction modes: Wikipedia-database-dump-based extraction and DBpedia Live synchronisation (more later). DBpedia dumps: the dump archive is located at http://downloads.dbpedia.org/ (the latest downloads are described at http://dbpedia.org/Downloads). Official endpoint (by OpenLink): http://dbpedia.org/sparql
  55. 55. Query Answering. Back to our Wikipedia questions: What do Innsbruck and Leipzig have in common? Who are the mayors of central European towns elevated more than 1000 m? Which movies star both Brad Pitt and Angelina Jolie? All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants. Using the data extracted from Wikipedia and the public SPARQL endpoint, DBpedia can answer these questions.
  56. 56. DBpedia Live: DBpedia dumps are generated on a bi-annual basis. Wikipedia has around 100,000 to 150,000 page edits per day. DBpedia Live pulls page updates in real time, and the extraction results update the triple store. In practice, a 5 minute update delay increases performance by 15%. Links: SPARQL endpoint: http://live.dbpedia.org/sparql ; documentation: http://wiki.dbpedia.org/DBpediaLive ; statistics: http://live.dbpedia.org/LiveStats/
  57. 57. DBpedia Live - Overview
  58. 58. DBpedia Internationalization (I18n): DBpedia Internationalization Committee founded (http://wiki.dbpedia.org/Internationalization). DBpedia language editions are available in: Korean, Greek, German, Polish, Russian, Dutch, Portuguese, Spanish, Italian, Japanese, French. Each uses the corresponding Wikipedia language edition as input. Mappings are available for 23 languages.
  59. 59. DBpedia I18n - Overview
  60. 60. Applications: Disambiguation. Named entity recognition and disambiguation. Tools such as DBpedia Spotlight, AlchemyAPI, Semantic API, Open Calais, Zemanta and Apache Stanbol.
  61. 61. Applications: Question Answering. DBpedia is the primary target for several QA systems in the Question Answering over Linked Data (QALD) workshop series. IBM Watson also relied on DBpedia.
  62. 62. Applications: Faceted Browsing. Neofonie Browser; gFacet; OpenLink faceted browser (fct).
  63. 63. Applications: Search and Querying. Query Builder; RelFinder; SemLens.
  64. 64. Applications: Digital Libraries and Archives. Virtual International Authority File (VIAF) project as Linked Data: VIAF added a total of 250,000 reciprocal authority links to Wikipedia. DBpedia can also provide: context information for bibliographic and archive records (e.g. an author's demographics, a film's homepage, an image etc.); stable and curated identifiers for linking. The broad range of Wikipedia topics can form the basis for a thesaurus for subject indexing.
  65. 65. Applications: DBpedia Mobile. DBpedia Mobile is a location-centric DBpedia client application for mobile devices consisting of a map view, the Marbles Linked Data Browser and a GPS-enabled launcher application.
  66. 66. Applications: DBpedia Wiktionary. Wiktionary is a Wikimedia project (http://wiktionary.org): 171 languages, 3M words for English. Extracted using the DBpedia Information Extraction Framework; easily configurable for every Wiktionary language edition; pre-configured for German, Greek, English, Russian and French. http://Wiktionary.dbpedia.org : 100 million triples, Lemon model.
  67. 67. Other Applications: see http://wiki.dbpedia.org/Applications for a more complete list.
  68. 68. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Quality Analysis; Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle diagram]
  69. 69. Linked Data - Achievements and Challenges. Achievements: 1 Extension of the Web with a data commons (50B facts); 2 vibrant, global RTD community; 3 industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly, NY Times, Facebook, Google, Yahoo); 4 governmental adoption in sight; 5 establishing Linked Data as a deployment path for the Semantic Web. Challenges: 1 Coherence: relatively few, expensively maintained links; 2 Quality: partly low-quality data and inconsistencies; 3 Performance: still substantial penalties compared to relational; 4 Data consumption: large-scale processing, schema mapping and data fusion still in their infancy; 5 Usability: missing direct end-user tools and network effect. These issues are closely related, and addressing them should ultimately lead to an ecosystem of interlinked knowledge!
  70. 70. [Figure: Linked Data Lifecycle with the stages Interlinking/Fusing, Manual revision/Authoring, Classification/Enrichment, Storage/Querying, Quality Analysis, Evolution/Repair, Extraction, Search/Browsing/Exploration]
  71. 71. Extraction
  72. 72. Extraction. From unstructured sources: formats: plain text; methods: NLP, text mining, ontology learning. From semi-structured sources: formats: wiki markup, tags; tools: DBpedia framework (Wikipedia, Wiktionary). From structured sources: formats: databases, spreadsheets, XML; RDB2RDF tools: Sparqlify, D2R, Triplify; CSV converters: RDF extension of Google Refine.
  73. 73. Extraction Challenges. From unstructured sources: improve the F-measure of existing NLP approaches (OpenCalais, Ontos API); develop standardized, LOD-enabled interfaces between NLP tools (NLP2RDF). From semi-structured sources: efficient bi-directional synchronization. From structured sources: declarative syntax and semantics of data model transformations (W3C WG RDB2RDF). Orthogonal challenges: using LOD as background knowledge; provenance.
  74. 74. Storage and Querying
  75. 75. RDF Data Management: SPARQL/RDF access is still slower than relational data management by a factor of 2-10; performance increases steadily. Comprehensive, well-supported open-source and commercial implementations are available: OpenLink's Virtuoso (os+commercial); OWLIM-Lite (free), OWLIM-SE, OWLIM-Enterprise; Talis (hosted); Bigdata (distributed); Allegrograph (commercial); Mulgara (os).
  76. 76. Storage and Querying Challenges: reduce the performance gap between relational and RDF data management; SPARQL query extensions: spatial/semantic/temporal data management; view maintenance / adaptive reorganization based on common access patterns; more realistic benchmarks.
  77. 77. Authoring
  78. 78. Authoring. Integrated in existing environments: Tiki. Data oriented: RDFauthor, rdfEditor. Schema oriented: Protégé, TopBraid Composer, NeOn Toolkit, Swoop, Neologism, Knoodl.
  80. 80. Authoring: Semantic Wikis. 1 Semantic (text) wikis: authoring of semantically annotated texts; Semantic MediaWiki, KiWi, (Wikipedia+DBpedia). 2 Semantic data wikis: direct authoring of structured information (i.e. RDF, RDF-Schema, OWL); OntoWiki.
  81. 81. Linking
  82. 82. Interlinking: The Data Web is an uncontrolled environment: proliferation of equivalent or similar entities, hence a need for links / merging. Currently only few RDF triples are links. Manual link discovery: Sindice integration, LODStats, Semantic Pingback. Tool-supported / semi-automatic: SILK, LIMES, COMA, RDF-AI; usually via mapping specifications / heuristics. Machine learning / automatic: RAVEN, EAGLE, SILK GP.
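A heuristic link specification can be as simple as "two resources are the same if their labels are nearly identical". The sketch below illustrates that idea with Python's standard `difflib`; it is not how SILK or LIMES work internally, and the function names, the threshold of 0.9 and the example URIs are assumptions for this illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """source/target map URI -> label; returns owl:sameAs link candidates."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            if similarity(s_label, t_label) >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri))
    return links


source = {"http://a.example/Leipzig": "Leipzig"}
target = {"http://b.example/1": "leipzig", "http://b.example/2": "Dresden"}
print(discover_links(source, target))
```

The nested loop compares every pair, so the cost grows with the product of both dataset sizes; making this efficient is part of the research challenge named earlier.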
  83. 83. Interlinking Challenges: apply work from the de-duplication/record linkage literature; consider the open world nature of Linked Data; use LOD background knowledge; zero-configuration linking; explore active learning approaches, which integrate users in a feedback loop; maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (http://latc-project.eu/).
  84. 84. Enrichment
  85. 85. Enrichment: Currently, there is a lack of knowledge bases with sophisticated schema information and instance data adhering to this schema. Goal: powerful reasoning, consistency checking and querying. Manual: via ontology editors, DBpedia mappings. (Semi-)automatic: DL-Learner, Statistical Schema Induction.
  86. 86. Enrichment: Example. Given: a knowledge base with the property birthPlace (i.e. triples using that property) but no information on the semantics of birthPlace. Possible enrichment: ObjectProperty: birthPlace; Characteristics: Functional; Domain: Person; Range: Place; SubPropertyOf: hasBeenAt. Benefits: axioms serve as documentation for the purpose and correct usage of schema elements; additional implicit information can be inferred; improved applicability of schema debugging techniques.
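How axiom suggestions might be derived from instance data can be shown with a naive counting approach. This is not the algorithm used by DL-Learner or Statistical Schema Induction; the function `suggest_axioms` and the toy data are assumptions for this sketch. Under the open world assumption the output is only a suggestion for a human to review.

```python
from collections import Counter, defaultdict


def suggest_axioms(triples, types, prop):
    """triples: (s, p, o) tuples; types: resource -> class name."""
    objects_per_subject = defaultdict(set)
    domains, ranges = Counter(), Counter()
    for s, p, o in triples:
        if p != prop:
            continue
        objects_per_subject[s].add(o)
        if s in types:
            domains[types[s]] += 1
        if o in types:
            ranges[types[o]] += 1
    if not objects_per_subject:
        return {}  # property not used, nothing to suggest
    return {
        # functional: no subject has more than one value in the data seen so far
        "functional": all(len(objs) == 1 for objs in objects_per_subject.values()),
        "domain": domains.most_common(1)[0][0] if domains else None,
        "range": ranges.most_common(1)[0][0] if ranges else None,
    }


triples = [
    ("Ann", "birthPlace", "Leipzig"),
    ("Bob", "birthPlace", "Dresden"),
    ("Ann", "knows", "Bob"),
]
types = {"Ann": "Person", "Bob": "Person", "Leipzig": "Place", "Dresden": "Place"}
print(suggest_axioms(triples, types, "birthPlace"))
```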
  87. 87. Repair. Ontology debugging: OWL reasoning to detect inconsistencies and unsatisfiable classes + detect the most likely sources for the problems. Basic task: provide feedback to the user for resolving undesired entailments. A justification J ⊆ O of an entailment is a minimal set of axioms from which the entailment can be drawn.
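The notion of a justification can be made concrete with a toy ontology that contains only subclass axioms. The brute-force search below is a sketch under that assumption: real OWL debugging relies on a reasoner and far better algorithms, and this enumeration is exponential in the number of axioms. The names `entails` and `justifications` are hypothetical.

```python
from itertools import combinations


def entails(axioms, goal):
    """axioms and goal are (sub, sup) pairs meaning 'sub is a subclass of sup'."""
    sub, sup = goal
    reachable, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        for a, b in axioms:
            if a == current and b not in reachable:
                reachable.add(b)
                frontier.append(b)
    return sup in reachable


def justifications(ontology, goal):
    """All minimal axiom subsets J of the ontology O from which goal follows."""
    found = []
    for size in range(1, len(ontology) + 1):
        for subset in combinations(ontology, size):
            if any(set(j) <= set(subset) for j in found):
                continue  # contains a smaller justification, so not minimal
            if entails(subset, goal):
                found.append(subset)
    return found


ontology = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
for j in justifications(ontology, ("A", "C")):
    print(j)
```

To remove the entailment A ⊑ C, at least one axiom from every justification has to be removed or changed, which is the kind of feedback a debugging tool presents to the user.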
  88. 88. Quality Analysis
  89. 89. Linked Data Quality Analysis: Quality on the Data Web varies a lot: hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. knowledge bases extracted from text or Web 2.0 sources (DBpedia). Quality = fitness for use. Often it is not necessary to fix all problems, but to know about them. 30+ quality dimensions are defined in a recent survey. Research challenge: establish measures for assessing the authority, provenance and reliability of Data Web resources.
  90. 90. Evolution (image: © CC-BY-SA by alasis on flickr)
  91. 91. KB Evolution. Tasks: performing knowledge base changes / refactoring; ensuring consistency of related knowledge; managing changes, e.g. undo operations; updating materialised inferred data upon changes; updating materialised links to other data upon changes. Tools: Protégé (PROMPT and change management plugins); EvoPat (easily re-usable and sharable evolution patterns defined via SPARQL); PatOMat (ontology transformation framework).
  93. 93. Exploration. RDF data can be complex (as discussed by Pascal Hitzler). The exploration phase aims to make data accessible to non-experts. Options: faceted browsing; question answering; query builders; visualisation of statistical or geospatial data; ...
  94. 94. Catalogus Professorum Lipsiensis
  95. 95. Visual Query Builder
  96. 96. Relationship Finder in CPL
  97. 97. [Figure: the Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  98. 98. Make the Web a Linked Data Washing Machine
  99. 99. Tool Support for Life-Cycle? Many SW tools support one or more life-cycle stages. The Linked Data Stack (http://stack.linkeddata.org) provides a consolidated repository of such tools: each tool is a Debian package; lightweight integration between tools via common vocabularies and SPARQL; demonstrator interfaces for showing tools in combination; developed by the LOD2 and GeoKnow EU projects.
  100. 100. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Quality Analysis; Knowledge Base Exploration / Querying
  101. 101. Knowledge Extraction. Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must facilitate inferencing. It is similar to Information Extraction (NLP) and ETL (Data Warehouse); the main difference is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the re-use of existing formal knowledge (reusing ontologies) or the generation of a schema based on the source data.
  102. 102. Categorisation of Approaches. Source: examples are plain text, relational databases, XML, CSV. Exposition: how is the extracted knowledge made explicit? How can you query and perform inference? Synchronization: is the knowledge extraction process executed once to produce a dump, or is the result synchronized with the source? Are changes to the result written back (bi-directional)? Reuse of vocabularies: can popular ontologies (Good Relations, FOAF, ...) be re-used to simplify global data integration? Automatisation: manual, semi-automatic, automatic. Domain ontology required: does the approach require a pre-defined ontology, or can it create a schema from the source (e.g. ontology learning)?
  103. 103. Extraction from Structured Sources to RDF
    Simple mappings from RDB tables/views to RDF: direct mapping of the model of relational databases to RDF
        Table → OWL class
        Row → instance s of this class
        Cell with value o in column p → triple (s, p, o)
        Details: http://www.w3.org/TR/rdb-direct-mapping/
    Complex mappings of relational databases to RDF: additional refinements can be applied to the 1:1 mapping to improve the usefulness of the RDF output
        Extract or learn an OWL schema from the given database schema
        Map the schema and its contents to a pre-existing domain ontology
        Powerful mapping languages: R2RML, SML
    XML: the XML tree structure can be directly converted to an RDF graph structure; complex mappings are possible, e.g. via XSLT processors
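The table/row/cell correspondence of the direct mapping can be sketched in a few lines of Python. This is a simplified illustration only: the function name `direct_map`, the example base IRI and the IRI patterns are assumptions made here, and datatypes and foreign keys, which the W3C Direct Mapping also covers, are ignored.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def direct_map(table: str, rows: Iterable[Dict[str, object]],
               key: str, base: str) -> List[Triple]:
    """Naive direct mapping: table -> class, row -> instance, cell -> triple."""
    triples: List[Triple] = []
    cls = f"<{base}{table}>"                    # the table becomes a class
    for row in rows:
        s = f"<{base}{table}/{key}={row[key]}>"  # the row becomes an instance s
        triples.append((s, "rdf:type", cls))
        for col, val in row.items():
            if val is None:
                continue                         # NULL cells yield no triple
            p = f"<{base}{table}#{col}>"          # the column becomes property p
            triples.append((s, p, f'"{val}"'))    # the cell value becomes object o
    return triples


if __name__ == "__main__":
    for t in direct_map("city", [{"id": 1, "name": "Leipzig"}], "id",
                        "http://example.org/"):
        print(*t, ".")
```

Refined mappings (R2RML, SML) replace the generated class and property IRIs with terms from existing vocabularies.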
  104. 104. Extraction from Natural Language Sources. 80% of the information in business documents is in unstructured natural language [1]: (-) increased complexity and decreased quality of extraction; (+) potential for a massive acquisition of extracted knowledge. Traditional Information Extraction (IE): recognize and categorise elements in text; techniques: Named Entity Recognition (NER), Coreference Resolution (CO), ... Ontology Learning (OL) from text: learn whole ontologies from natural language text, usually extracted (semi-)automatically. [1] Wimalasuriya, Dou. Ontology-based information extraction: [. . . ] Journal of Information Science
  105. 105. LinkedGeoData + Sparqlify. Example: LinkedGeoData, a knowledge extraction project using Sparqlify. Structure: Motivation; OpenStreetMap; LGD Architecture; Mapping; Access (how LinkedGeoData is published); Use Cases.
  106. 106. Motivation. Ease information integration tasks that require spatial knowledge, such as: offerings of bakeries next door; a map of the distributed branches of a company; historical sights along a bicycle track. The LOD cloud contains data sets with spatial features, e.g. Geonames, DBpedia, US census, EuroStat. But: they are restricted to popular or large entities (countries, famous places etc.) or to specific regions. Therefore they lack buildings, roads, mailboxes, etc.
  107. 107. OpenStreetMap - Datamodel. Basic entities are: Nodes (latitude, longitude); Ways (sequence of nodes); Relations (associations between any number of nodes, ways and relations; every member in a relation plays a certain role). Each entity may be described with tags (= key-value pairs). A way is closed if the ID of the last referenced node equals that of the first one. Whether a closed way denotes a linear ring or a polygon (i.e. whether the enclosed area is part of the respective OSM entity) depends on the tags.
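The three entity types and the closed-way rule can be written down as a small data model. The class and field names below are illustrative choices and not the OSM schema itself; the ring-versus-polygon decision is deliberately left out because it depends on tag conventions not spelled out here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: List[int]                       # ordered sequence of node IDs
    tags: Dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        """A way is closed if its last referenced node equals its first."""
        return len(self.node_ids) > 1 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Relation:
    id: int
    # each member is (entity type, entity id, role), e.g. ("way", 7, "outer")
    members: List[Tuple[str, int, str]]
    tags: Dict[str, str] = field(default_factory=dict)


if __name__ == "__main__":
    print(Way(1, [10, 11, 12, 10]).is_closed())
    print(Way(2, [10, 11, 12]).is_closed())
```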
  108. 108. Example: Leipzig's Zoo
  109. 109. Comparison: Leipzig's Zoo (OpenStreetMap)
  110. 110. Comparison: Leipzig's Zoo (GoogleMaps)
  111. 111. LGD Architecture
  112. 112. Tag Mappings. Key-value pairs are assigned to RDF resources. Each pair (k, v) can be annotated with datatypes, language tags and classes. Mappings are themselves tables. Example table lgd_map_literal:
        k          | property      | lang
        name       | rdfs:label    |
        name:en    | rdfs:label    | en
        alt_label  | skos:altLabel |
        note       | rdfs:comment  |
        ...        | ...           | ...
  113. 113. View Definition. RDF mapping of the data from a PostgreSQL database:
        Create View lgd_nodes As
          Construct {
            ?n a lgdm:Node .
            ?n geom:geometry ?g .
            ?g ogc:asWKT ?o .
          }
          With
            ?n = uri(lgd:node, ?id)
            ?g = uri(lgd-geom:node, ?id)
            ?o = typedLiteral(?geom, ogc:wktLiteral)
          From nodes
  114. 114. Sparqlify. SPARQL-SQL rewriter: rewrites SPARQL queries according to the view definition. The platform module offers a SPARQL endpoint and a Linked Data interface. https://github.com/AKSW/Sparqlify
  115. 115. Rest-API. Offers REST methods for frequent queries. Based on the SPARQL (Virtuoso) endpoint.
  116. 116. Downloads. RDF dataset for download, generated using Construct { ?s ?p ?o }. http://downloads.linkedgeodata.org
  117. 117. Ontology. Enriched classes and properties with multilingual labels from TranslateWiki (http://translatewiki.net). Imported icons for 90 classes from the freely available icon collection of SJJB Management (http://www.sjjb.co.uk/mapicons/).
  118. 118. SML Mapping Examples. The following slides demonstrate how to map relational data to RDF with the Sparqlification Mapping Language (SML). These prefixes are used:
        prefix   | IRI
        rdfs     | http://www.w3.org/2000/01/rdf-schema#
        ogc      | http://www.opengis.net/ont/geosparql#
        geom     | http://geovocab.org/geometry#
        lgd      | http://linkedgeodata.org/triplify/
        lgd-geom | http://linkedgeodata.org/geometry/
  119. 119. SML - Mapping Example I: The Goal (1/4). How to map tables to RDF? How to introduce the distinction, commonly used in GIS, between feature and geometry?
    Input table nodes:
        id | geom
        1  | POINT(0 0)
        2  | POINT(1 1)
    Aimed-for RDF output:
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        ...
        lgd:node1 geom:geometry lgd-geom:node1 .
        lgd:node2 geom:geometry lgd-geom:node2 .
        lgd-geom:node1 ogc:asWKT "POINT(0 0)"^^ogc:wktLiteral .
        lgd-geom:node2 ogc:asWKT "POINT(1 1)"^^ogc:wktLiteral .
  120. 120. SML - Mapping Example I: SML Syntax Outline (2/4). Same input table and target output as in (1/4). View skeleton: Create View myNodesView As Construct { ... } With ... From ...
  121. 121. SML - Mapping Example I: Construct and From (3/4). Create View myNodesView As Construct { ?n geom:geometry ?g . ?g ogc:asWKT ?o } With ... From nodes
  122. 122. SML - Mapping Example I: Complete! (4/4)
    Input table nodes:
        id | geom
        1  | POINT(0 0)
        2  | POINT(1 1)
    SML view:
        Create View myNodesView As
          Construct {
            ?n geom:geometry ?g .
            ?g ogc:asWKT ?o
          }
          With
            ?n = uri(lgd:node, ?id)
            ?g = uri(lgd-geom:node, ?id)
            ?o = typedLiteral(?geom, ogc:wktLiteral)
          From nodes
    Aimed-for RDF output:
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        ...
        lgd:node1 geom:geometry lgd-geom:node1 .
        lgd:node2 geom:geometry lgd-geom:node2 .
        lgd-geom:node1 ogc:asWKT "POINT(0 0)"^^ogc:wktLiteral .
        lgd-geom:node2 ogc:asWKT "POINT(1 1)"^^ogc:wktLiteral .
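To make the semantics of the view concrete, the following Python sketch emulates what myNodesView produces for each row of the nodes table. It is not how Sparqlify works internally (Sparqlify rewrites SPARQL to SQL rather than materialising triples this way); the function name `nodes_view` is hypothetical, and the assumption is that `uri(prefix, ?id)` concatenates its arguments.

```python
# Namespace IRIs as listed in the prefix table of the slides.
LGD = "http://linkedgeodata.org/triplify/"
LGD_GEOM = "http://linkedgeodata.org/geometry/"
GEOM = "http://geovocab.org/geometry#"
OGC = "http://www.opengis.net/ont/geosparql#"


def nodes_view(rows):
    """Yield the triples the view would construct for each input row."""
    for row in rows:
        n = f"<{LGD}node{row['id']}>"               # ?n = uri(lgd:node, ?id)
        g = f"<{LGD_GEOM}node{row['id']}>"          # ?g = uri(lgd-geom:node, ?id)
        o = f'"{row["geom"]}"^^<{OGC}wktLiteral>'   # ?o = typedLiteral(?geom, ogc:wktLiteral)
        yield (n, f"<{GEOM}geometry>", g)           # ?n geom:geometry ?g .
        yield (g, f"<{OGC}asWKT>", o)               # ?g ogc:asWKT ?o .


if __name__ == "__main__":
    table = [{"id": 1, "geom": "POINT(0 0)"}, {"id": 2, "geom": "POINT(1 1)"}]
    for s, p, o in nodes_view(table):
        print(s, p, o, ".")
```

Each row yields two triples, one linking the feature to its geometry resource and one attaching the WKT literal to the geometry, which is the feature/geometry distinction asked for in step (1/4).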
  123. 123. SML Mapping Examples A more complex example, which demonstrates the use of an SQL mapping table and an SQL helper view.
  124. 124. SML - Mapping Example II: The Goal (1/8)
    Input table node_tags:
        id | k           | v
        1  | name        | Universitaet Leipzig
        1  | name:en     | University of Leipzig
        1  | amenity     | university
        1  | addr:street | Augustusplatz
        1  | addr:city   | Leipzig
    Aimed-for RDF output:
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        @prefix lgd: <http://linkedgeodata.org/triplify/> .
        lgd:node1 rdfs:label "Universitaet Leipzig" .
        lgd:node1 rdfs:label "University of Leipzig"@en .
  125. 125. SML - Mapping Example II: Source Data (2/8). The source is the OSM table node_tags (columns id, k, v) shown in (1/8).
  126. 126. SML - Mapping Example II: Mapping Table (3/8). The OSM table node_tags is combined with the RDF mapping table lgd_map_literal:
        k          | property      | lang
        name       | rdfs:label    |
        name:en    | rdfs:label    | en
        alt_label  | skos:altLabel |
        note       | rdfs:comment  |
        ...        | ...           | ...
  127. 127. SML - Mapping Example II: Helper View (4/8). The helper view lgd_node_tags_literal joins node_tags with lgd_map_literal:
        SELECT id, property, v, lang
        FROM node_tags, lgd_map_literal
        WHERE node_tags.k = lgd_map_literal.k
    Result:
        id  | property   | v                     | lang
        1   | rdfs:label | Universitaet Leipzig  |
        1   | rdfs:label | University of Leipzig | en
        ... | ...        | ...                   | ...
  131. 131. SML - Mapping Example II: SML View (5/8 to 8/8, built up step by step)
    Logical table lgd_node_tags_literal:
        id  | property   | v           | lang
        1   | rdfs:label | Univ. L.    |
        1   | rdfs:label | Univ. of L. | en
        ... | ...        | ...         | ...
    SML view:
        Create View lgd_node_tags_text As
          Construct {
            ?s ?p ?o .
          }
          With
            ?s = uri(lgd:node, ?id)
            ?p = uri(?property)
            ?o = plainLiteral(?v, ?lang)
          From lgd_node_tags_literal
    Resulting RDF:
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        @prefix lgd: <http://linkedgeodata.org/triplify/> .
        lgd:node1 rdfs:label "Universitaet Leipzig" .
        lgd:node1 rdfs:label "University of Leipzig"@en .
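The helper view and the literal construction can be tried out with Python's built-in sqlite3 module. This is a stand-in for the PostgreSQL setup described in the slides: the table contents are the example rows from above, and `plain_literal` is a hypothetical helper that mimics what plainLiteral(?v, ?lang) is assumed to produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node_tags (id INTEGER, k TEXT, v TEXT);
CREATE TABLE lgd_map_literal (k TEXT, property TEXT, lang TEXT);
INSERT INTO node_tags VALUES
  (1, 'name', 'Universitaet Leipzig'),
  (1, 'name:en', 'University of Leipzig'),
  (1, 'amenity', 'university');
INSERT INTO lgd_map_literal VALUES
  ('name', 'rdfs:label', NULL),
  ('name:en', 'rdfs:label', 'en');
-- helper view: join every tag with its literal mapping (unmapped tags drop out)
CREATE VIEW lgd_node_tags_literal AS
  SELECT id, property, v, lang
  FROM node_tags JOIN lgd_map_literal ON node_tags.k = lgd_map_literal.k;
""")


def plain_literal(value, lang):
    """Render a plain literal, with a language tag if one is given."""
    return f'"{value}"@{lang}' if lang else f'"{value}"'


# One triple (?s ?p ?o) per row of the helper view, as in the SML view.
triples = [
    (f"lgd:node{id_}", prop, plain_literal(v, lang))
    for id_, prop, v, lang in conn.execute(
        "SELECT id, property, v, lang FROM lgd_node_tags_literal ORDER BY v")
]

if __name__ == "__main__":
    for s, p, o in triples:
        print(s, p, o, ".")
```

The amenity tag has no entry in lgd_map_literal, so the join drops it; it would be handled by one of the other mapping tables.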
  132. 132. Further Tag Mappings
    lgd_map_datatype:
        k      | datatype
        seats  | integer
        unisex | boolean
    lgd_map_property:
        k       | property
        website | foaf:homepage
    lgd_map_resource_k:
        k       | property | object
        highway | rdf:type | lgdo:HighwayThing
    lgd_map_resource_kv:
        k        | v     | property | object
        waterway | river | rdf:type | lgdo:River
  133. 133. LGD Edit Tool: Multi User Tag Mapping WebApp
  134. 134. Resources
        Sparqlify: http://sparqlify.org
        LinkedGeoData: http://linkedgeodata.org
        Tag Mappings: https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sql/Mappings.sql
        SML View Definitions: https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sml/LinkedGeoData-Triplify-IndividualViews.sml
  135. 135. Statistics (15 August 2013). The complete OSM planet file corresponds to ∼ 20.000.000.000 triples (virtual access via Sparqlify). Downloads are limited to selected classes: 292.780.188 triples in total, of which 153.613.243 triples of Nodes and 139.166.945 triples of Ways; Relations are not yet available for download. Among them: 532.812 PlaceOfWorship, 82.788 RailwayStation, 72.091 Toilets, 71.613 Town, 19.937 City.
  136. 136. Access
        Materialized SPARQL endpoint (based on Virtuoso DB, download datasets loaded): http://linkedgeodata.org/sparql and http://linkedgeodata.org/snorql
        Virtual SPARQL endpoint (based on Sparqlify, access to 20B triples, limited SPARQL 1.0 support): http://linkedgeodata.org/vsparql and http://linkedgeodata.org/vsnorql
        REST interface (based on the virtual SPARQL endpoint): supports limited queries (e.g. circular/rectangular area, filtering by labels)
        Downloads: http://downloads.linkedgeodata.org
        Monthly updates on the above datasets envisioned
  137. 137. Use Cases Augmented Reality
  138. 138. Use Cases Generic Browsing
  139. 139. Use Cases Generic Browsing
  140. 140. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Quality Analysis; Knowledge Base Exploration / Querying
  141. 141. Why Link Discovery? 1. Fourth Linked Data principle. 2. Links are central for cross-ontology QA, data integration, reasoning, federated queries, ... 3. 2011 topology of the LOD Cloud: 31+ billion triples, ≈ 0.5 billion links, owl:sameAs in most cases.
  142. 142. Why is it difficult? 1. Time complexity: large number of triples; quadratic a-priori runtime; 69 days for mapping cities from DBpedia to Geonames (1 ms per comparison); decades for linking DBpedia and LGD; ... Definition (Link Discovery): given sets $S$ and $T$ of resources and a relation $R$, the task is to find $M = \{(s, t) \in S \times T : R(s, t)\}$. Common approaches: find $M = \{(s, t) \in S \times T : \sigma(s, t) \geq \theta\}$ or find $M = \{(s, t) \in S \times T : \delta(s, t) \leq \theta\}$.
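A brute-force reading of the definition makes the quadratic cost visible. The sketch below is an illustration, not the LIMES implementation; the function names and the toy distance are assumptions. At 1 ms per comparison, the 69 days quoted above correspond to roughly 6 × 10^9 comparisons (69 × 86,400,000).

```python
from typing import Callable, List, Sequence, Tuple


def naive_links(source: Sequence, target: Sequence,
                distance: Callable[[object, object], float],
                theta: float) -> List[Tuple[object, object]]:
    """Compare every pair in S x T and keep those with distance <= theta."""
    return [(s, t) for s in source for t in target if distance(s, t) <= theta]


def brute_force_days(n_source: int, n_target: int,
                     ms_per_comparison: float = 1.0) -> float:
    """Estimated runtime in days for |S| * |T| comparisons."""
    return n_source * n_target * ms_per_comparison / 1000 / 86400


if __name__ == "__main__":
    print(naive_links([1, 5], [2, 9], lambda a, b: abs(a - b), theta=1))
    print(brute_force_days(1000, 86400))
```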
  143. 143. Why is it difficult? 2. Complexity of specifications: a combination of several attributes is required for high precision; tedious discovery of the most adequate mapping; dataset-dependent similarity functions.
  144. 144. LIMES Framework
  146. 146. Runtime Optimization. Reduce the number of comparisons $C(A) \geq |M|$ (assuming we need all $\sigma$/$\theta$ values for links). Maximize the reduction ratio: $RR(A) = 1 - \frac{C(A)}{|S||T|}$. Question: can we devise lossless approaches with guaranteed RR? Advantages: space management; runtime prediction; resource scheduling.
  149. 149. RR Guarantee. Best achievable reduction ratio: $RR_{max} = 1 - \frac{|M|}{|S||T|}$. An approach $H(\alpha)$ fulfils the RR guarantee criterion iff $\forall r < RR_{max}, \exists \alpha : RR(H(\alpha)) \geq r$. Here, we use the relative reduction ratio (RRR): $RRR(A) = \frac{RR_{max}}{RR(A)}$.
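The three ratios are simple to compute once the number of comparisons and the number of true links are known. The helper names below are hypothetical, and the numbers in the usage example are made up purely to show the arithmetic.

```python
def reduction_ratio(comparisons: int, n_source: int, n_target: int) -> float:
    """RR(A) = 1 - C(A) / (|S| |T|)."""
    return 1 - comparisons / (n_source * n_target)


def rr_max(n_matches: int, n_source: int, n_target: int) -> float:
    """Best achievable reduction ratio: RR_max = 1 - |M| / (|S| |T|)."""
    return 1 - n_matches / (n_source * n_target)


def relative_reduction_ratio(comparisons: int, n_matches: int,
                             n_source: int, n_target: int) -> float:
    """RRR(A) = RR_max / RR(A); a value of 1 means optimal.

    Undefined (division by zero) if A performs all |S| |T| comparisons.
    """
    return (rr_max(n_matches, n_source, n_target)
            / reduction_ratio(comparisons, n_source, n_target))


if __name__ == "__main__":
    # |S| = |T| = 100, 100 true links, 400 comparisons carried out
    print(reduction_ratio(400, 100, 100))
    print(rr_max(100, 100, 100))
    print(relative_reduction_ratio(400, 100, 100, 100))
```

Because $C(A) \geq |M|$, RRR is never below 1, so driving RRR towards 1 is the same as driving RR towards $RR_{max}$.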
  150. 150. Goal. Formal goal: devise $H(\alpha)$ such that $\forall r > 1, \exists \alpha : RRR(H(\alpha)) \leq r$.
  151. 151. Restrictions. Minkowski distance: $\delta(s, t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p},\ p \geq 2$.
  152. 152. Space Tiling. HYPPO: $\delta(s, t) \leq \theta$ describes a hypersphere. Approximate the hypersphere by a hypercube: easy to compute; no loss of recall (blocking).
  155. 155. Space Tiling. Set the width of a single hypercube to $\Delta = \theta/\alpha$. Tile $\Omega = S \cup T$ into adjacent cubes $C$. Coordinates: $(c_1, \ldots, c_n) \in \mathbb{N}^n$. A cube contains the points $\omega \in \Omega$ with $\forall i \in \{1 \ldots n\} : c_i\Delta \leq \omega_i < (c_i + 1)\Delta$.
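The tiling step translates directly into a grid index. The sketch below is a minimal illustration of hypercube blocking in the spirit of HYPPO, not the LIMES code; all function names are invented here, points are assumed to be numeric tuples of equal length, and floating-point effects at cube boundaries are ignored.

```python
from collections import defaultdict
from itertools import product
from math import floor


def cube_of(point, delta):
    """Coordinates (c_1, ..., c_n) of the cube of width delta containing point."""
    return tuple(floor(x / delta) for x in point)


def hyppo_candidates(source, target, theta, alpha):
    """Yield (s, t) pairs whose cubes differ by at most alpha in every dimension."""
    delta = theta / alpha                       # cube width: Delta = theta / alpha
    index = defaultdict(list)
    for t in target:                            # tile the target points
        index[cube_of(t, delta)].append(t)
    n = len(source[0])                          # dimensionality (source assumed non-empty)
    offsets = list(product(range(-alpha, alpha + 1), repeat=n))  # (2*alpha + 1)^n cubes
    for s in source:
        c = cube_of(s, delta)
        for off in offsets:
            cube = tuple(ci + oi for ci, oi in zip(c, off))
            for t in index.get(cube, ()):
                yield s, t


def minkowski(s, t, p=2):
    """Minkowski distance of order p."""
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1 / p)


def link(source, target, theta, alpha=2, p=2):
    """Blocking followed by the exact distance check on the candidates only."""
    return [(s, t) for s, t in hyppo_candidates(source, target, theta, alpha)
            if minkowski(s, t, p) <= theta]


if __name__ == "__main__":
    S = [(0.0, 0.0), (10.0, 10.0)]
    T = [(0.5, 0.5), (3.0, 3.0), (10.0, 10.9)]
    print(link(S, T, theta=1.0))
```

If $\delta(s, t) \leq \theta$, every coordinate differs by at most $\theta = \alpha\Delta$, so the cube coordinates differ by at most $\alpha$; this is why the blocking step loses no links.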
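  156. 156. HYPPO. Combine the $(2\alpha + 1)^n$ hypercubes around $C(\omega)$ to approximate the hypersphere. $RRR(HYPPO(\alpha)) = \frac{(2\alpha + 1)^n}{\alpha^n S(n)}$ and $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{2^n}{S(n)}$, where $S(n)$ appears to denote the volume of the unit hypersphere in $n$ dimensions (this reading is consistent with the limits $4/\pi$, $6/\pi$ and $32/\pi^2$ given for $n = 2, 3, 4$).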
  158. 158. HYPPO. [Figure: RRR(HYPPO) for p = 2, n = 2, 3, 4 and 2 ≤ α ≤ 50] Limits: $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{4}{\pi} \approx 1.27$ for $n = 2$; $\frac{6}{\pi} \approx 1.91$ for $n = 3$; $\frac{32}{\pi^2} \approx 3.24$ for $n = 4$.
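The three limits can be checked numerically under the assumption that $S(n)$ is the volume of the unit ball for $p = 2$, i.e. $\pi^{n/2} / \Gamma(n/2 + 1)$, and that $RRR(HYPPO(\alpha)) = (2\alpha + 1)^n / (\alpha^n S(n))$. Both formulas are a reading of the garbled slide text, not a quotation, and the function names are invented for this sketch.

```python
from math import gamma, pi


def unit_ball_volume(n: int) -> float:
    """Volume of the n-dimensional Euclidean unit ball (assumed meaning of S(n))."""
    return pi ** (n / 2) / gamma(n / 2 + 1)


def rrr_hyppo(alpha: int, n: int) -> float:
    """Assumed form: (2*alpha + 1)^n / (alpha^n * S(n))."""
    return (2 * alpha + 1) ** n / (alpha ** n * unit_ball_volume(n))


def rrr_hyppo_limit(n: int) -> float:
    """Limit for alpha -> infinity: 2^n / S(n)."""
    return 2 ** n / unit_ball_volume(n)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, round(rrr_hyppo(50, n), 3), round(rrr_hyppo_limit(n), 3))
```

Under these assumptions the limits are the ratio between the volume of a hypercube of side 2 and that of the unit ball inscribed in it, which is why HYPPO cannot reach an RRR of 1.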
  159. 159. HR3: Idea. $index(C, \omega) = \begin{cases} 0 & \text{if } \exists i : |c_i - c(\omega)_i| \leq 1,\ 1 \leq i \leq n, \\ \sum_{i=1}^{n} (|c_i - c(\omega)_i| - 1)^p & \text{else.} \end{cases}$
  160. 160. HR3: Idea. Compare $C(\omega)$ with $C$ iff $index(C, \omega) \leq \alpha^p$. [Figure: example for α = 4, p = 2]
  161. 161. HR3: Idea. Lemma: $\forall s \in S : index(C, s) > \alpha^p$ implies that all $t \in C$ are non-matches. Claims: no loss of recall; $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$.
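The index and the resulting filter can be sketched as follows. The code follows the case distinction as reconstructed from the slide (index 0 as soon as one coordinate differs by at most 1); the function names are hypothetical and cubes are represented by their integer coordinate tuples.

```python
def hr3_index(cube, ref_cube, p=2):
    """index(C, omega) for cube C relative to the cube C(omega) of a point."""
    diffs = [abs(c - r) for c, r in zip(cube, ref_cube)]
    if any(d <= 1 for d in diffs):
        return 0                                  # first case of the definition
    return sum((d - 1) ** p for d in diffs)       # second case


def hr3_keep(cube, ref_cube, alpha, p=2):
    """Compare the contents of two cubes only if the index is at most alpha^p."""
    return hr3_index(cube, ref_cube, p) <= alpha ** p


if __name__ == "__main__":
    ref = (0, 0)
    for cube in [(1, 5), (3, 4), (4, 4)]:
        print(cube, hr3_index(cube, ref), hr3_keep(cube, ref, alpha=4))
```

For α = 4 and p = 2 the cube (4, 4) lies inside the (2α + 1)^n block that HYPPO would inspect, but its index 18 exceeds α^p = 16, so this filter discards it.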
  162. 162. HR3: Lemma. $\forall \alpha > 1 : RRR(HR3(2\alpha)) < RRR(HR3(\alpha))$. [Figure: approximation for p = 2, α = 4]
  163. 163. HR3: Proof. The same lemma, illustrated for p = 2 and α = 8, 25 and 50. [Figures: approximations for increasing α]
  166. 166. HR3: Idea. Theorem: $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$. Claims: no loss of recall; $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$.
  167. 167. HR3: Experiments. Compare HR3 with LIMES 0.5's HYPPO and SILK 2.5.1. Experimental setup: deduplicating DBpedia places by minimum elevation, elevation and maximum elevation (θ = 49 m, 99 m); linking Geonames and LinkedGeoData by longitude and latitude (θ = 1°, 9°); 64-bit computer with a 2.8 GHz i7 processor and 8 GB RAM.
  168. 168. HR3: Experiments (Comparisons). Experiment 2: deduplicating DBpedia places, θ = 99 m: $0.64 \times 10^6$ fewer comparisons.
  169. 169. HR3: Experiments (Comparisons). Experiment 4: linking Geonames and LinkedGeoData, θ = 9°: $4.3 \times 10^6$ fewer comparisons.
  170. 170. HR3: Experiments (Runtime). Experiments 1, 2: DBpedia, θ = 49 m, 99 m. Experiments 3, 4: Geonames and LGD, θ = 1°, 9°. [Figure: runtime in seconds (logarithmic axis, 10^0 to 10^4) of HR3, HYPPO and SILK for Exp. 1 to Exp. 4]
  172. 172. HR3: Summary. Mission: a new category of algorithms for link discovery. Presented HR3: link discovery in affine spaces with Minkowski measures; outperforms the state of the art (runtime, comparisons); optimal reduction ratio; integrated in LIMES.
  173. 173. Learning Complex Specifications. Supervised (mostly active, e.g. RAVEN, EAGLE, SILK). Unsupervised (e.g. KnoFuss, EUCLID, EAGLE).
  181. 181. Learning Complex Specifications. Insight: the choice of the right examples is key for learning; so far, only informativeness has been used. Question: can we do better by using more information? Higher F-measure, but often slower.
  182. 182. Basic Idea. Use the similarity of link candidates when selecting the most informative examples (intra- + inter-class similarity).
  185. 185. Similarity of Candidates. A link candidate $x = (s, t)$ can be regarded as a vector $(\sigma_1(x), \ldots, \sigma_n(x)) \in [0, 1]^n$. Similarity of link candidates $x$ and $y$: $sim(x, y) = \frac{1}{1 + \sqrt{\sum_{i=1}^{n} (\sigma_i(x) - \sigma_i(y))^2}}$ (1). This allows exploiting both intra- and inter-class similarity.
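Equation (1), as reconstructed above, maps the Euclidean distance between two candidate vectors into (0, 1]. A minimal Python version, with a hypothetical function name and made-up similarity vectors:

```python
from math import sqrt
from typing import Sequence


def candidate_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """sim(x, y) = 1 / (1 + Euclidean distance of the similarity vectors)."""
    return 1.0 / (1.0 + sqrt(sum((a - b) ** 2 for a, b in zip(x, y))))


if __name__ == "__main__":
    # each vector holds sigma_1(x), ..., sigma_n(x) for one link candidate
    print(candidate_similarity((0.9, 0.8), (0.9, 0.8)))
    print(candidate_similarity((0.0, 0.0), (0.6, 0.8)))
```

Identical vectors give a similarity of 1, and the value decreases as the candidates move apart in the similarity space.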
  186. 186. Graph Clustering. Rationale: use intra-class similarity. Approach: cluster the elements of $S^+$ and $S^-$ independently; choose one element per cluster as representative; present the oracle with the most informative representatives. [Figure: example similarity graphs over $S^+$ and $S^-$ with nodes a to l and edge weights 0.25, 0.8 and 0.9]
  187. 187. BorderFlow. $G = (V, E, \omega)$ with $V = S^+$ or $V = S^-$; $\omega(x, y) = sim(x, y)$. Keep the best $ec$ edges for each $x \in V$.
  192. 192. BorderFlow. Seed-based algorithm. Goal: maximize the borderflow ratio $bf(X) = \frac{\Omega(b(X), X)}{\Omega(b(X), n(X))}$. http://sourceforge.net/projects/cugar-framework/
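The borderflow ratio can be computed for a given cluster as sketched below. The slide does not define its symbols, so the following are assumptions: $b(X)$ is the set of nodes in $X$ with at least one edge leaving $X$, $n(X)$ is the set of outside nodes adjacent to $X$, and $\Omega(A, B)$ is the total weight of edges from $A$ to $B$. The seed-based search that grows a cluster is not shown, and the edge weights are made up.

```python
def undirected(edges):
    """Store each weighted edge (u, v, w) in both directions."""
    weights = {}
    for u, v, w in edges:
        weights[(u, v)] = w
        weights[(v, u)] = w
    return weights


def omega(weights, xs, ys):
    """Total weight of edges going from nodes in xs to nodes in ys."""
    return sum(w for (u, v), w in weights.items() if u in xs and v in ys)


def border(weights, cluster):
    """b(X): nodes of the cluster with an edge to a node outside it."""
    return {u for (u, v), w in weights.items()
            if u in cluster and v not in cluster and w > 0}


def neighbours(weights, cluster):
    """n(X): nodes outside the cluster that are adjacent to it."""
    return {v for (u, v), w in weights.items()
            if u in cluster and v not in cluster and w > 0}


def borderflow_ratio(weights, cluster):
    """bf(X) = Omega(b(X), X) / Omega(b(X), n(X)); infinite if nothing flows out."""
    b = border(weights, cluster)
    outflow = omega(weights, b, neighbours(weights, cluster))
    inflow = omega(weights, b, cluster)
    return float("inf") if outflow == 0 else inflow / outflow


if __name__ == "__main__":
    g = undirected([("a", "b", 0.9), ("a", "c", 0.8),
                    ("b", "c", 0.8), ("c", "d", 0.25)])
    print(borderflow_ratio(g, {"a", "b", "c"}))
```

A high ratio means the border nodes are tied more strongly to their own cluster than to the outside.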
  193. 193. Conclusion. Can be combined with arbitrary active learning ML algorithms. Was experimentally combined with EAGLE (genetic programming) and RAVEN (linear classifier) and shown to outperform the plain informativeness function in terms of F-measure. The choice of examples is important to minimise user effort. Contact me for detailed experimental results. Longer runtimes (up to 2×).
  194. 194. Summary. Linking is a crucial task in the Web of Data. Two key problems: 1. efficient execution of link specifications; 2. creation of link specifications. Presented HR3 to handle the first problem. Presented COALA as a building block for the second problem.
  195. 195. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Quality Analysis; Knowledge Base Exploration / Querying
  196. 196. Motivation Rise in the availability and usage of knowledge bases. Still a lack of knowledge bases that consist of sophisticated schema information and instance data adhering to this schema: e.g. in the life sciences several knowledge bases only consist of schema information; to a large extent, knowledge bases are a collection of facts without a clear structure (e.g. information extracted from databases). Combination of sophisticated schema and instance data would allow powerful reasoning, consistency checking, and improved querying → create schemata based on existing data
  197. 197. Example
      dbr:Brad_Pitt :birthPlace dbr:Shawnee,_Oklahoma ; a :Person .
      dbr:Angela_Merkel :birthPlace dbr:Hamburg ; a :Person .
      dbr:Albert_Einstein :birthPlace dbr:Ulm ; a :Person .
      dbr:Shawnee,_Oklahoma a :Place .
      dbr:Hamburg a :Place .
      dbr:Ulm a :Place .
      Suggestions:
      ObjectProperty: birthPlace
        Characteristics: Functional
        Domain: Person
        Range: Place
        SubPropertyOf: hasBeenAt
  198. 198. Benefits of an expressive schema Axioms serve as documentation for the purpose and correct usage of schema elements. Additional implicit information can be inferred. Improves query optimisation. Improves or enables the application of schema debugging techniques.
  199. 199. Each person was only born at one place?!
  204. 204. birthPlace is functional [Figure: one subject with two birthPlace edges; the two objects are marked as equal (=)]
      SELECT ?s WHERE {
        ?s dbo:birthPlace ?o1 .
        ?s dbo:birthPlace ?o2 .
        FILTER (?o1 != ?o2)
      }
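The functionality check behind the query above can be sketched offline in Python. This is a minimal sketch under stated assumptions: the triples are plain tuples rather than results from a SPARQL endpoint, and the `ex:` resources and the helper name `functional_violations` are hypothetical. Under OWL semantics two distinct object URIs are not necessarily a contradiction (they may denote the same thing), so the result is only a list of candidates to inspect.

```python
from collections import defaultdict

def functional_violations(triples, prop):
    """Return subjects that have more than one distinct object for prop."""
    objects = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == prop:
            objects[subject].add(obj)
    # Keep only subjects with at least two different objects.
    return {s: sorted(objs) for s, objs in objects.items() if len(objs) > 1}

# Hypothetical sample data, not taken from DBpedia.
triples = [
    ("ex:Person1", "dbo:birthPlace", "ex:CityA"),
    ("ex:Person1", "dbo:birthPlace", "ex:CityB"),
    ("ex:Person2", "dbo:birthPlace", "ex:CityA"),
]

violations = functional_violations(triples, "dbo:birthPlace")
print(violations)
```

Only `ex:Person1` is reported, because it is the only subject with two different birthPlace values.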
  205. 205. Where was Julia Nannie Wallace born?
  206. 206. Julia Nannie Wallace was born in Lacrosse?
  207. 207. No, Julia Nannie Wallace was born in La Crosse!
  215. 215. [Figure, built up over several slides: the object of a birthPlace statement has rdf:type Sport; birthPlace has range Place, so the object also gets rdf:type Place; Place is disjointWith Sport. In the final build the object has rdf:type City and rdf:type Place.]
      SELECT ?s ?place WHERE {
        ?s dbo:birthPlace ?place .
        ?place rdf:type/rdfs:subClassOf* ?type1 .
        ?type2 rdfs:subClassOf* dbo:Place .
        ?type1 owl:disjointWith ?type2 .
      }
  220. 220. 3 Steps to get a schema 3-Phase Enrichment Learning Approach. Input: Entity URI, Axiom Type, Knowledge Base (SPARQL Endpoint). 1. Obtain schema information from the SPARQL endpoint (only executed once per knowledge base; optional reasoner invocation) → Background Knowledge. 2. Obtain axiom type and entity specific data from the SPARQL endpoint (sample data if necessary) → Background Knowledge + Relevant Instance Data. 3. Run machine learning algorithm (Learner) → List of Axiom Suggestions + Metadata. Iterate over all axiom types and schema entities for full enrichment. [Figure labels: Reasoner, Learner, DL-Learner, Enrichment Ontology]
  221. 221. Starting Point SPARQL endpoint: http://dbpedia.org/sparql Entity URI: http://dbpedia.org/ontology/author Axiom Type: Object Property Domain
  224. 224. Step 1 - Obtaining Schema Information
      CONSTRUCT WHERE { ?sub rdfs:subClassOf ?sup . }
      ORDER BY DESC(?sub) LIMIT 1000 OFFSET 1000

      dbo:Disease rdfs:subClassOf owl:Thing .
      dbo:Book rdfs:subClassOf dbo:WrittenWork .
      dbo:WrittenWork rdfs:subClassOf dbo:Work .
      dbo:Work rdfs:subClassOf owl:Thing .
      dbo:Philosopher rdfs:subClassOf dbo:Person .
      dbo:Person rdfs:subClassOf dbo:Agent .
      dbo:Agent rdfs:subClassOf owl:Thing .
      dbo:Sport rdfs:subClassOf dbo:Activity .
      dbo:Activity rdfs:subClassOf owl:Thing .
      dbo:Fish rdfs:subClassOf dbo:Animal .
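The LIMIT/OFFSET pair in the query suggests that the schema is fetched page by page. A minimal sketch of how such a series of queries could be generated, assuming a fixed page size; the function name `paged_construct_queries` and its parameters are hypothetical and not part of DL-Learner:

```python
def paged_construct_queries(pattern, order_var, page_size=1000, pages=3):
    """Yield CONSTRUCT queries that walk through the results page by page."""
    for page in range(pages):
        offset = page * page_size
        yield (
            f"CONSTRUCT WHERE {{ {pattern} }} "
            f"ORDER BY DESC({order_var}) LIMIT {page_size} OFFSET {offset}"
        )

queries = list(paged_construct_queries("?sub rdfs:subClassOf ?sup .", "?sub"))
for query in queries:
    print(query)
```

The second generated query is the one shown on the slide. In practice the loop would stop when a page comes back empty instead of using a fixed page count.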
  227. 227. Step 2 - Obtain axiom type and entity specific data
      SELECT ?type (COUNT(DISTINCT ?s) AS ?cnt) WHERE {
        ?s dbo:author ?o .
        ?s a ?type .
      } GROUP BY ?type ORDER BY DESC(?cnt)

      type                   cnt
      owl:Thing              30284
      dbo:Work               30284
      schema:CreativeWork    30284
      dbo:WrittenWork        25730
      dbo:Book               24673
      schema:Book            24673
      dbo:TelevisionShow     2567
      dbo:Play               1057
      ...                    ...
  229. 229. Step 2 - Obtain axiom type and entity specific data
      CONSTRUCT WHERE {
        ?ind dbo:author ?o .
        ?ind a ?type .
      } ORDER BY DESC(?ind) LIMIT 1000 OFFSET 2000

      ...
      dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ; rdf:type dbo:Book .
      dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ; rdf:type dbo:WrittenWork .
      dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ; rdf:type dbo:Book .
      ...
  233. 233. Step 3 - Scoring
      dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ; rdf:type dbo:Book .
      dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ; rdf:type dbo:WrittenWork .
      dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ; rdf:type dbo:Book .

      Score(Domain(dbo:author, dbo:Book)) = 2/3 ≈ 66.7%
      Score(Domain(dbo:author, dbo:WrittenWork)) = 1/3 ≈ 33.3%

      With the schema axiom dbo:Book rdfs:subClassOf dbo:WrittenWork . taken into account:
      Score(Domain(dbo:author, dbo:WrittenWork)) = 3/3 = 100%
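The three scores can be reproduced with a few lines of Python. This is a sketch only: the instance data is hard-coded from the slide, and the helper names `superclasses` and `domain_score` are hypothetical rather than DL-Learner API.

```python
def superclasses(cls, subclass_of):
    """Return cls together with all of its transitive superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen

def domain_score(asserted_types, candidate, subclass_of=None):
    """Fraction of subjects that are instances of candidate."""
    subclass_of = subclass_of or {}
    hits = 0
    for types in asserted_types.values():
        inferred = set()
        for asserted in types:
            inferred |= superclasses(asserted, subclass_of)
        if candidate in inferred:
            hits += 1
    return hits / len(asserted_types)

# Subjects of dbo:author and their asserted types, as on the slide.
asserted_types = {
    "dbpedia:The_Adventures_of_Tom_Sawyer": {"dbo:Book"},
    "dbpedia:The_Zombie_Survival_Guide": {"dbo:WrittenWork"},
    "dbpedia:Web_Therapy": {"dbo:Book"},
}
hierarchy = {"dbo:Book": ["dbo:WrittenWork"]}

plain_book = domain_score(asserted_types, "dbo:Book")
plain_written = domain_score(asserted_types, "dbo:WrittenWork")
inferred_written = domain_score(asserted_types, "dbo:WrittenWork", hierarchy)
print(plain_book, plain_written, inferred_written)
```

Without the hierarchy only asserted types count; with it, every dbo:Book also counts as a dbo:WrittenWork, which raises that score from 1/3 to 3/3.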
  236. 236. Step 3 - Scoring(2) Problem: support for axiom in KB not taken into account → no difference between 3 out of 3 and 100 out of 100. Solution: Average of 95% confidence interval (Wald method): $p' = \frac{s + 2}{m + 4}$, with interval bounds $\max\left(0,\; p' - 1.96 \cdot \sqrt{\frac{p' \cdot (1 - p')}{m + 4}}\right)$ and $\min\left(1,\; p' + 1.96 \cdot \sqrt{\frac{p' \cdot (1 - p')}{m + 4}}\right)$, where s = #success and m = #total. In 95% of the intervals the true value is between ... and ... Score(Domain(dbo:author, dbo:Book)) ≈ 57.3% Score(Domain(dbo:author, dbo:WrittenWork)) ≈ 69.1%
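A small sketch of this score, assuming s successes out of m trials and taking the midpoint of the clipped interval. The function name `wald_score` and its keyword parameters are hypothetical. With the rounded constants 2 and 4 from the formula, the example values come out as roughly 57.1% and 69.0%. The slide's 57.3% and 69.1% are reproduced when the constants are taken as z²/2 and z² with z = 1.96; that the underlying implementation does this is an assumption.

```python
import math

def wald_score(success, total, z=1.96, add_success=2.0, add_total=4.0):
    """Midpoint of the adjusted Wald confidence interval, clipped to [0, 1]."""
    p = (success + add_success) / (total + add_total)
    half_width = z * math.sqrt(p * (1 - p) / (total + add_total))
    lower = max(0.0, p - half_width)
    upper = min(1.0, p + half_width)
    return (lower + upper) / 2

z = 1.96
# Rounded constants (+2, +4) as written in the formula.
rounded_book = wald_score(2, 3)
rounded_written = wald_score(3, 3)
# Unrounded constants (z^2 / 2 and z^2); an assumption about the implementation.
exact_book = wald_score(2, 3, add_success=z * z / 2, add_total=z * z)
exact_written = wald_score(3, 3, add_success=z * z / 2, add_total=z * z)

for value in (rounded_book, rounded_written, exact_book, exact_written):
    print(f"{value:.1%}")
```

Either way, 3 out of 3 no longer scores 100%: the upper bound is clipped at 1 while the lower bound stays well below it, which is the intended penalty for low support.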
  237. 237. More Complex Axioms Pattern Based Knowledge Base Enrichment, ISWC 2013
  238. 238. Outlook and Summary Schema in the Linked Data Web often shallow → tools needed to support knowledge engineers. Showed some techniques for learning OWL axioms on large knowledge bases available as SPARQL endpoints. More complex axioms require OWL-SPARQL rewriting or fragment extraction. Small and medium-sized knowledge bases can be handled via techniques from Inductive Logic Programming. All algorithms implemented in the DL-Learner framework (http://dl-learner.org)
  239. 239. Outline 1 Introduction to Linked Data 2 Linked Dataset Example: DBpedia 3 Linked Data Life-Cycle Overview 4 Knowledge Extraction 5 Data Integration / Linking 6 Enrichment 7 Repair 8 Quality Analysis Knowledge Base Exploration / Querying [Figure: Linked Data Lifecycle with stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Evolution/Repair, Search/Browsing/Exploration]
  240. 240. Motivation Increasing number of knowledge bases in the Semantic Web (see e.g. LOD cloud). Maintenance of knowledge bases with expressive semantics is challenging.
  241. 241. (Automatically) Detectable Ontology Problems Common problems: Syntactic Problems; Structural Problems; Semantic Problems (focus of talk). Task Based Problems: Reasoning Related Problems; Linked Data Related Problems.
  242. 242. Syntactic Problems Syntactic errors are mainly violations of conventions of the language in which the ontology is modelled. Example (Validity of XML):
      <?xml version="1.0"?>
      <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description rdf:about="http://www.w3.org/">
          <dc:title>World Wide Web Consortium</dc:title>
      </rdf:RDF>
      FatalError: The element type "rdf:Description" must be terminated by the matching end-tag "</rdf:Description>". [Line = 7, Column = 3]
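This kind of syntactic check needs nothing more than an XML parser. A minimal sketch using Python's standard library, with the document laid out over seven lines as an assumption made to match the reported line number; the helper name `check_well_formed` is hypothetical, and the error message wording differs from the one on the slide because a different parser is used.

```python
import xml.etree.ElementTree as ET

# RDF/XML from the example, with the closing </rdf:Description> missing.
broken_rdf = "\n".join([
    '<?xml version="1.0"?>',
    '<rdf:RDF',
    '  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
    '  xmlns:dc="http://purl.org/dc/elements/1.1/">',
    '  <rdf:Description rdf:about="http://www.w3.org/">',
    '    <dc:title>World Wide Web Consortium</dc:title>',
    '</rdf:RDF>',
])

def check_well_formed(document):
    """Return None if the document parses, else the ParseError."""
    try:
        ET.fromstring(document)
        return None
    except ET.ParseError as err:
        return err

error = check_well_formed(broken_rdf)
print(error)
```

A well-formedness check like this only catches XML-level errors; a document can parse cleanly and still be invalid RDF.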
  243. 243. Structural Problems Problems in the taxonomy. Example (Circularities): A ⊑ B, B ⊑ C, C ⊑ A
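Circularities in a taxonomy can be found with a depth-first search over the subclass graph. A minimal sketch, assuming the taxonomy is given as a dictionary from each class to its direct superclasses; the function name `find_subclass_cycle` is hypothetical.

```python
def find_subclass_cycle(subclass_of):
    """Return one cycle in the subclass graph as a list of classes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(cls):
        colour[cls] = GREY
        path.append(cls)
        for parent in subclass_of.get(cls, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # parent is already on the current path: cycle found
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        colour[cls] = BLACK
        return None

    for cls in list(subclass_of):
        if colour.get(cls, WHITE) == WHITE:
            found = visit(cls)
            if found:
                return found
    return None

taxonomy = {"A": ["B"], "B": ["C"], "C": ["A"]}
cycle = find_subclass_cycle(taxonomy)
print(cycle)
```

For the circular example the function returns the path A, B, C, A; for an acyclic taxonomy it returns None.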
  244. 244. Reasoning Related Problems Problems which negatively affect the performance of reasoning over expressive knowledge bases. Example (A named concept is equivalent to an AllValues restriction): A ≡ ∀r.C. Reasoning complexity: A universal restriction does not require a property value to exist; it only restricts the values of existing property assertions. Any concept B whose instances cannot have r-fillers satisfies the restriction, i.e. B ⊑ ∀r.C, and becomes a subclass of A. This typically leads to unintended inferences, and the additional inferences may eventually slow down reasoning. Can be checked via Pellint (part of Pellet).
  245. 245. Linked Data Related Problems Problems which are specific to publishing RDF using the Linked Data principles: incorrect implementation of content negotiation; mixing up information and non-information resources.
  246. 246. Semantic Problems Logical contradictions in the underlying knowledge base. Example (Unsatisfiable classes): O = {A ⊑ B ⊓ C, C ⊑ ¬B} ⊨ A ⊑ ⊥. Example (Inconsistent ontology): O = {A ⊑ B ⊓ C, C ⊑ ¬B, A(x)} ⊨ ⊥. Usually handled by Ontology Debugging.
  250. 250. Ontology Debugging Problem: We have undesirable entailments. Solution: Repair (Delete/Modify) responsible axioms. Question: Which axioms?
  251. 251. Justification For an ontology O and an entailment η where O ⊨ η, a set of axioms J is a justification for η in O if J ⊆ O, J ⊨ η, and for every J′ ⊂ J, J′ ⊭ η. Justifications are minimal subsets of an ontology that are sufficient for a given entailment to hold. Synonyms: MUPS (Minimal Unsatisfiability Preserving Sub-TBoxes), MinAs (Minimal Axiom sets), Kernels. Observations: there can be multiple justifications for a single entailment; an axiom can be part of multiple justifications.
  256. 256. Justification - Example
      O = {
        (1) B ⊑ ∃r.D
        (2) B ⊑ ∀r.¬D
        (3) A ⊑ B ⊓ C
        (4) B ⊑ ¬C
        (5) A ⊑ ¬E
        (6) [axiom over E, A and F; connectives lost in extraction]
      } ⊨ A ⊑ ⊥
      J1 = {1, 2, 3}  J2 = {5, 6}  J3 = {3, 4}
  257. 257. Justification Based Repair For a repair, at least one axiom from every justification needs to be removed. For a repair plan, all justifications are needed.
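Removing at least one axiom from every justification means choosing a hitting set of the justifications. The sketch below enumerates all minimal hitting sets by brute force for the justifications J1 = {1, 2, 3}, J2 = {5, 6}, J3 = {3, 4} of the preceding example. It is an illustration only: the function name `minimal_hitting_sets` is hypothetical, and the enumeration is exponential, unlike the Hitting Set Tree algorithm used in practice.

```python
from itertools import combinations

def minimal_hitting_sets(justifications):
    """Enumerate all minimal sets that share an axiom with every justification."""
    axioms = sorted(set().union(*justifications))
    found = []
    # Candidates are tried by increasing size, so supersets of an
    # already found hitting set can be skipped as non-minimal.
    for size in range(1, len(axioms) + 1):
        for candidate in combinations(axioms, size):
            chosen = set(candidate)
            if any(previous <= chosen for previous in found):
                continue
            if all(chosen & justification for justification in justifications):
                found.append(chosen)
    return found

justifications = [{1, 2, 3}, {5, 6}, {3, 4}]
repairs = minimal_hitting_sets(justifications)
for repair in repairs:
    print(sorted(repair))
```

The smallest repairs remove axiom 3 together with either 5 or 6, since axiom 3 occurs in two of the three justifications.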
  258. 258. Justification Algorithms Single justification: Glass-Box: modifying the underlying reasoning algorithm (tableau tracing); Black-Box: using the reasoner as an oracle. All justifications: Reiter's Hitting Set Tree Algorithm (HST)
  259. 259. Black-Box Expansion-Contraction Strategy Expansion: Add axioms to empty set until entailment holds. Contraction: Remove axioms from set such that set becomes minimal and entailment still can be derived. [Figure 3.1: A Depiction of a Black-Box Expand-Contract Strategy; key: axiom, axiom in justification, selected axiom] Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)
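The expand-contract idea can be sketched with a mock oracle in place of a real reasoner. Everything named here is an assumption for illustration: `make_oracle` simulates entailment checking by testing whether a known justification is contained in the given axioms, and axioms are represented by their numbers. A real implementation would call an OWL reasoner instead and would typically use smarter expansion and contraction steps.

```python
def make_oracle(known_justifications):
    """Mock reasoner: the entailment holds iff a known justification is included."""
    return lambda axioms: any(j <= set(axioms) for j in known_justifications)

def single_justification(ontology, entails):
    """Compute one justification with a naive expand-contract strategy."""
    # Expansion: add axioms one by one until the entailment holds.
    working = []
    for axiom in ontology:
        working.append(axiom)
        if entails(working):
            break
    else:
        return None  # the entailment does not hold for the whole ontology
    # Contraction: drop every axiom that is not needed for the entailment.
    for axiom in list(working):
        reduced = [a for a in working if a != axiom]
        if entails(reduced):
            working = reduced
    return set(working)

oracle = make_oracle([{1, 2, 3}, {5, 6}, {3, 4}])
first = single_justification([1, 2, 3, 4, 5, 6], oracle)
second = single_justification([6, 5, 4, 3, 2, 1], oracle)
print(first, second)
```

Which justification is found depends on the order in which axioms are added, which is one reason a separate procedure is needed to enumerate all of them.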
  260. 260. Hitting Set Tree Algorithm From the field of Model Based Diagnosis. Given a faulty system (ontology), it constructs a finite tree whose nodes are labelled with conflict sets (justifications) and whose edges are labelled with components (axioms). Finds all minimal hitting sets, which represent diagnoses for the conflict sets in the system. Diagnosis = repair.
  261. 261. Hitting Set Tree Algorithm - Example O = {A ⊑ B, B ⊑ D, A ⊑ ∃R.C, ∃R.⊤ ⊑ D} ⊨ A ⊑ D, with J1 = {A ⊑ B, B ⊑ D} and J2 = {A ⊑ ∃R.C, ∃R.⊤ ⊑ D} [Figure 3.2: An Example of a Hitting Set Tree; nodes labelled J1, J2 and {}, edges labelled with axioms] Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)
  262. 262. Justification Scenarios A user can be faced with the following situations: Small number of small justifications: easy and pleasant to inspect. Small number of large justifications: better than alternatives. Large number of justifications: pretty hopeless with current mechanisms. Idea: Find source of unsatisfiability
  263. 263. Root Unsatisfiability - Definitions A root unsatisfiable class (UC) is a class whose unsatisfiability does not depend on another class; otherwise it is a derived UC. A derived UC for which there is some justification that is not a strict superset of a justification for another UC is a partial derived UC. Root Unsatisfiable Class: A class A is a root unsatisfiable class if there is no justification J ⊨ A ⊑ ⊥ such that J is a strict superset of a justification for some other unsatisfiable class.
  264. 264. Root Unsatisfiability - Approaches Approaches: 1: compute all justifications for each unsatisfiable class and apply the definition → computationally often too expensive 2: heuristics for structural analysis of axioms. Reference: Debugging Unsatisfiable Classes in OWL Ontologies, Kalyanpur, Parsia, Sirin, Hendler, J. Web Sem, 2005.
  272. 272. Root Unsatisfiability - Example
      O = {
        (1) B ⊑ ∃r.D
        (2) B ⊑ ∀r.¬D
        (3) A ⊑ B ⊓ C
        (4) B ⊑ ¬C
        (5) A ⊑ ¬E
        (6) [axiom over E, A and F; connectives lost in extraction]
      }
      O ⊨ A ⊑ ⊥ with J1 = {1, 2, 3}, J2 = {5, 6}, J3 = {3, 4} → partial (J4 ⊂ J1)
      O ⊨ B ⊑ ⊥ with J4 = {1, 2} → root
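Applying the definition directly (approach 1) is straightforward once all justifications are known. A minimal sketch with the justification sets of this example hard-coded; the function name `classify_unsatisfiable` and the label strings are hypothetical.

```python
def classify_unsatisfiable(justifications_by_class):
    """Label each unsatisfiable class as root, derived or partial derived."""
    labels = {}
    for cls, own in justifications_by_class.items():
        # All justifications that belong to other unsatisfiable classes.
        others = [
            j
            for other, justifications in justifications_by_class.items()
            if other != cls
            for j in justifications
        ]
        # A justification is dependent if it strictly contains one of them.
        dependent = [any(j > o for o in others) for j in own]
        if not any(dependent):
            labels[cls] = "root"
        elif all(dependent):
            labels[cls] = "derived"
        else:
            labels[cls] = "partial derived"
    return labels

justifications_by_class = {
    "A": [{1, 2, 3}, {5, 6}, {3, 4}],
    "B": [{1, 2}],
}
labels = classify_unsatisfiable(justifications_by_class)
print(labels)
```

A comes out as partial derived because only J1 strictly contains J4, while J2 and J3 make A unsatisfiable independently of B. Repairing the root class B therefore does not make A satisfiable.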
  273. 273. Axiom Relevance Resolving a justification requires deleting or editing axioms. Ranking methods highlight the most probable causes of problems. Methods: frequency, syntactic relevance, semantic relevance.
  274. 274. Repair Consequences After the repair process, axioms have been deleted or modified → desired entailments may be lost or new entailments obtained → user can decide to preserve them (including inconsistencies!)
  275. 275. SPARQL Endpoint Support Previously mentioned approaches are implemented in the ORE tool (http://ore-tool.net). ORE supports using SPARQL endpoints and implements an incremental load procedure: the knowledge base is loaded in small chunks; the number of axioms is counted by type; loading is priority based, e.g. disjointness axioms have higher priority than class assertion axioms; Pellet incremental reasoning is used. Reference: Learning of OWL Class Descriptions on Very Large Knowledge Bases, Hellmann, Lehmann, Auer, Int. Journal Semantic Web Inf. Syst, 2009
  277. 277. SPARQL Endpoint Support II The algorithm performs sanity checks, e.g. SPARQL queries which probe for typical inconsistent axiom sets; can fetch additional Linked Data; supports different termination criteria. Overall: ORE allows state-of-the-art ontology debugging methods to be applied on a larger scale than was previously possible; it aims at stronger support for the web aspect of the Semantic Web and the high popularity of the Web of Data initiative.
  278. 278. DBpedia Live Demo Inconsistency in DBpedia Live:
      Individual: dbr:Purify_(album)
        Facts: dbo:artist dbr:Axis_of_Advance
      Individual: dbr:Axis_of_Advance
        Types: dbo:Organisation
      Class: dbo:Organisation
        DisjointWith: dbo:Person
      ObjectProperty: dbo:artist
        Range: dbo:Person
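The four axioms together imply that dbr:Axis_of_Advance is both a dbo:Person (via the range of dbo:artist) and a dbo:Organisation, two classes declared disjoint. A minimal sketch of a check for this pattern over in-memory data; the function name `range_disjointness_conflicts` is hypothetical, and the check deliberately ignores subclass reasoning, which a real reasoner such as Pellet would take into account.

```python
def range_disjointness_conflicts(facts, types, ranges, disjoint):
    """Find objects whose asserted type is disjoint with the property range."""
    conflicts = []
    for subject, prop, obj in facts:
        expected = ranges.get(prop)
        if expected is None:
            continue  # no range declared for this property
        for asserted in sorted(types.get(obj, ())):
            if frozenset((asserted, expected)) in disjoint:
                conflicts.append((subject, prop, obj, asserted, expected))
    return conflicts

# Data from the demo above.
facts = [("dbr:Purify_(album)", "dbo:artist", "dbr:Axis_of_Advance")]
types = {"dbr:Axis_of_Advance": {"dbo:Organisation"}}
ranges = {"dbo:artist": "dbo:Person"}
disjoint = {frozenset(("dbo:Organisation", "dbo:Person"))}

conflicts = range_disjointness_conflicts(facts, types, ranges, disjoint)
print(conflicts)
```

Each reported tuple names the statement, the asserted type and the conflicting range class, which corresponds to the axioms listed in the demo.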
