The Linked Data Life-Cycle

Presentation of the Linked Data Lifecycle given at the ICCL Summer School 2013.

The Linked Data Life-Cycle: Presentation Transcript

  • The Linked Data Life-Cycle Jens Lehmann Lorenz Bühmann contributors: Quan Nguyen Sören Auer Richard Cyganiak Daniel Gerber Sebastian Hellmann Anja Jentzsch Dimitris Kontokostas Axel Ngonga Claus Stadler Christina Unger 2013-08-23 Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 1 / 252
  • Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Knowledge Base Exploration / Querying. Figure: Linked Data Lifecycle, with the stages Extraction; Storage/Querying; Manual revision/Authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration.
  • The Linked Data Principles. The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. Linked Data principles: 1 Use URIs as names for things. 2 Use HTTP URIs, so that people can look up those names. 3 When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). 4 Include links to other URIs, so that they can discover more things.
  • LOD Cloud
  • Linked Data Principles Detailed: 1 + 2. 1 Use URI references to identify not just Web documents and digital content, but also real-world objects and abstract concepts (tangible things: people, places; abstract things: the relationship type of knowing somebody). 2 HTTP URIs enable re-use of the Web architecture: Linked Data gives emphasis to the "Web" in "Semantic Web"; resource dereferencing; re-use of standard tools for security, load-balancing etc.
  • Principles Detailed: 3, Content Negotiation. Humans and machines should be able to retrieve appropriate representations of resources: HTML for humans, RDF for machines. This is achievable using an HTTP mechanism called content negotiation. Basic idea: the HTTP client sends HTTP headers with each request to indicate what kinds of documents it prefers; servers can inspect the headers and select an appropriate response. Two strategies: 303 URIs and hash URIs. Both ensure that objects and the documents that describe them are not confused, and that humans and machines can retrieve appropriate representations.
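The server-side half of content negotiation can be sketched in a few lines. This is a minimal sketch, not a full implementation of HTTP content negotiation: the function name `choose_representation` and the set of supported media types are assumptions made for illustration, and wildcards such as `*/*` are simply ignored.

```python
# Hypothetical helper: decide whether to serve RDF or HTML for an Accept header.
RDF_TYPES = {"application/rdf+xml", "text/turtle"}
SUPPORTED = RDF_TYPES | {"text/html"}


def choose_representation(accept_header: str) -> str:
    """Return 'rdf' or 'html' for the supported type with the highest q-value."""
    best_type, best_q = "text/html", -1.0  # fall back to HTML for humans
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, q = fields[0], 1.0  # q defaults to 1.0 when absent
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
        if media_type in SUPPORTED and q > best_q:
            best_type, best_q = media_type, q
    return "rdf" if best_type in RDF_TYPES else "html"
```

A browser-style header such as `text/html,application/xhtml+xml;q=0.9` selects the HTML document, while an RDF client sending `application/rdf+xml` selects the RDF document.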
  • 303 URIs. 303 redirect: instead of sending the object itself over the network, the server responds to the client with the HTTP response code 303 See Other and the URI of a Web document which describes the real-world object. Second step: the client dereferences the new URI and gets a Web document describing the real-world object.
  • Hash URIs. The hash URI strategy builds on the characteristic that URIs may contain a special part (the fragment identifier) separated from their base part by a hash symbol (#). The HTTP protocol requires the fragment part to be stripped off before requesting the URI from the server → a URI that includes a hash cannot be retrieved directly and therefore does not necessarily identify a Web document.
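The fragment stripping described above can be illustrated with Python's standard library. This is a small sketch; the vocabulary URIs under `example.org` are hypothetical.

```python
from urllib.parse import urldefrag


def split_hash_uri(uri: str) -> tuple[str, str]:
    """Split a hash URI into the document URI that is requested and the fragment."""
    document_uri, fragment = urldefrag(uri)
    return document_uri, fragment


# Two different things identified by hash URIs (hypothetical vocabulary) ...
person = split_hash_uri("http://example.org/vocab#Person")
place = split_hash_uri("http://example.org/vocab#Place")
# ... are both described by the same retrievable document.
same_document = person[0] == place[0]
```

Because both identifiers share the document part, a single HTTP request returns the descriptions of both resources.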
  • Hash versus 303. Hash URIs: (+) Reduced number of necessary HTTP round-trips → reduces access latency. (-) Descriptions of all resources sharing the same non-fragment URI part are always returned to the client together → can lead to large amounts of data being unnecessarily transmitted to the client. 303 URIs: (+) Flexible because the redirection target can be configured separately for each resource (it usually points to a single document for each resource, but could also summarise several resources). (-) Requires two HTTP requests to retrieve a single description of a real-world object.
  • Principles Detailed: 4, Links. If an RDF triple connects URIs in different namespaces/datasets, it is called a link (no unique syntactical definition of a link exists). Basic idea of Linked Data: apply the general hyperlink-based architecture of the World Wide Web to the task of sharing structured data on a global scale. Research challenge: efficient creation of links with high precision and recall.
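Since no unique syntactical definition of a link exists, any test for "is this triple a link?" is a heuristic. The sketch below assumes, purely for illustration, that the host name of a URI stands in for its namespace/dataset; the `example.org` URI is hypothetical.

```python
from urllib.parse import urlsplit


def namespace(uri: str) -> str:
    """Assumption: approximate the namespace/dataset of a URI by its host name."""
    return urlsplit(uri).netloc.lower()


def is_link(triple: tuple[str, str, str]) -> bool:
    """True if subject and object are HTTP URIs from different namespaces."""
    subject, _predicate, obj = triple
    if not obj.startswith(("http://", "https://")):
        return False  # literals cannot be link targets
    return namespace(subject) != namespace(obj)


SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
external = ("http://dbpedia.org/resource/Leipzig", SAME_AS,
            "http://example.org/places/leipzig")
internal = ("http://dbpedia.org/resource/Leipzig",
            "http://dbpedia.org/ontology/country",
            "http://dbpedia.org/resource/Germany")
```

Here `is_link(external)` holds, while `is_link(internal)` does not, because subject and object live in the same dataset.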
  • Why Linked Data? Problem: try to search for these things on the current Web: Apartments near German-Russian bilingual childcare in Leipzig. ERP service providers with offices in Vienna and London. Researchers working on multimedia topics in Eastern Europe. The information is available on the Web, but opaque to current Web search. Solution: complement text on Web pages with structured linked open data and intelligently combine/integrate such structured information from different sources.
  • How to get there?
  • Tim Berners-Lee's 5-star plan for an open web of data: 1 Make data available on the Web under an open license. 2 Make it available as structured data. 3 Use a non-proprietary format. 4 Use URIs to identify things. 5 Link your data to other people's data to provide context.
  • The 0th star: a data catalog with good metadata. Make your data findable.
  • Data on the Web, Open License
  • Data on the Web, Open License. Open vs. closed: data used to be closed by default. In the future, it may be open by default.
  • Data on the Web, Open License. Publishers: sharing data to make it more visible.
  • Data on the Web, Open License. E-Commerce: data sharing for increasing traffic.
  • Data on the Web, Open License. Community: collaboratively created databases.
  • Good reasons against opening data: privacy; competitive advantage; producing data and charging for it as a business model; can't get a license from upstream.
  • Structured Data. Enabling re-use: delivering data to end users in different forms; combining data with other data; 3rd-party analysis of data.
  • Structured Data Formats. Good for re-use / structured: MS Excel, CSV, XML, JSON, Microdata. Not so good for re-use: pure websites, MS Word. Bad for re-use: PDF. Really bad for re-use: only charts/maps without numbers.
  • Non-Proprietary Formats. Specialist tools (geospatial tools, statistics packages, etc.) often have specialist formats: few people have the tools; expensive; difficult to re-use. Non-proprietary: CSV (dead simple), XML, JSON, RDF (good for 4+5 stars).
  • URIs as Identifiers. URI design: prefer stable, implementation-independent URIs.
  • URIs as Identifiers. Turning local identifiers into URIs. Why? Make them globally unique; clarify authority; make them resolvable; make them linkable.
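Turning local identifiers into URIs can be as simple as prefixing them with a namespace the publisher controls. The following is a minimal sketch under stated assumptions: the base URI `http://data.example.org/id/` and the function name `mint_uri` are hypothetical, and percent-encoding is used so that arbitrary local keys remain valid URI path segments.

```python
from urllib.parse import quote

# Hypothetical base namespace; the domain makes the authority explicit.
BASE = "http://data.example.org/id/"


def mint_uri(kind: str, local_id: object) -> str:
    """Build a stable HTTP URI from an entity kind and a local identifier."""
    return f"{BASE}{quote(kind, safe='')}/{quote(str(local_id), safe='')}"
```

For example, `mint_uri("employee", 4711)` yields `http://data.example.org/id/employee/4711`. The URI does not mention a file extension or a script name, which keeps it independent of the current implementation.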
  • Links to Other Data. Hyperlinks are the soul of the Web. The Web of Data is no different.
  • Summary. Linked Data principles: 1 Use URIs to name things (not only documents, but also people, locations, concepts, etc.). 2 To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs. 3 When someone looks up a URI, provide useful information (structured data in RDF, SPARQL). 4 Include links to other URIs, allowing agents to discover more things. 5-star data: a five-star plan for realising an emerging web of data, dataset by dataset. 2 stars: re-usable data; 3 stars: open standards; 4+5 stars: connect data silos.
  • DBpedia. Community effort to extract structured information from Wikipedia and to make this information available on the Web. Allows users to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. Semi-structured wiki markup → structured information.
  • Wikipedia Limitations. Simple questions that are hard to answer with Wikipedia: What do Innsbruck and Leipzig have in common? Who are the mayors of central European towns elevated more than 1000 m? Which movies star both Brad Pitt and Angelina Jolie? All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants.
  • Structure in Wikipedia: title; abstract; infoboxes; geo-coordinates; categories; images; links (to other language versions, to other Wikipedia pages, to the Web); redirects; disambiguation; ...
  • DBpedia Information Extraction Framework (DIEF). Started in 2007. Hosted on Sourceforge and Github. Initially written in PHP, but fully re-written in Scala and Java. Around 40 contributors. See https://www.ohloh.net/p/dbpedia for a detailed overview. Can potentially be adapted to other MediaWikis; currently Wiktionary: http://wiktionary.dbpedia.org
  • DIEF - Overview
  • DIEF - Raw Infobox Extractor. WikiText syntax:
      {{Infobox Korean settlement
      |title = Busan Metropolitan City
      ...
      |area_km2 = 763.46
      |pop = 3635389
      |region = [[Yeongnam]]
      }}
    RDF serialization:
      dbp:Busan dbp:title "Busan Metropolitan City"
      dbp:Busan dbp:area_km2 "763.46"^^xsd:float
      dbp:Busan dbp:pop "3635389"^^xsd:int
      dbp:Busan dbp:region dbp:Yeongnam
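The idea behind raw infobox extraction can be sketched as follows. This is not the DIEF implementation (which is written in Scala and Java); it is a simplified Python illustration that assumes one `|key = value` pair per line, and the function name and datatype guessing rules are assumptions.

```python
import re

SAMPLE = """{{Infobox Korean settlement
|title = Busan Metropolitan City
|area_km2 = 763.46
|pop = 3635389
|region = [[Yeongnam]]
}}"""


def infobox_to_triples(subject: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Turn '|key = value' infobox lines into (subject, property, object) triples."""
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if not match:
            continue  # template header, closing braces, or elided lines
        key, value = match.groups()
        link = re.fullmatch(r"\[\[([^\]|]+)\]\]", value)
        if link:  # wiki link becomes a resource reference
            obj = "dbp:" + link.group(1).replace(" ", "_")
        elif re.fullmatch(r"-?\d+", value):
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):
            obj = f'"{value}"^^xsd:float'
        else:
            obj = f'"{value}"'
        triples.append((subject, "dbp:" + key, obj))
    return triples


triples = infobox_to_triples("dbp:Busan", SAMPLE)
```

Each infobox key is used directly as a property name, which is why the raw extractor inherits all the naming diversity of Wikipedia infoboxes.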
  • DIEF - Raw Infobox Extractor / Diversity
  • DIEF - Mapping-Based Infobox Extractor. Cleaner data: combine what belongs together (birth_place, birthplace); separate what is different (bornIn, birthplace); correct handling of datatypes. Mappings Wiki (http://mappings.dbpedia.org): everybody can contribute new mappings or improve existing ones; ≈ 170 editors.
  • DIEF - Mapping-Based Infobox Extractor
  • URI/IRI schemes. http://{lang.}dbpedia.org is the main domain. For every article there exists a DBpedia resource of the form http://{lang.}dbpedia.org/resource/{ArticleName}. Properties from the raw infobox extractor use the http://{lang.}dbpedia.org/property/ namespace. The ontology is global for all languages and lives under the http://dbpedia.org/ontology/ namespace. Note that for the English language no language code is used: http://dbpedia.org as main domain, http://dbpedia.org/resource/{title} for articles, http://dbpedia.org/property/{title} for properties.
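The scheme above can be written down as a small helper. This is a sketch under the rules stated on the slide; the function name `dbpedia_uri` is an assumption, and replacing spaces by underscores follows Wikipedia's title convention rather than anything stated here.

```python
def dbpedia_uri(kind: str, name: str, lang: str = "en") -> str:
    """Build a DBpedia URI; kind is 'resource', 'property' or 'ontology'."""
    if kind not in {"resource", "property", "ontology"}:
        raise ValueError(f"unknown kind: {kind}")
    # English uses no language code; the ontology is global for all languages.
    if lang == "en" or kind == "ontology":
        host = "dbpedia.org"
    else:
        host = f"{lang}.dbpedia.org"
    return f"http://{host}/{kind}/{name.replace(' ', '_')}"
```

For instance, `dbpedia_uri("resource", "Dresden", "de")` gives `http://de.dbpedia.org/resource/Dresden`, whereas the English edition yields `http://dbpedia.org/resource/Dresden`.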
  • Linked Data Publication via 303 Redirects. http://dbpedia.org/resource/Dresden - URI of the city of Dresden; http://dbpedia.org/page/Dresden - information resource describing the city of Dresden in HTML format; http://dbpedia.org/data/Dresden - information resource describing the city of Dresden in RDF/XML format; further formats are supported, e.g. http://dbpedia.org/data/Dresden.n3 for N3.
  • DBpedia Links (S: Silk, L: LIMES, C: custom script, missing: no regeneration)
      Data set                  Predicate                 Count        Tool
      Amsterdam Museum          owl:sameAs                627          S
      BBC Wildlife Finder       owl:sameAs                444          S
      Book Mashup               rdf:type, owl:sameAs      9 100
      Bricklink                 dc:publisher              10 100
      CORDIS                    owl:sameAs                314          S
      Dailymed                  owl:sameAs                894          S
      DBLP Bibliography         owl:sameAs                196          S
      DBTune                    owl:sameAs                838          S
      Diseasome                 owl:sameAs                2 300        S
      Drugbank                  owl:sameAs                4 800        S
      EUNIS                     owl:sameAs                3 100        S
      Eurostat (Linked Stats)   owl:sameAs                253          S
      Eurostat (WBSG)           owl:sameAs                137
      CIA World Factbook        owl:sameAs                545          S
      flickr wrappr             dbp:hasPhotoCollection    3 800 000    C
      Freebase                  owl:sameAs                3 600 000    C
      GADM                      owl:sameAs                1 900
      GeoNames                  owl:sameAs                86 500       S
      GeoSpecies                owl:sameAs                16 000       S
      GHO                       owl:sameAs                196          L
      Project Gutenberg         owl:sameAs                2 500        S
      Italian Public Schools    owl:sameAs                5 800        S
      LinkedGeoData             owl:sameAs                103 600      S
      LinkedMDB                 owl:sameAs                13 800       S
      MusicBrainz               owl:sameAs                23 000
      New York Times            owl:sameAs                9 700
      OpenCyc                   owl:sameAs                27 100       C
      OpenEI (Open Energy)      owl:sameAs                678          S
      Revyu                     owl:sameAs                6
      Sider                     owl:sameAs                2 000        S
      TCMGeneDIT                owl:sameAs                904
      UMBEL                     rdf:type                  896 400
      US Census                 owl:sameAs                12 600
      WikiCompany               owl:sameAs                8 300
      WordNet                   dbp:wordnet_type          467 100
      YAGO2                     rdf:type                  18 100 000
      Sum                                                 27 211 732
  • DBpedia Links - Query Example. Compare funding per year (from FTS) and country with the gross domestic product of a country (from DBpedia):
      SELECT * {
        { SELECT ?ftsyear ?dbpcountry (SUM(?amount) AS ?funding) {
            ?com rdf:type fts-o:Commitment .
            ?com fts-o:year ?year .
            ?year rdfs:label ?ftsyear .
            ?benefit fts-o:detailAmount ?amount .
            ?benefit fts-o:beneficiary ?beneficiary .
            ?beneficiary fts-o:country ?ftscountry .
            ?ftscountry owl:sameAs ?dbpcountry .
          } GROUP BY ?ftsyear ?dbpcountry }
        { SELECT ?dbpcountry ?gdpyear ?gdpnominal {
            ?dbpcountry rdf:type dbo:Country .
            ?dbpcountry dbp:gdpNominal ?gdpnominal .
            ?dbpcountry dbp:gdpNominalYear ?gdpyear .
          } }
        FILTER (?ftsyear = str(?gdpyear))
      }
  • Infrastructure. DBpedia has two extraction modes: Wikipedia-database-dump-based extraction and DBpedia Live synchronisation (more later). DBpedia dumps: the DBpedia dump archive is located at http://downloads.dbpedia.org/ and the latest downloads are described at http://dbpedia.org/Downloads (no trailing slash). Official endpoint (by OpenLink): http://dbpedia.org/sparql
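A SPARQL endpoint is queried over plain HTTP by passing the query text in the `query` parameter, as defined by the SPARQL protocol. The sketch below only builds the request URL for the official endpoint and deliberately sends nothing; the example query and the helper name `build_query_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: list some properties and values of one resource.
EXAMPLE_QUERY = (
    "SELECT ?p ?o WHERE { "
    "<http://dbpedia.org/resource/Leipzig> ?p ?o "
    "} LIMIT 10"
)


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return the GET URL that submits `query` to a SPARQL endpoint."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(EXAMPLE_QUERY)
```

The resulting URL could be fetched with any HTTP client; an `Accept` header would then select the result format via content negotiation.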
  • Query Answering. Back to our Wikipedia questions: What do Innsbruck and Leipzig have in common? Who are the mayors of central European towns elevated more than 1000 m? Which movies star both Brad Pitt and Angelina Jolie? All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants. Using the data extracted from Wikipedia and the public SPARQL endpoint, DBpedia can answer these questions.
  • DBpedia Live. DBpedia dumps are generated on a bi-annual basis. Wikipedia has around 100,000 to 150,000 page edits per day. DBpedia Live pulls page updates in real time, and the extraction results update the triple store. In practice, a 5-minute update delay increases performance by 15%. Links: SPARQL endpoint (http://live.dbpedia.org/sparql), documentation (http://wiki.dbpedia.org/DBpediaLive), statistics (http://live.dbpedia.org/LiveStats/).
  • DBpedia Live - Overview
  • DBpedia Internationalization (I18n). DBpedia Internationalization Committee founded (http://wiki.dbpedia.org/Internationalization). DBpedia language editions are available in Korean, Greek, German, Polish, Russian, Dutch, Portuguese, Spanish, Italian, Japanese and French; each uses the corresponding Wikipedia language edition as input. Mappings are available for 23 languages.
  • DBpedia I18n - Overview
  • Applications: Disambiguation. Named entity recognition and disambiguation tools such as DBpedia Spotlight, AlchemyAPI, Semantic API, Open Calais, Zemanta and Apache Stanbol.
  • Applications: Question Answering. DBpedia is the primary target for several QA systems in the Question Answering over Linked Data (QALD) workshop series. IBM Watson also relied on DBpedia.
  • Applications: Faceted Browsing. Neofonie Browser, gFacet, OpenLink faceted browser (fct).
  • Applications: Search and Querying. Query Builder, RelFinder, SemLens.
  • Applications: Digital Libraries & Archives. Virtual International Authority File (VIAF) project as Linked Data: VIAF added a total of 250,000 reciprocal authority links to Wikipedia. DBpedia can also provide: context information for bibliographic and archive records (e.g. an author's demographics, a film's homepage, an image etc.); stable and curated identifiers for linking. The broad range of Wikipedia topics can form the basis for a thesaurus for subject indexing.
  • Applications: DBpedia Mobile. DBpedia Mobile is a location-centric DBpedia client application for mobile devices consisting of a map view, the Marbles Linked Data Browser and a GPS-enabled launcher application.
  • Applications: DBpedia Wiktionary. Wiktionary is a Wikimedia project (http://wiktionary.org): 171 languages, 3M words for English. Extracted using the DBpedia Information Extraction Framework. Easily configurable for every Wiktionary language edition; pre-configured for German, Greek, English, Russian and French. Available at http://Wiktionary.dbpedia.org with 100 million triples, using the Lemon model.
  • Other Applications. See http://wiki.dbpedia.org/Applications for a more complete list.
  • Linked Data - Achievements and Challenges. Achievements: 1 Extension of the Web with a data commons (50B facts). 2 Vibrant, global RTD community. 3 Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly, NY Times, Facebook, Google, Yahoo). 4 Governmental adoption in sight. 5 Establishing Linked Data as a deployment path for the Semantic Web. Challenges: 1 Coherence: relatively few, expensively maintained links. 2 Quality: partly low-quality data and inconsistencies. 3 Performance: still substantial penalties compared to relational databases. 4 Data consumption: large-scale processing, schema mapping and data fusion still in their infancy. 5 Usability: missing direct end-user tools and network effect. These issues are closely related, and addressing them should ultimately lead to an ecosystem of interlinked knowledge!
  • Figure: Linked Data Lifecycle (Extraction; Storage/Querying; Manual revision/Authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration)
  • Extraction
  • Extraction. From unstructured sources: formats: plain text; methods: NLP, text mining, ontology learning. From semi-structured sources: formats: wiki markup, tags; tools: DBpedia framework (Wikipedia, Wiktionary). From structured sources: formats: databases, spreadsheets, XML; RDB2RDF tools: Sparqlify, D2R, Triplify; CSV converters: RDF extension of Google Refine.
  • Extraction Challenges. From unstructured sources: improve the F-measure of existing NLP approaches (OpenCalais, Ontos API); develop standardized, LOD-enabled interfaces between NLP tools (NLP2RDF). From semi-structured sources: efficient bi-directional synchronization. From structured sources: declarative syntax and semantics of data model transformations (W3C WG RDB2RDF). Orthogonal challenges: using LOD as background knowledge; provenance.
  • RDF Data Management. SPARQL-based RDF access is still a factor of 2-10 slower than relational data management, but performance increases steadily. Comprehensive, well-supported open-source and commercial implementations are available: OpenLink's Virtuoso (os+commercial); OWLIM-Lite (free), OWLIM-SE, OWLIM-Enterprise; Talis (hosted); Bigdata (distributed); Allegrograph (commercial); Mulgara (os).
  • Storage and Querying Challenges. Reduce the performance gap between relational and RDF data management. SPARQL query extensions: spatial/semantic/temporal data management. View maintenance / adaptive reorganization based on common access patterns. More realistic benchmarks.
  • Authoring
  • Authoring. Integrated in existing environments: Tiki. Data-oriented: RDFauthor, rdfEditor. Schema-oriented: Protégé, TopBraid Composer, NeOn Toolkit, Swoop, Neologism, Knoodl.
  • Authoring: Semantic Wikis. 1 Semantic (text) wikis: authoring of semantically annotated texts; Semantic MediaWiki, KiWi, (Wikipedia+DBpedia). 2 Semantic data wikis: direct authoring of structured information (i.e. RDF, RDF-Schema, OWL); OntoWiki.
  • Interlinking. The Data Web is an uncontrolled environment → proliferation of equivalent or similar entities → need for links / merging. Currently only few RDF triples are links. Manual link discovery: Sindice Integration, LODStats, Semantic Pingback. Tool-supported / semi-automatic: SILK, LIMES, COMA, RDF-AI; usually via mapping specifications / heuristics. Machine learning / automatic: RAVEN, EAGLE, SILK GP.
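A heuristic link specification of the kind mentioned above can be sketched with a string-similarity threshold. This is not how SILK or LIMES are implemented; it is a naive illustration that compares every pair of labels, with the threshold of 0.9, the function names and the sample identifiers all being assumptions.

```python
from difflib import SequenceMatcher


def normalise(label: str) -> str:
    """Lower-case a label and collapse whitespace before comparison."""
    return " ".join(label.lower().split())


def discover_links(source: dict[str, str], target: dict[str, str],
                   threshold: float = 0.9) -> list[tuple[str, str, str]]:
    """Propose owl:sameAs links between entities whose labels are similar."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(
                None, normalise(s_label), normalise(t_label)).ratio()
            if score >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri))
    return links


# Hypothetical entities from two datasets.
cities_a = {"a:Leipzig": "Leipzig", "a:Dresden": "Dresden"}
cities_b = {"b:1": "leipzig ", "b:2": "Vienna"}
links = discover_links(cities_a, cities_b)
```

The pairwise comparison is quadratic in the number of entities, which is one reason why efficient link discovery is listed as a research challenge.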
  • Interlinking Challenges. Apply work in the de-duplication/record linkage literature. Consider the open-world nature of Linked Data. Use LOD background knowledge. Zero-configuration linking. Explore active learning approaches, which integrate users in a feedback loop. Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (http://latc-project.eu/).
  • Enrichment. Currently, there is a lack of knowledge bases with sophisticated schema information and instance data adhering to this schema. Goal: powerful reasoning, consistency checking and querying. Manual: via ontology editors, DBpedia mappings. (Semi-)automatic: DL-Learner, Statistical Schema Induction.
  • Enrichment: Example. Given: a knowledge base with the property birthPlace (i.e. triples using that property) but no information on the semantics of birthPlace. Possible enrichment: ObjectProperty: birthPlace; Characteristics: Functional; Domain: Person; Range: Place; SubPropertyOf: hasBeenAt. Benefits: axioms serve as documentation for the purpose and correct usage of schema elements; additional implicit information can be inferred; improved applicability of schema debugging techniques.
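What the suggested axioms buy can be shown on a handful of triples. The sketch below is not DL-Learner or Statistical Schema Induction; it merely checks whether the data is compatible with a functional property and applies domain and range axioms to infer types. The sample data and function names are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

DATA: list[Triple] = [
    ("ex:Bach", "birthPlace", "ex:Eisenach"),
    ("ex:Wagner", "birthPlace", "ex:Leipzig"),
]


def looks_functional(triples: list[Triple], prop: str) -> bool:
    """True if no subject has more than one distinct value for prop."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return all(len(objs) == 1 for objs in values.values())


def infer_types(triples: list[Triple], prop: str,
                domain: str, range_: str) -> set[Triple]:
    """Apply domain and range axioms of prop to derive rdf:type triples."""
    inferred = set()
    for s, p, o in triples:
        if p == prop:
            inferred.add((s, "rdf:type", domain))
            inferred.add((o, "rdf:type", range_))
    return inferred


types = infer_types(DATA, "birthPlace", "Person", "Place")
```

With the domain and range in place, the data implies that `ex:Bach` is a Person and `ex:Eisenach` is a Place, even though neither fact was stated explicitly.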
  • Repair. Ontology debugging: OWL reasoning to detect inconsistencies and unsatisfiable classes, plus detection of the most likely sources of the problems. Basic task: provide feedback to the user for resolving undesired entailments. A justification J ⊆ O of an entailment is a minimal set of axioms from which the entailment can be drawn.
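The notion of a justification can be made concrete on a toy ontology. The sketch below assumes, for illustration only, that the ontology consists solely of subclass axioms and that entailment means reachability by transitivity; real OWL debugging relies on a reasoner, and the brute-force enumeration shown here is exponential in the number of axioms.

```python
from itertools import combinations

Axiom = tuple[str, str]  # (A, B) stands for the axiom "A SubClassOf B"


def entails(axioms, sub: str, sup: str) -> bool:
    """True if sub SubClassOf sup follows from the axioms by transitivity."""
    reachable, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        for a, b in axioms:
            if a == current and b not in reachable:
                reachable.add(b)
                frontier.append(b)
    return sup in reachable


def justifications(ontology: list[Axiom], sub: str, sup: str) -> list[set[Axiom]]:
    """Enumerate all minimal axiom sets J ⊆ O that entail sub SubClassOf sup."""
    found: list[set[Axiom]] = []
    for size in range(len(ontology) + 1):  # smallest candidates first
        for subset in combinations(ontology, size):
            candidate = set(subset)
            if any(j <= candidate for j in found):
                continue  # a smaller justification is contained: not minimal
            if entails(candidate, sub, sup):
                found.append(candidate)
    return found


ONTOLOGY = [("Student", "Person"), ("Person", "Agent"),
            ("Student", "Agent"), ("Cat", "Animal")]
result = justifications(ONTOLOGY, "Student", "Agent")
```

In this example the entailment has two justifications: the single axiom stating it directly, and the pair of axioms that derive it via Person. Removing one axiom from each justification is what it takes to get rid of an undesired entailment.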
  • 1234567 89347A5A Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 83 / 252
  • Linked Data Quality Analysis Quality on the Data Web is varying a lot Hand crafted or expensively curated knowledge base (e.g. DBLP, UMLS) vs. extracted from text or Web 2.0 sources (DBpedia) Quality = Fitness for use Often not necessary to x all problems, but to know about them 30+ quality dimensions dened in recent survey Research Challenge Establish measures for assessing the authority, provenance, reliability of Data Web resources Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 84 / 252
  • Evolution © CC-BY-SA by alasis on flickr) Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 85 / 252
  • KB Evolution Tasks: Performing knowledge base changes / refactoring Ensuring consistency of related knowledge Managing changes, e.g. undo operations Update materialized inferred data upon changes Update materialised links to other data upon changes Tools: Protégé - PROMPT and change management plugins EvoPat - easily re-usable and sharable evolution patterns dened via SPARQL PatOMat - ontology transformation framework Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 86 / 252
  • Exploration
    RDF data can be complex (as discussed by Pascal Hitzler)
    The exploration phase aims to make data accessible to non-experts
    Options: faceted browsing, question answering, query builders, visualisation of statistical or geospatial data, ...
  • Catalogus Professorum Lipsiensis
  • Visual Query Builder
  • Relationship Finder in CPL
  • Figure: Linked Data Lifecycle (stages: Extraction; Storage/Querying; Manual revision/Authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration)
  • Make the Web a Linked Data Washing Machine
  • Tool Support for Life-Cycle?
    Many Semantic Web tools support one or more life-cycle stages
    The Linked Data Stack (http://stack.linkeddata.org) provides a consolidated repository of such tools
      Each tool is a Debian package
      Lightweight integration between tools via common vocabularies and SPARQL
      Demonstrator interfaces for showing tools in combination
      Developed by the LOD2 and GeoKnow EU projects
  • Outline
    1 Introduction to Linked Data
    2 Linked Dataset Example: DBpedia
    3 Linked Data Life-Cycle Overview
    4 Knowledge Extraction
    5 Data Integration / Linking
    6 Enrichment
    7 Repair
    8 Knowledge Base Exploration / Querying
  • Knowledge Extraction
    Knowledge Extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources.
    The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must facilitate inferencing.
    Similar to Information Extraction (NLP) and ETL (Data Warehouse); the main difference is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema.
    Requires either the re-use of existing formal knowledge (reusing ontologies) or the generation of a schema based on the source data.
  • Categorisation of Approaches
    Source: examples are plain text, relational databases, XML, CSV
    Exposition: How is the extracted knowledge made explicit? How can you query it and perform inference?
    Synchronization: Is the knowledge extraction process executed once to produce a dump, or is the result synchronized with the source? Are changes to the result written back (bi-directional)?
    Reuse of vocabularies: Can popular ontologies (GoodRelations, FOAF, ...) be re-used to simplify global data integration?
    Automatisation: manual, semi-automatic, automatic
    Domain ontology required: Does the approach require a pre-defined ontology, or can it create a schema from the source (e.g. ontology learning)?
  • Extraction from Structured Sources to RDF
    Simple mappings from RDB tables/views to RDF
      Direct mapping of the model of relational databases to RDF:
        Table → OWL class
        Row → instance s of this class
        Cell with value o in column p → triple (s, p, o)
      Details: http://www.w3.org/TR/rdb-direct-mapping/
    Complex mappings of relational databases to RDF
      Additional refinements can be applied to the 1:1 mapping to improve the usefulness of the RDF output
      Extract or learn an OWL schema from the given database schema
      Map the schema and its contents to a pre-existing domain ontology
      Powerful mapping languages: R2RML, SML
    XML
      The XML tree structure can be converted directly to an RDF graph structure
      Complex mappings are possible, e.g. via XSLT processors
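The direct mapping rule (table → class, row → instance, cell → triple) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `direct_mapping`, the `http://example.org/` base IRI and the sample `city` table are hypothetical, and the IRI patterns only loosely follow the W3C Direct Mapping.

```python
def direct_mapping(table_name, primary_key, rows, base="http://example.org/"):
    """Map rows of one table to (s, p, o) triples.

    Table -> class, row -> instance of that class, cell -> triple.
    The base IRI and the IRI patterns are illustrative assumptions.
    """
    triples = []
    cls = f"{base}{table_name}"
    for row in rows:
        # One subject IRI per row, derived from the primary key value.
        s = f"{base}{table_name}/{primary_key}={row[primary_key]}"
        triples.append((s, "rdf:type", cls))
        for column, value in row.items():
            if value is None:  # NULL cells produce no triple
                continue
            p = f"{base}{table_name}#{column}"
            triples.append((s, p, value))
    return triples


# Hypothetical sample table with two rows, one of them containing a NULL.
rows = [{"id": 1, "name": "Leipzig"}, {"id": 2, "name": None}]
triples = direct_mapping("city", "id", rows)
for t in triples:
    print(t)
```

Such a 1:1 mapping needs no configuration, which is why the slide treats refinements (schema learning, mapping to a domain ontology) as a separate, more complex step.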
  • Extraction from Natural Language Sources
    80% of the information in business documents is in unstructured natural language [1]
      (-) Increased complexity and decreased quality of extraction
      (+) Potential for a massive acquisition of extracted knowledge
    Traditional Information Extraction (IE)
      Recognise and categorise elements in text
      Techniques: Named Entity Recognition (NER), Coreference Resolution (CO), ...
    Ontology Learning (OL) from text
      Learn whole ontologies from natural language text
      Usually extracted (semi-)automatically
    [1] Wimalasuriya, Dou. "Ontology-based information extraction: [...]" Journal of Information Science
  • LinkedGeoData + Sparqlify
    Example: LinkedGeoData, a knowledge extraction project using Sparqlify
    Structure: Motivation; OpenStreetMap; LGD Architecture; Mapping; Access (how LinkedGeoData is published); Use Cases
  • Motivation
    Ease information integration tasks that require spatial knowledge, such as:
      Offerings of bakeries next door
      Map of distributed branches of a company
      Historical sights along a bicycle track
    The LOD cloud contains data sets with spatial features, e.g. Geonames, DBpedia, US census, EuroStat
    But: they are restricted to popular or large entities (countries, famous places etc.) or to specific regions
    Therefore they lack buildings, roads, mailboxes, etc.
  • OpenStreetMap - Datamodel
    Basic entities:
      Nodes: latitude, longitude
      Ways: sequence of nodes
      Relations: associations between any number of nodes, ways and relations; every member of a relation plays a certain role
    Each entity may be described with tags (= key-value pairs)
    A way is closed if the ID of the last referenced node equals that of the first one.
    Whether a closed way denotes a linear ring or a polygon (i.e. whether the enclosed area is part of the respective OSM entity) depends on the tags.
  • Example: Leipzig's Zoo
  • Comparison: Leipzig's Zoo (OpenStreetMap)
  • Comparison: Leipzig's Zoo (GoogleMaps)
  • LGD Architecture
  • Tag Mappings
    Key-value pairs are assigned to RDF resources
    Each pair (k, v) can be annotated with datatypes, language tags, classes
    Mappings are themselves tables
    Example table: lgd_map_literal
      k          property       lang
      name       rdfs:label
      name:en    rdfs:label     en
      alt_label  skos:altLabel
      note       rdfs:comment
      ...        ...            ...
  • View Definition
    RDF mapping of the data from a PostgreSQL database
      Create View lgd_nodes As
        Construct {
          ?n a lgdm:Node .
          ?n geom:geometry ?g .
          ?g ogc:asWKT ?o .
        }
        With
          ?n = uri(lgd:node, ?id)
          ?g = uri(lgd-geom:node, ?id)
          ?o = typedLiteral(?geom, ogc:wktLiteral)
        From nodes
  • Sparqlify
    SPARQL-SQL rewriter: rewrites SPARQL queries according to the view definition
    The platform module offers a SPARQL endpoint and a Linked Data interface
    https://github.com/AKSW/Sparqlify
  • Rest-API
    Offers REST methods for frequent queries
    Based on the SPARQL (Virtuoso) endpoint
  • Downloads
    RDF dataset for download, generated using Construct { ?s ?p ?o }
    http://downloads.linkedgeodata.org
  • Ontology
    Enriched classes and properties with multilingual labels from TranslateWiki (http://translatewiki.net)
    Imported icons for 90 classes from the freely available SJJB Management icon collection (http://www.sjjb.co.uk/mapicons/)
  • SML Mapping Examples
    The following slides demonstrate how to map relational data to RDF with the Sparqlification Mapping Language (SML). These prefixes are used:
      prefix    IRI
      rdfs      http://www.w3.org/2000/01/rdf-schema#
      ogc       http://www.opengis.net/ont/geosparql#
      geom      http://geovocab.org/geometry#
      lgd       http://linkedgeodata.org/triplify/
      lgd-geom  http://linkedgeodata.org/geometry/
  • SML - Mapping Example I: The Goal (1/4)
    Input table nodes:
      id  geom
      1   POINT(0 0)
      2   POINT(1 1)
    How to map tables to RDF? How to introduce the distinction between feature and geometry that is commonly used in GIS?
    Aimed-for RDF output:
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      ...
      lgd:node1 geom:geometry lgd-geom:node1 .
      lgd:node2 geom:geometry lgd-geom:node2 .
      lgd-geom:node1 ogc:asWKT "POINT(0 0)"^^ogc:wktLiteral .
      lgd-geom:node2 ogc:asWKT "POINT(1 1)"^^ogc:wktLiteral .
  • SML - Mapping Example I: SML Syntax Outline (2/4)
    Input table and aimed-for RDF output as in 1/4
      Create View myNodesView As
        Construct { ... }
        With ...
        From ...
  • SML - Mapping Example I: Construct and From (3/4)
    Input table and aimed-for RDF output as in 1/4
      Create View myNodesView As
        Construct {
          ?n geom:geometry ?g .
          ?g ogc:asWKT ?o
        }
        With ...
        From nodes
  • SML - Mapping Example I: Complete! (4/4)
    Input table and aimed-for RDF output as in 1/4
      Create View myNodesView As
        Construct {
          ?n geom:geometry ?g .
          ?g ogc:asWKT ?o
        }
        With
          ?n = uri(lgd:node, ?id)
          ?g = uri(lgd-geom:node, ?id)
          ?o = typedLiteral(?geom, ogc:wktLiteral)
        From nodes
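To make the semantics of the completed view concrete, the following Python sketch emulates what myNodesView produces for the two input rows. It is not how Sparqlify works internally (Sparqlify rewrites SPARQL to SQL rather than materialising triples this way); the function name `my_nodes_view` and the N-Triples rendering are assumptions made for illustration.

```python
# Namespace IRIs as listed on the prefix slide.
LGD = "http://linkedgeodata.org/triplify/"
LGD_GEOM = "http://linkedgeodata.org/geometry/"
GEOM = "http://geovocab.org/geometry#"
OGC = "http://www.opengis.net/ont/geosparql#"


def my_nodes_view(nodes):
    """Emit N-Triples lines for each row, mirroring the With clause of the view."""
    lines = []
    for row in nodes:
        n = f"<{LGD}node{row['id']}>"                 # ?n = uri(lgd:node, ?id)
        g = f"<{LGD_GEOM}node{row['id']}>"            # ?g = uri(lgd-geom:node, ?id)
        o = f'"{row["geom"]}"^^<{OGC}wktLiteral>'     # ?o = typedLiteral(?geom, ogc:wktLiteral)
        lines.append(f"{n} <{GEOM}geometry> {g} .")   # ?n geom:geometry ?g
        lines.append(f"{g} <{OGC}asWKT> {o} .")       # ?g ogc:asWKT ?o
    return lines


nodes = [{"id": 1, "geom": "POINT(0 0)"}, {"id": 2, "geom": "POINT(1 1)"}]
ntriples = my_nodes_view(nodes)
print("\n".join(ntriples))
```

Each input row yields two triples: one linking the feature to its geometry resource, and one attaching the WKT literal to that geometry.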
  • SML Mapping Examples
    A more complex example, which demonstrates the use of an SQL mapping table and an SQL helper view.
  • SML - Mapping Example II: The Goal (1/8)
    Input table node_tags:
      id  k            v
      1   name         Universitaet Leipzig
      1   name:en      University of Leipzig
      1   amenity      university
      1   addr:street  Augustusplatz
      1   addr:city    Leipzig
    Aimed-for RDF output:
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix lgd: <http://linkedgeodata.org/triplify/> .
      lgd:node1 rdfs:label "Universitaet Leipzig" .
      lgd:node1 rdfs:label "University of Leipzig"@en .
  • SML - Mapping Example II: Source Data (2/8)
    OSM table node_tags as in 1/8
  • SML - Mapping Example II: Mapping Table (3/8)
    OSM table node_tags as in 1/8
    RDF mapping table lgd_map_literal:
      k          property       lang
      name       rdfs:label
      name:en    rdfs:label     en
      alt_label  skos:altLabel
      note       rdfs:comment
      ...        ...            ...
  • SML - Mapping Example II: Helper View (4/8)
    OSM table node_tags and RDF mapping table lgd_map_literal as in 3/8
    Helper view lgd_node_tags_literal:
      id   property    v                      lang
      1    rdfs:label  Universitaet Leipzig
      1    rdfs:label  University of Leipzig  en
      ...  ...         ...                    ...
    Defined by:
      SELECT id, property, v, lang
      FROM node_tags, lgd_map_literal
      WHERE node_tags.k = lgd_map_literal.k
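The helper view is an ordinary SQL join, so it can be tried out without a PostgreSQL installation. The sketch below uses Python's built-in sqlite3 module with a subset of the rows from the slide; the use of SQLite and of empty strings for missing language tags are assumptions for illustration only.

```python
import sqlite3

# In-memory database standing in for the PostgreSQL instance used by LinkedGeoData.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node_tags (id INTEGER, k TEXT, v TEXT);
CREATE TABLE lgd_map_literal (k TEXT, property TEXT, lang TEXT);

INSERT INTO node_tags VALUES
  (1, 'name', 'Universitaet Leipzig'),
  (1, 'name:en', 'University of Leipzig'),
  (1, 'amenity', 'university');

INSERT INTO lgd_map_literal VALUES
  ('name', 'rdfs:label', ''),
  ('name:en', 'rdfs:label', 'en'),
  ('note', 'rdfs:comment', '');

-- Helper view: same join as on the slide.
CREATE VIEW lgd_node_tags_literal AS
  SELECT id, property, v, lang
  FROM node_tags, lgd_map_literal
  WHERE node_tags.k = lgd_map_literal.k;
""")

helper_rows = conn.execute(
    "SELECT id, property, v, lang FROM lgd_node_tags_literal ORDER BY v"
).fetchall()
for row in helper_rows:
    print(row)
```

Tags without an entry in the mapping table (here `amenity`) simply drop out of the join, which is how the mapping table controls which keys become literals.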
  • SML - Mapping Example II: SML View (5/8)
    Logical table lgd_node_tags_literal:
      id   property    v            lang
      1    rdfs:label  Univ. L.
      1    rdfs:label  Univ. of L.  en
      ...  ...         ...          ...
    SML view:
      Create View lgd_node_tags_text As
        Construct {
  • SML - Mapping Example II: SML View (6/8)
    Logical table as in 5/8
    SML view:
      Create View lgd_node_tags_text As
        Construct {
          ?s ?p ?o .
        }
        With ...
        From lgd_node_tags_literal
  • SML - Mapping Example II: SML View (7/8)
    Logical table as in 5/8
    SML view:
      Create View lgd_node_tags_text As
        Construct {
          ?s ?p ?o .
        }
        With
          ?s = uri(lgd:node, ?id)
          ?p = uri(?property)
          ?o = plainLiteral(?v, ?lang)
        From lgd_node_tags_literal
  • SML - Mapping Example II: SML View (8/8)
    Logical table + SML view as in 7/8
    Resulting RDF:
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix lgd: <http://linkedgeodata.org/triplify/> .
      lgd:node1 rdfs:label "Universitaet Leipzig" .
      lgd:node1 rdfs:label "University of Leipzig"@en .
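The effect of the three With expressions can be mimicked in Python to see how the helper view rows become the resulting RDF. The helper names `plain_literal` and `node_tags_text` are hypothetical, and the output is rendered with prefixed names as on the slide rather than with full IRIs.

```python
def plain_literal(value, lang):
    """Render a plain literal, adding a language tag only when one is given."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def node_tags_text(rows):
    """Mirror the view: ?s = uri(lgd:node, ?id), ?p = uri(?property),
    ?o = plainLiteral(?v, ?lang)."""
    return [
        f"lgd:node{node_id} {prop} {plain_literal(v, lang)} ."
        for node_id, prop, v, lang in rows
    ]


# Rows of the helper view lgd_node_tags_literal (None = no language tag).
rows = [
    (1, "rdfs:label", "Universitaet Leipzig", None),
    (1, "rdfs:label", "University of Leipzig", "en"),
]
turtle = node_tags_text(rows)
print("\n".join(turtle))
```

Because the property comes from a column (`?p = uri(?property)`), adding a new tag mapping is a matter of inserting a row into the mapping table, not of editing the view.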
  • Further Tag Mappings
    lgd_map_datatype:
      k       datatype
      seats   integer
      unisex  boolean
    lgd_map_property:
      k        property
      website  foaf:homepage
    lgd_map_resource_k:
      k        property  object
      highway  rdf:type  lgdo:HighwayThing
    lgd_map_resource_kv:
      k         v      property  object
      waterway  river  rdf:type  lgdo:River
  • LGD Edit Tool
    Multi-user tag mapping web application
  • Resources
    Sparqlify: http://sparqlify.org
    LinkedGeoData: http://linkedgeodata.org
    Tag mappings: https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sql/Mappings.sql
    SML view definitions: https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sml/LinkedGeoData-Triplify-IndividualViews.sml
  • Statistics (15 August 2013)
    The complete OSM planet file corresponds to ∼ 20.000.000.000 triples (virtual access via Sparqlify)
    Downloads are limited to selected classes: 292.780.188 triples
      153.613.243 triples of Nodes
      139.166.945 triples of Ways
      Relations not yet available for download
    Among them:
      532.812 PlaceOfWorship
      82.788 RailwayStation
      72.091 Toilets
      71.613 Town
      19.937 City
  • Access
    Materialized SPARQL endpoint (based on Virtuoso DB, download datasets loaded)
      http://linkedgeodata.org/sparql
      http://linkedgeodata.org/snorql
    Virtual SPARQL endpoint (based on Sparqlify, access to 20B triples, limited SPARQL 1.0 support)
      http://linkedgeodata.org/vsparql
      http://linkedgeodata.org/vsnorql
    REST interface (based on the virtual SPARQL endpoint)
      Supports limited queries (e.g. circular/rectangular area, filtering by labels)
    Downloads
      http://downloads.linkedgeodata.org
    Monthly updates of the above datasets are envisioned
  • Use Cases: Augmented Reality
  • Use Cases: Generic Browsing
  • Outline
    1 Introduction to Linked Data
    2 Linked Dataset Example: DBpedia
    3 Linked Data Life-Cycle Overview
    4 Knowledge Extraction
    5 Data Integration / Linking
    6 Enrichment
    7 Repair
    8 Knowledge Base Exploration / Querying
  • Why Link Discovery?
    1 Fourth Linked Data principle
    2 Links are central for cross-ontology QA, data integration, reasoning, federated queries, ...
    3 2011 topology of the LOD Cloud:
      31+ billion triples
      ≈ 0.5 billion links
      owl:sameAs in most cases
  • Why is it difficult? (1) Time complexity
    Large number of triples
    Quadratic a-priori runtime
      69 days for mapping cities from DBpedia to Geonames (1 ms per comparison)
      Decades for linking DBpedia and LGD
      ...
    Definition (Link Discovery)
      Given sets S and T of resources and a relation R
      Task: find $M = \{(s, t) \in S \times T : R(s, t)\}$
      Common approaches:
        Find $M = \{(s, t) \in S \times T : \sigma(s, t) \geq \theta\}$
        Find $M = \{(s, t) \in S \times T : \delta(s, t) \leq \theta\}$
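The quadratic a-priori runtime follows directly from the definition: without further knowledge, every pair in S × T has to be compared. A minimal brute-force sketch, assuming a Euclidean distance and hypothetical sample points (the function names are illustrative, not part of LIMES):

```python
import math


def naive_link_discovery(S, T, delta, theta):
    """Return all (s, t) with delta(s, t) <= theta, plus the number of comparisons."""
    comparisons = 0
    links = []
    for s in S:
        for t in T:
            comparisons += 1  # one distance computation per pair: |S| * |T| in total
            if delta(s, t) <= theta:
                links.append((s, t))
    return links, comparisons


def estimated_days(size_s, size_t, ms_per_comparison=1.0):
    """Runtime of the brute-force approach in days at a fixed cost per comparison."""
    return size_s * size_t * ms_per_comparison / 1000 / 86400


S = [(0.0, 0.0), (5.0, 5.0)]
T = [(0.5, 0.0), (9.0, 9.0), (5.0, 5.5)]
links, comparisons = naive_link_discovery(S, T, math.dist, theta=1.0)
print(links, comparisons)
print(estimated_days(1000, 86400))  # 1000 x 86400 pairs at 1 ms each
```

Since the cost grows with |S| · |T|, the approaches on the following slides aim at discarding pairs that cannot match before any distance is computed.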
  • Why is it difficult? (2) Complexity of specifications
    Combination of several attributes required for high precision
    Tedious discovery of the most adequate mapping
    Dataset-dependent similarity functions
  • LIMES Framework
  • Runtime Optimization
    Reduce the number of comparisons: $C(A) \geq |M|$ (assuming we need all σ/θ values for links)
    Maximize the reduction ratio: $RR(A) = 1 - \frac{C(A)}{|S||T|}$
    Question: Can we devise lossless approaches with guaranteed RR?
    Advantages: space management, runtime prediction, resource scheduling
  • RR Guarantee
    Best achievable reduction ratio: $RR_{max} = 1 - \frac{|M|}{|S||T|}$
    Approach $H(\alpha)$ fulfills the RR guarantee criterion iff $\forall r < RR_{max}, \exists \alpha : RR(H(\alpha)) \geq r$
    Here, we use the relative reduction ratio (RRR): $RRR(A) = \frac{RR_{max}}{RR(A)}$
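The three measures are simple ratios and can be checked with small numbers. The sketch below implements them as defined on the slides; the example sizes (|S| = |T| = 100, |M| = 50, C(A) = 200) are made up for illustration.

```python
def reduction_ratio(comparisons, size_s, size_t):
    """RR(A) = 1 - C(A) / (|S| |T|)."""
    return 1 - comparisons / (size_s * size_t)


def rr_max(matches, size_s, size_t):
    """Best achievable reduction ratio: only the |M| matching pairs are compared."""
    return 1 - matches / (size_s * size_t)


def relative_reduction_ratio(comparisons, matches, size_s, size_t):
    """RRR(A) = RR_max / RR(A); equals 1 for an optimal approach, larger otherwise."""
    return rr_max(matches, size_s, size_t) / reduction_ratio(comparisons, size_s, size_t)


# Hypothetical run: 100 x 100 resources, 50 true links, 200 comparisons carried out.
rr = reduction_ratio(200, 100, 100)
best = rr_max(50, 100, 100)
rrr = relative_reduction_ratio(200, 50, 100, 100)
print(rr, best, rrr)
```

Because a lossless approach must at least compare all matching pairs, C(A) ≥ |M| and therefore RRR(A) ≥ 1, which is why the formal goal is phrased as getting RRR arbitrarily close to 1 from above.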
  • Goal
    Formal goal: devise $H(\alpha)$ such that $\forall r > 1, \exists \alpha : RRR(H(\alpha)) \leq r$
  • Restrictions
    Minkowski distance: $\delta(s, t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}$, $p \geq 2$
  • Space Tiling
    HYPPO: $\delta(s, t) \leq \theta$ describes a hypersphere
    Approximate the hypersphere by a hypercube: easy to compute, no loss of recall (blocking)
  • Space Tiling
    Set the width of a single hypercube to $\Delta = \theta/\alpha$
    Tile $\Omega = S \cup T$ into adjacent cubes $C$
      Coordinates: $(c_1, \ldots, c_n) \in \mathbb{N}^n$
      Contains the points $\omega \in \Omega$ with $\forall i \in \{1, \ldots, n\} : c_i \Delta \leq \omega_i < (c_i + 1)\Delta$
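The tiling step can be sketched as follows: each point is assigned to the cube whose coordinates are the component-wise floor of ω_i / Δ, and candidates for a source point are collected from the (2α + 1)^n cubes around its own cube. This is an illustrative sketch, not the LIMES implementation; function names and sample points are assumptions, and `math.floor` is used so that negative coordinates also work (the slide restricts cube coordinates to natural numbers).

```python
import math
from collections import defaultdict
from itertools import product


def cube_of(point, width):
    """Cube coordinates c with c_i * width <= point_i < (c_i + 1) * width."""
    return tuple(math.floor(x / width) for x in point)


def tile(points, width):
    """Group points by the cube that contains them."""
    cubes = defaultdict(list)
    for p in points:
        cubes[cube_of(p, width)].append(p)
    return cubes


def candidates(s, target_cubes, theta, alpha):
    """Collect all target points from the (2*alpha + 1)^n cubes around the cube of s."""
    c = cube_of(s, theta / alpha)
    found = []
    for offset in product(range(-alpha, alpha + 1), repeat=len(s)):
        key = tuple(ci + oi for ci, oi in zip(c, offset))
        found.extend(target_cubes.get(key, []))
    return found


theta, alpha = 1.0, 2
T = [(0.4, 0.4), (3.0, 3.0), (0.9, -0.2)]
target_cubes = tile(T, theta / alpha)
cands = candidates((0.0, 0.0), target_cubes, theta, alpha)
print(cands)
```

Any target within distance θ of s differs from s by at most θ = αΔ in every dimension, so its cube index differs by at most α; the block of cubes therefore never misses a match, while distant points such as (3.0, 3.0) are never compared.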
  • HYPPO
    Combine $(2\alpha + 1)^n$ hypercubes around $C(\omega)$ to approximate the hypersphere
    $RRR(HYPPO(\alpha)) = \frac{(2\alpha + 1)^n}{\alpha^n S(n)}$
    $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{2^n}{S(n)}$
  • HYPPO
    Figure: RRR(HYPPO) for $p = 2$, $n = 2, 3, 4$ and $2 \leq \alpha \leq 50$
    $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{4}{\pi} \approx 1.27$ ($n = 2$)
    $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{6}{\pi} \approx 1.91$ ($n = 3$)
    $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{32}{\pi^2} \approx 3.24$ ($n = 4$)
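The three limits can be reproduced numerically. The slides do not define S(n); the sketch below assumes it is the volume of the n-dimensional unit ball for p = 2, i.e. S(n) = π^(n/2) / Γ(n/2 + 1), which is consistent with the stated values 4/π, 6/π and 32/π².

```python
import math


def unit_ball_volume(n):
    """Volume of the n-dimensional Euclidean unit ball (assumed meaning of S(n))."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def rrr_hyppo(alpha, n):
    """RRR(HYPPO(alpha)) = (2*alpha + 1)^n / (alpha^n * S(n))."""
    return (2 * alpha + 1) ** n / (alpha ** n * unit_ball_volume(n))


def rrr_hyppo_limit(n):
    """Limit for alpha -> infinity: 2^n / S(n)."""
    return 2 ** n / unit_ball_volume(n)


for n in (2, 3, 4):
    print(n, round(rrr_hyppo(50, n), 3), round(rrr_hyppo_limit(n), 3))
```

The limit grows with the dimension n because a hypercube is an increasingly poor approximation of a hypersphere; this is the gap that HR3 closes on the following slides.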
  • HR3: Idea
    $index(C, \omega) = \begin{cases} 0 & \text{if } \exists i : |c_i - c(\omega)_i| \leq 1,\ 1 \leq i \leq n \\ \sum_{i=1}^{n} (|c_i - c(\omega)_i| - 1)^p & \text{else} \end{cases}$
  • HR3: Idea
    Compare $C(\omega)$ with $C$ iff $index(C, \omega) \leq \alpha^p$
    Figure: $\alpha = 4$, $p = 2$
  • HR3: Idea
    Lemma: $\forall s \in S : index(C, s) > \alpha^p$ implies that all $t \in C$ are non-matches
    Claims:
      No loss of recall
      $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
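The index function and the comparison rule translate almost literally into code. The sketch below follows the formula as given on the slide; the function names and the example cubes are illustrative assumptions, not taken from the LIMES code base.

```python
def hr3_index(cube, reference_cube, p):
    """index(C, omega): 0 if the cubes are within one step in some dimension,
    otherwise the sum of (|c_i - c(omega)_i| - 1)^p over all dimensions."""
    diffs = [abs(c - r) for c, r in zip(cube, reference_cube)]
    if any(d <= 1 for d in diffs):
        return 0
    return sum((d - 1) ** p for d in diffs)


def must_compare(cube, reference_cube, alpha, p):
    """Compare the points of the two cubes iff index <= alpha^p."""
    return hr3_index(cube, reference_cube, p) <= alpha ** p


alpha, p = 4, 2
origin = (0, 0)
print(must_compare((4, 4), origin, alpha, p))  # corner cube of the HYPPO block
print(must_compare((3, 4), origin, alpha, p))
print(must_compare((4, 1), origin, alpha, p))
```

For α = 4 and p = 2 the corner cube (4, 4) has index 3² + 3² = 18 > 16 and is skipped: its closest point is at least √18 · Δ away, which exceeds θ = 4Δ. HYPPO would still have compared it, which is where the improved reduction ratio comes from.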
  • HR3: Lemma 3
    Lemma: $\forall \alpha > 1 : RRR(HR3(2\alpha)) < RRR(HR3(\alpha))$
    Figure: $p = 2$, $\alpha = 4$
  • HR3: Proof
    Lemma: $\forall \alpha > 1 : RRR(HR3(2\alpha)) < RRR(HR3(\alpha))$
    Figures: $p = 2$ with $\alpha = 8$, $\alpha = 25$ and $\alpha = 50$
  • HR3: Idea
    Theorem: $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
    Claims:
      No loss of recall
      $\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
  • HR3: Experiments
    Compare HR3 with LIMES 0.5's HYPPO and SILK 2.5.1
    Experimental setup:
      Deduplicating DBpedia places by minimum elevation, elevation and maximum elevation (θ = 49 m, 99 m)
      Linking Geonames and LinkedGeoData by longitude and latitude (θ = 1°, 9°)
      64-bit computer with a 2.8 GHz i7 processor and 8 GB RAM
  • HR3: Experiments (Comparisons)
    Experiment 2: Deduplicating DBpedia places, θ = 99 m: 0.64 × 10^6 fewer comparisons
  • HR3: Experiments (Comparisons)
    Experiment 4: Linking Geonames and LinkedGeoData, θ = 9°: 4.3 × 10^6 fewer comparisons
  • HR3: Experiments (Runtime)
    Experiments 1, 2: DBpedia, θ = 49 m, 99 m
    Experiments 3, 4: Geonames and LGD, θ = 1°, 9°
    Figure: runtime in seconds (logarithmic axis, 10^0 to 10^4) of HR3, HYPPO and SILK for Exp. 1 to Exp. 4
  • HR3: Summary
    Mission: new category of algorithms for link discovery
    Presented HR3:
      Link discovery in affine spaces with Minkowski measures
      Outperforms the state of the art (runtime, comparisons)
      Optimal reduction ratio
      Integrated in LIMES
  • Learning Complex Specifications
    Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
    Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)
  • Learning Complex Specifications
    Insight:
      The choice of the right examples is key for learning
      So far, only informativeness is used
    Question: Can we do better by using more information?
      Higher F-measure
      Often slower
  • Basic Idea
    Use the similarity of link candidates when selecting the most informative examples (intra- + inter-class similarity)
  • Similarity of Candidates
    A link candidate $x = (s, t)$ can be regarded as a vector $(\sigma_1(x), \ldots, \sigma_n(x)) \in [0, 1]^n$
    Similarity of link candidates $x$ and $y$: $sim(x, y) = \frac{1}{1 + \sqrt{\sum_{i=1}^{n} (\sigma_i(x) - \sigma_i(y))^2}}$ (1)
    Allows exploiting both intra- and inter-class similarity
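A small sketch of the candidate similarity, treating each link candidate as its vector of similarity values. It assumes that the denominator of the formula contains the Euclidean distance (i.e. a square root over the sum of squared differences), since the root sign does not survive in the slide text; the function name and sample vectors are illustrative.

```python
import math


def candidate_similarity(x, y):
    """sim(x, y) = 1 / (1 + Euclidean distance between the similarity vectors).

    x and y are sequences (sigma_1, ..., sigma_n) with values in [0, 1].
    The square root is an assumption about the original formula.
    """
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1 / (1 + distance)


x = (0.0, 0.0)
y = (0.3, 0.4)
print(candidate_similarity(x, x))  # identical candidates
print(candidate_similarity(x, y))  # distance 0.5
```

Identical candidates get similarity 1 and the value decreases towards 0 as the vectors move apart, so it can serve directly as an edge weight in the clustering step described next.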
  • Graph Clustering
    Rationale: use intra-class similarity
    Approach:
      Cluster the elements of $S^+$ and $S^-$ independently
      Choose one element per cluster as representative
      Present the oracle with the most informative representatives
    Figure: example similarity graphs over S+ and S- with edge weights between 0.25 and 0.9
  • BorderFlow
    $G = (V, E, \omega)$ with $V = S^+$ or $V = S^-$
    $\omega(x, y) = sim(x, y)$
    Keep the best ec edges for each $x \in V$
  • BorderFlow
    Seed-based algorithm
    Goal: maximize the borderflow ratio $bf(X) = \frac{\Omega(b(X), X)}{\Omega(b(X), n(X))}$
    http://sourceforge.net/projects/cugar-framework/
  • Conclusion
    Can be combined with arbitrary active-learning ML algorithms
    Was experimentally combined with EAGLE (genetic programming) and RAVEN (linear classifier) and shown to outperform the plain informativeness function in terms of F-measure
    The choice of examples is important to minimise user effort
    Longer runtimes (up to 2×)
    Contact me for detailed experimental results
  • Summary
    Linking is a crucial task in the Web of Data
    Two key problems:
      1 Efficient execution of link specifications
      2 Creation of link specifications
    Presented HR3 to handle the first problem
    Presented COALA as a building block for the second problem
  • Outline
    1 Introduction to Linked Data
    2 Linked Dataset Example: DBpedia
    3 Linked Data Life-Cycle Overview
    4 Knowledge Extraction
    5 Data Integration / Linking
    6 Enrichment
    7 Repair
    8 Knowledge Base Exploration / Querying
  • Motivation rise in the availability and usage of knowledge bases still a lack of knowledge bases that consist of sophisticated schema information and instance data adhering to this schema e.g. in the life sciences several knowledge bases only consist of schema information to a large extent, a collection of facts without a clear structure (e.g. information extracted from databases) combination of sophisticated schema and instance data would allow powerful reasoning, consistency checking, and improved querying → create schemata based on existing data Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 173 / 252
  • Example d b r : B r a d _ P i t t : b i r t h P l a c e d b r : Shawnee , _Oklahoma ; a : P e r s o n . d b r : Angela_Merkel : b i r t h P l a c e d b r : Hamburg ; a : P e r s o n . d b r : A l b e r t _ E i n s t e i n : b i r t h P l a c e d b r : Ulm ; a : P e r s o n . d b r : Shawnee , _Oklahoma a : P l a c e . d b r : Ulm a : P l a c e . d b r : Hamburg a : P l a c e . Suggestions: birthPlace O b j e c t P r o p e r t y : b i r t h P l a c e C h a r a c t e r i s t i c s : F u n c t i o n a l Domain : P e r s o n Range : P l a c e S u b P r o p e r t y O f : hasBeenAt Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 174 / 252
  • Benefits of an expressive schema
    Axioms serve as documentation for the purpose and correct usage of schema elements.
    Additional implicit information can be inferred.
    Query optimisation can be improved.
    Schema debugging techniques can be applied or improved.
  • Each person was born in only one place?!
  • birthPlace is functional
    [Figure labels: birthPlace, birthPlace, =]
    SELECT ?s WHERE {
      ?s dbo:birthPlace ?o1 .
      ?s dbo:birthPlace ?o2 .
      FILTER(?o1 != ?o2)
    }
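The functionality check expressed by the query above can also be sketched over an in-memory list of triples. This is a minimal illustration, not part of the original slides: the function name `functional_violations` and the `ex:` resources are hypothetical.

```python
from collections import defaultdict


def functional_violations(triples, prop):
    """Return {subject: objects} for subjects with more than one distinct value of prop."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    # A functional property allows at most one distinct object per subject.
    return {s: objs for s, objs in values.items() if len(objs) > 1}


# Hypothetical toy data, invented for illustration only.
triples = [
    ("ex:Person_A", "dbo:birthPlace", "ex:Lacrosse"),
    ("ex:Person_A", "dbo:birthPlace", "ex:La_Crosse"),
    ("ex:Person_B", "dbo:birthPlace", "ex:Ulm"),
]
violations = functional_violations(triples, "dbo:birthPlace")
print(violations)
```

Each reported subject is a candidate for manual inspection: either one of the values is wrong, or the two objects denote the same place.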
  • Where was Julia Nannie Wallace born?
  • Julia Nannie Wallace was born in Lacrosse?
  • No, Julia Nannie Wallace was born in La Crosse!
  • birthPlace
    [Figure labels: birthPlace; rdf:type Sport; birthPlace range Place; rdf:type Place; Place disjointWith Sport; City]
    SELECT ?s ?place WHERE {
      ?s dbo:birthPlace ?place .
      ?place rdf:type/rdfs:subClassOf* ?type1 .
      ?type2 rdfs:subClassOf* dbo:Place .
      ?type1 owl:disjointWith ?type2 .
    }
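The idea behind this check (an object whose asserted type is disjoint with the range of the property) can be sketched in memory as follows. This is a simplified illustration and not a line-by-line translation of the query: as an assumption of the sketch, disjointness is tested against the declared range class and its superclasses. All names and the `ex:` data are hypothetical.

```python
def superclasses(cls, sub_class_of):
    """All classes reachable from cls via subClassOf, including cls itself."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(sub_class_of.get(c, ()))
    return seen


def range_violations(triples, types, sub_class_of, disjoint, prop, range_cls):
    """Return (subject, object) pairs whose object type is disjoint with the range."""
    range_supers = superclasses(range_cls, sub_class_of)
    hits = []
    for s, p, o in triples:
        if p != prop:
            continue
        o_classes = set()
        for t in types.get(o, ()):
            o_classes |= superclasses(t, sub_class_of)
        # Disjointness is symmetric, so both orientations are checked.
        if any((a, b) in disjoint or (b, a) in disjoint
               for a in o_classes for b in range_supers):
            hits.append((s, o))
    return hits


# Hypothetical toy data for illustration.
triples = [
    ("ex:Athlete_X", "dbo:birthPlace", "ex:Lacrosse"),
    ("ex:Scientist_Y", "dbo:birthPlace", "ex:Ulm"),
]
types = {"ex:Lacrosse": ["dbo:Sport"], "ex:Ulm": ["dbo:City"]}
sub_class_of = {"dbo:City": ["dbo:Place"], "dbo:Sport": ["dbo:Activity"]}
disjoint = {("dbo:Place", "dbo:Sport")}

hits = range_violations(triples, types, sub_class_of, disjoint,
                        "dbo:birthPlace", "dbo:Place")
print(hits)
```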
  • 3 Steps to get a schema
    Input: Entity URI, Axiom Type, Knowledge Base (SPARQL Endpoint)
    3-Phase Enrichment Learning Approach:
    1. Obtain schema information from the SPARQL endpoint as background knowledge (only executed once per knowledge base)
    2. Obtain axiom type and entity specific data (sample data if necessary; optional invocation of a reasoner); result: background knowledge + relevant instance data
    3. Run machine learning algorithm (learner: DL-Learner); result: list of axiom suggestions + metadata (enrichment ontology)
    Iterate over all axiom types and schema entities for full enrichment.
  • Starting Point
    SPARQL endpoint: http://dbpedia.org/sparql
    Entity URI: http://dbpedia.org/ontology/author
    Axiom Type: Object Property Domain
  • Step 1 - Obtaining Schema Information
    CONSTRUCT WHERE { ?sub rdfs:subClassOf ?sup . }
    ORDER BY DESC(?sub) LIMIT 1000 OFFSET 1000
    Result (excerpt):
    dbo:Disease rdfs:subClassOf owl:Thing .
    dbo:Book rdfs:subClassOf dbo:WrittenWork .
    dbo:WrittenWork rdfs:subClassOf dbo:Work .
    dbo:Work rdfs:subClassOf owl:Thing .
    dbo:Philosopher rdfs:subClassOf dbo:Person .
    dbo:Person rdfs:subClassOf dbo:Agent .
    dbo:Agent rdfs:subClassOf owl:Thing .
    dbo:Sport rdfs:subClassOf dbo:Activity .
    dbo:Activity rdfs:subClassOf owl:Thing .
    dbo:Fish rdfs:subClassOf dbo:Animal .
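The LIMIT 1000 OFFSET 1000 in the query suggests that the schema is fetched page by page. A minimal sketch of generating such paged queries is shown below; the helper name `paged_queries` and the choice of three pages are assumptions for illustration, and the sketch only builds query strings without contacting an endpoint.

```python
def paged_queries(pattern, order_var, page_size, pages):
    """Build one CONSTRUCT query string per page using LIMIT/OFFSET."""
    return [
        f"CONSTRUCT WHERE {{ {pattern} }} "
        f"ORDER BY DESC({order_var}) LIMIT {page_size} OFFSET {i * page_size}"
        for i in range(pages)
    ]


queries = paged_queries("?sub rdfs:subClassOf ?sup .", "?sub", 1000, 3)
for q in queries:
    print(q)
```

In practice one would stop requesting pages as soon as a page comes back with fewer than `page_size` results.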
  • Step 2 - Obtain axiom type and entity specific data
    SELECT ?type (COUNT(DISTINCT ?s) AS ?cnt) WHERE {
      ?s dbo:author ?o .
      ?s a ?type .
    } GROUP BY ?type ORDER BY DESC(?cnt)
    Result:
    type                  cnt
    owl:Thing             30284
    dbo:Work              30284
    schema:CreativeWork   30284
    dbo:WrittenWork       25730
    dbo:Book              24673
    schema:Book           24673
    dbo:TelevisionShow    2567
    dbo:Play              1057
    ...                   ...
  • Step 2 - Obtain axiom type and entity specific data (continued)
    CONSTRUCT WHERE {
      ?ind dbo:author ?o .
      ?ind a ?type .
    } ORDER BY DESC(?ind) LIMIT 1000 OFFSET 2000
    Result (excerpt):
    ...
    dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ; rdf:type dbo:Book .
    dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ; rdf:type dbo:WrittenWork .
    dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ; rdf:type dbo:Book .
    ...
  • Step 3 - Scoring
    dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ; rdf:type dbo:Book .
    dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ; rdf:type dbo:WrittenWork .
    dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ; rdf:type dbo:Book .
    Counting asserted types only:
    Score(Domain(dbo:author, dbo:Book)) = 2/3 ≈ 66.7%
    Score(Domain(dbo:author, dbo:WrittenWork)) = 1/3 ≈ 33.3%
    Taking dbo:Book rdfs:subClassOf dbo:WrittenWork into account:
    Score(Domain(dbo:author, dbo:WrittenWork)) = 3/3 = 100%
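The two scoring variants on this slide (with and without the subclass axiom) can be reproduced with a small sketch. The function name `domain_scores` and the dictionary layout are assumptions made for illustration; this is not the DL-Learner implementation.

```python
from collections import Counter


def domain_scores(subject_types, sub_class_of=None):
    """Fraction of subjects that are instances of each class.

    If sub_class_of is given, asserted types are expanded with their superclasses.
    """
    sub_class_of = sub_class_of or {}
    counts = Counter()
    for types in subject_types.values():
        closure, stack = set(), list(types)
        while stack:
            c = stack.pop()
            if c not in closure:
                closure.add(c)
                stack.extend(sub_class_of.get(c, ()))
        counts.update(closure)  # each subject counts once per class
    total = len(subject_types)
    return {cls: n / total for cls, n in counts.items()}


# The three subjects of dbo:author from the slide, with their asserted types.
subject_types = {
    "dbpedia:The_Adventures_of_Tom_Sawyer": ["dbo:Book"],
    "dbpedia:The_Zombie_Survival_Guide": ["dbo:WrittenWork"],
    "dbpedia:Web_Therapy": ["dbo:Book"],
}
plain = domain_scores(subject_types)
inferred = domain_scores(subject_types, {"dbo:Book": ["dbo:WrittenWork"]})
print(plain)
print(inferred)
```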
  • Step 3 - Scoring (2)
    Problem: the support for an axiom in the KB is not taken into account → no difference between 3 out of 3 and 100 out of 100.
    Solution: average of the 95% confidence interval (Wald method), with s = #success and m = #total:
    $p' = \frac{s+2}{m+4}$
    upper bound: $\min\left(1,\; p' + 1.96 \cdot \sqrt{\frac{p' \cdot (1-p')}{m+4}}\right)$
    lower bound: $\max\left(0,\; p' - 1.96 \cdot \sqrt{\frac{p' \cdot (1-p')}{m+4}}\right)$
    In 95% of the intervals the true value is between ... and ...
    Score(Domain(dbo:author, dbo:Book)) ≈ 57.3%
    Score(Domain(dbo:author, dbo:WrittenWork)) ≈ 69.1%
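A minimal sketch of this score, assuming the adjusted Wald interval is meant. The function name `wald_score` and the `exact` switch are assumptions introduced here. With the rounded constants 2 and 4 the sketch yields about 57.1% and 69.0%; using z²/2 and z² (with z = 1.96) in their place yields values that round to the 57.3% and 69.1% shown on the slide. Which variant produced the slide's numbers is an inference, not something the slides state.

```python
import math


def wald_score(s, m, z=1.96, exact=True):
    """Midpoint of the clipped adjusted-Wald confidence interval.

    s: number of successes, m: number of trials.
    exact=True uses z*z/2 and z*z; exact=False uses the rounded constants 2 and 4.
    """
    add_s, add_m = (z * z / 2, z * z) if exact else (2, 4)
    p = (s + add_s) / (m + add_m)
    half_width = z * math.sqrt(p * (1 - p) / (m + add_m))
    upper = min(1.0, p + half_width)
    lower = max(0.0, p - half_width)
    return (lower + upper) / 2


book = wald_score(2, 3)          # dbo:Book: 2 of 3 subjects
written_work = wald_score(3, 3)  # dbo:WrittenWork: 3 of 3 subjects
print(round(book, 3), round(written_work, 3))
```

The clipping at 1 is what makes 3 out of 3 score lower than 100 out of 100: with little support the lower bound is far from 1, which pulls the midpoint down.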
  • More Complex Axioms
    "Pattern Based Knowledge Base Enrichment", ISWC 2013
  • Outlook and Summary
    Schemata in the Linked Data Web are often shallow → tools are needed to support knowledge engineers.
    Showed some techniques for learning OWL axioms on large knowledge bases available as SPARQL endpoints.
    More complex axioms require OWL-SPARQL rewriting or fragment extraction.
    Small and medium-sized knowledge bases can be handled via techniques from Inductive Logic Programming.
    All algorithms are implemented in the DL-Learner framework (http://dl-learner.org).
  • Outline
    1 Introduction to Linked Data
    2 Linked Dataset Example: DBpedia
    3 Linked Data Life-Cycle Overview
    4 Knowledge Extraction
    5 Data Integration / Linking
    6 Enrichment
    7 Repair
    8 Knowledge Base Exploration / Querying
    [Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • Motivation
    increasing number of knowledge bases in the Semantic Web (see e.g. LOD cloud)
    maintenance of knowledge bases with expressive semantics is challenging
  • (Automatically) Detectable Ontology Problems
    Common problems: Syntactic Problems, Structural Problems, Semantic Problems (focus of talk)
    Task Based Problems: Reasoning Related Problems, Linked Data Related Problems
  • Syntactic Problems
    Syntactic errors are mainly violations of conventions of the language in which the ontology is modelled.
    Example (Validity of XML):
    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://www.w3.org/">
        <dc:title>World Wide Web Consortium</dc:title>
    </rdf:RDF>
    FatalError: The element type rdf:Description must be terminated by the matching end-tag </rdf:Description>. [Line = 7, Column = 3]
  • Structural Problems
    Problems in the taxonomy
    Example (Circularities): A ⊑ B, B ⊑ C, C ⊑ A
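Circularities of this kind can be found with a depth-first search over the subclass graph. The sketch below is a generic cycle detection under the assumption that the taxonomy is given as a dictionary from class to direct superclasses; the function name `find_cycle` is hypothetical.

```python
def find_cycle(sub_class_of):
    """Return one subclass cycle as a list of class names, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(c):
        color[c] = GREY
        path.append(c)
        for sup in sub_class_of.get(c, ()):
            state = color.get(sup, WHITE)
            if state == GREY:
                # sup is already on the current path, so the path closes a cycle.
                return path[path.index(sup):] + [sup]
            if state == WHITE:
                cycle = visit(sup)
                if cycle:
                    return cycle
        path.pop()
        color[c] = BLACK
        return None

    for c in list(sub_class_of):
        if color.get(c, WHITE) == WHITE:
            cycle = visit(c)
            if cycle:
                return cycle
    return None


cycle = find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})
print(cycle)
```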
  • Reasoning Related Problems
    Problems which negatively affect the performance of reasoning over expressive knowledge bases.
    Example (A named concept is equivalent to an AllValues restriction): A ≡ ∀r.C
    Reasoning complexity:
    A universal restriction does not require a property value to exist; it only restricts the values of existing property assertions.
    Any concept B whose instances cannot have r-fillers satisfies the restriction, i.e. B ⊑ ∀r.C, and becomes a subclass of A.
    This typically leads to unintended inferences, and the additional inferences may eventually slow down reasoning.
    Can be checked via Pellint (part of Pellet).
  • Linked Data Related Problems
    Problems which are specific to publishing RDF using the Linked Data principles:
    Incorrect implementation of content negotiation
    Mixing up information and non-information resources
  • Semantic Problems
    Logical contradictions in the underlying knowledge base.
    Example (Unsatisfiable classes): O = {A ⊑ B ⊓ C, C ⊑ ¬B} ⊨ A ⊑ ⊥
    Example (Inconsistent ontology): O = {A ⊑ B ⊓ C, C ⊑ ¬B, A(x)} ⊨ ⊥
    Usually handled by Ontology Debugging.
  • Ontology Debugging
    Problem: We have undesirable entailments
    Solution: Repair (Delete/Modify) responsible axioms
    Question: Which axioms?
  • Justification
    For an ontology O and an entailment η where O ⊨ η, a set of axioms J is a justification for η in O if J ⊆ O, J ⊨ η, and if J′ ⊂ J then J′ ⊭ η.
    Justifications are minimal subsets of an ontology that are sufficient for a given entailment to hold.
    Synonyms: MUPS (Minimal Unsatisfiability Preserving Sub-TBoxes), MinAs (Minimal Axiom sets), Kernels
    Observations:
    there can be multiple justifications for a single entailment
    an axiom can be part of multiple justifications
  • Justification - Example
    O = {
      B ⊑ ∃r.D     (1)
      B ⊑ ∀r.¬D    (2)
      A ⊑ B ⊓ C    (3)
      B ⊑ ¬C       (4)
      A ⊑ E        (5)
      A ⊑ ¬E ⊓ F   (6)
    } ⊨ A ⊑ ⊥
    J1 = {1, 2, 3}
    J2 = {5, 6}
    J3 = {3, 4}
  • Justification Based Repair
    For a repair, at least one axiom from every justification needs to be removed.
    For a repair plan, all justifications are needed.
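A set of axioms that contains at least one axiom from every justification is a hitting set of the justifications. The sketch below enumerates the minimal ones by brute force for the justifications J1 = {1, 2, 3}, J2 = {5, 6}, J3 = {3, 4} of the preceding example. It is a plain enumeration for illustration, not Reiter's hitting set tree algorithm, and the function name is hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(justifications):
    """Enumerate all minimal sets that intersect every justification."""
    axioms = sorted(set().union(*justifications))
    found = []
    # Candidates are visited by increasing size, so any hitting set that
    # contains an earlier result cannot be minimal and is skipped.
    for size in range(1, len(axioms) + 1):
        for candidate in combinations(axioms, size):
            cand = set(candidate)
            if any(h <= cand for h in found):
                continue
            if all(cand & j for j in justifications):
                found.append(cand)
    return found


repairs = minimal_hitting_sets([{1, 2, 3}, {5, 6}, {3, 4}])
print(repairs)
```

Removing axioms 3 and 5, for instance, breaks all three justifications at once, whereas avoiding axiom 3 requires removing three axioms.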
  • Justification Algorithms
    Single justification:
      Glass-Box: modifying the underlying reasoning algorithm (tableau tracing)
      Black-Box: using the reasoner as an oracle
    All justifications:
      Reiter's Hitting Set Tree Algorithm (HST)
  • Black-Box Expansion-Contraction Strategy
    Expansion: Add axioms to an empty set until the entailment holds.
    Contraction: Remove axioms from the set such that the set becomes minimal and the entailment can still be derived.
    [Figure 3.1: A Depiction of a Black-Box Expand-Contract Strategy. Key: axiom, axiom in justification, selected axiom.]
    Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)
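The expansion and contraction phases can be sketched with the reasoner abstracted as a callable. Everything here is an assumption for illustration: `entails` stands in for a real reasoner call, and `toy_entails` simulates one using the three justifications {1, 2, 3}, {5, 6}, {3, 4} from the earlier example. Practical implementations typically select axioms more cleverly during expansion.

```python
def single_justification(axioms, entails):
    """Black-box search for one justification.

    axioms: list of axioms; entails(set_of_axioms) -> bool is the reasoner oracle.
    """
    # Expansion: add axioms one at a time until the entailment holds.
    working = []
    for ax in axioms:
        working.append(ax)
        if entails(set(working)):
            break
    else:
        return None  # the entailment never holds for these axioms

    # Contraction: drop every axiom whose removal keeps the entailment.
    for ax in list(working):
        rest = [a for a in working if a != ax]
        if entails(set(rest)):
            working = rest
    return set(working)


# Toy oracle: the entailment holds iff some known justification is contained.
KNOWN = [{1, 2, 3}, {5, 6}, {3, 4}]


def toy_entails(axiom_set):
    return any(j <= axiom_set for j in KNOWN)


first = single_justification([1, 2, 3, 4, 5, 6], toy_entails)
second = single_justification([4, 5, 6, 3], toy_entails)
print(first, second)
```

Which justification is found depends on the order in which axioms are added.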
  • Hitting Set Tree Algorithm
    From the field of Model Based Diagnosis.
    Given a faulty system (ontology), it constructs a finite tree whose nodes are labelled with conflict sets (justifications) and whose edges are labelled with components (axioms).
    It finds all minimal hitting sets, which represent diagnoses for the conflict sets in the system.
    diagnosis = repair
  • Hitting Set Tree Algorithm - Example
    O = {A ⊑ B, B ⊑ D, A ⊑ ∃R.C, ∃R.⊤ ⊑ D} ⊨ A ⊑ D
    J1 = {A ⊑ B, B ⊑ D}
    J2 = {A ⊑ ∃R.C, ∃R.⊤ ⊑ D}
    [Figure 3.2: An Example of a Hitting Set Tree. The root is labelled with J1; its two edges are labelled A ⊑ B and B ⊑ D and lead to nodes labelled with J2, whose edges are labelled A ⊑ ∃R.C and ∃R.⊤ ⊑ D and lead to leaves labelled {}.]
    Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)
  • Justification Scenarios
    A user can be faced with the following situations:
    Small number of small justifications: easy and pleasant to inspect
    Small number of large justifications: better than alternatives
    Large number of justifications: pretty hopeless with current mechanisms
    Idea: Find the source of unsatisfiability
  • Root Unsatisfiability - Definitions
    A root unsatisfiable class (UC) is a class whose unsatisfiability does not depend on another class; otherwise it is a derived UC.
    A derived UC for which there is some justification that is not a strict superset of a justification for another UC is a partial derived UC.
    Root Unsatisfiable Class: A class A is a root unsatisfiable class if there is no justification J ⊨ A ⊑ ⊥ such that J is a strict superset of a justification for some other unsatisfiable class.
  • Root Unsatisfiability - Approaches
    1. Compute all justifications for each unsatisfiable class and apply the definition → often computationally too expensive
    2. Heuristics for structural analysis of axioms
    Reference: Debugging Unsatisfiable Classes in OWL Ontologies, Kalyanpur, Parsia, Sirin, Hendler, J. Web Sem, 2005.
  • Root Unsatisfiability - Example
    O = {
      B ⊑ ∃r.D     (1)
      B ⊑ ∀r.¬D    (2)
      A ⊑ B ⊓ C    (3)
      B ⊑ ¬C       (4)
      A ⊑ E        (5)
      A ⊑ ¬E ⊓ F   (6)
    }
    O ⊨ A ⊑ ⊥ with J1 = {1, 2, 3}, J2 = {5, 6}, J3 = {3, 4}
    O ⊨ B ⊑ ⊥ with J4 = {1, 2}
    B is a root unsatisfiable class; A is a partial derived unsatisfiable class (J4 ⊂ J1).
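Applying the definitions directly to the justifications of this example gives the classification below. This is a sketch of the first (expensive) approach, which assumes all justifications are already known; the function name and the label strings are hypothetical.

```python
def classify_unsat_classes(justs):
    """Label each unsatisfiable class as root, derived, or partial derived.

    justs maps a class name to the list of its justifications (sets of axiom ids).
    """
    result = {}
    for cls, own in justs.items():
        others = [j for other, js in justs.items() if other != cls for j in js]
        # A justification is "derived" if it strictly contains a justification
        # of some other unsatisfiable class ('<' is the strict subset test).
        derived_flags = [any(o < j for o in others) for j in own]
        if not any(derived_flags):
            result[cls] = "root"
        elif all(derived_flags):
            result[cls] = "derived"
        else:
            result[cls] = "partial derived"
    return result


labels = classify_unsat_classes({
    "A": [{1, 2, 3}, {5, 6}, {3, 4}],
    "B": [{1, 2}],
})
print(labels)
```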
  • Axiom Relevance
    Resolving a justification requires deleting or editing axioms.
    Ranking methods highlight the most probable causes of problems.
    Methods: frequency, syntactic relevance, semantic relevance
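Of the three methods, frequency is the simplest to illustrate: an axiom that occurs in many justifications is a promising candidate for removal. The sketch below counts occurrences over the justifications {1, 2, 3}, {5, 6}, {3, 4} used earlier; the function name and the tie-breaking by axiom id are assumptions.

```python
from collections import Counter


def rank_by_frequency(justifications):
    """Rank axioms by the number of justifications they occur in (descending)."""
    counts = Counter(ax for j in justifications for ax in j)
    # Ties are broken by axiom id to keep the output deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_frequency([{1, 2, 3}, {5, 6}, {3, 4}])
print(ranking)
```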
  • Repair Consequences
    After the repair process, axioms have been deleted or modified.
    → Desired entailments may be lost or new entailments obtained (including inconsistencies!).
    → The user can decide to preserve them.
  • SPARQL Endpoint Support
    The previously mentioned approaches are implemented in the ORE tool (http://ore-tool.net).
    ORE supports using SPARQL endpoints and implements an incremental load procedure.
    The knowledge base is loaded in small chunks:
      count the number of axioms by type
      priority-based loading procedure, e.g. disjointness axioms have higher priority than class assertion axioms
    Uses Pellet incremental reasoning.
    Reference: Learning of OWL Class Descriptions on Very Large Knowledge Bases, Hellmann, Lehmann, Auer, Int. Journal Semantic Web Inf. Syst, 2009
  • SPARQL Endpoint Support II
    The algorithm performs sanity checks, e.g. SPARQL queries which probe for typical inconsistent axiom sets.
    It can fetch additional Linked Data.
    Different termination criteria are available.
    Overall:
    ORE allows applying state-of-the-art ontology debugging methods on a larger scale than was previously possible.
    It aims at stronger support for the web aspect of the Semantic Web and the high popularity of the Web of Data initiative.
  • DBpedia Live Demo
    Inconsistency in DBpedia Live:
    Individual: dbr:Purify_(album)
      Facts: dbo:artist dbr:Axis_of_Advance
    Individual: dbr:Axis_of_Advance
      Types: dbo:Organisation
    Class: dbo:Organisation
      DisjointWith: dbo:Person
    ObjectProperty: dbo:artist
      Range: dbo:Person
  • DBpedia Live Demo 2
    Inconsistency in DBpedia in combination with WGS84 (Linked Data):
    Individual: dbr:WKWS
      Facts: geo:long -81.76833343505859
      Types: dbo:Organisation
    DataProperty: geo:long
      Domain: geo:SpatialThing
    Class: dbo:Organisation
      DisjointWith: geo:SpatialThing
  • OpenCyc Demo
    Inconsistency in OpenCyc:
    Individual: 'PopulatedPlace'
      Types: 'ArtifactualFeatureType', 'ExistingStuffType'
    Class: 'ArtifactualFeatureType'
      SubClassOf: 'ExistingObjectType'
    Class: 'ExistingObjectType'
      DisjointWith: 'ExistingStuffType'
  • ORE - Screenshot [two screenshot slides; images not included in the transcript]
  • Related Tools
    Swoop
      can compute justifications for the unsatisfiability of classes and offers a repair mode
      the fine-grained justification computation algorithm is incomplete
      can also compute justifications for an inconsistent ontology, but does not offer a repair mode in this case
      does not extract locality-based modules, which leads to lower performance for large ontologies
    RaDON
      plugin for the NeOn toolkit
      offers a number of techniques for working with inconsistent or incoherent ontologies
      allows reasoning with inconsistent ontologies and can handle sets of ontologies (ontology networks)
      no fine-grained justifications, no repair impact analysis
    Pellint
      searches for common patterns which lead to potential reasoning performance problems
      integration in ORE planned
  • Related Tools II
    PION and DION
      developed in the SEKT project to deal with inconsistencies
      PION is an inconsistency-tolerant reasoner (four-valued paraconsistent logic)
      DION offers the possibility to compute justifications, but no repair
    Explanation Workbench
      Protégé plugin for reasoner requests like class unsatisfiability or inferred subsumption relations
      can compute regular and laconic justifications
      motivated the ORE debugging interface
      the current version of Explanation Workbench does not allow removing axioms in laconic justifications
    RepairTab
      supports the user in finding and detecting errors in ontologies
      uses a modified tableau algorithm
      shows inferences which can no longer be drawn after removing an axiom (inspired ORE)
  • Outline
    1 Introduction to Linked Data
    2 Linked Dataset Example: DBpedia
    3 Linked Data Life-Cycle Overview
    4 Knowledge Extraction
    5 Data Integration / Linking
    6 Enrichment
    7 Repair
    8 Knowledge Base Exploration / Querying
    [Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • Motivation
    User Query Interfaces:
    Knowledge Base Specific Interfaces
    Facet-Based Browsing
    Visual SPARQL Query Builders
    Question Answering
    Which tools for creating (SPARQL) queries are end-user friendly?
  • Strengths and Weaknesses of Query Interfaces
    [Table; the cell values are not preserved in the transcript]
    Column labels as extracted: Easy to Use, Robust, Flexible Queries, Expressive
    Rows: Knowledge Base Specific, Facet-Based Browsing, Visual SPARQL Query Builders, Question Answering
  • Faceted Browsing
    A simple way to browse structured information.
    The user starts with all resources and then drills down via facets.
    Multiple dimensions can be supported for facets, e.g. taxonomy, existence of properties, values of properties.
    Can be combined with text search: previously, information was browsed either via a fixed classification scheme or via text search (with the latter being dominant) → facet-based browsing allows a combination of both.
  • Facete
    Generic facet-based browser
    RDF properties are facets (sub-facets are supported)
    Each facet serves as a source for columns (table rendering) and points of interest (map rendering)
    JavaScript implementation: SPARQL queries are issued by the client
    Any SPARQL endpoint can serve as backend; no API needs to be implemented
  • Question Answering - State of the art
    1. Map natural language question to triple-based representation.
    2. Match triple-based representation against RDF data.
    Example: Where did Abraham Lincoln die?
    SELECT ?x WHERE { res:Abraham_Lincoln dbo:deathPlace ?x . }
    PowerAqua:
    Triple representation: state/place, die, Abraham Lincoln
    Ontology mappings: Place, deathPlace, Abraham_Lincoln
  • Problem
    Triples do not always provide a faithful representation of the semantic structure of the question. Thus more expressive queries cannot be answered.
    Example 1: Which cities have more than three universities?
    SELECT ?y WHERE { ?x rdf:type dbo:University . ?x dbo:city ?y . } HAVING (COUNT(?x) > 3)
    PowerAqua (triple representation): cities, more than, universities three
    Example 2: Who produced the most films?
    SELECT ?y WHERE { ?x rdf:type dbo:Film . ?x dbo:producer ?y . } ORDER BY DESC(COUNT(?x)) LIMIT 1
    PowerAqua (triple representation): person/organization, produced, most films
  • Goal In order to understand a user question, we need to understand: The words Find a mapping from natural language expressions to ontology concepts. Abraham Lincoln → res:Abraham_Lincoln died in → dbo:deathPlace The semantic structure Determine the triple structure as well as lters and aggregation functions (order by, count, etc.). the most N → ODER BY DESC(COUNT(?n)) LIMIT 1 more than three N → HAVING (COUNT(?n) > 3) Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 233 / 252
  • Goal In order to understand a user question, we need to understand: The words: Find a mapping from natural language expressions to ontology concepts. Abraham Lincoln → res:Abraham_Lincoln, died in → dbo:deathPlace. The semantic structure: Determine the triple structure as well as filters and aggregation functions (order by, count, etc.). the most N → ORDER BY DESC(COUNT(?n)) LIMIT 1, more than three N → HAVING (COUNT(?n) > 3). Goal: an approach that combines both an analysis of the semantic structure and a mapping of words to URIs.
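The mapping from quantifier phrases to aggregation modifiers can be sketched as a small query builder. This is only an illustration, not the authors' implementation: the function name `aggregate_query` and its parameters are hypothetical. It also adds an explicit GROUP BY, which the slide notation leaves implicit but which SPARQL 1.1 needs when a grouped variable is projected alongside an aggregate.

```python
def aggregate_query(pattern, answer_var, counted_var, phrase, threshold=None):
    """Build a SPARQL 1.1 aggregate query for a quantifier phrase.

    pattern      -- the triple patterns inside WHERE { ... }
    answer_var   -- variable to return and group by, e.g. "?y"
    counted_var  -- variable whose bindings are counted, e.g. "?x"
    phrase       -- "the most" or "more than" (hypothetical phrase keys)
    threshold    -- number used by "more than"
    """
    head = f"SELECT {answer_var} WHERE {{ {pattern} }} GROUP BY {answer_var}"
    count = f"COUNT({counted_var})"
    if phrase == "the most":
        # superlative: sort groups by size and keep the largest
        return f"{head} ORDER BY DESC({count}) LIMIT 1"
    if phrase == "more than":
        if threshold is None:
            raise ValueError("'more than' needs a numeric threshold")
        # comparative: keep only groups above the threshold
        return f"{head} HAVING ({count} > {threshold})"
    raise ValueError(f"unsupported phrase: {phrase!r}")


if __name__ == "__main__":
    print(aggregate_query(
        "?x rdf:type dbo:University . ?x dbo:city ?y .", "?y", "?x",
        "more than", 3))
```

Number words such as "three" would still have to be converted to integers by the caller; that step is left out here.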
  • Template-based question answering 1 Template generation (Understanding the semantic structure): Parse question to produce a SPARQL template that directly mirrors the structure of the question, including filters and aggregation operations. 2 Template instantiation (Understanding the words): Instantiate SPARQL template by matching natural language expressions with ontology concepts using statistical entity identification and predicate detection.
  • Example: Who produced the most films? 1 SPARQL template: SELECT ?x WHERE { ?y rdf:type ?c . ?y ?p ?x . } ORDER BY DESC(COUNT(?y)) LIMIT 1 ?c CLASS [films] ?p PROPERTY [produced] 2 Instantiations: ?c = <http://dbpedia.org/ontology/Film> ?p = <http://dbpedia.org/ontology/producer>
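Template instantiation amounts to substituting candidate URIs for the slot variables. The sketch below is a minimal illustration under that reading; the function name `instantiate` and the dictionary layout are assumptions, not part of the presented system.

```python
import re
from itertools import product

TEMPLATE = (
    "SELECT ?x WHERE { ?y rdf:type ?c . ?y ?p ?x . } "
    "ORDER BY DESC(COUNT(?y)) LIMIT 1"
)


def instantiate(template, candidates):
    """Yield one query per combination of slot candidates.

    candidates maps a slot variable (e.g. "?c") to a list of URIs.
    """
    slots = sorted(candidates)
    for combo in product(*(candidates[s] for s in slots)):
        query = template
        for slot, uri in zip(slots, combo):
            # \b keeps "?c" from matching inside a longer name such as "?city"
            pattern = re.escape(slot) + r"\b"
            query = re.sub(pattern, lambda _m, u=uri: f"<{u}>", query)
        yield query


if __name__ == "__main__":
    slots = {
        "?c": ["http://dbpedia.org/ontology/Film"],
        "?p": ["http://dbpedia.org/ontology/producer"],
    }
    for q in instantiate(TEMPLATE, slots):
        print(q)
```

With several candidates per slot, the generator produces every combination, which is why a ranking step is needed afterwards.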
  • Architecture [diagram; legend: State, Process, Uses] Elements shown: Natural Language Question, Semantic Representation, SPARQL Query Templates, Templates with URI slots, Ranked SPARQL Queries, Answer, LOD, Entity identification, Entity and Query Ranking, Query Selection, Resources and Classes, SPARQL Endpoint, Type Checking and Prominence, BOA Pattern Library, Properties, Tagged Question, Domain Independent Lexicon, Domain Dependent Lexicon, Parsing, Corpora, Loading.
  • Step 1: Template generation - Linguistic processing 1 Natural language question is tagged with part-of-speech information. 2 Based on POS tags, lexical entries are built on the fly. Lexical entries are pairs of: tree structures (Lexicalized Tree Adjoining Grammar) and semantic representations (ext. Discourse Representation Structures). 3 These lexical entries, together with domain-independent lexical entries, are used for parsing the question. 4 The resulting semantic representation is translated into a SPARQL template.
  • Example: Who produced the most films? domain-independent: who, the most domain-dependent: produced/VBD, films/NNS SPARQL template 1: SELECT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c . } ORDER BY DESC(COUNT(?y)) LIMIT 1 ?c CLASS [films] ?p PROPERTY [produced]
  • Example: Who produced the most films? domain-independent: who, the most domain-dependent: produced/VBD, films/NNS SPARQL template 2: SELECT ?x WHERE { ?x ?p ?y . } ORDER BY DESC(COUNT(?y)) LIMIT 1 ?p PROPERTY [films]
  • Step 2: Template instantiation - Entity identification 1 For resources and classes: Identify synonyms of the label using WordNet. Retrieve entities with a label similar to the slot label based on string similarities (trigram, Levenshtein, substring). 2 For property labels, the label is additionally compared to natural language expressions stored in the BOA pattern library. 3 The highest ranking entities are returned as candidates for filling the query slots.
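The string-similarity part of entity identification can be sketched as follows. This is a simplified illustration: how the three measures are combined is not stated on the slides, so the plain average in `label_similarity` is an assumption, the WordNet synonym expansion and the BOA lookup are omitted, and all function names are hypothetical.

```python
def trigrams(text):
    """Return the set of character trigrams of a padded, lower-cased string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of the two trigram sets, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a, b):
    """Edit distance normalised by the longer string, in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def label_similarity(slot_label, entity_label):
    """Average of trigram, Levenshtein and substring evidence (assumed weighting)."""
    a, b = slot_label.lower(), entity_label.lower()
    substring = 1.0 if a in b or b in a else 0.0
    return (trigram_similarity(a, b) + levenshtein_similarity(a, b) + substring) / 3


def rank_candidates(slot_label, labelled_entities, top_k=3):
    """Return the top_k URIs whose labels are most similar to the slot label."""
    scored = sorted(
        ((label_similarity(slot_label, label), uri)
         for uri, label in labelled_entities.items()),
        reverse=True,
    )
    return [uri for _score, uri in scored[:top_k]]


if __name__ == "__main__":
    entities = {
        "http://dbpedia.org/ontology/Film": "film",
        "http://dbpedia.org/ontology/FilmFestival": "film festival",
    }
    print(rank_candidates("films", entities))
```

Returning several candidates rather than a single best match leaves the final decision to the later ranking and type-checking step.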
  • BOA The BOA pattern library is a repository of natural language representations of Semantic Web predicates. Idea: For each predicate P in a data repository (e.g. DBpedia), collect the set of entities S and O connected through P. Search a text corpus (e.g. Wikipedia) for all sentences containing the labels of S and O. For all retrieved sentences, the natural language predicate is a potential pattern for P. The potential patterns are then scored by a neural network (e.g. according to frequency) and filtered.
  • BOA: Example Predicate: http://dbpedia.org/ontology/subsidiary RDF snippet: <http://dbpedia.org/resource/Google> <http://dbpedia.org/ontology/subsidiary> <http://dbpedia.org/resource/YouTube> . <http://dbpedia.org/resource/Google> rdfs:label "Google"@en . <http://dbpedia.org/resource/YouTube> rdfs:label "Youtube"@en . Sentences: "Google's acquisition of Youtube comes as online video is really starting to hit its stride." "Youtube, a division of Google, is exploring a new way to get more high-quality clips on its site: financing amateur video creators." Patterns: subsidiary: S 's acquisition of O; subsidiary: O, a division of S
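The pattern-harvesting idea can be sketched in a few lines. This is a toy illustration, not the BOA implementation: it finds labels by plain substring search, keeps the text between the two labels as the pattern, and counts occurrences, whereas the library described here scores patterns with a neural network. The function name `extract_patterns` is hypothetical.

```python
from collections import Counter


def extract_patterns(sentences, label_pairs):
    """Count candidate patterns for one predicate.

    sentences   -- iterable of corpus sentences
    label_pairs -- (subject_label, object_label) pairs connected by the predicate
    """
    counts = Counter()
    for s_label, o_label in label_pairs:
        for sentence in sentences:
            s_pos = sentence.find(s_label)
            o_pos = sentence.find(o_label)
            if s_pos < 0 or o_pos < 0:
                continue  # sentence does not mention both entities
            if s_pos < o_pos:
                between = sentence[s_pos + len(s_label):o_pos]
                counts[f"S{between}O"] += 1
            else:
                between = sentence[o_pos + len(o_label):s_pos]
                counts[f"O{between}S"] += 1
    return counts


if __name__ == "__main__":
    corpus = [
        "Google's acquisition of Youtube comes as online video is really "
        "starting to hit its stride.",
        "Youtube, a division of Google, is exploring a new way to get more "
        "high-quality clips on its site.",
    ]
    for pattern, n in extract_patterns(corpus, [("Google", "Youtube")]).items():
        print(n, pattern)
```

A frequency threshold over these counts would be the simplest possible filter; the real scoring uses more features than raw frequency.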
  • BOA The use of BOA patterns allows us to match natural language expressions and ontology concepts even if they are not string similar and not covered by WordNet. Examples: married to → http://dbpedia.org/ontology/spouse was born in → http://dbpedia.org/ontology/birthPlace graduated from → http://dbpedia.org/ontology/almaMater write → http://dbpedia.org/ontology/author
  • Example: Who produced the most films? Candidates for filling query slots: ?c CLASS [films] <http://dbpedia.org/ontology/Film> <http://dbpedia.org/ontology/FilmFestival> . . . ?p PROPERTY [produced] <http://dbpedia.org/ontology/producer> <http://dbpedia.org/property/producer> <http://dbpedia.org/ontology/wineProduced> . . .
  • Step 3: Query ranking and selection 1 Every entity receives a score considering string similarity and prominence (roughly how often it occurs in the knowledge base). 2 The score of a query is then computed as the average of the scores of the entities used to fill its slots. 3 In addition, type checks are performed: For all triples ?x rdf:type <class>, all query triples ?x p e and e p ?x are checked w.r.t. whether domain/range of p is consistent with <class>. (If not, the query is rejected.) 4 Of the remaining queries, the one with highest score that returns a result is chosen to retrieve an answer.
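The ranking and selection step can be sketched as below. Several details are assumptions made only for illustration: the slides do not say how similarity and prominence are combined (an equal-weight mix is used here), the `superclasses` dictionary stands in for the ontology's class hierarchy, and `execute` is a hypothetical callable standing in for a SPARQL endpoint.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Entity:
    uri: str
    similarity: float   # string similarity to the slot label, in [0, 1]
    prominence: float   # normalised frequency in the knowledge base, in [0, 1]


@dataclass(frozen=True)
class Query:
    text: str
    entities: tuple     # entities used to fill the slots
    type_ok: bool       # result of the domain/range check


def entity_score(entity, weight=0.5):
    """Mix similarity and prominence; the equal weighting is an assumption."""
    return weight * entity.similarity + (1 - weight) * entity.prominence


def query_score(entities):
    """Average of the scores of the entities filling the slots."""
    return sum(entity_score(e) for e in entities) / len(entities)


def type_consistent(cls, expected, superclasses):
    """True if cls equals the property's domain/range or is a subclass of it.

    expected is None when the property declares no domain/range.
    """
    if expected is None:
        return True
    return cls == expected or expected in superclasses.get(cls, ())


def select_query(queries: List[Query],
                 execute: Callable[[str], list]) -> Optional[Query]:
    """Return the best-scoring type-consistent query that yields a result."""
    ranked = sorted(queries, key=lambda q: query_score(q.entities), reverse=True)
    for query in ranked:
        if query.type_ok and execute(query.text):
            return query
    return None
```

For a pattern `?y rdf:type <class> . ?y p ?x`, the caller would pass the domain of `p` as `expected`; for `?x p ?y` it would pass the range.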
  • Example: Who produced the most films? SELECT ?x WHERE { ?x <http://dbpedia.org/ontology/producer> ?y . ?y rdf:type <http://dbpedia.org/ontology/Film> . } ORDER BY DESC(COUNT(?y)) LIMIT 1 Score: 0.7592425075864263 SELECT ?x WHERE { ?x <http://dbpedia.org/ontology/film> ?y . } ORDER BY DESC(COUNT(?y)) LIMIT 1 Score: 0.6264001353183296 SELECT ?x WHERE { ?x <http://dbpedia.org/ontology/producer> ?y . ?y rdf:type <http://dbpedia.org/ontology/FilmFestival> . } ORDER BY DESC(COUNT(?y)) LIMIT 1 Score: 0.6012584940627768
  • Evaluation set-up Question set: 39 DBpedia training questions from QALD-1. The other 11 questions rely on namespaces which were not incorporated in predicate detection (FOAF and YAGO). POS tags were idealized, in order to exclude tagging errors. Evaluation measures: Precision = (number of correct resources returned by system) / (number of resources returned by system) Recall = (number of correct resources returned by system) / (number of resources in gold standard answer)
  • Results Of the 39 questions. . . 5 could not be parsed due to unknown syntactic constructions or uncovered domain-independent expressions 19 were answered exactly as required by the benchmark (with precision and recall 1.0) another 2 are answered almost correctly (with precision and recall greater than 0.8) Mean precision: 0.61 Mean recall: 0.63 F-measure: 0.62
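The evaluation measures can be written out per question as below. This is a minimal sketch with hypothetical function names; treating an empty answer set as precision 0 is a convention chosen here, not something the slides specify. The harmonic mean of the reported mean precision (0.61) and mean recall (0.63) rounds to the reported F-measure of 0.62.

```python
def precision_recall(returned, gold):
    """Precision and recall of a returned answer set against the gold standard."""
    returned, gold = set(returned), set(gold)
    correct = len(returned & gold)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
    print(p, r, f_measure(p, r))
    print(round(f_measure(0.61, 0.63), 2))
```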
  • Discussion: Main sources of error Incorrect templates: Template structure does not coincide with structure of the data: When did Germany join the EU? res:Germany dbp:accessioneudate ?x . Predicate detection fails: inhabitants → dbp:population, dbp:populationTotal; owns → dbo:keyPerson; higher → dbp:elevationM. Wrong query is selected: Who wrote The pillars of the Earth? res:The_Pillars_of_the_Earth_(TV_Miniseries) dbo:writer ?x . res:The_Pillars_of_the_Earth dbo:author ?x .
  • Conclusion Two-step approach: 1 Build templates that capture the semantic structure of a question: map complex expressions (quantifiers, comparatives, superlatives, etc.) to aggregation functions. 2 Instantiate templates by mapping natural language expressions to URIs: BOA patterns for cases where string similarity and WordNet are not sufficient.
  • Outlook Current work: Entity identification should take into account whether candidate entities actually have any connection in the dataset. Future work: Make templates less rigid and determine the exact triple structure on the basis of data exploration. The created template structure does not always coincide with how the data is actually modelled. Considering all possibilities of how the data could be modelled leads to a huge number of templates (and even more queries) for one question.
  • Links Web: http://aksw.org/Projects/AutoSPARQL Demo: http://autosparql-tbsl.dl-learner.org BOA: http://boa.aksw.org Daniel Gerber and Axel-Cyrille Ngonga Ngomo: Bootstrapping the Linked Data Web. In: Proceedings of the Web Scale Knowledge Extraction Workshop (WekEx), ISWC 2011. QALD: http://www.sc.cit-ec.uni-bielefeld.de/ild/
  • The End Jens Lehmann lehmann@informatik.uni-leipzig.de AKSW/Uni Leipzig GeoKnow http://geoknow.eu