The Linked Data Life-Cycle
Jens Lehmann Lorenz Bühmann
contributors:
Quan Nguyen Sören Auer Richard Cyganiak Daniel Gerber...
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extr...
The Linked Data Principles
The term Linked Data refers to a set of best practices for publishing and
interlinking structur...
LOD Cloud
Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 5 / 252
Linked Data Principles Detailed: 1 + 2
1 URI references to identify not just Web documents and digital
content, but also r...
Principles Detailed: 3 Content Negotiation
Humans and machines should be able to retrieve appropriate
representations of r...
303 URIs
303 Redirect: instead of sending the object itself over the network,
the server responds to the client with the H...
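The 303 decision can be sketched in a few lines of server-side Python. This is a minimal illustration, not a prescribed scheme: it assumes a DBpedia-like layout in which a thing URI /resource/X is redirected to /page/X (HTML) or /data/X (RDF), and the Accept parsing is deliberately simplified. The function name `negotiate_303` and the set of RDF media types are assumptions made for this example.

```python
from typing import Tuple

# Media types treated as RDF in this sketch (an assumption, not an exhaustive list).
RDF_TYPES = {"text/turtle", "application/rdf+xml", "application/n-triples"}


def negotiate_303(path: str, accept: str) -> Tuple[int, str]:
    """Return (status, Location) for a GET on a thing URI.

    Assumed layout: things live under /resource/, HTML descriptions under
    /page/ and RDF descriptions under /data/. Accept parsing is simplified:
    q-values and ordering are ignored, and any listed RDF type wins.
    """
    prefix = "/resource/"
    if not path.startswith(prefix):
        raise ValueError(f"not a thing URI: {path}")
    name = path[len(prefix):]
    media_types = {part.split(";")[0].strip() for part in accept.split(",")}
    if media_types & RDF_TYPES:
        return 303, "/data/" + name
    # Browsers and clients without an RDF preference get the human-readable page.
    return 303, "/page/" + name


print(negotiate_303("/resource/Dresden", "text/turtle"))
print(negotiate_303("/resource/Dresden", "text/html,application/xhtml+xml;q=0.9"))
```

The client then issues a second GET on the returned Location, which is the extra round-trip that the hash URI strategy avoids.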
Hash URIs
The hash URI strategy builds on the characteristic that URIs may contain a
special part (fragment identifier) separated fr...
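Because HTTP clients strip the fragment before sending a request, every thing named by a hash URI in the same document is served by one GET. A minimal sketch using Python's standard `urllib.parse.urldefrag`; the example.org URIs are hypothetical.

```python
from typing import Tuple
from urllib.parse import urldefrag


def document_for(thing_uri: str) -> Tuple[str, str]:
    """Split a hash URI into the document URL a client fetches and the fragment.

    The fragment is never sent to the server, so all things described in the
    same document share a single HTTP request (and a single response).
    """
    result = urldefrag(thing_uri)
    return result.url, result.fragment


# Hypothetical hash URIs for two people described in one document.
uris = ["http://example.org/about#alice", "http://example.org/about#bob"]
documents = {document_for(uri)[0] for uri in uris}
print(documents)  # {'http://example.org/about'}
```

The same property explains the trade-off listed for hash URIs: fewer round-trips, but a client asking about one thing receives the descriptions of all things in that document.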
Hash versus 303
Hash URIs
(+) Reduced number of necessary HTTP round-trips → reduces access
latency
(-) Descriptions of al...
Principles Detailed: 4 Links
If an RDF triple connects URIs in different namespaces/datasets, it is
called a link (no unique...
Why Linked Data?
Problem: Try to search for these things on the current Web:
Apartments near German-Russian bilingual chil...
How to get there?
Tim Berners-Lee's 5-star plan
Tim Berners-Lee's 5-star plan for an open web of data
Make data available on the Web under a...
The 0th star
Data catalog with good metadata
Make your data findable
Data on the Web, Open License
Data on the Web, Open License
Open vs. Closed:
Data used to be closed by default
In the future, it may be open by default....
Data on the Web, Open License
Publishers: sharing data to make it more visible
Data on the Web, Open License
E-Commerce: Data sharing for increasing traffic
Data on the Web, Open License
Community: Collaboratively created databases
Good reasons against opening data
Privacy
Competitive advantage
Producing data and charging for it as business model
Can't...
Structured Data
Enabling re-use:
Delivering data to end users in different forms
Combining data with other data
3rd party an...
Structured Data
Formats:
Good for re-use / Structured: MS Excel, CSV, XML, JSON, Microdata
Not so good for re-use: Pure we...
Non-Proprietary Formats
Specialist tools often have specialist formats
Few people have the tools
Expensive
Difficult to re-us...
URIs as Identifiers
URIs as Identifiers
URI-Design: prefer stable, implementation independent URIs
URIs as Identifiers
Turning local identifiers into URIs: Why?
Make them globally unique
Clarify authority
Make them resolvable
Ma...
Links to Other Data
Hyperlinks are the soul of the Web. The Web of Data is no different.
Summary
Linked Data Principles:
1 Use URIs to name things (not only documents, but also people,
locations, concepts, etc.)...
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extr...
DBpedia
Community effort to extract structured information from Wikipedia
and to make this information available on the Web
...
Wikipedia Limitations
Simple questions that are hard to answer with Wikipedia:
What have Innsbruck and Leipzig in common?
Who are ...
Structure in Wikipedia
Title
Abstract
Infoboxes
Geo-coordinates
Categories
Images
Links
other language versions
other Wiki...
DBpedia Information Extraction Framework
DBpedia Information Extraction Framework (DIEF)
Started in 2007
Hosted on Sourcef...
DIEF - Overview
DIEF - Raw Infobox Extractor
WikiText syntax
{{Infobox Korean settlement
|title = Busan Metropolitan City
...
|area_km2 = ...
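To show what the raw infobox extractor does with such WikiText, here is a minimal Python sketch that turns each non-empty `|key = value` parameter into a triple in the raw property namespace http://dbpedia.org/property/ (the `dbp:` prefix). The regex-based parser and the direct use of the parameter name as the property name are simplifying assumptions; real infoboxes contain nested templates, links and multi-line values that this sketch does not handle.

```python
import re
from typing import List, Tuple

# One infobox parameter per line: "|key = value" (simplified assumption).
PARAM = re.compile(r"^\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")
RAW_NS = "http://dbpedia.org/property/"  # namespace of raw (unmapped) infobox properties


def raw_infobox_triples(subject: str, wikitext: str) -> List[Tuple[str, str, str]]:
    """Emit one (subject, property, value) triple per non-empty infobox parameter."""
    triples = []
    for line in wikitext.splitlines():
        match = PARAM.match(line.strip())
        if match and match.group(2):  # skip parameters left empty in the article
            key, value = match.groups()
            triples.append((subject, RAW_NS + key, value))
    return triples


# The Busan infobox from the slide; the area value is left empty here on purpose.
infobox = """{{Infobox Korean settlement
|title = Busan Metropolitan City
|area_km2 =
}}"""
for triple in raw_infobox_triples("http://dbpedia.org/resource/Busan", infobox):
    print(triple)
```

Because every parameter name becomes its own property, spelling variants such as `birth_place` and `birthplace` end up as different properties, which is the diversity problem the mapping-based extractor addresses.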
DIEF - Raw Infobox Extractor/Diversity
DIEF - Mapping-Based Infobox Extractor
Cleaner data:
Combine what belongs together (birth_place, birthplace)
Separate what...
URI/IRI schemes
http://{lang.}dbpedia.org is the main domain
For every article there exists a DBpedia resource in the form...
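A minimal sketch of the article-to-resource mapping, consistent with URIs such as http://dbpedia.org/resource/Dresden: spaces become underscores, the English edition uses the main domain and other editions use {lang}.dbpedia.org. The function name is hypothetical, and percent-encoding non-ASCII characters is an assumption; some language editions publish IRIs with the characters unencoded.

```python
from urllib.parse import quote


def dbpedia_resource_uri(title: str, lang: str = "en") -> str:
    """Map a Wikipedia article title to a DBpedia resource URI (simplified).

    Assumptions: spaces become underscores, "en" uses the main domain,
    other languages use {lang}.dbpedia.org, and non-ASCII characters are
    percent-encoded as UTF-8.
    """
    local_name = title.strip().replace(" ", "_")
    host = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    # Keep characters that commonly appear in DBpedia local names unencoded.
    return f"http://{host}/resource/{quote(local_name, safe='/(),')}"


print(dbpedia_resource_uri("Dresden"))
print(dbpedia_resource_uri("Purify (album)"))
print(dbpedia_resource_uri("Köln", lang="de"))
```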
Linked Data Publication via 303 Redirects
http://dbpedia.org/resource/Dresden - URI of the city of
Dresden
http://dbpedia....
DBpedia Links
Data set Predicate Count Tool
Amsterdam Museum owl:sameAs 627 S
BBC Wildlife Finder owl:sameAs 444 S
Book Ma...
DBpedia Links
Data set Predicate Count Tool
flickr wrappr dbp:hasPhotoCollection 3 800 000 C
Freebase owl:sameAs 3 600 000...
DBpedia Links
Data set Predicate Count Tool
Revyu owl:sameAs 6
Sider owl:sameAs 2 000 S
TCMGeneDIT owl:sameAs 904
UMBEL rd...
DBpedia Links - Query Example
Compare funding per year (from FTS) and country with the gross domestic
product of a country...
Infrastructure
DBpedia has two extraction modes:
Wikipedia-database-dump-based extraction
DBpedia Live synchronisation (mo...
Query Answering
Back to our Wikipedia questions:
What have Innsbruck and Leipzig in common?
Who are mayors of central Euro...
DBpedia Live
DBpedia dumps are generated on a bi-annual basis
Wikipedia has around 100,000 to 150,000 page edits per day
DBp...
DBpedia Live - Overview
DBpedia Internationalization (I18n)
DBpedia Internationalization Committee founded:
http://wiki.dbpedia.org/Internationali...
DBpedia I18n - Overview
Applications: Disambiguation
Named entity recognition and disambiguation tools such as DBpedia
Spotlight, AlchemyAPI, Sem...
Applications: Question Answering
DBpedia is the primary target for several QA systems in the Question
Answering over Linke...
Applications: Faceted Browsing
Neofonie Browser
gFacet
OpenLink faceted browser (fct)
Applications: Search and Querying
Query Builder
RelFinder
SemLens
Applications: Digital Libraries & Archives
Virtual International Authority Files (VIAF) project as Linked Data
VIAF added a...
Applications: DBpedia Mobile
DBpedia Mobile is a location-centric DBpedia client application for mobile
devices consisting...
Applications: DBpedia Wiktionary
Wiktionary is a Wikimedia project: http://wiktionary.org
171 languages, 3M words for Engl...
Other Applications
See http://wiki.dbpedia.org/Applications for a more complete list
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extr...
Linked Data - Achievements and Challenges
Achievements:
1 Extension of the Web with a data
commons (50B facts)
2 vibrant, ...
[Figure: Linked Data life-cycle with the stages Interlinking / Fusing, Classification / Enrichment, Quality Analysis, Evolution / Repair, Search / Browsing / Exploration, Extra...]
Extraction
Extraction
From unstructured sources
Formats: plain text
Methods: NLP, text mining, ontology learning
From semi-structured...
Extraction Challenges
From unstructured sources
Improve F-Measure of existing NLP approaches (OpenCalais, Ontos
API)
Devel...
RDF Data Management
From unstructured sources
SPARQL RDF access is still 2-10 times slower than relational data
managem...
Storage and Querying Challenges
Reduce the performance gap between relational and RDF data
management
SPARQL Query extensi...
Authoring
Authoring
Integrated in Existing Environments: Tiki
Data oriented: RDFauthor, rdfEditor
Schema oriented: Protégé, TopBraid...
Authoring: Semantic Wikis
1 Semantic (Text) Wikis
Authoring of semantically annotated
texts
Semantic MediaWiki, KiWi,
(Wik...
Interlinking
Data Web is an uncontrolled environment → proliferation of equivalent
or similar entities → need for links / me...
Interlinking Challenges
Apply work in the de-duplication/record linkage literature
Consider the open world nature of Linke...
Enrichment
Currently, lack of knowledge bases with sophisticated schema
information and instance data adhering to this sch...
Enrichment: Example
Given: knowledge base with property birthPlace (i.e. triples using that
property) but no information o...
Repair
Ontology Debugging: OWL reasoning to detect inconsistencies and
unsatisfiable classes + detect the most likely sources ...
Linked Data Quality Analysis
Quality on the Data Web varies a lot
Hand crafted or expensively curated knowledge base (...
Evolution (© CC-BY-SA by alasis on flickr)
KB Evolution
Tasks:
Performing knowledge base changes / refactoring
Ensuring consistency of related knowledge
Managing cha...
Exploration
RDF data can be complex (as discussed by Pascal Hitzler)
Exploration phase aims to make data accessible to non...
Catalogus Professorum Lipsiensis
Visual Query Builder
Relationship Finder in CPL
Make the Web a Linked Data Washing Machine
Tool Support for Life-Cycle?
Many SW tools support one or more life-cycle stages
Linked Data Stack (http://stack.linkeddat...
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extr...
Knowledge Extraction
Knowledge Extraction is the creation of knowledge from structured
(relational databases, XML) and uns...
Categorisation of Approaches
Source - Examples: plain text, relational databases, XML, CSV
Exposition - How is the extract...
Extraction from Structured Sources to RDF
Simple mappings from RDB tables/views to RDF
Direct mapping of the model of rela...
Extraction from Natural Language Sources
80% of the information in business documents is in unstructured
natural language
...
LinkedGeoData + Sparqlify
Example: LinkedGeoData Knowledge Extraction Project using Sparqlify
Structure
Motivation
OpenStr...
Motivation
Ease information integration tasks that require spatial knowledge,
such as
Offerings of bakeries next door
Map of...
OpenStreetMap - Datamodel
Basic entities are:
Nodes Latitude, Longitude.
Ways Sequence of nodes.
Relations Associations be...
Example: Leipzig's Zoo
Comparison: Leipzig's Zoo (OpenStreetMap)
Comparison: Leipzig's Zoo (GoogleMaps)
LGD Architecture
Tag Mappings
Key-value pairs are assigned to
RDF resources
Each pair (k, v ) can be annotated with
datatypes, languag...
View Definition
RDF mapping of the data from a
PostgreSQL database
Create View lgd_nodes As
Construct {
?n a lgdm:Node .
?n ...
Sparqlify
SPARQL-SQL Rewriter
Rewrites SPARQL Queries according
to the view definition
Platform module offers SPARQL
Endpoint ...
Rest-API
Offers REST methods for frequent
queries
Based on SPARQL (Virtuoso) endpoint
Downloads
RDF dataset for download
Generated using
Construct { ?s ?p ?o }
http://downloads.linkedgeodata.org
Ontology
Enriched classes and properties with multilingual labels from
TranslateWiki
http://translatewiki.net
Imported ico...
SML Mapping Examples
The following slides demonstrate how to map relational data to RDF
with the Sparqlication Mapping Lan...
SML - Mapping Example I: The Goal (1/4)
Input Table
nodes
id geom
1 POINT(0 0)
2 POINT(1 1)
How to map tables to RDF?
How ...
SML - Mapping Example I: SML Syntax Outline (2/4)
Input Table
nodes
id geom
1 POINT(0 0)
2 POINT(1 1)
Create View myNodesV...
SML - Mapping Example I: Construct and From (3/4)
Input Table
nodes
id geom
1 POINT(0 0)
2 POINT(1 1)
Create View myNodesV...
SML - Mapping Example I: Complete! (4/4)
Input Table
nodes
id geom
1 POINT(0 0)
2 POINT(1 1)
Create View myNodesView As
Co...
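To make the goal of Example I concrete, the sketch below materialises the two-row `nodes` table into N-Triples using Python's `sqlite3`. The subject pattern http://example.org/node/{id} and the GeoSPARQL terms `geo:asWKT` / `geo:wktLiteral` are assumptions chosen for illustration; the actual SML view on the slides is truncated. Note also that Sparqlify does not materialise triples like this: it rewrites incoming SPARQL queries into SQL against the view definition.

```python
import sqlite3
from typing import List

# Hypothetical target vocabulary and URI pattern (not taken from the SML view).
BASE = "http://example.org/node/"
AS_WKT = "http://www.opengis.net/ont/geosparql#asWKT"
WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral"


def map_nodes(conn: sqlite3.Connection) -> List[str]:
    """Emulate a view over the nodes table: one N-Triples line per row."""
    lines = []
    for node_id, geom in conn.execute("SELECT id, geom FROM nodes ORDER BY id"):
        subject = f"<{BASE}{node_id}>"
        literal = f'"{geom}"^^<{WKT_LITERAL}>'
        lines.append(f"{subject} <{AS_WKT}> {literal} .")
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, geom TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(1, "POINT(0 0)"), (2, "POINT(1 1)")])
for line in map_nodes(conn):
    print(line)
```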
SML Mapping Examples
A more complex example, which demonstrates the use of an SQL
mapping table and an SQL helper view.
SML - Mapping Example II: The Goal (1/8)
Input Table
node_tags
id k v
1 name Universitaet Leipzig
1 name:en University of ...
SML - Mapping Example II: Source Data (2/8)
OSM Table
node_tags
id k v
1 name Universitaet Leipzig
1 name:en University of...
SML - Mapping Example II: Mapping Table (3/8)
OSM Table RDF Mapping Table
node_tags
id k v
1 name Universitaet Leipzig
1 n...
SML - Mapping Example II: Helper View (4/8)
OSM Table RDF Mapping Table
node_tags
id k v
1 name Universitaet Leipzig
1 nam...
SML - Mapping Example II: SML View (5/8)
Logical Table SML View
lgd_node_tags_literal
id property v lang
1 rdfs:label Univ...
SML - Mapping Example II: SML View (6/8)
Logical Table SML View
lgd_node_tags_literal
id property v lang
1 rdfs:label Univ...
SML - Mapping Example II: SML View (7/8)
Logical Table SML View
lgd_node_tags_literal
id property v lang
1 rdfs:label Univ...
SML - Mapping Example II: SML View (8/8)
Logical Table SML View
+
lgd_node_tags_literal
id property v lang
1 rdfs:label Un...
Further Tag Mappings
lgd_map_dataype
k datatype
seats integer
unisex boolean
lgd_map_property
k property
website foaf:home...
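The mapping tables can be read as a per-tag dispatch: `name` and `name:xx` become (language-tagged) `rdfs:label` literals, keys in the property table map to a named property, and keys in the datatype table get a typed literal. A minimal Python sketch using only the table rows visible on the slides; the subject pattern `lgd:node{id}`, the fallback prefix `lgdo:` and the yes/no to true/false normalisation are hypothetical choices for this example, not the LinkedGeoData implementation.

```python
from typing import Tuple

# Visible rows of the lgd_map_dataype and lgd_map_property tables.
MAP_DATATYPE = {"seats": "integer", "unisex": "boolean"}
MAP_PROPERTY = {"website": "foaf:homepage"}
XSD = "http://www.w3.org/2001/XMLSchema#"
OSM_BOOLEANS = {"yes": "true", "no": "false"}  # assumed normalisation of OSM values


def literal(text: str) -> str:
    """Quote a string as an N-Triples style literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def map_tag(node_id: int, k: str, v: str) -> Tuple[str, str, str]:
    """Map one OSM tag (k, v) of a node to a (subject, predicate, object) triple."""
    subject = f"lgd:node{node_id}"  # hypothetical URI pattern
    key, _, lang = k.partition(":")
    if key == "name":
        # name -> plain label, name:xx -> label tagged with language xx
        return subject, "rdfs:label", literal(v) + (f"@{lang}" if lang else "")
    if k in MAP_PROPERTY:
        return subject, MAP_PROPERTY[k], f"<{v}>"
    if k in MAP_DATATYPE:
        datatype = MAP_DATATYPE[k]
        value = OSM_BOOLEANS.get(v, v) if datatype == "boolean" else v
        return subject, f"lgdo:{k}", literal(value) + f"^^<{XSD}{datatype}>"
    # Unmapped keys fall back to a generic property with a plain literal.
    return subject, f"lgdo:{k}", literal(v)


print(map_tag(1, "name", "Universitaet Leipzig"))
print(map_tag(7, "unisex", "yes"))
```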
LGD Edit Tool
Multi User Tag Mapping WebApp
Resources
Sparqlify
http://sparqlify.org
LinkedGeoData
http://linkedgeodata.org
Tag Mappings
https://github.com/GeoKnow/Li...
Statistics (15 August 2013)
Complete OSM planet file corresponds to ∼20,000,000,000 triples
Virtual access via Sparqlify
Do...
Access
Materialized Sparql Endpoint (based on Virtuoso DB, download
datasets loaded)
http://linkedgeodata.org/sparql
http:...
Use Cases Augmented Reality
Use Cases Generic Browsing
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extr...
Why Link Discovery?
1 Fourth Linked Data
principle
2 Links are central for
Cross-ontology QA
Data Integration
Reasoning
Fe...
Why is it difficult?
1 Time complexity
Large number of triples
Quadratic a-priori runtime
69 days for mapping cities from
DBp...
Why is it difficult?
2 Complexity of specifications
Combination of several attributes required for high precision
Tedious disco...
LIMES Framework
Runtime Optimization
Reduce the number of comparisons C(A) ≥ |M| (assuming we need
all σ/θ values for links)
Maximize re...
RR Guarantee
Best achievable reduction ratio: $RR_{max} = 1 - \frac{|M|}{|S||T|}$
Approach H(α) fulfills the RR guarantee criterion iff:
∀r ...
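The reduction ratio can be checked with simple arithmetic. This sketch assumes the usual definition RR(A) = 1 − C(A)/(|S||T|), which yields the RRmax formula on this slide when an ideal approach performs exactly C(A) = |M| comparisons. The dataset sizes in the usage lines are hypothetical.

```python
def reduction_ratio(comparisons: int, n_source: int, n_target: int) -> float:
    """RR(A) = 1 - C(A) / (|S||T|): the share of all pairs that A never compares."""
    return 1 - comparisons / (n_source * n_target)


def rr_max(n_links: int, n_source: int, n_target: int) -> float:
    """Best achievable RR: an ideal approach compares exactly the |M| true links."""
    return reduction_ratio(n_links, n_source, n_target)


# Hypothetical sizes: 10,000 x 10,000 resources with 5,000 true links.
n_s, n_t, n_links = 10_000, 10_000, 5_000
print(f"RR_max = {rr_max(n_links, n_s, n_t):.5f}")
print(f"RR with 2,000,000 comparisons = {reduction_ratio(2_000_000, n_s, n_t):.5f}")
```

Since C(A) ≥ |M| for any approach that finds all links, RR(A) can never exceed RRmax; the guarantee criterion asks how close H(α) can be pushed towards that bound.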
Goal
Formal Goal
Devise H(α): ∀r > 1, ∃α: RRR(H(α)) ≤ r
Restrictions
Minkowski Distance
$\delta(s, t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}, \quad p \geq 2$
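A direct transcription of the Minkowski distance and of the link condition δ(s, t) ≤ θ. The slides restrict the approach to p ≥ 2; the formula itself is a metric for any p ≥ 1, so this sketch only rejects p < 1. The helper name `is_link` is hypothetical.

```python
from typing import Sequence


def minkowski(s: Sequence[float], t: Sequence[float], p: float = 2.0) -> float:
    """delta(s, t) = (sum_i |s_i - t_i|^p)^(1/p) for two points in n dimensions."""
    if len(s) != len(t):
        raise ValueError("points must have the same dimension n")
    if p < 1:
        raise ValueError("p < 1 does not define a metric")
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1 / p)


def is_link(s: Sequence[float], t: Sequence[float], theta: float, p: float = 2.0) -> bool:
    """A pair is linked when its distance does not exceed the threshold theta."""
    return minkowski(s, t, p) <= theta


print(minkowski((0, 0), (3, 4)))     # Euclidean case, p = 2
print(minkowski((0, 0), (3, 4), 3))  # p = 3
```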
Space Tiling
HYPPO
δ(s, t) ≤ θ describes a hypersphere
Approximate hypersphere by using a hypercube
Easy to compute
No los...
Space Tiling
Set width of single hypercube to ∆ = θ/α
Tile Ω = S ∪ T into the adjacent cubes C
Coordinates: (c1, . . . , c...
HYPPO
Combine $(2\alpha + 1)^n$ hypercubes around $C(\omega)$ to approximate the hypersphere
$\mathrm{RRR}(\mathrm{HYPPO}(\alpha)) = \frac{(2\alpha + 1)^n}{\alpha^n S(n)}$
$\lim_{\alpha \to \infty}$ RRR(HYPP...
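HYPPO
RRR(HYPPO) for p = 2, n = 2, 3, 4 and 2 ≤ α ≤ 50
$\lim_{\alpha \to \infty} \mathrm{RRR}(\mathrm{HYPPO}(\alpha)) = \frac{4}{\pi} \approx 1.27 \quad (n = 2)$
$\lim_{\alpha \to \infty} \mathrm{RRR}(\mathrm{HYPPO}(\alpha)) =$ ...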
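The HYPPO formula can be evaluated numerically. This sketch assumes that S(n) denotes the volume of the n-dimensional unit ball, π^(n/2)/Γ(n/2 + 1), which is consistent with the 4/π limit given for n = 2: as α grows, (2α + 1)^n/α^n tends to 2^n, so the limit is 2^n/S(n).

```python
import math


def unit_ball_volume(n: int) -> float:
    """Volume of the n-dimensional unit ball (assumed meaning of S(n))."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def rrr_hyppo(alpha: int, n: int) -> float:
    """RRR(HYPPO(alpha)) = (2 alpha + 1)^n / (alpha^n S(n))."""
    return (2 * alpha + 1) ** n / (alpha ** n * unit_ball_volume(n))


def rrr_hyppo_limit(n: int) -> float:
    """Limit for alpha -> infinity: 2^n / S(n)."""
    return 2 ** n / unit_ball_volume(n)


for n in (2, 3, 4):
    values = [round(rrr_hyppo(alpha, n), 3) for alpha in (2, 4, 50)]
    print(n, values, round(rrr_hyppo_limit(n), 3))
```

The limit grows with the dimension (6/π for n = 3, 32/π² for n = 4), so HYPPO never reaches RRR = 1 no matter how large α is chosen.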
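HR3: Idea
$\mathrm{index}(C, \omega) = \begin{cases} 0 & \text{if } \exists i : |c_i - c(\omega)_i| \leq 1,\ 1 \leq i \leq n, \\ \sum_{i=1}^{n} \left(|c_i - c(\omega)_i| - 1\right)^p & \text{else} \end{cases}$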
HR3: Idea
Compare C(ω) with C iff index(C, ω) ≤ α^p
α = 4, p = 2
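A compact Python sketch of HR3-style filtering. Assumptions not taken from the slides: cube coordinates are computed as c_i = ⌊ω_i/Δ⌋ with Δ = θ/α (the coordinate formula on the tiling slide is truncated), and candidate cubes are restricted to the (2α + 1)^n block around C(ω) as in HYPPO before the index rule "compare iff index(C, ω) ≤ α^p" is applied. The test compares the result against brute force to check that no link is lost.

```python
import math
from itertools import product


def minkowski(s, t, p):
    """Minkowski distance between two equal-length coordinate tuples."""
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1 / p)


def cube_of(point, delta):
    """Assumed tiling: cube coordinate c_i = floor(omega_i / delta)."""
    return tuple(math.floor(x / delta) for x in point)


def index(cube, ref, p):
    """HR3 index of cube C relative to the reference cube C(omega)."""
    if any(abs(c - r) <= 1 for c, r in zip(cube, ref)):
        return 0
    return sum((abs(c - r) - 1) ** p for c, r in zip(cube, ref))


def candidate_cubes(ref, alpha, p):
    """Cubes within alpha steps per axis of C(omega) whose index is <= alpha^p."""
    ranges = [range(r - alpha, r + alpha + 1) for r in ref]
    return [c for c in product(*ranges) if index(c, ref, p) <= alpha ** p]


def hr3_links(source, target, theta, alpha, p=2):
    """All pairs (s, t) with minkowski(s, t, p) <= theta, comparing only candidate cubes."""
    delta = theta / alpha
    buckets = {}
    for t in target:
        buckets.setdefault(cube_of(t, delta), []).append(t)
    links = []
    for s in source:
        for cube in candidate_cubes(cube_of(s, delta), alpha, p):
            for t in buckets.get(cube, []):
                if minkowski(s, t, p) <= theta:
                    links.append((s, t))
    return links


print(len(candidate_cubes((0, 0), alpha=4, p=2)))  # 77 of the 81 cubes HYPPO would use
```

For α = 4 and p = 2 only the four corner cubes of the 9 × 9 block are dropped; the savings grow with α, which is what drives RRR(HR3(α)) towards 1.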
HR3: Idea
Lemma
∀s ∈ S: index(C, s) > α^p implies that all t ∈ C are non-matches
Claims
No loss of recall
$\lim_{\alpha \to \infty}$ RRR(HR3...
HR3: Lemma 3
Lemma
$\forall \alpha > 1: \mathrm{RRR}(\mathrm{HR}^3(2\alpha)) < \mathrm{RRR}(\mathrm{HR}^3(\alpha))$
p = 2, α = 4
HR3: Proof
Lemma
$\forall \alpha > 1: \mathrm{RRR}(\mathrm{HR}^3(2\alpha)) < \mathrm{RRR}(\mathrm{HR}^3(\alpha))$
p = 2, α = 8
HR3: Proof
Lemma
$\forall \alpha > 1: \mathrm{RRR}(\mathrm{HR}^3(2\alpha)) < \mathrm{RRR}(\mathrm{HR}^3(\alpha))$
p = 2, α = 25
HR3: Proof
Lemma
$\forall \alpha > 1: \mathrm{RRR}(\mathrm{HR}^3(2\alpha)) < \mathrm{RRR}(\mathrm{HR}^3(\alpha))$
p = 2, α = 50
HR3: Idea
Theorem
$\lim_{\alpha \to \infty} \mathrm{RRR}(\mathrm{HR}^3(\alpha)) = 1$
Claims
No loss of recall
$\lim_{\alpha \to \infty} \mathrm{RRR}(\mathrm{HR}^3(\alpha)) = 1$
HR3: Experiments
Compare HR3 with LIMES 0.5's HYPPO and SILK 2.5.1
Experimental Setup:
Deduplicating DBpedia places by mi...
HR3: Experiments (Comparisons)
Experiment 2: Deduplicating DBpedia places, θ = 99 m
0.64 × 10^6 fewer comparisons
HR3: Experiments (Comparisons)
Experiment 4: Linking Geonames and LinkedGeoData, θ = 9°
4.3 × 10^6 fewer comparisons
HR3: Experiments (Runtime)
Experiment 1, 2: DBpedia, θ = 49 m, 99 m
Experiment 3, 4: Geonames and LGD, θ = 1°, 9°
Exp. 1 Exp...
HR3: Summary
Mission
New category of algorithms for link discovery
Presented HR3
Link discovery in affine spaces with Minkow...
Learning Complex Specifications
Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
Unsupervised (e.g., KnoFuss, EUCLID, EA...
Learning Complex Specifications
Insight
Choice of right example is key for learning
So far, only use of informativeness
Ques...
Basic Idea
Use similarity of link candidates when selecting most informative
examples (intra + inter class similarity)
Similarity of Candidates
Link candidate x = (s, t) can be regarded as a vector
$(\sigma_1(x), \ldots, \sigma_n(x)) \in [0, 1]^n$.
Similarity...
Graph Clustering
Rationale: Use intra-class similarity
Approach
Cluster elements of $S^+$ and $S^-$ independently
Choose one e...
BorderFlow
$G = (V, E, \omega)$ with $V = S^+$ or $V = S^-$
$\omega(x, y) = \mathrm{sim}(x, y)$
Keep best ec edges for each x ∈ V
BorderFlow
Seed-based algorithm
Goal: Maximize borderflow ratio $bf(X) = \frac{\Omega(b(X), X)}{\Omega(b(X), n(X))}$
http://sourceforge.net/...
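The borderflow ratio can be computed directly on a weighted graph. This sketch assumes the usual BorderFlow definitions, which are truncated on the slides: Ω(X, Y) is the total weight of edges from X to Y, b(X) are the members of X with a neighbour outside X, and n(X) are the outside nodes adjacent to X. Only the ratio is shown; the seed-based expansion loop of the algorithm is not reproduced here.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # symmetric adjacency: graph[x][y] = omega(x, y)


def from_edges(edges: Dict[Tuple[str, str], float]) -> Graph:
    """Build a symmetric weighted graph from undirected edges."""
    graph: Graph = {}
    for (x, y), weight in edges.items():
        graph.setdefault(x, {})[y] = weight
        graph.setdefault(y, {})[x] = weight
    return graph


def omega(graph: Graph, xs: Iterable[str], ys: Set[str]) -> float:
    """Omega(X, Y): total weight of edges from nodes in X to nodes in Y."""
    return sum(graph[x].get(y, 0.0) for x in xs for y in ys)


def border(graph: Graph, cluster: Set[str]) -> Set[str]:
    """b(X): members of X with at least one neighbour outside X."""
    return {x for x in cluster if any(y not in cluster for y in graph[x])}


def neighbours(graph: Graph, cluster: Set[str]) -> Set[str]:
    """n(X): nodes outside X that are adjacent to some member of X."""
    return {y for x in cluster for y in graph[x] if y not in cluster}


def borderflow_ratio(graph: Graph, cluster: Set[str]) -> float:
    """bf(X) = Omega(b(X), X) / Omega(b(X), n(X)); infinite if nothing flows out."""
    b = border(graph, cluster)
    outflow = omega(graph, b, neighbours(graph, cluster))
    if outflow == 0:
        return math.inf
    return omega(graph, b, cluster) / outflow


g = from_edges({("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0, ("c", "d"): 0.5})
print(borderflow_ratio(g, {"a", "b"}), borderflow_ratio(g, {"a", "b", "c"}))
```

The tightly connected triangle {a, b, c} scores higher than {a, b}, which is the kind of cluster the seed-based expansion is meant to grow towards.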
Conclusion
Can be combined with arbitrary active learning ML algorithms
Was experimentally combined with EAGLE (genetic pr...
Summary
Linking crucial task in the web of data
Two key problems
1 Efficient execution of link specifications
2 Creation of lin...
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extr...
Motivation
rise in the availability and usage of knowledge bases
still a lack of knowledge bases that consist of sophistic...
Example
dbr:Brad_Pitt :birthPlace dbr:Shawnee,_Oklahoma ;
a :Person .
dbr:Angela_Mer...
Benefits of an expressive schema
Axioms serve as documentation for the purpose and correct usage of
schema elements
Addition...
Each person was only born at one place?!
[Figure: an entity with two birthPlace values; if birthPlace is functional, the two values must be equal (=)]
SELECT ?s WHERE {
?s dbo:birthPlace ?o1 .
?s dbo:birt...
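The same check can be expressed over an in-memory list of triples. This sketch assumes the truncated SPARQL query looks for subjects with at least two distinct values of the property; the `ex:` entities in the usage lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def functional_violations(triples: Iterable[Triple], prop: str) -> Dict[str, Set[str]]:
    """Subjects that have more than one distinct value for prop."""
    values: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: objs for s, objs in values.items() if len(objs) > 1}


triples = [
    ("ex:p1", "dbo:birthPlace", "ex:CityA"),
    ("ex:p2", "dbo:birthPlace", "ex:TownB"),
    ("ex:p2", "dbo:birthPlace", "ex:TownB_variant"),
    ("ex:p2", "rdf:type", "dbo:Person"),
]
print(functional_violations(triples, "dbo:birthPlace"))
```

Because OWL makes no unique name assumption, two values of a functional property are not a contradiction by themselves: a reasoner infers that they denote the same individual. The check therefore flags candidates for review rather than proving an error.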
Where was Julia Nannie Wallace born?
Julia Nannie Wallace was born in Lacrosse?
No, Julia Nannie Wallace was born in La Crosse!
[Figure: an entity whose birthPlace value has rdf:type Sport; the axiom "birthPlace range Place" makes that value a Place, which conflicts with "Place disjointWith Sport"; in the final frames the Sport type is replaced by City]
SELECT ?s ?place WHERE {
?s dbo...
3 Steps to get a schema
[Figure: 3-phase enrichment workflow. Input: entity URI, axiom type and knowledge base (SPARQL endpoint). Step 1 obtains schema information from the SPARQL endpoint; further components shown are a reasoner and the enrichment ontology]
Starting Point
SPARQL endpoint: http://dbpedia.org/sparql
Entity URI: http://dbpedia.org/ontology/author
Axiom Type: Objec...
Step 1 - Obtaining Schema Information
CONSTRUCT WHERE {
?sub rdfs:subClassOf ?sup .
}
ORDER BY DESC(?sub) LIMIT 100...
Step 2 - Obtain axiom type and entity specific data
SELECT ?type (COUNT(DISTINCT ?s) AS ?cnt) WHERE {
?s dbo:author ...
Step 2 - Obtain axiom type and entity specific data
CONSTRUCT WHERE {
?ind dbo:author ?o .
?ind a ?type .
}
ORDER BY DE...
Step 3 - Scoring
dbpedia:The_Adventures_of_Tom_Sawyer
dbo:author dbpedia:Mark_Twain ;
rdf:type dbo:Book .
dbpe...
Step 3 - Scoring (2)
Problem:
support for an axiom in the KB is not taken into account
→ no difference between 3 out of 3 and 100 out o...
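The scoring problem is easy to reproduce: a plain ratio gives the same score to 3 of 3 and to 1000 of 1000 positive examples. One standard way to take support into account is to score with a confidence interval for the ratio; the Agresti-Coull interval below is our assumption for illustration and not necessarily the estimator used in the slides. `domain_counts` mirrors the Step 2 query for a domain axiom; the second work in the usage lines is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]
Z_95 = 1.96  # two-sided 95 % standard normal quantile


def domain_counts(triples: Iterable[Triple], prop: str):
    """Per class: how many subjects of prop carry that rdf:type, plus the total."""
    triples = list(triples)
    subjects = {s for s, p, _ in triples if p == prop}
    counts = Counter(o for s, p, o in triples if p == "rdf:type" and s in subjects)
    return counts, len(subjects)


def naive_score(positives: int, total: int) -> float:
    """Plain ratio: ignores how many examples support the axiom."""
    return positives / total


def agresti_coull(positives: int, total: int, z: float = Z_95) -> Tuple[float, float]:
    """Confidence interval for the success ratio; it narrows as support grows."""
    n_adj = total + z * z
    p_adj = (positives + z * z / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


triples = [
    ("dbpedia:The_Adventures_of_Tom_Sawyer", "dbo:author", "dbpedia:Mark_Twain"),
    ("dbpedia:The_Adventures_of_Tom_Sawyer", "rdf:type", "dbo:Book"),
    ("ex:work2", "dbo:author", "ex:someone"),  # hypothetical second work
    ("ex:work2", "rdf:type", "ex:Film"),
]
counts, total = domain_counts(triples, "dbo:author")
print(counts["dbo:Book"], total)
print(naive_score(3, 3), naive_score(1000, 1000))
print(agresti_coull(3, 3)[0], agresti_coull(1000, 1000)[0])
```

Ranking candidate axioms by the lower bound of the interval keeps a well-supported axiom ahead of one that merely has a perfect ratio on very few examples.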
More Complex Axioms
Pattern Based Knowledge Base Enrichment, ISWC 2013
Outlook and Summary
Schema in the Linked Data Web often shallow → tools needed to
support knowledge engineers
Showed some ...
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extr...
Motivation
increasing number of knowledge bases in the
Semantic Web (see e.g. LOD cloud)
maintenance of knowledge bases wi...
(Automatically) Detectable Ontology Problems
Common problems:
Syntactic Problems
Structural Problems
Semantic Problems (fo...
Syntactic Problems
Syntactic errors are mainly violations of conventions of the language in
which the ontology is modelled...
Structural Problems
Problems in the taxonomy
Example (Circularities)
A ⊑ B, B ⊑ C, C ⊑ A
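Circularities can be found with a depth-first search over the told subclass edges. A minimal sketch: it returns one cycle as a list of class names, or `None` if the taxonomy is acyclic. The edge-list input format is an assumption for this example.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_cycle(subclass_of: Iterable[Tuple[str, str]]) -> Optional[List[str]]:
    """Return one cycle of sub -> super edges (e.g. A, B, C, A) or None."""
    graph = defaultdict(list)
    for sub, sup in subclass_of:
        graph[sub].append(sup)
    color = defaultdict(int)  # missing nodes count as WHITE
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge closes a cycle on the current path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))
```

In OWL such a cycle is not a logical error: it makes A, B and C equivalent. It is flagged because it is rarely what the modeller intended.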
Reasoning Related Problems
Problems which negatively affect the performance of reasoning over
expressive knowledge bases
Exa...
Linked Data Related Problems
Problems which are specific to publishing RDF using the Linked Data
principles
Incorrect im...
Semantic Problems
Logical contradictions in the underlying knowledge base
Example (Unsatisfiable classes)
O = {A ⊑ B ⊓ C, C ⊑ ¬B}...
Ontology Debugging
Problem: We have undesirable entailments
Solution: Repair (Delete/Modify) responsible axioms
Question: ...
Justification
Justification
For an ontology O and an entailment η where O ⊨ η, a set of axioms J is
a justification for η in O ...
Justification - Example
O = {
B ⊑ ∃r.D (1)
B ⊑ ∀r.¬D (2)
A ⊑ B ⊓ C (3)
B ⊑ ¬C (4)
A ⊑ E (5)
A ⊑ ¬E ⊓ F (6)
}
O ⊨ A ⊑ ⊥
J1 = {1, 2, 3}
J2 = {5...
Justification Based Repair
For a repair, at least one axiom from every justification needs to be
removed.
For a repair plan, a...
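Choosing at least one axiom from every justification is a hitting set problem, and the smallest repairs are the minimum hitting sets. A brute-force Python sketch, exponential in the number of axioms and therefore only suitable for small examples; it ignores axiom relevance rankings. The justifications in the usage line are toy data over numbered axioms.

```python
from itertools import combinations
from typing import Hashable, List, Set


def minimal_repairs(justifications: List[Set[Hashable]]) -> List[Set[Hashable]]:
    """All smallest axiom sets that contain at least one axiom of every justification."""
    axioms = sorted(set().union(*justifications))
    for size in range(1, len(axioms) + 1):
        repairs = [set(combo) for combo in combinations(axioms, size)
                   if all(j & set(combo) for j in justifications)]
        if repairs:
            return repairs
    return []


# Toy justifications over numbered axioms.
print(minimal_repairs([{1, 2, 3}, {3, 4}, {5, 6}]))
```

Axiom 3 occurs in two justifications, so every smallest repair removes it together with one axiom from the third justification.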
Justification Algorithms
Single justification:
Glass Box: Modifying underlying reasoning algorithm (tableau tracing)
Black-Box...
Black-Box
Expansion-Contraction Strategy
Expansion: Add axioms to empty set until entailment holds
Contraction: Remove axi...
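The expansion-contraction strategy only needs an entailment oracle, which is what makes it black-box. In this sketch a toy oracle, which treats a set of numbered axioms as entailing η whenever it contains one of three hand-picked sets, stands in for a DL reasoner; the toy data and function names are illustrative assumptions. Contraction relies on monotonicity: once removing an axiom breaks the entailment, it stays necessary for every smaller subset, so the result is minimal.

```python
from typing import Callable, Hashable, List, Optional, Set

Entails = Callable[[Set[Hashable]], bool]


def single_justification(ontology: List[Hashable], entails: Entails) -> Optional[Set[Hashable]]:
    """Black-box expansion-contraction: one minimal subset that still entails eta."""
    # Expansion: add axioms one at a time until the entailment holds.
    selected: List[Hashable] = []
    for axiom in ontology:
        selected.append(axiom)
        if entails(set(selected)):
            break
    else:
        return None  # the whole ontology does not entail eta
    # Contraction: drop every axiom whose removal keeps the entailment.
    justification = set(selected)
    for axiom in selected:
        if entails(justification - {axiom}):
            justification.discard(axiom)
    return justification


# Toy oracle standing in for a reasoner: eta holds iff one of these sets is present.
TOY = [frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({5, 6})]


def toy_entails(axioms: Set[Hashable]) -> bool:
    return any(j <= axioms for j in TOY)


print(single_justification([1, 2, 3, 4, 5, 6], toy_entails))
print(single_justification([6, 5, 4, 3, 2, 1], toy_entails))
```

The axiom order decides which justification is found, which is why computing all justifications needs an additional strategy.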
Hitting Set Tree Algorithm
from the field of Model-Based Diagnosis
given a faulty system (ontology), it constructs a finite tree who...
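A compact sketch of a Reiter-style hitting set tree for computing all justifications. Each tree node is identified by the set of axioms removed on its path; a node is closed when the remaining axioms no longer entail η, otherwise a justification disjoint from the path is reused or computed (here by contraction alone), and one child is created per axiom in it. The toy oracle over numbered axioms is an assumption standing in for a reasoner.

```python
from collections import deque
from typing import Callable, FrozenSet, Hashable, Iterable, List, Set

Entails = Callable[[Set[Hashable]], bool]


def contract(axioms: Iterable[Hashable], entails: Entails) -> FrozenSet[Hashable]:
    """Shrink a set that entails eta to one minimal justification."""
    axioms = list(axioms)
    kept = set(axioms)
    for axiom in axioms:
        if entails(kept - {axiom}):
            kept.discard(axiom)
    return frozenset(kept)


def all_justifications(ontology: List[Hashable], entails: Entails) -> List[FrozenSet[Hashable]]:
    """Hitting set tree: each edge removes one axiom of the node's justification."""
    justifications: List[FrozenSet[Hashable]] = []
    queue = deque([frozenset()])  # a node = the axioms removed on its path
    visited = set()
    while queue:
        path = queue.popleft()
        if path in visited:
            continue
        visited.add(path)
        remaining = [a for a in ontology if a not in path]
        if not entails(set(remaining)):
            continue  # closed node: this path already breaks the entailment
        # Reuse a known justification untouched by the path, else compute a new one.
        just = next((j for j in justifications if not j & path), None)
        if just is None:
            just = contract(remaining, entails)
            justifications.append(just)
        for axiom in just:
            queue.append(path | {axiom})
    return justifications


TOY = [frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({5, 6})]


def toy_entails(axioms: Set[Hashable]) -> bool:
    return any(j <= axioms for j in TOY)


print(all_justifications([1, 2, 3, 4, 5, 6], toy_entails))
```

The paths of the closed nodes are exactly candidate repairs, which links this tree back to justification-based repair.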
Hitting Set Tree Algorithm - Example
Figure 3.2: An Example of a Hitting Set Tree
J...
Justification Scenarios
A user can be faced with the following situations:
Small number of small justifications
Easy and pleas...
Root Unsatisfiability - Definitions
A root UC is a class whose unsatisfiability does not depend on another
class, otherwise it i...
Root Unsatisfiability - Approaches
Approaches:
1: compute all justifications for each unsatisfiable class and apply the
definition...
Root Unsatisfiability - Example
O = {
B ⊑ ∃r.D (1)
B ⊑ ∀r.¬D (2)
A ⊑ B ⊓ C (3)
B ⊑ ¬C (4)
A ⊑ E (5)
A ⊑ ¬E ⊓ F (6)
}
O ⊨ A ⊑ ⊥
J1 = {1, 2, 3}...
Axiom Relevance
resolving a justification requires deleting or editing axioms
ranking methods highlight the most probable causes...
Repair Consequences
after the repair process, axioms have been deleted or modified
→ desired entailments may be lost or new e...
SPARQL Endpoint Support
Previously mentioned approaches are implemented in the ORE tool
(http://ore-tool.net)
ORE supports...
SPARQL Endpoint Support II
algorithm performs sanity checks, e.g. SPARQL queries which probe
for typical inconsistent axio...
DBpedia Live Demo
Inconsistency in DBpedia Live:
Individual: dbr:Purify_(album)
Facts: dbo:artist dbr:Axis_of_Advance
Indi...
DBpedia Live Demo 2
Inconsistency in DBpedia in combination with WGS84 (Linked Data):
Individual: dbr:WKWS Facts: geo:long...
OpenCyc Demo
Inconsistency in OpenCyc:
Individual: 'PopulatedPlace'
Types: 'ArtifactualFeatureType', 'ExistingStuffType'
C...
ORE - Screenshot
Related Tools
Swoop
can compute justifications for the unsatisfiability of classes and offers a repair
mode
fine-grained justification com...
Related Tools II
PION and DION
developed in the SEKT project to deal with inconsistencies
PION is an inconsistency toleran...
Motivation
User Query Interfaces:
Knowledge Base
Specific Interfaces
The Linked Data Life-Cycle

Presentation of the Linked Data Lifecycle given at the ICCL Summer School 2013.

Transcript of "The Linked Data Life-Cycle"

  1. The Linked Data Life-Cycle. Jens Lehmann, Lorenz Bühmann. Contributors: Quan Nguyen, Sören Auer, Richard Cyganiak, Daniel Gerber, Sebastian Hellmann, Anja Jentzsch, Dimitris Kontokostas, Axel Ngonga, Claus Stadler, Christina Unger. 2013-08-23. Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 1 / 252
  2. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Knowledge Base Exploration / Querying. Figure: Linked Data Lifecycle (Extraction; Storage/Querying; Manual revision/Authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration).
  4. The Linked Data Principles. The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. Linked Data principles: (1) Use URIs as names for things. (2) Use HTTP URIs, so that people can look up those names. (3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). (4) Include links to other URIs, so that they can discover more things.
  5. LOD Cloud
  6. Linked Data Principles Detailed: 1 + 2. (1) URI references identify not just Web documents and digital content, but also real-world objects and abstract concepts: tangible things such as people and places, and abstract things such as the relationship type of knowing somebody. (2) HTTP URIs enable re-use of the Web architecture: Linked Data puts the emphasis on the Web in Semantic Web; resources can be dereferenced; standard tools for security, load balancing, etc. can be re-used.
  9. Principles Detailed: 3 Content Negotiation. Humans and machines should be able to retrieve appropriate representations of resources: HTML for humans, RDF for machines. This is achievable using an HTTP mechanism called content negotiation. Basic idea: the HTTP client sends HTTP headers with each request to indicate which kinds of documents it prefers; servers can inspect these headers and select an appropriate response. Two strategies: 303 URIs and hash URIs. Both ensure that objects and the documents that describe them are not confused, and that humans and machines can retrieve appropriate representations.
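Server-side content negotiation can be sketched in a few lines of Python. This is a simplification with hypothetical function names (media-type parameters other than `q` are ignored): it picks, from the representations a server offers, the one the client's Accept header rates highest, letting the most specific matching media range decide the quality value.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def negotiate(header, offered):
    """Return the offered media type the client prefers, or None (-> 406)."""
    ranges = parse_accept(header or "*/*")  # a missing Accept header accepts anything

    def quality(media_type):
        main = media_type.split("/")[0]
        best_specificity, best_q = -1, 0.0
        for media, q in ranges:
            # exact type beats type/* which beats */*
            if media == media_type:
                specificity = 2
            elif media == main + "/*":
                specificity = 1
            elif media == "*/*":
                specificity = 0
            else:
                continue
            if specificity > best_specificity:
                best_specificity, best_q = specificity, q
        return best_q

    # Highest q wins; ties go to the server's own order in `offered`.
    q, _, choice = max((quality(m), -i, m) for i, m in enumerate(offered))
    return choice if q > 0 else None
```

For example, a browser sending `text/html,application/xhtml+xml;q=0.9,*/*;q=0.8` gets `text/html` when the server offers `["text/html", "application/rdf+xml"]`, whereas a Linked Data client sending `application/rdf+xml` gets the RDF representation.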
  10. 303 URIs. 303 redirect: instead of sending the object itself over the network, the server responds to the client with the HTTP response code 303 See Other and the URI of a Web document that describes the real-world object. Second step: the client dereferences the new URI and gets a Web document describing the real-world object.
  11. Hash URIs. The hash URI strategy builds on the characteristic that URIs may contain a special part (fragment identifier) separated from their base part by a hash symbol (#). The HTTP protocol requires the fragment part to be stripped off before requesting the URI from the server → a URI that includes a hash cannot be retrieved directly and therefore does not necessarily identify a Web document.
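Python's standard library exposes this client-side step as `urllib.parse.urldefrag`. The sketch below (hypothetical `example.org` URIs, illustrative function name) groups hash URIs by the document that would actually be requested, which also shows why all resources sharing a non-fragment part are described in one response.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def group_by_document(uris):
    """Map each requested document URI to the fragment identifiers it describes."""
    documents = defaultdict(list)
    for uri in uris:
        # urldefrag splits off the fragment, which a client never sends to the server
        base, fragment = urldefrag(uri)
        documents[base].append(fragment)
    return dict(documents)
```

For `["http://example.org/people#alice", "http://example.org/people#bob"]` the result is `{"http://example.org/people": ["alice", "bob"]}`: a single request for `http://example.org/people` returns the descriptions of both resources.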
  12. Hash versus 303. Hash URIs: (+) Reduced number of necessary HTTP round-trips → reduces access latency. (-) Descriptions of all resources sharing the same non-fragment URI part are always returned to the client together → can lead to large amounts of data being unnecessarily transmitted to the client. 303 URIs: (+) Flexible, because the redirection target can be configured separately for each resource (usually it points to a single document per resource, but it could also summarise several resources). (-) Requires two HTTP requests to retrieve a single description of a real-world object.
  13. Principles Detailed: 4 Links. If an RDF triple connects URIs in different namespaces/datasets, it is called a link (no unique syntactic definition of a link exists). Basic idea of Linked Data: apply the general hyperlink-based architecture of the World Wide Web to the task of sharing structured data on a global scale. Research challenge: efficient creation of links with high precision and recall.
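Because no unique syntactic definition of a link exists, any automated check has to pick one. The Python sketch below assumes, purely for illustration, that a URI's dataset is identified by its scheme and host and that objects are plain strings, with URIs recognised by their `http(s)://` prefix; the `example.org` URI is hypothetical.

```python
from urllib.parse import urlsplit


def dataset_of(uri):
    """Approximate a URI's dataset by its scheme and host (an assumption)."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc.lower()


def is_link(subject, obj):
    """Treat a triple as a link if its object is an HTTP(S) URI from another dataset."""
    if not (isinstance(obj, str) and obj.startswith(("http://", "https://"))):
        return False  # literals (and other non-URI objects) never count as links
    return dataset_of(subject) != dataset_of(obj)
```

Under this assumption, an `owl:sameAs` triple from `http://dbpedia.org/resource/Leipzig` to `http://example.org/geo/leipzig` counts as a link, whereas a triple between two DBpedia resources does not.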
  15. Why Linked Data? Problem: try to search for these things on the current Web: apartments near German-Russian bilingual childcare in Leipzig; ERP service providers with offices in Vienna and London; researchers working on multimedia topics in Eastern Europe. The information is available on the Web, but opaque to current Web search. Solution: complement text on Web pages with structured linked open data, and intelligently combine/integrate such structured information from different sources.
  16. How to get there?
  17. Tim Berners-Lee's 5-star plan for an open web of data: (1) make data available on the Web under an open license; (2) make it available as structured data; (3) use a non-proprietary format; (4) use URIs to identify things; (5) link your data to other people's data to provide context.
  18. The 0th star: a data catalog with good metadata; make your data findable.
  19. Data on the Web, Open License
  20. Data on the Web, Open License. Open vs. closed: data used to be closed by default; in the future, it may be open by default.
  21. Data on the Web, Open License. Publishers: sharing data to make it more visible.
  22. Data on the Web, Open License. E-commerce: sharing data to increase traffic.
  23. Data on the Web, Open License. Community: collaboratively created databases.
  24. Good reasons against opening data: privacy; competitive advantage; producing data and charging for it as a business model; inability to get a license from upstream.
  25. Structured Data. Enabling re-use: delivering data to end users in different forms; combining data with other data; third-party analysis of data.
  26. Structured Data Formats. Good for re-use/structured: MS Excel, CSV, XML, JSON, Microdata. Not so good for re-use: pure websites, MS Word. Bad for re-use: PDF. Really bad for re-use: only charts/maps without numbers.
  28. Non-Proprietary Formats. Specialist tools often have specialist formats: few people have the tools, they are expensive, and the data is difficult to re-use (geospatial tools, statistics packages, etc.). Non-proprietary: CSV (dead simple), XML, JSON, RDF (good for 4+5 stars).
  29. URIs as Identifiers
  30. URIs as Identifiers
  31. URIs as Identifiers. URI design: prefer stable, implementation-independent URIs.
  32. URIs as Identifiers. Turning local identifiers into URIs. Why? To make them globally unique, clarify authority, make them resolvable and make them linkable.
  33. Links to Other Data. Hyperlinks are the soul of the Web. The Web of Data is no different.
  36. Summary. Linked Data principles: (1) Use URIs to name things (not only documents, but also people, locations, concepts, etc.). (2) To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs. (3) When someone looks up a URI, provide useful information (structured data in RDF, SPARQL). (4) Include links to other URIs, allowing agents to discover more things. 5-star data: a five-star plan for realising an emerging web of data, dataset by dataset: 2 stars: re-usable data; 3 stars: open standards; 4+5 stars: connect data silos.
  38. DBpedia. A community effort to extract structured information from Wikipedia and to make this information available on the Web. It allows users to ask sophisticated queries against Wikipedia and to link other data sets on the Web to Wikipedia data. Semi-structured wiki markup → structured information.
  39. Wikipedia Limitations. Simple questions that are hard to answer with Wikipedia: What do Innsbruck and Leipzig have in common? Who are the mayors of central European towns elevated more than 1000 m? Which movies star both Brad Pitt and Angelina Jolie? All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants.
  40. Structure in Wikipedia: title, abstract, infoboxes, geo-coordinates, categories, images, links (to other language versions, to other Wikipedia pages, to the Web), redirects, disambiguation, ...
  41. DBpedia Information Extraction Framework (DIEF). Started in 2007; hosted on SourceForge and GitHub; initially written in PHP, then fully re-written in Scala and Java; around 40 contributors (see https://www.ohloh.net/p/dbpedia for a detailed overview). It can potentially be adapted to other MediaWikis, currently Wiktionary (http://wiktionary.dbpedia.org).
  42. DIEF - Overview
  43. DIEF - Raw Infobox Extractor. WikiText syntax:
      {{Infobox Korean settlement
      |title = Busan Metropolitan City
      ...
      |area_km2 = 763.46
      |pop = 3635389
      |region = [[Yeongnam]]
      }}
      RDF serialization:
      dbp:Busan dbp:title "Busan Metropolitan City" .
      dbp:Busan dbp:area_km2 "763.46"^^xsd:float .
      dbp:Busan dbp:pop "3635389"^^xsd:int .
      dbp:Busan dbp:region dbp:Yeongnam .
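The raw extraction step can be approximated in a few lines of Python. This is a simplified sketch, not the DIEF implementation (which is written in Scala and handles units, nested templates and much more): it assumes one `|key = value` parameter per line, and its datatype guesses (integer → `xsd:int`, decimal → `xsd:float`, `[[...]]` wiki link → resource) are illustrative heuristics.

```python
import re

INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+\.\d+")
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_infobox(resource, wikitext, prefix="dbp:"):
    """Turn `|key = value` infobox lines into Turtle-like triples."""
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skip the template name, closing braces, comments, ...
        key, _, value = line[1:].partition("=")
        key, value = key.strip(), value.strip()
        if not value:
            continue
        link = LINK_RE.fullmatch(value)
        if INT_RE.fullmatch(value):
            obj = f'"{value}"^^xsd:int'
        elif FLOAT_RE.fullmatch(value):
            obj = f'"{value}"^^xsd:float'
        elif link:
            # [[Target|label]] -> resource named after the link target
            obj = prefix + link.group(1).strip().replace(" ", "_")
        else:
            obj = '"' + value.replace('"', '\\"') + '"'
        triples.append(f"{prefix}{resource} {prefix}{key} {obj} .")
    return triples
```

Applied to the Busan infobox (one parameter per line), `extract_infobox("Busan", wikitext)` yields the four triples shown in the RDF serialization above.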
  44. DIEF - Raw Infobox Extractor/Diversity
  45. DIEF - Raw Infobox Extractor/Diversity
  46. DIEF - Mapping-Based Infobox Extractor. Cleaner data: combine what belongs together (birth_place, birthplace); separate what is different (bornIn, birthplace); correct handling of datatypes. Mappings Wiki: http://mappings.dbpedia.org. Everybody can contribute new mappings or improve existing ones (≈ 170 editors).
  47. DIEF - Mapping-Based Infobox Extractor
  48. URI/IRI schemes. http://{lang.}dbpedia.org is the main domain. For every article there exists a DBpedia resource of the form http://{lang.}dbpedia.org/resource/{ArticleName}. Properties from the raw infobox extractor use the http://{lang.}dbpedia.org/property/ namespace. The ontology is global for all languages and uses the http://dbpedia.org/ontology/ namespace. Note that no language code is used for English: http://dbpedia.org is the main domain, http://dbpedia.org/resource/{title} is used for articles and http://dbpedia.org/property/{title} for properties.
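The scheme can be captured in a small Python helper. It is an illustrative sketch with hypothetical function names that encodes only the rules stated above (no language code for English, a shared ontology namespace) plus the assumption that spaces in article titles become underscores, as in Wikipedia page names.

```python
def dbpedia_namespaces(lang):
    """Namespaces for a Wikipedia language code, following the scheme above."""
    host = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    return {
        "resource": f"http://{host}/resource/",
        "property": f"http://{host}/property/",
        # the ontology is shared by all language editions
        "ontology": "http://dbpedia.org/ontology/",
    }


def resource_uri(lang, article_title):
    """Resource IRI for an article; spaces become underscores (assumption)."""
    return dbpedia_namespaces(lang)["resource"] + article_title.replace(" ", "_")
```

For instance, `resource_uri("de", "Leipzig")` gives `http://de.dbpedia.org/resource/Leipzig`, while `resource_uri("en", "Leipzig")` gives `http://dbpedia.org/resource/Leipzig`.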
  49. Linked Data Publication via 303 Redirects. http://dbpedia.org/resource/Dresden: URI of the city of Dresden. http://dbpedia.org/page/Dresden: information resource describing the city of Dresden in HTML format. http://dbpedia.org/data/Dresden: information resource describing the city of Dresden in RDF/XML format. Further formats are supported, e.g. http://dbpedia.org/data/Dresden.n3 for N3.
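The redirect logic behind these three URIs can be sketched as a pure Python function. It is a simplification with a hypothetical name: it only checks whether the Accept header mentions an HTML media type, whereas a production server performs full content negotiation.

```python
def dereference(resource_uri, accept):
    """Answer a GET on a /resource/ URI with 303 See Other and a document URI."""
    marker = "/resource/"
    if marker not in resource_uri:
        raise ValueError(f"not a resource URI: {resource_uri}")
    # Simplification: any mention of an HTML type selects the HTML page.
    wants_html = "text/html" in accept or "application/xhtml+xml" in accept
    document = "/page/" if wants_html else "/data/"
    return 303, resource_uri.replace(marker, document, 1)
```

The client then issues a second GET on the returned location, which is the extra round-trip listed as the drawback of 303 URIs.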
  50. DBpedia Links (data set | predicate | count | tool):
      Amsterdam Museum | owl:sameAs | 627 | S
      BBC Wildlife Finder | owl:sameAs | 444 | S
      Book Mashup | rdf:type, owl:sameAs | 9 100 |
      Bricklink | dc:publisher | 10 100 |
      CORDIS | owl:sameAs | 314 | S
      Dailymed | owl:sameAs | 894 | S
      DBLP Bibliography | owl:sameAs | 196 | S
      DBTune | owl:sameAs | 838 | S
      Diseasome | owl:sameAs | 2 300 | S
      Drugbank | owl:sameAs | 4 800 | S
      EUNIS | owl:sameAs | 3 100 | S
      Eurostat (Linked Stats) | owl:sameAs | 253 | S
      Eurostat (WBSG) | owl:sameAs | 137 |
      CIA World Factbook | owl:sameAs | 545 | S
  51. DBpedia Links (data set | predicate | count | tool):
      flickr wrappr | dbp:hasPhotoCollection | 3 800 000 | C
      Freebase | owl:sameAs | 3 600 000 | C
      GADM | owl:sameAs | 1 900 |
      GeoNames | owl:sameAs | 86 500 | S
      GeoSpecies | owl:sameAs | 16 000 | S
      GHO | owl:sameAs | 196 | L
      Project Gutenberg | owl:sameAs | 2 500 | S
      Italian Public Schools | owl:sameAs | 5 800 | S
      LinkedGeoData | owl:sameAs | 103 600 | S
      LinkedMDB | owl:sameAs | 13 800 | S
      MusicBrainz | owl:sameAs | 23 000 |
      New York Times | owl:sameAs | 9 700 |
      OpenCyc | owl:sameAs | 27 100 | C
      OpenEI (Open Energy) | owl:sameAs | 678 | S
  52. DBpedia Links (data set | predicate | count | tool):
      Revyu | owl:sameAs | 6 |
      Sider | owl:sameAs | 2 000 | S
      TCMGeneDIT | owl:sameAs | 904 |
      UMBEL | rdf:type | 896 400 |
      US Census | owl:sameAs | 12 600 |
      WikiCompany | owl:sameAs | 8 300 |
      WordNet | dbp:wordnet_type | 467 100 |
      YAGO2 | rdf:type | 18 100 000 |
      Sum | | 27 211 732 |
      (S: Silk, L: LIMES, C: custom script, missing: no regeneration)
  53. DBpedia Links - Query Example. Compare funding per year (from FTS) and country with the gross domestic product of a country (from DBpedia):
      SELECT * {
        { SELECT ?ftsyear ?dbpcountry (SUM(?amount) AS ?funding) {
            ?com rdf:type fts-o:Commitment .
            ?com fts-o:year ?year .
            ?year rdfs:label ?ftsyear .
            ?benefit fts-o:detailAmount ?amount .
            ?benefit fts-o:beneficiary ?beneficiary .
            ?beneficiary fts-o:country ?ftscountry .
            ?ftscountry owl:sameAs ?dbpcountry .
          } GROUP BY ?ftsyear ?dbpcountry }
        { SELECT ?dbpcountry ?gdpyear ?gdpnominal {
            ?dbpcountry rdf:type dbo:Country .
            ?dbpcountry dbp:gdpNominal ?gdpnominal .
            ?dbpcountry dbp:gdpNominalYear ?gdpyear .
          } }
        FILTER (?ftsyear = str(?gdpyear))
      }
  54. Infrastructure. DBpedia has two extraction modes: extraction based on Wikipedia database dumps, and DBpedia Live synchronisation (more later). DBpedia dumps: the DBpedia dump archive is located at http://downloads.dbpedia.org/; the latest downloads are described at http://dbpedia.org/Downloads. Official endpoint (by OpenLink): http://dbpedia.org/sparql
  55. Query Answering. Back to our Wikipedia questions: What do Innsbruck and Leipzig have in common? Who are the mayors of central European towns elevated more than 1000 m? Which movies star both Brad Pitt and Angelina Jolie? All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants. Using the data extracted from Wikipedia and the public SPARQL endpoint, DBpedia can answer these questions.
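To pose one of these questions programmatically, a client sends a SPARQL query to the public endpoint using the standard SPARQL protocol. The Python sketch below only builds the request; it assumes that films are linked to actors via `dbo:starring` in the DBpedia ontology, and the function name is illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # the official endpoint named above

# Assumption: dbo:starring links a film to its actors in the DBpedia ontology.
QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
  ?film dbo:starring dbr:Brad_Pitt , dbr:Angelina_Jolie .
}"""


def build_request(query, endpoint=ENDPOINT):
    """Build (without sending) a SPARQL protocol GET request for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Sending it with `urllib.request.urlopen(build_request(QUERY))` and parsing the body with `json.load` yields the bindings for `?film`; the answer depends on the live data behind the endpoint.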
  56. DBpedia Live. DBpedia dumps are generated on a bi-annual basis, while Wikipedia receives around 100,000 to 150,000 page edits per day. DBpedia Live pulls page updates in real time, and the extraction results update the triple store. In practice, a 5-minute update delay increases performance by 15%. Links: SPARQL endpoint: http://live.dbpedia.org/sparql; documentation: http://wiki.dbpedia.org/DBpediaLive; statistics: http://live.dbpedia.org/LiveStats/
  57. DBpedia Live - Overview
  58. DBpedia Internationalization (I18n). A DBpedia Internationalization Committee has been founded: http://wiki.dbpedia.org/Internationalization. DBpedia language editions are available in Korean, Greek, German, Polish, Russian, Dutch, Portuguese, Spanish, Italian, Japanese and French; each uses the corresponding Wikipedia language edition as input. Mappings are available for 23 languages.
  59. DBpedia I18n - Overview
  60. Applications: Disambiguation. Named entity recognition and disambiguation, with tools such as DBpedia Spotlight, AlchemyAPI, Semantic API, OpenCalais, Zemanta and Apache Stanbol.
  61. Applications: Question Answering. DBpedia is the primary target for several QA systems in the Question Answering over Linked Data (QALD) workshop series. IBM Watson also relied on DBpedia.
  62. Applications: Faceted Browsing: Neofonie Browser, gFacet, OpenLink faceted browser (fct).
  63. Applications: Search and Querying: Query Builder, RelFinder, SemLens.
  64. Applications: Digital Libraries and Archives. The Virtual International Authority File (VIAF) project as Linked Data: VIAF added a total of 250,000 reciprocal authority links to Wikipedia. DBpedia can also provide context information for bibliographic and archive records (e.g. an author's demographics, a film's homepage, an image, etc.) and stable, curated identifiers for linking. The broad range of Wikipedia topics can form the basis for a thesaurus for subject indexing.
  65. Applications: DBpedia Mobile. DBpedia Mobile is a location-centric DBpedia client application for mobile devices, consisting of a map view, the Marbles Linked Data Browser and a GPS-enabled launcher application.
  66. Applications: DBpedia Wiktionary. Wiktionary is a Wikimedia project (http://wiktionary.org) covering 171 languages, with 3M words for English. It is extracted using the DBpedia Information Extraction Framework, which is easily configurable for every Wiktionary language edition and pre-configured for German, Greek, English, Russian and French. http://wiktionary.dbpedia.org: 100 million triples, using the Lemon model.
  67. Other Applications. See http://wiki.dbpedia.org/Applications for a more complete list.
  69. Linked Data - Achievements and Challenges. Achievements: (1) extension of the Web with a data commons (50B facts); (2) a vibrant, global RTD community; (3) industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly, NY Times, Facebook, Google, Yahoo); (4) governmental adoption in sight; (5) establishing Linked Data as a deployment path for the Semantic Web. Challenges: (1) Coherence: relatively few, expensively maintained links; (2) Quality: partly low-quality data and inconsistencies; (3) Performance: still substantial penalties compared to relational data management; (4) Data consumption: large-scale processing, schema mapping and data fusion are still in their infancy; (5) Usability: missing direct end-user tools and network effect. These issues are closely related and should ultimately lead to an ecosystem of interlinked knowledge!
  70. Figure: Linked Data Lifecycle (Extraction; Storage/Querying; Manual revision/Authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration).
  71. Extraction
  72. Extraction. From unstructured sources: formats: plain text; methods: NLP, text mining, ontology learning. From semi-structured sources: formats: wiki markup, tags; tools: DBpedia framework (Wikipedia, Wiktionary). From structured sources: formats: databases, spreadsheets, XML; RDB2RDF tools: Sparqlify, D2R, Triplify; CSV converters: RDF extension of Google Refine.
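The basic idea behind CSV-to-RDF conversion can be illustrated with Python's `csv` module. This is not how Sparqlify, D2R or the Google Refine RDF extension work internally; it is a minimal sketch in which the base URI, the `resource/` and `property/` paths and the one-subject-per-row mapping are all assumptions, and every value is emitted as a plain literal in N-Triples.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text, base, key_column):
    """Map each row to a subject URI built from `key_column`; other cells become triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{base}resource/{quote(row[key_column])}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # the key is the subject; empty cells produce no triple
            predicate = f"<{base}property/{quote(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples
```

With the hypothetical input `"id,name\nc1,Example City\n"` and base `http://example.org/`, the result is the single triple `<http://example.org/resource/c1> <http://example.org/property/name> "Example City" .`. A real mapping would additionally assign datatypes and reuse existing vocabularies instead of minting a property per column.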
  73. Extraction Challenges. From unstructured sources: improve the F-measure of existing NLP approaches (OpenCalais, Ontos API); develop standardized, LOD-enabled interfaces between NLP tools (NLP2RDF). From semi-structured sources: efficient bi-directional synchronization. From structured sources: declarative syntax and semantics of data model transformations (W3C WG RDB2RDF). Orthogonal challenges: using LOD as background knowledge; provenance.
  75. RDF Data Management. SPARQL/RDF access is still slower than relational data management by a factor of 2-10, but performance increases steadily. Comprehensive, well-supported open-source and commercial implementations are available: OpenLink's Virtuoso (open source + commercial); OWLIM-Lite (free), OWLIM-SE, OWLIM-Enterprise; Talis (hosted); Bigdata (distributed); AllegroGraph (commercial); Mulgara (open source).
  76. Storage and Querying Challenges: reduce the performance gap between relational and RDF data management; SPARQL query extensions for spatial/semantic/temporal data management; view maintenance/adaptive reorganization based on common access patterns; more realistic benchmarks.
  77. Authoring
  78. Authoring. Integrated in existing environments: Tiki. Data-oriented: RDFauthor, rdfEditor. Schema-oriented: Protégé, TopBraid Composer, NeOn Toolkit, Swoop, Neologism, Knoodl.
Authoring: Semantic Wikis
1 Semantic (text) wikis
Authoring of semantically annotated texts
Semantic MediaWiki, KiWi, (Wikipedia+DBpedia)
2 Semantic data wikis
Direct authoring of structured information (i.e. RDF, RDF Schema, OWL)
OntoWiki
Interlinking
The Data Web is an uncontrolled environment: equivalent or similar entities proliferate, so links or merging are needed
Currently, only few RDF triples are links
Manual link discovery: Sindice integration, LODStats, Semantic Pingback
Tool-supported / semi-automatic: SILK, LIMES, COMA, RDF-AI
Usually via mapping specifications / heuristics
Machine learning / automatic: RAVEN, EAGLE, SILK GP
Interlinking Challenges
Apply work from the de-duplication/record linkage literature
Consider the open-world nature of Linked Data
Use LOD background knowledge
Zero-configuration linking
Explore active learning approaches, which integrate users in a feedback loop
Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (http://latc-project.eu/)
Enrichment
Currently, there is a lack of knowledge bases with sophisticated schema information and instance data adhering to this schema
Goal: powerful reasoning, consistency checking and querying
Manual: via ontology editors, DBpedia mappings
(Semi-)automatic: DL-Learner, Statistical Schema Induction
Enrichment: Example
Given: a knowledge base with the property birthPlace (i.e. triples using that property) but no information on the semantics of birthPlace
Possible enrichment:
ObjectProperty: birthPlace
  Characteristics: Functional
  Domain: Person
  Range: Place
  SubPropertyOf: hasBeenAt
Benefits:
Axioms serve as documentation for the purpose and correct usage of schema elements
Additional implicit information can be inferred
They improve the applicability of schema debugging techniques
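The following minimal Python sketch illustrates what the enriched axioms add. Domain, range and SubPropertyOf let new triples be inferred, and the Functional characteristic makes conflicting birth places detectable. The resource names (:Alice, :Leipzig, ...) are hypothetical, and this is plain rule application rather than an OWL reasoner. The conflict check also assumes unique names, whereas OWL semantics alone would conclude owl:sameAs.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical encoding of the enrichment axioms for birthPlace from the slide.
AXIOMS = {
    "birthPlace": {
        "functional": True,
        "domain": "Person",
        "range": "Place",
        "super": "hasBeenAt",
    },
}


def infer(triples: List[Triple]) -> Set[Triple]:
    """Apply the domain, range and SubPropertyOf axioms to the given triples."""
    inferred: Set[Triple] = set()
    for s, p, o in triples:
        ax = AXIOMS.get(p)
        if ax is None:
            continue
        inferred.add((s, "rdf:type", ax["domain"]))  # Domain: Person
        inferred.add((o, "rdf:type", ax["range"]))   # Range: Place
        inferred.add((s, ax["super"], o))            # SubPropertyOf: hasBeenAt
    return inferred


def functional_violations(triples: List[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Subjects with several distinct values for a functional property.

    Assumes unique names (distinct IRIs denote distinct places); plain OWL
    semantics would instead infer that the values are owl:sameAs.
    """
    values: Dict[Tuple[str, str], Set[str]] = {}
    for s, p, o in triples:
        if AXIOMS.get(p, {}).get("functional"):
            values.setdefault((s, p), set()).add(o)
    return {key: objs for key, objs in values.items() if len(objs) > 1}


if __name__ == "__main__":
    data = [
        (":Alice", "birthPlace", ":Leipzig"),
        (":Bob", "birthPlace", ":Dresden"),
        (":Bob", "birthPlace", ":Berlin"),
    ]
    for triple in sorted(infer(data)):
        print(triple)
    print(functional_violations(data))
```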
Repair
Ontology debugging: OWL reasoning to detect inconsistencies and unsatisfiable classes, and to detect the most likely sources of the problems
Basic task: provide feedback to the user for resolving undesired entailments
A justification J ⊆ O of an entailment is a minimal set of axioms from which the entailment can be drawn
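The sketch below makes the notion of a justification concrete using a common black-box "shrink" strategy. It starts from the whole ontology and drops every axiom whose removal keeps the entailment. As a simplifying assumption, the toy ontology contains only atomic subclass axioms (A ⊑ B), and entailment is graph reachability. A real debugger would call an OWL reasoner instead. Because entailment is monotonic, a single removal pass already yields a minimal set. The class names in the example are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Set, Tuple

Axiom = Tuple[str, str]  # (A, B) stands for the subclass axiom A ⊑ B


def entails(ontology: Set[Axiom], goal: Axiom) -> bool:
    """Toy reasoner: A ⊑ B holds iff B is reachable from A via subclass axioms."""
    start, target = goal
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for sub, sup in ontology:
            if sub == current and sup not in seen:
                seen.add(sup)
                queue.append(sup)
    return False


def justification(ontology: Set[Axiom], goal: Axiom) -> FrozenSet[Axiom]:
    """Black-box shrink: remove axioms one by one while the goal stays entailed."""
    if not entails(ontology, goal):
        raise ValueError("goal is not entailed by the ontology")
    kept: Set[Axiom] = set(ontology)
    for axiom in sorted(ontology):  # fixed order for reproducible results
        kept.discard(axiom)
        if not entails(kept, goal):
            kept.add(axiom)  # the axiom is needed, so put it back
    return frozenset(kept)


if __name__ == "__main__":
    onto = {("Student", "Person"), ("Person", "Agent"),
            ("Agent", "Thing"), ("Student", "Learner")}
    print(sorted(justification(onto, ("Student", "Thing"))))
    # [('Agent', 'Thing'), ('Person', 'Agent'), ('Student', 'Person')]
```

When several justifications exist, this strategy returns one of them, and which one depends on the removal order.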
Linked Data Quality Analysis
Quality on the Data Web varies a lot
Hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. knowledge bases extracted from text or Web 2.0 sources (DBpedia)
Quality = fitness for use
Often it is not necessary to fix all problems, but to know about them
30+ quality dimensions defined in a recent survey
Research challenge
Establish measures for assessing the authority, provenance and reliability of Data Web resources
Evolution
(Image © CC-BY-SA by alasis on flickr)
KB Evolution
Tasks:
Performing knowledge base changes / refactoring
Ensuring consistency of related knowledge
Managing changes, e.g. undo operations
Updating materialized inferred data upon changes
Updating materialized links to other data upon changes
Tools:
Protégé: PROMPT and change management plugins
EvoPat: easily reusable and sharable evolution patterns defined via SPARQL
PatOMat: ontology transformation framework
Exploration
RDF data can be complex (as discussed by Pascal Hitzler)
The exploration phase aims to make data accessible to non-experts
Options:
Faceted browsing
Question answering
Query builders
Visualisation of statistical or geospatial data
...
Catalogus Professorum Lipsiensis
Visual Query Builder
Relationship Finder in CPL
(Figure: the Linked Data Lifecycle with the stages Extraction; Storage/Querying; Manual revision/Authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration)
Make the Web a Linked Data Washing Machine
Tool Support for the Life-Cycle?
Many Semantic Web tools support one or more life-cycle stages
The Linked Data Stack (http://stack.linkeddata.org) provides a consolidated repository of such tools
Each tool is a Debian package
Lightweight integration between tools via common vocabularies and SPARQL
Demonstrator interfaces for showing tools in combination
Developed by the LOD2 and GeoKnow EU projects
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extraction
5 Data Integration / Linking
6 Enrichment
7 Repair
8 Knowledge Base Exploration / Querying
(Figure: Linked Data Lifecycle)
Knowledge Extraction
Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources.
The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must facilitate inferencing
Similar to Information Extraction (NLP) and ETL (data warehouse); the main difference is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema
Requires the re-use of existing formal knowledge (reusing ontologies) or the generation of a schema based on the source data
Categorisation of Approaches
Source: examples are plain text, relational databases, XML, CSV
Exposition: How is the extracted knowledge made explicit? How can you query and perform inference?
Synchronization: Is the knowledge extraction process executed once to produce a dump, or is the result synchronized with the source? Are changes to the result written back (bi-directional)?
Reuse of vocabularies: Can popular ontologies (GoodRelations, FOAF, ...) be re-used to simplify global data integration?
Automatisation: manual, semi-automatic, automatic
Domain ontology required: Does the approach require a pre-defined ontology, or can it create a schema from the source (e.g. ontology learning)?
Extraction from Structured Sources to RDF
Simple mappings from RDB tables/views to RDF
Direct mapping of the relational database model to RDF:
Table → OWL class
Row → instance s of this class
Cell with value o in column p → triple (s, p, o)
Details: http://www.w3.org/TR/rdb-direct-mapping/
Complex mappings of relational databases to RDF
Additional refinements can be applied to the 1:1 mapping to improve the usefulness of the RDF output
Extract or learn an OWL schema from the given database schema
Map the schema and its contents to a pre-existing domain ontology
Powerful mapping languages: R2RML, SML
XML
The XML tree structure can be directly converted to an RDF graph structure
Complex mappings are possible, e.g. via XSLT processors
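The three direct-mapping rules above (table → class, row → instance, cell → triple) can be sketched in a few lines of Python. This is a simplified illustration, not the W3C algorithm. The IRI scheme (base/table for the class, base/table/pk=value for rows, base/table#column for properties) only loosely follows the W3C Direct Mapping, literals are left untyped, and NULL cells produce no triple. The table and column names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def literal(value: object) -> str:
    """Untyped literal with backslashes and quotes escaped (simplification)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def direct_mapping(base: str, table: str, pk: str,
                   rows: List[Dict[str, Optional[object]]]) -> List[Triple]:
    """Map one table to triples following the three rules on the slide."""
    cls = f"<{base}{table}>"                          # table -> class
    triples: List[Triple] = []
    for row in rows:
        s = f"<{base}{table}/{pk}={row[pk]}>"         # row -> instance s
        triples.append((s, RDF_TYPE, cls))
        for column, value in row.items():
            if value is None:                         # NULL cell -> no triple
                continue
            p = f"<{base}{table}#{column}>"           # column -> property p
            triples.append((s, p, literal(value)))    # cell o -> (s, p, o)
    return triples


if __name__ == "__main__":
    rows = [{"id": 1, "name": "Leipzig"}, {"id": 2, "name": None}]
    for t in direct_mapping("http://example.org/", "city", "id", rows):
        print(" ".join(t), ".")
```

Real tools extend this with typed literals, foreign-key links between rows, and (for R2RML or SML) user-defined IRI templates.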
Extraction from Natural Language Sources
80% of the information in business documents is in unstructured natural language [1]
(-) Increased complexity and decreased quality of extraction
(+) Potential for a massive acquisition of extracted knowledge
Traditional Information Extraction (IE)
Recognize and categorise elements in text
Techniques: Named Entity Recognition (NER), Coreference Resolution (CO), ...
Ontology Learning (OL) from text
Learn whole ontologies from natural language text
Usually (semi-)automatically extracted
[1] Wimalasuriya, Dou. Ontology-based information extraction: [. . . ] Journal of Information Science
LinkedGeoData + Sparqlify
Example: LinkedGeoData, a knowledge extraction project using Sparqlify
Structure:
Motivation
OpenStreetMap
LGD Architecture
Mapping
Access (how LinkedGeoData is published)
Use Cases
Motivation
Ease information integration tasks that require spatial knowledge, such as:
Offerings of bakeries next door
Map of distributed branches of a company
Historical sights along a bicycle track
The LOD cloud contains datasets with spatial features, e.g. Geonames, DBpedia, US census, EuroStat
But: they are restricted to popular or large entities such as countries and famous places, or to specific regions
Therefore, they lack buildings, roads, mailboxes, etc.
OpenStreetMap: Data Model
Basic entities are:
Nodes: latitude, longitude
Ways: sequence of nodes
Relations: associations between any number of nodes, ways and relations; every member in a relation plays a certain role
Each entity may be described with tags (= key-value pairs)
A way is closed if the ID of its last referenced node equals that of the first one.
Whether a closed way denotes a linear ring or a polygon (i.e. whether the enclosed area is part of the respective OSM entity) depends on the tags.
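The OSM data model described above maps naturally onto a few Python dataclasses. This is only a sketch of the three entity types and the closed-way test. The class and field names are assumptions, not an OSM library API. The ring-versus-polygon decision is left out because it depends on tag conventions not covered here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)  # key-value pairs


@dataclass
class Way:
    id: int
    node_ids: List[int]  # ordered sequence of referenced node IDs
    tags: Dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        """Closed iff the last referenced node ID equals the first one.

        Degenerate ways with fewer than two node references are treated as
        not closed (an assumption for this sketch).
        """
        return len(self.node_ids) >= 2 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    type: str  # "node", "way" or "relation"
    ref: int   # ID of the referenced entity
    role: str  # role the member plays in the relation


@dataclass
class Relation:
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


if __name__ == "__main__":
    ring = Way(1, [10, 11, 12, 10], {"leisure": "park"})
    street = Way(2, [10, 11, 12], {"highway": "residential"})
    print(ring.is_closed(), street.is_closed())  # True False
```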
Example: Leipzig's Zoo
Comparison: Leipzig's Zoo (OpenStreetMap)
Comparison: Leipzig's Zoo (GoogleMaps)
LGD Architecture
Tag Mappings
Key-value pairs are assigned to RDF resources
Each pair (k, v) can be annotated with datatypes, language tags, classes
Mappings are themselves tables
Example table lgd_map_literal:
k | property | lang
name | rdfs:label |
name:en | rdfs:label | en
alt_label | skos:altLabel |
note | rdfs:comment |
... | ... | ...
View Definition
RDF mapping of the data from a PostgreSQL database:
Create View lgd_nodes As
  Construct {
    ?n a lgdm:Node .
    ?n geom:geometry ?g .
    ?g ogc:asWKT ?o .
  }
  With
    ?n = uri(lgd:node, ?id)
    ?g = uri(lgd-geom:node, ?id)
    ?o = typedLiteral(?geom, ogc:wktLiteral)
  From
    nodes
Sparqlify
SPARQL-SQL rewriter
Rewrites SPARQL queries according to the view definitions
Platform module offers a SPARQL endpoint and a Linked Data interface
https://github.com/AKSW/Sparqlify
REST API
Offers REST methods for frequent queries
Based on the SPARQL (Virtuoso) endpoint
Downloads
RDF dataset for download
Generated using Construct { ?s ?p ?o }
http://downloads.linkedgeodata.org
Ontology
Enriched classes and properties with multilingual labels from TranslateWiki (http://translatewiki.net)
Imported icons for 90 classes from the freely available icon collection by SJJB Management (http://www.sjjb.co.uk/mapicons/)
SML Mapping Examples
The following slides demonstrate how to map relational data to RDF with the Sparqlification Mapping Language (SML). They use these prefixes:
prefix | IRI
rdfs | http://www.w3.org/2000/01/rdf-schema#
ogc | http://www.opengis.net/ont/geosparql#
geom | http://geovocab.org/geometry#
lgd | http://linkedgeodata.org/triplify/
lgd-geom | http://linkedgeodata.org/geometry/
SML - Mapping Example I: The Goal (1/4)
Input table nodes:
id | geom
1 | POINT(0 0)
2 | POINT(1 1)
How can tables be mapped to RDF?
How can the distinction between feature and geometry, which is common in GIS, be introduced?
Desired RDF output:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
...
lgd:node1 geom:geometry lgd-geom:node1 .
lgd:node2 geom:geometry lgd-geom:node2 .
lgd-geom:node1 ogc:asWKT "POINT(0 0)"^^ogc:wktLiteral .
lgd-geom:node2 ogc:asWKT "POINT(1 1)"^^ogc:wktLiteral .
SML - Mapping Example I: SML Syntax Outline (2/4)
(Input table nodes and desired RDF output as in step 1/4.)
Create View myNodesView As
  Construct { ... }
  With ...
  From ...
SML - Mapping Example I: Construct and From (3/4)
(Input table nodes and desired RDF output as in step 1/4.)
Create View myNodesView As
  Construct {
    ?n geom:geometry ?g .
    ?g ogc:asWKT ?o
  }
  With ...
  From
    nodes
SML - Mapping Example I: Complete! (4/4)
Input table nodes:
id | geom
1 | POINT(0 0)
2 | POINT(1 1)
SML view:
Create View myNodesView As
  Construct {
    ?n geom:geometry ?g .
    ?g ogc:asWKT ?o
  }
  With
    ?n = uri(lgd:node, ?id)
    ?g = uri(lgd-geom:node, ?id)
    ?o = typedLiteral(?geom, ogc:wktLiteral)
  From
    nodes
Desired RDF output:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
...
lgd:node1 geom:geometry lgd-geom:node1 .
lgd:node2 geom:geometry lgd-geom:node2 .
lgd-geom:node1 ogc:asWKT "POINT(0 0)"^^ogc:wktLiteral .
lgd-geom:node2 ogc:asWKT "POINT(1 1)"^^ogc:wktLiteral .
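The Python sketch below shows what the three With-expressions compute by evaluating the complete view on the two input rows. It only emulates the semantics under stated assumptions. uri(...) is modelled as concatenation of the expanded namespace and the id, and typedLiteral(...) as a quoted lexical form with a datatype IRI. This is not Sparqlify's implementation, and the helper names are hypothetical.

```python
from typing import List, Tuple

PREFIXES = {
    "geom": "http://geovocab.org/geometry#",
    "ogc": "http://www.opengis.net/ont/geosparql#",
    "lgd": "http://linkedgeodata.org/triplify/",
    "lgd-geom": "http://linkedgeodata.org/geometry/",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as lgd:node into a full IRI string."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def uri(*parts: object) -> str:
    """Emulates uri(lgd:node, ?id): concatenate the parts into one IRI."""
    return "<" + "".join(str(p) for p in parts) + ">"


def typed_literal(value: str, datatype: str) -> str:
    """Emulates typedLiteral(?geom, ogc:wktLiteral)."""
    return f'"{value}"^^<{datatype}>'


def my_nodes_view(rows: List[Tuple[int, str]]) -> List[str]:
    """Evaluate Construct/With/From of myNodesView over the table nodes(id, geom)."""
    triples = []
    for node_id, wkt in rows:                                 # From nodes
        n = uri(expand("lgd:node"), node_id)                  # ?n
        g = uri(expand("lgd-geom:node"), node_id)             # ?g
        o = typed_literal(wkt, expand("ogc:wktLiteral"))      # ?o
        triples.append(f"{n} <{expand('geom:geometry')}> {g} .")
        triples.append(f"{g} <{expand('ogc:asWKT')}> {o} .")
    return triples


if __name__ == "__main__":
    for line in my_nodes_view([(1, "POINT(0 0)"), (2, "POINT(1 1)")]):
        print(line)
```

The output contains the same four triples as the desired RDF output, written as N-Triples and grouped per row instead of per predicate.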
SML Mapping Examples
A more complex example, which demonstrates the use of an SQL mapping table and an SQL helper view.
SML - Mapping Example II: The Goal (1/8)
Input table node_tags:
id | k | v
1 | name | Universitaet Leipzig
1 | name:en | University of Leipzig
1 | amenity | university
1 | addr:street | Augustusplatz
1 | addr:city | Leipzig
Desired RDF output:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lgd: <http://linkedgeodata.org/triplify/> .
lgd:node1 rdfs:label "Universitaet Leipzig" .
lgd:node1 rdfs:label "University of Leipzig"@en .
SML - Mapping Example II: Source Data (2/8)
OSM table node_tags (id, k, v), with the rows shown in step 1/8
SML - Mapping Example II: Mapping Table (3/8)
OSM table node_tags (as in step 1/8) and RDF mapping table lgd_map_literal:
k | property | lang
name | rdfs:label |
name:en | rdfs:label | en
alt_label | skos:altLabel |
note | rdfs:comment |
... | ... | ...
SML - Mapping Example II: Helper View (4/8)
The OSM table node_tags (as in step 1/8) is joined with the RDF mapping table lgd_map_literal (as in step 3/8) into the helper view lgd_node_tags_literal:
id | property | v | lang
1 | rdfs:label | Universitaet Leipzig |
1 | rdfs:label | University of Leipzig | en
... | ... | ... | ...
SELECT id, property, v, lang
FROM node_tags, lgd_map_literal
WHERE node_tags.k = lgd_map_literal.k
SML - Mapping Example II: SML View (8/8)
Logical table lgd_node_tags_literal:
id | property | v | lang
1 | rdfs:label | Univ. L. |
1 | rdfs:label | Univ. of L. | en
... | ... | ... | ...
SML view:
Create View lgd_node_tags_text As
  Construct {
    ?s ?p ?o .
  }
  With
    ?s = uri(lgd:node, ?id)
    ?p = uri(?property)
    ?o = plainLiteral(?v, ?lang)
  From
    lgd_node_tags_literal
Resulting RDF:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lgd: <http://linkedgeodata.org/triplify/> .
lgd:node1 rdfs:label "Universitaet Leipzig" .
lgd:node1 rdfs:label "University of Leipzig"@en .
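The Python sketch below retraces Example II end to end. It performs the helper view's join of node_tags with lgd_map_literal as a nested-loop equi-join, then evaluates the three With-expressions of lgd_node_tags_text. Several assumptions apply. An empty lang cell is modelled as None, plainLiteral without a language yields an untagged literal, and the skos namespace IRI is added for the mapping rows. The function names are hypothetical and do not correspond to Sparqlify's API. Keys without a row in lgd_map_literal, such as amenity, drop out of the join and produce no literal triples here.

```python
from typing import List, Optional, Tuple

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "lgd": "http://linkedgeodata.org/triplify/",
}

# node_tags(id, k, v): the OSM source table from the slides
NODE_TAGS = [
    (1, "name", "Universitaet Leipzig"),
    (1, "name:en", "University of Leipzig"),
    (1, "amenity", "university"),
    (1, "addr:street", "Augustusplatz"),
    (1, "addr:city", "Leipzig"),
]

# lgd_map_literal(k, property, lang): None stands for an empty lang cell
MAP_LITERAL = [
    ("name", "rdfs:label", None),
    ("name:en", "rdfs:label", "en"),
    ("alt_label", "skos:altLabel", None),
    ("note", "rdfs:comment", None),
]

Row = Tuple[int, str, str, Optional[str]]


def expand(curie: str) -> str:
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def helper_view(tags, mapping) -> List[Row]:
    """SELECT id, property, v, lang FROM node_tags, lgd_map_literal
    WHERE node_tags.k = lgd_map_literal.k"""
    return [(node_id, prop, v, lang)
            for node_id, k, v in tags
            for mk, prop, lang in mapping
            if k == mk]


def plain_literal(value: str, lang: Optional[str]) -> str:
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def lgd_node_tags_text(rows: List[Row]) -> List[str]:
    triples = []
    for node_id, prop, v, lang in rows:
        s = f"<{expand('lgd:node')}{node_id}>"  # ?s = uri(lgd:node, ?id)
        p = f"<{expand(prop)}>"                 # ?p = uri(?property)
        o = plain_literal(v, lang)              # ?o = plainLiteral(?v, ?lang)
        triples.append(f"{s} {p} {o} .")
    return triples


if __name__ == "__main__":
    for line in lgd_node_tags_text(helper_view(NODE_TAGS, MAP_LITERAL)):
        print(line)
```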
Further Tag Mappings
lgd_map_datatype:
k | datatype
seats | integer
unisex | boolean
lgd_map_property:
k | property
website | foaf:homepage
lgd_map_resource_k:
k | property | object
highway | rdf:type | lgdo:HighwayThing
lgd_map_resource_kv:
k | v | property | object
waterway | river | rdf:type | lgdo:River
LGD Edit Tool
Multi-user tag mapping web app
Resources
Sparqlify: http://sparqlify.org
LinkedGeoData: http://linkedgeodata.org
Tag mappings: https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sql/Mappings.sql
SML view definitions: https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sml/LinkedGeoData-Triplify-IndividualViews.sml
Statistics (15 August 2013)
The complete OSM planet file corresponds to ~20,000,000,000 triples
Virtual access via Sparqlify
Downloads are limited to selected classes:
292,780,188 triples
153,613,243 triples of nodes
139,166,945 triples of ways
Relations are not yet available for download
Among them:
532,812 PlaceOfWorship
82,788 RailwayStation
72,091 Toilets
71,613 Town
19,937 City
Access
Materialized SPARQL endpoint (based on Virtuoso DB, with the download datasets loaded)
http://linkedgeodata.org/sparql
http://linkedgeodata.org/snorql
Virtual SPARQL endpoint (based on Sparqlify; access to 20B triples, limited SPARQL 1.0 support)
http://linkedgeodata.org/vsparql
http://linkedgeodata.org/vsnorql
REST interface (based on the virtual SPARQL endpoint)
Supports limited queries (e.g. circular/rectangular area, filtering by labels)
Downloads
http://downloads.linkedgeodata.org
Monthly updates of the above datasets are envisioned
Use Cases
Augmented Reality
Use Cases
Generic Browsing
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extraction
5 Data Integration / Linking
6 Enrichment
7 Repair
8 Knowledge Base Exploration / Querying
(Figure: Linked Data Lifecycle)
Why Link Discovery?
1 Fourth Linked Data principle
2 Links are central for:
Cross-ontology QA
Data integration
Reasoning
Federated queries
...
3 2011 topology of the LOD Cloud:
31+ billion triples
≈ 0.5 billion links
owl:sameAs in most cases
Why is it difficult?
1 Time complexity
Large number of triples
Quadratic a-priori runtime
69 days for mapping cities from DBpedia to Geonames (1 ms per comparison)
Decades for linking DBpedia and LGD
...
Definition (Link Discovery)
Given sets S and T of resources and a relation R
Task: find M = {(s, t) ∈ S × T : R(s, t)}
Common approaches:
Find M = {(s, t) ∈ S × T : σ(s, t) ≥ θ} (similarity σ of at least θ)
Find M = {(s, t) ∈ S × T : δ(s, t) ≤ θ} (distance δ of at most θ)
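The distance-based definition translates directly into the naive nested loop below. The loop makes the quadratic cost visible, because every one of the |S||T| pairs is compared. The one-dimensional points and the absolute-difference distance are hypothetical toy data. At the slide's 1 ms per comparison, the returned comparison count times 1 ms gives the runtime estimate behind figures such as "69 days".

```python
from typing import Callable, Sequence, Set, Tuple, TypeVar

R = TypeVar("R")


def naive_link_discovery(
    source: Sequence[R],
    target: Sequence[R],
    distance: Callable[[R, R], float],
    theta: float,
) -> Tuple[Set[Tuple[int, int]], int]:
    """Return M = {(s, t) in S x T : distance(s, t) <= theta} as index pairs,
    together with the number of comparisons performed (always |S| * |T|)."""
    links: Set[Tuple[int, int]] = set()
    comparisons = 0
    for i, s in enumerate(source):
        for j, t in enumerate(target):
            comparisons += 1
            if distance(s, t) <= theta:
                links.add((i, j))
    return links, comparisons


if __name__ == "__main__":
    S = [0.0, 5.0, 10.0]
    T = [0.5, 9.0, 20.0]
    M, c = naive_link_discovery(S, T, lambda a, b: abs(a - b), theta=1.0)
    print(sorted(M), c)  # [(0, 0), (2, 1)] 9
    print(f"estimated runtime at 1 ms per comparison: {c * 1e-3:.3f} s")
```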
Why is it difficult?
2 Complexity of specifications
Combination of several attributes required for high precision
Tedious discovery of the most adequate mapping
Dataset-dependent similarity functions
LIMES Framework
Runtime Optimization
Reduce the number of comparisons C(A) ≥ |M| (assuming we need all σ/θ values for links)
Maximize the reduction ratio: $RR(A) = 1 - \frac{C(A)}{|S||T|}$
Question
Can we devise lossless approaches with guaranteed RR?
Advantages
Space management
Runtime prediction
Resource scheduling
RR Guarantee
Best achievable reduction ratio: $RR_{max} = 1 - \frac{|M|}{|S||T|}$
Approach H(α) fulfills the RR guarantee criterion iff $\forall r < RR_{max}, \exists \alpha : RR(H(\alpha)) \geq r$
Here, we use the relative reduction ratio (RRR): $RRR(A) = \frac{RR_{max}}{RR(A)}$
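The three ratios can be computed directly from their definitions. In the hypothetical run used below, |S| = |T| = 1000, |M| = 500 and an approach needs C(A) = 10,000 comparisons. This gives RR(A) = 0.99, RR_max = 0.9995, and an RRR slightly above 1. An optimal lossless approach would reach RRR = 1.

```python
def reduction_ratio(comparisons: int, size_s: int, size_t: int) -> float:
    """RR(A) = 1 - C(A) / (|S||T|)."""
    return 1.0 - comparisons / (size_s * size_t)


def max_reduction_ratio(num_links: int, size_s: int, size_t: int) -> float:
    """RR_max = 1 - |M| / (|S||T|); no lossless approach beats it since C(A) >= |M|."""
    return 1.0 - num_links / (size_s * size_t)


def relative_reduction_ratio(comparisons: int, num_links: int,
                             size_s: int, size_t: int) -> float:
    """RRR(A) = RR_max / RR(A); 1 means optimal, larger values mean wasted comparisons."""
    if comparisons < num_links:
        raise ValueError("a lossless approach needs C(A) >= |M| comparisons")
    rr = reduction_ratio(comparisons, size_s, size_t)
    if rr <= 0.0:
        raise ValueError("RR(A) = 0 (all pairs compared), so RRR is undefined")
    return max_reduction_ratio(num_links, size_s, size_t) / rr


if __name__ == "__main__":
    # Hypothetical run: |S| = |T| = 1000, |M| = 500, C(A) = 10000 comparisons.
    print(reduction_ratio(10_000, 1000, 1000))                # about 0.99
    print(relative_reduction_ratio(10_000, 500, 1000, 1000))  # about 1.0096
```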
Goal
Formal goal: devise H(α) such that $\forall r > 1, \exists \alpha : RRR(H(\alpha)) \leq r$
Restrictions
Minkowski distance: $\delta(s, t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}, \quad p \geq 2$
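For reference, a direct Python transcription of the Minkowski distance. The p ≥ 2 restriction applies to HR3. The function itself accepts any p ≥ 1 (the range in which δ is a metric). p = 2 gives the Euclidean distance.

```python
from typing import Sequence


def minkowski(s: Sequence[float], t: Sequence[float], p: float = 2.0) -> float:
    """delta(s, t) = (sum_i |s_i - t_i|^p)^(1/p)."""
    if len(s) != len(t):
        raise ValueError("s and t must have the same dimension n")
    if p < 1:
        raise ValueError("p < 1 does not yield a metric")
    return sum(abs(si - ti) ** p for si, ti in zip(s, t)) ** (1.0 / p)


if __name__ == "__main__":
    print(minkowski((0.0, 0.0), (3.0, 4.0)))       # 5.0 (Euclidean, p = 2)
    print(minkowski((0.0, 0.0), (3.0, 4.0), p=3))  # cube root of 91
```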
Space Tiling
HYPPO
δ(s, t) ≤ θ describes a hypersphere
Approximate the hypersphere by using hypercubes
Easy to compute
No loss of recall (blocking)
Space Tiling
Set the width of a single hypercube to Δ = θ/α
Tile Ω = S ∪ T into the adjacent cubes C
Coordinates: $(c_1, \ldots, c_n) \in \mathbb{N}^n$
C contains the points $\omega \in \Omega : \forall i \in \{1, \ldots, n\},\ c_i \Delta \leq \omega_i < (c_i + 1)\Delta$
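The tiling step can be sketched as bucketing every point by its cube coordinates $c_i = \lfloor \omega_i / \Delta \rfloor$. This choice of c_i satisfies $c_i \Delta \leq \omega_i < (c_i + 1)\Delta$. It is a minimal illustration, not LIMES code. math.floor also covers negative coordinates, which the ℕ^n on the slide implicitly excludes. With floating-point input, points lying exactly on a cube boundary may land in either neighbouring cube.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cube = Tuple[int, ...]


def cube_of(point: Sequence[float], delta: float) -> Cube:
    """Cube coordinates c with c_i * delta <= omega_i < (c_i + 1) * delta."""
    return tuple(math.floor(x / delta) for x in point)


def tile(points: Sequence[Sequence[float]], theta: float, alpha: int) -> Dict[Cube, List[int]]:
    """Group point indices by the hypercube of width delta = theta / alpha they fall into."""
    delta = theta / alpha
    cubes: Dict[Cube, List[int]] = defaultdict(list)
    for idx, point in enumerate(points):
        cubes[cube_of(point, delta)].append(idx)
    return dict(cubes)


if __name__ == "__main__":
    pts = [(0.2, 1.3), (0.7, 0.1), (0.9, 0.4)]
    print(tile(pts, theta=1.0, alpha=2))  # {(0, 2): [0], (1, 0): [1, 2]}
```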
HYPPO
Combine the $(2\alpha + 1)^n$ hypercubes around C(ω) to approximate the hypersphere
$RRR(HYPPO(\alpha)) = \frac{(2\alpha + 1)^n}{\alpha^n S(n)}$, where S(n) is the volume of the n-dimensional unit hypersphere of δ
$\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{2^n}{S(n)}$
HYPPO
RRR(HYPPO) for p = 2, n = 2, 3, 4 and 2 ≤ α ≤ 50
$\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{4}{\pi} \approx 1.27 \quad (n = 2)$
$\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{6}{\pi} \approx 1.91 \quad (n = 3)$
$\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{32}{\pi^2} \approx 3.24 \quad (n = 4)$
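The limits above follow from the HYPPO formula once S(n) is known. For p = 2, S(n) is the volume of the Euclidean unit n-ball, π^(n/2) / Γ(n/2 + 1). The Python check below evaluates the formula and its limit and reproduces 1.27, 1.91 and 3.24. It also shows that RRR(HYPPO(α)) decreases with α but stays above the limit, so HYPPO alone cannot reach RRR = 1.

```python
import math


def unit_ball_volume(n: int) -> float:
    """S(n) for p = 2: volume of the n-dimensional unit ball, pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def rrr_hyppo(alpha: float, n: int) -> float:
    """RRR(HYPPO(alpha)) = (2 alpha + 1)^n / (alpha^n S(n))."""
    return (2 * alpha + 1) ** n / (alpha ** n * unit_ball_volume(n))


def rrr_hyppo_limit(n: int) -> float:
    """lim_{alpha -> infinity} RRR(HYPPO(alpha)) = 2^n / S(n)."""
    return 2 ** n / unit_ball_volume(n)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, round(rrr_hyppo_limit(n), 2))
    # 2 1.27
    # 3 1.91
    # 4 3.24
```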
HR3: Idea
$index(C, \omega) = \begin{cases} 0 & \text{if } \exists i : |c_i - c(\omega)_i| \leq 1,\ 1 \leq i \leq n, \\ \sum_{i=1}^{n} \left(|c_i - c(\omega)_i| - 1\right)^p & \text{else} \end{cases}$
HR3: Idea
Compare C(ω) with C iff $index(C, \omega) \leq \alpha^p$
(Figure: α = 4, p = 2)
HR3: Idea
Lemma
$\forall s \in S : index(C, s) > \alpha^p$ implies that all t ∈ C are non-matches
Claims
No loss of recall
$\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
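The HR3 filtering idea can be sketched in Python by combining the tiling, the index function and the pruning test index(C, s) > α^p. Several simplifications apply. Points are plain tuples, α is an integer, and for every source point the sketch scans all non-empty target cubes instead of enumerating only the cubes within the index bound, as an efficient implementation would. The names are hypothetical, not LIMES API. The test compares the result against brute force to illustrate the "no loss of recall" claim on random data.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
Cube = Tuple[int, ...]


def cube_of(point: Sequence[float], delta: float) -> Cube:
    """c(omega): coordinates of the hypercube of width delta that contains omega."""
    return tuple(math.floor(x / delta) for x in point)


def index(cube: Cube, omega_cube: Cube, p: int) -> int:
    """index(C, omega) as defined on the slide, with omega_cube = c(omega)."""
    diffs = [abs(c - w) for c, w in zip(cube, omega_cube)]
    if any(d <= 1 for d in diffs):
        return 0
    return sum((d - 1) ** p for d in diffs)


def minkowski(s: Sequence[float], t: Sequence[float], p: int) -> float:
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1.0 / p)


def hr3_links(source: List[Point], target: List[Point], theta: float,
              alpha: int, p: int) -> Tuple[Set[Tuple[int, int]], int]:
    """Return M = {(i, j) : delta(s_i, t_j) <= theta} and the number of comparisons."""
    delta = theta / alpha
    buckets: Dict[Cube, List[int]] = defaultdict(list)
    for j, t in enumerate(target):
        buckets[cube_of(t, delta)].append(j)
    links: Set[Tuple[int, int]] = set()
    comparisons = 0
    for i, s in enumerate(source):
        c_s = cube_of(s, delta)
        for cube, members in buckets.items():
            if index(cube, c_s, p) > alpha ** p:
                continue  # by the lemma, no t in this cube can be a match
            for j in members:
                comparisons += 1
                if minkowski(s, target[j], p) <= theta:
                    links.add((i, j))
    return links, comparisons


if __name__ == "__main__":
    rng = random.Random(0)
    S = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    T = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    links, comparisons = hr3_links(S, T, theta=1.0, alpha=4, p=2)
    print(len(links), comparisons, len(S) * len(T))
```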
HR3: Lemma 3
Lemma: $\forall \alpha > 1 : RRR(HR3(2\alpha)) < RRR(HR3(\alpha))$
(Figure: p = 2, α = 4)
HR3: Proof
(Figures: the same lemma illustrated for p = 2 with α = 8, 25 and 50)
HR3: Idea
Theorem
$\lim_{\alpha \to \infty} RRR(HR3(\alpha)) = 1$
Claims
No loss of recall
HR3: Experiments
Compare HR3 with LIMES 0.5's HYPPO and SILK 2.5.1
Experimental setup:
Deduplicating DBpedia places by minimum elevation, elevation and maximum elevation (θ = 49 m, 99 m)
Linking Geonames and LinkedGeoData by longitude and latitude (θ = 1°, 9°)
64-bit computer with a 2.8 GHz i7 processor and 8 GB RAM
HR3: Experiments (Comparisons)
Experiment 2: deduplicating DBpedia places, θ = 99 m
0.64 × 10^6 fewer comparisons
HR3: Experiments (Comparisons)
Experiment 4: linking Geonames and LinkedGeoData, θ = 9°
4.3 × 10^6 fewer comparisons
HR3: Experiments (Runtime)
Experiments 1, 2: DBpedia, θ = 49 m, 99 m
Experiments 3, 4: Geonames and LGD, θ = 1°, 9°
(Figure: runtime in seconds on a log scale from 10^0 to 10^4 for HR3, HYPPO and SILK in Exp. 1 to 4)
HR3: Summary
Mission
New category of algorithms for link discovery
Presented HR3
Link discovery in affine spaces with Minkowski measures
Outperforms the state of the art (runtime, comparisons)
Optimal reduction ratio
Integrated in LIMES
Learning Complex Specifications
Supervised (mostly active, e.g. RAVEN, EAGLE, SILK)
Unsupervised (e.g. KnoFuss, EUCLID, EAGLE)

Learning Complex Specifications
Insight: choosing the right examples is key for learning
So far, only informativeness has been used for this choice
Question: can we do better by using more information?
  → Higher F-measure
  → Often slower

Basic Idea
Use the similarity of link candidates when selecting the most informative examples (intra- and inter-class similarity)

Similarity of Candidates
A link candidate x = (s, t) can be regarded as the vector $(\sigma_1(x), \ldots, \sigma_n(x)) \in [0, 1]^n$.
Similarity of link candidates x and y:
$$\mathrm{sim}(x, y) = \frac{1}{1 + \sqrt{\sum_{i=1}^{n} \left(\sigma_i(x) - \sigma_i(y)\right)^2}} \qquad (1)$$
This allows exploiting both intra- and inter-class similarity.
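
A direct Python transcription of Equation (1), reading the sum under the square root as the Euclidean distance between the two similarity vectors. The function name and the example measure values are hypothetical.

```python
import math
from typing import Sequence


def candidate_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """sim(x, y) = 1 / (1 + Euclidean distance of the similarity vectors).

    x and y describe link candidates as vectors (sigma_1(.), ..., sigma_n(.))
    of similarity values in [0, 1], one entry per measure.
    """
    if len(x) != len(y):
        raise ValueError("both candidates must use the same n measures")
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + distance)


# Two hypothetical candidates scored with two measures (e.g. label and date similarity).
print(candidate_similarity((0.6, 0.2), (0.3, 0.6)))  # distance 0.5, so about 0.667
```

Identical candidates get similarity 1, and the value decreases towards 0 as their vectors move apart, which makes it usable as an edge weight in the clustering step.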

Graph Clustering
Rationale: use intra-class similarity
Approach:
  Cluster the elements of S+ and S− independently
  Choose one element per cluster as its representative
  Present the oracle with the most informative representatives
[Figure: similarity graphs over S+ and S−; link candidates a–l connected by edges weighted with their similarity (e.g. 0.9, 0.8, 0.25)]

BorderFlow
G = (V, E, ω) with V = S+ or V = S−
ω(x, y) = sim(x, y)
Keep only the best ec edges for each x ∈ V

BorderFlow
Seed-based algorithm
Goal: maximize the border flow ratio
$$bf(X) = \frac{\Omega(b(X), X)}{\Omega(b(X), n(X))}$$
where b(X) is the border of X (the nodes of X with neighbours outside X), n(X) is the set of neighbours of X outside X, and Ω(X, Y) is the total weight of the edges between X and Y
http://sourceforge.net/projects/cugar-framework/
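
The border flow ratio can be computed directly from the usual BorderFlow definitions: b(X) are the nodes of X with neighbours outside X, n(X) the nodes outside X adjacent to X, and Ω(X, Y) the summed edge weight between X and Y. The Python sketch below does so for an undirected weighted graph stored as nested dictionaries and grows a cluster greedily from a seed. This greedy loop is a simplification chosen for illustration, not the exact procedure of the cugar framework, and the node names and weights only loosely mirror the figure.

```python
def omega(graph, xs, ys):
    """Ω(X, Y): summed weight of the edges between nodes of xs and nodes of ys."""
    return sum(graph.get(x, {}).get(y, 0.0) for x in xs for y in ys)


def border(graph, cluster):
    """b(X): nodes of X with at least one neighbour outside X."""
    return {x for x in cluster if any(v not in cluster for v in graph.get(x, {}))}


def neighbours(graph, cluster):
    """n(X): nodes outside X that are adjacent to some node of X."""
    return {v for x in cluster for v in graph.get(x, {}) if v not in cluster}


def border_flow_ratio(graph, cluster):
    """bf(X) = Ω(b(X), X) / Ω(b(X), n(X)); infinite if nothing flows out."""
    b = border(graph, cluster)
    outflow = omega(graph, b, neighbours(graph, cluster))
    if outflow == 0:
        return float("inf")
    return omega(graph, b, cluster) / outflow


def grow_cluster(graph, seed):
    """Greedy simplification: repeatedly add the neighbour that most increases bf(X)."""
    cluster = {seed}
    while True:
        best, best_ratio = None, border_flow_ratio(graph, cluster)
        for v in sorted(neighbours(graph, cluster)):
            ratio = border_flow_ratio(graph, cluster | {v})
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            return cluster
        cluster.add(best)


def undirected(edges):
    """Build a symmetric adjacency dict from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


# Two dense groups joined by one weak edge (made-up weights).
G = undirected([("a", "b", 0.8), ("b", "c", 0.9), ("a", "c", 0.8), ("c", "d", 0.25),
                ("d", "e", 0.8), ("e", "f", 0.9), ("d", "f", 0.8)])
print(sorted(grow_cluster(G, "a")))  # ['a', 'b', 'c']
```

The weak 0.25 edge keeps the two groups apart; one representative per resulting cluster would then be shown to the oracle.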

Conclusion
Can be combined with arbitrary active learning ML algorithms
Was experimentally combined with EAGLE (genetic programming) and RAVEN (linear classifier) and shown to outperform the plain informativeness function in terms of F-measure
The choice of examples is important to minimise user effort
Longer runtimes (up to 2×)
Contact me for detailed experimental results

Summary
Linking is a crucial task in the Web of Data
Two key problems:
  1 Efficient execution of link specifications
  2 Creation of link specifications
Presented HR3 to handle the first problem
Presented COALA as a building block for the second problem

Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extraction
5 Data Integration / Linking
6 Enrichment
7 Repair
8 Knowledge Base Exploration / Querying
[Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]

Motivation
Rise in the availability and usage of knowledge bases
Still a lack of knowledge bases that consist of sophisticated schema information and instance data adhering to this schema, e.g.:
  in the life sciences, several knowledge bases consist only of schema information
  other knowledge bases are, to a large extent, collections of facts without a clear structure (e.g. information extracted from databases)
A combination of sophisticated schema and instance data would allow powerful reasoning, consistency checking, and improved querying
→ create schemata based on existing data

Example
    dbr:Brad_Pitt       :birthPlace dbr:Shawnee\,_Oklahoma ; a :Person .
    dbr:Angela_Merkel   :birthPlace dbr:Hamburg ; a :Person .
    dbr:Albert_Einstein :birthPlace dbr:Ulm ; a :Person .
    dbr:Shawnee\,_Oklahoma a :Place .
    dbr:Ulm a :Place .
    dbr:Hamburg a :Place .
Suggestions for birthPlace:
    ObjectProperty: birthPlace
        Characteristics: Functional
        Domain: Person
        Range: Place
        SubPropertyOf: hasBeenAt
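
To make the idea concrete, the following Python sketch derives the three data-driven suggestions (functional, domain, range) from the example triples with a deliberately naive rule: an axiom is suggested only if every example agrees with it. The triple encoding and function names are our own; the scoring shown later in this section replaces this all-or-nothing rule, and a suggestion such as SubPropertyOf: hasBeenAt cannot be derived from these triples alone.

```python
# Example triples from above; prefixed names are kept as plain strings.
TRIPLES = [
    ("dbr:Brad_Pitt", ":birthPlace", "dbr:Shawnee,_Oklahoma"),
    ("dbr:Brad_Pitt", "a", ":Person"),
    ("dbr:Angela_Merkel", ":birthPlace", "dbr:Hamburg"),
    ("dbr:Angela_Merkel", "a", ":Person"),
    ("dbr:Albert_Einstein", ":birthPlace", "dbr:Ulm"),
    ("dbr:Albert_Einstein", "a", ":Person"),
    ("dbr:Shawnee,_Oklahoma", "a", ":Place"),
    ("dbr:Ulm", "a", ":Place"),
    ("dbr:Hamburg", "a", ":Place"),
]


def types_of(triples, resource):
    """Asserted classes of a resource (objects of its 'a' triples)."""
    return {o for s, p, o in triples if s == resource and p == "a"}


def suggest_axioms(triples, prop):
    """Suggest Functional / Domain / Range for prop if all examples agree."""
    pairs = {(s, o) for s, p, o in triples if p == prop}
    if not pairs:
        return []
    subjects = [s for s, _ in pairs]
    suggestions = []
    if len(subjects) == len(set(subjects)):  # no subject has two different values
        suggestions.append("Characteristics: Functional")
    domain = set.intersection(*(types_of(triples, s) for s in subjects))
    range_ = set.intersection(*(types_of(triples, o) for _, o in pairs))
    suggestions += [f"Domain: {c}" for c in sorted(domain)]
    suggestions += [f"Range: {c}" for c in sorted(range_)]
    return suggestions


print(suggest_axioms(TRIPLES, ":birthPlace"))
# ['Characteristics: Functional', 'Domain: :Person', 'Range: :Place']
```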

Benefits of an Expressive Schema
Axioms serve as documentation for the purpose and correct usage of schema elements
Additional implicit information can be inferred
Improved query optimisation
Improved or newly enabled application of schema debugging techniques

Each person was only born at one place?!

[Figure: a subject linked via birthPlace to two objects; with "birthPlace is functional", the two objects must be equal (=)]
Query for subjects with more than one birthPlace value:
    SELECT DISTINCT ?s WHERE {
      ?s dbo:birthPlace ?o1 .
      ?s dbo:birthPlace ?o2 .
      FILTER(?o1 != ?o2)
    }
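
The same check can be run over triples held in memory. The Python sketch below mirrors the SPARQL query; the example triples are hypothetical. Note that OWL makes no unique name assumption, so two different IRIs as values of a functional property do not by themselves cause an inconsistency (the objects are inferred to be the same); the hits are candidates for manual inspection.

```python
from collections import defaultdict


def functional_violations(triples, prop):
    """Subjects with at least two distinct values for prop (sorted).

    Equivalent to: SELECT DISTINCT ?s WHERE { ?s prop ?o1 . ?s prop ?o2 .
    FILTER(?o1 != ?o2) }
    """
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return sorted(s for s, objects in values.items() if len(objects) > 1)


# Hypothetical data: ex:Person_X has two different birth places.
DATA = [
    ("ex:Person_X", "dbo:birthPlace", "ex:City_A"),
    ("ex:Person_X", "dbo:birthPlace", "ex:City_B"),
    ("ex:Person_Y", "dbo:birthPlace", "ex:City_A"),
    ("ex:Person_Y", "dbo:birthPlace", "ex:City_A"),  # repeated value, not a violation
]
print(functional_violations(DATA, "dbo:birthPlace"))  # ['ex:Person_X']
```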

Where was Julia Nannie Wallace born?

Julia Nannie Wallace was born in Lacrosse?

No, Julia Nannie Wallace was born in La Crosse!

[Figure: a birthPlace link whose object has rdf:type Sport; the axiom "birthPlace range Place" adds rdf:type Place, and "Place disjointWith Sport" turns this into a contradiction; in the corrected version the object is a City]
Query for birthPlace values whose types are disjoint with Place or one of its subclasses:
    SELECT ?s ?place WHERE {
      ?s dbo:birthPlace ?place .
      ?place rdf:type/rdfs:subClassOf* ?type1 .
      ?type2 rdfs:subClassOf* dbo:Place .
      ?type1 owl:disjointWith ?type2 .
    }
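
The following Python sketch reproduces the reasoning behind the figure on in-memory data: the range axiom adds Place (and its superclasses) to the types of each birthPlace value, and a clash is reported when two of the resulting types are declared disjoint. Unlike the query above, it also uses the type inferred from the range. The class hierarchy, the disjointness declaration and the corrected resource dbr:La_Crosse,_Wisconsin are illustrative assumptions.

```python
def superclass_closure(classes, superclasses):
    """All classes reachable via rdfs:subClassOf, including the start classes."""
    seen, stack = set(), list(classes)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(superclasses.get(c, ()))
    return seen


def range_disjointness_clashes(triples, prop, range_class, disjoint, superclasses):
    """(subject, object, class1, class2) for every prop value whose types clash."""
    clashes = []
    for s, p, o in triples:
        if p != prop:
            continue
        asserted = {t for s2, p2, t in triples if s2 == o and p2 == "rdf:type"}
        inferred = superclass_closure(asserted | {range_class}, superclasses)
        for c1 in sorted(inferred):
            for c2 in sorted(inferred):
                if c1 < c2 and frozenset((c1, c2)) in disjoint:
                    clashes.append((s, o, c1, c2))
    return clashes


SUPERCLASSES = {"dbo:City": ["dbo:Place"]}  # simplified hierarchy
DISJOINT = {frozenset(("dbo:Place", "dbo:Sport"))}

WRONG = [("dbr:Julia_Nannie_Wallace", "dbo:birthPlace", "dbr:Lacrosse"),
         ("dbr:Lacrosse", "rdf:type", "dbo:Sport")]
FIXED = [("dbr:Julia_Nannie_Wallace", "dbo:birthPlace", "dbr:La_Crosse,_Wisconsin"),
         ("dbr:La_Crosse,_Wisconsin", "rdf:type", "dbo:City")]

print(range_disjointness_clashes(WRONG, "dbo:birthPlace", "dbo:Place", DISJOINT, SUPERCLASSES))
# [('dbr:Julia_Nannie_Wallace', 'dbr:Lacrosse', 'dbo:Place', 'dbo:Sport')]
```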

3 Steps to Get a Schema
Input: entity URI, axiom type, knowledge base (SPARQL endpoint)
3-phase enrichment learning approach:
  1. Obtain schema information from the SPARQL endpoint → background knowledge (only executed once per knowledge base)
  2. Obtain axiom type and entity specific data (sample data if necessary) → background knowledge + relevant instance data; a reasoner can optionally be invoked
  3. Run the machine learning algorithm (learner: DL-Learner) → list of axiom suggestions + metadata, stored in the enrichment ontology
Iterate over all axiom types and schema entities for full enrichment

Starting Point
SPARQL endpoint: http://dbpedia.org/sparql
Entity URI: http://dbpedia.org/ontology/author
Axiom type: object property domain

Step 1 - Obtaining Schema Information
    CONSTRUCT WHERE { ?sub rdfs:subClassOf ?sup . }
    ORDER BY DESC(?sub) LIMIT 1000 OFFSET 1000
Result (excerpt):
    dbo:Disease     rdfs:subClassOf owl:Thing .
    dbo:Book        rdfs:subClassOf dbo:WrittenWork .
    dbo:WrittenWork rdfs:subClassOf dbo:Work .
    dbo:Work        rdfs:subClassOf owl:Thing .
    dbo:Philosopher rdfs:subClassOf dbo:Person .
    dbo:Person      rdfs:subClassOf dbo:Agent .
    dbo:Agent       rdfs:subClassOf owl:Thing .
    dbo:Sport       rdfs:subClassOf dbo:Activity .
    dbo:Activity    rdfs:subClassOf owl:Thing .
    dbo:Fish        rdfs:subClassOf dbo:Animal .
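
The LIMIT/OFFSET pattern above retrieves the class hierarchy page by page. A Python sketch of the paging loop follows; the query runner passed in is a hypothetical, caller-supplied function that sends a query string to the endpoint and returns the resulting triples, replaced here by an offline fake. As an assumption on our side, ?sup is added to the ORDER BY so that the order, and thus the paging, is fully determined.

```python
SUBCLASS_PAGE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT WHERE {{ ?sub rdfs:subClassOf ?sup . }}
ORDER BY DESC(?sub) DESC(?sup) LIMIT {limit} OFFSET {offset}"""


def fetch_subclass_axioms(run_query, limit=1000):
    """Collect all rdfs:subClassOf triples, one page of `limit` results at a time.

    Stops as soon as a page returns fewer than `limit` triples.
    """
    triples, offset = [], 0
    while True:
        page = run_query(SUBCLASS_PAGE.format(limit=limit, offset=offset))
        triples.extend(page)
        if len(page) < limit:
            return triples
        offset += limit


# Offline stand-in for an endpoint: 2500 made-up triples, sliced by LIMIT/OFFSET.
FAKE_TRIPLES = [(f"ex:C{i}", "rdfs:subClassOf", "owl:Thing") for i in range(2500)]
CALLS = []


def fake_endpoint(query):
    CALLS.append(query)
    limit = int(query.split("LIMIT ")[1].split()[0])
    offset = int(query.split("OFFSET ")[1].split()[0])
    return FAKE_TRIPLES[offset:offset + limit]


print(len(fetch_subclass_axioms(fake_endpoint)), "triples in", len(CALLS), "requests")
# 2500 triples in 3 requests
```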

Step 2 - Obtain Axiom Type and Entity Specific Data
    SELECT ?type (COUNT(DISTINCT ?s) AS ?cnt) WHERE {
      ?s dbo:author ?o .
      ?s a ?type .
    } GROUP BY ?type ORDER BY DESC(?cnt)

    type                  cnt
    owl:Thing             30284
    dbo:Work              30284
    schema:CreativeWork   30284
    dbo:WrittenWork       25730
    dbo:Book              24673
    schema:Book           24673
    dbo:TelevisionShow     2567
    dbo:Play               1057
    ...                     ...

Relevant instance data is then fetched in chunks:
    CONSTRUCT WHERE {
      ?ind dbo:author ?o .
      ?ind a ?type .
    } ORDER BY DESC(?ind) LIMIT 1000 OFFSET 2000
Result (excerpt):
    ...
    dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ;
        rdf:type dbo:Book .
    dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ;
        rdf:type dbo:WrittenWork .
    dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ;
        rdf:type dbo:Book .
    ...

Step 3 - Scoring
    dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ; rdf:type dbo:Book .
    dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ; rdf:type dbo:WrittenWork .
    dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ; rdf:type dbo:Book .
Score(Domain(dbo:author, dbo:Book)) = 2/3 ≈ 66.7%
Score(Domain(dbo:author, dbo:WrittenWork)) = 1/3 ≈ 33.3%
Taking the schema into account:
    dbo:Book rdfs:subClassOf dbo:WrittenWork .
Score(Domain(dbo:author, dbo:WrittenWork)) = 3/3 = 100%
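
The scores above are simple ratios. The Python sketch below computes them for the three example resources, once with only the asserted types and once with the types closed under rdfs:subClassOf; the data structures and the function name are our own.

```python
from fractions import Fraction


def domain_score(instance_types, candidate, superclasses=None):
    """Share of property subjects that are instances of `candidate`.

    instance_types: subject -> set of asserted classes
    superclasses:   class -> direct superclasses (rdfs:subClassOf), optional
    """
    superclasses = superclasses or {}

    def all_types(types):
        seen, stack = set(), list(types)
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(superclasses.get(c, ()))
        return seen

    hits = sum(1 for types in instance_types.values() if candidate in all_types(types))
    return Fraction(hits, len(instance_types))


AUTHOR_SUBJECTS = {
    "dbpedia:The_Adventures_of_Tom_Sawyer": {"dbo:Book"},
    "dbpedia:The_Zombie_Survival_Guide": {"dbo:WrittenWork"},
    "dbpedia:Web_Therapy": {"dbo:Book"},
}
HIERARCHY = {"dbo:Book": ["dbo:WrittenWork"]}

print(domain_score(AUTHOR_SUBJECTS, "dbo:Book"))                    # 2/3
print(domain_score(AUTHOR_SUBJECTS, "dbo:WrittenWork"))             # 1/3
print(domain_score(AUTHOR_SUBJECTS, "dbo:WrittenWork", HIERARCHY))  # 1
```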

Step 3 - Scoring (2)
Problem: the support for an axiom in the knowledge base is not taken into account → no difference between 3 out of 3 and 100 out of 100
Solution: use the average of the bounds of a 95% confidence interval (adjusted Wald method), with s = number of successes and m = total number:
$$\tilde{p} = \frac{s + 2}{m + 4}$$
$$\text{lower} = \max\left(0,\ \tilde{p} - 1.96 \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{m + 4}}\right), \qquad \text{upper} = \min\left(1,\ \tilde{p} + 1.96 \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{m + 4}}\right)$$
$$\text{Score} = \frac{\text{lower} + \text{upper}}{2}$$
Interpretation: in 95% of such intervals, the true value lies between the lower and the upper bound.
Score(Domain(dbo:author, dbo:Book)) ≈ 57.3%
Score(Domain(dbo:author, dbo:WrittenWork)) ≈ 69.1%
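
A Python sketch of the interval-based score, returning the average of the lower and upper bound. By our own calculation, the rounded form with +2 and +4 gives about 57.1% (Book, 2 of 3) and 69.0% (WrittenWork, 3 of 3) for the example, whereas the slide values 57.3% and 69.1% are reproduced when z²/2 ≈ 1.92 successes and z² ≈ 3.84 trials are added instead. The sketch therefore offers both variants through a flag named exact, which is our own naming.

```python
import math


def interval_score(successes: int, total: int, z: float = 1.96, exact: bool = False) -> float:
    """Average of the bounds of an adjusted Wald confidence interval.

    exact=False adds 2 successes and 4 trials (the rounded form);
    exact=True adds z**2 / 2 successes and z**2 trials.
    """
    extra = z * z if exact else 4.0
    n = total + extra
    p = (successes + extra / 2) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    lower = max(0.0, p - half_width)
    upper = min(1.0, p + half_width)
    return (lower + upper) / 2


for s, m in [(2, 3), (3, 3), (100, 100)]:
    print(s, m, round(interval_score(s, m), 3), round(interval_score(s, m, exact=True), 3))
```

With this score, 3 out of 3 clearly ranks below 100 out of 100, which addresses the problem stated above.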

More Complex Axioms
Pattern Based Knowledge Base Enrichment, ISWC 2013

Outlook and Summary
Schemata in the Linked Data Web are often shallow → tools are needed to support knowledge engineers
Showed some techniques for learning OWL axioms on large knowledge bases available as SPARQL endpoints
More complex axioms require:
  OWL-to-SPARQL rewriting, or
  fragment extraction
Small- and medium-sized knowledge bases can be handled via techniques from Inductive Logic Programming
All algorithms are implemented in the DL-Learner framework (http://dl-learner.org)

Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extraction
5 Data Integration / Linking
6 Enrichment
7 Repair
8 Knowledge Base Exploration / Querying
[Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]

Motivation
Increasing number of knowledge bases in the Semantic Web (see e.g. the LOD cloud)
Maintenance of knowledge bases with expressive semantics is challenging

(Automatically) Detectable Ontology Problems
Common problems:
  Syntactic problems
  Structural problems
  Semantic problems (focus of this talk)
Task-based problems:
  Reasoning related problems
  Linked Data related problems

Syntactic Problems
Syntactic errors are mainly violations of the conventions of the language in which the ontology is modelled.
Example (Validity of XML):
    <?xml version="1.0"?>
    <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://www.w3.org/">
        <dc:title>World Wide Web Consortium</dc:title>
    </rdf:RDF>
FatalError: The element type "rdf:Description" must be terminated by the matching end-tag "</rdf:Description>". [Line = 7, Column = 3]
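
Such well-formedness errors can be detected with any XML parser. The Python sketch below runs the standard library's xml.etree.ElementTree on the example document; it checks only XML well-formedness, not RDF/XML semantics, for which an RDF parser would be needed. The variable and function names are our own.

```python
import xml.etree.ElementTree as ET

# The RDF/XML example from above, with the closing </rdf:Description> tag missing.
BROKEN_RDF = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/">
    <dc:title>World Wide Web Consortium</dc:title>
</rdf:RDF>
"""


def syntax_error(document):
    """Return (line, column, message) of the first well-formedness error, or None."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


print(syntax_error(BROKEN_RDF))
```

The parser reports the problem at the line of the unexpected end tag </rdf:RDF>; the exact message and column differ from those of the parser quoted on the slide.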

Structural Problems
Problems in the taxonomy
Example (Circularities): $A \sqsubseteq B,\ B \sqsubseteq C,\ C \sqsubseteq A$
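
Circularities in the asserted class hierarchy can be found with a depth-first search. In OWL such a cycle does not make the ontology inconsistent; it forces A, B and C to be equivalent, which is usually a modelling error. The following Python sketch, with our own function name and a dictionary encoding of direct subclass axioms, returns one cycle if there is any.

```python
def find_subclass_cycle(subclass_of):
    """Return one cycle of the told subclass graph as a list of classes, or None.

    subclass_of maps each class to its direct superclasses (A ⊑ B gives A -> B).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(c):
        colour[c] = GREY
        path.append(c)
        for sup in subclass_of.get(c, ()):
            state = colour.get(sup, WHITE)
            if state == GREY:  # back edge: sup is already on the path
                return path[path.index(sup):] + [sup]
            if state == WHITE:
                cycle = visit(sup)
                if cycle:
                    return cycle
        path.pop()
        colour[c] = BLACK
        return None

    for c in sorted(subclass_of):
        if colour.get(c, WHITE) == WHITE:
            cycle = visit(c)
            if cycle:
                return cycle
    return None


print(find_subclass_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
```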

Reasoning Related Problems
Problems which negatively affect the performance of reasoning over expressive knowledge bases
Example (a named concept is equivalent to an AllValues restriction): $A \equiv \forall r.C$
Reasoning complexity:
  A universal restriction does not require a property value to exist; it only restricts the values of existing property values
  Any concept B whose instances cannot have r-fillers satisfies the restriction, i.e. $B \sqsubseteq \forall r.C$, and therefore becomes a subclass of A
  This typically leads to unintended inferences, and the additional inferences may eventually slow down reasoning
Can be checked via Pellint (part of Pellet)
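
The vacuous satisfaction of a universal restriction can be seen on a small finite interpretation. In the Python sketch below (all individual names are hypothetical), individuals without any r-successor end up in the extension of ∀r.C, so under A ≡ ∀r.C they would also be classified as instances of A.

```python
def forall_extension(domain, role, filler):
    """Extension of ∀role.filler in a finite interpretation.

    domain: set of individuals; role: set of (subject, object) pairs;
    filler: set of individuals that belong to C.
    An individual qualifies if all of its role-successors are in filler,
    which is trivially true when it has no successors at all.
    """
    return {x for x in domain if all(y in filler for (s, y) in role if s == x)}


DOMAIN = {"alice", "bob", "k1", "k2"}
R = {("alice", "k1")}  # only alice has an r-successor
C = {"k1"}

print(sorted(forall_extension(DOMAIN, R, C)))  # ['alice', 'bob', 'k1', 'k2']
```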

Linked Data Related Problems
Problems which are specific to publishing RDF according to the Linked Data principles, e.g.:
  incorrect implementation of content negotiation
  mixing up information and non-information resources

Semantic Problems
Logical contradictions in the underlying knowledge base
Example (Unsatisfiable classes): $\mathcal{O} = \{A \sqsubseteq B \sqcap C,\ C \sqsubseteq \neg B\} \models A \sqsubseteq \bot$
Example (Inconsistent ontology): $\mathcal{O} = \{A \sqsubseteq B \sqcap C,\ C \sqsubseteq \neg B,\ A(x)\} \models \bot$
Usually handled by ontology debugging

Ontology Debugging
Problem: we have undesirable entailments
Solution: repair (delete/modify) the responsible axioms
Question: which axioms?

Justification
Definition (Justification): For an ontology $\mathcal{O}$ and an entailment $\eta$ where $\mathcal{O} \models \eta$, a set of axioms $\mathcal{J}$ is a justification for $\eta$ in $\mathcal{O}$ if $\mathcal{J} \subseteq \mathcal{O}$, $\mathcal{J} \models \eta$, and for every $\mathcal{J}' \subset \mathcal{J}$ it holds that $\mathcal{J}' \not\models \eta$.
Justifications are minimal subsets of an ontology that are sufficient for a given entailment to hold
Synonyms: MUPS (Minimal Unsatisfiability Preserving Sub-TBoxes), MinAs (Minimal Axiom sets), Kernels
Observations:
  there can be multiple justifications for a single entailment
  an axiom can be part of multiple justifications
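
Justification - Example
$$\mathcal{O} = \{\, B \sqsubseteq \exists r.D\ (1),\ \ B \sqsubseteq \forall r.\neg D\ (2),\ \ A \sqsubseteq B \sqcap C\ (3),\ \ B \sqsubseteq \neg C\ (4),\ \ A \sqsubseteq E\ (5),\ \ A \sqsubseteq \neg E \sqcap F\ (6) \,\}$$
$\mathcal{O} \models A \sqsubseteq \bot$ with the justifications
J1 = {1, 2, 3}
J2 = {5, 6}
J3 = {3, 4}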
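
For an example this small, all justifications can be found by brute force. The Python sketch below needs an entailment check; as a stand-in for a DL reasoner such as Pellet, it uses a hand-written toy oracle that is only sound for the six axioms of this example: it propagates the right-hand sides of the axioms for a single individual and looks for a clash such as C together with ¬C, or ∃r.D together with ∀r.¬D. The encoding and all names are our own assumptions.

```python
from itertools import combinations

# Toy encoding: axiom id -> (left-hand class, classes implied for its instances).
AXIOMS = {
    1: ("B", {"some_r.D"}),     # B ⊑ ∃r.D
    2: ("B", {"only_r.notD"}),  # B ⊑ ∀r.¬D
    3: ("A", {"B", "C"}),       # A ⊑ B ⊓ C
    4: ("B", {"notC"}),         # B ⊑ ¬C
    5: ("A", {"E"}),            # A ⊑ E
    6: ("A", {"notE", "F"}),    # A ⊑ ¬E ⊓ F
}


def unsatisfiable(cls, axiom_ids):
    """Toy oracle for 'axiom_ids ⊨ cls ⊑ ⊥', sound only for this example."""
    facts, changed = {cls}, True
    while changed:
        changed = False
        for i in axiom_ids:
            lhs, rhs = AXIOMS[i]
            if lhs in facts and not rhs <= facts:
                facts |= rhs
                changed = True
    atomic_clash = any("not" + f in facts for f in facts)
    role_clash = {"some_r.D", "only_r.notD"} <= facts
    return atomic_clash or role_clash


def justifications(cls, axiom_ids):
    """All minimal axiom sets entailing cls ⊑ ⊥, enumerated by increasing size."""
    found = []
    for size in range(1, len(axiom_ids) + 1):
        for subset in combinations(sorted(axiom_ids), size):
            candidate = set(subset)
            # Supersets of a known justification are not minimal.
            if not any(j <= candidate for j in found) and unsatisfiable(cls, candidate):
                found.append(candidate)
    return found


print([sorted(j) for j in justifications("A", AXIOMS)])  # [[3, 4], [5, 6], [1, 2, 3]]
print([sorted(j) for j in justifications("B", AXIOMS)])  # [[1, 2]]
```

Brute force is exponential in the number of axioms; the black-box and hitting set tree algorithms discussed next avoid enumerating all subsets.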

Justification Based Repair
For a repair, at least one axiom from every justification needs to be removed.
For a repair plan, all justifications are needed.
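
Given all justifications, the minimal repairs are exactly the minimal hitting sets: minimal sets of axioms that share at least one axiom with every justification. The Python sketch below enumerates them by brute force for the justifications J1 = {1, 2, 3}, J2 = {5, 6} and J3 = {3, 4} of A ⊑ ⊥ from the example; the function name is our own.

```python
from itertools import combinations


def minimal_repairs(justifications):
    """All minimal axiom sets that intersect every justification."""
    universe = sorted(set().union(*justifications))
    repairs = []
    for size in range(1, len(universe) + 1):
        for subset in combinations(universe, size):
            candidate = set(subset)
            hits_all = all(candidate & j for j in justifications)
            # Supersets of an already found repair are not minimal.
            if hits_all and not any(r <= candidate for r in repairs):
                repairs.append(candidate)
    return repairs


J = [{1, 2, 3}, {5, 6}, {3, 4}]
for repair in minimal_repairs(J):
    print(sorted(repair))
```

The two smallest repairs remove axiom (3) together with (5) or (6), which matches the intuition that (3) occurs in two justifications.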

Justification Algorithms
Single justification:
  Glass box: modifying the underlying reasoning algorithm (tableau tracing)
  Black box: using the reasoner as an oracle
All justifications: Reiter's Hitting Set Tree algorithm (HST)

Black-Box Expansion-Contraction Strategy
Expansion: add axioms to the empty set until the entailment holds
Contraction: remove axioms from the set such that the set becomes minimal and the entailment can still be derived
[Figure 3.1: A Depiction of a Black-Box Expand-Contract Strategy (expansion and contraction phases; key: axiom, axiom in justification, selected axiom)]
Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)
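
A minimal Python sketch of the black-box expand-contract strategy, in which the reasoner is only used as a yes/no oracle. To keep it self-contained, the oracle is a hand-written toy check for axioms (1)-(6) of the justification example and the entailment A ⊑ ⊥ (sound only for that example); in practice it would be a call to an OWL reasoner. The naive one-axiom-at-a-time expansion and contraction shown here are our choice for illustration.

```python
# Toy oracle for the six-axiom example (A ⊑ ⊥), standing in for a reasoner.
AXIOMS = {1: ("B", {"some_r.D"}), 2: ("B", {"only_r.notD"}), 3: ("A", {"B", "C"}),
          4: ("B", {"notC"}), 5: ("A", {"E"}), 6: ("A", {"notE", "F"})}


def a_unsatisfiable(axiom_ids):
    facts, changed = {"A"}, True
    while changed:
        changed = False
        for i in axiom_ids:
            lhs, rhs = AXIOMS[i]
            if lhs in facts and not rhs <= facts:
                facts |= rhs
                changed = True
    return any("not" + f in facts for f in facts) or {"some_r.D", "only_r.notD"} <= facts


def expand_contract(entails, axioms):
    """Return one justification, or None if all axioms together do not entail."""
    selected = set()
    for axiom in axioms:            # expansion: grow until the entailment holds
        selected.add(axiom)
        if entails(selected):
            break
    else:
        return None
    for axiom in sorted(selected):  # contraction: drop every axiom not needed
        if entails(selected - {axiom}):
            selected.discard(axiom)
    return selected


print(sorted(expand_contract(a_unsatisfiable, [1, 2, 3, 4, 5, 6])))  # [1, 2, 3]
print(sorted(expand_contract(a_unsatisfiable, [6, 5, 4, 3, 2, 1])))  # [5, 6]
```

The order in which axioms are added determines which justification is found, which is why all justifications require the hitting set tree.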

Hitting Set Tree Algorithm
From the field of Model Based Diagnosis
Given a faulty system (ontology), it constructs a finite tree whose nodes are labelled with conflict sets (justifications) and whose edges are labelled with components (axioms)
Finds all minimal hitting sets, which represent diagnoses for the conflict sets in the system
Diagnosis = repair

Hitting Set Tree Algorithm - Example
$\mathcal{O} = \{A \sqsubseteq B,\ B \sqsubseteq D,\ A \sqsubseteq \exists R.C,\ \exists R.\top \sqsubseteq D\} \models A \sqsubseteq D$
J1 = {A ⊑ B, B ⊑ D}
J2 = {A ⊑ ∃R.C, ∃R.⊤ ⊑ D}
[Figure 3.2: An Example of a Hitting Set Tree. The root is labelled J1; its edges A ⊑ B and B ⊑ D each lead to a node labelled J2, whose edges A ⊑ ∃R.C and ∃R.⊤ ⊑ D lead to leaves labelled {} (the entailment no longer holds)]
Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)
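
The hitting set tree can be sketched in Python on top of a black-box single-justification routine. Each node either reuses a known justification that contains none of the axioms removed on its path or computes a new one, and a node is closed when the entailment no longer holds. The toy oracle below hard-codes the example ontology, including the fact that ∃R.C implies ∃R.⊤; the ASCII axiom labels and function names are our own, and the sketch omits further pruning optimisations.

```python
# Toy oracle for O = {A ⊑ B, B ⊑ D, A ⊑ ∃R.C, ∃R.⊤ ⊑ D} and the entailment A ⊑ D.
# Each axiom label maps to (premise, conclusion) for a single individual.
RULES = {
    "A<=B": ("A", "B"),
    "B<=D": ("B", "D"),
    "A<=some R.C": ("A", "some R.C"),
    "some R.Top<=D": ("some R.C", "D"),  # applies because ∃R.C ⊑ ∃R.⊤
}


def entails_a_sub_d(axioms):
    facts, changed = {"A"}, True
    while changed:
        changed = False
        for label in axioms:
            premise, conclusion = RULES[label]
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return "D" in facts


def single_justification(entails, axioms):
    """Black-box expand-contract; assumes entails(axioms) is true."""
    selected = set()
    for axiom in axioms:
        selected.add(axiom)
        if entails(selected):
            break
    for axiom in sorted(selected):
        if entails(selected - {axiom}):
            selected.discard(axiom)
    return selected


def all_justifications(entails, axioms):
    """Reiter-style hitting set tree over a black-box entailment check."""
    found = []

    def expand(remaining, path):
        if not entails(remaining):
            return  # closed leaf: removing the path axioms breaks the entailment
        reusable = [j for j in found if not (j & path)]
        label = reusable[0] if reusable else single_justification(entails, sorted(remaining))
        if not reusable:
            found.append(label)
        for axiom in sorted(label):
            expand(remaining - {axiom}, path | {axiom})

    expand(set(axioms), set())
    return found


for j in all_justifications(entails_a_sub_d, RULES):
    print(sorted(j))
```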

Justification Scenarios
A user can be faced with the following situations:
  Small number of small justifications: easy and pleasant to inspect
  Small number of large justifications: better than the alternatives
  Large number of justifications: pretty hopeless with current mechanisms
Idea: find the source of unsatisfiability

Root Unsatisfiability - Definitions
A root unsatisfiable class (UC) is a class whose unsatisfiability does not depend on another class; otherwise it is a derived UC.
A derived UC for which there is some justification that is not a strict superset of a justification for another UC is a partial derived UC.
Definition (Root Unsatisfiable Class): A class A is a root unsatisfiable class if there is no justification $\mathcal{J} \models A \sqsubseteq \bot$ such that $\mathcal{J}$ is a strict superset of a justification for some other unsatisfiable class.

Root Unsatisfiability - Approaches
1: Compute all justifications for each unsatisfiable class and apply the definition → often computationally too expensive
2: Heuristics for structural analysis of axioms
Debugging Unsatisfiable Classes in OWL Ontologies, Kalyanpur, Parsia, Sirin, Hendler, J. Web Sem., 2005
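
Root Unsatisfiability - Example
$$\mathcal{O} = \{\, B \sqsubseteq \exists r.D\ (1),\ \ B \sqsubseteq \forall r.\neg D\ (2),\ \ A \sqsubseteq B \sqcap C\ (3),\ \ B \sqsubseteq \neg C\ (4),\ \ A \sqsubseteq E\ (5),\ \ A \sqsubseteq \neg E \sqcap F\ (6) \,\}$$
$\mathcal{O} \models A \sqsubseteq \bot$: J1 = {1, 2, 3}, J2 = {5, 6}, J3 = {3, 4}
$\mathcal{O} \models B \sqsubseteq \bot$: J4 = {1, 2}
B is a root unsatisfiable class.
A is a partial derived unsatisfiable class: J4 ⊂ J1, but J2 and J3 are not strict supersets of a justification for another unsatisfiable class.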
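
Given the justifications of all unsatisfiable classes, the definition can be applied directly, which corresponds to approach 1 above. A Python sketch, with our own names, that labels each class as root, partial derived or (fully) derived:

```python
def classify_unsatisfiable(justs):
    """Label unsatisfiable classes as 'root', 'partial derived' or 'derived'.

    justs maps each unsatisfiable class to the list of its justifications
    (sets of axiom ids). A justification "depends" on another class if it is a
    strict superset of one of that class's justifications.
    """
    labels = {}
    for cls, own in justs.items():
        others = [j for other, js in justs.items() if other != cls for j in js]
        depends = [any(o < j for o in others) for j in own]  # '<' is strict subset
        if not any(depends):
            labels[cls] = "root"
        elif all(depends):
            labels[cls] = "derived"
        else:
            labels[cls] = "partial derived"
    return labels


JUSTS = {"A": [{1, 2, 3}, {5, 6}, {3, 4}], "B": [{1, 2}]}
print(classify_unsatisfiable(JUSTS))  # {'A': 'partial derived', 'B': 'root'}
```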

Axiom Relevance
Resolving a justification requires deleting or editing axioms
Ranking methods highlight the most probable causes of problems
Methods:
  frequency
  syntactic relevance
  semantic relevance
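
Frequency is the simplest of these rankings: an axiom that occurs in many justifications is a likely culprit, and removing it resolves all of them at once. A Python sketch (our own naming) for the justifications J1 = {1, 2, 3}, J2 = {5, 6} and J3 = {3, 4} of the running example:

```python
from collections import Counter


def rank_by_frequency(justifications):
    """Axioms sorted by the number of justifications they occur in (ties by id)."""
    counts = Counter(axiom for j in justifications for axiom in j)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(rank_by_frequency([{1, 2, 3}, {5, 6}, {3, 4}]))
# [(3, 2), (1, 1), (2, 1), (4, 1), (5, 1), (6, 1)]
```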

Repair Consequences
After the repair process, axioms have been deleted or modified
→ desired entailments may be lost, or new entailments may be obtained (including inconsistencies!)
→ the user can decide to preserve them

SPARQL Endpoint Support
The approaches mentioned above are implemented in the ORE tool (http://ore-tool.net)
ORE supports SPARQL endpoints:
  implements an incremental load procedure
  the knowledge base is loaded in small chunks
  the number of axioms is counted by type
  priority-based loading procedure, e.g. disjointness axioms have a higher priority than class assertion axioms
  uses Pellet incremental reasoning
Learning of OWL Class Descriptions on Very Large Knowledge Bases, Hellmann, Lehmann, Auer, Int. J. Semantic Web Inf. Syst., 2009
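
The priority-based incremental loading can be sketched as follows in Python. This is not ORE's implementation: the priority table, the chunk size and the is_consistent callback (which stands in for an incremental reasoner such as Pellet) are hypothetical, and axioms are assumed to have been fetched and grouped by type beforehand.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical priorities: a lower number is loaded earlier.
PRIORITY = {"DisjointClasses": 0, "ObjectPropertyDomain": 1, "ObjectPropertyRange": 1,
            "SubClassOf": 2, "ClassAssertion": 3, "ObjectPropertyAssertion": 3}


def incremental_load(axioms_by_type: Dict[str, List[str]],
                     is_consistent: Callable[[List[str]], bool],
                     chunk_size: int = 2) -> Tuple[List[str], bool]:
    """Load axioms chunk by chunk in priority order; stop at the first inconsistency.

    Returns the axioms loaded so far and whether an inconsistency was found.
    """
    loaded: List[str] = []
    order = sorted(axioms_by_type, key=lambda t: (PRIORITY.get(t, len(PRIORITY)), t))
    for axiom_type in order:
        axioms = axioms_by_type[axiom_type]
        for start in range(0, len(axioms), chunk_size):
            loaded.extend(axioms[start:start + chunk_size])
            if not is_consistent(loaded):
                return loaded, True
    return loaded, False


AXIOMS = {"ClassAssertion": ["ca1", "ca2", "ca3"], "DisjointClasses": ["dj1"]}
print(incremental_load(AXIOMS, lambda loaded: "ca2" not in loaded))
```

Loading schema axioms such as disjointness first means that a contradiction can surface after only a few instance chunks, so the reasoner does not have to hold the whole knowledge base.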

SPARQL Endpoint Support II
The algorithm performs sanity checks, e.g. SPARQL queries which probe for typical inconsistent axiom sets
Can fetch additional Linked Data
Different termination criteria
Overall: ORE allows applying state-of-the-art ontology debugging methods on a larger scale than was previously possible
Aims at stronger support for the Web aspect of the Semantic Web and the high popularity of the Web of Data initiative

DBpedia Live Demo
Inconsistency in DBpedia Live:
    Individual: dbr:Purify_(album)
        Facts: dbo:artist dbr:Axis_of_Advance
    Individual: dbr:Axis_of_Advance
        Types: dbo:Organisation
    Class: dbo:Organisation
        DisjointWith: dbo:Person
    ObjectProperty: dbo:artist
        Range: dbo:Person

DBpedia Live Demo 2
Inconsistency in DBpedia in combination with WGS84 (Linked Data):
    Individual: dbr:WKWS
        Facts: geo:long -81.76833343505859
        Types: dbo:Organisation
    DataProperty: geo:long
        Domain: geo:SpatialThing
    Class: dbo:Organisation
        DisjointWith: geo:SpatialThing

OpenCyc Demo
Inconsistency in OpenCyc:
    Individual: 'PopulatedPlace'
        Types: 'ArtifactualFeatureType', 'ExistingStuffType'
    Class: 'ArtifactualFeatureType'
        SubClassOf: 'ExistingObjectType'
    Class: 'ExistingObjectType'
        DisjointWith: 'ExistingStuffType'

ORE - Screenshots
[Figures: two screenshots of the ORE tool]

Related Tools
Swoop
  can compute justifications for the unsatisfiability of classes and offers a repair mode
  its fine-grained justification computation algorithm is incomplete
  can also compute justifications for an inconsistent ontology, but does not offer a repair mode in this case
  does not extract locality-based modules, which leads to lower performance for large ontologies
RaDON
  plugin for the NeOn toolkit
  offers a number of techniques for working with inconsistent or incoherent ontologies
  allows reasoning with inconsistent ontologies and can handle sets of ontologies (ontology networks)
  no fine-grained justifications, no repair impact analysis
Pellint
  searches for common patterns which lead to potential reasoning performance problems
  integration in ORE planned

Related Tools II
PION and DION
  developed in the SEKT project to deal with inconsistencies
  PION is an inconsistency-tolerant reasoner (four-valued paraconsistent logic)
  DION offers the possibility to compute justifications, but no repair
Explanation Workbench
  Protégé plugin for reasoner requests like class unsatisfiability or inferred subsumption relations
  can compute regular and laconic justifications
  motivated the ORE debugging interface
  the current version does not allow removing axioms in laconic justifications
RepairTab
  supports the user in finding and detecting errors in ontologies
  uses a modified tableau algorithm
  shows inferences which can no longer be drawn after removing an axiom (inspired ORE)

Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extraction
5 Data Integration / Linking
6 Enrichment
7 Repair
8 Knowledge Base Exploration / Querying
[Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]

Motivation
User query interfaces: knowledge base specific interfaces