The web of interlinked data and knowledge stripped
Linked Data for Enterprise Information Integration
Dr. Sören Auer
Creating Knowledge
out of Interlinked Data
Problem: Try to search for these things on the current Web:
• Apartments near German-English bilingual childcare in Passau
• ERP service providers with offices in Vienna and London
• Researchers working on multimedia topics in Eastern Europe
Information is available on the Web, but opaque to current search.
Why do we need the Data Web?
passau.de
Has everything about
childcare in Passau.
Immobilienscout.de
Knows all about real estate
offers in Germany.
[Diagram labels: DB, Web server, Search engine, HTML, RDF]
Solution: complement text on Web pages with structured linked
open data & intelligently combine/integrate/join such structured
information from different sources:
Linked Data in a Nutshell
1. Uses RDF Data Model
[Graph: IFMO organizes KESW2012; KESW2012 starts 1.10.2012; KESW2012 takesPlaceIn St. Petersburg]
2. Is serialised in triples (Subject Predicate Object):
IFMO organizes KESW2012 .
KESW2012 starts "2012-10-01"^^xsd:date .
KESW2012 takesPlaceIn St._Petersburg .
3. Uses Content-negotiation
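To make the triple model concrete, here is a minimal sketch that stores the three example statements as Python tuples and answers simple pattern queries. The `match` helper and the tuple representation are assumptions made for illustration; a real application would use an RDF library and full IRIs.

```python
# Each statement is a (subject, predicate, object) tuple, as on the slide.
TRIPLES = {
    ("IFMO", "organizes", "KESW2012"),
    ("KESW2012", "starts", '"2012-10-01"^^xsd:date'),
    ("KESW2012", "takesPlaceIn", "St._Petersburg"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Where does KESW2012 take place?
location = match(TRIPLES, s="KESW2012", p="takesPlaceIn")
# Everything stated about KESW2012 as a subject.
about_event = match(TRIPLES, s="KESW2012")
```

Leaving a position as `None` corresponds to a variable in a query pattern, which is the basic idea behind the SPARQL triple patterns shown later in the deck.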
The emerging Web of Data
[Figure: Linking Open Data cloud diagrams, 2007 to 2010, by
Richard Cyganiak and Anja Jentzsch.]
The situation at a world-leading car manufacturer (€97.76 billion
revenue, 250,000 employees):
• 3,000 heterogeneous IT systems
• Different units (car, bus, truck etc.) with very different views
• No common language
• Inability to identify crucial entities (parts, locations etc.)
enterprise-wide
There is no (and cannot be a) single Enterprise Information Model
A distributed, iterative, bottom-up integration approach such as
Linked Data might be able to help (pay-as-you-go).
Can Linked Data help to solve the EII problem in
a Fortune 500 company?
Linked Data Lifecycle:
• Extraction
• Storage/Querying
• Manual revision/authoring
• Interlinking/Fusing
• Classification/Enrichment
• Quality Analysis
• Evolution/Repair
• Search/Browsing/Exploration
Extraction
From unstructured sources
• NLP, text mining, annotation
From semi-structured sources
• DBpedia, LinkedGeoData, DataCube
From structured sources
• RDB2RDF
Extraction
Extract structured information from Wikipedia
& make this information available on the Web as LOD:
• ask sophisticated queries against Wikipedia (e.g.
universities in Brandenburg, mayors of elevated towns, soccer
players),
• link other data sets on the Web to Wikipedia data,
• represents a community consensus.
The recently launched DBpedia Live transforms Wikipedia
into a structured knowledge base
Transforming Wikipedia into a Knowledge Base
S. Auer et al.: DBpedia - A Crystallization Point for the Web of Data. Journal of Web Semantics, Elsevier 2009. Most Cited Article 2006-10 Award
S. Auer et al.: DBpedia: A Nucleus for a Web of Open Data. 6th International Semantic Web Conference ISWC07.
S. Auer et al.: What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content. 4th European Semantic Web Conf. ESWC07
Structure in Wikipedia
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
– other language versions
– other Wikipedia pages
– To the Web
– Redirects
– Disambiguations
Infobox templates
{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시
...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}
http://dbpedia.org/resource/Busan
dbp:Busan dbpp:title "Busan Metropolitan City"
dbp:Busan dbpp:hangul "부산 광역시"@Hang
dbp:Busan dbpp:area_km2 "763.46"^^xsd:float
dbp:Busan dbpp:pop "3635389"^^xsd:int
dbp:Busan dbpp:region dbp:Yeongnam
dbp:Busan dbpp:dialect dbp:Gyeongsang
...
Wikitext-Syntax
RDF representation
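The mapping from infobox wikitext to RDF statements can be sketched in a few lines. This is a simplified illustration, not the DBpedia extraction framework: the function name `infobox_to_triples`, the regular expressions, and the datatype rules are assumptions, and real infoboxes need far more robust parsing (nested templates, units, multiple links per value).

```python
import re


def infobox_to_triples(subject, wikitext):
    """Turn '| key = value' infobox lines into (subject, predicate, object) triples."""
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"\s*\|\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if not match:
            continue  # template header, closing braces, or an empty field
        key, value = match.groups()
        link = re.fullmatch(r"\[\[([^\]|]+)\]\]", value)
        if link:
            # A value that is exactly one wiki link becomes a resource.
            obj = "dbp:" + link.group(1).replace(" ", "_")
        elif re.fullmatch(r"-?\d+", value):
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):
            obj = f'"{value}"^^xsd:float'
        else:
            obj = f'"{value}"'  # everything else stays a plain literal
        triples.append((subject, "dbpp:" + key, obj))
    return triples


SAMPLE = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""

triples = infobox_to_triples("dbp:Busan", SAMPLE)
```

Each infobox attribute name is reused as the property name, which is why this style of extraction needs no hand-written schema but inherits the inconsistencies of the templates.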
A vast multi-lingual, multi-domain
knowledge base
DBpedia extraction results in:
• descriptions of ca. 3.4 million things (1.5 million classified in a consistent
ontology, including 312,000 persons, 413,000 places, 94,000 music albums,
49,000 films, 15,000 video games, 140,000 organizations, 146,000
species, 4,600 diseases)
• labels and abstracts for these 3.2 million things in up to 92 different languages;
1,460,000 links to images and 5,543,000 links to external web pages;
4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories,
and 75,000 YAGO categories
• altogether over 1 billion pieces of information (i.e. RDF triples): 257M from
English edition, 766M from other language editions
• DBpedia Live (http://live.dbpedia.org/sparql/) &
Mappings Wiki (http://mappings.dbpedia.org)
integrate the community into a refinement cycle
• Upcoming DBpedia inline
SELECT ?name ?birth ?description ?person WHERE {
?person dbp:birthPlace dbp:Berlin .
?person skos:subject dbp:Cat:German_musicians .
?person dbp:birth ?birth .
?person foaf:name ?name .
?person rdfs:comment ?description .
FILTER (LANG(?description) = 'en') .
} ORDER BY ?name
DBpedia SPARQL Endpoint
DBpedia Applications: RelFinder
Muddy Boots (BBC): Annotate actors in BBC News with DBpedia identifiers
Open Calais (Reuters): named entities connected via owl:sameAs to DBpedia
Faviki (social bookmarking): uses DBpedia to group tags & multi-language support
Topbraid Composer (ontology editor): links entities to DBpedia
DBpedia Applications (3rd party)
Many different approaches: D2R, Virtuoso RDF Views, Triplify, ...
No agreement on a formal semantics of RDB2RDF mapping
• LOD readiness, SPARQL-SQL translation
W3C RDB2RDF WG
Extraction Relational Data
Tool | Triplify | Sparqlify | D2RQ | Virtuoso RDF Views
Technology | Scripting languages (PHP) | Java | Java | Whole middleware solution
SPARQL endpoint | - | X | X | X
Mapping language | SQL | SPARQL CONSTRUCT Views + SQL | RDF based | RDF based
Mapping generation | Manual | Semi-automatic | Semi-automatic | Manual
Scalability | Medium-high (but no SPARQL) | Very high | Medium | High
Malhotra, Auer, Erling, Hausenblas: W3C RDB2RDF Incubator Group Report. W3C RDB2RDF Incubator Group, 2009.
Triplify Light-weight approach for Linked Data
publishing from relational databases
Auer, Tramp, Aumüller, Lehmann, Hellmann: Triplify - Light-weight Linked Data Publication from Relational Databases.
In 18th International World Wide Web Conference (WWW 2009).
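The idea of using plain SQL as the mapping language can be sketched as follows. This is not Triplify's implementation (Triplify is written in PHP); it is a hedged illustration that assumes a convention where the `id` column names the subject and every other column becomes a predicate. The `BASE` namespace, table, and function name are hypothetical.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for generated IRIs


def rows_to_triples(conn, type_name, sql):
    """Run sql; column 'id' identifies the subject, other columns become predicates."""
    cursor = conn.execute(sql)
    columns = [c[0] for c in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{type_name}/{record.pop('id')}>"
        for column, value in record.items():
            if value is not None:  # NULL cells simply produce no triple
                # Note: literal escaping is omitted in this sketch.
                triples.append(f'{subject} <{BASE}vocab/{column}> "{value}" .')
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice', 'Leipzig'), (2, 'Bob', NULL)")
triples = rows_to_triples(conn, "user", "SELECT id, name, city FROM users ORDER BY id")
```

Because the mapping is just a SELECT statement, column aliases in the query can be used to choose predicate names without touching the database schema.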
• Rationale: Exploit existing formalisms
(SQL, SPARQL Construct) as much as
possible
• flexible & versatile mapping language
• translating one SPARQL query into
exactly one efficiently executable SQL
query
• Solid theoretical formalization based on
SPARQL-relational algebra
transformations
• Extremely scalable through an elaborate
view candidate selection mechanism
• Used to publish 20B triples for
LinkedGeoData
Sparqlify
Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases.
Submitted to VLDB-Journal.
[Diagram labels: SPARQL Construct, SQL, View, Bridge]
Storage and Querying
Querying is still slower than relational data management by a
factor of 3-20 (BSBM, DBpedia Benchmark), but offers more flexibility
Performance increases steadily
Comprehensive, well-supported open-source and commercial
implementations are available:
• OpenLink’s Virtuoso (os+commercial)
• Big OWLIM (commercial), Swift OWLIM (os)
• 4store (os)
• Dydra (hosted)
• Bigdata (distributed)
• AllegroGraph (commercial)
• Mulgara (os)
RDF Data Management
• Uses DBpedia as data and a
selection of 25 frequently
executed queries
• Can generate fractions and
multiples of DBpedia‘s size
• Does not resemble relational
data
Performance differences observed with other
benchmarks are amplified
DBpedia Benchmark
Geometric Mean
Morsey, Lehmann, Auer, Ngonga: DBpedia SPARQL
Benchmark – Performance Assessment with Real
Queries on Real Data. Int. Semantic Web Conf.
(ISWC2011). Best-paper award.
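The chart on this slide is labelled with a geometric mean. As a rough illustration of why benchmark summaries often use it, the sketch below compares it with the arithmetic mean on made-up throughput figures. The numbers are hypothetical and are not results from the benchmark.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical queries-per-second figures for three query types (not measured results).
qps = [100.0, 10.0, 1.0]

gm = geometric_mean(qps)
am = sum(qps) / len(qps)
```

With these figures the arithmetic mean is dominated by the single fast query type, while the geometric mean reflects the typical order of magnitude across all three.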
1. Semantic (Text) Wikis
• Authoring of semantically
annotated texts
2. Semantic Data Wikis
• Direct authoring of
structured information
(i.e. RDF, RDF-Schema,
OWL)
Two Kinds of Semantic Wikis
• Versatile domain-independent tool
• Serves as Linked Data / SPARQL endpoint on the Data Web
• Open-source project hosted at Google Code
• Not just a Wiki UI, but a whole framework for the development of
Semantic Web applications
• Developed in PHP based on the Zend framework
• Very active developer and user community
• More than 500 downloads monthly
• Large number of use cases, including industry:
OntoWiki: a semantic data wiki
[1] Auer, Dietzold, Riechert: OntoWiki - A Tool for Social, Semantic Collaboration. 5th International Semantic Web Conference, ISWC 2006.
[2] Riechert, Morgenstern, Auer, Tramp, Martin: Knowledge Engineering for Historians on the Example of the Catalogus Professorum
Lipsiensis 9th Int. Semantic Web Conference ISWC2010. Best paper award.
Management of Enterprise Taxonomies with OntoWiki
Based on the W3C SKOS standard
Corporate Language Management: 500k concepts in 20
languages
Linked Data & Collaboration for the
Digital Humanities
Riechert, Morgenstern, Auer, Tramp, Martin:
Knowledge Engineering for Historians on the Example of the Catalogus Professorum Lipsiensis.
9th International Semantic Web Conference (ISWC2010). Best Paper award.
In an uncontrolled
environment such as the Data
Web, there will be a
proliferation of equivalent
or similar entity identifiers
Manual Link discovery:
• Sindice integration into UIs
• Semantic Pingback
Semi-automatic:
• SILK
• LIMES
Automatic/ Supervised:
• RAVEN [1]
Linking Entities on the Data Web
[1] Ngonga, Lehmann, Auer, Höffner: RAVEN -- Active Learning of Link Specifications, OM@ISWC, 2011.
Similarity/equality/relatedness of entities can
often be expressed using a distance metric (e.g.
strings - edit distance, POIs - Euclidean distance)
Uses the characteristics of metric spaces,
esp. consequences of the triangle inequality:
d(x, y) ≤ d(x, z) + d(z, y)
d(x, z) - d(z, y) ≤ d(x, y) ≤ d(x, z) + d(z, y)
Use pessimistic approximations of distances
instead of computing them
Only compute distances when needed
High-performance LIMES framework is available as open source
and outperforms the state of the art by an order of
magnitude
LIMES: Link Discovery in Metric Spaces
Ngonga, Auer: LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data 22nd Int. Joint Conf.
on Artificial Intelligence (IJCAI2011).
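The lower bound d(x, z) - d(z, y) ≤ d(x, y) can be used to discard candidate pairs without computing their real distance. The sketch below shows this with a single exemplar point z and Euclidean distance. It is a simplified illustration of the principle, not the LIMES implementation: the function name, the choice of the first target as exemplar, and the sample points are assumptions.

```python
import math


def link_with_exemplar(sources, targets, threshold):
    """Return pairs (s, t) with d(s, t) <= threshold, pruning via one exemplar."""
    exemplar = targets[0]  # assumption: any target may serve as the exemplar
    # Distances from the exemplar to every target are computed once, up front.
    to_exemplar = [(t, math.dist(exemplar, t)) for t in targets]
    links = []
    for s in sources:
        d_se = math.dist(s, exemplar)
        for t, d_et in to_exemplar:
            # Triangle inequality: d(s, t) >= |d(s, e) - d(e, t)|.
            if abs(d_se - d_et) > threshold:
                continue  # the pair cannot match; skip the real computation
            if math.dist(s, t) <= threshold:
                links.append((s, t))
    return links


sources = [(0, 0), (10, 10)]
targets = [(0, 1), (50, 50), (10, 11)]
links = link_with_exemplar(sources, targets, threshold=1.5)

# Brute-force reference: compare every source with every target.
brute = [(s, t) for s in sources for t in targets if math.dist(s, t) <= 1.5]
```

The pruning never drops a true match, because the bound is a guaranteed underestimate of the real distance; it only saves work when the underestimate already exceeds the threshold.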
Active learning of link specifications:
RAVEN - Towards Zero-Configuration Link Discovery
Ngonga Ngomo, Lehmann, Auer, Höffner: RAVEN: Towards Zero-Configuration Link Discovery. In OM 2012.
• Experiments even with very large KBs (Diseasome & DBpedia)
show that an F-score of >95% can be achieved with 10-20 examples
• A learning iteration takes <1s
Active learning of link specifications
Enrichment
Linked Data is mainly instance data!!!
ORE (Ontology Repair and Enrichment) tool helps to improve an
OWL ontology by fixing inconsistencies & making suggestions for
adding further axioms.
• Ontology Debugging: uses OWL reasoning to detect inconsistencies and
unsatisfiable classes + detects the most likely sources of the problems.
The user can create a repair plan while maintaining full control.
• Ontology Enrichment: uses the DL-Learner framework to suggest
definitions & super classes for existing classes in the KB. Works if
instance data is available; useful for harmonising schema and data.
http://aksw.org/Projects/ORE
Enrichment & Repair
Lehmann, Auer, Tramp: Class Expression Learning for Ontology Engineering. Journal of Web Semantics (JWS), 2011.
Given:
• Background knowledge base
• Positive and negative examples
(example = individual in ontology)
Goal:
• Find an OWL Class Expression / DL
concept which
• covers as many positive examples as
possible
• covers as few negative examples as
possible
Concept C covers example a <=>
a is instance of C
Analogous problem can be defined for logic
programs => Inductive Logic Programming
Supervised Machine Learning Task
Improving Linked Data Quality by Ontology
Learning
Hellmann, Lehmann, Auer: Learning of OWL Class Descriptions on Very Large Knowledge Bases. Int. Journal on Semantic
Web & Information Systems (IJSWIS), Vol. 5, Issue 2, April-July 2009, ISSN: 1552-6283.
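The learning task above can be illustrated by scoring candidate class expressions against the examples. This sketch assumes that the instances of each candidate have already been retrieved (for example by a reasoner) and only shows the selection step. The candidate expressions, individuals, and the accuracy-based score are hypothetical choices, not the DL-Learner algorithm.

```python
def score(instances, positives, negatives):
    """Accuracy: covered positives plus excluded negatives, over all examples."""
    covered_pos = len(instances & positives)
    excluded_neg = len(negatives - instances)
    return (covered_pos + excluded_neg) / (len(positives) + len(negatives))


# Hypothetical candidates mapped to the individuals they cover.
candidates = {
    "Person": {"anna", "bob", "carl", "dave"},
    "Person and hasChild some Person": {"anna", "bob"},
    "Male": {"bob", "carl"},
}
positives = {"anna", "bob"}
negatives = {"carl", "dave"}

scores = {expr: score(inst, positives, negatives) for expr, inst in candidates.items()}
best = max(scores, key=scores.get)
```

A real learner would generate the candidates itself, typically by refining expressions step by step, and would also weigh expression length against accuracy.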
Quality Analysis
Quality on the Data Web varies a lot
• Hand-crafted or expensively curated knowledge bases
(e.g. DBLP, UMLS) vs. those extracted from text or Web
2.0 sources (DBpedia)
Research Challenge
• Establish measures for assessing the authority,
provenance, reliability of Data Web resources
Opportunity for EII: Employ crowd-sourced
knowledge from the Data Web in the Enterprise
Linked Data Quality Analysis
FP7-IP DIACHRON Managing the Evolution and Preservation of the Data Web
Started April 2013
• unified method for data evolution &
ontology refactoring.
• modularized, declarative definition
of evolution patterns => simple
compared to imperative description
• RDF representation of evolution
patterns => patterns can be shared
and reused on the Data Web.
• declarative definition of bad smells
and corresponding evolution
patterns promotes the (semi-)automatic
improvement of information quality.
EvoPat Pattern based KB Evolution
Rieß, Heino, Dietzold, Auer: EvoPat - Pattern-Based Evolution and Refactoring of RDF Knowledge Bases.
In: 9th International Semantic Web Conference ISWC2010.
Exploration
An ecosystem of LOD visualizations
LOD Exploration Widgets:
• Spatial faceted-browsing
• Faceted-browsing
• Statistical visualization
• Entity-/faceted-based browsing
• Domain specific visualizations
• …
LOD Datasets
Choreography layer:
• Dataset analysis (size, vocabularies, property histograms etc.)
• Selection of suitable visualization widgets
Brunetti, Auer, García: The Linked Data Visualization Model. To appear in IJSWIS, 2012.
LOD Life-(Washing-)cycle supported by Debian
based LOD2 Stack
http://stack.lod2.eu
Linked Enterprise Intra Data Webs fill the gap
between Intra-/Extranets and EIS/ERP
Unstructured Information
Management
Structured Information
Management
Support the long tail of enterprise information domains
• Human-resources
• Requirements engineering
• Supply-chains
When only data needs to be exchanged and
integrated, SOA is quite expensive
Linked Data facilitates data integration along value chains
within and across enterprises
PricewaterhouseCoopers, Technology Forecast, 2009
• Linked Data is a promising technology for closing the
gap between SOA and unstructured information
management
• The wealth of knowledge available as LOD can be
leveraged as background knowledge for Enterprise
applications
• The application of Linked Data in the enterprise is still
largely unexplored (opportunity)
• Linked Data will make Enterprise Information Integration
more flexible, iterative, cost effective
Take home messages
Auer, Frischmuth, Klímek, Tramp, Unbehauen, Holzweißig, Marquardt: Linked Data in Enterprise Information Integration
Submitted to Semantic Web Journal.
AKSW: Bridging Theory with Applications
Foundations: Marrying databases with RDF and ontologies
Tools & Datasets
Applications: Bringing the Data Web to end users
• DBpedia: “Semantification” of Wikipedia
• Triplify: “Semantification” of (small) Web Applications
• OntoWiki: Collaborative creation of explicit knowledge via Semantic Wikis
• LIMES: Link Discovery Framework for metric spaces
• Vakantieland: Building Data Web applications
• SoftWiki: Distributed, stakeholder driven Requirements Engineering
• NLP2RDF: Integrating Natural Language processing tool chains with LOD
• Enterprise Knowledge Bases: Realizing knowledge hubs within an Enterprise’s Data Intranet
• Thesaurus Management: Defining corp. language & data …
• DL-Learner: Machine Learning for Ontologies
• Catalogus Professorum: Prosopographical knowledge base
• LinkedGeoData: “Semantification” of OpenStreetMap
• LESS: Semantification Syndication
• RDB2RDF: Mapping relational data to RDF
• ORE: Ontology Enrichment & Repair
EU-FP7 LOD2 Project Overview . http://lod2.eu
AKSW Team
The LOD2 Gang
Thanks for your attention!
Sören Auer
http://www.informatik.uni-leipzig.de/~auer | http://aksw.org | http://lod2.org
auer@informatik.uni-leipzig.de
Soon at: