Sebastian Hellmann

DBpedia - A Global Open Knowledge Network
Sebastian Hellmann and Sören Auer
http://dbpedia.org

Outline
1. Concepts and DBpedia Strategy
2. Technologies
3. Outlook

Introduction
● Core and Context separation
○ Core Data: High value, low maintenance
○ Context data: Low value, high maintenance
● Fraud detection (Credit Card institute)
○ Core data: Credit card transactions
○ Context data: Public information (ATMs, cities, flight plans, products, crime rate, etc.)
● Supply-chain management (Manufacturing)
○ Core data: Manufacturing know-how (what can you build?)
○ Context data: Supplier market
● Publishers
○ Core data: Content
○ Context data: Taxonomies and items to describe the content (Persons, Places, Events)

Common Challenges for us
● Speed of data ingestion
○ How fast can you find, understand and integrate external data?
● Virtually no feedback mechanisms to data providers
● No effective collaboration on your data, although you are curating the same
data as others
How do we enable collaboration on data?

DBpedia Strategy Overview
Starting point: DBpedia is the most successful open knowledge graph (OKG)
OKG Governance
Collaboration & Curation
Max. societal value
Medium-term goals:
● 10 million users
● millions of active contributors
● thousands of new businesses and initiatives
Take DBpedia to a global level

Unlocking societal value by:
● OKG Governance - licensing, incubation, maturity model for OKGs
○ Apache Foundation for data
● OKG Collaboration & Curation - for individuals & organizations
○ Git and GitHub for data
● Providing a trustworthy global OKG infrastructure - for enterprises small
and large as well as non-profits and societal initiatives alike
● Maximizing societal value of open knowledge by incubating open
knowledge initiatives and businesses (e.g. in education, public health, open
science)

GitHub for Data
DBpedia aims to create a knowledge graph curation service, which allows
communities to collaborate on rich semantic representations.
● The knowledge graph uses the RDF data model as scaffold but is augmented
with rich metadata about provenance, discourse, evolution etc.
● Atomic units of the knowledge graph are facts/statements, which are
aggregated into resources/entity descriptions
● All contributions and changes are tracked and versioned
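The bullets above can be illustrated with a minimal Python sketch. It is not DBpedia's implementation: the `Statement` class, its `source` and `revision` fields, the `ex:` identifiers, and the numeric values are all hypothetical placeholders chosen for illustration. The sketch treats a statement as the atomic unit, attaches provenance and a revision number to it, and aggregates statements into entity descriptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One fact (subject, predicate, object) plus illustrative metadata."""
    subject: str
    predicate: str
    obj: str
    source: str    # provenance: hypothetical name of the contributing source
    revision: int  # hypothetical version counter of the change


def aggregate(statements):
    """Group statements into entity descriptions keyed by subject."""
    entities = defaultdict(list)
    for st in statements:
        entities[st.subject].append(st)
    return dict(entities)


def latest(entity_statements):
    """Keep the statement with the highest revision for each predicate."""
    newest = {}
    for st in entity_statements:
        current = newest.get(st.predicate)
        if current is None or st.revision > current.revision:
            newest[st.predicate] = st
    return newest


# Placeholder facts; the values are made up for the example.
facts = [
    Statement("ex:London", "ex:population", "1000", "source-a", 1),
    Statement("ex:London", "ex:population", "1200", "source-b", 2),
    Statement("ex:London", "ex:country", "ex:UK", "source-a", 1),
    Statement("ex:Leipzig", "ex:country", "ex:Germany", "source-a", 1),
]

entities = aggregate(facts)
london = latest(entities["ex:London"])
print(london["ex:population"].obj, london["ex:population"].source)
```

Because every statement keeps its source and revision, older values are never overwritten; they stay available as history next to the current entity description.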

OKG Clearinghouse and Steward
DBpedia will be a clearinghouse for OKG contributions and steward for their sustainable
maintenance
● Open Data license and contributor agreements
● Incubation and maturity model for OKG assets
○ Based on an automatic and sample-based quality and coverage assessment
● Continuous integration for OKG assets
○ Automatic link generation
○ Execution of test cases
○ OKG publication in various human-readable and machine-readable formats
● Communication and collaboration infrastructure for OKG communities

Comparing with related initiatives
Use-case driven
● contrast with platform-first approaches (repositories like DataHub.io)
● We build the platform to support the use cases.
Collaboration-driven
● contrast with volunteer-driven (like Wikidata)
● improving completeness & correctness of areas that are most used by the partners (full-time
curators, instead of sporadic)
Knowledge-integration-driven
● contrast with loose collections (like data markets)
● We make every small piece of information identifiable & referenceable
⇒ knowledge melting pot.
We are willing and able to collaborate with and integrate other open data initiatives.

“Publish and Link” falls short for the Web of Data
Connecting data is about connecting people and organisations.

Collaboration in the Web of Data
Connecting data is about connecting people and organisations.
● DBpedia’s mission is to
○ serve as an access point for data
○ facilitate collaboration
○ disseminate data on a global scale

Data Incubator model
● LVL 0: Excel anarchy, no governance (Counselling)
● LVL 1: DBpedia Contributor Requirements (Analysis)
● LVL 2: Shared OKG Governance (Integration)
● LVL 3: Full collaboration benefits (Collaboration)
○ Access to all relevant data, links and users of the ecosystem
○ By reaching LVL 3, the cost of maintaining LVL 2 as well as OKG Governance and Curation is shared effectively with the DBpedia ecosystem

LVL 0: Excel Anarchy, No Governance
● Each employee/department governs their own data (Anarchy)
● Intensive counselling required
● Best to build a parallel structure and show the value (KG prototype)

LVL 1: DBpedia Contributor Requirements
● Stable identifiers
● Good level of schema unification and management
● Data strategy & Knowledge Graph available
● Core and Context separation
○ Core Data: High value, low maintenance
○ Context data: Very low value, very high maintenance (commodity)
● What data maintenance can you outsource to DBpedia? (Analysis)

LVL 2: Shared OKG Governance
● Technical steps (Integration):
○ Identifier Linking
○ Schema Mapping
○ Release data into DBpedia
● Continuously maturing tool stack to improve these three steps
The DBpedia Association includes a large network of universities
-> we arrange internships to tackle the tasks above
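As a rough illustration of the three integration steps, the Python sketch below links a source identifier, maps source fields to target properties, and returns a record ready for release. The function name `integrate`, the source IDs, the field names, and the values are hypothetical; `dbo:populationTotal` and `dbo:areaTotal` are used only as example target properties.

```python
def integrate(source_id, record, id_links, schema_mapping):
    """Link the identifier, map the schema, and return a releasable record.

    Returns None when the source ID has no link yet, so unlinked
    records can be collected and handled separately.
    """
    target_id = id_links.get(source_id)
    if target_id is None:
        return None
    properties = {
        schema_mapping[field]: value
        for field, value in record.items()
        if field in schema_mapping
    }
    # Fields without a mapping are reported instead of silently dropped.
    unmapped = sorted(field for field in record if field not in schema_mapping)
    return {"id": target_id, "properties": properties, "unmapped": unmapped}


# Hypothetical link table and schema mapping for one data provider.
id_links = {"src:city/17": "dbr:London"}
schema_mapping = {"inhabitants": "dbo:populationTotal", "area": "dbo:areaTotal"}

# Placeholder record; the values are made up for the example.
record = {"inhabitants": 1200, "area": 50, "mayor_phone": "n/a"}

released = integrate("src:city/17", record, id_links, schema_mapping)
missing = integrate("src:city/99", record, id_links, schema_mapping)
print(released)
print(missing)
```

Reporting unmapped fields and unlinked IDs gives a concrete work list for the linking and mapping steps, which is where an improving tool stack would help most.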

LVL 3: Full collaboration benefits
● Link triangulation (Who links to you? subscription)
● Sources validation (Error reports)
● Data comparison (your data with all other data sources)
● Mediate contact to other organisations with the same data
● Any user feedback is directed to the sources

Incubator model
● Organisations…
○ can use the DBpedia incubator model to improve their OKG
○ each organisation that joins adds value to DBpedia with its data and experience
● DBpedia...
○ acts as the mediator
○ will distribute value to other orgs and users on a global scale

Technologies
● ID Management + Linking
● DataID (Metadata treatment)
● Data comparison and feedback
● SHACL - Test-driven data development

ID Management + Linking
● For each source ID, DBpedia assigns a local DBpedia ID
● Links are then grouped into clusters
● From the cluster a representative ID is chosen, others are redirects
● Properties:
○ Every imported entity is identifiable and traceable with local ID
○ Holistic identifier space -> allows a complete linkage
○ Stable IDs allow link accuracy to improve over time
● http://dbpedia.github.io/links/tools/linkviz/
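The clustering steps above can be sketched with a small union-find structure in Python. This is an illustration under assumptions, not the DBpedia implementation: the source IDs are hypothetical, local IDs are plain integers, and the rule "the smallest local ID represents the cluster" is an arbitrary choice made for the example.

```python
def assign_local_ids(source_ids):
    """Assign one local integer ID per source ID, in order of arrival."""
    return {src: i for i, src in enumerate(source_ids)}


def cluster_links(local_ids, links):
    """Union-find over link pairs; returns {local_id: representative_id}."""
    parent = {lid: lid for lid in local_ids.values()}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src_a, src_b in links:
        root_a, root_b = find(local_ids[src_a]), find(local_ids[src_b])
        if root_a != root_b:
            # Assumption: the smallest local ID represents the cluster.
            low, high = sorted((root_a, root_b))
            parent[high] = low
    return {lid: find(lid) for lid in parent}


# Hypothetical source identifiers and links between them.
source_ids = ["a:London", "b:Q84", "c:london-uk", "a:Leipzig"]
links = [("b:Q84", "c:london-uk"), ("a:London", "b:Q84")]

local_ids = assign_local_ids(source_ids)
representative = cluster_links(local_ids, links)
# Every non-representative ID becomes a redirect to its representative.
redirects = {lid: rep for lid, rep in representative.items() if lid != rep}
print(representative)  # {0: 0, 1: 0, 2: 0, 3: 3}
print(redirects)       # {1: 0, 2: 0}
```

Since local IDs never change, a wrong link can later be removed and the clusters recomputed without losing track of any imported entity.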

DataID (Metadata treatment)
● FOAF and WebID -> you keep your data local, all online accounts are
updated automatically
● DataID -> DCAT extension, keep data description locally, all data repos will
be updated automatically

Data comparison and feedback
Show differences in the data:
http://downloads.dbpedia.org/temporary/crosswikifact/results/q84.html
http://wikidata.org/wiki/Q84
areaTotal of London
Population of London P1082
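A comparison of this kind can be sketched in a few lines of Python. The function `diff_report`, the source names, and the numbers are hypothetical placeholders, not real figures for London; the sketch only shows how values for one property can be grouped by source to make disagreements visible.

```python
from collections import defaultdict


def diff_report(entity, prop, values_by_source):
    """Group sources by the value they state and flag disagreement."""
    by_value = defaultdict(list)
    for source, value in values_by_source.items():
        by_value[value].append(source)
    return {
        "entity": entity,
        "property": prop,
        "agree": len(by_value) <= 1,
        "values": {value: sorted(sources) for value, sources in by_value.items()},
    }


# Placeholder values; three hypothetical sources describe the same entity.
report = diff_report(
    "Q84",
    "population",
    {"source-a": 100, "source-b": 100, "source-c": 120},
)
print(report["agree"])
print(report["values"])
```

A report like this can be sent back to each source as feedback, since it names exactly which sources disagree and on which value.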

Test-driven data development
● Test-driven data development (2014)
● Dimitris Kontokostas (CTO of DBpedia) is co-editor of the SHACL specification
● RDFUnit
○ uses Machine Learning (DL-Learner) to enrich the OWL schema
○ TAG - Test Autogenerators from enriched schema
○ 44,000 tests generated from the DBpedia Ontology
● Tests are transferred to sources (schema mapping)
● Tests are written collaboratively:
○ Universal: deathdate should not be before birthdate
○ Shared: specialised domain and application tests
Test-driven evaluation of linked data quality. Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali J. Zaveri. In Proceedings of the 23rd International Conference on World Wide Web.
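The "universal" test named above (a death date should not precede a birth date) can be sketched in plain Python. This does not use RDFUnit or SHACL; the record layout, the function names, and the `ex:` entities with their dates are hypothetical and serve only to show what a data test and a test run look like.

```python
from datetime import date


def death_not_before_birth(entity):
    """Pass when either date is missing or the death date is not earlier."""
    birth, death = entity.get("birthDate"), entity.get("deathDate")
    if birth is None or death is None:
        return True  # the test does not apply to this entity
    return death >= birth


def run_tests(entities, tests):
    """Run every test on every entity and collect the violations."""
    violations = []
    for entity_id, entity in entities.items():
        for test in tests:
            if not test(entity):
                violations.append((entity_id, test.__name__))
    return violations


# Made-up records for illustration only.
entities = {
    "ex:person1": {"birthDate": date(1900, 1, 1), "deathDate": date(1950, 1, 1)},
    "ex:person2": {"birthDate": date(1950, 1, 1), "deathDate": date(1900, 1, 1)},
    "ex:person3": {"birthDate": date(1980, 5, 17)},
}

violations = run_tests(entities, [death_not_before_birth])
print(violations)  # [('ex:person2', 'death_not_before_birth')]
```

Keeping each test as a small named function means a violation report can say which rule failed, which is the information a data source needs to fix the record.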

DBpedia in 10 years
● DBpedia connects hundreds of thousands of data spaces
(centralised-decentralised architecture)
● Data about the world is a commodity (freely available to everybody)
● Working with data will be fun

Become a supporter or an early adopter
This is not a vision of the far future, it is happening now:

Contact for the DBpedia Association (non-profit)
dbpedia@infai.org @dbpedia
wiki.dbpedia.org @dbpedia.org
