SlideShare a Scribd company logo
Serving DBpedia with DOLCE
More than Just Adding a Cherry on Top
DBpedia Extraction Framework
Mappings Wiki
DOLCE
Heiko Paulheim and Aldo Gangemi
10/13/15 Heiko Paulheim and Aldo Gangemi 2
DOLCE Mappings: Served Since DBpedia 2014
10/13/15 Heiko Paulheim and Aldo Gangemi 3
More than Just a Cherry on Top
• DOLCE adds a layer of formalization
– high level axioms
– additional domain and range restrictions
– fundamental disjointness (e.g., physical object vs. social object)
• Enriches DBpedia ontology
• Can be used for consistency checking
10/13/15 Heiko Paulheim and Aldo Gangemi 4
More than Just a Cherry on Top
Tim Berners-Lee Royal Society
Award
award
range
Description
subclass of
Social Agent
disjoint
with
DBpedia
ontology
DBpedia
instances
DOLCE
ontology
Organisation
is a
Social Person
equivalent class
subclass of
10/13/15 Heiko Paulheim and Aldo Gangemi 5
DBpedia in a Nutshell
• Raw extraction from Wikipedia infoboxes
• Infobox types and keys mapped to ontology
– Crowdsourcing process (aka Mappings Wiki)
– 735 classes
– ~2,800 properties
• Almost no disjointness
– only 24 disjointness axioms
– many of those are corner cases
• MovingWalkway vs. Person
f
10/13/15 Heiko Paulheim and Aldo Gangemi 6
DOLCE in a Nutshell
• A top level ontology
• Defines top level classes and relations
• Including rich axiomatization
10/13/15 Heiko Paulheim and Aldo Gangemi 7
DOLCE in a Nutshell
• Original DOLCE ontologies were too heavy weight
– thus: hardly used on the semantic web
– remember: a little semantics goes a long way!
• DOLCE-Zero
– Simplified version
– Contains both DOLCE and D&S (Descriptions and Situations)
– D&S introduces some high level design patterns
10/13/15 Heiko Paulheim and Aldo Gangemi 8
Systematic vs. Individual Errors in DBpedia
• Systematic errors
– occur frequently following a pattern
– e.g., organizations are frequently used as objects of the relation award
• are likely to have a common root cause
– wrong mapping from infobox to ontology
– error in the extraction code
– ...
Tim Berners-Lee Royal Society
Award
award
range
Description
subclass of
Social Agent
disjoint
with
DBpedia
ontology
DBpedia
instances
DOLCE
ontology
Organisation
is a
Social Person
equivalent class
subclass of
10/13/15 Heiko Paulheim and Aldo Gangemi 9
Overall Workflow
• Overall workflow
• For each statement
– add the statement plus all subject/object types to the ontology
– check consistency
– present inconsistent statements and explanations for inspection
; Reasoning 2
RDF Graph
Statements
+ Types
Inconsistent
Statements
+ explanations
User
10/13/15 Heiko Paulheim and Aldo Gangemi 10
Overall Workflow
• Inspecting a single statement takes 2.6 seconds
– on a standard laptop
• DBpedia 2014 has 15,001,543 statements
– in the dbpedia-owl namespace
→ consistency checking would take 451 days
• Solution
– cache results for signatures (predicate + subject types + object types)
– there are only 34,554 different signatures!
; Reasoning 2
RDF Graph
Statements
+ Types
Inconsistent
Statements
+ explanations
UserCache
10/13/15 Heiko Paulheim and Aldo Gangemi 11
Overall Workflow
• Overall, we find 3,654,255 inconsistent statements (24.4%)
– cf.: only 97,749 (0.7%) without DOLCE
• Too much to inspect!
– We are looking for systematic errors
– Cluster explanations w/ DBSCAN
– Each cluster represents a systematic error
; Reasoning Clustering 2
RDF Graph
Statements
+ Types
Inconsistent
Statements
+ explanations
Clusters
UserCache
10/13/15 Heiko Paulheim and Aldo Gangemi 12
Clustering Inconsistent Statements
• Each explanation as a binary vector
– with 0/1 for all axioms involved in the explanations
– 1,467 axioms (dimensions) in total
• DBSCAN
– Manhattan distance (i.e., number of axioms added/removed)
– MinPts=100 (minimum frequency for a systematic error)
– ε=4 (explanations in a cluster differ by two axioms at most)
10/13/15 Heiko Paulheim and Aldo Gangemi 13
Major Systematic Errors
• Inspection of the top 40 clusters
– they contain 96% of all inconsistent statements
• Overcommitment (19) – using properties in different contexts
– e.g., dbo:team is defined as a relation between persons and
sports teams
– but also used for relating participating teams to events
– fix: relax domain/range constraints, or introduce new properties
• Metonymy (11) – ambiguous language terms
– e.g., instances of dbo:Species contain both species
as well as single animals
– hard to refactor
10/13/15 Heiko Paulheim and Aldo Gangemi 14
Major Systematic Errors
• Misalignment (5) – classes/properties mapped to
the wrong concept in DOLCE
– occasionally occurs if intended and actual use differ
– e.g., dbo:commander is more frequently used with events (e.g., battles)
than military units – d0:hasParticipant rather than d0:coparticipatesWith
– fix: change alignment
• Version branching (3) – semantics of dbo concepts have changed
– e.g., dbo:team in DBpedia 3.9: career stations and teams,
in DBpedia 2014: athletes and teams
– fix: change alignment
10/13/15 Heiko Paulheim and Aldo Gangemi 15
`
A Look at the Long Tail
• DBSCAN identifies clusters and “noise”
– i.e., statements that are not contained in clusters
• Manual inspection of a sample of 100 instances
– 64 are erroneous
– 30 are false negatives (i.e., correct statements)
– 6 are questionable
• Typical error sources in the long tail
– are expected to be cross-cutting, i.e.,
occurring with various classes and properties
10/13/15 Heiko Paulheim and Aldo Gangemi 16
A Look at the Long Tail
• Typical error sources in the long tail
• Link in longer text (23)
– e.g., dbr:Cosmo_Kramer dbo:occupation dbr:Bagel .
• Wrong link in Wikipedia (9)
– e.g., dbr#Stone_(band) dbo:associatedMusicArtist dbr#Dementia .
– Dementia should link to the band, not the disease
• Redirects (7)
– e.g., dbpedia#Ben_Casey dbo:company dbpedia#Bing_Crosby .
– The target Bing_Crosby_Productions redirects to Bing_Crosby
• Links with Anchors (6)
– e.g., dbr:Tim_Berners-Lee dbo:award dbr:Royal_Society .
– the original link target is Royal_Society#Fellows
– anchors are ignored by DBpedia
10/13/15 Heiko Paulheim and Aldo Gangemi 17
Conclusions
• We have shown that
– DOLCE helps identifying inconsistent statements
– Cluster analysis allows for identifying systematic errors
– User interaction is minimized
• we analyzed one statement each from 40 clusters
• corresponding to 3,497,068 affected statements
• Outcomes for future DBpedia versions
– DBpedia ontology changes
– Mapping changes
– DOLCE alignment changes
– Bug reports
Serving DBpedia with DOLCE
More than Just Adding a Cherry on Top
DBpedia Extraction Framework
Mappings Wiki
DOLCE
Heiko Paulheim and Aldo Gangemi

More Related Content

Similar to Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top

Semantic Web - Ontology 101
Semantic Web - Ontology 101Semantic Web - Ontology 101
Semantic Web - Ontology 101
Luigi De Russis
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+
Prateek Jain
 
Tutorial: Building and using ontologies - E.Simperl - ESWC SS 2014
 Tutorial: Building and using ontologies -  E.Simperl - ESWC SS 2014 Tutorial: Building and using ontologies -  E.Simperl - ESWC SS 2014
Tutorial: Building and using ontologies - E.Simperl - ESWC SS 2014
eswcsummerschool
 
Building and using ontologies
Building and using ontologies Building and using ontologies
Building and using ontologies
Elena Simperl
 
ESWC SS 2013 - Wednesday Tutorial Elena Simperl: Creating and Using Ontologie...
ESWC SS 2013 - Wednesday Tutorial Elena Simperl: Creating and Using Ontologie...ESWC SS 2013 - Wednesday Tutorial Elena Simperl: Creating and Using Ontologie...
ESWC SS 2013 - Wednesday Tutorial Elena Simperl: Creating and Using Ontologie...eswcsummerschool
 
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionOntology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Aldo Gangemi
 
The Semantic Web meets the Code of Federal Regulations
The Semantic Web meets the Code of Federal RegulationsThe Semantic Web meets the Code of Federal Regulations
The Semantic Web meets the Code of Federal Regulations
tbruce
 
Ontologies: vehicles for reuse
Ontologies: vehicles for reuseOntologies: vehicles for reuse
Ontologies: vehicles for reuse
Guus Schreiber
 
Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Brian Ulicny
 
2_text operationinformation retrieval. ppt
2_text operationinformation retrieval. ppt2_text operationinformation retrieval. ppt
2_text operationinformation retrieval. ppt
HayomeTakele
 
BT02.pptx
BT02.pptxBT02.pptx
BT02.pptx
ThAnhonc
 
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML
 
RDA, FRBR, and FRAD: Connecting the dots
RDA, FRBR, and FRAD: Connecting the dotsRDA, FRBR, and FRAD: Connecting the dots
RDA, FRBR, and FRAD: Connecting the dotsLouise Spiteri
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
kelbedweihy
 
20130622 okfn hackathon t2
20130622 okfn hackathon t220130622 okfn hackathon t2
20130622 okfn hackathon t2Seonho Kim
 
SolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery
SolrSherlock: Linkfinding among Biomolecules with Literature-based DiscoverySolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery
SolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery
Jack Park
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
Aidan Hogan
 
Temple University Digital Scholarship Center: Model of the Month Club: Septem...
Temple University Digital Scholarship Center: Model of the Month Club: Septem...Temple University Digital Scholarship Center: Model of the Month Club: Septem...
Temple University Digital Scholarship Center: Model of the Month Club: Septem...
Liz Rodrigues
 
RDA, FRBR, and FRAD: Connecting the dots
RDA, FRBR, and FRAD: Connecting the dotsRDA, FRBR, and FRAD: Connecting the dots
RDA, FRBR, and FRAD: Connecting the dotsLouise Spiteri
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
Bernhard Haslhofer
 

Similar to Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top (20)

Semantic Web - Ontology 101
Semantic Web - Ontology 101Semantic Web - Ontology 101
Semantic Web - Ontology 101
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+
 
Tutorial: Building and using ontologies - E.Simperl - ESWC SS 2014
 Tutorial: Building and using ontologies -  E.Simperl - ESWC SS 2014 Tutorial: Building and using ontologies -  E.Simperl - ESWC SS 2014
Tutorial: Building and using ontologies - E.Simperl - ESWC SS 2014
 
Building and using ontologies
Building and using ontologies Building and using ontologies
Building and using ontologies
 
ESWC SS 2013 - Wednesday Tutorial Elena Simperl: Creating and Using Ontologie...
ESWC SS 2013 - Wednesday Tutorial Elena Simperl: Creating and Using Ontologie...ESWC SS 2013 - Wednesday Tutorial Elena Simperl: Creating and Using Ontologie...
ESWC SS 2013 - Wednesday Tutorial Elena Simperl: Creating and Using Ontologie...
 
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionOntology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
 
The Semantic Web meets the Code of Federal Regulations
The Semantic Web meets the Code of Federal RegulationsThe Semantic Web meets the Code of Federal Regulations
The Semantic Web meets the Code of Federal Regulations
 
Ontologies: vehicles for reuse
Ontologies: vehicles for reuseOntologies: vehicles for reuse
Ontologies: vehicles for reuse
 
Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13
 
2_text operationinformation retrieval. ppt
2_text operationinformation retrieval. ppt2_text operationinformation retrieval. ppt
2_text operationinformation retrieval. ppt
 
BT02.pptx
BT02.pptxBT02.pptx
BT02.pptx
 
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
 
RDA, FRBR, and FRAD: Connecting the dots
RDA, FRBR, and FRAD: Connecting the dotsRDA, FRBR, and FRAD: Connecting the dots
RDA, FRBR, and FRAD: Connecting the dots
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
20130622 okfn hackathon t2
20130622 okfn hackathon t220130622 okfn hackathon t2
20130622 okfn hackathon t2
 
SolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery
SolrSherlock: Linkfinding among Biomolecules with Literature-based DiscoverySolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery
SolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
Temple University Digital Scholarship Center: Model of the Month Club: Septem...
Temple University Digital Scholarship Center: Model of the Month Club: Septem...Temple University Digital Scholarship Center: Model of the Month Club: Septem...
Temple University Digital Scholarship Center: Model of the Month Club: Septem...
 
RDA, FRBR, and FRAD: Connecting the dots
RDA, FRBR, and FRAD: Connecting the dotsRDA, FRBR, and FRAD: Connecting the dots
RDA, FRBR, and FRAD: Connecting the dots
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 

More from Heiko Paulheim

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Heiko Paulheim
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
Heiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
Heiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
Heiko Paulheim
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Heiko Paulheim
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
Heiko Paulheim
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Heiko Paulheim
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Heiko Paulheim
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
Heiko Paulheim
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
Heiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Heiko Paulheim
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
Heiko Paulheim
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
Heiko Paulheim
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
Heiko Paulheim
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
Heiko Paulheim
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
Heiko Paulheim
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
Heiko Paulheim
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
Heiko Paulheim
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
Heiko Paulheim
 

More from Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 

Recently uploaded

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 

Recently uploaded (20)

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 

Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top

  • 1. Serving DBpedia with DOLCE More than Just Adding a Cherry on Top DBpedia Extraction Framework Mappings Wiki DOLCE Heiko Paulheim and Aldo Gangemi
  • 2. 10/13/15 Heiko Paulheim and Aldo Gangemi 2 DOLCE Mappings: Served Since DBpedia 2014
  • 3. 10/13/15 Heiko Paulheim and Aldo Gangemi 3 More than Just a Cherry on Top • DOLCE adds a layer of formalization – high level axioms – additional domain and range restrictions – fundamental disjointness (e.g., physical object vs. social object) • Enriches DBpedia ontology • Can be used for consistency checking
  • 4. 10/13/15 Heiko Paulheim and Aldo Gangemi 4 More than Just a Cherry on Top Tim Berners-Lee Royal Society Award award range Description subclass of Social Agent disjoint with DBpedia ontology DBpedia instances DOLCE ontology Organisation is a Social Person equivalent class subclass of
  • 5. 10/13/15 Heiko Paulheim and Aldo Gangemi 5 DBpedia in a Nutshell • Raw extraction from Wikipedia infoboxes • Infobox types and keys mapped to ontology – Crowdsourcing process (aka Mappings Wiki) – 735 classes – ~2,800 properties • Almost no disjointness – only 24 disjointness axioms – many of those are corner cases • MovingWalkway vs. Person f
  • 6. 10/13/15 Heiko Paulheim and Aldo Gangemi 6 DOLCE in a Nutshell • A top level ontology • Defines top level classes and relations • Including rich axiomatization
  • 7. 10/13/15 Heiko Paulheim and Aldo Gangemi 7 DOLCE in a Nutshell • Original DOLCE ontologies were too heavy weight – thus: hardly used on the semantic web – remember: a little semantics goes a long way! • DOLCE-Zero – Simplified version – Contains both DOLCE and D&S (Descriptions and Situations) – D&S introduces some high level design patterns
  • 8. 10/13/15 Heiko Paulheim and Aldo Gangemi 8 Systematic vs. Individual Errors in DBpedia • Systematic errors – occur frequently following a pattern – e.g., organizations are frequently used as objects of the relation award • are likely to have a common root cause – wrong mapping from infobox to ontology – error in the extraction code – ... Tim Berners-Lee Royal Society Award award range Description subclass of Social Agent disjoint with DBpedia ontology DBpedia instances DOLCE ontology Organisation is a Social Person equivalent class subclass of
  • 9. 10/13/15 Heiko Paulheim and Aldo Gangemi 9 Overall Workflow • Overall workflow • For each statement – add the statement plus all subject/object types to the ontology – check consistency – present inconsistent statements and explanations for inspection ; Reasoning 2 RDF Graph Statements + Types Inconsistent Statements + explanations User
  • 10. 10/13/15 Heiko Paulheim and Aldo Gangemi 10 Overall Workflow • Inspecting a single statement takes 2.6 seconds – on a standard laptop • DBpedia 2014 has 15,001,543 statements – in the dbpedia-owl namespace → consistency checking would take 451 days • Solution – cache results for signatures (predicate + subject types + object types) – there are only 34,554 different signatures! ; Reasoning 2 RDF Graph Statements + Types Inconsistent Statements + explanations UserCache
  • 11. 10/13/15 Heiko Paulheim and Aldo Gangemi 11 Overall Workflow • Overall, we find 3,654,255 inconsistent statements (24.4%) – cf.: only 97,749 (0.7%) without DOLCE • Too much to inspect! – We are looking for systematic errors – Cluster explanations w/ DBSCAN – Each cluster represents a systematic error ; Reasoning Clustering 2 RDF Graph Statements + Types Inconsistent Statements + explanations Clusters UserCache
  • 12. 10/13/15 Heiko Paulheim and Aldo Gangemi 12 Clustering Inconsistent Statements • Each explanation as a binary vector – with 0/1 for all axioms involved in the explanations – 1,467 axioms (dimensions) in total • DBSCAN – Manhattan distance (i.e., number of axioms added/removed) – MinPts=100 (minimum frequency for a systematic error) – ε=4 (explanations in a cluster differ by two axioms at most)
  • 13. 10/13/15 Heiko Paulheim and Aldo Gangemi 13 Major Systematic Errors • Inspection of the top 40 clusters – they contain 96% of all inconsistent statements • Overcommitment (19) – using properties in different contexts – e.g., dbo:team is defined as a relation between persons and sports teams – but also used for relating participating teams to events – fix: relax domain/range constraints, or introduce new properties • Metonymy (11) – ambiguous language terms – e.g., instances of dbo:Species contain both species as well as single animals – hard to refactor
  • 14. 10/13/15 Heiko Paulheim and Aldo Gangemi 14 Major Systematic Errors • Misalignment (5) – classes/properties mapped to the wrong concept in DOLCE – occasionally occurs if intended and actual use differ – e.g., dbo:commander is more frequently used with events (e.g., battles) than military units – d0:hasParticipant rather than d0:coparticipatesWith – fix: change alignment • Version branching (3) – semantics of dbo concepts have changed – e.g., dbo:team in DBpedia 3.9: career stations and teams, in DBpedia 2014: athletes and teams – fix: change alignment
  • 15. 10/13/15 Heiko Paulheim and Aldo Gangemi 15 ` A Look at the Long Tail • DBSCAN identifies clusters and “noise” – i.e., statements that are not contained in clusters • Manual inspection of a sample of 100 instances – 64 are erroneous – 30 are false negatives (i.e., correct statements) – 6 are questionable • Typical error sources in the long tail – are expected to be cross-cutting, i.e., occurring with various classes and properties
  • 16. 10/13/15 Heiko Paulheim and Aldo Gangemi 16 A Look at the Long Tail • Typical error sources in the long tail • Link in longer text (23) – e.g., dbr:Cosmo_Kramer dbo:occupation dbr:Bagel . • Wrong link in Wikipedia (9) – e.g., dbr#Stone_(band) dbo:associatedMusicArtist dbr#Dementia . – Dementia should link to the band, not the disease • Redirects (7) – e.g., dbpedia#Ben_Casey dbo:company dbpedia#Bing_Crosby . – The target Bing_Crosby_Productions redirects to Bing_Crosby • Links with Anchors (6) – e.g., dbr:Tim_Berners-Lee dbo:award dbr:Royal_Society . – the original link target is Royal_Society#Fellows – anchors are ignored by DBpedia
  • 17. 10/13/15 Heiko Paulheim and Aldo Gangemi 17 Conclusions • We have shown that – DOLCE helps identifying inconsistent statements – Cluster analysis allows for identifying systematic errors – User interaction is minimized • we analyzed one statement each from 40 clusters • corresponding to 3,497,068 affected statements • Outcomes for future DBpedia versions – DBpedia ontology changes – Mapping changes – DOLCE alignment changes – Bug reports
  • 18. Serving DBpedia with DOLCE More than Just Adding a Cherry on Top DBpedia Extraction Framework Mappings Wiki DOLCE Heiko Paulheim and Aldo Gangemi