A Linked Data Scalability Challenge: Frequently Reused Concepts Lose their Meaning
Paolo Pareti
University of Edinburgh
ACM Web Science Conference 29/6/2015
The Semantic Richness
of Linked Data Concepts
Vocabulary Reuse Damages Semantics!
Semantic Richness
The more facts we can infer about :x,
knowing that :x is a Cat,
the more Semantically Rich the concept Cat is.
Does it have a tail?
Is it a mammal?
So what do you actually know about :x,
if on the Web anything can be a Cat?
:x is a Cat
A Linked Data Challenge
The more a concept gets reused…
… the less Semantically Rich it becomes.
Frequently reused concepts lose their meaning.
http://www.w3.org/2002/07/owl#sameAs
This problem already affects highly reused concepts,
such as owl:sameAs *
* H. Halpin, P. J. Hayes, J. P. McCusker, D. L. McGuinness, and H. S. Thompson. When owl:sameAs Isn’t the Same: An
Analysis of Identity in Linked Data. In The Semantic Web - ISWC 2010, volume 6496 of Lecture Notes in Computer Science,
pages 305–320. Springer Berlin Heidelberg, 2010.
A Simple Measure of Semantic Richness
We define a measure based on:
● the number of common patterns,
● and their frequency.
For example:
if X is a cat, what can we say about X?
● X is a mammal (frequency: 1.00)
● X has a tail (frequency: 0.99)
● ...
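The frequencies in the example above could be computed along the following lines. This is a minimal sketch, assuming that a "pattern" is a (property, value) pair attached to an entity and that each entity is held as a plain Python set. The toy data and the function name `pattern_frequencies` are illustrative and do not come from the original work.

```python
from collections import Counter


def pattern_frequencies(entities):
    """Return the fraction of entities that exhibit each pattern.

    `entities` is a list of sets; each set holds the (property, value)
    patterns observed for one instance of the concept.
    """
    if not entities:
        return {}
    # Count in how many entities each pattern occurs.
    counts = Counter(p for patterns in entities for p in patterns)
    return {p: n / len(entities) for p, n in counts.items()}


# Hypothetical toy data: four instances typed as Cat.
cats = [
    {("is_a", "Mammal"), ("has", "Tail")},
    {("is_a", "Mammal"), ("has", "Tail")},
    {("is_a", "Mammal"), ("has", "Tail")},
    {("is_a", "Mammal")},  # a tailless cat
]

freqs = pattern_frequencies(cats)
```

Here every instance is a mammal, so that pattern has frequency 1.0, while only three of the four instances have a tail.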
Intuitively:
● The more patterns, and the more frequent they are,
the more semantically rich the concept is.
Measure motivated by:
● Number of Features theory
● Inductive Learning
Main advantage:
● Can be automatically and efficiently computed over
large datasets.
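One way to turn this intuition into a number is sketched below. The slides do not state the exact formula, so the scoring rule here is an assumption: richness is taken to be the sum of the frequencies of all patterns whose frequency reaches a minimum threshold. The names `semantic_richness` and `min_freq`, and the example frequencies other than those for Cat, are hypothetical.

```python
def semantic_richness(frequencies, min_freq=0.5):
    """Assumed score: sum the frequencies of sufficiently common patterns.

    `frequencies` maps each pattern to the fraction of instances that
    exhibit it. Patterns rarer than `min_freq` are ignored.
    """
    return sum(f for f in frequencies.values() if f >= min_freq)


# A specific concept: several patterns, almost always present.
cat = {"is a mammal": 1.00, "has a tail": 0.99, "is black": 0.10}

# A generic concept (hypothetical): one pattern, only sometimes present.
thing = {"has a label": 0.60}

cat_score = semantic_richness(cat)
thing_score = semantic_richness(thing)
```

Under this assumed rule, Cat scores higher than the generic concept because it has more common patterns and each is more frequent. A single pass over the pattern counts is enough, which is consistent with the claim that the measure can be computed efficiently over large datasets.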
DBpedia Ontology
The DBpedia ontology tree, plotted according to the Semantic Richness
of its concepts (each line represents a subclass relation). As we would
expect, Semantic Richness is highly correlated with specificity.
Loss of Semantic Richness in foaf:Person
How quickly does Semantic Richness decrease when reusing
a concept? We looked at the concept of foaf:Person as defined in ten
different datasets.
As we add external entities of type foaf:Person into a dataset, the
Semantic Richness of this concept quickly decreases.
In particular, it falls below the average Semantic Richness
of the original datasets (dotted line).
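The dilution effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the scoring rule (sum of the frequencies of patterns shared by at least half of the entities) is assumed rather than taken from the slides, and both datasets are invented toy data in which foaf:Person is described with different properties.

```python
from collections import Counter


def richness(entities, min_freq=0.5):
    """Assumed score: sum of frequencies of patterns that occur in at
    least `min_freq` of the entities. Each entity is a set of patterns."""
    if not entities:
        return 0.0
    counts = Counter(p for patterns in entities for p in patterns)
    freqs = (n / len(entities) for n in counts.values())
    return sum(f for f in freqs if f >= min_freq)


# Hypothetical local dataset: every foaf:Person has a name and a mailbox.
local = [{"foaf:name", "foaf:mbox"} for _ in range(10)]

# Hypothetical external dataset: every foaf:Person only has a homepage.
external = [{"foaf:homepage"} for _ in range(20)]

before = richness(local)                  # local dataset on its own
after_few = richness(local + external[:5])  # a few external entities added
after_all = richness(local + external)      # all external entities added

# Average richness of the two original datasets, taken separately.
average_original = (richness(local) + richness(external)) / 2
```

In this toy setting the score drops as external entities are mixed in, because no pattern is shared across the two datasets, and the merged score ends up below the average of the two original scores.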
The Challenge
How can concepts be openly reused on the Web,
while at the same time remaining semantically rich?