Semantic Web languages: Expressivity vs scalability

Nicola Vitucci
Dipartimento di Elettronica e Informazione
Politecnico di Milano
December 17, 2012

Summary
1 Introduction
2 Semantic Web languages
3 Description Logics
4 Queries
5 Storage
6 Conclusions

Introduction
Semantic Web languages
Semantic Web languages are built on the notion of Semantic Web, an “extended version” of the Web where metadata semantically enrich the content of Web pages
They are used in several applications for:
Building a knowledge base (a “richer” database where queries can also be performed on the schema, i.e. the ER model, itself)
Providing a shared vocabulary
Integrating different sources of information
Discovering new information by performing automatic reasoning

The Semantic Web “layer cake”
[Figure: the Semantic Web “layer cake”, taken from http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/#(24)]

RDF, RDFS, OWL
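RDF is a data model describing resources and their relations
RDFS (RDF Schema) provides a vocabulary to structure RDF resources into classes and properties
OWL is a family of three languages which extend RDFS:
OWL Full
OWL DL
OWL Lite
Its newer version, OWL 2, keeps OWL DL and OWL Full and defines profiles instead of OWL Lite
[Figure: nested diagram of OWL Full, RDFS, OWL DL and OWL Lite]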
All the languages can be serialized using formats such as
RDF/XML, N3, N-Triples or Turtle
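To make the data model concrete, here is a minimal Python sketch (standard library only) that holds a small RDF graph as (subject, predicate, object) triples and writes it in N-Triples, one triple per line terminated by “ .”. The http://example.org/ namespace, the resources and the helper functions are hypothetical, and the serializer only handles IRIs and plain string literals; a real application would rely on an RDF library.

```python
# Hypothetical namespace used only for this example.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Render an IRI in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Render a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize (s, p, o) triples; an object given as ("lit", text) becomes a literal."""
    lines = []
    for s, p, o in triples:
        obj = literal(o[1]) if isinstance(o, tuple) and o[0] == "lit" else iri(o)
        lines.append(f"{iri(s)} {iri(p)} {obj} .")
    return "\n".join(lines)


graph = [
    (EX + "alice", RDF_TYPE, EX + "Researcher"),    # alice is a Researcher
    (EX + "acme", EX + "hasEmployee", EX + "alice"),  # acme employs alice
    (EX + "alice", EX + "name", ("lit", "Alice")),    # a literal-valued property
]

print(to_ntriples(graph))
```

The same three triples could equally be written in RDF/XML or Turtle; only the surface syntax changes, not the graph.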

OWL 2 profiles
OWL 2 defines three sublanguages of OWL 2 DL, called profiles:
OWL 2 EL, suitable for big and
relatively simple taxonomies
OWL 2 QL, suitable for
conjunctive queries on many
instances
OWL 2 RL, a sort of
“compromise” between
expressivity and scalability
inspired by rule-based
reasoning
[Figure: the EL, QL and RL profiles inside OWL DL, in turn inside OWL Full]
Recent proposal: OWL-LD (OWL for Linked Data)
→ http://semanticweb.org/OWLLD/

Too many languages?
Are there too many OWLs?
“OWL 2 is the standard,
let’s use it”
Too easy to say!
Several issues:
Complexity of reasoning
Representation needs
Queries
Storage
OWL 2 profiles have been introduced to address such issues by sacrificing some of the expressive power of OWL 2

Description Logics
OWL and the Description Logics
The OWL DL and OWL 2 DL languages are based on description logics (DLs). Description logics:
are a family of logics (in the mathematical sense)
are more expressive than propositional logic
are less expressive than First Order Logic (FOL), but decidable
have a formal semantics which makes it possible to build ontologies and to reason over them

Key concepts in DLs
The key elements should be understood in terms of set theory:
Individuals (single elements)
Concepts: sets of individuals
Roles: binary relations between individuals
Terminology is expressed through TBox axioms such as
Researcher ⊑ Employee
ResearchCompany ≡ Company ⊓ ∃hasEmployee.Researcher
Factual information about individuals is represented by ABox axioms such as:
a : C (concept assertion)
(a, b) : R (role assertion)
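The set-theoretic reading can be checked on a small finite interpretation: each concept name denotes a set of individuals, each role a set of pairs, and C ⊑ D holds when the set for C is contained in the set for D. The Python sketch below tests Researcher ⊑ Employee, ResearchCompany ≡ Company ⊓ ∃hasEmployee.Researcher and two ABox assertions against one hand-made interpretation. The individuals (alice, bob, acme, shop) are hypothetical, and evaluating a single interpretation only illustrates the semantics; it is not a reasoning procedure, which must consider all models.

```python
# A finite interpretation: concept names map to sets, role names to sets of pairs.
concepts = {
    "Researcher": {"alice"},
    "Employee": {"alice", "bob"},
    "Company": {"acme", "shop"},
    "ResearchCompany": {"acme"},
}
roles = {
    "hasEmployee": {("acme", "alice"), ("shop", "bob")},
}


def exists(role: str, filler: set) -> set:
    """Extension of ∃role.filler: elements with at least one role-successor in filler."""
    return {x for (x, y) in roles[role] if y in filler}


def subsumed(c: set, d: set) -> bool:
    """C ⊑ D holds in this interpretation iff C's extension is a subset of D's."""
    return c <= d


# Researcher ⊑ Employee
ax1 = subsumed(concepts["Researcher"], concepts["Employee"])

# ResearchCompany ≡ Company ⊓ ∃hasEmployee.Researcher (⊓ is set intersection)
rhs = concepts["Company"] & exists("hasEmployee", concepts["Researcher"])
ax2 = concepts["ResearchCompany"] == rhs

# ABox: alice : Researcher and (acme, alice) : hasEmployee
concept_assertion_holds = "alice" in concepts["Researcher"]
role_assertion_holds = ("acme", "alice") in roles["hasEmployee"]

print(ax1, ax2, concept_assertion_holds, role_assertion_holds)  # True True True True
```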

Basic DLs
Several basic DLs exist, among which:
AL: provides atomic concept negation (¬C, where C is an atomic concept), concept intersection (C ⊓ D), universal restrictions (∀R.C) and limited existential quantification (∃R.⊤)
EL: provides concept intersection and full existential quantification (∃R.C)
Such logics can be extended by the use of several constructs (see
next slide)

Constructs
Symbol: meaning
E: full existential quantification (∃R.C)
U: concept union (C ⊔ D)
C: complex concept negation (¬D); includes U and E
H: role hierarchy (R ⊑ S, where R and S are roles)
R: complex role inclusions (e.g. R ∘ S ⊑ T), reflexivity and irreflexivity, role disjointness; includes H
O: nominals (Letter ≡ {a, b, c}, RedObject ≡ ∃hasColor.{red})
I: inverse properties (S ≡ R⁻)
F: functional properties
N: cardinality restrictions (e.g. C ≡ ≥nR or C ≡ ≤nR, with n ≥ 0); includes F
Q: qualified cardinality restrictions (e.g. C ≡ ≥nR.D, with n ≥ 0); includes N
(D): datatype properties (e.g. strings, numbers etc.)
S is an alias for ALC_R+ (ALC with transitive roles), EL++ for ELRO

Complexity
The complexity of a DL depends on the constructs it supports
OWL 1 Lite = SHIF(D) (restricted)
OWL 1 DL = SHOIN(D)
OWL 2 DL = SROIQ(D)
OWL 2 EL is based on EL++
OWL 2 QL is based on DL-Lite (more precisely DL-Lite_R), a family of lightweight DLs optionally using H, F, N
OWL 2 RL is based on Description Logic Programs (DLP),
sharing many features with OWL Lite
How complex are the reasoning tasks then?

Complexity
The complexity of reasoning tasks depends not only on the
presence of some constructs in the used logic, but also on their
combination:
ALCQI, ALCQO: PSpace
ALCIO: ExpTime (I and O together raise the complexity)
ALCQIO, SHOIN, SHOIQ: NExpTime (I + O + N/Q)
SROIQ: N2ExpTime
Thus, one should carefully consider which constructs are really needed for a given application
More on complexity of reasoning in description logics:
http://www.cs.man.ac.uk/~ezolin/dl/

Complexity
Language   Reasoning problems¹     Complexity²
OWL 2 DL   Cons, Sat, Sub, Check   2NExpTime-complete
           Query                   open (decidability unknown)
OWL 2 EL   Cons, Sat, Sub, Check   PTime-complete
           Query                   ExpTime-complete
OWL 2 QL   Cons, Sat, Sub, Check   NLogSpace-complete
           Query                   NP-complete
OWL 2 RL   Cons                    PTime-complete
           Sat, Sub, Check         co-NP-complete
           Query                   NP-complete
¹ Cons: Ontology Consistency; Sat: Class Expression Satisfiability; Sub: Class Expression Subsumption; Check: Instance Checking; Query: Conjunctive Query Answering
² More about complexity on http://www.w3.org/TR/owl2-profiles/#Computational_Properties

Sources of complexity
Sources of complexity for a DL include:
Non-determinism: disjunction (or negation and conjunction),
maximum cardinality restrictions
Exponential complexity: combination of ∃ and ∀
For this reason, all the OWL 2 profiles disallow or restrict the use of such constructs (see next slide)

Use of profiles
Intersection: always allowed, except on the left side in OWL 2 QL;
Union: never allowed, except on the left side in OWL 2 RL (since A ⊔ B ⊑ C is equivalent to the pair A ⊑ C, B ⊑ C, this does not add to the complexity);
Negation: allowed only on the right side in OWL 2 RL/QL;
Inverses: allowed in OWL 2 RL/QL but not in OWL 2 EL;
Existential quantifiers: fully allowed in OWL 2 EL, restricted on the left side in OWL 2 QL, allowed only on the left side in OWL 2 RL;
Universal quantifiers: allowed in OWL 2 RL (on the right side) but not in OWL 2 EL/QL.
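The remark on unions can be illustrated with a simple rewriting: an axiom A ⊔ B ⊑ C is replaced by one axiom per disjunct, which is why a union on the left side does not add complexity. The Python sketch below uses a hypothetical tuple encoding of class expressions, ("or", A, B), and is not taken from any particular reasoner.

```python
def split_left_unions(axiom):
    """Rewrite (A1 ⊔ ... ⊔ An) ⊑ C into A1 ⊑ C, ..., An ⊑ C; other axioms are kept.

    An axiom is a pair (sub, sup) meaning sub ⊑ sup; class expressions are either
    atomic names (str) or tuples such as ("or", A, B).
    """
    sub, sup = axiom
    if isinstance(sub, tuple) and sub[0] == "or":
        result = []
        for disjunct in sub[1:]:
            # Nested unions on the left are flattened recursively.
            result.extend(split_left_unions((disjunct, sup)))
        return result
    return [axiom]


tbox = [(("or", "Cat", ("or", "Dog", "Horse")), "Mammal"), ("Mammal", "Animal")]
normalized = [ax for axiom in tbox for ax in split_left_unions(axiom)]
print(normalized)
# [('Cat', 'Mammal'), ('Dog', 'Mammal'), ('Horse', 'Mammal'), ('Mammal', 'Animal')]
```

A union on the right side (C ⊑ A ⊔ B) cannot be split this way, which is where the non-determinism mentioned earlier comes from.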

The EL profile
No inverse or symmetric properties, no disjunction, no negation
The EL profile is suitable for biomedical ontologies such as SNOMED
Example axiom:
ViralUpperRespiratoryTractInfection ≡
UpperRespiratoryInfection ⊓ ViralRespiratoryInfection ⊓
∃CausativeAgent.Virus ⊓
∃FindingSite.UpperRespiratoryTractStructure ⊓
∃PathologicalProcess.InfectiousProcess
Suitable reasoners: Snorocket, CEL, jCEL, ELK
Individuals are often not supported
Queries are reasoner-based
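EL reasoning stays polynomial because classification can be done by saturation with a small set of completion rules. The Python sketch below applies the classic completion rules to a normalized EL TBox (A ⊑ B, A1 ⊓ A2 ⊑ B, A ⊑ ∃r.B, ∃r.A ⊑ B). It assumes the ontology is already in this normal form, leaves out ⊤, ⊥, nominals and role inclusions (so it covers EL, not all of EL++), and uses a hypothetical toy vocabulary loosely modeled on the axiom above, with HasViralAgent as an auxiliary name of the kind normalization introduces. It is not how Snorocket, CEL, jCEL or ELK are implemented.

```python
# Normalized EL TBox (hypothetical toy vocabulary).
ATOMIC = [("ViralInfection", "Infection")]                      # A ⊑ B
CONJ = [("Infection", "HasViralAgent", "ViralDisease")]         # A1 ⊓ A2 ⊑ B
EXISTS_RIGHT = [("ViralInfection", "causativeAgent", "Virus")]  # A ⊑ ∃r.B
EXISTS_LEFT = [("causativeAgent", "Virus", "HasViralAgent")]    # ∃r.A ⊑ B

NAMES = {"ViralInfection", "Infection", "HasViralAgent", "ViralDisease", "Virus"}


def classify():
    """Saturate S(A) (subsumers of A) and R(r) (role edges) with EL completion rules."""
    S = {a: {a} for a in NAMES}
    R = {role: set() for _, role, _ in EXISTS_RIGHT}
    changed = True
    while changed:
        changed = False
        for a in NAMES:
            new = set()
            # CR1: A' ∈ S(A) and A' ⊑ B  ⇒  B ∈ S(A)
            new |= {b for a1, b in ATOMIC if a1 in S[a]}
            # CR2: A1, A2 ∈ S(A) and A1 ⊓ A2 ⊑ B  ⇒  B ∈ S(A)
            new |= {b for a1, a2, b in CONJ if a1 in S[a] and a2 in S[a]}
            # CR4: (A, B) ∈ R(r), A' ∈ S(B) and ∃r.A' ⊑ B'  ⇒  B' ∈ S(A)
            new |= {b for role, a1, b in EXISTS_LEFT
                    for (x, y) in R.get(role, set()) if x == a and a1 in S[y]}
            if not new <= S[a]:
                S[a] |= new
                changed = True
            # CR3: A' ∈ S(A) and A' ⊑ ∃r.B  ⇒  (A, B) ∈ R(r)
            for a1, role, b in EXISTS_RIGHT:
                if a1 in S[a] and (a, b) not in R[role]:
                    R[role].add((a, b))
                    changed = True
    return S


print(sorted(classify()["ViralInfection"]))
# ['HasViralAgent', 'Infection', 'ViralDisease', 'ViralInfection']
```

Each rule only adds names to finite sets, so the loop stops after a number of rounds bounded by the size of the TBox; this is the intuition behind the PTime bound in the table.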

The RL profile
Inference can be expressed as a set of rules
Has universal quantifiers, inverses, (a)symmetric properties
Constructs are restricted differently on the two sides of a subclass axiom
This matters for inference:
D ⊑ ∃R.C (not allowed) is different from ∃R.C ⊑ D (allowed)
thus, equivalences such as D ≡ ∃R.C are not allowed
Suitable reasoners: OWLIM, Jena
Queries can be performed on the model or on the instances
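Because RL inference can be written as rules, a naive forward-chaining loop conveys how RL-based stores materialize inferred facts. The Python sketch below applies three rules, similar in spirit to the OWL 2 RL rules for subclasses (cax-sco), someValuesFrom on the left side (cls-svf1) and allValuesFrom on the right side (cls-avf), until no new fact is derived. The class and property names, the data and the tuple encoding are hypothetical; this is not the engine of OWLIM or Jena. There is deliberately no rule for D ⊑ ∃R.C: such a rule would have to invent a new individual, the usual explanation for excluding that form from RL.

```python
# ABox as triples; "type" encodes class membership (hypothetical data).
FACTS = {
    ("rex", "type", "Dog"),
    ("rex", "hasOwner", "ann"),
    ("ann", "type", "Vet"),
    ("tom", "type", "Kennel"),
    ("tom", "houses", "fido"),
}
SUBCLASS = [("Dog", "Animal")]                    # Dog ⊑ Animal
SOME_LEFT = [("hasOwner", "Vet", "PetOfVet")]     # ∃hasOwner.Vet ⊑ PetOfVet
ALL_RIGHT = [("Kennel", "houses", "Animal")]      # Kennel ⊑ ∀houses.Animal


def forward_chain(initial):
    """Apply the rules until no new triple is derived (a fixpoint) and return all facts."""
    facts = set(initial)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "type":
                # Subclass: x : C and C ⊑ D  ⇒  x : D
                new |= {(s, "type", d) for c, d in SUBCLASS if o == c}
                # Universal on the right: x : D, D ⊑ ∀R.C and (x, y) : R  ⇒  y : C
                for d, r, c in ALL_RIGHT:
                    if o == d:
                        new |= {(y, "type", c) for x, q, y in facts if x == s and q == r}
            # Existential on the left: (x, y) : R, y : C and ∃R.C ⊑ D  ⇒  x : D
            for r, c, d in SOME_LEFT:
                if p == r and (o, "type", c) in facts:
                    new.add((s, "type", d))
        new -= facts
        if not new:
            return facts
        facts |= new


print(sorted(forward_chain(FACTS) - FACTS))
# [('fido', 'type', 'Animal'), ('rex', 'type', 'Animal'), ('rex', 'type', 'PetOfVet')]
```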

Value partition
Values could be modeled with nominals instead of classes, but this would require the O constructor and would prevent further partitioning
“An object can be long, medium or short”
Object ⊑ ∃hasLength.Length
Length ≡ Long ⊔ Medium ⊔ Short (all subclasses are pairwise disjoint)
N-ary (object or datatype) properties
“A ball is painted with a color by a certain percentage”
Painting ⊑ ∃color.Color ⊓ ∃percentage.Percentage
hasPainting ∘ color ⊑ hasColor
hasPainting ∘ percentage ⊑ hasPercentage
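A role-chain axiom such as hasPainting ∘ color ⊑ hasColor (which needs the R constructor) says that following hasPainting and then color yields a direct hasColor link. The Python sketch below computes the relation composition over a few hypothetical assertions (the ball, paintings, colors and percentages are made up), giving the hasColor and hasPercentage pairs these chains entail from that data.

```python
def compose(r, s):
    """Relation composition r ∘ s: pairs (x, z) with (x, y) in r and (y, z) in s."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


# Hypothetical assertions: ball1 has two paintings, each with a color and a percentage.
has_painting = {("ball1", "p1"), ("ball1", "p2")}
color = {("p1", "red"), ("p2", "white")}
percentage = {("p1", 70), ("p2", 30)}

has_color = compose(has_painting, color)             # hasPainting ∘ color ⊑ hasColor
has_percentage = compose(has_painting, percentage)   # hasPainting ∘ percentage ⊑ hasPercentage

print(sorted(has_color))       # [('ball1', 'red'), ('ball1', 'white')]
print(sorted(has_percentage))  # [('ball1', 30), ('ball1', 70)]
```

The shortcut properties lose the pairing between each color and its percentage (ball1 simply gets two colors and two percentages), which is why the intermediate Painting individuals are kept in the pattern.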

Exceptions
“Birds have feathers and fly, penguins are birds but they don’t fly”
Bird ⊑ ∃hasFeathers.⊤
FlyingBird ⊑ Bird ⊓ ∃hasAbility.Fly
NonFlyingBird ⊑ Bird ⊓ ¬∃hasAbility.Fly
Some of these situations can be modeled using Ontology Design
Patterns (ODPs), but it is necessary to assess the required
expressivity

Fuzzy extensions
“The world is not black or white”
“How old is an adult?”
“A basketball has to be round and orange:
what is more important?”
Fuzzy extensions and weighted axioms
require a higher expressivity
Adult ⊑ ∃age.right-shoulder(0,100,20,40)
Basketball ≡ weighted sum of Round (weight 0.75) and Orange (weight 0.25)
Available reasoners:
fuzzyDL (f-SHIF + other fuzzy constructs)
FiRE (f-SHIN)
There is no standard yet
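Fuzzy axioms assign degrees in [0, 1] instead of true/false. The Python sketch below evaluates a right-shoulder membership function with the parameters of the Adult axiom, right-shoulder(0,100,20,40) (domain 0 to 100, rising linearly from 20 to 40), and a weighted sum with the 0.75/0.25 weights of the Basketball axiom. It assumes the usual definitions of these constructs (as in fuzzyDL) and reads the Basketball axiom as a weighted sum; the input degrees for roundness and orangeness are hypothetical.

```python
def right_shoulder(x, k1, k2, a, b):
    """Membership degree: 0 up to a, linear between a and b, 1 from b on (domain [k1, k2])."""
    if not k1 <= x <= k2:
        raise ValueError("value outside the datatype domain")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def weighted_sum(weights, degrees):
    """Weighted sum of concept degrees; the weights are assumed to add up to at most 1."""
    return sum(w * d for w, d in zip(weights, degrees))


# Degree to which someone of a given age belongs to Adult
for age in (15, 30, 45):
    print(age, right_shoulder(age, 0, 100, 20, 40))   # 0.0, 0.5, 1.0

# Basketball: perfectly round (1.0), half orange (0.5)
print(weighted_sum([0.75, 0.25], [1.0, 0.5]))  # 0.875
```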

Queries
Querying a knowledge base
Conjunctive query answering is “non-standard” reasoning
SPARQL queries:
work on ABox and TBox
are not always supported over entailments
allow for a weak form of closed world assumption
scale well on big knowledge bases
are low-level and difficult to use for TBox queries
DL queries:
are limited to the TBox
are not always supported by all reasoners
do not allow for closed world negation
can be slow when reasoning with many individuals
are easy to write and interpret
SPARQL-DL/SPARQL-OWL queries:
“bridge” between the two approaches
are not (yet) a W3C standard
do not have “industrial” strength (are still experimental)

SPARQL queries
Queries on instances are very flexible due to the power of the SPARQL language, which in version 1.1 supports:
Property paths
Aggregates (COUNT, SUM, MIN, MAX, AVG)
Subqueries
Updates
A weak form of CWA (using MINUS and NOT EXISTS)
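The weak form of CWA means that a query can ask for what is absent from the data, which differs from what OWL's open-world semantics entails to be false. The Python sketch below mimics a SPARQL 1.1 FILTER NOT EXISTS pattern over an in-memory list of triples; the data and property names are hypothetical, loosely borrowed from the birds example.

```python
# Hypothetical data: only tweety has a stored flying ability.
triples = [
    ("tweety", "type", "Bird"),
    ("tweety", "hasAbility", "Fly"),
    ("pingu", "type", "Bird"),
    ("polly", "type", "Bird"),
    ("polly", "hasAbility", "Talk"),
]


def birds_without_flight(triples):
    """Birds with no stored hasAbility Fly triple (absence in the data, not entailed falsity).

    Roughly: SELECT ?b WHERE { ?b a :Bird . FILTER NOT EXISTS { ?b :hasAbility :Fly } }
    """
    birds = {s for (s, p, o) in triples if p == "type" and o == "Bird"}
    flyers = {s for (s, p, o) in triples if p == "hasAbility" and o == "Fly"}
    return sorted(birds - flyers)


print(birds_without_flight(triples))  # ['pingu', 'polly']
```

polly is returned only because no flying ability is stored for it; an OWL reasoner working under the open world assumption would not conclude that polly cannot fly.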

DL queries
By contrast, expressing DL queries in SPARQL is complicated
Example:
“Find C where ∃hasShape.Round ⊑ C”
PREFIX [...]
SELECT DISTINCT ?q
WHERE {
?x rdfs:subClassOf ?q ;
a owl:Restriction ;
owl:onProperty :hasShape ;
owl:someValuesFrom :Round
}
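The query has to match the RDF encoding of the restriction: a blank node with rdf:type owl:Restriction, owl:onProperty and owl:someValuesFrom. The Python sketch below does the same pattern matching over a hypothetical list of triples (the :RoundThing and :Thing classes are made up) to show what such a query actually inspects, namely stated triples only.

```python
# RDF encoding of ∃hasShape.Round ⊑ :RoundThing (blank node _:r1), plus one more axiom.
triples = [
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", ":hasShape"),
    ("_:r1", "owl:someValuesFrom", ":Round"),
    ("_:r1", "rdfs:subClassOf", ":RoundThing"),
    (":RoundThing", "rdfs:subClassOf", ":Thing"),
]


def superclasses_of_some(triples, prop, filler):
    """Find ?q such that (∃prop.filler) rdfs:subClassOf ?q is stated directly."""
    t = set(triples)
    results = set()
    for (x, p, q) in t:
        if p != "rdfs:subClassOf":
            continue
        # The subject must be a restriction on prop with someValuesFrom filler.
        if ((x, "rdf:type", "owl:Restriction") in t
                and (x, "owl:onProperty", prop) in t
                and (x, "owl:someValuesFrom", filler) in t):
            results.add(q)
    return sorted(results)


print(superclasses_of_some(triples, ":hasShape", ":Round"))  # [':RoundThing']
```

:Thing is not returned even though it follows from the two subclass axioms: without an entailment regime, the query sees only the stated structure.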

Inference in big KBs
Dedicated reasoning support is needed if the language used goes beyond RDFS
Some inference can be performed using SPARQL itself (e.g. class hierarchy using property paths)
If a more expressive language is used, there are two choices:
a reasoner makes inferred data available
a reasoner rewrites the query in order to incorporate the ontology
When the knowledge base is big, several strategies can be used:
Query approximation
Theory approximation
Ontology modularization
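As an example of inference done in SPARQL itself, the path rdf:type/rdfs:subClassOf* returns all the classes of an individual, including inherited ones. The Python sketch below computes the same reflexive-transitive closure with a breadth-first search over a hypothetical class hierarchy; it covers only subclass inheritance, not the other RDFS or OWL entailments.

```python
from collections import deque

# Hypothetical hierarchy (rdfs:subClassOf) and type assertions (rdf:type).
subclass_of = {
    "Researcher": ["Employee"],
    "Employee": ["Person"],
    "Person": [],
}
types = {"alice": ["Researcher"]}


def superclasses(cls):
    """All classes reachable via rdfs:subClassOf*, including cls itself (zero steps)."""
    seen, queue = {cls}, deque([cls])
    while queue:
        current = queue.popleft()
        for parent in subclass_of.get(current, []):
            if parent not in seen:  # also protects against cycles in the hierarchy
                seen.add(parent)
                queue.append(parent)
    return seen


def inferred_types(individual):
    """Mimics SELECT ?c WHERE { :alice rdf:type/rdfs:subClassOf* ?c }."""
    result = set()
    for cls in types.get(individual, []):
        result |= superclasses(cls)
    return result


print(sorted(inferred_types("alice")))  # ['Employee', 'Person', 'Researcher']
```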

Storage
Storage
When a knowledge base is big, suitable storage solutions are needed
“Traditional” approaches use a single file for every ontology (reasoning is performed on in-memory models)
Triple (or quad) stores can store and retrieve many triples efficiently
OWLIM (OWLIM-Lite, OWLIM-SE, OWLIM-Enterprise)
Jena (TDB, SDB, with PostgreSQL)
Sesame
AllegroGraph
OpenLink Virtuoso
Dydra (storage in the cloud)
Is inference performed? How?
Custom engines vs existing engines
Rule-based engines: forward chaining vs backward chaining
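The difference between forward and backward chaining can be sketched with a single rule, transitivity of a hypothetical partOf property. Forward chaining materializes every consequence in advance (queries become lookups, but the store grows), while backward chaining keeps only the asserted triples and proves a goal when it is asked. The Python below is a toy comparison of the two strategies, not the behavior of any particular store.

```python
# Asserted partOf pairs (hypothetical data).
asserted = {("wheel", "car"), ("car", "fleet"), ("spoke", "wheel")}


def forward_closure(pairs):
    """Forward chaining: apply (x, y), (y, z) ⇒ (x, z) until a fixpoint; keep everything."""
    closure = set(pairs)
    while True:
        derived = {(x, z) for (x, y1) in closure for (y2, z) in closure if y1 == y2}
        if derived <= closure:
            return closure
        closure |= derived


def backward_query(x, z, pairs, visited=None):
    """Backward chaining: prove partOf(x, z) on demand from the asserted pairs only."""
    if (x, z) in pairs:
        return True
    if visited is None:
        visited = set()
    for (a, b) in pairs:
        if a == x and b not in visited:
            visited.add(b)  # avoid looping on cyclic data
            if backward_query(b, z, pairs, visited):
                return True
    return False


materialized = forward_closure(asserted)
print(len(asserted), len(materialized))            # 3 6
print(("spoke", "fleet") in materialized)          # True
print(backward_query("spoke", "fleet", asserted))  # True
```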

OWLIM
Website: http://owlim.ontotext.com
Family of three semantic repositories of industrial strength
Uses a rule engine supporting RDFS, OWL-Horst, OWL 2 QL,
OWL 2 RL
Supports the full SPARQL 1.1 (+ Update)
VERY scalable: the Lite (free) version scales up to tens of
millions of triples

Dydra
Website: http://dydra.com
Software as a Service (SaaS) with proprietary implementation
Quad store, no reasoning, supports most of SPARQL 1.1
Can try the SPARQL endpoints (w/ and w/o inferences):
http://dydra.com/nick/milantransport/sparql
http://dydra.com/nick/milantransport_inf/sparql
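PREFIX : <http://www.semanticweb.org/owlapi/ontologies/MilanTransportOntology#>
SELECT DISTINCT ?n
WHERE {?f :name "S.BABILA" .
# exactly two :connected steps (the SPARQL 1.1 drafts also allowed :connected{2}, dropped in the final Recommendation)
?f :connected/:connected ?t .
?t :name ?n
FILTER(?t != ?f)}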
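The query asks for the names of the stations reached from S.BABILA in exactly two :connected steps, excluding S.BABILA itself. The Python sketch below performs the same two-step traversal over a tiny hypothetical network; apart from S.BABILA, the station identifiers, names and links are made up and are not taken from the Milan transport data.

```python
# Hypothetical network: node identifiers with their :name, and :connected links.
names = {"sbabila": "S.BABILA", "s2": "STATION_2", "s3": "STATION_3", "s4": "STATION_4"}
connected = {("sbabila", "s2"), ("s2", "s3"), ("s2", "sbabila"), ("s3", "s4")}


def two_hop_names(start_name):
    """Names of ?t with ?f :connected/:connected ?t and ?t != ?f, where ?f :name start_name."""
    starts = {node for node, n in names.items() if n == start_name}
    result = set()
    for f in starts:
        middle = {y for (x, y) in connected if x == f}          # first step
        targets = {z for (y, z) in connected if y in middle}    # second step
        result |= {names[t] for t in targets if t != f}         # FILTER(?t != ?f)
    return sorted(result)  # sorted set plays the role of SELECT DISTINCT


print(two_hop_names("S.BABILA"))  # ['STATION_3']
```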

Conclusions
Conclusions
Semantic Web and huge data sources are becoming more and
more popular
Reasoning should scale well, but the whole point of DLs is to be
expressive
Different approaches to representation and to reasoning are needed
Research is moving towards scalable reasoning for expressive
logics

Conclusions
The (sub)language, the storage model, the inference engine and
the query language have to be chosen as a whole
Reasoners for expressive languages make it appealing to use their own APIs for queries and are currently used mostly for query rewriting, but may not scale well to large amounts of data
Native storage can be extremely scalable for big ABoxes and
makes it possible to use standard query languages such as
SPARQL, but such use is complex and the supported
(sub)languages are less expressive than the full OWL
The level of expressivity and the expected scale should be
assessed beforehand
