2. Need
The key idea of the Semantic Web is to represent
information on and about the current Web using formal
languages that computers can process and reason with.
Recapturing the information on the current Web and
adding additional descriptions of Web resources
(metadata) would allow our machines to support us in
performing intelligent tasks such as providing analysis by
combining information from multiple sources.
3. Approach
The field of expert systems has developed a number of
logic-based knowledge representation languages for
describing both the domain knowledge and task
knowledge of such systems.
Domain models (conceptual models of a domain that
incorporate both behavior and data) that captured the
agreement of a group of experts and were represented in
formal languages for reusability came to be referred to
as ontologies.
Although the goal of modelling is similar (adding domain
knowledge to an information system), the context of the
Semantic Web differs from these use cases.
4. Approach
The Semantic Web is envisioned to
connect knowledge bases on a web-
wide scale. The particular
challenges related to this situation
necessitated the design of new
languages.
In this unit we will have a non-
exhaustive introduction to
knowledge representation
(modelling) using Semantic Web
languages such as RDF and OWL.
5. Ontology-based Knowledge
Representation
The key challenge of the Semantic Web is to ensure
a shared interpretation of information.
Related information sources should use the same
concepts to reference the same real-world entities or
at least there should be a way to determine if two
sources refer to the same entities, but possibly using
different vocabularies.
Ontologies and ontology languages are the key
enabling technology in this respect.
An ontology, by its most cited definition in AI, is a
shared, formal conceptualization of a domain, i.e.
a description of concepts and their relationships.
6. Ontology-based Knowledge
Representation
Ontologies are domain models with two special
characteristics, which lead to the notion of shared
meaning or semantics:
◦ Ontologies are expressed in formal languages with
a well-defined semantics.
◦ Ontologies build upon a shared understanding
within a community. This understanding represents
an agreement among members of the community
over the concepts and relationships that are
present in a domain and their usage.
8. Ontology-based
Knowledge
Representation
Glossary: An alphabetical
list of words relating to a
specific subject, text, or
dialect, with explanations;
a brief dictionary.
Semantic Network: A
knowledge structure that
depicts how concepts are
related to one another
and illustrates how they
interconnect.
9. Ontology-based Knowledge
Representation
Thesaurus: A book that lists words in groups of
synonyms and related concepts.
Folksonomy: A user-generated system of classifying
and organizing online content into different categories
by the use of metadata such as electronic tags.
10. Ontology-based Knowledge
Representation
Lightweight Ontology:
◦ Typically applied to ontologies that make a distinction between classes,
instances and properties, but contain minimal descriptions of them.
◦ Concepts are connected by general associations rather than strict formal
connections.
Heavyweight Ontologies:
◦ Allow describing more precisely how classes are composed of other
classes, and provide a richer set of constructs to constrain how
properties can be applied to classes.
Full expressivity of first order logic (FOL):
◦ Defines in great detail the kind of instances a concept may have and in
which cases two instances may be related using a certain relationship.
◦ Sufficiently expressive to represent natural language statements in a
concise way.
13. The Resource Description
Framework (RDF) and RDF
Schema
The Resource Description Framework (RDF) is a
World Wide Web Consortium (W3C) standard
originally designed as a data model for metadata. It
has come to be used as a general method for
description and exchange of graph data. RDF
provides a variety of syntax notations and data
serialization formats with Turtle (Terse RDF Triple
Language) currently being the most widely used
notation.
RDF provides a data model that supports fast
integration of data sources by bridging semantic
differences. It is often used (and was initially
designed) for representing metadata about other Web
resources such as XML files.
18. The Resource Description Framework (RDF)
and RDF Schema
The Resource Description Framework (RDF) was originally created to
describe resources on the World Wide Web, in particular web pages and
other content.
RDF is domain-independent and can be used to model both real-world
objects and information resources.
An RDF document describes a directed graph, i.e. a set of nodes that are
linked by directed edges (“arrows”).
Both nodes and edges are labeled with identifiers to distinguish them.
19. RDF: Why Graph Model?
RDF was not conceived for the task of structuring documents, as XML with its tree model was, but
rather for describing general relationships between objects of interest.
RDF was intended to serve as a description language for data on the WWW and other
electronic networks. Information in these environments is typically stored and managed
in decentralized ways, and indeed it is very easy to combine RDF data from multiple
sources.
The simple union of two tree structures is not a tree anymore, so that additional
choices must be made to even obtain a well-formed XML document when combining
multiple inputs.
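A minimal sketch of why combining sources is easy in a graph model, assuming each graph is held as a plain Python set of (subject, predicate, object) tuples. The ex: identifiers are made up for illustration and do not come from the slides.

```python
# Two RDF graphs from different sources, each a set of triples.
# All "ex:" names are hypothetical.
source_a = {
    ("ex:Delhi", "ex:capitalOf", "ex:India"),
    ("ex:India", "ex:locatedIn", "ex:Asia"),
}
source_b = {
    ("ex:Delhi", "ex:capitalOf", "ex:India"),  # also known to source B
    ("ex:Delhi", "rdf:type", "ex:City"),
}

# Merging two graphs is just set union: shared statements collapse,
# and the result is again a valid graph with no restructuring needed.
merged = source_a | source_b

for triple in sorted(merged):
    print(triple)
```

The union of two trees has no such guarantee, which is why merging XML documents needs extra decisions about where each subtree goes.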
20. RDF:URIs
Issues with general names in RDF:
◦ Different names (identifiers) may be used for the same resource.
◦ The same name (identifier) may be used for different resources.
◦ Relating RDF data from different sources then becomes difficult.
Moreover, unique identification allows an RDF-aware system to access a new
non-RDF data source.
An RDF adapter first assigns unique identifiers (URIs) to resources
in the non-RDF data source (when URIs are not already available) and then
generates statements describing resource properties.
21. RDF:URIs
RDF uses so-called Uniform Resource Identifiers (URIs), which are a
generalization of URLs.
In general, a resource might be any object that has a clear identity in the context of
the given application: books, places, people, publishing houses, events,
relationships among such things, all kinds of abstract concepts, and many
more.
Such resources can obviously not be retrieved online and hence their URIs
are used exclusively for unique identification.
URIs that are not URLs are sometimes also called Uniform Resource Names
(URNs).
22. RDF:URIs
Construction Scheme of URIs
Scheme:[//authority]path[?query][#fragment]
Note: Parts in brackets are optional
◦ Scheme:
The name of a URI scheme that classifies the type of URI.
http, ftp, mailto, file, irc
◦ Authority:
URIs of some URI schemes refer to “authorities” for structuring the available
identifiers further. On the Web, this is typically a domain name, possibly with
additional user and port details.
semantic-web-book.org, john@example.com, example.org:8080
23. RDF:URIs
◦ Path:
Location of the resource
◦ Query:
The query is an optional part of the URI that provides additional non-
hierarchical information
◦ Fragment:
The optional fragment part provides a second level of identifying resources
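The construction scheme can be tried out with Python's standard urllib.parse module. This is only a sketch: the URI below is a hypothetical example chosen so that every optional part is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI that exercises every part of the construction scheme.
uri = "http://example.org:8080/books/semantic-web?lang=en#chapter2"
parts = urlsplit(uri)

# Map the slide's terminology onto the fields returned by urlsplit.
components = {
    "scheme": parts.scheme,        # classifies the type of URI
    "authority": parts.netloc,     # domain name, here with a port
    "path": parts.path,            # location of the resource
    "query": parts.query,          # non-hierarchical extra information
    "fragment": parts.fragment,    # second level of identification
}

for name, value in components.items():
    print(f"{name:9} = {value}")
```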
24. RDF: Model
The RDF data model vaguely resembles an object-oriented data model.
It consists of entities, represented by unique identifiers, and binary
relationships, or statements, between those entities.
RDF expressions are formed by making statements (triples) of the form
(subject, predicate, object).
The subject of a statement must be a resource (blank or with a URI), the
predicate must be a URI and the object can be either kind of resource or a
literal.
In a graphical representation of an RDF statement, the source of the
relationship is called the subject, the labeled arc is the predicate (also called
property), and the relationship’s destination is the object.
Both statements and predicates are first-class objects, which means they can
be used as the subjects or objects of other statements (see the section on
reifying statements).
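The rules for what may appear in each position of a statement can be sketched as a small check. The class names URI, BlankNode and Literal are illustrative assumptions, not the API of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(subject, predicate, obj) -> bool:
    """Check the position rules: subject is a resource (URI or blank),
    predicate is a URI, object is a resource or a literal."""
    return (
        isinstance(subject, (URI, BlankNode))
        and isinstance(predicate, URI)
        and isinstance(obj, (URI, BlankNode, Literal))
    )


project = URI("http://www.daml.org/projects/#11")
creator = URI("http://purl.org/dc/elements/1.1/Creator")
name = Literal("Stefan Decker")

print(is_valid_triple(project, creator, name))   # literal as object: allowed
print(is_valid_triple(name, creator, project))   # literal as subject: not allowed
```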
25. RDF
Example: Delhi is capital of India
The triple generated from this sentence is:
<Delhi> <capital of> <India>.
where Delhi is the subject, capital of is the predicate
and India is the object.
26. RDF
The parts of a triple can also be written as
URIs (Uniform Resource Identifiers):
1. <http://www.abc.org/subject/Delhi>
2. <http://www.abc.org/predicate/capitalOf>
3. <http://www.abc.org/object/India>.
In this triple notation, every statement is terminated
by a full stop.
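As a rough sketch, the Delhi statement can be held as a three-part record and written out as one line of angle-bracketed URIs ending in a full stop. The Triple type and the to_line helper are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_line(t: Triple) -> str:
    """Write a triple of URIs as one line: three <...> terms and a full stop."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


delhi = Triple(
    "http://www.abc.org/subject/Delhi",
    "http://www.abc.org/predicate/capitalOf",
    "http://www.abc.org/object/India",
)

line = to_line(delhi)
print(line)
```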
27. RDF: Model
The statement shown is an example, which can be read as:
“The resource http://www.daml.org/projects/#11 has a property
hasHomepage (described in
http://www.semanticweb.org/schema-daml01/#hasHomepage),
the value of which is the resource
http://www-db.stanford.edu/OntoAgents.”
The three parts of this statement are
◦ the subject http://www.daml.org/projects/#11
◦ the predicate http://www.semanticweb.org/schema-daml01/#hasHomepage
◦ the object http://www-db.stanford.edu/OntoAgents.
A set of statements can be visualized as a graph.
28. RDF: Model
The graph is extended by adding the property http://purl.org/dc/elements/1.1/Creator
with the value Stefan Decker (a literal).
29. RDF: Model
The RDF data model distinguishes between resources, which are object
identifiers represented by URIs, and literals, which are just strings.
The subject and the predicate of a statement are always resources, while the
object can be a resource or a literal.
In RDF diagrams, resources are always drawn as ovals, and literals are
drawn as boxes.
The XML namespace syntax is used to abbreviate URIs in statements.
For instance, we can define the namespace prefix sw as a substitute for
http://www.semanticweb.org/schema-daml01/# and then write simply
sw:hasHomepage.
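Prefix abbreviation is plain string substitution, as the following sketch suggests. The expand and abbreviate helpers are hypothetical, and the namespace is written in the lowercase form used in the hasHomepage example.

```python
# Prefix table: the sw prefix stands for the schema namespace.
PREFIXES = {"sw": "http://www.semanticweb.org/schema-daml01/#"}


def expand(qname: str, prefixes=PREFIXES) -> str:
    """Turn an abbreviated name such as sw:hasHomepage into a full URI."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local


def abbreviate(uri: str, prefixes=PREFIXES) -> str:
    """Turn a full URI into prefix:local form if a known namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no matching prefix: leave the URI unchanged


full = expand("sw:hasHomepage")
print(full)
print(abbreviate(full))
```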
30. RDF:
The above brief RDF document describes a person named
Rembrandt.
31. RDF:
The following brief RDF document describes a
person named Rembrandt.
In this visualization the nodes are the subjects and
objects of statements, labelled with the URI or literal
or left blank, and the edges connect the subjects and
objects of statements and are labelled with the URI of
the property.
33. RDF
The resources from the FOAF namespace that we
use are defined separately in the so-called Friend-
of-a-Friend (FOAF) ontology, which resides at
http://xmlns.com/foaf/0.1/index.rdf.
The RDF document on the Web defining some of the
terms that we are using in the example resides at
http://www.w3.org/1999/02/22-rdf-syntax-ns.
The RDF language provides the basic terms to
assign a type to a resource (rdf:type) and to declare
a resource as a property.
38. RDFS
We saw how propositions about
single resources can be made in
RDF.
RDF Schema is a simple extension
of RDF defining a modelling
vocabulary with notions of classes
and subclasses.
RDF Schema provides a data-
modelling vocabulary for RDF
data.
39. RDFS
When we want to link various RDF documents to connect different
sources on the Web, we usually introduce new terms, not only for
subjects and objects but also for predicates.
When introducing and employing such a vocabulary, the user will naturally
have a concrete idea about the used terms’ meanings.
From the “perspective” of a computer system, however, all the terms
introduced by the user are merely character strings without any prior fixed
meaning.
Thus, the semantic interrelations must be explicitly communicated to the
system in some format in order to enable it to draw conclusions that rely
on this kind of human background knowledge.
40. RDFS
RDF Schema (RDFS for short), a further part of the W3C RDF
recommendation which we will deal with in the following sections, makes it
possible to specify this kind of background information about the terms
used in the vocabulary, so-called terminological knowledge or,
alternatively, schema knowledge.
In the first place, RDFS is nothing but another particular RDF vocabulary.
Consequently, every RDFS document is a well-formed RDF document.
This ensures that it can be read and processed by all tools that support
just RDF, whereby, however, a part of the meaning specifically defined for
RDFS (the RDFS semantics) is lost.
41. RDFS: Classes & Instances
The predefined URI rdf:type is used to mark resources as instances of a
class.
In order to clearly separate semantics and syntax, we always use the
term “class” to denote a set of resources (being entities of the real world),
whereas URIs which represent or refer to a class are called class names.
A URI does not directly indicate whether it refers to a single
object or a class.
42. RDFS: Classes & Instances
Therefore, RDFS provides the possibility to indicate
class names by explicitly “typing” them as classes.
In other words: it can be specified that, e.g., the
class ex:Textbook belongs to the class of all
classes.
43. RDFS: Subclasses and Class
Hierarchies
If we now searched for instances of the class of books
denoted by ex:Book, the URI book:uri denoting
“Foundations of Semantic Web Technologies” would not
be among the results.
Human background knowledge entails that every
textbook is a book and consequently every instance of
the ex:Textbook class is also an instance of the ex:Book
class.
Yet, an automatic system not equipped with this kind of
linguistic background knowledge is not able to come up
with this conclusion.
44. RDFS: Subclasses and Class
Hierarchies
One option would be to simply add the
following triple to the document, explicitly stating an
additional class membership:
book:uri rdf:type ex:Book .
However, we would have to add such a triple explicitly
to the corresponding RDF document for every such case.
A much more reasonable and less laborious way
would be to just specify (one may think of it as a
kind of “macro”) that every textbook is also a book.
45. RDFS: Subclasses and Class Hierarchies
The RDFS vocabulary provides a predefined way to
explicitly declare this subclass relationship between
two classes, namely, via the predicate
rdfs:subClassOf.
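The effect of rdfs:subClassOf can be sketched as a small inference loop over a set of triples. This is an illustration under simplifying assumptions (prefixed names kept as plain strings, only the subclass rule applied), not a full RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Return the triples plus every rdf:type statement implied by
    rdfs:subClassOf, repeating until nothing new can be added."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                if o == sub and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred


graph = {
    ("book:uri", RDF_TYPE, "ex:Textbook"),
    ("ex:Textbook", SUBCLASS_OF, "ex:Book"),
}

closure = infer_types(graph)
print(("book:uri", RDF_TYPE, "ex:Book") in closure)
```

With the single subclass statement in place, a search for instances of ex:Book now also finds book:uri, without adding a membership triple for every textbook.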
46. RDFS: Properties and Sub-Properties
A special role is played by those URIs used in triples
in the place of the predicate.
Some examples include ex:hasIngredient,
ex:publishedBy and rdf:type.
For expressing that a URI refers to a property (or
relation), the RDF vocabulary provides the class
name rdf:Property which by definition denotes the
class of all properties.
47. RDFS: Properties and Sub-Properties
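RDFS allows for the specification of subproperties
via rdfs:subPropertyOf.
As a result, a triple that uses a subproperty is interpreted
as also holding for its superproperty.
[Figure: example triples]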
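A minimal sketch of subproperty interpretation, assuming triples are plain tuples of strings. The property names ex:hasAuthor and ex:hasContributor and the resource ex:someAuthor are hypothetical and stand in for the example lost from the slide.

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def infer_superproperties(triples):
    """Return the triples plus every statement implied by rdfs:subPropertyOf:
    if p is a subproperty of q, then (s, p, o) also gives (s, q, o)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for s, p, o in inferred if p == SUBPROPERTY_OF]
        for s, p, o in list(inferred):
            for sub, sup in pairs:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


graph = {
    ("ex:hasAuthor", SUBPROPERTY_OF, "ex:hasContributor"),
    ("book:uri", "ex:hasAuthor", "ex:someAuthor"),
}

closure = infer_superproperties(graph)
print(("book:uri", "ex:hasContributor", "ex:someAuthor") in closure)
```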
49. Web Ontology Language (OWL)
The Web Ontology Language (OWL) was designed
to add the constructs of Description Logics (DL) to
RDF, significantly extending the expressiveness of
RDF Schema both in characterizing classes and
properties.
DLs are a family of formal knowledge representation
languages, used in artificial intelligence to describe
and reason about the relevant concepts of an
application domain.
DLs are of particular importance in providing a
logical formalism for ontologies and the Semantic
Web.
50. Web Ontology Language (OWL)
The Web Ontology Language is in fact a set of three
languages with increasing expressiveness: OWL
Lite, OWL DL and OWL Full. These languages are
extensions of each other (OWL Lite ⊆ OWL DL ⊆
OWL Full) both syntactically and semantically.
Although it is generally believed that the languages of
the OWL family are extensions of RDF(S) in
the same sense, this is only true for OWL Full, the
most expressive of the family (RDF(S) ⊆ OWL Full).
54. Comparison to the Unified Modelling
Language (UML)
UML is most commonly used in the requirements
specification and design of object oriented software
in the middle tier of enterprise applications.
The chief difference between UML and
RDF(S)/OWL is their modelling scope.
UML contains modelling primitives specific for a
special kind of information resource, namely objects
in an information system characterized by their
static attributes and associations, but also their
dynamic behavior.
55. Comparison to the Unified Modelling
Language (UML)
Unique features of RDF/OWL
◦ RDF is less constrained than UML, which means that many
RDF models have no equivalent in UML.
◦ OWL allows describing defined classes, i.e. definitions that
give necessary and sufficient conditions for an instance to be
considered as a member of the class.
◦ RDF/OWL both treat properties as first class citizens of the
language. Properties are global: they do not belong to any
class, while UML attributes and associations are defined as
part of the description of a certain class.
◦ Properties can be defined as subproperties of other
properties. This is possible, but much less straightforward, in
UML.
56. Comparison to the Unified Modelling
Language (UML)
Unique features of RDF/OWL
◦ Classes can be treated as instances, allowing for meta-
modelling.
◦ RDF reification is more flexible than the association class
mechanism of UML. For example, statements concerning
literal values can also be reified in RDF. These would be
modelled as attributes in UML and association classes cannot
be attached to attributes.
◦ All non-blank RDF resources are identified with a URI; UML
classes, instances, attributes etc. do not have such an ID.
◦ Instances can have, and usually do have, multiple types. (This is not to
be confused with multiple inheritance, which is supported by
both UML and RDF/OWL.)
57. Comparison to the Unified Modelling
Language (UML)
Unique features of UML
◦ UML has the notion of relationship roles, which is not
present in RDF/OWL.
◦ UML allows n-ary relations, which are not part of RDF,
although they can be re-represented in a number of ways.
◦ Two common types of part-whole relations are available in
UML (aggregation and composition). These can be
remodelled in OWL to some extent.
◦ UML makes a distinction between attributes and
associations. This is also different from the distinction
between datatype and object-properties in OWL.
58. Comparison to E/R model and the relational
model
The Entity/Relationship (E/R) model is commonly used
in information modelling for the data storage layer of
applications, because it maps easily to the relational
model used for defining data structures in relational
database management systems (RDBMS).
The E/R language contains the constructs that are
necessary for modelling information on the basis of
relationships.
Relationships are characterized in terms of the arity of
the relationship and the participating entity sets.
Similar to the reification features in RDF and UML, the
E/R model also allows attaching attributes to a
relationship and including relationships in other
relationships.
59. Comparison to E/R model and the relational
model
Entity sets roughly correspond to classes in UML. The modelling of
entity sets is less of a concern to the E/R model; the only
predefined relationship type between entity sets is
generalization.
The semantics of this construct is limited to attribute inheritance,
i.e. it simply means that the lower level entity sets have all the
attributes of the higher level entity sets. Unlike in RDF,
generalization between relationships (rdfs:subPropertyOf) is not
allowed.
A special feature of the E/R model is the notion of keys of entity
sets, i.e. sets of attributes whose values together uniquely identify
a given entity. As we will see, keys consisting of a single attribute can be
modelled in OWL by making the given property functional and
inverse functional. Complex keys, however, cannot be accurately
modelled in RDF/OWL.
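What "functional and inverse functional" means for a single-attribute key can be sketched as a check over a set of triples. This is a closed-world illustration only: an OWL reasoner would typically conclude that two resources sharing a key value are the same individual rather than report a violation. The ex:isbn property and the book identifiers are hypothetical.

```python
from collections import defaultdict


def is_functional(triples, prop):
    """True if no subject has more than one distinct value for prop."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return all(len(v) <= 1 for v in values.values())


def is_inverse_functional(triples, prop):
    """True if no value of prop is shared by more than one subject."""
    subjects = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects[o].add(s)
    return all(len(v) <= 1 for v in subjects.values())


def behaves_like_key(triples, prop):
    """A single-attribute key: one value per entity, one entity per value."""
    return is_functional(triples, prop) and is_inverse_functional(triples, prop)


books = {
    ("ex:book1", "ex:isbn", "isbn-A"),
    ("ex:book2", "ex:isbn", "isbn-B"),
}
clash = books | {("ex:book3", "ex:isbn", "isbn-A")}  # two books share a value

print(behaves_like_key(books, "ex:isbn"))
print(behaves_like_key(clash, "ex:isbn"))
```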
61. Comparison to the Extensible Markup
Language (XML) and XML Schema
To date, XML is the most commonly used technology for the exchange of
structured information between systems and services. Of all the languages
discussed, XML is thus the most similar to RDF in its purpose.
The most commonly observed similarity between XML and RDF is a
similarity between the data models: a directed graph for RDF, and a
directed, ordered tree for XML.
Much like RDF, XML itself is merely a common conceptual model and
syntax for domain specific languages each with their own vocabulary.
Schemas written in XML schema languages not only define the types of
elements and their attributes but also prescribe syntax, i.e. the way elements
are allowed to be nested in the tree.
Unlike XML schema languages, schema languages for RDF (RDF Schema and OWL) do not
impose constraints directly on the graph model but affect the possible
interpretations of metadata.