Semantic Web: From
Representations to Applications
Guus Schreiber
Free University Amsterdam
Co-chair W3C Web Ontology Working Group
Co-chair W3C Semantic Web Best Practices
and Deployment Working Group
Overview
 Representations:
 Reflections on the making of OWL and its
relation to RDF
 Using representations
 Best practices (as far as we know them
now) to help application developers
 Applications
 Examples from the SWBPD weblog
Disclaimer
 This presentation describes work of
many different people, including
many participating in respective W3C
Working Groups as well as others
W3C Web Ontology Working Group
 Chartered to develop the Ontology
Vocabulary for the Semantic Web.
 Starting point: DAML+OIL
 Started in November 2001
 Factions:
 logicians (Description Logic, KIF)
 knowledge/ontology engineers
 RDF developers
 OWL Recommendation published 10
February 2004
Working group communication
 Mailing lists
 working-group list: 8,000 messages in two years
 public comments list: 600 messages in 18
months
 Telecons
 60 telecons of 60-90 minutes with 10-30 people
 simultaneous scribing in IRC (chat) channel
 Face-to-face meetings
 five two-day meetings during first 15 months
 All proceedings in the public domain:
http://www.w3.org/2001/sw/WebOnt
Key issue to be resolved:
5.3 Semantic Layering
 OWL is expected to
be semantically
compatible with
RDF(S).
 Problems were
foreseen with
aligning a DL-style
model theory with
the RDF model
theory
The Semantic Layering debate
 Excerpt from a telecon debate:
"You are not creating a semantic web, but
semantic islands with high fences"
"But you are creating a semantic swamp,
with crocodiles and snakes"
 What do you prefer?
Consensus on Semantic Layering
 OWL Full ("Large OWL", "Great Horned OWL")
 Free mixing of OWL and RDF = high expressivity
 Non-standard formalization
 Tractability not guaranteed
 OWL DL ("Fast OWL")
 Maximize expressiveness while retaining tractability
 Standard formalization
 Same language constructs as OWL Full
 Constraints on RDF/OWL vocabulary use
 Correspondence theorem links the two styles of
semantics: entailments in OWL DL also hold in OWL
Full.
RDF/OWL schema constructs
 RDF Schema
 (sub)classes, (sub)properties, domain, range,
datatypes (using XML Schema)
 OWL Lite
 cardinality 0/1, local property restrictions,
inverse/transitive/symmetric properties,
(in)equality of classes/instances
 OWL DL
 enumeration, disjunction, conjunction, negation,
hasValue
 OWL Full
 meta-modeling
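The RDF Schema layer listed above can be illustrated with a small sketch. It assumes a toy in-memory triple set with hypothetical ex: names (not taken from any real ontology) and applies only two RDFS rules: transitivity of rdfs:subClassOf and propagation of rdf:type along rdfs:subClassOf. It is not a full RDFS reasoner.

```python
# Toy triple store: (subject, predicate, object) tuples; all ex: names are hypothetical.
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

triples = {
    ("ex:Opera", SUBCLASS, "ex:MusicalWork"),
    ("ex:MusicalWork", SUBCLASS, "ex:Work"),
    ("ex:DonGiovanni", TYPE, "ex:Opera"),
}


def rdfs_closure(graph):
    """Apply two RDFS rules until no new triple appears:
    subClassOf transitivity and type propagation along subClassOf."""
    inferred = set(graph)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p not in (SUBCLASS, TYPE):
                continue
            for s2, p2, o2 in inferred:
                # (s p o) and (o subClassOf o2) together entail (s p o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))
        changed = not new <= inferred
        inferred |= new
    return inferred


closure = rdfs_closure(triples)
print(("ex:DonGiovanni", TYPE, "ex:Work") in closure)
```

The OWL layers add constructs (restrictions, enumerations, negation) that need a more capable reasoner than this fixed-point loop.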
Is RDF/OWL just another
datamodelling/KR language?
Key differences:
 All classes/properties/individuals have a URI as
identifier
 RDF/XML exchange syntax enables
interoperability
XML features
 UTF-8 character set
 Support for multilinguality
 Use of XML Schema datatypes: numeric, date,
time, etc.
For the rest: RDF/OWL is state-of-the-art
concept language
Semantic Web Best Practices and
Deployment Working Group
 Started 1 March 2004 => early days
 Co-chairs David Wood and Guus Schreiber
 30+ participants
 Objective: support for semantic-web
application developer
 Focus on “low hanging fruit”
 Publication of key ontologies/vocabularies,
development guidelines, ontology-design
patterns, repositories, links to related
techniques, ……
 High expectations, not much effort (yet)
Issues for publishing ontologies:
good and bad ontologies?!
 Good ontologies are used
 Good ontologies represent some form
of consensus in a community
 Good ontologies are maintained
 Good ontologies do not need to be
complex
 Good ontologies may contain
“mistakes”
ontology = community consensus
N.B.
It is a contradiction
in terms to talk
about “creating my
own ontology”!
Source: Financial Times,
e-procurement, Oct. 2000
Thesauri and ontologies
 ISWC’03 Semantic Web Challenge
showed that thesauri are important
resources for SW applications
 Typically weak semantic structure
 Approach in Best Practices WG:
 Phase 1: “as-is” conversion
 Phase 2: additional ontological
interpretations/extensions
Human-readable syntax?!
<owl:Class rdf:ID="MozartDaPonteOpera">
<owl:equivalentClass>
<owl:Class>
<owl:oneOf rdf:parseType="Collection">
<Opera rdf:about="#NozzDiFigaro"/>
<Opera rdf:about="#DonGiovanni"/>
<Opera rdf:about="#CosiFanTutte"/>
</owl:oneOf>
</owl:Class>
</owl:equivalentClass>
</owl:Class>
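To show what an application has to do with this syntax, here is a minimal sketch that reads the enumeration with Python's standard XML parser. The rdf:RDF wrapper and the default namespace http://example.org/music# are assumptions added so that the fragment parses. A real application would use an RDF parser rather than raw XML processing.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# The slide's fragment, wrapped in an assumed rdf:RDF element with an
# assumed (hypothetical) default namespace for the Opera class.
doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}"
         xmlns="http://example.org/music#">
  <owl:Class rdf:ID="MozartDaPonteOpera">
    <owl:equivalentClass>
      <owl:Class>
        <owl:oneOf rdf:parseType="Collection">
          <Opera rdf:about="#NozzDiFigaro"/>
          <Opera rdf:about="#DonGiovanni"/>
          <Opera rdf:about="#CosiFanTutte"/>
        </owl:oneOf>
      </owl:Class>
    </owl:equivalentClass>
  </owl:Class>
</rdf:RDF>"""

root = ET.fromstring(doc)
# Locate the owl:oneOf element and list the rdf:about value of each member.
one_of = root.find(f".//{{{OWL}}}oneOf")
members = [el.get(f"{{{RDF}}}about") for el in one_of]
print(members)
```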
OWL abstract syntax
 Used in a (very) sloppy fashion in this
presentation
 Developed for specifying the OWL DL
semantics
UML Profile for OWL
 Under
development at
OMG
 Not trivial, e.g.
RDF/OWL
properties are
different from
UML associations
N3 Turtle syntax
 See note by Dave Beckett (Bristol)
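@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
# The default namespace below is a hypothetical placeholder.
@prefix : <http://example.org/bp#>.
# The range is an XML Schema datatype, so this is a datatype property.
:pressureInHg a owl:DatatypeProperty;
rdfs:domain :DiastolicBloodPressure;
rdfs:range xsd:nonNegativeInteger.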
Ontology engineering patterns
 Best practices for frequently occurring
modeling problems
 WG documents outline alternatives with
pros and cons
 Currently three notes published:
 Classes as values
 N-ary relations
 Specification of value sets
 Planned:
 Part-of, numeric constraints, QCRs
Representing value sets
 Intuitive representation of color value set:
class/datatype color
with instances/values “red”, “white”, etc.
 But suppose we want to talk about a subtype of
“red”, e.g. “vermillion red”
 Pattern:
 Represent values as subclass hierarchy of value type
 This preserves flexibility
 Use anonymous instances as property values
“This porcelain vase has as color some value of vermillion
red”
 See note by Alan Rector
http://www.w3.org/TR/swbp-specified-values/
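A minimal sketch of why the subclass pattern preserves flexibility, using plain Python dictionaries in place of an OWL reasoner. The class names, the objects, and the helper functions are hypothetical illustrations, not part of the note.

```python
# Hypothetical value-type hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "VermillionRed": "Red",
    "Red": "Color",
    "White": "Color",
}


def is_subclass(cls, ancestor):
    """Walk up the hierarchy from cls and report whether ancestor is reached."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


# Each object has an anonymous color value, known only through its class.
has_color = {"porcelain_vase": "VermillionRed", "tea_cup": "White"}


def objects_with_color(color_class):
    """Return objects whose color value belongs to color_class or a subclass."""
    return sorted(o for o, c in has_color.items() if is_subclass(c, color_class))


print(objects_with_color("Red"))
```

A query for "Red" finds the vase although it was annotated only with the more specific "VermillionRed". That would not work if the colors were flat instance values.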
Classes as values
 Common problem when using a hierarchy
of concepts for indexing purposes
 Example: indexing books with concepts
from the ACM computer-science subject
hierarchy
 See draft technical note by Natasha Noy:
 Four options with different merits
http://www.w3.org/TR/swbp-classes-as-values/
Numeric constraints and user-
defined XML Schema datatypes
 Example: “an elevated diastolic blood
pressure is a diastolic blood pressure of 90
mmHg or more”
 Currently no simple way to represent this in
OWL
 User-defined XML Schema datatypes could
provide a solution
 Currently not possible for detailed technical
reasons
 SWBPD task force is active in trying to
solve this problem (Jeremy Carroll, HP)
Representing a numeric constraint
through a datatype
Class(DiastolicBloodPressure)
DatatypeProperty(pressureInHg
domain(DiastolicBloodPressure)
range(xsd:nonNegativeInteger))
Class(ElevatedDiastolicBloodPressure
subClassOf(DiastolicBloodPressure)
subClassOf(Restriction
onProperty(pressureInHg)
allValuesFrom(ex:NinetyPlus))))
Plus corresponding definition for the
ex:NinetyPlus user-defined datatype
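The intended effect of the restriction can be sketched in plain Python. The function is_ninety_plus is a hypothetical stand-in for the ex:NinetyPlus datatype (integers of 90 or more). The check mirrors the allValuesFrom reading: an individual declared as elevated is inconsistent if any of its pressure values falls outside that datatype. This illustrates the semantics only and does not solve the datatype problem the task force is working on.

```python
def is_ninety_plus(value):
    """Hypothetical stand-in for the ex:NinetyPlus datatype: integers >= 90."""
    return isinstance(value, int) and value >= 90


def violates_elevated(declared_classes, pressure_values):
    """Return True if an individual declared ElevatedDiastolicBloodPressure
    has a pressureInHg value outside NinetyPlus (allValuesFrom is violated)."""
    if "ElevatedDiastolicBloodPressure" not in declared_classes:
        return False  # the restriction only applies to members of the class
    return any(not is_ninety_plus(v) for v in pressure_values)


print(violates_elevated({"ElevatedDiastolicBloodPressure"}, [95]))
print(violates_elevated({"ElevatedDiastolicBloodPressure"}, [85]))
```

Because the slide uses subClassOf, the restriction is a necessary condition only: a reading of 95 does not by itself classify a measurement as elevated.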
Pervasive issue: metamodelling
 OWL DL requires strict separation of
classes and instances
 But on the Semantic Web my
instances may be your classes!
 Metamodelling features especially
required in vocabulary/ontology
mapping and/or interpretation
 Cf. Protégé metamodelling facilities
RDF in XHTML
 How to mark up your (X)HTML page?
 Various proposals under discussion in
RDF-in-XHTML Task Force
http://lists.w3.org/Archives/Public/public-rdf-in-xht
 Link typing using “rel” attribute?!
 Consequences for HTTP GET?!
Other work (planned) in the W3C
Best Practices WG: a selection
 Tutorial page
http://www.w3.org/2001/sw/BestPractices/Tutorials
 Tools page
 May just build on work of others, e.g. see DAML
tool-assessment study
http://semwebcentral.org/
 Publication of vocabularies/ontologies
 WordNet is first on the list
 Units and measures is likely next target
 Links to MPEG, Topic Maps
Applications and demos
 Weblog
http://esw.w3.org/mt/esw/archives/cat_application
 Four examples
1. AKTive Space: CS research in the UK
2. DOPE: Drug Ontology Project at Elsevier
3. Building Finder (USC)
4. Finnish Museums on the Semantic Web
AKTive Space
 AKT project (Shadbolt et al.), winner of the
Semantic Web Challenge 2003
 Integration of heterogeneous sources
 Papers, researchers, projects
 430 Mb in total
 RDF/OWL used for syntactic interoperability
 Storage/access issues
 Schema mapping is required
 Referential integrity is a problem
 Use of owl:sameAs
 Use automatic techniques in combination with
user approval
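One way to act on owl:sameAs links is to group all identifiers that denote the same entity. The sketch below assumes the input is a list of identifier pairs that a user has already approved. The identifiers and the function name are hypothetical, and the union-find approach is an illustration, not the method used in AKTive Space.

```python
def same_as_clusters(pairs):
    """Group identifiers connected by (approved) owl:sameAs links, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two equivalence classes

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical identifiers for researchers found in three different sources.
approved = [("a:p1", "b:p7"), ("b:p7", "c:p3"), ("a:p2", "c:p9")]
print(same_as_clusters(approved))
```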
DOPE: thesaurus-based search of
large document repositories
 Stuckenschmidt et al. (2003)
 EMTREE thesaurus (MeSH-based)
 Documents
 5M Medline abstracts
 500K full-text articles of Elsevier
 Automatic document indexing
 RDF used for syntactic interoperability
 RDF wrapper for SOAP-based access to documents
 Disambiguation of search terms
 Visualization of search results through semantic
categories
 Needed to prevent information overload
Building Finder: integrating image
analysis and textual sources
 Knoblock et al. (USC/ISI)
 Multiple heterogeneous sources
 Satellite images (Microsoft Terraservice)
 Road map info (US)
 Address information (white pages)
 Image analysis techniques to map
satellite data to road map
 RDF used for syntactic interoperability
Finnish Museums on the
Semantic Web
 Hyvönen et al. (2004)
 Multiple museum collections
 Indexed with multiple ontologies
 Artifact, material, actor, location, time,
event
 RDF used for syntactic interoperability
 Ontologies used for query
specialization/generalization
Cultural heritage collections:
possible use case
 A person is interested
in Fauve paintings
 There is a digital
collection with images
of paintings of Andre
Derain
 The Derain images
match the query,
despite the fact that
“Fauve” does not appear
in the annotation.
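The use case can be sketched as a query expansion step. The background knowledge (which artists belong to which style) and the image annotations are small hypothetical dictionaries. In a real application both would come from an ontology and from the collection metadata.

```python
# Hypothetical background knowledge: artist -> painting style.
style_of_artist = {"Andre Derain": "Fauve", "Henri Matisse": "Fauve"}

# Hypothetical image annotations; note that no annotation mentions "Fauve".
annotations = {
    "img1": {"creator": "Andre Derain"},
    "img2": {"creator": "Claude Monet"},
}


def search_by_style(style):
    """Expand the style into its known artists, then match on the creator field."""
    artists = {a for a, s in style_of_artist.items() if s == style}
    return sorted(i for i, ann in annotations.items() if ann.get("creator") in artists)


print(search_by_style("Fauve"))
```

The Derain image is returned because the query is specialized from the style to its artists before it is matched against the annotations.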
Application issues (1)
 Public domains are promising application
areas
 Medicine, cultural heritage, digital libraries
 Many existing vocabularies & annotations
 Application-pull
 Information integration/presentation is
prime use case
 Multimedia is important focus
 Requires multi-disciplinary approach
Application issues (2)
 Free access to vocabularies /
ontologies is a real problem
 AAT, EuroWordNet
 Similar applications can be built for
company intranets
 NOTE for academics: be conscious of
unfair criticism of application papers
More information
See home page of the Best Practices
WG:
http://www.w3.org/2001/sw/BestPractices/
All proceedings are public
Related European effort:
IST Knowledge Web network
 http://knowledgeweb.semanticweb.org
 Objectives (selection):
 Research integration
 Summer schools
 Educational material
 Showcase applications
 Industrial dissemination
 Started 1-1-2004 and runs for four years
