Semantic Web
Overview
 What is Semantic Web
 Semantic Web Vision
 Semantic Web Layers
 RDF, RDFS, OWL
 Tools
 GATE
 Applications
What is Semantic Web?
 Semantic means that the meaning of data
  can be discovered by computers
 "The Semantic Web is an extension of the
  current web in which information is given
  well-defined meaning, better enabling
  computers and people to work in
  cooperation." - Tim Berners-Lee
Definition
   The Semantic Web is a project to create a
    universal medium for information exchange by
    putting documents with computer-processable
    meaning (semantics) on the World Wide Web
   The Semantic Web extends the Web through the
    use of standards, markup languages and related
    processing tools
The aims of Semantic Web
 Indexing and retrieving information
 Annotation
 The Web as an interoperable database
 Machine retrieval of data
 Web based services
 Discovery of services
 Intelligent software agents
Semantic Web Vision
 Oriented toward machine-readable
  resources rather than human-readable
 Requires resources to be described so
  that machines know what they mean
     Description in terms of metadata
   Use of logic interpretation for inference
Semantic Web Layers
 XML (Extensible Markup Language): the
  language framework used to define
  nearly all new languages that are used to
  interchange data over the Web
 XML Schema: a language used to define
  the structure of a specific XML language
Semantic Web Layers
 RDF (Resource Description Framework):
  a language used to describe all sorts of
  information and metadata
 RDF Schema: a framework that provides a
  means to specify basic vocabularies for
  a specific RDF application language to use
Semantic Web Layers
 Ontology: defines vocabularies and
  establishes the usage of words and terms in
  the context of a specific vocabulary
 Logic and Proof: used to establish the
  consistency and correctness of data sets
  and to infer conclusions that aren’t explicitly
  stated
Semantic Web agents
 Metadata will be used to identify and
  extract information from Web sources.
 Ontologies will be used to assist in Web
  searches, to interpret retrieved
  information, and to communicate with
  other agents.
 Logic will be used for processing retrieved
  information and for drawing conclusions.
RDF
RDF
• “Resource Description Framework”
• RDF is a data model
   • Originally for describing metadata for web pages
   • Structured information
   • Universal, machine-readable data exchange model
   • Syntax uses XML for serialization
• Statements can be modeled with
   • Resources: an element, a URI, a literal
   • Properties: directed relation between two resources
   • Statements: triples of two resources linked by a property
RDF
• In general, a set of triples can be viewed as a graph
    • both “object” and “subject” are the graph nodes
    • “properties” are the edges
• The XML syntax is only a practical serialization of the graph
• Components
    • URIs – for referencing resources
    • Literals – data values
    • Empty nodes (blank nodes) – talking about something which doesn’t
      have a name
RDF Example




  • Subject: URIs and empty nodes

  • Predicate: URIs ( also called properties)

  • Object: URIs and empty nodes and literals
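The position rules above can be sketched in a few lines of Python. This is only an illustrative model, not an RDF library: the class names `URI`, `BlankNode`, `Literal` and the helper `is_valid_triple` are assumptions made for this sketch, and the example URIs are borrowed from the RDF/XML example in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(subject, predicate, obj) -> bool:
    """Check that each term is allowed in its position of the triple."""
    return (
        isinstance(subject, (URI, BlankNode))          # subject: URI or blank node
        and isinstance(predicate, URI)                 # predicate: URI only
        and isinstance(obj, (URI, BlankNode, Literal)) # object: anything
    )


index = URI("http://www.example.org/index.html")
creator = URI("http://purl.org/dc/elements/1.1/creator")
language = URI("http://purl.org/dc/elements/1.1/language")
staff = URI("http://www.example.org/staffid/85740")

triples = [
    (index, creator, staff),
    (index, language, Literal("en")),
]

# Graph view: subjects and objects are the nodes, each triple is a labelled edge.
nodes = {term for s, _, o in triples for term in (s, o)}
edges = [(s, p, o) for s, p, o in triples]
```

Frozen dataclasses are used so that a URI and a literal with the same text stay distinct nodes in the graph.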
XML syntax for RDF Example
RDF Example
RDF XML Code Example
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:exterms="http://www.example.org/terms/">

  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
    <dc:language>en</dc:language>
    <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
  </rdf:Description>

</rdf:RDF>
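To see how this document maps back to triples, here is a minimal sketch using only Python's standard `xml.etree.ElementTree`. It is not a full RDF/XML parser: it assumes the simple shape used above (`rdf:Description` elements with `rdf:about`, and property elements holding either text or an `rdf:resource` attribute). The function name `extract_triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
    <dc:language>en</dc:language>
    <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Return (subject, predicate, object) string tuples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores qualified names as "{namespace}local";
            # dropping the braces yields the full property URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # An rdf:resource attribute means a URI object, otherwise a literal.
            obj = resource if resource is not None else (prop.text or "").strip()
            result.append((subject, predicate, obj))
    return result


triples = extract_triples(RDF_XML)
```

Each property element becomes one triple whose subject is the `rdf:about` value of the enclosing description.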
A simple example
   “The book has the title War and Peace”
   Graphical RDF Statement

         The book  --has the title-->  War and Peace

   RDF in an XML document
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://amazon.com/books">
    <dc:title>War and Peace</dc:title>
  </rdf:Description>
</rdf:RDF>
Ontology
 We can express an ontology as:
  Ontology = <taxonomy, inference rules>
 And we can express a taxonomy as:
  Taxonomy = <{classes}, {relations}>

   Ontology languages (RDFS, OWL) have formal
    foundations that allow us to infer additional (implicit)
    statements
RDF Schema
 Intended to structure RDF resources
 RDFS
     Set theory – rdfs:Class
     Relation – rdf:Property, rdfs:domain, rdfs:range
     Hierarchy – rdfs:subClassOf, rdfs:subPropertyOf
     Built-in Datatype – xsd:string, xsd:dateTime
RDF & RDFS
   RDF is a graphical formalism ( + XML syntax + semantics)
      for representing metadata
      for describing the semantics of information in a
       machine-accessible way
   RDFS extends RDF with “schema vocabulary”, e.g.:
      Class, Property
      type, subClassOf, subPropertyOf
      range, domain
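The kind of inference this schema vocabulary enables can be sketched with a small fixed-point loop in plain Python. This is an illustration of a few RDFS entailment patterns (subClassOf transitivity, type propagation, domain and range), not a complete RDFS reasoner. The function name `rdfs_closure` and the example resources (`:Novel`, `:Book`, `:Work`, `:author`, `:tolstoy`) are assumptions for the sketch.

```python
def rdfs_closure(triples):
    """Repeatedly apply a few RDFS rules until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "rdfs:subClassOf":
                for s2, p2, o2 in facts:
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        # subClassOf is transitive
                        new.add((s, "rdfs:subClassOf", o2))
                    elif p2 == "rdf:type" and o2 == s:
                        # instances of a subclass are instances of the superclass
                        new.add((s2, "rdf:type", o))
            elif p == "rdfs:domain":
                # the subject of a property belongs to the property's domain
                new.update((s2, "rdf:type", o) for s2, p2, _ in facts if p2 == s)
            elif p == "rdfs:range":
                # the object of a property belongs to the property's range
                new.update((o2, "rdf:type", o) for _, p2, o2 in facts if p2 == s)
        if new <= facts:
            return facts
        facts |= new


schema_and_data = [
    (":Novel", "rdfs:subClassOf", ":Book"),
    (":Book", "rdfs:subClassOf", ":Work"),
    (":author", "rdfs:domain", ":Book"),
    (":author", "rdfs:range", ":Person"),
    (":warAndPeace", "rdf:type", ":Novel"),
    (":warAndPeace", ":author", ":tolstoy"),
]

closure = rdfs_closure(schema_and_data)
```

After the loop finishes, `closure` contains the asserted triples plus the implicit ones, for instance that `:warAndPeace` is also a `:Work`.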
Limitations of RDF/RDFS
   The original RDF/RDFS specifications had no standard for
    expressing primitive data types such as integer; literals
    were treated as strings. (Typed literals based on XML Schema
    datatypes were added in the 2004 revision of RDF.)
   No standard for expressing characteristics of properties
    (unique, transitive, inverse etc.)
   No standard for expressing whether enumerations are
    closed.
   No standard to express equivalence, disjointness etc.
    among properties
OIL and DAML
   RDF/RDFS define a framework; however, they have
    limitations. There is a need for new Semantic Web
    languages that meet the following requirements:
       They should be compatible with XML and RDF/RDFS
       They should have enough expressive power to fill in the gaps in
        RDFS
       They should provide automated reasoning support
   Ontology Inference Layer (OIL) and DARPA Agent Markup
    Language (DAML) are two important efforts developed to
    fulfill these requirements.
   Their combined efforts formed the DAML+OIL declarative
    semantic language.
OIL and DAML
   DAML+OIL is built on top of RDFS.
       It uses RDFS syntax.
       It has richer ways to express primitive data types.

   DAML+OIL allows other relationships (inverse and
    transitivity) to be directly expressed.

   DAML+OIL provides well-defined semantics, which
    means:
       Meaning of DAML+OIL statements can be formally specified.
       Machine understanding and automated reasoning can be
        supported.
       More expressive power can be provided.
Example
Example: T. Rex is not a herbivore and not a currently living
         species.
 This statement can be expressed in DAML+OIL, but not in
  RDF/RDFS, since RDF/RDFS cannot express disjointness.

   DAML+OIL provides automated reasoning by providing such
    expressive power.
     For instance, a software agent can find out the “list of all the carnivores
      that won’t be any threat today” by processing the DAML+OIL data
      representation of the example above.
     RDF/RDFS does not express “is not” relationships and exclusions.
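The reasoning in this example can be mimicked in plain Python to show what disjointness buys an agent. This is not DAML+OIL syntax or a real reasoner. The class names (`:Carnivore`, `:Herbivore`, `:Extinct`, `:Living`), the individuals, and the helper names are all hypothetical, chosen only to mirror the T. Rex example.

```python
# Pairs of classes declared disjoint (no individual can belong to both).
DISJOINT = {
    frozenset({":Carnivore", ":Herbivore"}),
    frozenset({":Extinct", ":Living"}),
}

# Asserted class memberships for a few hypothetical individuals.
TYPES = {
    ":TRex": {":Carnivore", ":Extinct"},
    ":Lion": {":Carnivore", ":Living"},
    ":Cow": {":Herbivore", ":Living"},
}


def is_not(individual, cls):
    """True if a disjointness axiom rules out `individual` being a `cls`."""
    return any(frozenset({t, cls}) in DISJOINT for t in TYPES[individual])


def harmless_carnivores():
    """Carnivores that are provably not living, i.e. no threat today."""
    return sorted(
        name
        for name, classes in TYPES.items()
        if ":Carnivore" in classes and is_not(name, ":Living")
    )
```

The "is not" answers come only from the declared disjointness. Without those axioms, the absence of a `:Living` assertion would not prove that an individual is not living.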
OWL
Web Ontology Language = OWL

   OWL is an extra layer, a bit like RDFS
     own namespace, own terms
     it relies on RDF Schemas
   It is a separate recommendation
     there is a 2004 version of OWL
      (“OWL 1”)
     and there is an update (“OWL 2”) published in
      2009
OWL- Web Ontology Language
   OWL is a vocabulary extension of RDF and is
    derived from the DAML+OIL Web Ontology Language.
   OWL
       Description Logic
            Class, Thing, Nothing
            DatatypeProperty, ObjectProperty, AnnotationProperty,…
       Class
            oneOf, disjointWith, unionOf, complementOf, intersectionOf …
            Restriction, onProperty, cardinality, hasValue…
       Property
            inverseOf , TransitiveProperty , SymmetricProperty
            FunctionalProperty, InverseFunctionalProperty
     Equality– equivalentClass , sameAs , differentFrom…
     Ontology annotation – Ontology, imports, versionInfo
Term equivalences
   For classes:
     owl:equivalentClass: two classes have the
      same individuals
     owl:disjointWith: no individuals in common
   For properties:
     owl:equivalentProperty
          remember the a:author vs. f:auteur?
     owl:propertyDisjointWith
Term equivalences
   For individuals:
     owl:sameAs: two URIs refer to the same
      concept (“individual”)
     owl:differentFrom: negation of owl:sameAs
Example
   a:author  --owl:equivalentProperty-->  f:auteur

   a:Novel   --owl:equivalentClass-->     f:Roman
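What an equivalence like this means for data can be sketched in Python: every statement made with one property also holds with the other. This is an illustrative helper, not an OWL reasoner. The function name `apply_equivalent_properties` and the example resources `:book1` and `:tolstoy` are assumptions.

```python
def apply_equivalent_properties(triples, equivalents):
    """Add, for each triple, the same statement under the equivalent property."""
    result = set(triples)
    for s, p, o in triples:
        for a, b in equivalents:
            if p == a:
                result.add((s, b, o))
            elif p == b:
                result.add((s, a, o))
    return result


equivalents = [("a:author", "f:auteur")]
data = [(":book1", "a:author", ":tolstoy")]

expanded = apply_equivalent_properties(data, equivalents)
```

A query written against `f:auteur` can then find data that was published using `a:author`.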
Property characterization
 In OWL, one can characterize the
  behavior of properties (symmetric,
  transitive, functional, reflexive, inverse
  functional…)
 One property can be defined as the
  “inverse” of another
What this means is…
    If the following holds in our triples:
:email rdf:type owl:InverseFunctionalProperty.
<A> :email "mailto:a@b.c".
<B> :email "mailto:a@b.c".


 then, processed through OWL, the following
 holds, too:
<A> owl:sameAs <B>.
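The rule behind this conclusion can be sketched in Python: for an inverse functional property, two subjects that share a value must be the same individual. This is a simplified illustration over string triples, not an OWL reasoner. The function name `infer_same_as` and the extra individual `<C>` are assumptions added for the sketch.

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as(triples):
    """Derive owl:sameAs pairs from inverse functional properties."""
    inverse_functional = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "owl:InverseFunctionalProperty"
    }
    # Group subjects by (property, value) for inverse functional properties.
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            subjects_by_value[(p, o)].add(s)
    same = set()
    for subjects in subjects_by_value.values():
        for a, b in combinations(sorted(subjects), 2):
            same.add((a, "owl:sameAs", b))
    return same


triples = [
    (":email", "rdf:type", "owl:InverseFunctionalProperty"),
    ("<A>", ":email", "mailto:a@b.c"),
    ("<B>", ":email", "mailto:a@b.c"),
    ("<C>", ":email", "mailto:x@y.z"),  # hypothetical, different value
]

same_as = infer_same_as(triples)
```

Only `<A>` and `<B>` are linked, because `<C>` has a different value for the property.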
Keys
    “if two persons have the same emails and the same
    homepages then they are identical”


 Identification is based on the identical
  values of two properties
 The rule applies to persons only
Previous rule in OWL

:Person rdf:type owl:Class;
   owl:hasKey (:email :homepage) .
What it means is…
If:
<A> rdf:type :Person ;
   :email    "mailto:a@b.c";
   :homepage "http://www.ex.org".

<B> rdf:type :Person ;
   :email    "mailto:a@b.c";
   :homepage "http://www.ex.org".


then, processed through OWL, the following holds,
too:
<A> owl:sameAs <B>.
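A key differs from a single inverse functional property in two ways: all key properties must match, and the rule is scoped to one class. A minimal Python sketch under simplifying assumptions (one value per property per individual, string triples, no real OWL reasoner) could look like this. The function name `infer_same_as_by_key` and the individuals `<C>` and `<D>` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as_by_key(triples, cls, key_props):
    """Derive owl:sameAs for members of `cls` that agree on every key property."""
    members = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    # Simplification: assumes at most one value per (individual, property).
    values = {(s, p): o for s, p, o in triples}
    groups = defaultdict(list)
    for member in sorted(members):
        key = tuple(values.get((member, prop)) for prop in key_props)
        if None not in key:  # skip members missing any key property
            groups[key].append(member)
    return {
        (a, "owl:sameAs", b)
        for group in groups.values()
        for a, b in combinations(group, 2)
    }


triples = [
    ("<A>", "rdf:type", ":Person"),
    ("<A>", ":email", "mailto:a@b.c"),
    ("<A>", ":homepage", "http://www.ex.org"),
    ("<B>", "rdf:type", ":Person"),
    ("<B>", ":email", "mailto:a@b.c"),
    ("<B>", ":homepage", "http://www.ex.org"),
    # Hypothetical: same email but a different homepage, so no match.
    ("<C>", "rdf:type", ":Person"),
    ("<C>", ":email", "mailto:a@b.c"),
    ("<C>", ":homepage", "http://other.example"),
    # Hypothetical: same values but not typed as :Person, so out of scope.
    ("<D>", ":email", "mailto:a@b.c"),
    ("<D>", ":homepage", "http://www.ex.org"),
]

same_as = infer_same_as_by_key(triples, ":Person", (":email", ":homepage"))
```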
Classes in OWL
 In RDFS, you can subclass existing
  classes… that’s all
 In OWL, you can construct classes from
  existing ones:
     enumerate its content
     through intersection, union, complement
     etc.
Enumerate class content
:Currency
    rdf:type owl:Class;
    owl:oneOf (:€ :£ :$).




   I.e., the class consists of exactly those
    individuals and nothing else
Union of classes
:Novel           rdf:type owl:Class.
:Short_Story     rdf:type owl:Class.
:Poetry          rdf:type owl:Class.
:Literature rdf:type owl:Class;
   owl:unionOf (:Novel :Short_Story :Poetry).




    Other possibilities: owl:complementOf,
     owl:intersectionOf, …
For example…
 If:
:Novel           rdf:type owl:Class.
:Short_Story     rdf:type owl:Class.
:Poetry          rdf:type owl:Class.
:Literature rdf:type owl:Class;
   owl:unionOf (:Novel :Short_Story :Poetry).

<myWork> rdf:type :Novel .



  then the following holds, too:

<myWork> rdf:type :Literature .
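The inference above follows from one direction of the union definition: a member of any part is a member of the union. A small Python sketch of just that direction, with the hypothetical helper name `infer_union_members` and the union represented as a plain dictionary, might be:

```python
def infer_union_members(triples, unions):
    """Infer membership in a union class from membership in one of its parts."""
    inferred = set()
    for s, p, o in triples:
        if p == "rdf:type":
            for union_class, parts in unions.items():
                if o in parts:
                    inferred.add((s, "rdf:type", union_class))
    return inferred


# Mirrors: :Literature owl:unionOf (:Novel :Short_Story :Poetry).
unions = {":Literature": {":Novel", ":Short_Story", ":Poetry"}}
triples = [("<myWork>", "rdf:type", ":Novel")]

inferred = infer_union_members(triples, unions)
```

The sketch does not cover the reverse direction (a `:Literature` member must be in at least one of the parts), which cannot be expressed as a simple added triple.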
What we have so far…
 The OWL features listed so far are already
  fairly powerful
 E.g., various databases can be linked via
  owl:sameAs, functional or inverse
  functional properties, etc.
 Many inferred relationships can be found
  using a traditional rule engine
The most used Semantic Web Tools
 RDF Gateway: runs both a Web
  application server and a database designed to
  handle RDF content
 Jena: a Java API for RDF
 Smore: Semantic Markup, Ontology and
  RDF Editor
 Drive: a C# API that parses and validates
  RDF documents
General Architecture for Text
   Engineering (GATE)
What is GATE?
An architecture
    A macro-level organisational picture for LE (language engineering)
    software systems.
A framework
    For programmers, GATE is an object-oriented class library that
    implements the architecture.
A development environment
    For language engineers, computational linguists et al., GATE is a
    graphical development environment bundled with a set of tools for doing
    e.g. Information Extraction.
Some free components, and wrappers for other people's components
Tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue;
   ontologies; etc.
Where did GATE come from?
A number of researchers realised in the early to mid-1990s (e.g. in
  TIPSTER) the need for:
• Increasing trend towards multi-site collaborative projects
• Role of engineering in scalable, reusable, and portable HLT solutions
• Support for large data, in multiple media, languages, formats, and
  locations
• Lower the cost of creation of new language processing components
• Promote quantitative evaluation metrics via tools and a level playing field


History:
• 1996 – 2002: GATE version 1, proof of concept
• March 2002: version 2, rewritten in Java, component based, more users
• Fall 2003: new development cycle

Applications
 Swoogle
 DBpedia
 Flickr
 PhotoStuff
Swoogle
•   Swoogle is a crawler-based indexing and retrieval
    system for the Semantic Web

•   Swoogle crawls and discovers documents written in
    RDF and OWL

•   Swoogle classifies a Semantic Web
    Document (SWD) as:
    • Semantic Web Ontology (SWO) – defines new terms
    • Semantic Web Database (SWDB) – makes assertions
      about individuals
Reference & Resources
   http://www.w3.org/TR/rdf-primer/
   http://www.w3.org/DesignIssues/Semantic.html
   http://www.w3.org/TR/rdf-syntax-grammar/
   http://www.authorstream.com/Presentation/sneha.ch
   http://www.w3.org/TR/owl-semantics/
