Introduction to ontologies



        Olivier Dameron




INSERM U 936 – Université Rennes 1 (France)
     http://www.u936.univ-rennes1.fr
                                   2010-11-29
Disclaimer

This presentation:
  Identifies general problems (also relevant to
  ecoOnto)
  Explains what ontologies are and how they
  can contribute to the project
Outline

Automatic (intelligent) data processing
  motivations
  requirements: annotations, integration,
  interpretation
Ontologies
  Definition
  Principles
  Difficulties
Automatic data processing
Data: evolution
●   Increasing quantity (not only in the bio world)
    ●   Songs
    ●   Pictures
    ●   Personal notes
    ●   Articles, documentation
    ●   Clinical records
●   This trend will probably continue...
Data: evolution
●   Increased complexity (1/2)
    ●   Pictures
               ●   Metadata
                    ●   Date, time
                    ●   Aperture, focus,...
                    ●   Author, copyright|copyleft,...
               ●   Tags
               ●   Geotags
Data: evolution
●   Increased complexity (2/2)
    ●   Clinical records are not what they used to be :-)
               ●   From plain text to structured info
               ●   Refer to external sources (ICD,...)
               ●   Multimedia (pacemaker, images, 3D)
               ●   Soon: genetic info, link to ancestors'
                    EHR...
Data: evolution
●   Increased sharing/reuse
    ●   Possible now that data are available
        electronically
    ●   Cumulative effect (especially in complex
        domains such as bio, with lots of
        interdependencies)
    ●   Sometimes for purposes not originally foreseen
Data
●   Increased quantity
●   Increased complexity
●   Increased sharing/reuse


Shifting from direct consumption by humans
to consumption by program(s) for other
programs or for humans
Data: requirements
●   Annotation
●   Integration
●   Interpretation
Requirement1: data annotation
●   Proxy so that the whole dataset does not
    have to be examined at each query
    ●   Annotations can be difficult or time-consuming to
        produce
    ●   Easier or faster or better results when considering
        the annotations instead of the data
●   Share not only the data, but their
    annotations as well!
    ●   Annotations become data of their own (although we
        seldom annotate them :-)
Requirement1: data annotation

Figuring out the correct and relevant
information is easy for Homo Sapiens...
  Ex: how much does “The Semantic Web primer”
  cost?
Requirement1: data annotation

Figuring out the correct and relevant
information is easy for Homo Sapiens...
... but difficult for Programus Simplex
Requirement2: data integration

Aggregate and compose information
  Ex: how old were the Nebula award winners when
  they won the prize?
  Ex: how many books had they published?
  Ex: average age of the Canadian Nebula winners
Requirement3: data interpretation

 Google query on owl
 Retrieve all the pictures of a sailing boat in a
 harbor in Brittany
 Retrieve all the radiological exams of a
 fracture of the leg
Requirement3: data interpretation

Google for “owl”
  Noise: owl (bird) vs. owl (DL language)
  Silence: a page mentioning “Web Ontology
  Language” but not “OWL” would be ignored
How about looking for an OWL ontology
about owls (the birds)? :-)


Annotations are great but not enough
The meaning associated with these annotations is
important too
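
The noise and silence problem can be sketched in a few lines of Python. This is only an illustration: the three page texts and the concept identifiers (Owl_bird, WebOntologyLanguage) are made up for the example, and real annotation schemes are richer than a set of labels.

```python
# Hypothetical pages: raw text plus a set of concept annotations.
pages = [
    {"text": "The barn owl hunts at night",
     "concepts": {"Owl_bird"}},
    {"text": "OWL builds on description logics",
     "concepts": {"WebOntologyLanguage"}},
    {"text": "The Web Ontology Language is a W3C standard",
     "concepts": {"WebOntologyLanguage"}},
]


def keyword_search(pages, word):
    """Return the pages whose text contains the word (case-insensitive)."""
    return [p for p in pages if word.lower() in p["text"].lower().split()]


def concept_search(pages, concept):
    """Return the pages annotated with the given concept."""
    return [p for p in pages if concept in p["concepts"]]


# Keyword search: returns the bird page (noise) and misses the page
# that only spells out "Web Ontology Language" (silence).
keyword_hits = keyword_search(pages, "owl")

# Concept search: returns exactly the pages about the language.
concept_hits = concept_search(pages, "WebOntologyLanguage")

for page in concept_hits:
    print(page["text"])
```

The annotations remove the ambiguity only because both pages agree on what WebOntologyLanguage means, which is the point of the last two lines of the slide.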
Requirement3: data interpretation

  Retrieve all the pictures of a sailing boat in a
  harbor in Brittany
  [Figure: three example pictures, two marked :-) and one marked :-(]
Ontologies
Ontologies: what they are

Ontologies: formal representation of a
shared conceptualization
  [Gruber]
  [Chandrasekaran]
The structure underlying annotations
Often, everything that is left implicit in a factual
document (clinical record, factual report...)
Ontologies: what they are not

Ontology (the branch of philosophy)
Controlled vocabulary, terminologies,...
(although both are useful)
Sets of annotated data (genericity is the key)
Ontologies: principles

Individuals: things
  They are instances of classes
  We hardly see them in ontologies (genericity)...
  … except when they represent things that are widely
  reused (e.g. geographic entities)
Ontologies: principles

Properties: binary relations between individuals
  Ontologies can specify domain and range
  Additional features: transitivity, functionality,
  symmetry, reflexivity,...
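
As a rough sketch of what these features buy us, the Python below applies domain/range typing and transitivity to a tiny set of facts. The property name locatedIn, the class name Place and the representation of facts as plain tuples are assumptions made for the example, not part of any standard vocabulary.

```python
# Facts are (subject, property, object) triples; all names are illustrative.
facts = {
    ("Rennes", "locatedIn", "Brittany"),
    ("Brittany", "locatedIn", "France"),
}


def apply_domain_range(triples, prop, domain, range_):
    """Type every subject of prop with its domain and every object with its range."""
    typed = set(triples)
    for s, p, o in triples:
        if p == prop:
            typed.add((s, "rdf:type", domain))
            typed.add((o, "rdf:type", range_))
    return typed


def transitive_closure(triples, prop):
    """Return the triples plus every fact implied by the transitivity of prop."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        links = [(s, o) for (s, p, o) in closed if p == prop]
        for a, b in links:
            for c, d in links:
                if b == c and (a, prop, d) not in closed:
                    closed.add((a, prop, d))
                    changed = True
    return closed


typed = apply_domain_range(facts, "locatedIn", "Place", "Place")
inferred = transitive_closure(facts, "locatedIn")
print(("Rennes", "locatedIn", "France") in inferred)
```

Nobody stated that Rennes is located in France or that France is a Place; both follow from what the ontology says about the property.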
Ontologies: principles

Classes: sets of things (think genericity)
  e.g. Rabbit (as opposed to Bugs Bunny)
  Organized hierarchically (taxonomy) from the more
  general to the more specific (multiple inheritance is OK)
  Inheritance of properties
  True path rule: if class A annotates some data, then
  all the ancestors of A are also valid annotations
  (so if you tag a picture as BugsBunny, you do not
  need to mention Rabbit, CartoonCharacter,...)
  Can represent constraints on the properties of their
  instances
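
The true path rule can be sketched as a walk up the hierarchy. In this minimal Python example the hierarchy itself is hypothetical (the Animal class in particular is added only to show more than one level), and BugsBunny is treated as a tag, as on the slide.

```python
# Hypothetical hierarchy: each tag or class maps to its direct parents.
# Multiple inheritance is allowed: BugsBunny has two parents.
parents = {
    "BugsBunny": {"Rabbit", "CartoonCharacter"},
    "Rabbit": {"Animal"},
    "CartoonCharacter": set(),
    "Animal": set(),
}


def ancestors(cls, hierarchy):
    """Return every class reachable from cls by following parent links."""
    found = set()
    stack = list(hierarchy.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(hierarchy.get(parent, ()))
    return found


def expand_annotations(tags, hierarchy):
    """Apply the true path rule: add all ancestors of each tag."""
    expanded = set(tags)
    for tag in tags:
        expanded |= ancestors(tag, hierarchy)
    return expanded


picture_tags = expand_annotations({"BugsBunny"}, parents)
print(sorted(picture_tags))
```

A query for pictures of any Rabbit can then match this picture even though the person who tagged it only wrote BugsBunny.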
Data and ontologies: example




CLASSES: general knowledge (RDFS realm)
  Sci-Fi Book rdfs:subClassOf Book
INSTANCE(S): data-specific, no generalization (RDF realm)
  Dune rdf:type Book
  Dune rdf:type Sci-Fi Book
Data and ontologies: example


The semantics of RDFS allows us to infer that
Dune is an instance of Book!
  (so we do not need to say it explicitly in
  the RDF file anymore)
  Sci-Fi Book rdfs:subClassOf Book
  Dune rdf:type Sci-Fi Book
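
A minimal sketch of this inference in Python, assuming triples are stored as plain tuples and that only the subclass rule of RDFS is applied (a real RDFS reasoner implements more entailment rules than this one). The identifier SciFiBook is just a compact spelling of the class on the slide.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Only the explicit statements: Dune is never said to be a Book.
triples = {
    ("SciFiBook", SUBCLASS, "Book"),
    ("Dune", TYPE, "SciFiBook"),
}


def infer_types(graph):
    """If x is an instance of C and C is a subclass of D, add x rdf:type D.

    The rule is repeated until nothing new is produced, so chains of
    subclasses are followed all the way up.
    """
    inferred = set(graph)
    while True:
        new = {
            (x, TYPE, sup)
            for (x, p1, cls) in inferred if p1 == TYPE
            for (sub, p2, sup) in inferred if p2 == SUBCLASS and sub == cls
        } - inferred
        if not new:
            return inferred
        inferred |= new


closure = infer_types(triples)
print(("Dune", TYPE, "Book") in closure)
```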
Data and ontologies: example

[Diagram: RDF graph]
Classes: Literary Award, Sci-Fi Award, Nebula Award,
  Book, Sci-Fi Book, Person, Author, Country
  (linked by rdfs:subClassOf)
Instances: Nebula Award 1965, Dune, Frank Herbert,
  United States (linked to their classes by rdf:type)
Properties between instances: wonAward, authorOf,
  citizenOf
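
Once data are expressed this way, an integration question such as "which country are the authors of Nebula-winning books citizens of?" becomes a chain of pattern matches. The Python sketch below is an illustration under assumptions: the direction of each edge (for instance that the book, rather than the author, is the subject of wonAward) is a reading of the diagram, and the helper names are invented for the example.

```python
# Assumed reading of the diagram as (subject, property, object) triples.
triples = [
    ("Dune", "rdf:type", "SciFiBook"),
    ("Dune", "wonAward", "NebulaAward1965"),
    ("NebulaAward1965", "rdf:type", "NebulaAward"),
    ("FrankHerbert", "rdf:type", "Author"),
    ("FrankHerbert", "authorOf", "Dune"),
    ("FrankHerbert", "citizenOf", "UnitedStates"),
    ("UnitedStates", "rdf:type", "Country"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def nebula_winner_citizenships(graph):
    """Join award -> book -> author -> country across the graph."""
    results = set()
    for award, _, _ in match(graph, p="rdf:type", o="NebulaAward"):
        for book, _, _ in match(graph, p="wonAward", o=award):
            for author, _, _ in match(graph, p="authorOf", o=book):
                for _, _, country in match(graph, s=author, p="citizenOf"):
                    results.add((author, book, country))
    return results


answers = nebula_winner_citizenships(triples)
print(answers)
```

In practice the same join would be written as a query in a language such as SPARQL; the nested loops only make the steps explicit.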
Synthesis
Synthesis

Annotations are important for efficient data
description
  Integration (incl. future reuse)
  Interpretation
  Focus on describing data as precisely as possible
Ontologies are important for interpreting these
descriptions
  General knowledge about a domain
  Reusable
  Support automatic reasoning
Synthesis

Building ontologies is difficult
  We have extensive experience in building bad
  ontologies
… but wide adoption is more important
  The lesson learned from Gene Ontology

Ontologies introduction - ecoOnto meeting
