Ontology at Manchester
Robert Stevens
BioHealth Informatics Group
School of Computer Science
University of Manchester
2
Ontology Research at Manchester
Language and
Reasoning
Tools
Modelling
3
So what is an ontology?
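[Figure: spectrum of what can be an ontology, in order of increasing formality]
Catalog/ID; Terms/glossary; Thesauri; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; Disjointness, inverse, part-of; General logical constraints
Examples placed on the spectrum: Gene Ontology, Mouse Anatomy, EcoCyc, PharmGKB, TAMBIS, Arom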
After Chris Welty et al
4
A Definition
o a set of logical axioms designed to account for the
intended meaning of a formal vocabulary used to describe
a certain (conceptualisation of) reality [Guarino 1998]
o “conceptualisation of” inserted by me
o “Logical axioms” means a formal definition of meaning of
terms in a formal language
o Formal language: something a computer can reason with
o Use symbols to make inferences
o Symbols represent things and their relationships
o Making inferences about things computationally amenable
5
OWL
• Ontologies will form the backbone of the
semantic web
• OWL is the latest standard in ontology
languages from the W3C
• Layered on top of RDF and RDF Schema
• Underpinned by Description Logics
6
OWL represents classes
of instances
[Figure: diagram of classes labelled A, B and C]
7
Interpretations
• Individuals are
interpreted as objects
• Classes are interpreted
as sets containing
objects
• Properties are
interpreted as binary
relations on objects
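
The set-theoretic reading above can be sketched in a few lines of Python. This is only a toy illustration: the individuals (`water1`, `oxygen1`, and so on), the class `Atom` and the helper functions are invented for the example and do not come from any real ontology or OWL library.

```python
# Hypothetical toy interpretation; every name here is illustrative only.

# Individuals are interpreted as objects in a domain.
domain = {"water1", "oxygen1", "hydrogen1", "hydrogen2"}

# Classes are interpreted as sets containing objects from the domain.
classes = {
    "Molecule": {"water1"},
    "OxygenAtom": {"oxygen1"},
    "HydrogenAtom": {"hydrogen1", "hydrogen2"},
    "Atom": {"oxygen1", "hydrogen1", "hydrogen2"},
}

# Properties are interpreted as binary relations: sets of (subject, object) pairs.
properties = {
    "madeOf": {
        ("water1", "oxygen1"),
        ("water1", "hydrogen1"),
        ("water1", "hydrogen2"),
    },
}


def is_instance(individual, class_name):
    """True if the object belongs to the set that interprets the class."""
    return individual in classes[class_name]


def is_subset_in_this_interpretation(sub, sup):
    """True if, in this one interpretation, every member of sub is in sup."""
    return classes[sub] <= classes[sup]


def fillers(subject, property_name):
    """All objects related to subject by the given property."""
    return {o for (s, o) in properties[property_name] if s == subject}
```

Note that a subset relation holding in one interpretation is not the same as entailed subsumption: a reasoner has to establish that it holds in every interpretation that satisfies the axioms.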
8
Logical Descriptions
• Class: Water
• EquivalentTo: Molecule that
– madeOf 1 OxygenAtom and
– madeOf 2 HydrogenAtom and
– madeOf only (OxygenAtom or HydrogenAtom)
Class: Water
SubClassOf: Molecule that
hasBoilingPoint value 100 and
hasFreezingPoint value 0 and
hasState some Liquid
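
As a rough illustration of what the EquivalentTo description of Water says, the Python sketch below checks a molecule against it. It makes two assumptions that are not in the slides: "madeOf 1" and "madeOf 2" are read as exact counts, and the list of atoms is taken to be complete (a closed-world check, whereas OWL reasoning is open-world). The function name `is_water` is hypothetical.

```python
from collections import Counter


def is_water(atom_types):
    """Check a molecule, given as a list of the class names of its atoms.

    Assumes the list is complete and that the counts in the description
    are exact cardinalities.
    """
    counts = Counter(atom_types)
    return (
        counts["OxygenAtom"] == 1          # madeOf 1 OxygenAtom
        and counts["HydrogenAtom"] == 2    # madeOf 2 HydrogenAtom
        # madeOf only (OxygenAtom or HydrogenAtom): the closure condition
        and set(counts) <= {"OxygenAtom", "HydrogenAtom"}
    )
```

For example, `is_water(["OxygenAtom", "HydrogenAtom", "HydrogenAtom"])` holds, while a molecule with two oxygen atoms, or one containing any other kind of atom, is rejected by the count or closure condition respectively.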
9
Reasoning
• These OWL descriptions can be submitted to
a DL reasoner
• Translated into DL
• Checked for consistency: is what we’ve said
satisfiable?
• Also infers subsumption hierarchy implied by
statements
• Mistakes all too easy without help
• Formality is your friend
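
To give a flavour of the kind of mistake a reasoner catches, here is a much simplified Python sketch. It is not how a DL reasoner works: it only follows asserted subclass links and flags any named class that ends up below two classes declared disjoint. The class names in the usage note and both function names are invented for the example.

```python
def ancestors(cls, parents):
    """All asserted superclasses of cls, following subclass links transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(parents, disjoint_pairs):
    """Named classes that fall under both members of some disjoint pair.

    parents: dict mapping a class name to a list of its asserted superclasses.
    disjoint_pairs: iterable of (class_a, class_b) declared disjoint.
    """
    bad = set()
    for cls in parents:
        supers = ancestors(cls, parents) | {cls}
        if any(a in supers and b in supers for a, b in disjoint_pairs):
            bad.add(cls)
    return bad
```

With `parents = {"Slush": ["Liquid", "Solid"], "Liquid": ["State"], "Solid": ["State"]}` and `Liquid` declared disjoint with `Solid`, the hypothetical class `Slush` is reported, since nothing can be a member of both.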
10
Language & Reasoning
• Supporting ontology engineering by automated
reasoning
– Classification
– Consistency checking
– Query answering
• Say the things you want to say and still reason
• Explain reasoning results
• Help debugging unexpected results
• Supporting modularity in ontologies
• Segmenting large ontologies into modules
11
Language & Reasoning
• Inspecting ontologies to find missing
knowledge
• Scalability: Larger ontologies; faster
reasoning; more instances; more
expressivity
• Instance Store: Query answering over
vast numbers of instances
Old Protégé (matrix wizard)
New Protégé (matrix tab)
SWOOP (crop circles)
15
ComparaGRID
16
Classifying Protein Phosphatases
• Annotating a genome’s proteins is a
bottleneck
• Classifying proteins is a first step to
annotation
• Tools for detecting features
• Need human knowledge to determine class
membership
• Can we capture “how to recognise a
phosphatase” in an ontology?
17
Definition of Tyrosine Phosphatase
Class: TyrosinePhosphatase Complete
(Protein and
- (contains atLeast-1
ProteinTyrosinePhosphataseDomain) and
- (contains 1 TransmembraneDomain))
18
Definition for R2A Phosphatase
Class: R2A Complete
(Protein and
- (contains 2 ProteinTyrosinePhosphataseDomain) and
- (contains 1 TransmembraneDomain) and
- (contains 4 FibronectinDomain) and
- (contains 1 ImmunoglobulinDomain) and
- (contains 1 MAMDomain) and
- (contains 1 Cadherin-LikeDomain) and
- (contains only (ProteinTyrosinePhosphataseDomain or
TransmembraneDomain or FibronectinDomain or
ImmunoglobulinDomain or Cadherin-LikeDomain or
MAMDomain)))
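
One way to see why a reasoner would place R2A under TyrosinePhosphatase is to compare the domain counts each definition requires. The Python sketch below is a simplification under stated assumptions: plain numbers are read as exact counts, domain names are normalised to one spelling each, the shared `Protein` conjunct and the `contains only` closure axiom are ignored, and the `subsumes` function is a hypothetical helper, not part of any OWL tool.

```python
import math

# Each definition maps a domain name to an inclusive (min, max) count range.
# Domains that a definition does not mention are unconstrained: (0, inf).
TYROSINE_PHOSPHATASE = {
    "ProteinTyrosinePhosphataseDomain": (1, math.inf),  # atLeast-1
    "TransmembraneDomain": (1, 1),
}

R2A = {
    "ProteinTyrosinePhosphataseDomain": (2, 2),
    "TransmembraneDomain": (1, 1),
    "FibronectinDomain": (4, 4),
    "ImmunoglobulinDomain": (1, 1),
    "MAMDomain": (1, 1),
    "Cadherin-LikeDomain": (1, 1),
}


def subsumes(general, specific):
    """True if every count range required by general contains the
    corresponding range of specific."""
    for domain, (g_min, g_max) in general.items():
        s_min, s_max = specific.get(domain, (0, math.inf))
        if s_min < g_min or s_max > g_max:
            return False
    return True
```

Here `subsumes(TYROSINE_PHOSPHATASE, R2A)` holds because two phosphatase domains satisfy "at least one", while the reverse does not hold. Ignoring the closure axiom makes the check conservative: it may miss some subsumptions but should not report false ones for definitions of this shape.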
19
Building the Ontology
• Classifications already made by biologists – based
on protein functionality;
• Protein domain composition and other details in
the literature;
• Some 50 classes of phosphatase, 30 protein
domains and 39 relationships;
• “Value partition” of protein domains (covering and
disjoint);
• Defines range of contains property;
• Literature contains knowledge of how to recognise
members of each class of phosphatase.
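
The "covering and disjoint" conditions of a value partition can be stated as a small check over sets. The Python sketch below tests them for one concrete set of members; the function name and the individuals used in the usage note are hypothetical, and in OWL the same conditions would be asserted as axioms rather than tested on data.

```python
from itertools import combinations


def is_value_partition(parent, parts):
    """Check that parts split parent into disjoint, covering subsets.

    parent: set of members of the partitioned class.
    parts: dict mapping each subclass name to its set of members.
    """
    # Disjoint: no two subclasses share a member.
    pairwise_disjoint = all(
        a.isdisjoint(b) for a, b in combinations(parts.values(), 2)
    )
    # Covering: every member of the parent is in some subclass, and nothing else.
    covering = set().union(*parts.values()) == parent
    return pairwise_disjoint and covering
```

For instance, with invented domain instances `d1`, `d2`, `d3`, the parts `{"TransmembraneDomain": {"d1"}, "MAMDomain": {"d2", "d3"}}` form a value partition of `{"d1", "d2", "d3"}`, but not if `d2` also appears under `TransmembraneDomain` or if `d3` is left out.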
20
Incremental Addition of Protein
Functional Domains
Phosphatase catalytic
Cadherin-like
Immunoglobulin
MAM domain Cellular retinaldehyde
Adhesion recognition Transmembrane
Fibronectin III Glycosylation
21
Classification of the Classical Tyrosine
Phosphatases
22
What is the Ontology Telling Us?
• Each class of phosphatase defined in terms of
domain composition
• We know the characteristics by which an individual
protein can be recognised to be a member of a
particular class of phosphatase
• We have this knowledge in a computational form
• If we had protein instances described in terms of
the ontology, we could classify those individual
proteins
• A catalogue of phosphatases
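
The idea of classifying individual proteins against the definitions can be sketched as follows. This is a toy Python version covering only the two definitions shown earlier, with normalised domain names; it treats the detected domain list as complete and reads plain numbers as exact counts. The function name `classify` and the constant `R2A_COMPOSITION` are hypothetical, and a real pipeline would hand the descriptions to a DL reasoner instead.

```python
from collections import Counter

# Exact domain composition required for R2A (counts plus closure).
R2A_COMPOSITION = Counter({
    "ProteinTyrosinePhosphataseDomain": 2,
    "TransmembraneDomain": 1,
    "FibronectinDomain": 4,
    "ImmunoglobulinDomain": 1,
    "MAMDomain": 1,
    "Cadherin-LikeDomain": 1,
})


def classify(domains):
    """Return the names of the classes whose definition a protein satisfies.

    domains: list of domain names detected in the protein, one entry per
    occurrence. The list is assumed to be complete.
    """
    counts = Counter(domains)
    matches = []
    if (counts["ProteinTyrosinePhosphataseDomain"] >= 1
            and counts["TransmembraneDomain"] == 1):
        matches.append("TyrosinePhosphatase")
    if counts == R2A_COMPOSITION:
        matches.append("R2A")
    return matches
```

A protein with exactly the R2A composition is reported as both `TyrosinePhosphatase` and `R2A`; one with a single phosphatase domain and a single transmembrane domain is reported only as `TyrosinePhosphatase`.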
23
Classification of Protein Tyrosine
Phosphatases
24
Results
• Human “gold standard”: Same results plus
two more
• Partially annotated A. fumigatus: Better results
and two new putative phosphatases
• Easily generated and compared phosphatase
profiles
• Parasites
• Whole range of unexpected results: back to
bioinformatics sequence analysis
25
myGrid Service Ontology
• myGrid services and workflow toolkit
• Web service discovery and composition
• Semantic content of provenance
repository
• Wide use of service ontology
• Links with BioMOBY
• Workflows as knowledge management
26
Informal Modelling
• OWL is formal, but ontology has a long
informal stage
• Tool-based forms of knowledge elicitation
techniques such as card sorting and
laddering
• Experiments with text to ontology tools
• With suitable text can truncate the informal
stage
• Provide useful starting points for later stages
27
Casual Modelling
• OWL can be scary
• Need the equivalent of pseudo-code
• Work on concept maps as an elicitation
tool
• Convertible to OWL
• Converting spreadsheets to OWL
• Converting thesauri to OWL
28
Community Building of Ontologies
• Collaboration with University of British
Columbia, Vancouver
• No money and no centre: What can you
do?
• Use your community to build, extend,
check facts in your ontology
• Currently running experiments
29
The Sealife Browser
• An EU project to build a Semantic Grid
browser for the life sciences
• Uses ontology as background knowledge
• Dynamically link to terms on a page
• Link to tools, data, documents, etc
• A semantic shopping cart
• Need to use a broad range of ontologies and
many conversions
30
Modelling Biology & Medicine
• Describing biological phenomena
• Reconciling descriptions
• Analysing biological data
• Describing and analysing healthcare
records
• Guiding annotation: Creating and filling
forms
• Describing medical phenomena
31
Outside Relationships
• BioPAX
• FUGO/OBI
• Plant Ontology
• CBIO
• HL7, …
32
Training
• Introductory OWL tutorials: Non-biological
• Advanced tutorial: Biology orientated
• Hundreds trained in UK and overseas
(mainly life sciences)
• Hands-on training

Editor's Notes

  • #4 Spectrum of what can be an ontology