Knowing what we’re talking about
Bio-health Informatics Group
School of Computer Science
University of Manchester
We have an item of data
• 27 what?
• Units? With what was 27 measured?
• Even if I told you, would
we interpret what I said
in the same way?
Mouse tail of 27 mm
• … and we can carry on:
Mouse strain, where was
it raised, on what was it
fed, times, dates, etc.
• All this data is necessary
to interpret my original item of data
• Even if that metadata
exists, we have to agree
on the things the
Heterogeneity is rife
• We agree on units (more or less)…
• We don’t agree on much else when it comes to
labels for the entities in our domain
• If we don’t know what we’re talking about….
• It’s difficult to interpret and exchange data and the
results from data
Categories and Category Labels
U2-type nuclear mRNA 5' splice site recognition
spliceosomal E complex formation
spliceosomal E complex biosynthesis
spliceosomal CC complex formation
U2-type nuclear mRNA 5'-splice site recognition
The Ogden Triangle
[Ogden, Richards, 1923]
• Humans require words (or at least symbols) to communicate efficiently. The
mapping of words to things is only indirectly possible. We do it by creating
concepts that refer to things.
• The relation between symbols and things has been described in the form of the
We need to know what we’re talking about…
• … if we don’t, our data are useless
• If we are to interpret our data then we need to
know what entities it describes
• We need to share data and re-use it
• We need to find data; compare data; analyse data
• We need to know what we know….
January 1st 1754
List of diseases & casualties this year
Executed: 18
Found Dead: 34
Kill'd by falls and other accidents: 55
Kill'd themselves: 36
Bit by mad dogs: 3
Broken Limbs: 5
Excessive Drinking: 15
Deaths by centile
A World of Instances
• The world (of information) is made up of things and lots of them
• Instances, individuals, objects, tokens, particulars.
• The Earth is a kind of Planet
• Robert Stevens (NE 67 41 58 A) is a Person
• All the individual Alpha Haemoglobins in my many Instances of Red Blood Cell
• Each cell instance in my Body has copies of some 30,000 Genes
• A Word, language, idea, etc.
• This Table, those Chairs,
• Any Thing with “A”, “The”, “That”, etc. before it….
We Put Things into Categories
• All these instances hang about making our world
• Putting these things into categories is a fundamental part of
• Psychologists study this as concept formation
• The same instances are put into a category
We have Labels for the
Categories and their
• We label categories with symbols: Words
• “Lion” is a category of big cat with big teeth
• Gene, Protein, Cell, Person, Hydrolase Activity, etc.
• …and, as we’ve already seen, each category can have many labels and any
particular label can refer to more than one category
• Semantic Heterogeneity
• “A lion” is an instance in that category
• Does the category “Lion” exist?
• Lions exist, but the category could just be a human way of talking about
• … we like putting things into categories
A Controlled Vocabulary
• A specified set of words and phrases for the categories
in which we place instances
• Natural language definitions for those words and phrases
• A glossary defines, but doesn’t control
• The Uniprot keywords define and control
• Control is placed upon which labels are used to
represent the categories (concepts) we’ve used to
describe the instances in the world
• …, but there is nothing about how things in these
categories are related
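A minimal sketch of the "define and control" idea, in Python. It assumes that the labels listed under "Categories and Category Labels" all name one category; the identifier CAT:0001, the placeholder definition, and the function names are hypothetical and are not taken from any real vocabulary.

```python
# Hypothetical controlled vocabulary: one preferred label per category,
# plus the alternative labels that should be mapped onto it.
# The identifier "CAT:0001" is made up for illustration.
CONTROLLED_VOCABULARY = {
    "CAT:0001": {
        "preferred": "U2-type nuclear mRNA 5' splice site recognition",
        "synonyms": [
            "spliceosomal E complex formation",
            "spliceosomal E complex biosynthesis",
            "spliceosomal CC complex formation",
            "U2-type nuclear mRNA 5'-splice site recognition",
        ],
        "definition": "(natural language definition goes here)",
    },
}


def build_label_index(vocabulary):
    """Map every known label (lower-cased) to its category identifier."""
    index = {}
    for category_id, entry in vocabulary.items():
        for label in [entry["preferred"], *entry["synonyms"]]:
            index[label.lower()] = category_id
    return index


def normalise(label, vocabulary, index):
    """Return the preferred label for a known label, or None if uncontrolled."""
    category_id = index.get(label.strip().lower())
    if category_id is None:
        return None
    return vocabulary[category_id]["preferred"]


LABEL_INDEX = build_label_index(CONTROLLED_VOCABULARY)

# A known alternative label is mapped to the preferred label.
print(normalise("spliceosomal E complex formation", CONTROLLED_VOCABULARY, LABEL_INDEX))
# A label outside the vocabulary is rejected.
print(normalise("mouse tail", CONTROLLED_VOCABULARY, LABEL_INDEX))  # None
```

The sketch controls labels only. As the last bullet says, nothing in it records how categories relate to each other.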
We also like to Relate Things
• Categories have subcategories
• Instances in one category can be related
in some way to instances in another
• Can relate instances to each other in
many different ways
• Is-a, part-of, develops-from, etc. axes
• We can use these relationships to classify
• Things in category A are part is
• If all instances in category A are also in
category B then As are kinds of Bs
[Figure: categories Nucleic Acid, Polypeptide, tRNA, mRNA, smRNA]
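The rule "if all instances in category A are also in category B then As are kinds of Bs" can be sketched with Python sets. The instance names below are made up for illustration, and the check only covers the instances we happen to have listed; an ontology makes the claim about every possible instance.

```python
# Hypothetical instances grouped into categories (names are invented).
mrna = {"mRNA_1", "mRNA_2"}
trna = {"tRNA_1"}
nucleic_acid = {"mRNA_1", "mRNA_2", "tRNA_1", "DNA_1"}
polypeptide = {"peptide_1"}


def is_kind_of(category_a, category_b):
    """True if every instance of category A is also an instance of category B."""
    return category_a <= category_b  # subset test on the listed instances


print(is_kind_of(mrna, nucleic_acid))   # True
print(is_kind_of(trna, nucleic_acid))   # True
print(is_kind_of(mrna, polypeptide))    # False
```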
Categories and sub-categories
[Figure: polypeptide, nucleic acid]
• We can make conditions that any instance must fulfil in order to be a
member of a particular category
• A Phosphatase must have a phosphatase catalytic domain
• A Receptor must have a transmembrane domain
• A codon has three nucleotide residues
• A limb has part that is a joint
• A man has a Y chromosome and an X chromosome
• A woman has only X chromosomes
• These conditions made from a property and a
• isPartOf, hasPart
• …and many, many more
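The membership conditions above can be sketched as predicates over an instance's properties. This is only an illustration: the dictionary layout, the helper names, and the example instances are assumptions, and each condition is treated as something a member must fulfil (failing it rules the instance out; passing it does not by itself prove membership).

```python
def count_parts(instance, part):
    """Number of parts of the given kind recorded under the hasPart property."""
    return instance.get("hasPart", {}).get(part, 0)


# Conditions every member of a category must fulfil, taken from the bullets above.
NECESSARY_CONDITIONS = {
    "Phosphatase": [lambda x: count_parts(x, "phosphatase catalytic domain") >= 1],
    "Receptor": [lambda x: count_parts(x, "transmembrane domain") >= 1],
    "Codon": [lambda x: count_parts(x, "nucleotide residue") == 3],
}


def fulfils_conditions(instance, category):
    """True if the instance meets every condition listed for the category."""
    return all(condition(instance) for condition in NECESSARY_CONDITIONS[category])


# Hypothetical instances.
protein = {"hasPart": {"phosphatase catalytic domain": 1, "fibronectin domain": 2}}
triplet = {"hasPart": {"nucleotide residue": 3}}

print(fulfils_conditions(protein, "Phosphatase"))  # True
print(fulfils_conditions(protein, "Receptor"))     # False
print(fulfils_conditions(triplet, "Codon"))        # True
```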
A Structured Controlled Vocabulary
• Not only can we agree on the
labels we give categories
• Can also agree on how the
instances of categories are related
• And agree on the labels we give
• Structure aids querying and
captures knowledge with greater
[Figure: categories Nucleic Acid, Polypeptide, tRNA, mRNA, smRNA]
A Stronger Definition
• a set of logical axioms designed to account for the intended meaning
of a formal vocabulary used to describe a certain (conceptualisation of)
reality (described in an information system) [Guarino 1998]
• “conceptualisation of” inserted by me
• “Logical axioms” means a formal definition of meaning of terms in a
• Formal language: something a computer can reason with
• Use symbols to make inferences
• Symbols represent things and their relationships
• Making inferences about things computationally
So what is an ontology?
After Chris Welty et al
What does it all mean
• To interpret our data we need to know what it is we’re talking about
• We need to decide the things that we’re talking about and
agree upon them
• We need to agree on how to recognise those entities
• We need to know how they are related to one another
• Ontologies are a mechanism for describing those entities
and their definitions
• There’s more to knowledge representation than ontologies…
All this knowledge needs
• We want this knowledge in a computational form
• To make the knowledge available for software (and
• To help us develop and manage the (often) complex
Building ontologies is hard (getting all those relationships in
the right place)
The Web Ontology Language (OWL) is a W3C
recommendation for ontologies on the Semantic Web and in
semantically enabled applications
A knowledge representation language with a strict semantics
that is amenable to automated reasoning
Web Ontology Language
• W3C recommendation for ontologies for the Semantic Web
• OWL-DL mapped to a decidable fragment of first order logic
• Classes, properties and instances
• Boolean operators, plus existential and universal quantification
• Rich class expressions used in restriction on
properties – hasDomain some (ImnunoGlobinDomain
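A rough Python sketch of what existential ("some") and universal ("only") restrictions on a property ask of an instance. The class names, instance names, and helper functions are hypothetical. The sketch checks only the facts it has been given, whereas an OWL reasoner works under the open-world assumption, so this illustrates the quantifiers and is not a substitute for a reasoner.

```python
# Hypothetical classes, each given as the set of its known instances.
CLASSES = {
    "ImmunoglobulinDomain": {"ig_domain_1", "ig_domain_2"},
    "FibronectinDomain": {"fn_domain_1"},
}


def some(instance, prop, filler_class, classes=CLASSES):
    """Existential restriction: at least one value of prop is in filler_class."""
    return any(value in classes[filler_class] for value in instance.get(prop, []))


def only(instance, prop, filler_class, classes=CLASSES):
    """Universal restriction: every value of prop is in filler_class."""
    return all(value in classes[filler_class] for value in instance.get(prop, []))


protein_a = {"hasDomain": ["ig_domain_1", "fn_domain_1"]}
protein_b = {"hasDomain": ["ig_domain_1", "ig_domain_2"]}
protein_c = {}  # no hasDomain values recorded

print(some(protein_a, "hasDomain", "ImmunoglobulinDomain"))  # True
print(only(protein_a, "hasDomain", "ImmunoglobulinDomain"))  # False
print(only(protein_b, "hasDomain", "ImmunoglobulinDomain"))  # True
# With no values at all, "only" holds trivially while "some" fails.
print(only(protein_c, "hasDomain", "ImmunoglobulinDomain"))  # True
print(some(protein_c, "hasDomain", "ImmunoglobulinDomain"))  # False
```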
What are we saying?
• Are all instances of Man instances of Person?
• Can an instance of Person be both a Man
and an instance of Woman?
• Can there be any more kinds of Person?
What are we saying?
• What kinds of class can fill “has chromosome”?
• How many “Y chromosome” are present?
• Does there have to be a “Y chromosome”?
• What properties are sufficient to be a Man and which are necessary?
[Diagram: Man has-chromosome Y chromosome]
Necessity and Sufficiency
• An R2A phosphatase must have a fibronectin domain
• Having a fibronectin domain does not a phosphatase make
• Necessity -- what must a class instance have?
• Any protein that has a phosphatase catalytic domain is a phosphatase
• All phosphatase enzymes have a catalytic domain
• Sufficiency – how is an instance recognised to be a member
of a class?
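The difference between necessity and sufficiency can be sketched as two kinds of check. This is a sketch under assumptions: it reads the sufficiency example as "any protein with a phosphatase catalytic domain is a phosphatase", it represents a protein simply as a set of domain names, and the function names and example proteins are hypothetical.

```python
def is_recognised_as_phosphatase(domains):
    """Sufficient condition: a phosphatase catalytic domain is enough to classify."""
    return "phosphatase catalytic domain" in domains


def could_be_r2a_phosphatase(domains):
    """Necessary condition only: lacking a fibronectin domain rules the class out.

    Returning True does NOT classify the protein; it only means it is not excluded.
    """
    return "fibronectin domain" in domains


protein_x = {"phosphatase catalytic domain", "fibronectin domain"}
protein_y = {"fibronectin domain"}            # necessary condition met, nothing more
protein_z = {"phosphatase catalytic domain"}  # a phosphatase, but not an R2A one

print(is_recognised_as_phosphatase(protein_x))  # True
print(is_recognised_as_phosphatase(protein_y))  # False
print(could_be_r2a_phosphatase(protein_y))      # True (not excluded, not classified)
print(could_be_r2a_phosphatase(protein_z))      # False
```

protein_y shows why "having a fibronectin domain does not a phosphatase make": it passes the necessary check yet is never recognised as a phosphatase.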
Problems Ontologies in
Biology Try To Solve
• Provenance – where did it come from, who did it?
• Reproducibility – can I repeat and find the same results?
• Sharing – can others understand your data?
• Integration – can I readily take multiple (thousands
of) data sets and use them without preparation?
• New knowledge – can we infer new knowledge as
a sum of current knowledge (computationally)?