Knowing what we’re talking about

Invited talk at CSIR, Pretoria, 2013

Published in: Science, Education, Technology

  1. Knowing what we’re talking about. Robert Stevens, Bio-health Informatics Group, School of Computer Science, University of Manchester, Oxford Road, Manchester, United Kingdom, M13 9PL. Robert.Stevens@manchester.ac.uk
  2. We have an item of data • 27 • 27 what? • Units? With what is 27 associated? • Even if I told you, would we interpret what I said in the same way? [Figure label: 27]
  3. • text 27mm
  4. • text tail of 27mm
  5. Mouse tail of 27 mm • … and we can carry on: mouse strain, where it was raised, on what it was fed, times, dates, etc. • All this data is necessary to interpret my original number • Even if that metadata exists, we have to agree on the things the numbers describe [Figure label: mouse tail of 27mm]
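
The step from a bare 27 to "mouse tail of 27 mm" can be sketched as a small record type. This is only an illustration: the `Measurement` class and its field names (including the `quality` field) are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A number plus the metadata needed to interpret it (hypothetical fields)."""
    value: float
    unit: str       # e.g. "mm"
    quality: str    # what was measured, e.g. "length" (assumed, not on the slide)
    entity: str     # the thing measured, e.g. "tail"
    organism: str   # e.g. "mouse"

    def describe(self) -> str:
        return f"{self.organism} {self.entity} {self.quality} of {self.value:g} {self.unit}"


bare = 27  # uninterpretable on its own
annotated = Measurement(27, "mm", "length", "tail", "mouse")
description = annotated.describe()
print(description)
```

Even with these fields filled in, two groups still have to agree on what "tail" and "mouse" refer to, which is the point the rest of the talk develops.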
  6. What is knowledge?
  7. Heterogeneity is rife • We agree on units (more or less)… • We don’t agree on much else when it comes to labels for the entities in our domain • If we don’t know what we’re talking about…. • It’s difficult to interpret and exchange data and the results from data
  8. Categories and Category Labels • GO:0000368 • U2-type nuclear mRNA 5' splice site recognition • spliceosomal E complex formation • spliceosomal E complex biosynthesis • spliceosomal CC complex formation • U2-type nuclear mRNA 5'-splice site recognition
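
One category, many labels: a minimal sketch of resolving any of the labels listed on this slide to the single identifier GO:0000368. The dictionary layout and the helper names `build_index` and `resolve` are assumptions made for illustration, not part of any Gene Ontology tooling.

```python
# Labels copied from the slide; which one is the preferred name is not stated there.
GO_LABELS = {
    "GO:0000368": [
        "U2-type nuclear mRNA 5' splice site recognition",
        "spliceosomal E complex formation",
        "spliceosomal E complex biosynthesis",
        "spliceosomal CC complex formation",
        "U2-type nuclear mRNA 5'-splice site recognition",
    ],
}


def build_index(labels_by_id):
    """Invert {identifier: [labels]} into {lower-cased label: identifier}."""
    index = {}
    for identifier, labels in labels_by_id.items():
        for label in labels:
            index[label.lower()] = identifier
    return index


INDEX = build_index(GO_LABELS)


def resolve(label):
    """Return the identifier for a label, or None if the label is unknown."""
    return INDEX.get(label.strip().lower())


print(resolve("Spliceosomal E complex formation"))
```

Data annotated with the identifier rather than with free text can be compared even when two sources prefer different labels.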
  9. The Ogden Triangle “Roast Beef” Concept [Ogden, Richards, 1923] • Humans require words (or at least symbols) to communicate efficiently. The mapping of words to things is only indirectly possible. We do it by creating concepts that refer to things. • The relation between symbols and things has been described in the form of the meaning triangle:
  10. We need to know what we’re talking about… • … if we don’t, our data are useless • If we are to interpret our data then we need to know what entities they describe • We need to share data and re-use it • We need to find data; compare data; analyse data • We need to know what we know….
  11. Manchester Mercury, January 1st 1754 • List of diseases & casualties this year • Executed 18; Found Dead 34; Frighted 2; Kill'd by falls and other accidents 55; Kill'd themselves 36; Murdered 3; Overlaid 40; Poisoned 1; Scalded 5; Smothered 1; Stabbed 1; Starved 7; Suffocated 5; Aged 1456; Consumption 3915; Convulsion 5977; Dropsy 794; Fevers 2292; Smallpox 774; Teeth 961; Bit by mad dogs 3; Broken Limbs 5; Bruised 5; Burnt 9; Drowned 86; Excessive Drinking 15 • 19276 burials • 15444 christenings • Deaths by centile
  12. A World of Instances • The world (of information) is made up of things and lots of them • Instances, individuals, objects, tokens, particulars. • The Earth is a kind of Planet • Robert Stevens (NE 67 41 58 A) is a Person • All the individual Alpha Haemoglobins in my many Instances of Red Blood Cell • Each cell instance in my Body has copies of some 30,000 Genes • A Word, language, idea, etc. • This Table, those Chairs • Any Thing with “A”, “The”, “That”, etc. before it….
  13. We Put things into Categories • All these instances hang about making our world • Putting these things into categories is a fundamental part of human cognition • Psychologists study this as concept formation • The same instances are put into a category
  14. We have Labels for the Categories and their Instances • We label categories with symbols: Words • “Lion” is a category of big cat with big teeth • Gene, Protein, Cell, Person, Hydrolase Activity, etc. • …and, as we’ve already seen, each category can have many labels and any particular label can refer to more than one category • Semantic Heterogeneity • “A lion” is an instance in that category • Does the category “Lion” exist? • Lions exist, but the category could just be a human way of talking about lions • … we like putting things into categories
  15. A Controlled Vocabulary • A specified set of words and phrases for the categories in which we place instances • Natural language definitions for those words and phrases • A glossary defines, but doesn’t control • The Uniprot keywords define and control • Control is placed upon which labels are used to represent the categories (concepts) we’ve used to describe the instances in the world • …, but there is nothing about how things in these categories are related [Example vocabulary: Biopolymer, DNA, Enzyme, Nucleic acid, mRNA, Polypeptide, snRNA, tRNA]
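
"Control" here can be read as a check that only agreed labels are used. The sketch below applies that reading to the eight-term example vocabulary from this slide; the function name `uncontrolled` and the sample annotations are hypothetical.

```python
# The flat vocabulary from the slide: agreed labels, but no relationships between them.
VOCABULARY = {
    "Biopolymer", "DNA", "Enzyme", "Nucleic acid",
    "mRNA", "Polypeptide", "snRNA", "tRNA",
}


def uncontrolled(labels):
    """Return the labels that are not in the controlled vocabulary, in input order."""
    return [label for label in labels if label not in VOCABULARY]


annotations = ["DNA", "messenger RNA", "tRNA"]  # made-up annotations for illustration
print(uncontrolled(annotations))
```

The check can reject "messenger RNA" in favour of "mRNA", but a flat set like this cannot say that mRNA is a kind of nucleic acid.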
  16. We also like to Relate Things Together • Categories have subcategories • Instances in one category can be related in some way to instances in another • Can relate instances to each other in many different ways • Is-a, part-of, develops-from, etc. axes • We can use these relationships to classify categories • Things in category A are part is • If all instances in category A are also in category B then As are kinds of Bs [Diagram labels: Biopolymer, Nucleic Acid, Polypeptide, Enzyme, DNA, RNA, tRNA, mRNA, snRNA]
  17. Categories and sub-categories [Diagram labels: biopolymer, polypeptide, nucleic acid, enzyme, DNA, RNA]
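
A sub-category hierarchy can be sketched as a child-to-parent map with a transitive is-a test. The tree below is one plausible reading of the diagram labels (enzyme under polypeptide, DNA and RNA under nucleic acid); the exact edges are an assumption, since only the labels survive in the transcript.

```python
# Assumed edges: child category -> parent category.
PARENT = {
    "polypeptide": "biopolymer",
    "nucleic acid": "biopolymer",
    "enzyme": "polypeptide",
    "DNA": "nucleic acid",
    "RNA": "nucleic acid",
}


def ancestors(category):
    """Walk up the hierarchy and return all super-categories, nearest first."""
    chain = []
    while category in PARENT:
        category = PARENT[category]
        chain.append(category)
    return chain


def is_a(sub, sup):
    """True if every instance of `sub` is also an instance of `sup`."""
    return sub == sup or sup in ancestors(sub)


print(ancestors("enzyme"))
print(is_a("RNA", "biopolymer"))
```

The transitive walk is what lets a query for biopolymers also return data annotated only as RNA or enzyme.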
  18. Describing Category Membership • We can make conditions that any instance must fulfil in order to be a member of a particular category • A Phosphatase must have a phosphatase catalytic domain • A Receptor must have a transmembrane domain • A codon has three nucleotide residues • A limb has part that is a joint • A man has a Y chromosome and an X chromosome • A woman has only an X chromosome
  19. Relationships • These conditions are made from a property and a successor relationship • isPartOf, hasPart • isDerivedFrom • DevelopsFrom • isHomologousTo • …and many, many more
  20. A Structured Controlled Vocabulary • Not only can we agree on the labels we give categories • Can also agree on how the instances of categories are related • And agree on the labels we give the relations • Structure aids querying and captures knowledge with greater fidelity [Diagram labels: Biopolymer, Nucleic Acid, Polypeptide, Enzyme, DNA, RNA, tRNA, mRNA, snRNA, Gene, transcribedFrom]
  21. A Stronger Definition • a set of logical axioms designed to account for the intended meaning of a formal vocabulary used to describe a certain (conceptualisation of) reality [described in an information system] [Guarino 1998] • “conceptualisation of” inserted by me • “Logical axioms” means a formal definition of meaning of terms in a formal language • Formal language: something a computer can reason with • Use symbols to make inferences • Symbols represent things and their relationships • Making inferences about things computationally
  22. So what is an ontology? [Diagram, after Chris Welty et al.; labels: Catalog/ID, Thesauri, Terms/glossary, Informal is-a, Formal is-a, Formal instance, Frames (properties), General logical constraints, Value restrictions, Disjointness, Inverse, part-of; examples: Gene Ontology, Mouse Anatomy, EcoCyc, PharmGKB, TAMBIS, Arom]
  23. What does it all mean anyway? • To interpret our data we need to know what it is we’re talking about • We need to decide the things that we’re talking about and agree upon them • We need to agree on how to recognise those entities • We need to know how they are related to one another • Ontologies are a mechanism for describing those entities and their definitions • There’s more to knowledge representation than ontologies…
  24. All this knowledge needs representing • We want this knowledge in a computational form • To make the knowledge available for software (and humans) • To help us develop and manage the (often) complex artefacts • Building ontologies is hard (getting all those relationships in the right place) • The Web Ontology Language (OWL) is a W3C recommendation for ontologies on the Semantic Web and in semantically enabled applications • A knowledge representation language with a strict semantics that is amenable to automated reasoning
  25. Web Ontology Language (OWL) • W3C recommendation for ontologies for the Semantic Web • OWL-DL mapped to a decidable fragment of first order logic • Classes, properties and instances • Boolean operators, plus existential and universal quantification • Rich class expressions used in restriction on properties – hasDomain some (ImmunoGlobulinDomain or FibronectinDomain)
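
The class expression at the end of this slide combines an existential restriction ("some") with a union ("or"). A rough plain-Python reading of it is shown below. It is not OWL and uses no OWL library: it checks a closed list of known domains, whereas an OWL reasoner works under the open-world assumption. The names `ALLOWED_FILLERS`, `has_domain_some` and the two sample proteins are hypothetical.

```python
# Rough reading of: hasDomain some (ImmunoGlobulinDomain or FibronectinDomain)
ALLOWED_FILLERS = {"ImmunoGlobulinDomain", "FibronectinDomain"}  # the union ("or")


def has_domain_some(domain_types, allowed=ALLOWED_FILLERS):
    """Existential check: at least one hasDomain filler belongs to an allowed class."""
    return any(domain in allowed for domain in domain_types)


protein_a = ["FibronectinDomain", "TransmembraneDomain"]  # made-up example
protein_b = ["TransmembraneDomain"]                       # made-up example

print(has_domain_some(protein_a))
print(has_domain_some(protein_b))
```

Note that "some" only requires one matching domain; it places no limit on what other domains the protein may have.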
  26. What are we saying? [Diagram: Woman is-a Person; Man is-a Person] • Are all instances of Man instances of Person? • Can an instance of Person be both a Man and an instance of Woman? • Can there be any more kinds of Person?
  27. What are we saying? • What kinds of class can fill “has chromosome”? • How many “Y chromosome” are present? • Does there have to be a “Y chromosome”? • What properties are sufficient to be a Man and which are simply necessary? [Diagrams: Man has-chromosome Y chromosome; Man has-chromosome Y chromosome (1), X chromosome (1), autosome (44)]
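
The "how many?" question on this slide is about cardinality. A minimal sketch follows, assuming the three numbers in the diagram (1, 1, 44) attach to Y chromosome, X chromosome and autosome respectively and are meant as exact counts. Both points are assumptions, as is the closed-world counting; the names are hypothetical.

```python
from collections import Counter

# Assumed exact cardinalities for the has-chromosome property of Man.
MAN_CHROMOSOMES = {"Y chromosome": 1, "X chromosome": 1, "autosome": 44}


def satisfies(chromosomes, restrictions):
    """True if each restricted kind occurs exactly the required number of times."""
    counts = Counter(chromosomes)
    return all(counts[kind] == n for kind, n in restrictions.items())


typical = ["Y chromosome", "X chromosome"] + ["autosome"] * 44
no_y = ["X chromosome", "X chromosome"] + ["autosome"] * 44

print(satisfies(typical, MAN_CHROMOSOMES))
print(satisfies(no_y, MAN_CHROMOSOMES))
```

Whether the counts should be exact, minimum or maximum is precisely the kind of decision the slide asks the modeller to make explicit.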
  28. OWL represents classes of instances [Diagram labels: A, B, C]
  29. Necessity and Sufficiency • An R2A phosphatase must have a fibronectin domain • Having a fibronectin domain does not a phosphatase make • Necessity: what must a class instance have? • Any protein that has a phosphatase catalytic domain is a phosphatase enzyme • All phosphatase enzymes have a catalytic domain • Sufficiency: how is an instance recognised to be a member of a class?
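
The difference between the two kinds of condition can be sketched with two lookup tables built from the slide's examples: necessary conditions are checked against a claimed class, while sufficient conditions are used to recognise membership. The table layout and the function names `missing_necessary` and `recognise` are assumptions for illustration.

```python
# Necessary: every instance of the class must have these domains.
NECESSARY = {
    "R2A phosphatase": {"fibronectin domain"},
    "phosphatase": {"phosphatase catalytic domain"},
}

# Sufficient: having these domains is enough to be recognised as a member.
SUFFICIENT = {
    "phosphatase": {"phosphatase catalytic domain"},
}


def missing_necessary(claimed_class, domains):
    """Domains the claimed class requires but the protein lacks."""
    return NECESSARY.get(claimed_class, set()) - set(domains)


def recognise(domains):
    """Classes whose sufficient conditions are all met by the given domains."""
    have = set(domains)
    return sorted(cls for cls, needed in SUFFICIENT.items() if needed <= have)


print(recognise({"phosphatase catalytic domain", "transmembrane domain"}))
print(recognise({"fibronectin domain"}))
print(missing_necessary("R2A phosphatase", {"phosphatase catalytic domain"}))
```

A fibronectin domain alone recognises nothing, because it appears only as a necessary condition; the catalytic domain appears in both tables, so it both recognises and constrains.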
  30. Uses of ontologies
  31. Ontologies in software
  32. Problems Ontologies in Biology Try To Solve • Provenance – where did it come from, who did it? • Reproducibility – can I repeat and find results reported? • Sharing – can others understand your data? • Integration – can I readily take multiple (thousands of) data sets and use them without preparation? • New knowledge – can we infer new knowledge as a sum of current knowledge (computationally)?
  33. The rise and rise of ontologies
  34. What are the prospects for ontologies?
