Making Semantics do Some Work


Keynote talk at Practical Semantic Astronomy (SemAst09), Glasgow, 2009

Published in: Science, Technology, Education
  • All of which helps build better ontologies. But can we apply this computational amenability more
    directly to biological knowledge? In this example, which is work by Katy Wolstencroft, we have codified
    community knowledge about protein domains in phosphatases in OWL. We then take unknown protein sequences,
    pass them through InterPro and put them into the instance store, which is basically a database and a reasoner tied together.
    Qualified cardinality!
  • Showing cell by function, cell by histology etc….
  • Tangled on left, arrow moving to picture of untangled ontology
  • Shows the asserted hierarchy in OWLViz for the epinephrine and prolactin secreting cells (primitive classes) as well as Secretory Cell and Endocrine Cell (defined classes).
  • Shows the inferred hierarchy (all parents) for the epinephrine and prolactin secreting cells as well as the defined Secretory Cell and Endocrine Cell. Shows a lot of multiple inheritance inferred by the reasoner.
  • Making Semantics do Some Work

    1. 1. Making Semantics do Some Work Robert Stevens BioHealth Informatics Group School of Computer Science University of Manchester
    2. 2. Introduction • What’s the use of highly axiomatised ontological descriptions? • Two use cases: • Classifying instances based on features: New discoveries; • Building a complex terminology. • Cost and benefit. • Conclusions
    3. 3. Protein Classification • Proteins divided into broad functional classes “Protein Families” • Families sub-divided to give family classifications • Class membership can be determined by “protein features”, such as domains, etc. • Resources exist for feature detection via primary sequence, but not for class membership • Current Limitation of Automated Tools • Needs human knowledge to recognise class membership
    5. 5. Why Classify? • Classification and curation of a genome is the first step in understanding the processes and functions happening in an organism • Classification enables comparative genomic studies - what is already known in other organisms • The similarities and differences between processes and functions in related organisms often provide the greatest insight into the biology • In silico characterisation is the current bottleneck
    6. 6. Phosphatase Classification • Diagnostic phosphatase domains/motifs – sufficient for membership of the protein phosphatase superfamily • Any protein having a phosphatase domain is a member of the phosphatase super-family • Other motifs determine a protein’s place within the family • Usually needs human to recognise that features detected imply class membership • Can these be captured in an ontology?
    7. 7. OWL represents classes of instances A B C
    8. 8. Necessity and Sufficiency • An R2A phosphatase must have a fibronectin domain • Having a fibronectin domain does not a phosphatase make • Necessity -- what must a class instance have? • Any protein that has a phosphatase catalytic domain is a phosphatase enzyme • All phosphatase enzymes have a catalytic domain • Sufficiency – how is an instance recognised to be a member of a class?
    9. 9. Definition of Tyrosine Phosphatase Class: TyrosineReceptorProteinPhosphatase EquivalentTo: Protein That - contains atLeast 1 ProteinTyrosinePhosphataseDomain and - contains 1 TransmembraneDomain
    10. 10. …there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.
    11. 11. Definition for R2A Phosphatase Class: R2A EquivalentTo: Protein That - contains 2 ProteinTyrosinePhosphataseDomain and - contains 1 TransmembraneDomain and - contains 4 FibronectinDomain and - contains 1 ImmunoglobulinDomain and - contains 1 MAMDomain and - contains 1 Cadherin-LikeDomain and - contains only ProteinTyrosinePhosphataseDomain or TransmembraneDomain or FibronectinDomain or ImmunoglobulinDomain or Cadherin-LikeDomain or MAMDomain
    12. 12. Automated Reasoning • An OWL-DL ontology mapped to its DL form as a collection of axioms • An automated reasoner checks for satisfiability – throws out the inconsistent and infers subsumption • Defined classes (where there are necessary and sufficient restrictions) enable a reasoner to infer subclass axioms • Also infer to which class an individual belongs
    13. 13. Incremental Addition of Protein Functional Domains Phosphatase catalytic Cadherin-like Immunoglobulin MAM domain Cellular retinaldehyde Adhesion recognition Transmembrane Fibronectin III Glycosylation
    14. 14. Classification of the Classical Tyrosine Phosphatases
    15. 15. What is the Ontology Telling Us? • Each class of phosphatase defined in terms of domain composition • We know the characteristics by which an individual protein can be recognised to be a member of a particular class of phosphatase • We have this knowledge in a computational form • If we had protein instances described in terms of the ontology, we could classify those individual proteins • A catalogue of phosphatases
    16. 16. Description of an Instance of a Protein Individual: P21592 Types: Protein, hasDomain 2 ProteinTyrosinePhosphataseDomain, hasDomain 1 TransmembraneDomain, hasDomain 4 FibronectinDomain, hasDomain 1 ImmunoglobulinDomain, hasDomain 1 MAMDomain, hasDomain 1 Cadherin-LikeDomain
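The recognition step in slides 9 to 16 can be sketched in plain Python. This is a toy under stated assumptions, not a description logic reasoner and not the instance store used in the talk: the dictionary layout and the function name `classify` are invented for illustration, "contains 1" is read as "exactly 1", and the detected domain counts are treated as the complete description of the protein (a closed-world shortcut that OWL itself does not make).

```python
# Toy sketch only. Hypothetical encoding of the slide definitions:
# class name -> {domain: (minimum count, maximum count or None for no upper bound)}
DEFINITIONS = {
    "TyrosineReceptorProteinPhosphatase": {
        "ProteinTyrosinePhosphataseDomain": (1, None),  # atLeast 1
        "TransmembraneDomain": (1, 1),                  # read as exactly 1
    },
    "R2A": {
        "ProteinTyrosinePhosphataseDomain": (2, 2),
        "TransmembraneDomain": (1, 1),
        "FibronectinDomain": (4, 4),
        "ImmunoglobulinDomain": (1, 1),
        "MAMDomain": (1, 1),
        "Cadherin-LikeDomain": (1, 1),
    },
}


def classify(domain_counts):
    """Return the sorted names of all classes whose conditions are met."""
    matches = []
    for name, conditions in DEFINITIONS.items():
        satisfied = True
        for domain, (low, high) in conditions.items():
            count = domain_counts.get(domain, 0)
            if count < low or (high is not None and count > high):
                satisfied = False
                break
        if satisfied:
            matches.append(name)
    return sorted(matches)


# Domain counts for the individual described on slide 16.
P21592 = {
    "ProteinTyrosinePhosphataseDomain": 2,
    "TransmembraneDomain": 1,
    "FibronectinDomain": 4,
    "ImmunoglobulinDomain": 1,
    "MAMDomain": 1,
    "Cadherin-LikeDomain": 1,
}

print(classify(P21592))
# ['R2A', 'TyrosineReceptorProteinPhosphatase']
```

The individual is recognised as a member of both classes, which mirrors the slide's point that a sufficient condition lets membership be inferred rather than asserted.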
    17. 17. Instance: P21592 TypeOf: Protein That Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and Fact: hasDomain 1 TransmembraneDomain and Fact: hasDomain 4 FibronectinDomain and Fact: hasDomain 1 ImmunoglobulinDomain and Fact: hasDomain 1 MAMDomain and Fact: hasDomain 1 Cadherin-LikeDomain. Matching class definitions (the second is clipped in the slide): Tyrosine Phosphatase: (containsDomain some TransmembraneDomain) and (containsDomain at least 1 ProteinTyrosinePhosphataseDomain); …tase: (… some MAMDomain) and (… some ProteinTyrosineCatalyticDomain or ImmunoglobulinDomain) and (… some FibronectinDomain or FibronectinTypeIIIFoldDomain) and (… exactly 2 ProteinTyrosinePhosphataseDomain)
    19. 19. So Far….. • Human phosphatases have been classified using the system • The ontology classification performed as well as expert classification • The ontology system refined classification - DUSC contains zinc finger domain: characterised and conserved, but not in classification - DUSA contains a disintegrin domain: previously uncharacterised, evolutionarily conserved • A new kind of phosphatase?
    20. 20. Aspergillus fumigatus • Phosphatase complement very different from human: >100 human, <50 A. fumigatus • Whole subfamilies ‘missing’: Different fungi-specific phosphorylation pathways? No requirement for tissue-specific variations? • Novel serine/threonine phosphatase with homeobox: conserved in Aspergillus and closely related species, but not in any other. Again, a new phosphatase?
    21. 21. Generic Technique • Feature detection • Categories defined in terms of those features • Produce catalogue of what you currently know • Highlight cases that don’t match current knowledge
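The four steps of the generic technique can be sketched as a small triage function. The category names, item identifiers and the function name `triage` below are hypothetical, chosen only to illustrate the idea; the sketch assumes each category is defined by a set of required features and flags any item that either matches no category or carries a feature the catalogue does not mention.

```python
def triage(detected, catalogue):
    """Split items into catalogued matches and cases that need human review.

    detected:  item id -> set of detected features
    catalogue: category name -> set of features the category requires
    """
    known_features = set().union(*catalogue.values())
    matched, review = {}, {}
    for item, features in detected.items():
        categories = sorted(
            name for name, required in catalogue.items() if required <= features
        )
        unexplained = features - known_features
        if categories:
            matched[item] = categories
        if unexplained or not categories:
            # Either nothing matched or something is outside current knowledge.
            review[item] = sorted(unexplained)
    return matched, review


# Hypothetical catalogue and detection results, for illustration only.
CATALOGUE = {
    "Phosphatase": {"PhosphataseCatalyticDomain"},
    "ReceptorPhosphatase": {"PhosphataseCatalyticDomain", "TransmembraneDomain"},
}
DETECTED = {
    "protein_a": {"PhosphataseCatalyticDomain", "TransmembraneDomain"},
    "protein_b": {"PhosphataseCatalyticDomain", "ZincFingerDomain"},
    "protein_c": {"ImmunoglobulinDomain"},
}

matched, review = triage(DETECTED, CATALOGUE)
print(matched)
print(review)
```

Here `protein_b` is catalogued as a phosphatase but is also sent for review because of its zinc finger domain, which is the kind of case the talk reports as a possible new phosphatase.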
    22. 22. The Cell type Ontology • Some 880 terms • Describing cell function, lineage, developmental stage, ploidy, secretion, species,… • Not explicitly classified according to anatomy • Uses is-a and developsFrom • Used to describe cell types used in experiments
    23. 23. OBO Cell Type Ontology
    24. 24. Issues with Current CTO • History: A need was seen and a few days were spent “lashing” together an ontology by hand • Contains lots of knowledge • Asserted multiple inheritance: Humans will make slips and it is difficult • Some biological mistakes • All the knowledge is within the “is-a” relationships and implicit in the cell names
    25. 25. CTO Axes of Classification • Histology: What cells look like • Lineage: Whence a given cell develops • Ploidy: How many sets of chromosomes in a cell • Nucleation: How many nuclei • Secretion & accumulation: What chemicals a cell secretes or accumulates • Function: What does the cell do • Location: In anatomy • Species: In what taxa does the cell exist • And some others
    26. 26. Implicit Knowledge • Anatomy: muscle cell; red blood cell • Maturity: immature T-lymphocyte • Cell surface protein: CD45 positive lymphocyte • Size • Shape
    27. 27. Problems • Tangles • Hard to maintain • Difficult to add a new cell • Inflexible queries: What about hormone secreting mesodermal cells? • Information hidden inside term names
    28. 28. A Tangled Ontology of Cars
    29. 29. Describing a Big Blue Ford Car Class: BigBlueFordCar SubClassOf: Car that hasColour some Blue and hasSize some Big and hasManufacturer some Ford
    30. 30. Modules • Choose a primary axis: In this case Vehicle • Other axes are represented in separate modules (Colour, Size (qualities) and manufacturer) • Represent other aspects of classes through restrictions • (Spot the ontological howler in this toy example)
    31. 31. Definition of a Red Car Class: RedCar EquivalentTo: Car that hasColour some Red • Any car that has the colour red is recognised to be a member of the class RedCar • The reasoner works it all out and builds the hierarchy for you
    32. 32. Normalisation • This technique of “pulling” apart tangled ontologies is “normalisation” • Makes for cleaner modelling • Makes for re-usable components • The reasoner builds the taxonomy “completely” • A new car (e.g., a yellow Saab) is described and it just appears in the right place
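A minimal sketch of the normalisation idea in the car slides, assuming each class is reduced to a base class plus a set of (property, value) restrictions. `RedCar` comes from slide 31; `BigCar`, `FordCar`, `SaabCar` and the function name `inferred_parents` are hypothetical additions for illustration. Parents are computed by subset testing, which stands in for the subsumption a real reasoner would infer.

```python
# Defined classes: necessary and sufficient conditions (EquivalentTo).
DEFINED = {
    "RedCar": ("Car", {("hasColour", "Red")}),
    "BigCar": ("Car", {("hasSize", "Big")}),               # hypothetical
    "FordCar": ("Car", {("hasManufacturer", "Ford")}),     # hypothetical
    "SaabCar": ("Car", {("hasManufacturer", "Saab")}),     # hypothetical
}

# Primitive classes: only described (SubClassOf), never placed by hand.
PRIMITIVE = {
    "BigBlueFordCar": (
        "Car",
        {("hasColour", "Blue"), ("hasSize", "Big"), ("hasManufacturer", "Ford")},
    ),
    "YellowSaab": ("Car", {("hasColour", "Yellow"), ("hasManufacturer", "Saab")}),
}


def inferred_parents(description):
    """Return defined classes whose conditions the description satisfies.

    Simplification: base classes must be equal, where a reasoner would
    also accept any subclass of the base.
    """
    base, restrictions = description
    return sorted(
        name
        for name, (defined_base, required) in DEFINED.items()
        if defined_base == base and required <= restrictions
    )


for name, description in PRIMITIVE.items():
    print(name, "->", inferred_parents(description))
# BigBlueFordCar -> ['BigCar', 'FordCar']
# YellowSaab -> ['SaabCar']
```

Adding a new car only requires a new entry in `PRIMITIVE`; its position in the hierarchy is computed, which is the maintenance benefit the slide claims for normalisation.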
    33. 33. What We Did • Examined CTO • Chose primary axis of classification • All other axes added as restrictions on class membership • Describe cells • Build ontology • Use reasoner
    34. 34. Ontologies Used CTO Ontology PATO Ontology GO Biological Process GO Cellular Component Species Taxonomy Anatomy Nucleation Morphology Size Ploidy Muscle Contraction Secretion Bacillus anthracis str. Ames Chloroplast Cell Membrane Epithelium Kidney
    35. 35. Mammalian Red Blood Cell Class: RedBloodCell SubclassOf: Cell That hasNucleation some Anucleate and participatesIn some OxygenTransport and existsIn some mammalia and part_of some BloodTissue and developsFrom some Reticulocyte
    36. 36. Mesodermal Lineage Cells Class: MesodermalLineageCell EquivalentTo: Cell That developsFrom some MesodermalCell (developsFrom is transitive)
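Because developsFrom is transitive, a cell belongs to MesodermalLineageCell if any chain of developsFrom links reaches a mesodermal cell. The sketch below illustrates this with a plain graph walk. Only the RedBloodCell to Reticulocyte link comes from the slides; the other links, the dictionary layout and the function names are assumptions made for illustration and are not taken from the CTO.

```python
# Direct developsFrom assertions: cell -> cells it develops from.
# Illustrative chain only; not a statement of the real CTO content.
DEVELOPS_FROM = {
    "RedBloodCell": {"Reticulocyte"},
    "Reticulocyte": {"Erythroblast"},
    "Erythroblast": {"MesodermalCell"},
    "Neuron": {"Neuroblast"},
}


def ancestors(cell, develops_from):
    """Return every cell reachable from `cell` through developsFrom links."""
    seen = set()
    stack = list(develops_from.get(cell, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(develops_from.get(current, ()))
    return seen


def mesodermal_lineage_cells(develops_from):
    """Return cells that develop, directly or indirectly, from MesodermalCell."""
    return sorted(
        cell
        for cell in develops_from
        if "MesodermalCell" in ancestors(cell, develops_from)
    )


print(mesodermal_lineage_cells(DEVELOPS_FROM))
# ['Erythroblast', 'RedBloodCell', 'Reticulocyte']
```

Only the direct links are asserted; the lineage query is answered from the transitive closure, so a newly added cell needs one developsFrom statement to be picked up.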
    37. 37. Spreadsheet
    38. 38. Workflow: Spreadsheet → CSV → OPPL → OWL Ontology → Reasoned Ontology
    39. 39. Secreting Cells Class: EpinephrinSecretingCell SubclassOf: Cell That belongs_to_line some Somatic and has_nucleation some mononucleate and has_ploidy some diploid and potentiality some TerminallyDifferentiated and participates_in some EpinephrineBiosyntheticProcess and participates_in some EpinephrineSecretion Class: ProlactinSecretingCell SubclassOf: Cell That belongs_to_line some Somatic and has_nucleation some mononucleate and has_ploidy some diploid and potentiality some TerminallyDifferentiated and participates_in some PeptideHormoneSecretion and participates_in some ProlactinSecretion
    40. 40. Defined Cells Class: SecretoryCell EquivalentTo: Cell that participates_in some (secretion or (part_of some secretion)) Class: EndocrineCell EquivalentTo: Cell that participates_in some (EndocrineProcess or (part_of some EndocrineProcess))
    41. 41. Asserted Hierarchy
    42. 42. Inferred Hierarchy
    43. 43. What We Found • More subsumption relationships • The “is-a” hierarchy is complete • Explicitness made us ask questions • Found bad structure • Can just slip in a new cell • Can make arbitrary queries based on any of the types of axis
    44. 44. Conclusions • Can use strict semantics and automated reasoning to build structurally sound ontologies • Can catalogue instances and make discoveries • If an object can be recognised by its features, and those features can be computationally generated, classification can be automated • High cost and high benefit
    45. 45. Acknowledgements • Katy Wolstencroft did the protein phosphatase work as part of her Ph.D. • The work on the cell type ontology was undertaken by members of the EPSRC OntoGenesis Network • All the ontology work at Manchester relies on the support and input of the wider BioHealth and Information Management Groups