The Past, Present and Future of Knowledge in Biology

Keynote talk at SMBM 2010

Published in: Science, Education, Technology

  • Slide Title: Literature
    Lots of books in a library
  • Slide Title: Catalogues
    Stack of books listing:
    Genome
    Transcriptome
    Proteome
    Interactome
    Metabolome
    Phenome
  • Slide Title
    Slide contains:
    Book on the left with a plus sign
    Black and white image, man sat at an old valve-style computer (i.e. the Manchester Baby)
    Text saying: genes, proteins, interactions, pathways
    Mouse on the right
    Text below images says:
    (left) Literature
    (middle) complex machines
    (right) Organism
    (bottom) “…. from biological facts, we make a system that is some model of a real thing” - Robert Stevens – 2008
  • All of which helps build better ontologies. But can we actually apply this computational amenability more
    directly to biological knowledge? In this example, which is work by Katy Wolstencroft, we have codified
    community knowledge about protein domains in phosphatases in OWL. We then take unknown protein sequences,
    pass them through InterPro and stick them into the instance store, which is basically a database and a reasoner tied together.
    Qualified cardinality!!!
  • The Past, Present and Future of Knowledge in Biology

    1. 1. The Past, Present and Future of Knowledge in Biology Robert Stevens BioHealth Informatics Group The University of Manchester Manchester United Kingdom Robert.Stevens@manchester.ac.uk
    2. 2. Overview • A look at the state of play • For what are we using ontologies? • What do we count as knowledge? • Doing so much more with knowledge • Stopping text being a dead end
    3. 3. Text and Ontologies: The Terrible Twins of Knowledge in Biology Robert Stevens BioHealth Informatics Group The University of Manchester Manchester United Kingdom Robert.Stevens@manchester.ac.uk
    4. 4. Biology now has lots of facts
    5. 5. Genome Proteome Transcriptome Interactome Metabolome PHENOME Lots of catalogues
    6. 6. Data are only as Good as their Metadata • There is a lot of biology out there… • How these entities are described in our data varies • We don’t even agree on what entities there are to describe in our data • This makes analysing data hard: You have to know what your data represent • …, but also how the entities described in your data relate to each other • We need to describe our data – their metadata
    7. 7. Creating Woods, not Trees Genes Proteins Pathways Interactions Literature Complex Machines Virtual Organism …. from biological facts, we make a system that is some model of a real organism
    8. 8. Timeline
    9. 9. There’s a Lot of it About Searching for “ontology” in five year chunks on the ACM digital portal Searching for “ontology” in five year chunks on PubMed
    10. 10. It’s all Gruber’s Fault • “In the context of knowledge sharing, the term ontology means a specification of a conceptualisation. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set-of-concept-definitions, but more general. And it is certainly a different sense of the word than its use in philosophy.” DOI:10.1006/knac.1993.1008 DOI:10.1006/ijhc.1995.1081
    11. 11. Angels on the head of a pin
    12. 12. Everything with a Blob and Line is called an Ontology • Wide acceptance criteria • Narrow evaluation criteria • Different sort of knowledge for different situations • Different styles of representation; some scruffy and some formal • Representing knowledge in biology is more than ontologies • We could stop calling them ontologies RDF graph Database schema Thesaurus OWL Ontology Formal ontology SKOS vocabulary
    13. 13. Uses of Ontologies
    14. 14. Knowing What We’ve got is so Useful • We could computationally handle lots of data, but we couldn’t do so with what we know about those data • Ontologies so far mainly used for a common tongue so that we can compare • … and it works! • Still getting lots of mileage from ontology annotation • …, but there is so much more
    15. 15. GENERIC GENE ONTOLOGY (GO) TERM FINDER S000003093 MXR1 YPL250C S000004294 SAM3 YIR017C S000003152 MMP1 MET1 Expressed Genes P-value score http://go.princeton.edu/cgi-bin/GOTermFinder
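
A note on the "P-value score" label: term finders of this kind commonly score a GO term by asking how surprising it is to see so many annotated genes in the expressed list, using a hypergeometric tail probability. The slide does not say how its scores were computed, so the following is only a minimal sketch under that assumption, with made-up numbers and no multiple-testing correction.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(sample, annotated)
    total = comb(population, sample)
    # math.comb returns 0 when k exceeds n, so impossible draws add nothing.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Illustrative numbers only (not taken from the slide): 6000 genes in the
# genome, 40 annotated to the term, 20 expressed genes, 5 of them annotated.
p = hypergeom_pvalue(population=6000, annotated=40, sample=20, hits=5)
print(f"p = {p:.3g}")
```

A small p suggests the term is over-represented in the expressed list relative to the genome as a whole.
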
    16. 16. Classifying a Mouse Individual Description: Stops wriggling after 3 sec Has 3 cm tail Mass 10g 10 days old (since birth) Strain C57Bl/6 Class Description: Class:DepressedMouse EquivalentTo:Mouse that (wriggles for <=30 OR swims for <=45) Data Transformation
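
The step from an individual's description to class membership can be sketched in Python. This is not how an OWL reasoner works internally (a reasoner uses open-world semantics, while this check treats a missing measurement as "no evidence"), and the field names and the assumption that both thresholds are in seconds are illustrative choices, not taken from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MouseRecord:
    # Hypothetical field names; the slide only gives free-text observations.
    wriggle_seconds: Optional[float] = None
    swim_seconds: Optional[float] = None
    strain: str = ""


def is_depressed_mouse(mouse):
    """Mirror of: Mouse that (wriggles for <=30 OR swims for <=45)."""
    wriggles_briefly = mouse.wriggle_seconds is not None and mouse.wriggle_seconds <= 30
    swims_briefly = mouse.swim_seconds is not None and mouse.swim_seconds <= 45
    return wriggles_briefly or swims_briefly


# The individual from the slide: stops wriggling after 3 sec.
individual = MouseRecord(wriggle_seconds=3, strain="C57Bl/6")
print(is_depressed_mouse(individual))
```
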
    17. 17. Short tailed mouse Class:ShortTailedMouse EquivalentTo:Mouse that hasPart EXACTLY 1 (Tail that hasAssay SOME (LengthAssay that hasValue SOME int[<= 20] and hasUnit SOME Millimetre)) SubClassOf: Mouse that hasPart SOME (Tail that hasQuality SOME Short) • We can recognise an instance of short-tailed mouse, but we also know that it has the quality “short” • Even when the fact isn’t asserted
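
The two halves of this definition do different jobs: the EquivalentTo part recognises a short tail from a measurement, and the SubClassOf part then supplies the quality "short" even though nobody asserted it. A rough Python sketch of that idea follows; the unit table and function name are assumptions for illustration, and the 20 mm threshold comes from the class definition above.

```python
# Conversion factors to millimetres; only the units needed for the example.
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0}


def inferred_tail_qualities(length, unit):
    """Return the qualities implied by a tail length measurement."""
    length_mm = length * UNIT_TO_MM[unit]
    qualities = set()
    if length_mm <= 20:
        # Recognised as a short tail, so the quality follows without
        # ever having been recorded against the individual.
        qualities.add("Short")
    return qualities


print(inferred_tail_qualities(15, "mm"))  # a tail within the threshold
print(inferred_tail_qualities(3, "cm"))   # the 3 cm tail from the previous slide
```

Note the data transformation: the 3 cm tail recorded for the individual has to be converted to millimetres before it can be compared with the class definition, and at 30 mm it does not qualify as short.
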
    18. 18. Classifying Proteins >uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa). MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV……….. InterPro Instance Store Reasoner Translate Codify
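
The pipeline on this slide (codify class definitions, translate a sequence's domain hits into an individual, let a reasoner classify it) can be imitated very loosely in Python. The class names, domain names and counts below are hypothetical placeholders, not the real phosphatase ontology or InterPro identifiers, and a dictionary lookup stands in for the reasoner. The (min, max) pairs play the role of qualified cardinality restrictions.

```python
from collections import Counter

# Hypothetical definitions: class name -> {domain: (min count, max count)}.
CLASS_DEFINITIONS = {
    "ReceptorTyrosinePhosphatase": {
        "PTP_catalytic": (1, 2),
        "transmembrane": (1, 1),
    },
    "ReceptorTyrosinePhosphataseWithMAM": {
        "PTP_catalytic": (2, 2),
        "transmembrane": (1, 1),
        "MAM": (1, 1),
    },
}


def classify(domain_hits):
    """Return every class whose domain-count constraints the protein meets."""
    counts = Counter(domain_hits)
    matches = []
    for class_name, definition in CLASS_DEFINITIONS.items():
        if all(low <= counts[d] <= high for d, (low, high) in definition.items()):
            matches.append(class_name)
    return matches


# Domain hits as they might come back from a domain scan (made up here).
hits = ["MAM", "fibronectin_III", "transmembrane", "PTP_catalytic", "PTP_catalytic"]
print(classify(hits))
```

Unlike this sketch, a real reasoner also works out the subclass relationships between the definitions themselves, which is where much of the value lies.
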
    19. 19. OWL’s Automated Reasoners • Demonstrably useful in: – Building ontologies – Querying ontologies – Can automatically annotate – Have made “discoveries” But there is more than OWL’s reasoning
    20. 20. Separation of Knowledge and Software • We realised a long time ago that we needed to separate knowledge from software • We only recently called this knowledge component ontology • We don’t really need to see the ontology • We certainly shouldn’t show people OWL; it “scares the horses” • Ontology for software not humans (L. Hunter)
    21. 21. The Ontology Cottage Industry • We’ve industrialised data production • We’ve (to some extent) industrialised data analysis • We’ve not really moved away from hand-crafted, “whittled” ontologies
    22. 22. Can we have Mass Editing of Ontologies? • Probably not; • Computer scientists in love with synchronous editing • …, but not really necessary (see CSCW) • Mass gathering of Knowledge
    23. 23. Mass Gathering of Knowledge and the Application of Patterns or a metamodel http://rightfield.org.uk http://www.e-lico.eu/populous
    24. 24. There’s so much more to Ontology Building than editing Axioms • Gathering knowledge • Adding labels • Adding other human orientated content • Reviewing, checking, suggesting • Deploying, using, creating “views” • Ontology comprehension
    25. 25. There’s More to KR than OWL • OWL and its automated reasoners are useful • But there is so much more to KR than ontologies and OWL • Higher order reasoning • Rules • Other sorts of reasoning
    26. 26. Generating natural language Class: HeLa SubClassOf: Cell, bearer_of some 'cervical carcinoma', derives_from some 'Homo sapiens', derives_from some cervix, derives_from some 'epithelial cell' OWL HeLa is a cell line. A hela is all of the following: something that is bearer of a cervical carcinoma, something that derives from a homo sapiens, something that derives from an epithelial cell, and something that derives from a cervix. Generated natural language Experimental Factor Ontology (EFO) http://www.ebi.ac.uk/efo
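
To give a feel for how an axiom can be turned into a sentence like the one on this slide, here is a template-based sketch in Python. It is not the SWAT verbaliser or any real tool; the phrase table, the article rule and the function names are all assumptions made for illustration.

```python
# Hypothetical mapping from property names to verb phrases.
PROPERTY_PHRASES = {"bearer_of": "is bearer of", "derives_from": "derives from"}


def article(noun):
    """Crude a/an choice based on the first letter only."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def verbalise(name, parent, restrictions):
    """Render a class with existential restrictions as English text."""
    first = f"{name} is {article(parent)} {parent}."
    clauses = [
        f"something that {PROPERTY_PHRASES.get(prop, prop.replace('_', ' '))} "
        f"{article(filler)} {filler}"
        for prop, filler in restrictions
    ]
    if not clauses:
        return first
    if len(clauses) == 1:
        body = clauses[0]
    else:
        body = ", ".join(clauses[:-1]) + ", and " + clauses[-1]
    return f"{first} {article(name).capitalize()} {name} is all of the following: {body}."


print(verbalise("HeLa", "cell line", [
    ("bearer_of", "cervical carcinoma"),
    ("derives_from", "Homo sapiens"),
    ("derives_from", "epithelial cell"),
]))
```

Because the text is generated from the axioms, it can be regenerated whenever the ontology changes, which is what makes the "ontology as book" view feasible.
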
    27. 27. Ontology as book Title: Experimental Factor Ontology Table of Contents Chapter 1. Cell line Chapter 2. Cell type Chapter 3. Chemical Compound Chapter 4. Organism HeLa is a cell line. A hela is all of the following: something that is bearer of a cervical carcinoma, something that derives from a homo sapiens, something that derives from an epithelial cell, and something that derives from a cervix. entry
    28. 28. Types of Knowledge Data Biologist’s head Papers Databases Ontologies ???
    29. 29. It’s not Just “Things” • Experiments produce data about things • Proteins, genes, chemicals, reactions, diseases, size, shape, speed, …. • As well as this knowledge we have knowledge of how it was done • OBI is still the “things” to do with production • We still need the methods by which these “things” were deployed • The protocol
    30. 30. Knowledge about an experiment Workflow Run Workflow Provenance Organisational Results and Interpretation
    31. 31. Workflows are knowledge about methods Get genes in region Get pathways that contain genes Merge data into single files Get gene descriptions Get pathway descriptions Cross-reference ids Methods: 1. A QTL (region of chromosome) is entered into the workflow, specified as base pairs. These base pairs are subsequently used to identify, in the Ensembl database, any genes that lie within this region. 2. Any genes found within this region are subsequently annotated with Entrez and UniProt identifiers. 3. The Entrez and UniProt identifiers are then passed to a KEGG id conversion Web Service, to cross-reference the input ids to KEGG gene identifiers. This enables gene descriptions and biological pathway data to be returned from KEGG. 4. Each KEGG gene id is then used in a search for KEGG pathways. Any pathways found to contain the gene are returned as KEGG pathway ids. 5. Both KEGG gene and pathway ids are then sent to individual services, provided by KEGG, which provide a description of the gene and pathway. 6. The outputs of the workflow are then combined into single flat files, which can be saved locally and used to identify novel pathways and genes within the QTL region.
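
The methods text above maps directly onto a chain of functions, which is the point of the slide: the workflow is itself a record of the method. The sketch below follows the same steps in Python but calls no real service; the small dictionaries are toy stand-ins for the Ensembl, Entrez/UniProt and KEGG lookups, and every identifier in them is invented.

```python
# Toy stand-ins for the remote services; all identifiers are made up.
GENES = [("geneA", 100, 200), ("geneB", 500, 900), ("geneC", 5000, 6000)]
EXTERNAL_ID = {"geneA": "ext:1", "geneB": "ext:2", "geneC": "ext:3"}
KEGG_GENE_ID = {"ext:1": "kegg:g1", "ext:2": "kegg:g2"}
KEGG_PATHWAYS = {"kegg:g1": ["path:p1"], "kegg:g2": ["path:p1", "path:p2"]}
DESCRIPTIONS = {
    "kegg:g1": "gene one", "kegg:g2": "gene two",
    "path:p1": "pathway one", "path:p2": "pathway two",
}


def genes_in_region(start, end):
    """Step 1: genes lying wholly within the region, given in base pairs."""
    return [name for name, g_start, g_end in GENES if g_start >= start and g_end <= end]


def run_workflow(start, end):
    """Steps 2 to 6: annotate, cross-reference, find pathways, describe, merge."""
    rows = []
    for gene in genes_in_region(start, end):
        external = EXTERNAL_ID.get(gene)
        kegg_id = KEGG_GENE_ID.get(external)
        pathways = KEGG_PATHWAYS.get(kegg_id, [])
        rows.append({
            "gene": gene,
            "kegg_id": kegg_id,
            "gene_description": DESCRIPTIONS.get(kegg_id),
            "pathways": pathways,
            "pathway_descriptions": [DESCRIPTIONS[p] for p in pathways],
        })
    return rows


for row in run_workflow(0, 1000):
    print(row)
```
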
    32. 32. myExperiment http://www.myexperiment.org
    33. 33. Research Objects Method Data Introduction Conclusions Results Human Written Workflow Generated Text Semantically annotated
    34. 34. Model, View, Controller Annotated Data Controller Projection Text Tables Graphs Steve Pettifer http://utopia.cs.man.ac.uk/
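
The idea here is that text, tables and graphs are all projections of the same annotated data rather than separate artefacts. A very small Python sketch of that arrangement follows; it is unrelated to how Utopia is actually implemented, and the records, field names and function names are invented for illustration.

```python
# The model: one set of annotated records (made-up values).
ANNOTATED_DATA = [
    {"gene": "geneA", "term": "termX", "score": 0.9},
    {"gene": "geneB", "term": "termY", "score": 0.4},
]


def as_text(rows):
    """View 1: a prose projection of the records."""
    return " ".join(f"{r['gene']} is annotated with {r['term']}." for r in rows)


def as_table(rows):
    """View 2: a tab-separated projection of the same records."""
    lines = ["gene\tterm\tscore"]
    lines += [f"{r['gene']}\t{r['term']}\t{r['score']}" for r in rows]
    return "\n".join(lines)


PROJECTIONS = {"text": as_text, "table": as_table}


def controller(view, rows):
    """Pick a projection by name; the data itself never changes."""
    return PROJECTIONS[view](rows)


print(controller("text", ANNOTATED_DATA))
print(controller("table", ANNOTATED_DATA))
```
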
    35. 35. What Next? • Ontologies are not the only fruit • We could stop calling them ontologies • We need to produce “ontologies” faster • We need to do more interesting things with our knowledge • We need to make them pervade our tools • We need them to be “agile” • Open to other forms of KR and other forms of reasoning • Adding to data automatically • Generating our descriptions of data
    36. 36. Acknowledgements • Simon Jupp for the slides • Alan Rector and Carole Goble • SysMO-DB for RightField (Katy Wolstencroft, Stuart Owen, Matt Horridge) • Populous – Simon Jupp • SWAT – Richard Power, Sandra Williams and Allan Third at the OU • EFO – James Malone and Helen Parkinson • Steve Pettifer for the Utopia and MVC • Paul Fisher and the Taverna team • The myExperiment team at Southampton and Manchester
