
Mungall keynote-biocurator-2017

Slides from biocuration keynote presentation

Published in: Science

  1. 1. Chris Mungall Biocuration, Stanford, 2017 2017: AN ONTOLOGY BIOCURATION ODYSSEY chrismungall
  2. 2. Outline  My path towards biocuration  Ontologies past and future  Some final thoughts on biocuration
  3. 3. Edinburgh, Scotland
  4. 4. Which path to AI? (circa 1990s) Artificial Intelligence: Narrow AI, Broad AI. Knowledge-Based: logic, encoding, ‘knowing that’, cognitively inspired. Knowledge-Free: statistics, learning, ‘knowing how’, biologically inspired.
  5. 5. - All cats are mammals - All dogs are mammals
  6. 6. - All cats are mammals - All dogs are mammals - Mammals have fur - Dogs like balls - Fido is a dog
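The toy knowledge base on this slide can be written out as a few lookup tables. The sketch below is only an illustration of deductive inheritance; the dictionary layout and the function names `classes_of` and `properties_of` are assumptions, not something shown in the talk.

```python
# Toy knowledge base taken from the slide; the data layout is an assumption.
SUBCLASS_OF = {"cat": "mammal", "dog": "mammal"}               # "All cats/dogs are mammals"
PROPERTIES = {"mammal": {"has fur"}, "dog": {"likes balls"}}   # "Mammals have fur", "Dogs like balls"
INSTANCE_OF = {"Fido": "dog"}                                  # "Fido is a dog"


def classes_of(individual):
    """Return the asserted class of an individual followed by its superclasses."""
    chain = []
    cls = INSTANCE_OF.get(individual)
    while cls is not None:
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return chain


def properties_of(individual):
    """Collect every property inherited along the class chain."""
    props = set()
    for cls in classes_of(individual):
        props |= PROPERTIES.get(cls, set())
    return props


print(classes_of("Fido"))             # ['dog', 'mammal']
print(sorted(properties_of("Fido")))  # ['has fur', 'likes balls']
```

An individual the knowledge base has never heard of yields an empty answer, which is the brittleness the following slides poke fun at.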
  7. 7. ???
  8. 8. Answer: CAT DOES NOT COMPUTE
  9. 9. From sequence to genome annotation • Analysis pipeline • Curation tools • Annotation database
  10. 10. Chado: generalized community tools • Analysis pipeline • Curation tools • Annotation database. Mungall, C. J., Emmert, D. B., & FlyBase Consortium (2007). A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics, 23(13), i337-346. http://doi.org/10.1093/bioinformatics/btm189
  11. 11. • Analysis pipeline • Curation tools • Annotation database • Functional annotation Genomes to function annotation? What does it do?
  12. 12. Gene Ontology: tool for the unification of biology (2000)  Organize generalized biological knowledge as a graph  Attach genes to nodes  Propagate across species  Create gene lists  Interpret high throughput data Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., … Sherlock, G. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1), 25–29. http://doi.org/10.1038/75556
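The first steps on this slide (a graph of terms, genes attached to nodes, gene lists per term) can be sketched as follows. This is a minimal illustration, assuming a toy graph with made-up term labels and hypothetical gene names (`geneA`, `geneB`, `geneC`); it is not GO data and not the GO Consortium's software, and it does not cover propagation across species.

```python
from collections import defaultdict

# Toy graph: term -> set of parent terms (illustrative labels, not real GO IDs).
PARENTS = {
    "Notch signaling pathway": {"signaling"},
    "signaling": {"biological process"},
    "visual perception": {"biological process"},
    "biological process": set(),
}

# Direct annotations: gene -> terms it is attached to (hypothetical gene names).
ANNOTATIONS = {
    "geneA": {"Notch signaling pathway"},
    "geneB": {"visual perception"},
    "geneC": {"signaling"},
}


def ancestors(term):
    """Return the term itself plus every term reachable through parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def gene_lists():
    """Map each term to the genes annotated to it directly or to a descendant."""
    result = defaultdict(set)
    for gene, terms in ANNOTATIONS.items():
        for term in terms:
            for ancestor in ancestors(term):
                result[ancestor].add(gene)
    return dict(result)
```

With this closure, asking for the genes under "signaling" returns both the gene annotated there directly and the gene annotated to the more specific Notch term.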
  13. 13. Ontologies as force amplifiers for data [diagram labels: domain knowledge, data, biocuration, experimen]
  14. 14. Don’t worship the monolith PROBLEM: GO and other ontologies were becoming monolithic - lots of implicit overlap with other ontologies, latent structure
  15. 15. Open Biological Ontologies (OBO) http://obofoundry.org @obofoundry 1. Well-integrated modular ontologies 2. Provide technical and sociotechnological framework for cooperation 3. Provide tools, best practices and infrastructure for forging new ontologies 4. Allow us to curate all of the things
  16. 16. OBO Library PURLs  PURL: Persistent URL  Consistent, predictable, stable and versioned URLs for ontology objects  Can be shortened as compact URIs (CURIEs), e.g. GO:0008150  Can be registered and viewed on OBO site  http://obofoundry.org  Ontology purls  Main ontology, subsets  versionIRIs  Ontology term purls
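The relationship between a compact URI such as GO:0008150 and its term PURL can be sketched in a few lines. The base URL `http://purl.obolibrary.org/obo/` and the `PREFIX_LOCALID` pattern are not stated on the slide; they are the commonly used OBO Library convention and are assumed here. The function names are hypothetical.

```python
# Assumed OBO Library base for term PURLs (not given on the slide).
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie):
    """Expand a CURIE like 'GO:0008150' into an OBO-style term PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def purl_to_curie(purl):
    """Contract an OBO-style term PURL back into a CURIE."""
    if not purl.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO PURL: {purl!r}")
    fragment = purl[len(OBO_PURL_BASE):]
    prefix, sep, local_id = fragment.rpartition("_")
    if not sep or not prefix or not local_id:
        raise ValueError(f"cannot split into prefix and local ID: {purl!r}")
    return f"{prefix}:{local_id}"


print(curie_to_purl("GO:0008150"))  # http://purl.obolibrary.org/obo/GO_0008150
```

Splitting on the last underscore when contracting keeps prefixes that themselves contain an underscore intact.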
  17. 17. One ontology to bind them: the Relation Ontology (RO) http://obofoundry.org/ontology/ro.html [Diagram terms: compound eye, ommatidium, sense organ, eye disc, outer photoreceptor cell, lamina monopolar neuron L3, detection of light stimulus involved in visual perception (GO). Relations: is_a, part_of, develops from, capable of, synapsed by]
  18. 18. Contributions to and uses of RO (>500 relations). Neurocellular (virtualflybrain.org; Osumi-Sutherland, D. (2012). doi:10.1093/bioinformatics/bts113): has soma location, has synaptic terminal in, upstream in neural circuit with, … Biotic interaction (globalbioticinteractions.org): eats, epiphyte of, parasite of, kleptoparasitizes, hyperparasitizes. Gene, drug, phenotype: is model of, has phenotype, molecularly controls, allosteric inhibitor of, causes or contributes to condition, ... Contributors: David Osumi-Sutherland, Anne Thessen, Matt Brush, Greg Stupp
  19. 19. What happens when the pieces don’t fit together?
  20. 20. Making the pieces fit together: GO and CHEBI Hill, D. P., Adams, N., Bada, M., Batchelor, C., Berardini, T. Z., Dietze, H., … Lomax, J. (2013). Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics, 14(1), 513. http://doi.org/10.1186/1471-2164-14-513 GO CHEBI • Some relationships didn’t make sense • E.g. nucleotide isa carbohydrate • Acids  conjugate bases Harold Drabkin David Hill Jane Lomax Tanya Berardini Janna Hastings
  21. 21. Making the pieces fit together: GO and CHEBI Hill, D. P., Adams, N., Bada, M., Batchelor, C., Berardini, T. Z., Dietze, H., … Lomax, J. (2013). Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics, 14(1), 513. http://doi.org/10.1186/1471-2164-14-513 GO CHEBI • Fixed many is-as • E.g. nucleotide isa carbohydrate • Acids  conjugate bases + OWL reasoning Harold Drabkin David Hill Jane Lomax Tanya Berardini Janna Hastings GO CHEBI + Design Patterns
  22. 22. Challenges of multi-species anatomy and phenotypes [Figure: the lung as represented in Mammalian Phenotype, Mouse Anatomy, FMA and human development ontologies. Terms shown: lung, lobular organ, parenchymatous organ, solid organ, pleural sac, thoracic cavity organ, thoracic cavity, organ system, respiratory system, lower respiratory tract, lung alveolus, alveolar sac, pulmonary acinus, lung bud, respiratory primordium, pharyngeal region, abnormal respiratory system morphology, abnormal lung morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology. Relations: develops_from, part_of, is_a (SubClassOf), surrounded_by]
  23. 23. The perils of mappings
      Class A | Class B | Mapped? | Useful?
      FMA: extensor retinaculum of wrist | MouseAnatomy: retina | Yes | No
      Plant Ontology: Pith | MouseAnatomy: medulla | Yes | No
      Fly Anat: femur | MouseAnatomy: femur | Yes | No*
      ZfishAnat: hypophysis | MouseAnatomy: pituitary | No | Yes
      TAO: fossa | AdverseReactions: depression | Yes | No
      FMA: colon | GAZ: Colón, Panama | Yes | No
      Quality: male | Chebi: maleate 2(-) | Yes | No
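Several of the pairs on this slide look like what a purely lexical matcher would produce. The sketch below is a hypothetical illustration of such a matcher, assuming a simple rule (accent-stripped, lowercased tokens of four or more letters, matched by prefix); it is not the method that generated the mappings on the slide.

```python
import re
import unicodedata


def tokens(label):
    """Lowercase a label, strip accents, and keep alphabetic tokens of 4+ letters."""
    decomposed = unicodedata.normalize("NFKD", label)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [t for t in re.findall(r"[a-z]+", unaccented.lower()) if len(t) >= 4]


def naive_match(label_a, label_b):
    """Report a 'mapping' when any token of one label is a prefix of a token of the other."""
    return any(
        a.startswith(b) or b.startswith(a)
        for a in tokens(label_a)
        for b in tokens(label_b)
    )


# Label pairs taken from the slide.
PAIRS = [
    ("extensor retinaculum of wrist", "retina"),
    ("femur", "femur"),
    ("hypophysis", "pituitary"),
    ("colon", "Colón, Panama"),
    ("male", "maleate 2(-)"),
]

for label_a, label_b in PAIRS:
    print(f"{label_a!r} vs {label_b!r}: {naive_match(label_a, label_b)}")
```

Under this rule the retinaculum/retina, colon/Colón and male/maleate pairs all match even though they are unrelated, while hypophysis/pituitary, which name the same structure, do not.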
  24. 24. http://uberon.org • Initial Phase • Bottom-up • Create groupings of terms • Light curation • Next Phase • Top down • 14k classes • Design Patterns • Periodic alignment and feeding back to curators Uberon
  25. 25. http://uberon.org
  26. 26. Uberon for gene expression curation http://bgee.org/
  27. 27. Uberon for gene expression curation http://bgee.org/
  28. 28. dinosaurs, sponges, comb jellies and cephalopods, oh my Thacker, R. W., (2014). The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics, 5(1), 39. http://doi.org/10.1186/2041-1480-5-39 Graphic courtesy Nizar Ibrahim, Paul Sereno, et al. Phenotype RCN Wasila Dahdul Bob Thacker obofoundry.org/ ontology/ceph.html obofoundry.org/ ontology/cteno.html
  29. 29. Phenotype and Disease Ontologies  Problem: Many ontologies, vocabularies and condition/phenotype lists:  HP, MP, WBPhenotype, FBcv, TO, VT, FYPO, APO, SNOMED  OMIM, Orphanet, DO, NCIT, MESH, ICD, UMLS, MEDGEN …  ZFIN, Phenoscape: EQ Köhler, S. (2013). F1000Research, 1–12. http://doi.org/10.3410/f1000research.2- Standardized Design Patterns + OWL Reasoning Bayesian OWL Ontology Merging (BOOM) Mungall, C. J. et al. (2016). kBOOM. bioRxiv 10.1101/048843 Monarch merged ‘upheno’ ontology MonDO Elvira Mitraka, Sue Bello, Nicole Vasilevsky
  30. 30. [Workflow diagram] Whole exome; Remove off-target and common variants; Variant Score based on allele frequency and pathological impact; Mendelian filters; Whole or partial phenome (HPO); OwlSim; Gene phenotype scores; Monarch Integrated KB: Curated Phenotype Data, upheno, Curated Orthology, Interaction, .. Data; Combined score; + GENOMISER
  31. 31. Environments
  32. 32. animal-associated, soil, marine, plant-associated, sediment, aquatic, hot spring, food, cultured, freshwater, hydrothermal vent, terrestrial, sludge, waste water, extreme, organism-associated, air, microbial mat. lite http://obofoundry.org/ontology/envo.html Ramona Walls, Pier Luigi Buttigieg
  33. 33. Environments: generalizing beyond microbes https://github.com/cmungall/environmental-conditions
  34. 34. Biological knowledge and curation QC Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics, 11(1), 530. http://doi.org/10.1186/1471-2105-11-530 Annotation errors can arise for different reasons - machine error (inappropriate propagation) - human error Previous versions of the GO had various unusual annotations: • Genes in chicken responsible for lactation
  35. 35. Biological knowledge and curation QC Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics, 11(1), 530. http://doi.org/10.1186/1471-2105-11-530 Annotation errors can arise for different reasons - machine error (inappropriate propagation) - human error Previous versions of the GO had various unusual annotations: • Genes in chicken responsible for lactation • Genes in slime mold responsible for dorsal fin development
  36. 36. Solution: Taxon constraints Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics, 11(1), 530. http://doi.org/10.1186/1471-2105-11-530 Encode taxon constraints as OWL rules in the ontology: ‘only in taxon’ and ‘never in taxon’. Can be propagated across ontologies. E.g. dorsal fin only in Vertebrata (Uberon); dorsal fin never in tetrapod (Uberon); lactation only in mammals (GO)
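The ‘only in taxon’ and ‘never in taxon’ rules on this slide can be checked against annotations with a small lineage lookup. This is a minimal sketch under stated assumptions: the taxonomy is a drastically simplified toy tree with plain labels instead of NCBITaxon identifiers, `INHERITS_FROM` is a hypothetical stand-in for propagating a constraint from an anatomical structure to a process, and the real constraints are OWL axioms evaluated by a reasoner rather than Python dictionaries.

```python
# Toy taxonomy: taxon -> parent taxon (simplified for illustration).
TAXON_PARENT = {
    "chicken": "Tetrapoda",
    "mouse": "Mammalia",
    "Mammalia": "Tetrapoda",
    "Tetrapoda": "Vertebrata",
    "zebrafish": "Vertebrata",
    "Vertebrata": "cellular organisms",
    "slime mold": "cellular organisms",
}

# Constraints taken from the slide.
ONLY_IN = {"dorsal fin": "Vertebrata", "lactation": "Mammalia"}
NEVER_IN = {"dorsal fin": "Tetrapoda"}

# Hypothetical link used to propagate a structure's constraints to a process.
INHERITS_FROM = {"dorsal fin development": "dorsal fin"}


def lineage(taxon):
    """Return the taxon followed by all of its ancestors."""
    result = []
    while taxon is not None:
        result.append(taxon)
        taxon = TAXON_PARENT.get(taxon)
    return result


def violations(term, taxon):
    """List every taxon constraint broken by annotating `term` in `taxon`."""
    line = lineage(taxon)
    problems = []
    current = term
    while current is not None:
        only = ONLY_IN.get(current)
        if only is not None and only not in line:
            problems.append(f"{current} only in {only}")
        never = NEVER_IN.get(current)
        if never is not None and never in line:
            problems.append(f"{current} never in {never}")
        current = INHERITS_FROM.get(current)
    return problems


print(violations("lactation", "chicken"))                  # ['lactation only in Mammalia']
print(violations("dorsal fin development", "slime mold"))  # ['dorsal fin only in Vertebrata']
```

Both of the unusual annotations mentioned on the preceding slides (lactation in chicken, dorsal fin development in slime mold) are flagged, while the same terms in mouse and zebrafish respectively pass.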
  37. 37. Hi, ROBOT  How can we package things up and make them easier to use in ontology/curation QC pipelines?  Enter ROBOT  Design Patterns  Continuous Integration
  38. 38. Next steps for ontology annotation  Existing ontology annotation model:  Bag of terms [Diagram: one gene linked to eight unconnected terms]
  39. 39. All GO annotations for (human) beta-catenin (Molecular Function branch)
  40. 40. Next generation ontology annotation in Noctua http://noctua.berkeleybop.org/
  41. 41. Generalization to phenotypes http://noctua.berkeleybop.org/
  42. 42. Intelligent Concept Assistant https://github.com/INCATools
  43. 43. Take homes  Knowledge is a force multiplier  Applies to all biocuration work  But pinpoints need for QC  Design for generality  But acknowledge difficulties  Better support required  Biological knowledge is multifaceted and nuanced  Computer scientists have a tendency towards hubris  Biology is our nemesis  Collaborative approach is vital
  44. 44. http://hoodline.com/2016/12/caught-on-camera-self-driving-uber-runs-red-light-in-soma
  45. 45. Curators are…
  46. 46. Acknowledgments  Monarch Initiative: Jeremy Nguyen-Xuan, Kent Shefchek, Matt Brush, Tom Conlin, Lilly Winfree, Eric Douglass, Jules Jacobsen, Craig McLachan, Suzanna Lewis, Julie McMurry, Dan Keith, Nicole Washington, Nicole Vasilevsky, Nathan Dunn, Harry Hochheiser, William Bone, Neal Boerkoel, Damian Smedley, Tudor Groza, Sebastian Koehler, Melissa Haendel, Peter Robinson  GO: Michael Ashburner, David Hill, Paola Roncaglia, David Osumi-Sutherland, Tanya Berardini, Jen Deegan, Jane Lomax, Karen Christie, Pascale Gaudet, Monica Munoz-Torres, Seth Carbon, Eric Douglass, Heiko Dietze, Ruth Lovering, Rachael Huntley, Midori Harris, Harold Drabkin, Kimberley Van Auken, Marc Feuermann, Petra Fey, Jim Hu, Debbie Siegel, Helen Parkinson, Tony Sawford, Stacia Engel, Sylvain Poux, Melanie Courtot, Becky Foulger, Emily Dimmer, Huaiyu Mi, Judy Blake, Paul Sternberg, Mike Cherry, Suzi Lewis, Paul Thomas  OBO: Michael Ashburner, Suzanna Lewis, Barry Smith, Richard Scheuermann, Chris Stoeckert, Jie Zheng, Melanie Courtot, Simon Jupp, Ramona Walls, Darren Natale, Melissa Haendel, Lynn Schriml, Alan Ruttenberg, Seth Carbon, James Overton, Bjoern Peters, + all contributors  Planteome: Pankaj Jaiswal, Dennis Stevenson, Laurel Cooper, Austin Meier, Marie Angelique Laporte, Elizabeth Arnaud  Uberon: David Osumi-Sutherland, Paula Mabee, Jim Balhoff, Wasila Dahdul, Alex Dececchi, Nizar Ibrahim, Paul Sereno, Frederic Bastian, Ann Niknejad, Marc Robinson-Rechavi, David Blackburn, Terry Hayamizu, Yvonne Bradford, Ceri Van Slyke, Alex Diehl, Terry Meehan, Robert Druzinsky, Melissa Haendel  ALL OF THE BIOCURATORS  NIH ORIP R24OD011883, NHGRI U41HG002273, NSF DEB-0956049, DOE DE-AC02-05CH11231, NSF IOS 1340112, NSF DBI 1062404
  47. 47. Give me a place to stand and with a lever I will move the whole world
  48. 48. Uncovering latent meaning in ontologies Mungall, C. J. (2004). Obol: Integrating Language and Meaning in Bio-Ontologies. Comparative and Functional Genomics, 5(7), 509–520. Term: regulation of Notch signaling pathway involved in heart induction (label parts annotated as: relation, relation, anatomic, pathway). OWL expression: ≡ ∃ regulates (NSP ⊓ ∃ part-of HI), with NSP = Notch signaling pathway and HI = heart induction
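The idea of reading a logical definition out of a term label can be illustrated with a single pattern. This is a hedged sketch: the regular expression, the function name `logical_definition` and the Manchester-like output string are assumptions made for illustration, and a one-pattern regex is only a stand-in for the grammar-based approach described in the Obol paper.

```python
import re

# One hand-written pattern for labels of the form
# "regulation of <process> involved in <whole>" (illustrative only).
PATTERN = re.compile(r"^regulation of (?P<process>.+?) involved in (?P<whole>.+)$")


def logical_definition(label):
    """Return a class expression for a matching label, or None if it does not match."""
    match = PATTERN.match(label)
    if match is None:
        return None
    process = match["process"]
    whole = match["whole"]
    # Mirrors the slide's expression: ∃ regulates (process ⊓ ∃ part-of whole)
    return f"regulates some ({process} and (part-of some {whole}))"


label = "regulation of Notch signaling pathway involved in heart induction"
print(logical_definition(label))
# regulates some (Notch signaling pathway and (part-of some heart induction))
```

Labels that do not follow the pattern return None, so coverage depends entirely on how many such patterns are written.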
  49. 49. Open Biological Ontologies (OBO)  To provide modular building blocks  Not just functional annotation of genes and gene products  Framework, tools and infrastructure for cooperation and harmonization Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., … Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol, 25(11), 1251–1255. [Diagram modules: Function (GO), Anatomy, Environment, Chemicals (CHEBI), Phenotype and Disease, Genes (SO, GENO); relation shown: Occurs in …] http://obofoundry.org
  50. 50. OBO: Modularity [Diagram modules: Function (GO), Gross Anatomy, Chemicals (CHEBI), Abnormal Phenotype and Disease, Sequence, Cell Types; relation shown: Imported into]
  51. 51. Relations: the glue that holds it together  RO 2005 paper  10 relations  Current RO  >500 relations  Molecular biology  Neurobiology  Biotic interactions  …  Many rules on how relations compose together  Working with wikidata http://obofoundry.org/ontology/ro.html
  52. 52. Beyond the GO. Functional Genomics (gene function): Gene Ontology, links genes to what they do. Transcriptomics (gene expression): Anatomy and Stage Ontology, links genes to where they are expressed. Phenomics (effects of gene mutations): Phenotype and Trait Ontology, links genes to what happens when they are disrupted or when they vary. Disease Ontology. Environment Ontology.
  53. 53. Uberon bridges anatomy ontologies http://uberon.org Mungall, C. J., Torniai, C., Gkoutos, G. V, Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5 [Figure terms: anatomical structure, organ, organ part, respiration organ, lung, lung bud, lung primordium, respiratory primordium, foregut, endoderm, endoderm of foregut, alveolus, alveolus of lung, alveolar sac, pulmonary acinus, swim bladder. External classes: FMA:lung, MA:lung, MA:lung alveolus, FMA: pulmonary alveolus, EHDAA: lung bud, GO: respiratory gaseous exchange, NCBITaxon: Mammalia, NCBITaxon: Actinopterygii. Relations: is_a (SubClassOf), is_a (taxon equivalent), part_of, develops_from, capable_of, only_in_taxon]
  54. 54. Uberon for comparative Gene Expression http://bgee.org/
  55. 55. Uberon Core: extensions to other animals… Thacker, R. W., Díaz, M. C., Kerner, A., Vignes-Lebbe, R., Segerdell, E., Haendel, M. A., & Mungall, C. J. (2014). The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics, 5(1), 39. Non-model/human extensions: Porifera Ontology, Ctenophore Ontology, Cephalopod Ontology, Arthropod Ontology. http://phenotypercn.org https://github.com/obophenotype/cephalopod-ontology https://github.com/obophenotype/ctenophore-ontology https://github.com/obophenotype/porifera-ontology https://github.com/obophenotype/uberon
  56. 56. http://monarchinitiative.org/analyze/phenotypes/ PhenoGrid: visualizing phenotype matches
  57. 57. The Undiagnosed Disease Patient (UDP) Use Case: Clinical Phenotyping (HPO/PhenoTips), Exome Sequencing, Causative Variant?
  58. 58. https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2 Robinson, P., et al . (2013). Improved exome prioritization of disease genes through cross species phenotype comparison. Genome Research. doi:10.1101/gr.160325.113
  59. 59. TODO DEPRECATED The need for modularization  Growing pains of GO  Terms were added as-needed for curation  Hard to maintain  Scope: Encompassing all of biology is hard  Biochemistry, cell biology, plants, animal development and physiology, …  We needed to modularize  Meanwhile  Other ontologies in the ‘style’ of GO were popping up, for annotating other kinds of data  Challenge: how were we going to coordinate this?
  60. 60. Biological knowledge and curation QC  Taxon constraints  CONCRETE EXAMPLE HERE  Intersection rules  (see Seth’s talk) Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics, 11(1), 530. http://doi.org/10.1186/1471-2105-11-530
  61. 61. Knowledge-Based • ice cream derived-from dairy • Ice cream is yummy
  62. 62. Uberon/CL applications and users  Ontology Modularization  GO  CLO  Pheno Ontologies (EQ definitions)  ENVO  Transcriptomics and genome annotation  ENCODE  FANTOM5  LINCS  BgeeDb  Phenomics  Human and Mammalia Phenotype Ontology  Phenotype comparison algorithms  Evolutionary Phenotypes: Phenoscape http://uberon.github.io/about/adopters.html
  63. 63. The path to AI, 1990s  Two goals  Broad AI  Narrow AI  What path to get there?  Knowledge-Based  Explicit Encoding of knowledge about the world  Analytic or deductive reasoning  Mathematical Logic vs Cognitively inspired (neats vs scruffs)  ‘Knowing that’  Knowledge-Free  Machine Learning, Neural Networks  Statistics  Pattern Recognition  Biologically Inspired  ‘Knowing how’
  64. 64. Opposites Koehler et al, bioRxiv https://doi.org/10.1101/108977
  65. 65. One ontology to bind them: the Relation Ontology (RO) http://obofoundry.org/ontology/ro.html [Diagram terms: compound eye, ommatidium, sense organ, eye disc, outer photoreceptor cell, lamina monopolar neuron L3, detection of light stimulus involved in visual perception. Relations: is_a, part_of, develops from, capable of, synapsed by]
