GIGA2 Structuring Phenotype Data

Slides from GIGA2 presentation on phenotype annotation for invertebrate genomes http://giga.nova.edu/index.php/news/40-giga-workshop-ii-summary

Published in: Science

  1. STRUCTURING PHENOTYPE DATA: Lessons from vertebrate genomes
      Chris Mungall, LBNL, Berkeley; Gene Ontology
      GIGA2, Munich, March 2015
  2. Web Apollo: http://genomearchitect.org
  3. Genome structures are highly amenable to comparison
      Desvignes, T., Pontarotti, P., & Bobe, J. (2010). Nme gene family evolutionary history reveals pre-metazoan origins and high conservation between humans and the sea anemone, Nematostella vectensis. PLoS ONE, 5(11). doi:10.1371/journal.pone.0015506
  4. COMPUTING OVER PHENOTYPES
      • Can we compute over the architecture of phenomes as we do for genome architecture?
        o What genes affect distal appendage length or shape?
        o What are the genes expressed in the mouth during development?
        o What structures develop using the same gene regulatory networks as in bilaterian mouths?
      • Current methods
        o Text-based search of the literature, with results gathered manually
          ▪ Time consuming
          ▪ Hard to automate
  5. Diagram labels: Gene; Every phenotype ever to have existed; expressed in mouth; affects appendage length; regulates EMT; …
  6. PHENOTYPES: ENDLESS FORMS
      Organisms shown: Peytoia nathorsti; Amphipholis squamata; Petromyzon marinus; Bugula; Homo sapiens (with cleft palate); Mysticeti; Aplysina aerophoba; Gastrula (Metazoan)
      Structures labelled: mouth; anus; osculum; blastopore; cleft lip and palate
  7. FREE TEXT != STRUCTURED
      Free-text descriptions attached to a gene:
      • “expressed in mouth”
      • “expressed around oral opening”
      • “expressed in anterior end of gut tube”
      • “affects appendage length”
      • “long tentacles”
      • “elongated arms”
  8. ONTOLOGIES: STRUCTURING A DIVERSITY OF PHENOTYPES
      Diagram terms: tentacle; tentacular bud; circumoral appendage; tentacular club; sucker; arm; arm IV; mouth
      Diagram relations: develops into; is a subtype of; is part of; homologous; surrounds
      https://github.com/obophenotype/cephalopod-ontology
  9. ONTOLOGIES FOR MOLECULAR PHENOTYPES
      Diagram terms: tentacle; tentacular bud; circumoral appendage; tentacular club; sucker; arm; arm IV; mouth
      Diagram relations: develops into; is a subtype of; is part of; homologous; surrounds; expressed in
      Genes: Scr; Lox5; Antp
  10. GRAPH KNOWLEDGE QUERIES
      Diagram terms: tentacle; tentacular bud; circumoral appendage; tentacular club; sucker; arm; arm IV; mouth
      Diagram relations: develops into; is a subtype of; is part of; homologous; surrounds; expressed in
      Genes: Scr; Lox5; Antp
      Query: “What genes are expressed in structures that develop from a tentacle bud, or homologs?”
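A query like the one on this slide can be sketched as a traversal over relation triples. This is a minimal illustration, not the ontology itself: the slide text names the terms, relation labels and genes, but which term connects to which (and which gene is expressed where) is assumed here for the sake of the example.

```python
# Hypothetical edges: the terms and relation labels come from the slide,
# but the specific connections are assumptions made for illustration.
EDGES = [
    ("tentacular bud", "develops into", "tentacle"),
    ("tentacular club", "is part of", "tentacle"),
    ("sucker", "is part of", "tentacular club"),
    ("tentacle", "is a subtype of", "circumoral appendage"),
    ("arm", "is a subtype of", "circumoral appendage"),
    ("arm IV", "is a subtype of", "arm"),
    ("tentacle", "homologous", "arm IV"),
]

# Hypothetical gene-to-structure expression annotations.
EXPRESSED_IN = [("Scr", "tentacle"), ("Lox5", "arm IV"), ("Antp", "sucker")]


def develops_from(start):
    """Structures reachable from `start` by following 'develops into' edges."""
    found, stack = set(), [start]
    while stack:
        node = stack.pop()
        for subject, relation, obj in EDGES:
            if subject == node and relation == "develops into" and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def with_homologs(structures):
    """Add direct homologs, treating 'homologous' as a symmetric relation."""
    result = set(structures)
    for subject, relation, obj in EDGES:
        if relation == "homologous":
            if subject in structures:
                result.add(obj)
            if obj in structures:
                result.add(subject)
    return result


def genes_expressed_in(structures):
    """Genes with an expression annotation to any of the given structures."""
    return sorted(gene for gene, structure in EXPRESSED_IN if structure in structures)


targets = with_homologs(develops_from("tentacular bud"))
genes = genes_expressed_in(targets)
print(targets, genes)
```

The same question asked of free text would need a literature search; here it reduces to two graph lookups followed by a join against the expression annotations.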
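  11. ONTOLOGIES FOR TRAITS
      Diagram terms: tentacle; tentacular bud; circumoral appendage; tentacular club; sucker; arm; arm IV; mouth
      Diagram relations: develops into; is a subtype of; is part of; homologous; surrounds
      Traits are composed from an attribute and a structure:
      • shape + tentacular club = shape of tentacular club
      • length + arm IV = length of arm IV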
  12. APPLICATIONS OF ONTOLOGIES
      • Wild-type phenotypic function:
        o The Gene Ontology
      • Anatomy:
        o Uberon anatomy ontology
  13. THE GENE ONTOLOGY
      • For curating the ‘wild type functional phenotypes’
      • Genes for over 0.5 million species have associations to GO terms
      • >40,000 terms
        o Molecular function
        o Cellular component
        o Biological process
      • Core and taxon-specific
      • Uses include
        o Gene set selection
        o Term enrichment
      Ashburner et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25, 25-29.
      http://geneontology.org
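Term enrichment, listed above as a use of GO, is commonly computed as a one-sided hypergeometric test: how likely is it to see at least k annotated genes in a list of n, when K of the N background genes carry the term? The sketch below uses only the standard library; the counts are made-up numbers for illustration and the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the study list, k: study genes annotated to the term.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Illustrative counts only: 20 background genes, 5 annotated to the term,
# a study list of 4 genes of which 3 are annotated.
p = enrichment_p_value(k=3, N=20, K=5, n=4)
print(round(p, 4))
```

In practice the test is repeated for every term, so the resulting p-values need a multiple-testing correction before interpretation.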
  14. ASSIGNING GENE FUNCTION
      • Experimental
        o Curated from literature
      • Automated methods:
        o Based on sequence similarity
          ▪ E.g. blast2go
        o Based on protein features
          ▪ Interpro2GO
        o Based on phylogenetic evidence
          ▪ Ensembl COMPARA
          ▪ Panther Families and PAINT
      • Typically only applied for conserved cellular biology
      Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics, 12(5), 449–62. doi:10.1093/bib/bbr042
  15. EXTRACTING GENE LISTS AND INTERPRETING TRANSCRIPTOMIC DATA
      Wang, Z., Pascual-Anaya, J., Zadissa, A., Li, W., Niimura, Y., Huang, Z., … Irie, N. (2013). The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan. Nature Genetics, 45(6), 701–6. doi:10.1038/ng.2615
  16. BEYOND THE GO
      Field | Ontology | What it links
      Functional Genomics: Gene function | Gene Ontology | Links genes to what they do
      Transcriptomics: Gene expression | Anatomy and Stage Ontology | Links genes to where they are expressed
      Phenomics: Effects of gene mutations | Phenotype and Trait Ontology | Links genes to what happens when they are disrupted
  17. THE UBERON MULTI-SPECIES COMPARATIVE ANATOMY ONTOLOGY
      • Core: 14,000 terms
        o Bias towards vertebrate systems
      • Composite-Metazoan edition: 42,000 terms
        o Integrates cell types, developmental stages
        o Species-specific ontologies
      • Uses
        o Standard reference for animal anatomy
        o Linking model organism databases
        o Evolutionary systematics (Phenoscape)
        o Comparative transcriptomics (Bgee)
        o Standardized vocabulary for mammalian sequencing consortia
        o Cross-species phenotype matching (Monarch)
      http://uberon.org
      Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
  18. PHENOSCAPE: LINKING EVOLUTION TO GENOMICS USING PHENOTYPE ONTOLOGIES
      • Phenotypic knowledgebase
        o Linking phenotypes to extant and extinct vertebrate taxa
        o Integrate with model organism databases
      • Extending Uberon to cover diversity of vertebrates
      Haendel, M. A., Balhoff, J. P., ..., Sereno, P. C., & Mungall, C. J. (2014). Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon. Journal of Biomedical Semantics, 5(1), 21. doi:10.1186/2041-1480-5-21
  19. UBERON FOR COMPARATIVE GENE EXPRESSION
  20. EXAMPLE OF EXPRESSION DATA
      Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
      ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
      ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality
      Mus_musculus (‘simple’ expression file) http://bgee.org/?page=download
  21. EXAMPLE OF INFERRED EXPRESSION DATA
      Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
      ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
      ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002129 | cerebellar cortex | high quality
      ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | cerebellum | high quality
      ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002028 | hindbrain | high quality
      … | …
      ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality
      ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellum | high quality
      … | …
      Mus_musculus (‘complete’ expression file) http://bgee.org/?page=download
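The step from the ‘simple’ rows to the ‘complete’ rows can be sketched as propagating each observation up the anatomy hierarchy. This is an illustrative reconstruction, not Bgee's pipeline: the parent links below are a simplified assumption chosen to match the structures named on the slides, and the function names are hypothetical.

```python
# Assumed, simplified part-of links between the structures named on the slides.
PART_OF = {
    "Purkinje cell layer of cerebellar cortex": ["cerebellar cortex"],
    "cerebellar cortex": ["cerebellum"],
    "cerebellar vermis": ["cerebellum"],
    "cerebellum": ["hindbrain"],
}


def ancestors(term):
    """All structures that `term` is transitively part of, nearest first."""
    seen, queue = [], list(PART_OF.get(term, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(PART_OF.get(parent, []))
    return seen


def infer_expression(rows):
    """Expand (gene, stage, anatomy) rows to include every containing structure."""
    inferred = []
    for gene, stage, anatomy in rows:
        for structure in [anatomy] + ancestors(anatomy):
            row = (gene, stage, structure)
            if row not in inferred:
                inferred.append(row)
    return inferred


simple = [
    ("Grid2", "sexually immature", "Purkinje cell layer of cerebellar cortex"),
    ("Grid2", "prime adult", "cerebellar vermis"),
]
complete = infer_expression(simple)
for row in complete:
    print(row)
```

Expression observed in a part is reported for every structure that contains it, which is why two curated rows expand into several inferred ones.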
  22. CURATING A DATABASE OF HOMOLOGY HYPOTHESES
      https://github.com/BgeeDB/anatomical-similarity-annotations
      Cnidaria: gastrodermis; mouth
      Porifera: choanoderm; osculum
      Relation shown (twice): homologous
      Evidence: developmental gene expression evidence
      Leininger S, Adamski M, … Adamska M. doi:10.1038/ncomms4905
  23. ONTOLOGIES FOR DATA STANDARDIZATION IN SEQUENCING CONSORTIA
      Malladi, V. S., Erickson, D. T., Podduturi, N. R., Rowe, L. D., Chan, E. T., Davidson, J. M., … Hong, E. L. (2015). Ontology application and use at the ENCODE DCC. Database: The Journal of Biological Databases and Curation, 2015, bav010. doi:10.1093/database/bav010
      Washington, N. L., Stinson, E. O., Perry, M. D., et al. (2011). The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details. Database, 2011, bar023.
      https://www.encodeproject.org/search/?type=biosample
  24. DISEASES AND ABNORMAL PHENOTYPES
      • Monarch Initiative
        o Large knowledgebase connecting genes, genotypes and diseases to phenotypes
        o Find novel linkages between human diseases and model systems
        o http://monarchinitiative.org
      • Driving use case
        o Given a patient with a rare or unique spectrum of abnormal phenotypes, determine the causative genomic variant(s)
  25. Standard Clinical Exome Testing Pipeline
      Predicts the causative variant based on information in the patient's genome and background genomic data
  26. https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2
      Robinson, P., et al. (2013). Improved exome prioritization of disease genes through cross species phenotype comparison. Genome Research. doi:10.1101/gr.160325.113
  27. EXOMISER USES ONTOLOGY-BASED PHENOTYPE MATCHING
      http://monarchinitiative.org/analyze/phenotypes/
      cleft palate = cleft (attribute) + palate (structure)
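Decomposing a phenotype into an attribute and a structure is what lets terms from different vocabularies be compared. The sketch below illustrates the idea only; it is not Exomiser's algorithm. Apart from "cleft palate", the labels, their decompositions and the 0 / 0.5 / 1 score are assumptions made for the example.

```python
from typing import NamedTuple


class EQ(NamedTuple):
    """A phenotype expressed as an attribute applied to an anatomical structure."""
    attribute: str
    structure: str


# Decomposition from the slide.
HUMAN = {"cleft palate": EQ("cleft", "palate")}

# Hypothetical model-organism labels and decompositions, for illustration only.
MODEL = {
    "palate cleft": EQ("cleft", "palate"),
    "cleft lip": EQ("cleft", "lip"),
    "short palate": EQ("decreased length", "palate"),
    "long tail": EQ("increased length", "tail"),
}


def score(a: EQ, b: EQ) -> float:
    """Toy similarity: half a point each for a shared attribute and a shared structure."""
    return 0.5 * (a.attribute == b.attribute) + 0.5 * (a.structure == b.structure)


def best_match(human_label: str):
    """Return the model label whose decomposition scores highest against the human term."""
    query = HUMAN[human_label]
    label = max(MODEL, key=lambda candidate: score(query, MODEL[candidate]))
    return label, score(query, MODEL[label])


print(best_match("cleft palate"))
```

The two labels "cleft palate" and "palate cleft" share no exact string, yet they match fully once both are reduced to the same attribute and structure. A real system would also use the ontology hierarchy to give partial credit for related structures.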
  28. SOLVING UNDIAGNOSED DISEASES
      NIH Undiagnosed Disease Program, patient 2731
      Patient phenotypes: Behavioural/Psychiatric Abnormality; Thyroid stimulating hormone excess; Gait apraxia; Spasticity
      Sh3kbp1 tm1Ivdi -/-: increased exploration in new environment; increased dopamine level; hyperactivity; hyperactivity
      Terms shown between them: Behavioral abnormality; Abnormality of the endocrine system; abnormal locomotor behavior; Abnormal voluntary movement
  29. LESSONS
      • Think about
        o How your data will be re-used by others
        o How what you're doing will scale
      • Provide structured metadata for experimental data
        o Free text is not enough
        o Use ontologies and standardized vocabularies where possible
      • Failing to do so will cost you later!
        o All major human and model organism omics consortia now enforce this
          ▪ ENCODE, FANTOM, LINCS
        o Also major phenotyping projects
          ▪ IMPC/KOMP2
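One lightweight way to act on "free text is not enough" is to check that the metadata fields meant to hold ontology terms actually contain prefixed identifiers. The following is a minimal sketch under stated assumptions: the field names and the record are hypothetical, and the pattern only checks the PREFIX:digits shape, not whether the term exists in the ontology.

```python
import re

# Matches identifiers shaped like UBERON:0004720 or NCBITaxon:10090.
CURIE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\d+$")


def free_text_fields(record, ontology_fields):
    """Return the fields that should hold an ontology identifier but do not."""
    return [
        field
        for field in ontology_fields
        if not CURIE.match(str(record.get(field, "")))
    ]


# Hypothetical sample record: two structured fields and one free-text field.
sample = {
    "taxon": "NCBITaxon:10090",
    "anatomy": "UBERON:0004720",
    "stage": "adult",
}
problems = free_text_fields(sample, ["taxon", "anatomy", "stage"])
print(problems)
```

Running a check like this at submission time is one way to capture metadata up-front rather than repairing it afterwards.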
  30. LESSONS
      • Providing metadata requires the right ontologies or vocabularies in place
      • Make phenotypic knowledge about your favorite system structured and computable
        o This seems daunting, where do I start…?
  31. BGEE WILL CURATE YOUR TRANSCRIPTOME DATA
      • Got transcriptome data?
        o Bgee will curate it for you!
        o Caveat: Your genome must be in Ensembl Genomes
        o We are also interested in your homology hypotheses
      • Got classic systematics data?
        o Talk to me about using Phenoscape infrastructure
  32. GOT ANATOMY EXPERTISE? CLAIM AN INVERTEBRATE MODULE!
      Diagram labels: Uberon Core; Vertebrate structures; Porifera Ontology; Ctenophore Ontology; Cephalopod Ontology; Arthropod Ontology
      http://phenotypercn.org
      Eric Edsinger, CephSeq
      https://github.com/obophenotype/cephalopod-ontology
      https://github.com/obophenotype/ctenophore-ontology
      https://github.com/obophenotype/porifera-ontology
      https://github.com/obophenotype/uberon
      Thacker, R. W., Díaz, M. C., Kerner, A., Vignes-Lebbe, R., Segerdell, E., Haendel, M. A., & Mungall, C. J. (2014). The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics, 5(1), 39.
  33. CURATE GENE REGULATORY NETWORKS AND PHENOTYPES
      Noctua
      • Curation using multiple ontologies with a graph model
        o Web-based, collaborative
        o Advanced GO curation
        o Phenotype curation
      • Beta available in summer 2015
        o http://noctua.berkeleybop.org
  34. CONCLUSIONS
      • Structured metadata is valuable
        o Helps build the knowledge graph of invertebrate genomics
        o Capture metadata up-front, not after the fact
        o Use ontologies where possible
        o Don’t repeat mistakes of projects that ignored this advice
      • Invertebrate ontologies are at a nascent stage
        o This is an opportunity! Get involved!
  35. ACKNOWLEDGMENTS
      • Monarch: Melissa A Haendel, Nicole Washington, Sebastian Kohler, Harry Hochheiser, Maryann Martone, Suzanna Lewis, Damian Smedley, Peter Robinson, William Bone, Jeremy Nguyen-Xuan
      • Uberon: Frederic Bastian, Ann Niknejad, Marc Robinson-Rechavi, Todd Vision, Jim Balhoff, Paul Sereno, Nizar Ibrahim, Alex Dececchi, Yvonne Bradford, Terry Hayamizu, Robert Druzinsky
      • NSF Phenotype RCN: Paula Mabee, Suzanna Lewis, Eva Huala, Andy Deans, Erik Segerdell, Robert Thacker, Eric Edsinger, Matt Yoder, Istvan Miko, David Osumi-Sutherland
  36. Dececchi TA et al. Toward synthesizing our knowledge of morphology: using ontologies and machine reasoning to extract presence/absence evolutionary phenotypes across studies. https://peerj.com/preprints/807/
  37. FORWARD GENOMICS
      http://bejerano.stanford.edu/phenotree/public/html/
      Hiller et al. 2012 Cell Reports
