GIGA2, Munich, March 2015
STRUCTURING PHENOTYPE DATA: Lessons from vertebrate genomes
Chris Mungall, LBNL, Berkeley
Gene Ontology
Web Apollo: http://genomearchitect.org
Desvignes, T., Pontarotti, P., & Bobe, J. (2010). Nme gene family evolutionary history reveals pre-metazoan origins and high conservation between humans and the sea anemone, Nematostella vectensis. PLoS ONE, 5(11). doi:10.1371/journal.pone.0015506
Genome structures are highly amenable to comparison
 Can we compute over the architecture of phenomes as we do
for genome architecture?
o What genes affect distal appendage length or shape?
o What are the genes expressed in the mouth during development?
o What structures develop using the same gene regulatory networks as
in bilaterian mouths?
 Current methods
o Text-based literature search, with results gathered manually
 Time consuming
 Hard to automate
COMPUTING OVER PHENOTYPES
[Figure: matrix of genes against "every phenotype ever to have existed", with example phenotype labels: expressed in mouth; affects appendage length; regulates EMT; …]
PHENOTYPES: ENDLESS FORMS
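[Figure: mouths and putative homologs across animals. Taxa shown: Peytoia nathorsti, Amphipholis squamata, Petromyzon marinus, Bugula, Homo sapiens (with cleft palate), Mysticeti, Aplysina aerophoba, Gastrula (Metazoan). Structure labels: mouth, anus, osculum, blastopore, cleft lip and palate.]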
Gene “expressed in mouth”
“affects appendage length”
“long tentacles”
“elongated arms”
FREE TEXT != STRUCTURED
“expressed around oral opening”
“expressed in anterior end of gut tube”
ONTOLOGIES: STRUCTURING A DIVERSITY
OF PHENOTYPES
[Diagram: graph of cephalopod appendage terms. Nodes: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Relations: develops into, is a subtype of, is part of, homologous, surrounds.]
https://github.com/obophenotype/cephalopod-ontology
ONTOLOGIES FOR MOLECULAR
PHENOTYPES
[Diagram: the appendage graph from the previous slide (tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth), with the genes Scr, Lox5 and Antp linked to structures by "expressed in" edges.]
GRAPH KNOWLEDGE QUERIES
[Diagram: the appendage graph (tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth) with "expressed in" edges for Scr, Lox5 and Antp.]
Query: “What genes are expressed in structures that develop from a tentacle bud, or homologs?”
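A query like this can be sketched as a traversal over an edge list. The edges below are a hypothetical reading of the diagram (the slide does not state which nodes each arrow connects), and the gene names and expression assertions are placeholders, not data from the talk.

```python
from collections import defaultdict

# Hypothetical edges read off the diagram: (subject, relation, object).
EDGES = [
    ("tentacular bud", "develops_into", "tentacle"),
    ("tentacle", "is_a", "circumoral appendage"),
    ("arm", "is_a", "circumoral appendage"),
    ("arm IV", "is_a", "arm"),
    ("tentacular club", "part_of", "tentacle"),
    ("sucker", "part_of", "tentacular club"),
    ("tentacle", "homologous_to", "arm IV"),
    ("circumoral appendage", "surrounds", "mouth"),
]

# Placeholder expression assertions: (gene, structure).
EXPRESSED_IN = [
    ("gene_A", "tentacle"),
    ("gene_B", "arm IV"),
    ("gene_C", "mouth"),
]


def develops_from(start):
    """Return every structure reachable from `start` via develops_into edges."""
    successors = defaultdict(set)
    for subject, relation, obj in EDGES:
        if relation == "develops_into":
            successors[subject].add(obj)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def with_homologs(structures):
    """Add direct homologs; homology is treated as symmetric."""
    result = set(structures)
    for subject, relation, obj in EDGES:
        if relation == "homologous_to":
            if subject in structures:
                result.add(obj)
            if obj in structures:
                result.add(subject)
    return result


def genes_expressed_in(structures):
    """Return the sorted genes asserted as expressed in any given structure."""
    return sorted({gene for gene, where in EXPRESSED_IN if where in structures})


targets = with_homologs(develops_from("tentacular bud"))
print(genes_expressed_in(targets))
```

A fuller version would also follow "is part of" and "is a subtype of" edges, so that expression in a sucker counts as expression in the tentacle; that step is left out to keep the sketch short.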
ONTOLOGIES FOR TRAITS
[Diagram: the appendage graph (tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth), extended with trait terms composed from a quality and a structure.]
shape + tentacular club = shape of tentacular club
length + arm IV = length of arm IV
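The composition on this slide (a quality plus a structure) can be sketched as a small data type. The class name, the helper functions and the parent links are illustrative assumptions; in particular, "arm IV is a subtype of arm" is one reading of the diagram, not a statement from the ontology itself.

```python
from dataclasses import dataclass

# Hypothetical "is a subtype of" links between structures.
ENTITY_PARENT = {
    "arm IV": "arm",
    "arm": "circumoral appendage",
    "tentacle": "circumoral appendage",
}


@dataclass(frozen=True)
class Trait:
    """A trait composed from a quality (e.g. length) and a structure."""
    quality: str
    entity: str

    def label(self):
        return f"{self.quality} of {self.entity}"


def ancestors_or_self(entity):
    """Walk the subtype chain upwards, starting with the entity itself."""
    chain = []
    while entity is not None:
        chain.append(entity)
        entity = ENTITY_PARENT.get(entity)
    return chain


def subsumes(general, specific):
    """True if `general` covers `specific`: same quality, broader structure."""
    return (general.quality == specific.quality
            and general.entity in ancestors_or_self(specific.entity))


arm_iv_length = Trait("length", "arm IV")
print(arm_iv_length.label())
print(subsumes(Trait("length", "circumoral appendage"), arm_iv_length))
```

Because the trait is built from parts, a search for "length of circumoral appendage" can find "length of arm IV" without anyone having listed that pairing by hand.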
 Wild-type phenotypic function:
o The Gene Ontology
 Anatomy:
o Uberon anatomy ontology
APPLICATIONS OF ONTOLOGIES
 For curating the ‘wild type functional phenotypes’
 Genes for over 0.5 million species have associations to GO
terms
 >40,000 terms
o Molecular function
o Cellular component
o Biological Process
 Core and taxon-specific
 Uses include
o Gene set selection
o Term enrichment
THE GENE ONTOLOGY
Ashburner et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25, 25–29.
http://geneontology.org
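Term enrichment asks whether a GO term is annotated to more genes in a selected set than chance would predict. A common choice, assumed here and not taken from the talk, is the one-sided hypergeometric test. The function name and the counts in the usage line are illustrative.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     number of genes in the selected gene set
    hits:       selected genes annotated to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Illustrative counts only.
p = enrichment_p_value(population=20000, annotated=200, sample=100, hits=10)
print(f"p = {p:.3g}")
```

In practice one such test is run per GO term, so the resulting p-values need a multiple-testing correction before they are interpreted.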
 Experimental
o Curated from literature
 Automated methods:
o Based on sequence similarity
 e.g. Blast2GO
o Based on protein features
 InterPro2GO
o Based on phylogenetic evidence
 Ensembl Compara
 PANTHER families and PAINT
 Typically only applied to conserved cellular biology
ASSIGNING GENE FUNCTION
Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium.
Briefings in Bioinformatics, 12(5), 449–62. doi:10.1093/bib/bbr042
PAINT
EXTRACTING GENE LISTS AND
INTERPRETING TRANSCRIPTOMIC DATA
Wang, Z., Pascual-Anaya, J., Zadissa, A., Li, W., Niimura, Y., Huang,
Z., … Irie, N. (2013). The draft genomes of soft-shell turtle and
green sea turtle yield insights into the development and evolution
of the turtle-specific body plan. Nature Genetics, 45(6), 701–6.
doi:10.1038/ng.2615
BEYOND THE GO
Field | Ontology | Links genes to
Functional Genomics: Gene function | Gene Ontology | what they do
Transcriptomics: Gene expression | Anatomy and Stage Ontology | where they are expressed
Phenomics: Effects of gene mutations | Phenotype and Trait Ontology | what happens when they are disrupted
 Core: 14,000 terms
o Bias towards vertebrate systems
 Composite-Metazoan edition: 42,000 terms
o Integrates cell types, developmental stages and species-specific ontologies
 Uses
o Standard reference for animal anatomy
o Linking model organism databases
o Evolutionary systematics (Phenoscape)
o Comparative transcriptomics (Bgee)
o Standardized vocabulary for mammalian
sequencing consortia
o Cross-species phenotype matching (Monarch)
THE UBERON MULTI-SPECIES
COMPARATIVE ANATOMY ONTOLOGY
http://uberon.org
Mungall, C. J., Torniai, C., Gkoutos, G. V, Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species
anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
PHENOSCAPE: LINKING EVOLUTION TO
GENOMICS USING PHENOTYPE ONTOLOGIES
 Phenotypic knowledgebase
o Linking phenotypes to extant and extinct vertebrate taxa
o Integrate with model organism databases
 Extending Uberon to cover diversity of vertebrates
Haendel, MA, Balhoff JP, ..., Sereno, PC., Mungall, C.J (2014).
Unification of multi-species vertebrate anatomy ontologies for
comparative biology in Uberon. Journal of Biomedical Semantics,
5(1), 21. doi:10.1186/2041-1480-5-21
UBERON FOR COMPARATIVE GENE
EXPRESSION
EXAMPLE OF EXPRESSION DATA
Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality
Mus_musculus (‘simple’ expression file)
http://bgee.org/?page=download
EXAMPLE OF INFERRED EXPRESSION
DATA
Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002129 | cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002037 | cerebellum | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002028 | hindbrain | high quality
… …
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0002037 | cerebellum | high quality
… …
Mus_musculus (‘complete’ expression file)
http://bgee.org/?page=download
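The extra rows in the 'complete' file follow from the anatomy ontology: expression observed in a part is inferred for the structures that contain it. A minimal sketch of that propagation is shown below. It assumes a single "part of" parent per structure, which is a simplification made for illustration, and it does not describe how Bgee actually computes its files.

```python
# Simplified "part of" chain for the structures in the table above.
# Assumption: one parent per structure; real ontologies allow several.
PART_OF = {
    "Purkinje cell layer of cerebellar cortex": "cerebellar cortex",
    "cerebellar cortex": "cerebellum",
    "cerebellar vermis": "cerebellum",
    "cerebellum": "hindbrain",
}

# Observed calls, as in the 'simple' file: (gene, stage, anatomy).
OBSERVED = [
    ("Grid2", "sexually immature", "Purkinje cell layer of cerebellar cortex"),
    ("Grid2", "prime adult", "cerebellar vermis"),
]


def propagate(observations):
    """Return observed calls plus calls inferred for containing structures."""
    inferred = set()
    for gene, stage, anatomy in observations:
        term = anatomy
        while term is not None:
            inferred.add((gene, stage, term))
            term = PART_OF.get(term)
    return inferred


for row in sorted(propagate(OBSERVED)):
    print(" | ".join(row))
```

Propagation runs upwards only: expression in the cerebellar vermis implies expression somewhere in the cerebellum, but not the reverse.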
CURATING A DATABASE OF HOMOLOGY HYPOTHESES
https://github.com/BgeeDB/anatomical-similarity-annotations
[Diagram: homology hypotheses between Cnidaria and Porifera. Gastrodermis is marked homologous to choanoderm, and mouth homologous to osculum. Evidence: developmental gene expression (Leininger S, Adamski M, … Adamska M, doi:10.1038/ncomms4905).]
ONTOLOGIES FOR DATA
STANDARDIZATION IN SEQUENCING
CONSORTIA
Malladi, V. S., Erickson, D. T., Podduturi, N. R., Rowe, L. D., Chan, E. T., Davidson, J. M., … Hong, E. L. (2015). Ontology application and use at the
ENCODE DCC. Database: The Journal of Biological Databases and Curation, 2015, bav010. doi:10.1093/database/bav010
Washington, N.L., Stinson, E.O., Perry, M.D. et al. (2011) The modENCODE Data Coordination Center: lessons in harvesting comprehensive
experimental details. Database, 2011, bar023
https://www.encodeproject.org/search/?type=biosample
 Monarch Initiative
o Large knowledgebase connecting genes, genotypes and diseases to
phenotypes
o Find novel links between human diseases and model systems
o http://monarchinitiative.org
 Driving use case
o Given a patient with a rare or unique spectrum of abnormal
phenotypes, determine the causative genomic variant(s)
DISEASES AND ABNORMAL PHENOTYPES
Standard Clinical Exome Testing Pipeline
Predicts the causative variant from information in the patient's genome and background genomic data
https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2
Robinson, P., et al. (2013). Improved exome prioritization of
disease genes through cross-species phenotype comparison.
Genome Research. doi:10.1101/gr.160325.113
http://monarchinitiative.org/analyze/phenotypes/
EXOMISER USES ONTOLOGY-BASED
PHENOTYPE MATCHING
cleft palate = cleft (attribute) + palate (structure)
SOLVING UNDIAGNOSED
DISEASES
[Figure: phenotype matching between the patient phenotypes of NIH Undiagnosed Disease Program patient 2731 and the mouse genotype Sh3kbp1 tm1Ivdi -/-. Terms shown: Behavioural/Psychiatric Abnormality; Thyroid stimulating hormone excess; Gait apraxia; Spasticity; increased exploration in new environment; increased dopamine level; hyperactivity; hyperactivity; Behavioral abnormality; Abnormality of the endocrine system; abnormal locomotor behavior; Abnormal voluntary movement.]
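Ontology-based matching compares two phenotype profiles through their shared ancestor terms, so profiles can match even when no term is identical. The sketch below uses plain Jaccard similarity over ancestor sets as a simple stand-in; it is not the scoring used by Exomiser. The parent links are a hypothetical toy hierarchy built from labels on the slide, not relationships taken from the real phenotype ontologies.

```python
# Hypothetical toy hierarchy: term -> list of parent terms.
PARENTS = {
    "hyperactivity": ["abnormal locomotor behavior"],
    "increased exploration in new environment": ["abnormal locomotor behavior"],
    "abnormal locomotor behavior": ["behavioral abnormality"],
    "spasticity": ["abnormal voluntary movement"],
}


def closure(terms):
    """Return the given terms together with all of their ancestors."""
    seen, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, []))
    return seen


def profile_similarity(profile_a, profile_b):
    """Jaccard similarity of the two ancestor closures (0.0 to 1.0)."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patient = {"hyperactivity", "spasticity"}
mouse = {"increased exploration in new environment"}
print(round(profile_similarity(patient, mouse), 3))
```

The two profiles share no term directly, yet they score above zero because both fall under "abnormal locomotor behavior" and "behavioral abnormality" in this toy hierarchy.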
 Think about
o How your data will be re-used by others
o How what you're doing will scale
 Provide structured metadata for experimental data
o Free text is not enough
o Use ontologies and standardized vocabularies where possible
 Failing to do so will cost you later!
o All major human and model organism omics consortia now enforce
this
 ENCODE, FANTOM, LINCS
o Also major phenotyping projects
 IMPC/KOMP2
LESSONS
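As a rough illustration of "free text is not enough", the check below accepts a sample record only when its tissue and stage fields hold ontology identifiers. The field names, the function name and the identifier pattern are assumptions made for this sketch; the two UBERON IDs are the ones shown in the expression table earlier in the deck.

```python
import re

# Assumed identifier shape: a prefix, a colon, then digits (e.g. UBERON:0002979).
ONTOLOGY_ID = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\d+$")

# Hypothetical required fields for a sample record.
REQUIRED_FIELDS = ("tissue", "stage")


def metadata_problems(sample):
    """Return a list of human-readable problems; empty if the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = sample.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not ONTOLOGY_ID.match(value):
            problems.append(f"{field}: free text {value!r}, expected an ontology ID")
    return problems


structured = {"tissue": "UBERON:0002979", "stage": "UBERON:0000112"}
free_text = {"tissue": "cerebellum, Purkinje layer"}

print(metadata_problems(structured))
print(metadata_problems(free_text))
```

A pattern check only confirms the shape of an identifier. Confirming that the ID exists, and that it names a tissue and not a stage, needs a lookup against the ontology itself.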
 Providing metadata requires the right ontologies or
vocabularies to be in place
 Make phenotypic knowledge about your favorite system
structured and computable
o This seems daunting, where do I start…?
LESSONS
 Got transcriptome data?
o Bgee will curate it for you!
o Caveat: Your genome must be in Ensembl Genomes
o We are also interested in your homology hypotheses
 Got classic systematics data?
o Talk to me about using Phenoscape infrastructure
BGEE WILL CURATE YOUR
TRANSCRIPTOME DATA
GOT ANATOMY EXPERTISE? CLAIM AN INVERTEBRATE MODULE!
[Diagram labels: Uberon Core; Vertebrate structures; Porifera Ontology; Ctenophore Ontology; Cephalopod Ontology; Arthropod Ontology]
Thacker, R. W., Díaz, M. C., Kerner, A., Vignes-Lebbe, R., Segerdell, E.,
Haendel, M. A., & Mungall, C. J. (2014). The Porifera Ontology (PORO):
enhancing sponge systematics with an anatomy ontology. Journal of
Biomedical Semantics, 5(1), 39.
http://phenotypercn.org
Eric Edsinger, CephSeq
https://github.com/obophenotype/cephalopod-ontology
https://github.com/obophenotype/ctenophore-ontology
https://github.com/obophenotype/porifera-ontology
https://github.com/obophenotype/uberon
Noctua
 Curation using multiple ontologies with a graph model
o Web-based, collaborative
o Advanced GO curation
o Phenotype curation
 Beta available in summer 2015
o http://noctua.berkeleybop.org
CURATE GENE REGULATORY NETWORKS
AND PHENOTYPES
 Structured metadata is valuable
o Helps build the knowledge graph of invertebrate genomics
o Capture metadata up-front, not after the fact
o Use ontologies where possible
o Don’t repeat mistakes of projects that ignored this advice
 Invertebrate Ontologies at a nascent stage
o This is an opportunity! Get involved!
CONCLUSIONS
 Monarch
o Melissa A Haendel
o Nicole Washington
o Sebastian Kohler
o Harry Hochheiser
o Maryann Martone
o Suzanna Lewis
o Damian Smedley
o Peter Robinson
o William Bone
o Jeremy Nguyen-Xuan
ACKNOWLEDGMENTS
 Uberon
o Frederic Bastian
o Ann Niknejad
o Marc Robinson-Rechavi
o Todd Vision
o Jim Balhoff
o Paul Sereno
o Nizar Ibrahim
o Alex Dececchi
o Yvonne Bradford
o Terry Hayamizu
o Robert Druzinsky
 NSF Phenotype RCN
o Paula Mabee
o Suzanna Lewis
o Eva Huala
o Andy Deans
o Erik Segerdell
o Robert Thacker
o Eric Edsinger
o Matt Yoder
o Istvan Miko
o David Osumi-Sutherland
Toward synthesizing our knowledge of morphology: using ontologies and machine reasoning to extract presence/absence
evolutionary phenotypes across studies. Dececchi TA et al. https://peerj.com/preprints/807/
FORWARD GENOMICS
http://bejerano.stanford.edu/phenotree/public/html/ Hiller et al. 2012 Cell Reports


Editor's Notes

  • #2 http://monarchinitiative.org http://geneontology.org http://phenoscape.org
  • #3 As a bioinformatician I like genomes: there is a certain well-behaved regularity about them. We could be looking at any animal genome here.
  • #4 Because of this regularity we can compare the structure of genome architecture across vast differences. Even if we don’t have conservation at the level of sequence or gene structure, there is a higher-order conservation: for example, genes are always made of exons, introns and UTRs.
  • #5 What if we want to do the same thing using phenotypes as our unit of comparison?
  • #6 Imagine we had a matrix. What do we put as the columns of the matrix?
  • #7 The challenge is the diversity of structures across animals, and the diversity in their organization, as seen for example in mouths and putative homologs. How do we capture this diversity? Image notes: Peytoia nathorsti, an anomalocaridid. Amphipholis squamata (http://invert-embryo.blogspot.de/2012_04_01_archive.html). Oral disc of sea lamprey (https://en.wikipedia.org/wiki/Silver_lamprey). An individual zooid of the colonial ectoproct Bugula; image based on an illustration from the BIODIDAC image library. From http://www.jbiomedsem.com/content/5/1/34/figure/F3
  • #8 We might try free text. It works for humans and it’s unendingly expressive, but it’s largely opaque to machine processing. The term ‘arm’ is especially ambiguous. We would end up with an enormous list full of redundancy, with no structure to it. Fortunately there is a better way.
  • #9 This example shows a structured graph-based representation of some appendage types found in a cephalopod
  • #10 Expression data from http://www.nature.com/nature/journal/v424/n6952/fig_tab/nature01872_F3.html
  • #11 Expression data from http://www.nature.com/nature/journal/v424/n6952/fig_tab/nature01872_F3.html
  • #12 Expression data from http://www.nature.com/nature/journal/v424/n6952/fig_tab/nature01872_F3.html
  • #14 The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data.
  • #15 Mention QfO
  • #16 http://www.nature.com/ng/journal/v45/n6/full/ng.2615.html#f4 The GO is useful for analyzing any kind of experiment that outputs gene sets, but sometimes we need more.
  • #18 ‘Big data’ projects. FANTOM: Functional ANnoTation Of the Mammalian Genome. The FANTOM5 project finds general rules for how cells change from one cell type to another. The FANTOM5 project examines how our genome encodes the fantastic diversity of cell types that make up a human.
  • #19 What are the developmental and genetic bases of evolutionary differences in morphology across species? Currently it is difficult to approach this question due to a lack of computational tools that allow researchers to integrate developmental genetic and comparative morphological/anatomical data.
  • #20 Bgee is a database to retrieve and compare gene expression patterns between animal species
  • #21 Ligand-gated ion channel, ligand unknown
  • #24 Search for brain returns results even when free text doesn’t mention brain. The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC’s use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects.
  • #26 “Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic (non-coding, not affecting splicing, synonymous or missense mutations annotated as non-pathogenic by prediction algorithms).”
  • #27 Our extension: use phenotypes
  • #28 The profile on the left looks like free text but is actually structured descriptions using ontologies like Uberon. Color indicates how close the match is in phenomespace.
  • #30 In genomics, metadata is often anything that isn’t sequence data. But the metadata is valuable! Metadata is a love letter to the future, especially if you have a variety of tissue types or developmental stages.
  • #31 In genomics, metadata is often anything that isn’t sequence data. But the metadata is valuable! Especially if you have a variety of tissue types or developmental stages
  • #34 GO and phenotype databases currently lack experimental annotations for invertebrates other than D. melanogaster and C. elegans.
  • #38 Figure 4. A) Bird’s Eye View in Mesquite (Maddison and Maddison 2011) showing inferred (green), asserted (blue), and missing (white) data in the synthetic supermatrix for the first 48 taxa (of 1,051) and all 639 characters.