Translating research data into Gene Ontology annotations


Overview of the gene ontology annotation semantics and best practices for annotation.

Published in: Science

  1. 1. Translating research data into Gene Ontology annotations
     Pascale Gaudet, SIB – Swiss Institute of Bioinformatics, GO Consortium
  2. 2. Gene Ontology Consortium: what we provide
     Ontology + Annotations = Model of biology
     Ontology: a structured representation of biology, composed of:
     • Classes
     • Relations
     • Definitions
     Annotations: statements about the functions of specific gene products. 3 aspects:
     • Molecular function
     • Biological process
     • Cellular component
     Examples:
     • IGHA1 (Immunoglobulin heavy constant alpha 1): antigen binding; adaptive immune response; extracellular
     • QARS (Gln tRNA synthetase): glutamine-tRNA ligase activity; translation; cytoplasm
     Model of biology: representation of current knowledge in a manner that is:
     • Human understandable
     • Machine computable
  3. 3. GO “annotations”
     • An annotation is a statement linking a gene to some aspect of its function (a GO ontology term)
     • Each annotation is based on some evidence, recorded as part of the annotation:
         - Evidence code (type of evidence)
         - Reference (published journal article)
     Examples:
     • Annotation 1: INSR + ‘receptor activity’
     • Annotation 2: INSR + ‘plasma membrane’
     • Annotation 3: INSR + ‘insulin receptor signaling pathway’
  4. 4. Semantics of a GO annotation
     The association of a GO class with a gene product is a statement whose meaning depends on the aspect:
     • Molecular function: molecular activities of gene products
     • Cellular component: where gene products are active
     • Biological process: pathways and larger processes made up of the activities of multiple gene products
     In other words, annotations represent the normal, in vivo biological role of gene products
  5. 5. How are annotations generated?
     • Manual, literature-based (> 500,000 annotations): an expert reviews the literature and assigns functions, processes and cellular components to gene products
     • Manual, sequence-based (> 3M annotations): an expert analyses a sequence and makes a prediction concerning the gene function based on known functions of related sequences. The predictions can be based on the known function of evolutionarily related sequences (phylogenetic relationships)
     • Algorithmic, unreviewed (> 65M annotations): a computer program analyses a sequence and makes a prediction based on some decision criteria, for example:
         - protein domain (InterPro2GO)
         - sequence similarity (BLAST2GO)
  6. 6. Evidence types
     Chibucos MC, Siegele DA, Hu JC, Giglio M. (2017) Evidence and conclusion ontology. PMID: 27812948
     Manual, literature-based:
     • EXP: experimental evidence
     • IDA: inferred from direct assay
     • IPI: inferred from physical interaction
     • IMP: inferred from mutant phenotype
     Manual, sequence-based:
     • ISS: inferred from sequence similarity
     • ISO: inferred from sequence orthology
     • IBA: inferred from biological aspect of ancestor
     Algorithmic (unreviewed):
     • IEA: inferred from electronic annotation
  7. 7. Who produces GO annotations?
     • Model organism databases (SGD, FlyBase, WormBase, MGI, etc.)
     • Generalist databases, e.g. UniProtKB, IntAct
     • Domain-specific projects: Cardiovascular project (UCL), synapse project (VU), etc.
     • Anyone who wishes to contribute their expertise and data to the project
  8. 8. Best practices for generating literature-based GO annotations
     • Ensure consistency of usage across a broad consortium of contributors
     • Improve inferencing capabilities
  9. 9. Focus on the research hypothesis
     • Use prior knowledge to understand the hypothesis being tested and its relation to the experimental observation
     DFFB (O76075)
         - Known roles: DNase
         - Hypothesis: the nuclease activity of DFFB is required for nuclear DNA fragmentation during apoptosis
         - Assay result: apoptotic DNA fragmentation increased in the presence of DFFB
         - Conclusion for GO: DFFB mediates nuclear DNA fragmentation during apoptosis = apoptotic DNA fragmentation (GO:0006309)
     FOXL2 (P58012)
         - Known roles: transcription factor
         - Hypothesis: mutations in FOXL2 are known to cause premature ovarian failure, which may be due to increased apoptosis
         - Assay result: apoptotic DNA fragmentation increased in the presence of FOXL2
         - Conclusion for GO: FOXL2 increases the rate of apoptosis = positive regulation of apoptotic process (GO:0043065)
  10. 10. Annotate the conclusion, not the assay
      1) Rubidium is often used to assay potassium transport, because the radioactive form is more readily available
          - the physiologically relevant substrate is potassium
      2) Protein kinases are often tested with non-physiologically relevant substrates, such as histone
          - if the authors do not discuss the physiological relevance, one cannot annotate the substrate
  11. 11. On the in vivo relevance of phenotypes
      • Phenotypes can help understand the function of proteins
      • Phenotypes can provide insights into mechanisms leading to disease
      • The scope of the GO, though, is to capture the normal function of proteins
      Indirect effects of a mutation:
          - RNA polymerase affects essentially all cellular processes (cell proliferation, development, etc.) but does not mediate these processes
      Lack of hypothesis for a role of a protein in a process:
          - Knockdown of Tmem234 in zebrafish results in defects in pronephric glomerulus formation. Annotation by IMP to glomerulus formation is not supported by any cellular/molecular data
  12. 12. Get the wider perspective
      • Favor a gene-by-gene or pathway-by-pathway approach for curation rather than paper-by-paper
      • Read recent publications
      • Remove incorrect annotations based on invalidated hypotheses
  13. 13. Guidelines for high-quality annotations
      • Annotate the conclusion of the experiment
      • Use the biological context to interpret the experiments
      • Carefully select publications. Read recent publications
      • Ensure consistency with existing annotations
      • Keep annotations up to date: remove obsolete annotations
  14. 14. Other approaches for quality control
      • Annotation consistency exercises
      • Taxonomic constraints
      • Co-occurrence of annotations
      • Phylogenetic annotations
      • User feedback
          - from GO website
          - from PubMed
          - from databases
  15. 15. GO annotations in PubMed
  16. 16. Annotations for a paper
  17. 17. This talk was based upon
  18. 18. Acknowledgments
      • GO PIs: Judy Blake, Mike Cherry, Suzanna Lewis, Paul Sternberg, Paul Thomas
      • GO Handbook contributors: Christophe Dessimoz, Jim Hu, Nives Skunca, Sylvain Poux
      • Funding: NIH HG002273 (GO)
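
Slides 3 and 6 describe an annotation as a gene linked to a GO term, backed by an evidence code and a reference. The following is a minimal Python sketch of that structure, not the GO Consortium's own data model or file format. The class name `Annotation`, its field names, and the grouping of evidence codes by generation method are assumptions made for illustration. The slides give no evidence codes or references for the INSR examples, so the ones used here are placeholders.

```python
from dataclasses import dataclass

# Generation methods named on slides 5 and 6.
MANUAL_LITERATURE = "manual, literature-based"
MANUAL_SEQUENCE = "manual, sequence-based"
ALGORITHMIC = "algorithmic (unreviewed)"

# Evidence codes from slide 6: code -> (description, generation method).
# Assumption: the assignment of each code to a method is this sketch's
# reading of the slide layout.
EVIDENCE_CODES = {
    "EXP": ("experimental evidence", MANUAL_LITERATURE),
    "IDA": ("inferred from direct assay", MANUAL_LITERATURE),
    "IPI": ("inferred from physical interaction", MANUAL_LITERATURE),
    "IMP": ("inferred from mutant phenotype", MANUAL_LITERATURE),
    "ISS": ("inferred from sequence similarity", MANUAL_SEQUENCE),
    "ISO": ("inferred from sequence orthology", MANUAL_SEQUENCE),
    "IBA": ("inferred from biological aspect of ancestor", MANUAL_SEQUENCE),
    "IEA": ("inferred from electronic annotation", ALGORITHMIC),
}

# The three aspects of the ontology (slide 2).
ASPECTS = ("molecular function", "biological process", "cellular component")


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one annotation: gene + GO term + evidence."""
    gene: str
    go_term: str
    aspect: str
    evidence_code: str
    reference: str

    def __post_init__(self):
        # Reject aspects and evidence codes that the slides do not list.
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect!r}")
        if self.evidence_code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence_code!r}")

    def generation_method(self) -> str:
        """Return how an annotation with this evidence code was produced."""
        return EVIDENCE_CODES[self.evidence_code][1]


# Placeholder only: the slides do not cite a reference for these examples.
PLACEHOLDER_REF = "PMID:placeholder"

# The three INSR examples from slide 3. The evidence codes are illustrative
# placeholders, not the codes of any real INSR annotation.
examples = [
    Annotation("INSR", "receptor activity", "molecular function",
               "IDA", PLACEHOLDER_REF),
    Annotation("INSR", "plasma membrane", "cellular component",
               "IDA", PLACEHOLDER_REF),
    Annotation("INSR", "insulin receptor signaling pathway",
               "biological process", "IMP", PLACEHOLDER_REF),
]

for a in examples:
    print(a.gene, a.go_term, a.aspect, a.evidence_code,
          a.generation_method(), sep="\t")
```

Storing the evidence code and reference on every record means each statement can be traced to its source, which is what allows annotations based on invalidated hypotheses to be found and removed later.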