• Like


Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

Uploaded on

Event: Plant and Animal Genomes conference 2012 …

Event: Plant and Animal Genomes conference 2012
Speaker: Rachael Huntley

The Gene Ontology (GO) is a well-established, structured vocabulary used in the functional annotation of gene products. GO terms are used to replace the multiple nomenclatures used by scientific databases that can hamper data integration. Currently, GO consists of more than 35,000 terms describing the molecular function, biological process and subcellular location of a gene product in a generic cell. The UniProt-Gene Ontology Annotation (UniProt-GOA) database1 provides high-quality manual and electronic GO annotations to proteins within UniProt. By annotating well-studied proteins with GO terms and transferring this knowledge to less well-studied and novel proteins that are highly similar, we offer a valuable contribution to the understanding of all proteomes. UniProt-GOA provides annotated entries for over 387,000 species and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. Annotation files for various proteomes are released each month, including human, mouse, rat, zebrafish, cow, chicken, dog, pig, Arabidopsis and Dictyostelium, as well as a file for the multiple species within UniProt. The UniProt-GOA dataset can be queried through our user-friendly QuickGO browser2 or downloaded in a parsable format via the EBI3 and GO Consortium FTP4 sites. The UniProt-GOA dataset has increasingly been integrated into tools that aid in the analysis of large datasets resulting from high-throughput experiments thus assisting researchers in biological interpretation of their results. The annotations produced by UniProt-GOA are additionally cross-referenced in databases such as Ensembl and NCBI Entrez Gene.

1 http://www.ebi.ac.uk/GOA

2 http://www.ebi.ac.uk/QuickGO

3 ftp://ftp.ebi.ac.uk/pub/databases/GO/goa

4 ftp://ftp.geneontology.org/pub/go/gene-associations

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads


Total Views
On Slideshare
From Embeds
Number of Embeds



Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

    No notes for slide
  • The English language is not precise
  • Electronic annotations are important for non-model species as these do not have a dedicated manual annotation effort
  • Genes differentially expressed during parasitoid attack versus control


  • 1.
      • Rachael Huntley
      • UniProt-GOA
      • EBI
      • PAGXX
      • January 2012
    Introduction to the Gene Ontology and GO annotation resources
  • 2. Why do we need GO?
  • 3. Reasons for the Gene Ontology www.geneontology.org
    • Inconsistency in English language
  • 4. Inconsistency in English languauge
    • Same name for different concepts
    or ?? Cell
  • 5.
    • Comparison is difficult – in particular across species or across databases
    • Just one reason why the Gene Ontology (GO) is is needed…
    • Different names for the same concept
    Eggplant Aubergine Brinjal Melongene Same for biological concepts
  • 6. Reasons for the Gene Ontology www.geneontology.org
    • Inconsistency in English language
    • Increasing amounts of biological data available
    • Increasing amounts of biological data to come
  • 7. Search on ‘DNA repair’... get almost 65,000 results Increasing amounts of biological data available Expansion of sequence information
  • 8. Reasons for the Gene Ontology www.geneontology.org
    • Inconsistency in English language
    • Large datasets need to be interpreted quickly
    • Increasing amounts of biological data available
    • Increasing amounts of biological data to come
  • 9.
    • A way to capture
    • biological knowledge for individual gene products
    • in a written and
    • computable form
    The Gene Ontology
    • A set of concepts
    • and their relationships
    • to each other arranged
    • as a hierarchy
    www.ebi.ac.uk/QuickGO Less specific concepts More specific concepts
  • 10. The Concepts in GO 1. Molecular Function 2. Biological Process 3. Cellular Component An elemental activity or task or job
    • protein kinase activity
    • insulin receptor activity
    A commonly recognised series of events
    • cell division
    Where a gene product is located
    • mitochondrion
    • mitochondrial matrix
    • mitochondrial inner membrane
  • 11. Anatomy of a GO term Unique identifier Term name Definition Synonyms Cross-references
  • 12. Ontology structure
    • Directed acyclic graph
    Terms can have more than one parent
    • Terms are linked by
    • relationships
    is_a part_of regulates (and +/- regulates) www.ebi.ac.uk/QuickGO occurs_in has_part These relationships allow for complex analysis of large datasets
  • 13. http://www.geneontology.org Reactome
  • 14. • Compile the ontologies - currently over 35,000 terms - constantly increasing and improving • Annotate gene products using ontology terms - around 30 groups provide annotations • Provide a public resource of data and tools - regular releases of annotations - tools for browsing/querying annotations and editing the ontology Aims of the GO project
  • 15. GO Annotation
  • 16. UniProt-Gene Ontology Annotation (UniProt-GOA) database at the EBI
    • Largest open-source contributor of annotations to GO
    • Provides annotation for more than 397,000 species
    • Our priority is to annotate the human proteome
  • 17. A GO annotation is … … a statement that a gene product; 1. has a particular molecular function or is involved in a particular biological process or is located within a certain cellular component 2. as determined by a particular method 3. as described in a particular reference P00505 Accession Name GO ID GO term name Reference Evidence code IDA PMID:2731362 aspartate transaminase activity GO:0004069 GOT2
  • 18. Electronic Annotation Manual Annotation UniProt-GOA incorporates annotations made using two methods
    • Quick way of producing large numbers of annotations
    • Annotations use less-specific GO terms
    • Time-consuming process producing lower numbers
    • of annotations
    • Annotations tend to use very specific GO terms
    • Only source of annotation for many non-model organism species
  • 19. 1. Mapping of external concepts to GO terms e.g. InterPro2GO, UniProt Keyword2GO, Enzyme Commission2GO Electronic annotation methods GO:0004715 ; non-membrane spanning protein tyrosine kinase activity
  • 20. Annotations are high-quality and have an explanation of the method (GO_REF) Macaque Mouse Dog Cow Guinea Pig Chimpanzee Rat Chicken Ensembl compara 2. Automatic transfer of manual annotations to orthologs ...and more e.g. Human Arabidopsis Rice Brachypodium Maize Poplar Grape … and more Ensembl compara Electronic annotation methods
  • 21. Manual annotation by GOA
    • High–quality, specific annotations made using:
          • Full text peer-reviewed papers
          • A range of evidence codes to categorise the types of evidence found in a paper
          • e.g. IDA, IMP, IPI
  • 22. * Includes manual annotations integrated from external model organism and specialist groups 920,557 Manual annotations* 110,247,289 Electronic annotations Dec 2011 Statistics Number of annotations in UniProt-GOA database
  • 23. How to access and use GO annotation data
  • 24. Where can you find annotations? UniProtKB Ensembl Entrez gene
  • 25. GO Consortium website Gene Association Files 17 column files containing all information for each annotation http://www.ebi.ac.uk/GOA/downloads.html UniProt-GOA website Numerous species-specific files
  • 26. GO browsers
  • 27. http://www.ebi.ac.uk/QuickGO The EBI's QuickGO browser Search GO terms or proteins Find sets of GO annotations
  • 28.
    • Access gene product functional information
    • Analyse high-throughput genomic or proteomic datasets
    • Validation of experimental techniques
    • Get a broad overview of a proteome
    • Obtain functional information for novel gene products  
    How scientists use the GO Some examples…
  • 29. Term enrichment
    • Most popular type of GO analysis
    • Determines which GO terms are more often associated with a specified list of genes/proteins compared with a control list or rest of genome
    • Many tools available to do this analysis
    • User must decide which is best for their analysis
  • 30. Numerous Third Party Tools www.geneontology.org/GO.tools.shtml
  • 31. Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI. MicroArray data analysis Analysis of high-throughput genomic datasets attacked time control Puparial adhesion Molting cycle Hemocyanin Defense response Immune response Response to stimulus Toll regulated genes JAK-STAT regulated genes Immune response Toll regulated genes Amino acid catabolism Lipid metobolism Peptidase activity Protein catabolism Immune response
  • 32. Validation of experimental techniques (Cao et al ., Journal of Proteome Research 2006) Rat liver plasma membrane isolation
  • 33. Annotating novel sequences
    • Can use BLAST queries to find similar sequences with
    • GO annotation which can be transferred to the new sequence
    • Two tools currently available;
    • AmiGO BLAST – searches the GO Consortium database
    • BLAST2GO – searches the NCBI database
  • 34. Using the GO to provide a functional overview for a large dataset
    • Many GO analysis tools use GO slims to give a broad
    • overview of the dataset
    • GO slims are cut-down versions of the GO and
    • contain a subset of the terms in the whole GO
    • GO slims usually contain less-specialised GO terms
  • 35. Slimming the GO using the ‘true path rule’ Many gene products are associated with a large number of descriptive, leaf GO nodes:
  • 36. Slimming the GO using the ‘true path rule’ … however annotations can be mapped up to a smaller set of parent GO terms:
  • 37. GO slims or you can make your own using; Custom slims are available for download; http://www.geneontology.org/GO.slims.shtml
    • AmiGO's GO slimmer
    • QuickGO
    http://www.ebi.ac.uk/QuickGO http://amigo.geneontology.org/cgi-bin/amigo/slimmer
  • 38. www.ebi.ac.uk/QuickGO Map-up annotations with GO slims The EBI's QuickGO browser Search GO terms or proteins Find sets of GO annotations Questions on how to use QuickGO? Contact goa@ebi.ac.uk
  • 39. The UniProt-GOA group Curators: Software developer: Team leaders: Emily Dimmer Rachael Huntley Rolf Apweiler Email: goa@ebi.ac.uk http://www.ebi.ac.uk/GOA Claire O ’Donovan Tony Sawford Yasmin Alam-Faruque Prudence Mutowo
  • 40. Acknowledgements GO Consortium Members of; UniProtKB InterPro IntAct HAMAP Ensembl Ensembl Genomes Funding National Human Genome Research Institute (NHGRI) Kidney Research UK British Heart Foundation EMBL