TAIR -Using biological ontologies to accelerate progress in plant biology research

TAIR: A Sustainable Community Resource
for Arabidopsis Research
International Conference on Arabidopsis Research (ICAR 2016), GyeongJu, Korea

1. TAIR: a sustainable community resource for Arabidopsis
research (Eva Huala)
2. Using biological ontologies to accelerate progress in plant
biology research (Donghui Li)
3. Community annotation: making your data and publication
more discoverable (Donghui Li)

Using biological ontologies to accelerate
progress in plant biology research
Donghui Li
TAIR/Phoenix Bioinformatics

Every year, an average of:
• Over 3000 Arabidopsis research articles are added
• Over 2000 papers are associated with genes
• Over 400 articles have gene function, expression or
phenotype data extracted
• Over 5000 experiment-based annotations are added
using controlled vocabularies (GO and PO ontologies)
Producing a ‘gold standard’ annotated reference plant genome
Highly structured, searchable, computable
functional annotations

Outline
• How do we use biological ontologies to annotate Arabidopsis
gene function?
• How to read/interpret annotations?
• What can you do with these annotations?

Why do we need ontologies?
Inconsistency in free text:
• Different names for the same concept (e.g. translation, protein synthesis)
• Same name for different concepts (e.g. bud initiation)

A Gene Ontology (GO) term
Accession: GO:0006412
Name: translation
Ontology: biological_process
Synonyms: protein anabolism, protein biosynthesis, protein biosynthetic
process, protein formation, protein synthesis, protein translation
Definition: The cellular metabolic process in which a protein is formed,
using the sequence of a mature mRNA molecule to specify the
sequence of amino acids in a polypeptide chain. Translation is
mediated by the ribosome, and begins with the formation of a ternary
complex between aminoacylated initiator methionine tRNA, GTP, and
initiation factor 2, which subsequently associates with the small subunit
of the ribosome and an mRNA. Translation ends with the release of a
polypeptide chain from the ribosome. Source: GOC:go_curators

Gene Ontology (GO)
• molecular function: catalytic / binding activities
  (e.g. kinase activity, DNA binding activity)
• biological process: biological goal or objective
  (e.g. protein translation, mitosis)
• cellular component: location or complex
  (e.g. nucleus, ribosome, proteasome)
More info at www.geneontology.org

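Terms in an ontology are connected
[Figure: ontology graph; edges are is_a and part_of relationships]

Annotation at different depths of the ontology
[Figure: ontology graph; edges are is_a and part_of relationships]

Retrieval at higher nodes in the ontology
[Figure: ontology graph; edges are is_a and part_of relationships]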
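
The idea of annotating at a specific term and retrieving at a higher node can be sketched in a few lines of Python. This is a minimal illustration, not the way TAIR or the GO tools implement it: the tiny graph, its edges, and the gene names (geneA, geneB, geneC) are assumptions made up for the example.

```python
# Toy ontology fragment: each term lists its parents as (relation, parent).
# Edges are illustrative assumptions, not a copy of the real GO graph.
PARENTS = {
    "translational initiation": [("part_of", "translation")],
    "translation": [("is_a", "cellular metabolic process")],
    "cellular metabolic process": [("is_a", "biological_process")],
    "biological_process": [],
}

# Hypothetical direct annotations (gene names are placeholders).
ANNOTATIONS = {
    "geneA": {"translational initiation"},
    "geneB": {"translation"},
    "geneC": {"biological_process"},
}


def ancestors(term, parents=PARENTS):
    """Return all terms reachable from `term` via is_a / part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for(term, annotations=ANNOTATIONS):
    """Genes annotated to `term` itself or to any more specific term."""
    hits = set()
    for gene, terms in annotations.items():
        if any(t == term or term in ancestors(t) for t in terms):
            hits.add(gene)
    return sorted(hits)


print(genes_for("translation"))
```

Querying the higher node "translation" returns both the gene annotated directly to it and the gene annotated to the more specific child term, which is why annotating as deeply as the evidence allows does not hide genes from broader searches.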

Anatomy of a GO annotation
Gene product | GO term | Evidence code | Reference
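
As a rough sketch, the four parts of an annotation map naturally onto a small record type. The class name, field names, locus identifier and PMID below are placeholders chosen for illustration; they are not TAIR's schema or a real annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    gene_product: str    # e.g. an AGI locus identifier
    go_id: str           # GO accession
    go_name: str         # GO term name
    evidence_code: str   # e.g. IDA, IMP, IEA
    reference: str       # publication supporting the annotation

    def summary(self):
        return (f"{self.gene_product} -> {self.go_name} ({self.go_id}) "
                f"[{self.evidence_code}; {self.reference}]")


# Placeholder values only: AT0G00000 and PMID:0000000 are not real records.
example = GOAnnotation("AT0G00000", "GO:0006412", "translation",
                       "IDA", "PMID:0000000")
print(example.summary())
```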

Commonly used evidence codes (examples in parentheses)
Experimental evidence codes (EXP)
IDA Inferred from Direct Assay (enzyme assays, in situ hybridization)
IMP Inferred from Mutant Phenotype (analysis of visible trait)
IPI Inferred from Physical Interaction (yeast-2-hybrid)
IEP Inferred from Expression Pattern (RT-PCR, Western blot)
IGI Inferred from Genetic Interaction (double mutant analysis)
Computational analysis evidence codes (non-EXP)
ISS Inferred from Sequence or Structural Similarity (based on published sequence alignment)
IEA Inferred from Electronic Annotation (InterPro2GO)
http://geneontology.org/page/guide-go-evidence-codes
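
When filtering annotations, it can help to group evidence codes the way this slide does. The sketch below covers only the codes listed here; GO defines further codes, which this example simply labels as "other". The function name and group labels are assumptions for illustration.

```python
# Codes listed on the slide, grouped as on the slide.
EXPERIMENTAL = {"IDA", "IMP", "IPI", "IEP", "IGI"}
COMPUTATIONAL = {"ISS", "IEA"}


def evidence_group(code):
    """Classify an evidence code as EXP or non-EXP (labels are illustrative)."""
    code = code.strip().upper()
    if code in EXPERIMENTAL:
        return "EXP"
    if code in COMPUTATIONAL:
        return "non-EXP (computational)"
    # Any code not shown on the slide falls through to this bucket.
    return "non-EXP (other)"


for code in ("IDA", "IEA", "TAS"):
    print(code, evidence_group(code))
```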

Summary of Arabidopsis GO annotations in TAIR

Evidence code    Annotation counts    %
EXP (all)                   95,435    34.7
  IDA                       56,271    20.4
  IEP                        6,651     2.4
  IGI                        4,286     1.6
  IMP                       19,441     7.1
  IPI                        8,786     3.2
Non-EXP                    179,801    65.3
Total                      275,236    100

Notes: 9,186 unique publications used in EXP annotations
Based on TAIR ATH_GO_GOSLIM.txt 2016-06-05
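
A quick consistency check on the summary figures: the shares can be recomputed from the annotation counts. The counts are taken from the slide; the variable and function names are illustrative.

```python
# Annotation counts per experimental evidence code, as given on the slide.
EXP_COUNTS = {"IDA": 56271, "IEP": 6651, "IGI": 4286, "IMP": 19441, "IPI": 8786}
NON_EXP_COUNT = 179801

exp_total = sum(EXP_COUNTS.values())
grand_total = exp_total + NON_EXP_COUNT


def share(count, total=grand_total):
    """Percentage of all annotations, rounded to one decimal place."""
    return round(100 * count / total, 1)


print("EXP", exp_total, share(exp_total))
for code, count in EXP_COUNTS.items():
    print(code, count, share(count))
print("Non-EXP", NON_EXP_COUNT, share(NON_EXP_COUNT))
print("Total", grand_total)
```

From these counts the experimental share comes out at 34.7% and the non-experimental share at 65.3%.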

Summary of Arabidopsis GO annotations in TAIR
Based on annotation data as of May 24, 2016

Application: What can you do with TAIR GO/PO annotations?
- Query gene function information
- GO annotation projection
- Functional categorization
- Term enrichment

Get annotations for individual genes from the TAIR locus page
Gene Ontology
annotations
Plant Ontology
annotations

Get annotations for individual genes from the TAIR locus page
Other functional information:
Gene summary
Polymorphism
Phenotype
Publications
Gene symbols

Get annotations for a list of genes

Find genes annotated to a GO/PO term

Download all GO/PO annotations

GO annotations by species
[Chart: GO annotation counts for Rat, Human, Mouse, Arabidopsis, Zebrafish, Worm, Chicken, Fly, Yeast, Rice and E. coli]
Source: http://geneontology.org/page/current-go-statistics 2016-06-03

Annotating new plant genomes by projecting GO terms from Arabidopsis
onto other non-model plant species based on gene orthology
EnsemblPlants Compara
• Use the Compara pipeline to build orthology
• Automatically transfer GO annotations to plant orthologs
Rules
• orthologs must share at least 40% peptide identity with each other
• only GO annotations with an evidence type of IDA, IEP, IGI,
  IMP or IPI are projected
• no annotations with a 'NOT' qualifier are projected
• annotations to the GO:0005515 protein binding term are not
  projected
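
The four projection rules can be read as a single filter. The following is a minimal sketch of that filter, not the EnsemblPlants Compara code; the `Annotation` class, the `can_project` function, and the pipe-separated qualifier format are assumptions for illustration.

```python
from dataclasses import dataclass

PROJECTABLE_EVIDENCE = {"IDA", "IEP", "IGI", "IMP", "IPI"}
PROTEIN_BINDING = "GO:0005515"
MIN_PERCENT_IDENTITY = 40.0


@dataclass(frozen=True)
class Annotation:
    go_id: str
    evidence: str
    qualifier: str = ""   # assumed pipe-separated, e.g. "NOT"


def can_project(annotation, percent_identity):
    """Apply the four rules from the slide to one annotation/ortholog pair."""
    qualifiers = annotation.qualifier.upper().split("|")
    return (
        percent_identity >= MIN_PERCENT_IDENTITY
        and annotation.evidence in PROJECTABLE_EVIDENCE
        and "NOT" not in qualifiers
        and annotation.go_id != PROTEIN_BINDING
    )


candidates = [
    (Annotation("GO:0006412", "IDA"), 72.0),          # passes all rules
    (Annotation("GO:0006412", "IEA"), 72.0),          # not experimental
    (Annotation("GO:0006412", "IMP", "NOT"), 72.0),   # NOT qualifier
    (Annotation("GO:0005515", "IPI"), 72.0),          # protein binding
    (Annotation("GO:0006412", "IDA"), 35.0),          # identity too low
]
for annotation, identity in candidates:
    print(annotation.go_id, annotation.evidence, identity,
          can_project(annotation, identity))
```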

TAIR’s functional categorization tool

[Charts: functional categorization by cellular component, molecular function and biological process]

Term enrichment analysis
Overrepresentation statistical test:
In my list of genes, are any functional classes (for
example a GO process) found more often than
expected when compared with the reference list?
[Chart: gene count per functional category, biological process]

GOC provides a term enrichment tool powered by PANTHER
pantherdb.org geneontology.org

[Screenshot: enrichment tool input form; callouts: Input 1, Input 2, ID mapping, use up-to-date annotations]

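Output
Expected count for a term = share of reference genes annotated to the term × number of genes in the input list:
168/26684 = 0.63%
0.63% × 442 = 2.78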
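
The expected-count arithmetic above, together with one common way to test for overrepresentation (a hypergeometric upper tail), can be sketched as follows. The slides do not state which statistical test the PANTHER tool applies, so the hypergeometric test here is an assumption chosen for illustration, and the observed count of 10 is a made-up value.

```python
from math import comb


def expected_count(ref_with_term, ref_total, list_size):
    """Genes expected to carry the term if the list were a random sample."""
    return ref_with_term / ref_total * list_size


def hypergeom_upper_tail(observed, ref_total, ref_with_term, list_size):
    """P(X >= observed) when drawing list_size genes without replacement."""
    top = min(ref_with_term, list_size)
    favourable = sum(
        comb(ref_with_term, i) * comb(ref_total - ref_with_term, list_size - i)
        for i in range(observed, top + 1)
    )
    return favourable / comb(ref_total, list_size)


# Numbers from the slide: 168 of 26684 reference genes, 442 genes in the list.
expected = expected_count(168, 26684, 442)
print(f"expected: {expected:.2f}")

# Hypothetical observed count, for illustration only.
observed = 10
p_value = hypergeom_upper_tail(observed, 26684, 168, 442)
print(f"fold enrichment: {observed / expected:.2f}, p = {p_value:.2e}")
```

A real analysis would also correct for testing many terms at once, which this sketch leaves out.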

Model for the regulation of long-term drought
responses in Q. suber root
Model for ABA-dependent drought response in cork oak

Summary
1 The main activity of TAIR curators is producing a ‘gold standard’
annotated reference genome dataset by integrating
experimental data from the research literature. New annotations
are constantly added.
2 One common use of TAIR is to infer the function of genes in
agriculturally important species based on orthology to
Arabidopsis genes.
3 TAIR’s annotations are used in applications such as functional
categorization and term enrichment. It is important to use the latest
annotation file from TAIR.

Community annotation: making your data and
publication more discoverable
Donghui Li

Why should everyone participate?
Increased exposure of your work

Tips to make your research more discoverable
1. Pre-publication: register your gene symbol to minimize
accidental duplications in gene nomenclature
2. Preparing your manuscript: include AGI locus identifiers
3. Post-publication: submit your annotation to us (any journal)

AT1G56650 PAP1 PRODUCTION OF ANTHOCYANIN PIGMENT 1
AT2G01180 PAP1 PHOSPHATIDIC ACID PHOSPHATASE 1
AT2G27190 PAP1 PURPLE ACID PHOSPHATASE 1
AT3G16500 PAP1 PHYTOCHROME-ASSOCIATED PROTEIN 1
Gene name duplication makes it harder to find the
right gene
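
The PAP1 example shows why a symbol alone is not a safe key while an AGI locus identifier is. A minimal sketch of detecting ambiguous symbols in a gene list follows; the function names are illustrative, and the regular expression reflects the usual AGI locus format (AT, a chromosome code of 1 to 5, C or M, then G and five digits).

```python
import re
from collections import defaultdict

# AGI locus identifier, e.g. AT1G56650.
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$")

# The four loci that share the symbol PAP1 (from the slide).
SYMBOLS = [
    ("AT1G56650", "PAP1"),
    ("AT2G01180", "PAP1"),
    ("AT2G27190", "PAP1"),
    ("AT3G16500", "PAP1"),
]


def loci_by_symbol(pairs):
    """Map each gene symbol to every locus that uses it."""
    index = defaultdict(list)
    for locus, symbol in pairs:
        if not AGI_PATTERN.match(locus):
            raise ValueError(f"not an AGI locus identifier: {locus}")
        index[symbol.upper()].append(locus)
    return dict(index)


def ambiguous_symbols(pairs):
    """Symbols that point to more than one locus."""
    return {s: loci for s, loci in loci_by_symbol(pairs).items() if len(loci) > 1}


print(ambiguous_symbols(SYMBOLS))
```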

Conflicting nomenclature and errors in publications are not
uncommon
Plant Cell Physiol. 2010 Jun;51(6):866-76
Plant Cell Physiol. Jun;51(6):877-83

Always include AGI codes
Mandatory requirement for publishing in some journals
PMID:21447788

Requires a login so we can credit the submitter;
no subscription required
Video tutorial

Provide ‘evidence with’ as comments

Community feedback
• “I do profit a lot from the data on TAIR, thus
this submission is a small contribution to
extend the data present on TAIR.”
• “I gratefully did it [data submission] because I
already benefit from similar information for
other genes.”

Some (but not all) annotations have supporting information in
the Evidence with field
IPI - protein interacting partner
IGI - other mutated loci in a double or triple mutant
AT3G25070
AT2G32700
Pay attention to the NOT qualifier in relationship type
