The document summarizes a workshop on using the Arabidopsis Information Resource (TAIR) database. TAIR is a curated database for the model organism Arabidopsis thaliana that contains gene functional annotations, phenotypes, and expression data. The workshop outlined features of TAIR including searching for gene information, analyzing gene sets, viewing genome browsers, and submitting data. It highlighted recent literature curation efforts, new locus page features, and ways to engage with the community and share data in a FAIR manner.
2. Outline
• Brief introduction to TAIR
• Using TAIR to find information about Arabidopsis genes
• Searching and new locus page features
• Analyzing sets of genes
• Genome browsers (JBrowse)
• Community Engagement
• Data submission: old and new ways
3. The Arabidopsis Information Resource (TAIR)
www.arabidopsis.org
• Established in 1999; the only continuously curated database for Arabidopsis thaliana
• Manual curation of gene function data from experimental literature (by curators like me)
• Genome structural annotation (TIGR, TAIR, Araport and now??)
• In 2013, TAIR staff founded the non-profit Phoenix Bioinformatics and established subscription-based funding to ensure longevity and continued curation
• The database itself holds the most current data. Quarterly public releases contain year-old data for bulk consumption by groups like BAR, Gramene, etc.
• Quarterly subscriber releases are current (Stanford subscribes)
4. Types of function information captured from the literature at TAIR
Arabidopsis Genes
Biological Role/Activity (Gene Ontology Annotations)
• GO Biological Process
• GO Molecular Function
• GO Cellular Component
Expression (Plant Ontology Annotations)
• PO Structure
• PO Developmental Stage
Alleles and Phenotypes
Nomenclature/Symbols and curated summaries
5. Literature curation efforts for 2019
• Citations: 4102
• Genes linked to articles: 13127
• Curated Gene Symbols: 856
• 221 alleles added and 617 updated.
• 188 Germplasms added and 205 updated
• 557 new phenotypes added and 1391 phenotypes linked to germplasms
• Updated Locus summaries: 762
• Articles used for GO/PO annotation: 475
• Numerous updates to GO annotations (additions, updates, deletions)
6. What is the function of my gene(s)?
name
keyword
sequence
8. TAIR Locus Page Highlights
• Curated names and summary
• JBrowse gene image and link (more later)
• Functional Annotations (GO and PO)
• Detailed annotation view
16. Genome browsers
• SeqViewer: Developed at TAIR (https://seqviewer.arabidopsis.org/)
• GBrowse: GMOD genome browser (https://gbrowse.arabidopsis.org/cgi-bin/gb2/gbrowse/arabidopsis/)
• JBrowse: GMOD genome browser (https://jbrowse.arabidopsis.org/index.html?data=Araport11&loc=Chr1%3A21534..32847&tracks=TAIR10_genome%2CAraport11_Loci%2CAraport11_gene_models%2CSALK_tDNAs&highlight=)
• Ported from Araport
• Corrected tracks
• Added new tracks
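The JBrowse link on this slide encodes the view in its query string: `data` names the dataset, `loc` the region, and `tracks` a comma-separated list of tracks. As a minimal sketch, assuming those parameters behave the way they appear in that example link (the helper name `jbrowse_url` is hypothetical and not part of any TAIR API), a link to a region of interest could be assembled like this:

```python
from urllib.parse import urlencode

# Base address copied from the example link on this slide.
JBROWSE_BASE = "https://jbrowse.arabidopsis.org/index.html"


def jbrowse_url(chrom, start, end, tracks, dataset="Araport11"):
    """Build a JBrowse link for a region (hypothetical helper, not a TAIR API)."""
    params = {
        "data": dataset,                  # dataset name, as in the slide's link
        "loc": f"{chrom}:{start}..{end}",  # region in chrom:start..end form
        "tracks": ",".join(tracks),       # comma-separated track names
    }
    # urlencode percent-encodes ":" and "," just as in the slide's link.
    return f"{JBROWSE_BASE}?{urlencode(params)}"


url = jbrowse_url(
    "Chr1",
    21534,
    32847,
    ["TAIR10_genome", "Araport11_Loci", "Araport11_gene_models", "SALK_tDNAs"],
)
print(url)
```

The track names used here are the ones from the slide's link; other track names would need to be read from the JBrowse track list.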
23.
• ChIP-seq to survey the position and abundance of various epigenetic histone markers as well as RNAPII–Ser2P and the hypoxia-induced ERFVII HRE2 (Ethylene Response Factor).
• ATAC-seq to map regions of nucleosome-depleted chromatin
• TRAP-seq
• The higher the peak, the more open the chromatin in that region.
Integrative Analysis from the Epigenome to Translatome Uncovers Patterns of Dominant Nuclear Regulation during Transient Stress; Travis A. Lee, Julia Bailey-Serres. https://doi.org/10.1105/tpc.19.00463
25. Making your data FAIR
• FAIR is Findable, Accessible, Interoperable and Reusable
• ‘Tip sheet’ for before you publish: https://bit.ly/2vYROdb
• TAIR accepts data for gene function, phenotypes, nomenclature
• TAIR also accepts data that can be displayed as tracks in JBrowse
• Got something small to share? Consider publishing in microPublication Biology (https://www.micropublication.org/)
26. Highlights
• Updated NCBI BLAST with new graphical visualization
• BAR eFP view in locus page
• New links out to PhyloGenes in locus page
• JBrowse ported from Araport to TAIR (with fixes and enhancements)
• Decoupling from ABRC ordering
• New ways to share your data!
• And as always – updated gene function data, every week
• Upcoming changes
• Enhancements to speed locus page loading
27. Stay connected, get help, share stuff…
• Contact us at curator@arabidopsis.org
• YouTube channel (https://www.youtube.com/user/TAIRinfo/)
• Help documents (https://www.arabidopsis.org/help/index.jsp)
• Keep up with Social Media
• Twitter @tair_news
• Facebook (https://www.facebook.com/tairnews/)
New data added to TAIR in 2019 from manual curation of the literature, plus tens of thousands of GO annotations.
TAIR BLAST has been updated to run the latest version of NCBI BLAST, and we have added the WU-BLAST graphic after retiring WU-BLAST. TAIR's BLAST contains unique datasets.
The locus page shows curated data, the BAR expression viewer, and links out for more information, including PhyloGenes. Stocks now link out as well, since ordering is no longer done via TAIR.
No longer ordering via TAIR: stock links now go to ABRC directly or to a search of NASC/RIKEN.
Bulk data retrieval using gene lists. If you want to retrieve data for the whole-genome dataset, there are quarterly releases. Public data releases contain year-old data. If you are a subscriber, you can access the most current release of data. Subscribers can also request custom datasets. Here is an example of retrieving gene descriptions for a set of genes, or for the whole genome set.
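Once a bulk file has been downloaded, pulling out the rows for a gene list is a simple filter. The following is a rough sketch only: it assumes a tab-delimited file with a gene model name (such as AT1G01010.1) in the first column. The file layout, the placeholder rows, and the function name `filter_descriptions` are assumptions made for illustration and are not TAIR's documented format.

```python
import csv
import io


def filter_descriptions(handle, wanted_loci):
    """Yield rows of a tab-delimited file whose first column matches a wanted locus."""
    wanted = {locus.upper() for locus in wanted_loci}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        # Reduce a gene model name such as AT1G01010.1 to its locus ID.
        locus = row[0].split(".")[0].upper()
        if locus in wanted:
            yield row


# Placeholder content standing in for a downloaded bulk file (not real TAIR data).
sample = io.StringIO(
    "AT1G01010.1\tplaceholder description A\n"
    "AT1G01020.1\tplaceholder description B\n"
    "AT1G01030.1\tplaceholder description C\n"
)

hits = list(filter_descriptions(sample, ["At1g01010", "AT1G01030"]))
for row in hits:
    print("\t".join(row))
```

For a real file, `sample` would be replaced by an opened file handle, and the column holding the identifier would need to be checked against the file's actual header.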
http://www.arabidopsis.org/tools/go_term_enrichment.jsp
Data source: Nguyen et al., Tomato Golden2-like transcription factors reveal molecular gradients that function during fruit development and ripening (https://doi.org/10.1105/tpc.113.118794).
Uses the most current GO annotation datasets; this is the same enrichment tool that is available from the GO site. Other tools may use older annotation files, which may be fine for some species, but for Arabidopsis you are probably not getting the most accurate results with them.
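The notes do not say which statistic the enrichment tool applies, so the following is only an illustration of the general idea, using the one-sided hypergeometric test that is commonly used for term enrichment. It asks how likely it is to see at least k genes with a given GO term in a list of n genes, when K of the N annotated genes in the genome carry that term. The function name and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    k: genes in the list annotated with the term
    n: size of the gene list
    K: genes in the background annotated with the term
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Illustrative numbers only (not taken from TAIR or the cited study).
p = enrichment_p_value(k=5, n=20, K=50, N=1000)
print(f"p = {p:.3g}")
```

This also shows why the annotation file matters: K and N come straight from the annotation set, so an outdated file changes the result even when the gene list is the same.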
Now let's look at some of the features and data in TAIR JBrowse. On the left you see a list of pre-set tracks and their respective categories. We have managed to recreate almost all the data tracks from Araport's JBrowse in TAIR.
Let's say you want to find your favorite gene. You enter the name in the box and hit Enter. (click)
The browser window will center on that gene. At the top you see the positional information for the browser window.
Now let's say you want to compare gene models between different releases. You can line up the gene models from multiple releases, as I have done here with Araport11 and TAIR10, and you can see that the same region has one more gene model in Araport11 than in TAIR10.
Now that you have found your gene, you may want to pull out its sequence and study it. (click)
Something that was nice in SeqViewer can also be done in JBrowse. Right-click to go to the TAIR locus page, or use SeqLighter to pull the sequence out and then run various functions on it or use it to design primers. For example, you can highlight intron-exon boundaries or UTRs, add flanking regions to the sequence, and export the sequence for analysis with other tools. (click)
JBrowse lets you view data of different types. For example, the tracks with the green bars show intron/exon splice junctions predicted using TopHat. If you want to map RNA-seq reads onto the Arabidopsis genome, you can do that. The bottom track you see here shows RNA-seq data from SRA aligned to the TAIR10 genome using TopHat. Other track types include the ones for VISTA data from Phytozome.
Some data tracks in Araport stopped working recently, and with help from a whole lot of folks we managed to revive these tracks on the TAIR JBrowse instance. One of these is the set of VISTA plots from Phytozome, which show syntenic regions. What you see are VISTA alignments of the A. thaliana genome to three species, including A. lyrata and Brassica rapa.
CDS: Skyblue
UTR: Lightblue
Intron: Wheat
Another JBrowse feature that has been revived is the EPIC-CoGe function. This allows you to import and share data from CoGe and visualize it in the context of JBrowse.
This study looked at the impact of hypoxia on Arabidopsis seedlings and correlations with different types of epigenetic markers. The group generated ChIP-seq (methylation), ATAC-seq (Assay for Transposase-Accessible Chromatin, to map accessible chromatin) and TRAP-seq (Translating Ribosome Affinity Purification, to assay transcript populations) data, which they shared with us. The result is 41 new tracks that can be displayed to mine the data for patterns of regulation.
JBrowse has several graphical options for its tracks, which can accommodate a wide variety of data types. For example, the data you see here is from a recently published paper.