Tools and methods developed under the PhAnToMe (http://www.phantome.org) project between 2009 and 2012, and updated in 2015. The tools and database rely on the Subsystems Technology, the SEED (http://theseed.org) environment, and the RAST server (http://rast.nmpdr.org).
This is the fifth such presentation; this year at the EMBO Viruses of Microbes 2016 Meeting, 21 July 2016 (http://events.embo.org/16-virus-microbe)
The Opera of Phantome - 2016 (presented at the EMBO Viruses of Microbes 2016 meeting)
1. The Opera of PhAnToMe 2016
Ramy K. Aziz (@azizrk)
July 21 2016
opus (LT) = work (Pl. opera)
SEED-based phage database (2009-2013-…)
Phage Genomics Workshop, VoM 2016
giantmicrobes.com
2. History
21 July 2016 Phage Genomics - VoM 2016
NSF-funded, 3-year project (2009-2012) to develop Phage Annotation Tools and Methods
Four Centers:
- SDSU, San Diego, CA
- VCU, Richmond, VA
- USF, St. Pete, FL
- UA, Tucson, AZ
5. I. The Environment: SEED
http://theseed.org
Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053
7. SEED: Main concept
One genome
All genomes
“Subsystems-based technologies were developed in the SEED with the view that the interpretation of one genome can be made more efficient and consistent if hundreds of genomes are simultaneously annotated in one subsystem at a time”
8. SEED: Main concept
• Protein-based database
Jargon: PEG = protein-encoding gene
• The subsystems approach
and
• FIGfams: protein families based on
– sequence similarity
– chromosomal co-occurrence, gene order, synteny
– human curation, evidence-based expert assertions
10. What is a subsystem?
• “A subset of functional roles studied across genomes”
• A spreadsheet where:
– each row represents a genome
– each column represents a functional role / feature / protein
– different patterns = variants
Function 1 Function 2 … Function n
Genome a
Genome b
…
Genome z
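As a rough illustration of the spreadsheet idea above, a subsystem can be sketched in Python as a mapping from genomes (rows) to the functional roles found in them (columns); genomes that share the same presence/absence pattern fall into the same variant. The genome names, role names, and the helper functions are hypothetical, and this is not a description of how the SEED actually stores subsystems.

```python
from collections import defaultdict

# Hypothetical functional roles (the columns of the spreadsheet).
ROLES = ["Function 1", "Function 2", "Function 3"]

# Hypothetical genomes (the rows); each maps to the set of roles found in it.
SUBSYSTEM = {
    "Genome a": {"Function 1", "Function 2", "Function 3"},
    "Genome b": {"Function 1", "Function 3"},
    "Genome c": {"Function 1", "Function 2", "Function 3"},
    "Genome z": {"Function 2"},
}


def presence_pattern(roles_found, roles=ROLES):
    """Return one boolean per column for a single genome row."""
    return tuple(role in roles_found for role in roles)


def group_variants(subsystem, roles=ROLES):
    """Group genomes by presence/absence pattern (one group per variant)."""
    variants = defaultdict(list)
    for genome, roles_found in subsystem.items():
        variants[presence_pattern(roles_found, roles)].append(genome)
    return dict(variants)


if __name__ == "__main__":
    for pattern, genomes in group_variants(SUBSYSTEM).items():
        row = " ".join("+" if present else "-" for present in pattern)
        print(f"{row}  {', '.join(genomes)}")
```

Each printed line is one variant: a row of +/- marks (one per functional role) followed by the genomes that show that pattern.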
13. The ToolBox: The RAST family
• (At least) Five ways to annotate a genome via RAST:
– RAST (http://rast.nmpdr.org)
• annotates online, saves your genome on server
– myRAST (local)
• uses the server, but you can edit offline
– “PhAST” (http://www.phantome.org/PhageSeed/Phage.cgi?page=phast)
• phage-optimized gene-calling
– Use your favorite gene caller then upload gbk file to RAST
– RASTtk (second-generation RAST)
• modular
• batch upload
20. The RASTtk Microbial Annotation Pipeline
Pipeline steps shown in the flowchart:
– FASTA QC
– FASTA to Genome TO
– Call rRNAs
– Call tRNAs
– Call CDSs (Prodigal)
– Call CDSs (Glimmer3)
– Annotate Proteins (K-mer v2)
– Annotate Proteins (K-mer v1)
– Call CRISPRs
– Call Phages (PhiSpy)
– Find Repeats
– Export (GenBank, GFF3, FASTA)
• Green boxes are alternative pipeline steps
• Dashed boxes are optional pipeline steps
21. In final development: phi-RASTtk
Pipeline steps shown in the flowchart:
– FASTA QC
– FASTA to Genome TO
– Call rRNAs
– Call tRNAs
– Call CDSs (Prodigal)
– Call CDSs (GeneMark)
– Annotate Phage Proteins
– Annotate Proteins (K-mer v2)
– Find Repeats
– Find Toxins
– Export (GenBank, GFF3, FASTA)
• Green boxes are alternative pipeline steps
• Dashed boxes are optional pipeline steps
25. What do you need to annotate your genome?
• A sequenced genome
• Format: FASTA or GenBank (.gbk)
• A RAST username and password
• You can find some data to play with at http://egybio.net/tutorial/
36. 2. Command-line RASTtk (Batch option)
• Where?
– On IRIS
(http://tutorial.theseed.org/services/docs/invocation/Iris/iris.html)
– On your desktop (Download RASTtk)
37. 2. Command-line RASTtk (Batch option)
• How?
– The key is to convert each contigs file (FASTA) to a so-called “genome typed object” (GTO)
– Once you have GTOs for all your genomes, you can run a couple of commands to annotate each of them OR put them in a folder and annotate in batch
38. 2. Command-line RASTtk (Batch option)
• Commands:
o rast-create-genome
rast-create-genome --scientific-name "Enterophage Lambda" --genetic-code 11 --domain Virus --contigs lambda.fasta > lambda.gto
o rast-process-genome
o rast-process-genome-batch
o rast-export-genome
Details on http://tutorial.theseed.org or follow the link from http://egybio.net/tutorial
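For a folder of phage FASTA files, the rast-create-genome call shown above can be generated once per file. The Python sketch below only builds the command lines and does not run them. It reuses the flags from the example above and assumes, hypothetically, that the scientific name for each file comes from a small dictionary and that each output file is named after its input with a .gto extension. Actually running the commands requires a local RASTtk installation.

```python
import shlex
from pathlib import Path


def create_genome_command(contigs, scientific_name, genetic_code=11, domain="Virus"):
    """Build the rast-create-genome argument list for one contigs file.

    Only the flags from the slide example are used; the default values
    (genetic code 11, domain Virus) are copied from that example.
    """
    return [
        "rast-create-genome",
        "--scientific-name", scientific_name,
        "--genetic-code", str(genetic_code),
        "--domain", domain,
        "--contigs", str(contigs),
    ]


def gto_path(contigs):
    """Hypothetical naming rule: lambda.fasta -> lambda.gto."""
    return Path(contigs).with_suffix(".gto")


def shell_line(contigs, scientific_name):
    """Render the command as a shell line redirecting stdout to the GTO file."""
    cmd = create_genome_command(contigs, scientific_name)
    return f"{shlex.join(cmd)} > {shlex.quote(str(gto_path(contigs)))}"


if __name__ == "__main__":
    # Hypothetical batch: contigs file name -> scientific name.
    genomes = {"lambda.fasta": "Enterophage Lambda"}
    for contigs, name in genomes.items():
        print(shell_line(contigs, name))
```

Printing the lines rather than executing them makes it easy to inspect the batch before pasting it into a terminal or a shell script.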
39. 2. Command-line RASTtk (Batch option)
• If you have 10 extra min, we can try a quick run right after the session (it takes about 10 min to annotate a bacterial genome)
40. 3. Browse your favorite genome
41. 3. Browse your favorite genome
42. 4. Explore the protein page
• Annotation history
• Annotation clearinghouse
• Evidence
– similarities
– literature
43. 4. Explore the protein page
• Find your favorite protein
44. 4. Explore the protein page
• Find your favorite protein
45. 5. Align proteins (in context)
• Evidence> Similarities> Align
• Compare region, advanced settings
• Phylogenetic trees
46. 5. Align proteins (in context)
51. Prospects
• Another phage annotation summit?
– First summit (Jan 2011) was at Biosphere 2, Tucson, AZ
– A second one?
• On a summit? (e.g., Bogotá? Mount Sinai?)
• Red Sea Resort in Egypt??
• Pushing for community annotation
– Undergraduate students (I have about 20 in training)
53. Acknowledgments
Robert A. Edwards, PhD
• RASTtk and PhiRAST development:
Ross Overbeek, Robert Olson, Jim Davis, Gordon
Pusch, Terry Disz, Bruce Parrello
• Phage annotators (Phantomers):
Bhakti Dwivedi, Mya Breitbart, et al.
• FIG and all SEED annotators:
VeronikaV, SvetaG, OlgaV/Z, et al.
$$ & NSF
Katelyn McNair
54. If you use, please cite
• SEED, RAST, myRAST, phiRAST, PHAST:
– RAST: Aziz et al., BMC Genomics 2008
– SEED servers: Aziz RK, et al. (2012) PLoS ONE 7(10): e48053.
– SEED and RAST: Overbeek et al., Nucleic Acids Res. 2014 Jan;42(Database issue):D206-14