The Opera of PhAnToMe
Ramy K. Aziz (Twitter: @azizrk)
Aug 04 2013
opus (LT) = work (Pl. opera)
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage Meeting

Tools and Methods developed under the PhAnToMe (http://www.phantome.org) project between 2009-2012 using the Subsystems Technology, the SEED (http://theseed.org) environment, and RAST server (http://rast.nmpdr.org)

Third presentation at the Phage Genomics Workshop at the 20th Biennial Evergreen International Phage Meeting

Published in: Technology
  • This is the overview of PhiSpy. It has three steps: first the calculation of genomic characteristics, then classification, and finally an evaluation step. Because PhiSpy aims to find prophages without depending on sequence similarity, we quantified five genomic characteristics that can distinguish phage regions without homology to known phage genes. In the next couple of slides I am going to explain each of these characteristics.
  • We predicted prophages in 50 bacterial genomes and manually checked the predictions. We also compared our predictions with those of two other popular prophage-finding tools, Prophinder and Phage_finder, and this table shows that PhiSpy identifies the largest share of prophages.
  • So, using KBase, we predicted prophages in a large set of bacterial genomes with PhiSpy. About 1,100 genomes have no prophages.

    1. The Opera of PhAnToMe. Ramy K. Aziz (Twitter: @azizrk). Aug 04 2013. opus (Latin) = work (pl. opera). The environment, the toolbox, and the community. Phage Genomics Workshop, Evergreen 2013
    2. Past, ... NSF-funded, 3-year project (09-12) to develop Phage Annotation Tools and Methods. Four Centers: - SDSU, San Diego, CA - VCU, Richmond, VA - USF, St. Pete, FL - UA, Tucson, AZ
    3. … present, ... http://www.phantome.org
    4. … and future: TBA
    5. Aims • Direct – Discuss the concepts behind RAST – Quickly preview several tools developed under (or under the influence of) the PhAnToMe project – Demonstrate online community annotation using SEED • Indirect {hidden agenda ;)} – PhAnToMe 2.0? – Establish community annotation efforts/crowdsourcing – Seek funding? Crowdfunding?
    6. Outline • The environment (the SEED) – The SEED and the ‘Subsystems Technology’ • The toolbox (PhAnToMe and sequels) – PHAST and RAST – PHACTS – PhiSpy – iVireons • The community – Online annotation process – Annotation jamboree(s) – Course design • $$: Writing proposals, applying for grants
    7. I. THE ENVIRONMENT
    8. I. The Environment: SEED. http://theseed.org Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053
    9. SEED: Main concept. One genome. All genomes.
    10. SEED: Main concept. One genome. All genomes. “Subsystems-based technologies were developed in the SEED with the view that the interpretation of one genome can be made more efficient and consistent if hundreds of genomes are simultaneously annotated in one subsystem at a time” Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053
    11. SEED: Main concept • Protein-based database (Jargon: PEG = protein-encoding gene) • The subsystems approach and • FIGfams: protein families based on – sequence similarity – chromosomal co-occurrence, gene order, synteny – human curation, evidence-based expert assertions
    12. RAST: automated annotation
    13. What is a subsystem? • “A subset of functional roles studied across genomes” • A spreadsheet where: – each row represents a genome – each column represents a functional role/feature/protein – different patterns = variants
                    Function 1   Function 2   …   Function n
        Genome a
        Genome b
        …
        Genome z
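
The spreadsheet view of a subsystem on slide 13 can be sketched in a few lines of Python. This is only an illustration, not the SEED data model: the genome names, role names, and PEG identifiers below are made up, and "variant" is approximated here as the presence/absence pattern of roles in a genome.

```python
# Hypothetical toy subsystem: rows are genomes, columns are functional roles.
ROLES = ["Function 1", "Function 2", "Function 3"]

# Each cell lists the protein-encoding genes (PEGs) assigned to that role;
# an empty list means the role was not found in that genome.
subsystem = {
    "Genome a": {"Function 1": ["peg.12"], "Function 2": ["peg.13"], "Function 3": ["peg.40"]},
    "Genome b": {"Function 1": ["peg.7"], "Function 2": [], "Function 3": ["peg.9"]},
    "Genome c": {"Function 1": ["peg.3"], "Function 2": ["peg.4"], "Function 3": ["peg.88"]},
}


def presence_pattern(row):
    """Return a tuple of booleans: is each role present in this genome?"""
    return tuple(bool(row.get(role)) for role in ROLES)


def group_by_variant(table):
    """Group genomes that share the same presence/absence pattern."""
    variants = {}
    for genome, row in table.items():
        variants.setdefault(presence_pattern(row), []).append(genome)
    return variants


variants = group_by_variant(subsystem)
for pattern, genomes in variants.items():
    print(pattern, genomes)
```

Genomes that share a pattern fall into the same group, which is the sense in which "different patterns = variants" is read in this sketch.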
    14. What is a subsystem?
    15. Advantages of subsystems: Subsystems-based annotation
    16. Annotation (from genome), Reconstruction (from metagenome). Incomplete; frameshift; faulty assembly. - complete - accurate. Credit: Andrew Kropinski. Credit: Bas Dutilh
    17. Annotation (from genome), Reconstruction (from metagenome). Incomplete; faulty assembly; frameshift. - complete - accurate. Credit: Andrew Kropinski. Credit: Bas Dutilh
    18. II. THE TOOLBOX
    19. II. PhAnToMe ToolBox. http://www.phantome.org
    20. II. The ToolBox: RAST • (At least) four ways to annotate a genome via RAST: – myRAST (local) • uses the server, but you can edit offline – RAST (http://rast.nmpdr.org) • annotates online, saves your genome on the server – “PhAST” (http://www.phantome.org/PhageSeed/Phage.cgi?page=phast) • optimized gene-calling – Use your favorite gene caller, then upload a gbk file to RAST
    21. http://rast.nmpdr.org
    22. phiRAST complaints • ORF/gene calling • tRNA – bug fixed, but still follow Andrew’s advice • Too many hypotheticals, etc. – manual annotation, see later – need for expert annotations, community contribution – funding
    23. “PhAST”: some improvement?
    24. “PhAST”: some improvement?
    25. PHAST: Disambiguation
    26. Other tools • PHACTS: – classifies and predicts lifestyle • PhiSpy: – finds prophages • iVireons: – predicts phage structural proteins, holins, more to come
    27. II. The ToolBox: PHACTS • PHAge Classification Tool Set • Uses a novel similarity algorithm and a supervised Random Forest classifier to predict whether the lifestyle of a phage, described by its proteome, is virulent or temperate. • The similarity algorithm creates a training set from phages with known lifestyles; this set, along with the lifestyle annotation, is used to train a Random Forest to classify the lifestyle of a phage. • PHACTS predictions have had a 99% precision rate. (Kate McNair)
    28. PHACTS • Out of the 227 phages with a known lifestyle, PHACTS was able to confidently and correctly predict the lifestyle of 197 phages. • Only 2 phages were confidently misclassified: both were virulent phages that contained a functional integrase. (Kate McNair)
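
The counts on slides 27 and 28 can be related with a little arithmetic. The slides do not say how the 99% precision was defined; the sketch below assumes it means correct confident calls divided by all confident calls, which is one reading that matches the reported figure. The variable names are illustrative.

```python
# Counts taken from slide 28.
total_known = 227        # phages with a known lifestyle
confident_correct = 197  # confident and correct predictions
confident_wrong = 2      # confident but incorrect predictions

# Assumption: "precision" = correct confident calls / all confident calls.
confident_calls = confident_correct + confident_wrong
precision = confident_correct / confident_calls

# Phages for which no confident call was made (derived, not stated on the slide).
not_confident = total_known - confident_calls

# Share of all known-lifestyle phages that were confidently and correctly called.
coverage = confident_correct / total_known

print(f"precision over confident calls: {precision:.1%}")
print(f"phages without a confident call: {not_confident}")
print(f"confident and correct among all 227: {coverage:.1%}")
```

Under that assumption, the 2 confident errors out of 199 confident calls account for the roughly 99% precision, while the remaining phages received no confident call.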
    29. PHACTS • http://www.phantome.org/PHACTS/ • Other applications • Host prediction: whether a phage infects Gram-positive or Gram-negative bacteria • Taxonomy prediction: a phage’s family (Kate McNair)
    30. PHACTS (Kate McNair)
    31. II. The ToolBox: PhiSpy. 1) Calculate genomic characteristics: • Transcriptional strand orientation • Customized AT skew • Customized GC skew • Protein length • Abundance of phage words 2) Classify prophage region: • Random Forest • Pre-calculated training genome • Input bacterial genome • Produce a rank for each gene 3) Evaluate predicted prophages: • Phage insertion points • Similarity of phage proteins (Sajia Akhter)
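
Two of the genomic characteristics named on slide 31 are nucleotide skews. The slide calls them "customized" without giving the definition, so the sketch below uses the textbook forms, (G - C)/(G + C) and (A - T)/(A + T), computed over fixed windows. It is not PhiSpy's implementation; the function names, window size, and toy sequence are assumptions made for illustration.

```python
def skew(seq, plus, minus):
    """Textbook skew (plus - minus) / (plus + minus); 0.0 if neither base occurs."""
    seq = seq.upper()
    p = seq.count(plus)
    m = seq.count(minus)
    return 0.0 if p + m == 0 else (p - m) / (p + m)


def windowed_skews(seq, window, step):
    """Return (start, GC skew, AT skew) for each full window along the sequence."""
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        results.append((start, skew(chunk, "G", "C"), skew(chunk, "A", "T")))
    return results


# Toy sequence: G-rich, then balanced, then C-rich.
toy = "GGGGCCAT" + "ATATATGC" + "CCCCGGTA"
for start, gc, at in windowed_skews(toy, window=8, step=8):
    print(start, round(gc, 2), round(at, 2))
```

A change of sign in the skew along a genome is the kind of signal such a feature captures; how PhiSpy customizes and combines it with the other four characteristics is not covered by this sketch.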
    32. PhiSpy • Performance comparison in 50 complete bacterial genomes (Sajia Akhter)
        Application    %Identified   %FN   %FP
        Prophinder     89%           11%   12%
        Phage_finder   82%           18%   1.33%
        PhiSpy         94%           6%    0.66%
    33. PhiSpy • Download PhiSpy – http://sourceforge.net/projects/phispy • PhiSpy is on KBase – http://kbase.science.energy.gov • Web version under final development • Ran PhiSpy on 4,335 bacterial genomes • Predicted 12,826 prophages in 3,203 genomes – 9,101 known prophages – 3,723 undefined prophages (Sajia Akhter)
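
The genome-scale numbers on slide 33 can be cross-checked with simple arithmetic. The sketch below only derives quantities from the figures printed on the slide; the variable names are illustrative and nothing here comes from PhiSpy itself.

```python
# Figures as printed on slide 33.
genomes_scanned = 4335
genomes_with_prophage = 3203
prophages_total = 12826
prophages_known = 9101
prophages_undefined = 3723

# Genomes in which no prophage was predicted (the notes say "about 1,100").
genomes_without = genomes_scanned - genomes_with_prophage

# Share of scanned genomes carrying at least one predicted prophage.
fraction_with = genomes_with_prophage / genomes_scanned

# Average number of predicted prophages per prophage-carrying genome.
mean_per_positive_genome = prophages_total / genomes_with_prophage

# The two listed categories sum to slightly less than the stated total;
# the slide does not explain the difference.
category_sum = prophages_known + prophages_undefined

print(genomes_without)
print(f"{fraction_with:.1%}")
print(round(mean_per_positive_genome, 1))
print(prophages_total - category_sum)
```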
    34. iVIREONS – http://vdm.sdsu.edu/ivireons (Victor Seguritan)
    35. Application of Artificial Neural Networks (ANNs) to Viral Dark Matter (Victor Seguritan). Flowchart labels: Viral hypothetical protein sequences; Conserved Domain DB (rpsblast): e-value <= 0.001 (known) vs. no hit OR e-value > 0.001; Reference Sequence DB (tblastp): e-value <= 0.001 vs. no hit OR e-value > 0.001; Keep sequences ≥ 200 aa; Remove ≥ 80% identical sequences; Artificial Neural Networks (ANNs); Synthesize ANN-predicted hypothetical protein genes; Clone in E. coli; Purification by cobalt affinity; Validation by TEM or X-ray crystallography
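
The filtering steps named on slide 35 can be sketched as a small selection function. The thresholds (e-value 0.001, 200 aa, 80% identity) come from the slide; the order of the steps, the record layout, and the function names are assumptions, and `difflib.SequenceMatcher` is used only as a crude stand-in for a real alignment-based percent identity. This is not the iVireons code.

```python
from difflib import SequenceMatcher

E_VALUE_CUTOFF = 0.001  # threshold shown on the slide
MIN_LENGTH = 200        # "Keep sequences >= 200 aa"
MAX_IDENTITY = 0.80     # "Remove >= 80% identical sequences"


def is_unknown(evalue):
    """True when a search gave no hit (None) or only a hit above the cutoff."""
    return evalue is None or evalue > E_VALUE_CUTOFF


def crude_identity(a, b):
    """Stand-in similarity score, not a true alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def select_ann_candidates(records):
    """Keep long, non-redundant proteins with no significant database hit."""
    kept = []
    for rec in records:
        if not (is_unknown(rec["cdd_evalue"]) and is_unknown(rec["refseq_evalue"])):
            continue  # significant hit: treated as a known protein
        if len(rec["seq"]) < MIN_LENGTH:
            continue  # too short
        if any(crude_identity(rec["seq"], k["seq"]) >= MAX_IDENTITY for k in kept):
            continue  # near-duplicate of a sequence already kept
        kept.append(rec)
    return kept


# Hypothetical records; e-values of None mean "no hit".
records = [
    {"id": "p1", "seq": "MK" * 120, "cdd_evalue": None, "refseq_evalue": None},
    {"id": "p2", "seq": "MK" * 119 + "MR", "cdd_evalue": None, "refseq_evalue": None},
    {"id": "p3", "seq": "AG" * 50, "cdd_evalue": None, "refseq_evalue": None},
    {"id": "p4", "seq": "LV" * 110, "cdd_evalue": 1e-10, "refseq_evalue": None},
    {"id": "p5", "seq": "ST" * 105, "cdd_evalue": 0.5, "refseq_evalue": None},
]
selected = [rec["id"] for rec in select_ann_candidates(records)]
print(selected)
```

In this toy run, p2 is dropped as a near-duplicate of p1, p3 as too short, and p4 as already known, leaving the sequences that would be passed on to the neural networks.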
    36. “FAMILIES” OF ANNs 1) General structural proteins 2) Phage major capsid proteins 3) Phage tail/tail fibers/collar etc. 4) Holins 5) Portals • Trained with all types of proteins • Both phages & viruses (Victor Seguritan)
    37. iVIREONS: 1) http://vdm.sdsu.edu/ivireons 2) Enter User Info (example entries: Vibrio Phage, virus@microsoft.com, DHS) 3) Upload Sequences (Victor Seguritan)
    38. iVIREONS – http://vdm.sdsu.edu/ivireons 4) View Results 5) Copy Results to a Spreadsheet. Stringencies reported: - Structural 1:1 - MCP 1:1 - MCP 2:1 - MCP 3:1 - MCP 4:1 - MCP 7:1 - MCP 22:1 (lambda) - Tail 1:1 - Tail 2:1 - Tail 4:1 - Tail 7:1 - Tail 6.6:1 (lambda)
    39. III. THE COMMUNITY
    40. SEED allows continuous annotation. Diagram labels: SEED; RAST; Genomes; Subsystems; SEED Viewer; New Genomes; Subsystems Editor
    41. SEED allows community annotation
    42. Later in the meeting: • Who might be interested in putting together: a) an outline for an annotation jamboree/workshop with phage experts b) a syllabus/outline for a course to get undergraduate/graduate students to annotate specific subsystems c) a proposal to get funding for community annotation efforts d) all of the above
    43. POST SCRIPTUM
    44. Aims • Direct – Discuss the concepts behind RAST – Quickly preview several tools developed under (or under the influence of) the PhAnToMe project – Demonstrate online community annotation using SEED • Indirect {hidden agenda ;)} – PhAnToMe 2.0? – Establish community annotation efforts/crowdsourcing – Seek funding? Crowdfunding?
    45. If you use, please cite • SEED, RAST, myRAST, phiRAST, PHAST: – RAST: BMC Genomics 2008; SEED servers: PLoS ONE 2012 • Other tools – PHACTS: McNair et al. PMID: 22238260; PhiSpy: Akhter et al. PMID: 22584627; iVireons: Seguritan et al. PMID: 22927809 • Letters of support
    46. Acknowledgments: Robert A. Edwards, PhD • PhiRAST development: Ross Overbeek, Robert Olson, Gordon Pusch, Terry Disz, Bruce Parrello • Phage annotators (Phantomers): Bhakti Dwivedi, Mya Breitbart, et al. • FIG and all SEED annotators: VeronikaV, SvetaG, OlgaV/Z, et al. • Sajia Akhter. $$ & NSF
    47. Acknowledgments • PHAST • iVireons. Victor Seguritan; Katelyn McNair. $$ & NSF
