From Sequence to Knowledge:
The Art & Science of Phage
Genome Annotation
Ramy K. Aziz – Cairo University
From Sequence to
Knowledge:
PhAnToMe, RAST, and the
Ultimate Kropinski Toolkit
A helping hand through
The Annotation Bottleneck
Compiled by: Andrew Kropinski and Ramy Aziz
Online material
• Data & links:
– http://egybio.net/tutorial
• Slides
– http://bit.ly/annotation2016
– http://bit.ly/phantome4
– Older tutorials (more detailed, but missing the latest updates):
• Evergreen 2011: http://slidesha.re/phantome1
• http://slidesha.re/phiRAST1 (Karin)
• Evergreen 2013: http://bit.ly/phantome2
• Evergreen 2015: http://bit.ly/phantome3
21 July 2016 Phage Genomics - VoM 2016
INTRODUCTION
“The analysis bottleneck”
• Observation:
– We generate more data than we can analyze.
– We generate sequence data faster than we can analyze them.
• Opinion:
– Bottlenecks are not created equal!
– It is important to define the question(s) before working on the answer(s)!
“The analysis bottleneck”
• The Lavigne paradox
Quick group activity
Defining the question(s):
• How many of you have annotated a
genome?
• How many phage genomes have you
sequenced (or are in the process of
sequencing)?
a) None b) 1-5 c) 5-50 d) > 50
• What is the single most pressing question
you want to answer from genome analysis?
DEFINING THE QUESTION(S)
“Begin with the end in mind” (Covey, The 7 Habits of Highly Effective People)
What You Want
The goal:
✓ complete
✓ accurate
Incomplete:
✗ genome termini
Faulty assembly
Frameshift
✗ chimeric fragments
A process of reconstruction
Annotation ≈ Reconstruction
[Figure: two panels, "from genome" (credit: Andrew Kropinski) and "from metagenome" (credit: Bas Dutilh). Labels: incomplete, faulty assembly, frameshift; goal: complete, accurate.]
A process of reconstruction
• Experimentally
• Computationally
DNA
TGATTGTGTGTTTGCGCAATGCG
ATGTGTATATATAGTGAGCTTGCCC
GTCTCTCTNNNTCTCTTG
TGATTGGTCTNNNTCTCTTGCGCAATGCG
From Sequence to Knowledge
From raw sequence data to genome submission/publication
[Workflow diagram; steps as labeled:]
• Assembly
• Gene finding/ORF calling
• tRNA calling
• Annotation (assigning functions)
• Orienting
• Validation (segmenter)
• Fixing frameshifts
• Introns and inteins
• Subsystem assignment
• Refinement/secondary annotation loop
• Special purpose: toxins, morons, integrases, lifestyle prediction
• Regulatory elements (promoters, terminators)
• Output: files and graphics
Countless tools
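The gene finding/ORF calling step can be illustrated with a minimal Python sketch. It scans all six reading frames for stretches that begin at a start codon and end at the first in-frame stop codon. The start-codon set, the `min_codons` cutoff, and the function names are assumptions chosen for illustration; real gene callers weigh far more evidence, so treat this as a teaching aid rather than a replacement for them.

```python
# Minimal six-frame ORF scan (illustrative sketch, not a production gene caller).
# Real gene callers also model codon usage, ribosome-binding sites, and more.

START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial/phage start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30):
    """Return (strand, start, end) tuples in 0-based, half-open
    coordinates on the forward strand. `min_codons` counts codons
    from the start codon up to, but not including, the stop codon."""
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first start codon since the last stop
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif codon in STOP_CODONS:
                    if start is not None and (i - start) // 3 >= min_codons:
                        end = i + 3  # include the stop codon
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:
                            # map reverse-strand positions back to the forward strand
                            orfs.append((strand, n - end, n - start))
                    start = None
    return orfs
```

Calling `find_orfs(genome_sequence)` on a plain DNA string returns candidate ORFs only; each still needs the manual inspection stressed throughout this tutorial.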
Authority figures
Andrew Kropinski, Rob Lavigne, Rob Edwards
General outline
• Part I: The “Kropinski toolkit”
– Tools approved and recommended by Andrew Kropinski (http://molbiol-tools.ca): from sequence to publication
• Part II: SEED-based tools:
– The RAST family
– The PhAnToMe database/portal
The Kropinski Toolkit
What we want, according to Andrew
A well-characterized genome in which, ideally, we know:
• the location & function of all the genes
• the location of promoters & terminators
• the correct taxonomy
[Figure: genome map, 30.0 kb, genes 20 to 33 (including 26A), PstI sites marked]
Viruses; dsDNA viruses, no RNA stage; Caudovirales; Siphoviridae; T1virus
Desired outcome: create a GenBank submission
• Complete, accurate description of the genome and its taxonomy
• Good title
Desired outcome (2)
Desired outcome (3)
Desired outcome (4)
• Protein products of concern, particularly for those interested in phage therapy:
– Integrases
– Toxins
[Figure: genome map, 30.0 kb, genes 20 to 33 (including 26A), PstI sites marked]
Processes and Steps
I. Primary analysis (QC/pre-annotation proofreading, e.g., orient with BLASTN)
II. Genome annotation
– Gene finding (ORF calling)
– Automated annotation
– Massaging (editing, functional assignment)
III. Second dimension (regulatory elements)
IV. Comparative genomics
V. Metadata
VI. Visualization
AUTOMATED ANNOTATION
II. Genome Annotation
RAST (subsystems-based tools)
• Will be the major focus of this short tutorial…
• The goals are:
1. A quick demo of how to use RAST
2. A quick preview of batch annotation in RAST
3. Optimizing RAST for phage annotation
4. Demonstrating & discussing how to improve RAST output
RAST (subsystems-based tools)
• But, before getting there …
The Kropinski wisdom
1. Always use more than one tool
2. Never blindly trust any automated (or manual) process
3. Use your eyes and hands: visual inspection/manual proofreading, re-annotation
– Every apparently complicated file is still editable in your favorite text editor (e.g., Notepad)
4. If you don’t know a gene’s function (if you can’t convince your grandma), it is better to leave it unnamed than to contribute to error propagation
2 Aug 2015 Phage Genomics - Evergreen 2015
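The first rule, using more than one tool, implies comparing their outputs. A minimal Python sketch of such a comparison follows. It assumes each tool's gene calls have already been parsed into `(start, stop, strand)` tuples, and it keys genes by stop position and strand on the assumption that callers more often disagree on the start than on the stop. The function name and the data layout are hypothetical.

```python
def compare_calls(calls_a, calls_b):
    """Compare two sets of gene calls given as (start, stop, strand) tuples.

    Genes are matched on (stop, strand); a shared stop with a different
    start is reported separately so it can be inspected by eye.
    """
    by_stop_a = {(stop, strand): start for start, stop, strand in calls_a}
    by_stop_b = {(stop, strand): start for start, stop, strand in calls_b}
    shared = by_stop_a.keys() & by_stop_b.keys()
    return {
        "same": sorted(k for k in shared if by_stop_a[k] == by_stop_b[k]),
        "start_differs": sorted(k for k in shared if by_stop_a[k] != by_stop_b[k]),
        "only_a": sorted(by_stop_a.keys() - by_stop_b.keys()),
        "only_b": sorted(by_stop_b.keys() - by_stop_a.keys()),
    }
```

Every entry outside `"same"` is a candidate for the manual proofreading recommended in rule 3.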
What do I call my gene product (i.e. protein)?
✗ “phage hypothetical protein” – redundant
✗ “gp87” (gp = gene product) ≠ hypothetical protein
– gp200 describes radically different proteins in Listeria, Enterococcus, Mycobacterium, Rhodococcus, Sphingomonas, Pseudomonas, Bacillus and Synechococcus phage genomes
✓ Add /note=“similar to gp43 of Escherichia coli phage T4”
What do I call my gene product (i.e. protein)?
✗ /product=“UboA”; “NrdA”; “hypothetical protein SA5_0153/152”; “ORF184” (as bad as gp184); “RNAP1”; “32 kDa protein”
– Bad because they don’t mean anything to the casual (or informed) reader.
✓ Unless you are a bioinformatician or biostatistician, be conservative in recording “hits.” Could you convince your grandma? If not, list it as a “hypothetical protein”. But do take a stand: “putative DNA polymerase” is cowardly.
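The naming advice on the two slides above can be turned into a simple automated check. The Python sketch below flags /product values that match the discouraged patterns quoted on the slides. The regular expressions, the rule labels, and the function name are assumptions made for illustration; they are not an official GenBank validation, and a name that passes still needs supporting evidence.

```python
import re

# Each rule pairs a pattern with a short reason. The patterns are
# illustrative guesses based on the examples quoted on the slides.
RULES = [
    (re.compile(r"^gp\d+$", re.I), "bare gp number"),
    (re.compile(r"^orf\d+$", re.I), "bare ORF number"),
    (re.compile(r"^\d+(\.\d+)?\s*kDa protein$", re.I), "size-only name"),
    (re.compile(r"^phage hypothetical protein$", re.I), "redundant 'phage'"),
    (re.compile(r"^hypothetical protein\s+\S", re.I), "locus tag in name"),
    (re.compile(r"^putative\b", re.I), "'putative' hedge"),
    (re.compile(r"^[A-Z][a-z]{2}[A-Z]$"), "bare gene symbol"),
]


def lint_product(name: str):
    """Return the list of reasons a /product name looks problematic."""
    name = name.strip()
    return [reason for pattern, reason in RULES if pattern.search(name)]
```

For example, `lint_product("gp87")` reports a bare gp number, while `lint_product("hypothetical protein")` returns an empty list.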
Nomenclature Sins
• hypothetical protein → DNA polymerase, with no or poor-quality evidence, is far worse than:
• DNA polymerase → hypothetical protein
• Be cautious about using BLASTP hits in naming gps: is there additional evidence to back the designation up?
Consistent Nomenclature
• All of these describe homologs of the product of the coliphage T4 rIIA gene!
rIIA protector from prophage-induced early lysis
protector from prophage-induced early lysis
protector from prophage-induced early lysis rIIA
membrane-associated affects host membrane ATPase
rIIA membrane-associated affects host membrane ATPase
phage rIIA lysis inhibitor
rIIA protector
rIIA
rIIA protein
membrane integrity protector
hypothetical protein
unnamed protein product !!!!!!
protein of unknown function
Bottom line:
Manual vs. Automated
• “Turtles know the road better than rabbits…” (Khalil Gibran)
• “… but they may never reach the end!”
• The best approach?
– Human expert-based annotation
IV. COMPARATIVE GENOMICS
Genomic pairwise comparisons
• EMBOSS Stretcher: http://emboss.bioinformatics.nl/cgi-bin/emboss/stretcher (N.B. genomes must be collinear)
• BLASTN (NCBI)
• ANI (Average Nucleotide Identity): http://enve-omics.ce.gatech.edu/ani/
• GGDC 2.0 (Genome to Genome Distance Calculator): http://ggdc.dsmz.de/distcalc2.php
• jSpeciesWS (ANI): http://jspecies.ribohost.com/jspeciesws/
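To make the idea behind these pairwise comparisons concrete, here is a minimal Python sketch that computes percent identity from two sequences that have already been globally aligned (for example, gapped output from an aligner run on collinear genomes). It is not the ANI algorithm used by the servers listed above, and tools differ in how they treat gaps; skipping gapped columns, as done here, is an assumption of this sketch, as are the function and parameter names.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Both inputs must be the same length (already aligned). Columns where
    either sequence has a gap are skipped, which is one of several
    conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

Because the result depends on the gap convention, the same pair of genomes can score differently across tools, which is another reason to compare more than one.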
Proteomic pairwise comparisons
• CoreGenes (http://binf.gmu.edu:8080/CoreGenes3.0/)
• TBLASTX
• Remember: protein sequence is more conserved than DNA sequence, so protein-level comparison is probably more useful for more distant relationships
VI. “POLISH” IT TO PUBLISH IT
Servers & software
• BLAST Ring Image Generator (http://brig.sourceforge.net)
• CGView (http://wishart.biology.ualberta.ca/cgview)
• CGView Comparison Tool (http://stothard.afns.ualberta.ca/downloads/CCT)
• Circos (http://circos.ca)
• DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter)
• Easyfig (http://easyfig.sourceforge.net)
• GenomeVx (http://wolfe.ucd.ie/GenomeVx)
• GView Server (https://server.gview.ca)
• progressiveMauve and ACT
[Example figures: Easyfig, CGView Comparison Tool, BLAST Ring Image Generator]

More Related Content

Similar to From Sequence to Knowledge: The Art and Science of Phage Genome Annotation

An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)Ramy K. Aziz
 
From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)Ramy K. Aziz
 
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...Ramy K. Aziz
 
Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Torsten Seemann
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GenomeInABottle
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenomeInABottle
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128GenomeInABottle
 
Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)Ramy K. Aziz
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebaseKew Sama
 
Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016solgenomics
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingStephen Turner
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsJuan Antonio Vizcaino
 
Lab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxLab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxkarlos64
 
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxTheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxPRIYANKAZALA9
 
New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...QIAGEN
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...Chris Evelo
 
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree..."The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...Ramy K. Aziz
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGenomeInABottle
 

Similar to From Sequence to Knowledge: The Art and Science of Phage Genome Annotation (20)

An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
 
From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)
 
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
 
Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
Biohackathon2016
Biohackathon2016Biohackathon2016
Biohackathon2016
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp Leiden
 
Pride cluster presentation
Pride cluster presentation Pride cluster presentation
Pride cluster presentation
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
 
Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasets
 
Lab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxLab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptx
 
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxTheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
 
New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...
 
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree..."The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptx
 

More from Ramy K. Aziz

An introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotationAn introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotationRamy K. Aziz
 
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...Ramy K. Aziz
 
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...Ramy K. Aziz
 
Giving and Receiving Feedback
Giving and Receiving FeedbackGiving and Receiving Feedback
Giving and Receiving FeedbackRamy K. Aziz
 
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011Ramy K. Aziz
 
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011Ramy K. Aziz
 
If the dead bacteria could speak
If the dead bacteria could speakIf the dead bacteria could speak
If the dead bacteria could speakRamy K. Aziz
 

More from Ramy K. Aziz (9)

An introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotationAn introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotation
 
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
 
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
 
Giving and Receiving Feedback
Giving and Receiving FeedbackGiving and Receiving Feedback
Giving and Receiving Feedback
 
FootballOmics
FootballOmicsFootballOmics
FootballOmics
 
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
 
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
 
If the dead bacteria could speak
If the dead bacteria could speakIf the dead bacteria could speak
If the dead bacteria could speak
 
Rka nxt 2010_web
Rka nxt 2010_webRka nxt 2010_web
Rka nxt 2010_web
 

Recently uploaded

REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsCharlene Llagas
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detailhaiderbaloch3
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 

Recently uploaded (20)

REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and Functions
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detail
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 

From Sequence to Knowledge: The Art and Science of Phage Genome Annotation

  • 1. From Sequence to Knowledge: The Art & Science of Phage Genome Annotation Ramy K. Aziz – Cairo University
  • 2. From Sequence to Knowledge: PhAnToMe, RAST, and the Ultimate Kropinski Toolkit A helping hand through The Annotation Bottleneck Compiled by: Andrew Kropinski and Ramy Aziz
  • 3. Online material • Data & links: – http://egybio.net/tutorial • Slides – http://bit.ly/annotation2016 – http://bit.ly/phantome4 – Old tutorials (more detailed, but missing latest ): • Evergreen 2011: http://slidesha.re/phantome1 • http://slidesha.re/phiRAST1 (Karin) • Evergreen 2013: http://bit.ly/phantome2 • Evergreen 2015: http://bit.ly/phantome3 21 July 2016 Phage Genomics - VoM 2016
  • 4. INTRODUCTION
  • 5. “The analysis bottleneck” • Observation: – We generate more data than we can analyze. – We generate sequence data faster than we can analyze them. • Opinion: – Bottlenecks are not created equal! – It is important to define the question(s) before working on the answer(s)!
  • 6. “The analysis bottleneck” • The Lavigne paradox
  • 8. Quick group activity Defining the question(s): • How many of you have annotated a genome? • How many phage genomes have you sequenced (or are in the process of sequencing)? a) None b) 1-5 c) 5-50 d) > 50 • What is the single most pressing question you want to answer from genome analysis?
  • 9. DEFINING THE QUESTION(S) “Begin with the end in mind” (Covey, The 7 Habits)
  • 10. What You Want The goal: ✓ complete ✓ accurate Incomplete: ✗ genome termini ✗ faulty assembly ✗ frameshift ✗ chimeric fragments
  • 11. A process of reconstruction
  • 12. Annotation ≈ Reconstruction (from genome; from metagenome) [Figure labels: incomplete, faulty assembly, frameshift; complete, accurate. Credits: Andrew Kropinski, Bas Dutilh]
  • 14. A process of reconstruction • Experimentally [Figure: DNA reads TGATTGTGTGTTTGCGCAATGCG ATGTGTATATATAGTGAGCTTGCCC GTCTCTCTNNNTCTCTTG TGATTGGTCTNNNTCTCTTGCGCAATGCG]
  • 15. A process of reconstruction • Experimentally • Computationally [Figure: the same DNA reads as on slide 14]
  • 16. From Sequence to Knowledge: from raw sequence data to genome submission/publication. Workflow stages: Assembly; Gene finding/ORF calling; tRNA calling; Annotation (assigning functions); Orienting; Validation (segmenter); Fixing frameshifts; Introns and inteins; Subsystem assignment; Refinement/secondary annotation loop; Special purpose: toxins, morons, integrases, lifestyle prediction; Regulatory elements (promoters, terminators); Output: files and graphics
  • 17. Countless tools
  • 18. Authority figures: Andrew Kropinski, Rob Lavigne, Rob Edwards
  • 19. General outline • Part I: The “Kropinski toolkit” – Tools approved and recommended by Andrew Kropinski (http://molbiol-tools.ca): from seq to pub • Part II: SEED-based tools: – The RAST family – The PhAnToMe database/portal
  • 20. The Kropinski Toolkit
  • 21. What we want, according to Andrew: a well-characterized genome, in which, ideally, we know: ✓ the location & function of all the genes ✓ the location of promoters & terminators ✓ the correct taxonomy [Figure: genome map, genes 20–33 with PstI sites, 30.0 kb] Viruses; dsDNA viruses, no RNA stage; Caudovirales; Siphoviridae; T1virus
  • 22. Desired outcome: Create GenBank submission • Complete, accurate description of the genome and its taxonomy Good title
  • 23. Desired outcome (2)
  • 24. Desired outcome (3)
  • 25. Desired outcome (4) • Protein products of concern, particularly for those interested in phage therapy: – Integrases – Toxins [Figure: genome map, genes 20–33 with PstI sites, 30.0 kb]
  • 26. Processes and Steps I. Primary analysis (QC/pre-annotation proofreading, e.g., orient with BLASTN) II. Genome annotation – Gene finding (ORF calling) – Automated annotation – Massaging (editing, functional assignment) III. Second dimension (regulatory elements) IV. Comparative genomics V. Metadata VI. Visualization
  • 28. AUTOMATED ANNOTATION II. Genome Annotation
  • 29. RAST (subsystems-based tools) • Will be the major focus of this short tutorial… • The goals are: 1. Quick demo of how to use RAST 2. Quick preview of batch annotation in RAST 3. Optimize RAST for phage annotation 4. Demonstrate & discuss how to improve RAST output
  • 30. RAST (subsystems-based tools) • But, before getting there …
  • 31. The Kropinski wisdom 1. Always use more than one tool 2. Never blindly trust any automated (or manual) process 3. Use your eyes and hands: visual inspection/manual proofreading, re-annotation – Every apparently complicated file is still editable in your favorite text editor (e.g., Notepad) 4. If you don’t know a gene’s function (if you can’t convince your grandma), it is better to leave it unnamed than to contribute to error propagation 2 Aug 2015 Phage Genomics - Evergreen 2015
  • 32. What do I call my gene product (i.e. protein)? ✗ “phage hypothetical protein” – redundant ✗ “gp87” (gp = gene product) ✓ hypothetical protein • gp200 describes radically different proteins in Listeria, Enterococcus, Mycobacterium, Rhodococcus, Sphingomonas, Pseudomonas, Bacillus and Synechococcus phage genomes → Add /note=“similar to gp43 of Escherichia coli phage T4”
  • 33. What do I call my gene product (i.e. protein)? ✗ /product=“UboA”; “NrdA”; “hypothetical protein SA5_0153/152”; “ORF184” (as bad as gp184); “RNAP1”; “32 kDa protein” • Bad because they don’t mean anything to the casual (or informed) reader. • Unless you are a bioinformatician or biostatistician, be conservative in recording “hits.” Could you convince your grandma? If not, list it as a “hypothetical protein”, but do take a stand: “putative DNA polymerase” is cowardly
  • 34. Nomenclature Sins • hypothetical protein → DNA polymerase with no or poor-quality evidence is far worse than: DNA polymerase → hypothetical protein • Be cautious about using BLASTP hits in naming gps – is there additional evidence to back the designation up?
  • 35. Consistent Nomenclature • All of these describe homologs of the product of the coliphage T4 rIIA gene! rIIA protector from prophage-induced early lysis protector from prophage-induced early lysis protector from prophage-induced early lysis rIIA membrane-associated affects host membrane ATPase rIIA membrane-associated affects host membrane ATPase phage rIIA lysis inhibitor rIIA protector rIIA rIIA protein membrane integrity protector hypothetical protein unnamed protein product !!!!!! protein of unknown function
  • 36. Bottom line: Manual vs. Automated • “Turtles know the road better than rabbits… ” Khalil Gibran • “… but they may never reach the end!” • The best approach? – Human expert-based annotation
  • 39. Genomic pairwise comparisons • EMBOSS Stretcher: http://emboss.bioinformatics.nl/cgi-bin/emboss/stretcher (N.B. genomes must be collinear) • BLASTN - NCBI • ANI (Average Nucleotide Identity): http://enve-omics.ce.gatech.edu/ani/ • GGDC 2.0 (Genome to Genome Distance Calculator): http://ggdc.dsmz.de/distcalc2.php • jSpeciesWS – ANI: http://jspecies.ribohost.com/jspeciesws/
  • 40. Proteomic pairwise comparisons • CoreGenes (http://binf.gmu.edu:8080/CoreGenes3.0/) • TBLASTX • Remember: protein sequence is more conserved than DNA sequence, so these comparisons are probably more useful for distant relationships
  • 41. VI. “POLISH” IT TO PUBLISH IT
  • 43. Servers & software • BLAST Ring Image Generator (http://brig.sourceforge.net) • CGView (http://wishart.biology.ualberta.ca/cgview) • CGView Comparison Tool (http://stothard.afns.ualberta.ca/downloads/CCT) • Circos (http://circos.ca) • DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter) • Easyfig (http://easyfig.sourceforge.net) • GenomeVx (http://wolfe.ucd.ie/GenomeVx) • GView Server (https://server.gview.ca) • progressiveMauve and ACT
  • 46. BLAST Ring Image Generator
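
The naming advice on slides 32 to 34 can be turned into a quick pre-submission check. The sketch below is only an illustration: the pattern list and the function name `check_product` are assumptions built from the examples quoted on the slides, not an official GenBank rule set, and a clean result does not mean the name is backed by evidence.

```python
import re

# Illustrative patterns taken from the slide examples (assumptions, not NCBI rules).
UNINFORMATIVE = [
    (re.compile(r"^gp\d+\w*$", re.IGNORECASE),
     "bare gp number; the same number means different proteins in different phages"),
    (re.compile(r"^ORF\d+$", re.IGNORECASE),
     "bare ORF number"),
    (re.compile(r"^\d+(\.\d+)?\s*kDa protein$", re.IGNORECASE),
     "size-only name"),
    (re.compile(r"^phage hypothetical protein$", re.IGNORECASE),
     "redundant 'phage' prefix"),
    (re.compile(r"^putative\b", re.IGNORECASE),
     "hedged name; take a stand or use 'hypothetical protein'"),
]


def check_product(name):
    """Return the reasons (possibly none) why a /product name looks problematic."""
    cleaned = name.strip()
    return [reason for pattern, reason in UNINFORMATIVE if pattern.search(cleaned)]


if __name__ == "__main__":
    for product_name in ["gp87", "ORF184", "32 kDa protein",
                         "putative DNA polymerase", "hypothetical protein"]:
        print(product_name, "->", check_product(product_name) or "no issue flagged")
```

Names such as “UboA” or “RNAP1” are not caught by these patterns; judging whether a short gene-style name means anything to the reader still needs a human eye, in line with the Kropinski wisdom on slide 31.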

Editor's Notes

  1. Gp200 from Pseudomonas phage 201phi2-1 is related to phiKZ gp120 and EL gp78
  2. "Shifting the genomic gold standard for the prokaryotic species definition." Michael Richter and Ramon Rosselló-Móra. PNAS vol. 106 no. 45, pp. 19126–19131, doi: 10.1073/pnas.0906412106. JSpeciesWS is a quick, easy-to-use online service that estimates whether two or more (draft) genomes belong to the same species by pairwise comparison of (1) their Average Nucleotide Identity (ANI) and/or (2) the correlation indexes of their tetranucleotide signatures.
  3. Star - online
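
To make the idea of a tetranucleotide-signature correlation in note 2 more concrete, here is a minimal sketch. It is a simplification under stated assumptions: it correlates raw tetranucleotide frequencies counted on both strands, whereas JSpecies itself works with normalized usage values, so the numbers will not match the service. All function names are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement; non-ACGT characters (e.g. N) are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def tetra_freqs(seq):
    """Relative tetranucleotide frequencies counted on both strands."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - 3):
            kmer = strand[i:i + 4]
            if kmer in counts:  # skips windows containing N or other symbols
                counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def tetra_correlation(genome_a, genome_b):
    """Correlation between the raw tetranucleotide profiles of two sequences."""
    return pearson(tetra_freqs(genome_a), tetra_freqs(genome_b))
```

Counting both strands makes the profile independent of which strand was submitted, which matters for phage genomes that have not yet been oriented. On real genomes, the sequences would be read from FASTA files rather than passed as literal strings.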