From Sequence to Knowledge:
The Art & Science of Phage
Genome Annotation
Ramy K. Aziz – Cairo University
From Sequence to
Knowledge:
PhAnToMe, RAST, and the
Ultimate Kropinski Toolkit
A helping hand through
The Annotation Bottleneck
Compiled by: Andrew Kropinski and Ramy Aziz
Online material
• Data & links:
– http://egybio.net/tutorial
• Slides
– http://bit.ly/annotation2016
– http://bit.ly/phantome4
– Old tutorials (more detailed, but missing the latest updates):
• Evergreen 2011: http://slidesha.re/phantome1
• http://slidesha.re/phiRAST1 (Karin)
• Evergreen 2013: http://bit.ly/phantome2
• Evergreen 2015: http://bit.ly/phantome3
21 July 2016 Phage Genomics - VoM 2016
INTRODUCTION
“The analysis bottleneck”
• Observation:
– We generate more data than we can analyze.
– We generate sequence data faster than
we can analyze them.
• Opinion:
– Bottlenecks are not
created equal!
– It is important to define the question(s)
before working on the answer(s)!
“The analysis bottleneck”
• The Lavigne paradox
Quick group activity
Defining the question(s):
• How many of you have annotated a
genome?
• How many phage genomes have you
sequenced (or are in the process of
sequencing)?
a) None b) 1-5 c) 5-50 d) > 50
• What is the single most pressing question
you want to answer from genome analysis?
DEFINING THE QUESTION(S)
“Begin with the end in mind” (Covey, The 7 Habits of Highly Effective People)
What You Want
The goal: a genome that is
• complete
• accurate
Pitfalls:
• incomplete genome termini
• faulty assembly
• frameshifts
• chimeric fragments
A process of reconstruction
Annotation (from genome) and Reconstruction (from metagenome)
[Figure: two panels. Annotation from a genome (credit: Andrew Kropinski) and reconstruction from a metagenome (credit: Bas Dutilh). Labels: complete, accurate; incomplete, frameshift, faulty assembly.]
A process of reconstruction
• Experimentally
• Computationally
TGATTGTGTGTTTGCGCAATGCG
ATGTGTATATATAGTGAGCTTGCCC
GTCTCTCTNNNTCTCTTG
TGATTGGTCTNNNTCTCTTGCGCAATGCG
From Sequence to Knowledge
From raw sequence data to genome submission/publication
[Workflow diagram; boxes shown:]
• DNA (raw sequence reads)
• Assembly
• Gene finding/ORF calling
• tRNA calling
• Annotation (assigning functions)
• Orienting
• Validation (segmenter)
• Fixing frameshifts
• Introns and inteins
• Subsystem assignment
• Refinement/secondary annotation loop
• Special purpose: toxins, morons, integrases, lifestyle prediction
• Regulatory elements (promoters, terminators)
• Output: files and graphics
Countless tools
Authority figures
Andrew Kropinski, Rob Lavigne, Rob Edwards
General outline
• Part I: The “Kropinski toolkit”
– Tools approved and recommended by Andrew Kropinski (http://molbiol-tools.ca): from sequence to publication
• Part II: SEED-based tools:
– The RAST family
– The PhAnToMe database/portal
The Kropinski Toolkit
What we want, according to Andrew
A well-characterized genome in which, ideally, we know:
• the location & function of all the genes
• the location of promoters & terminators
• the correct taxonomy
[Figure: 30.0 kb genome map with PstI sites and genes 20-33 (including 26A)]
Taxonomy: Viruses; dsDNA viruses, no RNA stage; Caudovirales; Siphoviridae; T1virus
Desired outcome: Create GenBank
submission
• Complete, accurate description of the
genome and its taxonomy
Good title
Desired outcome (2)
Desired outcome (3)
Desired outcome (4)
• Protein products of concern, particularly for those interested in phage therapy:
  – Integrases
  – Toxins
[Figure: 30.0 kb genome map with PstI sites and genes 20-33 (including 26A)]
Processes and Steps
I. Primary analysis
(QC/ pre-annotation proofreading: e.g., orient with BLASTN)
II. Genome annotation
– Gene finding (ORF calling)
– Automated annotation
– Massaging (editing, functional assignment)
III. Second dimension (regulatory elements)
IV. Comparative genomics
V. Metadata
VI. Visualization
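As a rough illustration of the "orient" step in the primary analysis, here is a minimal Python sketch. It assumes a circular or circularly permuted genome, and that a BLASTN comparison against a reference has already told you which strand you have and which position should come first. The function names are hypothetical and are not part of BLASTN or any tool named in these slides.

```python
# Hypothetical helpers for illustration; not part of BLASTN or RAST.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient(seq: str, new_start: int, flip: bool = False) -> str:
    """Rotate a genome so that 0-based position new_start becomes base 1.

    If flip is True, the sequence is reverse-complemented first, and
    new_start refers to the flipped sequence.
    """
    if flip:
        seq = reverse_complement(seq)
    if not 0 <= new_start < len(seq):
        raise ValueError("new_start is outside the sequence")
    return seq[new_start:] + seq[:new_start]


genome = "GGCCATGAAATTT"                # toy sequence
print(reorient(genome, 4))              # ATGAAATTTGGCC
print(reorient(genome, 0, flip=True))   # AAATTTCATGGCC
```

Rotation only makes sense when the genome has no fixed physical ends; for a phage with defined termini, only the strand flip would apply.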
AUTOMATED ANNOTATION
II. Genome Annotation
RAST (subsystems-based tools)
• Will be the major focus of this short
tutorial…
• The goal is:
1. Quick demo of how to use RAST
2. Quick preview of batch annotation in RAST
3. Optimize RAST for phage annotation
4. Demonstrate & discuss how to improve RAST output
RAST (subsystems-based tools)
• But, before getting there…
The Kropinski wisdom
1. Always use more than one tool
2. Never blindly trust any automated (or manual)
process
3. Use your eyes and hands: visual inspection/
manual proofreading, re-annotation
– Every apparently complicated file is still editable in your favorite text editor (e.g., Notepad)
4. If you don’t know a gene’s function (if you
can’t convince your grandma), better keep it
unnamed than contribute to error propagation
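One way to act on rule 1 is to compare the gene calls from two tools and inspect the disagreements by hand. The Python sketch below is only an illustration under stated assumptions: the `Gene` record, the `compare_calls` function, and the coordinates are hypothetical, and each tool's output is assumed to be already parsed into (start, stop, strand) triples. Genes are matched on stop coordinate and strand, because a shared stop on the same strand implies the same reading frame even when the two tools pick different start codons.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int   # genome coordinate of the first base of the start codon
    stop: int    # genome coordinate of the last base of the stop codon
    strand: str  # "+" or "-"


def compare_calls(calls_a, calls_b):
    """Sort gene calls from two tools into agreement classes.

    Genes are keyed by (stop, strand); the start is compared afterwards.
    """
    by_end_a = {(g.stop, g.strand): g for g in calls_a}
    by_end_b = {(g.stop, g.strand): g for g in calls_b}
    shared = by_end_a.keys() & by_end_b.keys()
    return {
        "identical": sorted(k for k in shared if by_end_a[k] == by_end_b[k]),
        "different_start": sorted(k for k in shared if by_end_a[k] != by_end_b[k]),
        "only_a": sorted(by_end_a.keys() - shared),
        "only_b": sorted(by_end_b.keys() - shared),
    }


# Made-up coordinates for two hypothetical gene callers.
tool_a = [Gene(100, 400, "+"), Gene(450, 900, "+"), Gene(1500, 1000, "-")]
tool_b = [Gene(100, 400, "+"), Gene(480, 900, "+"), Gene(2000, 2300, "+")]

for category, keys in compare_calls(tool_a, tool_b).items():
    print(category, keys)
```

The "different_start" and "only_*" classes are the ones worth opening in a genome browser or text editor for manual proofreading.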
2 Aug 2015 Phage Genomics - Evergreen 2015
What do I call my gene product
(i.e. protein)?
• “phage hypothetical protein” is redundant
• “gp87” (gp = gene product) → hypothetical protein
• gp200 describes radically different proteins in Listeria, Enterococcus, Mycobacterium, Rhodococcus, Sphingomonas, Pseudomonas, Bacillus and Synechococcus phage genomes
• Add /note=“similar to gp43 of Escherichia coli phage T4”
What do I call my gene product
(i.e. protein)?
• /product=“UboA”; “NrdA”; “hypothetical protein SA5_0153/152”; “ORF184” (as bad as gp184); “RNAP1”; “32 kDa protein”
• These are bad because they don't mean anything to the casual (or informed) reader.
• Unless you are a bioinformatician or biostatistician, be conservative in recording “hits.” Could you convince your grandma? If not, list the product as a “hypothetical protein.” But do take a stand: “putative DNA polymerase” is cowardly.
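The naming advice above can be turned into a quick screening pass over a list of /product values. The Python sketch below is a hedged illustration: the patterns are assumptions derived only from the bad examples on these slides, not an official GenBank or NCBI rule set, and `check_product` is a hypothetical helper. Short gene-style names such as “UboA” or “RNAP1” are not detected, because telling them apart from meaningful names needs human judgment.

```python
import re

# Illustrative patterns only, based on the examples in these slides.
UNINFORMATIVE = [
    (re.compile(r"^gp\s*\d+\w*$", re.I), "bare gp number"),
    (re.compile(r"^orf\s*\d+\w*$", re.I), "bare ORF number"),
    (re.compile(r"^\d+(\.\d+)?\s*kda protein$", re.I), "molecular weight only"),
    (re.compile(r"^phage hypothetical protein$", re.I), "redundant 'phage' prefix"),
    (re.compile(r"^hypothetical protein\s+\S+", re.I), "locus tag in product name"),
]


def check_product(name: str):
    """Return the reason a product name looks uninformative, or None."""
    cleaned = name.strip()
    for pattern, reason in UNINFORMATIVE:
        if pattern.search(cleaned):
            return reason
    return None


for name in ["gp87", "ORF184", "32 kDa protein",
             "hypothetical protein SA5_0153/152",
             "hypothetical protein", "DNA polymerase"]:
    print(f"{name!r}: {check_product(name) or 'ok'}")
```

A name that passes this screen is not necessarily correct; it still needs evidence behind it.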
Nomenclature Sins
• Changing “hypothetical protein” → “DNA polymerase” with no or poor-quality evidence is far worse than changing “DNA polymerase” → “hypothetical protein”.
• Be cautious about using BLASTP hits in naming gps: is there additional evidence to back up the designation?
Consistent Nomenclature
• All of these describe homologs of the product of the coliphage T4 rIIA gene!
rIIA protector from prophage-induced early lysis
protector from prophage-induced early lysis
protector from prophage-induced early lysis rIIA
membrane-associated affects host membrane ATPase
rIIA membrane-associated affects host membrane ATPase
phage rIIA lysis inhibitor
rIIA protector
rIIA
rIIA protein
membrane integrity protector
hypothetical protein
unnamed protein product !!!!!!
protein of unknown function
Bottom line:
Manual vs. Automated
• “Turtles know the road better than
rabbits… ” Khalil Gibran
• “… but they may never reach the end!”
• The best approach?
– Human expert-based annotation
IV. COMPARATIVE GENOMICS
Genomic pairwise comparisons
• EMBOSS Stretcher: http://emboss.bioinformatics.nl/cgi-bin/emboss/stretcher (N.B. genomes must be collinear)
• BLASTN (NCBI)
• ANI (Average Nucleotide Identity): http://enve-omics.ce.gatech.edu/ani/
• GGDC 2.0 (Genome to Genome Distance Calculator): http://ggdc.dsmz.de/distcalc2.php
• jSpeciesWS (ANI): http://jspecies.ribohost.com/jspeciesws/
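To make the idea of a pairwise identity score concrete, here is a minimal Python sketch that computes percent identity from two already aligned, gapped sequences (for example, the two rows of a global alignment). It is an illustration only: `percent_identity` is a hypothetical helper, the toy alignment is made up, and the value is not ANI as computed by the servers listed above. Tools also differ in the denominator they use; this sketch counts only columns where neither sequence has a gap.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns where neither row has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 6 ungapped columns, 5 of them identical.
print(round(percent_identity("ATGC-TA", "ATGGCTA"), 1))  # 83.3
```

Because gap columns are skipped, two genomes that differ mainly by insertions can still score high here, which is one reason to report the alignment coverage alongside the identity.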
Proteomic pairwise
comparisons
• CoreGenes (http://binf.gmu.edu:8080/CoreGenes3.0/)
• TBLASTX
• Remember that protein sequence is more conserved than DNA sequence, so protein-level comparison is probably more useful for distant relationships
VI. “POLISH” IT TO PUBLISH IT
Servers & software
• BLAST Ring Image Generator (http://brig.sourceforge.net)
• CGView (http://wishart.biology.ualberta.ca/cgview)
• CGView Comparison Tool (http://stothard.afns.ualberta.ca/downloads/CCT)
• Circos (http://circos.ca)
• DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter)
• Easyfig (http://easyfig.sourceforge.net)
• GenomeVx (http://wolfe.ucd.ie/GenomeVx)
• GView Server (https://server.gview.ca)
• progressiveMauve and ACT
EasyFig
CGView Comparison Tool
BLAST Ring Image Generator

More Related Content

Similar to From Sequence to Knowledge: The Art and Science of Phage Genome Annotation

An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
Ramy K. Aziz
 
From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)
Ramy K. Aziz
 
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
Ramy K. Aziz
 
Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013
Torsten Seemann
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
GenomeInABottle
 
Biohackathon2016
Biohackathon2016Biohackathon2016
Biohackathon2016
Takako Mochizuki
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp Leiden
GenomeInABottle
 
Pride cluster presentation
Pride cluster presentation Pride cluster presentation
Pride cluster presentation
Juan Antonio Vizcaino
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
GenomeInABottle
 
Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)
Ramy K. Aziz
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
Kew Sama
 
Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016
solgenomics
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
Stephen Turner
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasets
Juan Antonio Vizcaino
 
Lab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxLab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptx
karlos64
 
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxTheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
PRIYANKAZALA9
 
New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...
QIAGEN
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...
Chris Evelo
 
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree..."The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
Ramy K. Aziz
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptx
GenomeInABottle
 

Similar to From Sequence to Knowledge: The Art and Science of Phage Genome Annotation (20)

An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
An introduction to Phage Genome Annotation (Viruses of Microbes 2018)
 
From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)From Sequence to Knowledge (Tools for Phage Genome Annotation)
From Sequence to Knowledge (Tools for Phage Genome Annotation)
 
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen...
 
Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
Biohackathon2016
Biohackathon2016Biohackathon2016
Biohackathon2016
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp Leiden
 
Pride cluster presentation
Pride cluster presentation Pride cluster presentation
Pride cluster presentation
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)Using RAST for phage annotation (2018 VoM meeting)
Using RAST for phage annotation (2018 VoM meeting)
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
 
Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016Cassavabase general presentation PAG 2016
Cassavabase general presentation PAG 2016
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasets
 
Lab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxLab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptx
 
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxTheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
 
New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...New technology and workflow for integrated collection, stabilization and puri...
New technology and workflow for integrated collection, stabilization and puri...
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...
 
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree..."The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree...
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptx
 

More from Ramy K. Aziz

An introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotationAn introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotation
Ramy K. Aziz
 
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
Ramy K. Aziz
 
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
Ramy K. Aziz
 
Giving and Receiving Feedback
Giving and Receiving FeedbackGiving and Receiving Feedback
Giving and Receiving Feedback
Ramy K. Aziz
 
FootballOmics
FootballOmicsFootballOmics
FootballOmics
Ramy K. Aziz
 
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
Ramy K. Aziz
 
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Ramy K. Aziz
 
If the dead bacteria could speak
If the dead bacteria could speakIf the dead bacteria could speak
If the dead bacteria could speak
Ramy K. Aziz
 
Rka nxt 2010_web
Rka nxt 2010_webRka nxt 2010_web
Rka nxt 2010_web
Ramy K. Aziz
 

More from Ramy K. Aziz (9)

An introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotationAn introduction to PATRIC and its use in phage annotation
An introduction to PATRIC and its use in phage annotation
 
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
The Opera of Phantome - 2017 (presented at the 22nd Biennial Evergreen Phage ...
 
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
From Sequence to Knowledge (Phage Genomics Workshop Intro at the 22nd Biennia...
 
Giving and Receiving Feedback
Giving and Receiving FeedbackGiving and Receiving Feedback
Giving and Receiving Feedback
 
FootballOmics
FootballOmicsFootballOmics
FootballOmics
 
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
phiRAST Tutorial - The 19th Evergreen Phage Meeting 2011
 
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
Introduction to PhAnToMe Workshop, 19th Evergreen Phage Meeting, 2011
 
If the dead bacteria could speak
If the dead bacteria could speakIf the dead bacteria could speak
If the dead bacteria could speak
 
Rka nxt 2010_web
Rka nxt 2010_webRka nxt 2010_web
Rka nxt 2010_web
 

Recently uploaded

Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
Sérgio Sacani
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
PirithiRaju
 
2001_Book_HumanChromosomes - Genéticapdf
2001_Book_HumanChromosomes - Genéticapdf2001_Book_HumanChromosomes - Genéticapdf
2001_Book_HumanChromosomes - Genéticapdf
lucianamillenium
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
Sérgio Sacani
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
Advanced-Concepts-Team
 
BIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROIDBIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROID
ShibsekharRoy1
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptxLEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
yourprojectpartner05
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
PirithiRaju
 
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfMending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Selcen Ozturkcan
 
Alternate Wetting and Drying - Climate Smart Agriculture
Alternate Wetting and Drying - Climate Smart AgricultureAlternate Wetting and Drying - Climate Smart Agriculture
Alternate Wetting and Drying - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
Clinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdfClinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdf
RAYMUNDONAVARROCORON
 
Sustainable Land Management - Climate Smart Agriculture
Sustainable Land Management - Climate Smart AgricultureSustainable Land Management - Climate Smart Agriculture
Sustainable Land Management - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
Shekar Boddu
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
ABHISHEK SONI NIMT INSTITUTE OF MEDICAL AND PARAMEDCIAL SCIENCES , GOVT PG COLLEGE NOIDA
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 

Recently uploaded (20)

Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
 
2001_Book_HumanChromosomes - Genéticapdf
2001_Book_HumanChromosomes - Genéticapdf2001_Book_HumanChromosomes - Genéticapdf
2001_Book_HumanChromosomes - Genéticapdf
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
 
BIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROIDBIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROID
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
From Sequence to Knowledge: The Art and Science of Phage Genome Annotation

  • 1. From Sequence to Knowledge: The Art & Science of Phage Genome Annotation Ramy K. Aziz – Cairo University
  • 2. From Sequence to Knowledge: PhAnToMe, RAST, and the Ultimate Kropinski Toolkit A helping hand through The Annotation Bottleneck Compiled by: Andrew Kropinski and Ramy Aziz
  • 3. Online material • Data & links: – http://egybio.net/tutorial • Slides – http://bit.ly/annotation2016 – http://bit.ly/phantome4 – Old tutorials (more detailed, but missing the latest): • Evergreen 2011: http://slidesha.re/phantome1 • http://slidesha.re/phiRAST1 (Karin) • Evergreen 2013: http://bit.ly/phantome2 • Evergreen 2015: http://bit.ly/phantome3 21 July 2016 Phage Genomics - VoM 2016
  • 4. INTRODUCTION 21 July 2016 Phage Genomics - VoM 2016
  • 5. “The analysis bottleneck” • Observation: – We generate more data than we can analyze. – We generate sequence data faster than we can analyze them. • Opinion: – Bottlenecks are not created equal! – It is important to define the question(s) before working on the answer(s)! 21 July 2016 Phage Genomics - VoM 2016
  • 6. “The analysis bottleneck” • The Lavigne paradox 21 July 2016 Phage Genomics - VoM 2016
  • 7. “The analysis bottleneck” • The Lavigne paradox 21 July 2016 Phage Genomics - VoM 2016
  • 8. Quick group activity Defining the question(s): • How many of you have annotated a genome? • How many phage genomes have you sequenced (or are in the process of sequencing)? a) None b) 1-5 c) 5-50 d) > 50 • What is the single most pressing question you want to answer from genome analysis? 21 July 2016 Phage Genomics - VoM 2016
  • 9. DEFINING THE QUESTION(S) “Begin with the end in mind” (Covey, the 7 habits) 21 July 2016 Phage Genomics - VoM 2016
  • 10. What You Want The goal: complete, accurate Incomplete: genome termini Faulty assembly Frameshift chimeric fragments 21 July 2016 Phage Genomics - VoM 2016
  • 11. A process of reconstruction 21 July 2016 Phage Genomics - VoM 2016
  • 12. Annotation  Reconstruction from genome from metagenome 21 July 2016 Phage Genomics - VoM 2016 Incomplete frameshift - complete - accurate Credit: Andrew Kropinski Credit: Bas Dutilh faulty assembly
  • 13. Annotation  Reconstruction from genome from metagenome 21 July 2016 Incomplete faulty assembly frameshift - complete - accurate Phage Genomics - VoM 2016 Credit: Andrew Kropinski Credit: Bas Dutilh
  • 14. A process of reconstruction • Experimentally DNA TGATTGTGTGTTTGCGCAATGCG ATGTGTATATATAGTGAGCTTGCCC GTCTCTCTNNNTCTCTTG TGATTGGTCTNNNTCTCTTGCGCAATGCG 21 July 2016 Phage Genomics - VoM 2016
  • 15. A process of reconstruction • Experimentally • Computationally TGATTGTGTGTTTGCGCAATGCG ATGTGTATATATAGTGAGCTTGCCC GTCTCTCTNNNTCTCTTG TGATTGGTCTNNNTCTCTTGCGCAATGCG 21 July 2016 Phage Genomics - VoM 2016 DNA TGATTGTGTGTTTGCGCAATGCG ATGTGTATATATAGTGAGCTTGCCC GTCTCTCTNNNTCTCTTG TGATTGGTCTNNNTCTCTTGCGCAATGCG
  • 16. Assembly Gene finding/ ORF calling tRNA calling Annotation (Assigning functions) orienting Validation (segmenter) Fixing frameshifts Introns and Inteins Subsystem assignment Refinement/ Secondary annotation loop Special purpose: toxins, morons, integrases, lifestyle prediction Regulatory elements (promoters, terminators) Output: files and graphics From Sequence to Knowledge From raw sequence data to genome submission/ publication
  • 17. Countless tools 21 July 2016 Phage Genomics - VoM 2016
  • 18. Authority figures Andrew Kropinski Rob Lavigne 21 July 2016 Phage Genomics - VoM 2016 Rob Edwards
  • 19. General outline • Part I: The “Kropinski toolkit” – Tools approved and recommended by Andrew Kropinski (http://molbiol-tools.ca): from seq to pub • Part II: SEED-based tools: – The RAST family – The PhAnToMe database/portal 21 July 2016 Phage Genomics - VoM 2016
  • 20. The Kropinski Toolkit 21 July 2016 Phage Genomics - VoM 2016
  • 21. What we want, according to Andrew Well characterized genome, in which, ideally we know:  the location & function of all the genes  the location of promoters & terminators  the correct taxonomy PstI PstI 20 21 22 23 24 25 26 26A 27 28 29 30 31 32 33 30.0 kb Viruses; dsDNA viruses, no RNA stage; Caudovirales; Siphoviridae; T1virus 21 July 2016 Phage Genomics - VoM 2016
  • 22. Desired outcome: Create GenBank submission • Complete, accurate description of the genome and its taxonomy Good title
  • 23. Desired outcome (2) 21 July 2016 Phage Genomics - VoM 2016
  • 24. Desired outcome (3) 21 July 2016 Phage Genomics - VoM 2016
  • 25. Desired outcome (4)  Protein products of concern, particularly for those interested in phage therapy:  Integrases  Toxins PstI PstI 20 21 22 23 24 25 26 26A 27 28 29 30 31 32 33 30.0 kb 21 July 2016 Phage Genomics - VoM 2016
  • 26. Processes and Steps I. Primary analysis (QC/ pre-annotation proofreading: e.g., orient with BLASTN) II. Genome annotation – Gene finding (ORF calling) – Automated annotation – Massaging (editing, functional assignment) III. Second dimension (regulatory elements) IV. Comparative genomics V. Metadata VI. Visualization 21 July 2016 Phage Genomics - VoM 2016
  • 27. Assembly Gene finding/ ORF calling tRNA calling Annotation (Assigning functions) orienting Validation (segmenter) Fixing frameshifts Introns and Inteins Subsystem assignment Refinement/ Secondary annotation loop Special purpose: toxins, morons, integrases, lifestyle prediction Regulatory elements (promoters, terminators) Output: files and graphics From Sequence to Knowledge From raw sequence data to genome submission/ publication
  • 28. AUTOMATED ANNOTATION II. Genome Annotation 21 July 2016 Phage Genomics - VoM 2016
  • 29. RAST (subsystems-based tools) • Will be the major focus of this short tutorial… • The goals are: 1. Quick demo of how to use RAST 2. Quick preview of batch annotation in RAST 3. Optimize RAST for phage annotation 4. Demonstrate & discuss how to improve RAST output 21 July 2016 Phage Genomics - VoM 2016
  • 30. RAST (subsystems-based tools) • But, before getting there … 21 July 2016 Phage Genomics - VoM 2016
  • 31. The Kropinski wisdom 1. Always use more than one tool 2. Never blindly trust any automated (or manual) process 3. Use your eyes and hands: visual inspection/ manual proofreading, re-annotation – Every apparently complicated file is still editable in your favorite text editor (e.g., Notepad) 4. If you don’t know a gene’s function (if you can’t convince your grandma), it is better to leave it unnamed than to contribute to error propagation 2 Aug 2015 Phage Genomics - Evergreen 2015
  • 32. What do I call my gene product (i.e. protein)? “phage hypothetical protein” – redundant “gp87” (gp = gene product) hypothetical protein gp200 describes radically different proteins in Listeria, Enterococcus, Mycobacterium, Rhodococcus, Sphingomonas, Pseudomonas, Bacillus and Synechococcus phage genomes Add /note=“similar to gp43 of Escherichia coli phage T4” 21 July 2016 Phage Genomics - VoM 2016
  • 33. What do I call my gene product (i.e. protein)? /product=“UboA”; “NrdA”; “hypothetical protein SA5_0153/152”; “ORF184” (as bad as gp184); “RNAP1”; “32 kDa protein” Bad because they don’t mean anything to the casual (or informed) reader. Unless you are a bioinformatician or biostatistician, be conservative in recording “hits.” Could you convince your grandma? If not, list it as a “hypothetical protein”, but do take a stand: “putative DNA polymerase” is cowardly 21 July 2016 Phage Genomics - VoM 2016
  • 34. Nomenclature Sins hypothetical protein → DNA polymerase with no or poor-quality evidence is far worse than: DNA polymerase → hypothetical protein Be cautious about using BLASTP hits in naming gps – is there additional evidence to back the designation up? 21 July 2016 Phage Genomics - VoM 2016
  • 35. Consistent Nomenclature  All of these describe homologs of the product of the coliphage T4 rIIA gene! rIIA protector from prophage-induced early lysis protector from prophage-induced early lysis protector from prophage-induced early lysis rIIA membrane-associated affects host membrane ATPase rIIA membrane-associated affects host membrane ATPase phage rIIA lysis inhibitor rIIA protector rIIA rIIA protein membrane integrity protector hypothetical protein unnamed protein product !!!!!! protein of unknown function 21 July 2016 Phage Genomics - VoM 2016
  • 36. Bottom line: Manual vs. Automated • “Turtles know the road better than rabbits… ” Khalil Gibran • “… but they may never reach the end!” • The best approach? – Human expert-based annotation 2 Aug 2015 Phage Genomics - Evergreen 2015
  • 37. Assembly Gene finding/ ORF calling tRNA calling Annotation (Assigning functions) orienting Validation (segmenter) Fixing frameshifts Introns and Inteins Subsystem assignment Refinement/ Secondary annotation loop Special purpose: toxins, morons, integrases, lifestyle prediction Regulatory elements (promoters, terminators) Output: files and graphics From Sequence to Knowledge From raw sequence data to genome submission/ publication
  • 39. Genomic pairwise comparisons EMBOSS Stretcher: http://emboss.bioinformatics.nl/cgi-bin/emboss/stretcher N.B. genomes must be collinear BLASTN - NCBI ANI (Average Nucleotide Identity): http://enve-omics.ce.gatech.edu/ani/ GGDC 2.0 (Genome to Genome Distance Calculator): http://ggdc.dsmz.de/distcalc2.php jSpeciesWS – ANI: http://jspecies.ribohost.com/jspeciesws/
  • 40. Proteomic pairwise comparisons CoreGenes – (http://binf.gmu.edu:8080/CoreGenes3.0/) TBLASTX Remember: protein sequence is more conserved than DNA sequence, so these comparisons are probably more useful for distant relationships
  • 41. VI. “POLISH” IT TO PUBLISH IT
  • 42. Assembly Gene finding/ ORF calling tRNA calling Annotation (Assigning functions) orienting Validation (segmenter) Fixing frameshifts Introns and Inteins Subsystem assignment Refinement/ Secondary annotation loop Special purpose: toxins, morons, integrases, lifestyle prediction Regulatory elements (promoters, terminators) Output: files and graphics From Sequence to Knowledge From raw sequence data to genome submission/ publication
  • 43. Servers & software  BLAST Ring Image Generator (http://brig.sourceforge.net)  CGView (http://wishart.biology.ualberta.ca/cgview)  CGView Comparison Tool: http://stothard.afns.ualberta.ca/downloads/CCT  Circos (http://circos.ca)  DNAPlotter: (http://www.sanger.ac.uk/science/tools/dnaplotter)  Easyfig (http://easyfig.sourceforge.net)  GenomeVx (http://wolfe.ucd.ie/GenomeVx)  GView Server (https://server.gview.ca)  progressiveMauve and ACT
  • 46. BLAST Ring Image Generator
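
The consistent-nomenclature point on slide 35 can be illustrated with a small sketch. It assumes a hand-curated synonym table built from some of the rIIA name variants listed on that slide, and an arbitrary canonical label; both the table and the names `RIIA_SYNONYMS`, `CANONICAL_RIIA`, and `normalize_product` are hypothetical and are not part of RAST, PhAnToMe, or any tool mentioned in the slides.

```python
import re

# Hypothetical synonym table: a few of the product names from slide 35
# that all describe homologs of the coliphage T4 rIIA gene product.
RIIA_SYNONYMS = {
    "riia",
    "riia protein",
    "riia protector",
    "phage riia lysis inhibitor",
    "protector from prophage-induced early lysis",
    "membrane integrity protector",
}

# Assumed canonical label, chosen only for illustration.
CANONICAL_RIIA = "rIIA protein"


def normalize_product(name: str) -> str:
    """Map known rIIA name variants to one label; leave other names unchanged."""
    # Compare case-insensitively and ignore runs of whitespace.
    key = re.sub(r"\s+", " ", name.strip().lower())
    if key in RIIA_SYNONYMS:
        return CANONICAL_RIIA
    return name.strip()


if __name__ == "__main__":
    for product in ["rIIA protector", "phage rIIA  lysis inhibitor", "hypothetical protein"]:
        print(f"{product!r} -> {normalize_product(product)!r}")
```

A name table of this kind only helps with variants that are already recognizable by name. Entries such as "hypothetical protein" or "unnamed protein product" on the same slide cannot be resolved this way and would need sequence-based evidence, which is consistent with the advice to check annotations by hand.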

Editor's Notes

  1. Gp200 from Pseudomonas phage 201phi2-1 is related to phiKZ gp120 and EL gp78
  2. Michael Richter and Ramon Rosselló-Móra, "Shifting the genomic gold standard for the prokaryotic species definition", PNAS vol. 106 no. 45 pg 19126–19131, doi: 10.1073/pnas.0906412106. JSpeciesWS is a quick and easy-to-use online service that measures the probability that two or more (draft) genomes belong to the same species by pairwise comparison of (1) their Average Nucleotide Identity (ANI) and/or (2) correlation indexes of their tetranucleotide signatures.
  3. Star - online
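
Note 2 mentions correlation of tetranucleotide signatures. The sketch below shows the general idea under simplifying assumptions: it counts overlapping 4-mers on one strand only, uses raw frequencies, and compares two sequences with a Pearson correlation. It is not a reimplementation of JSpeciesWS, whose statistics are not described in these slides; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 tetranucleotides over the unambiguous DNA alphabet.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq: str) -> list[float]:
    """Relative frequency of each tetranucleotide (overlapping windows, one strand)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing N or other ambiguity codes are ignored.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        raise ValueError("sequence has no unambiguous tetranucleotides")
    return [counts[k] / total for k in KMERS]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def tetra_correlation(seq_a: str, seq_b: str) -> float:
    """Correlation of the two sequences' tetranucleotide frequency vectors."""
    return pearson(tetra_freqs(seq_a), tetra_freqs(seq_b))


if __name__ == "__main__":
    a = "TGATTGTGTGTTTGCGCAATGCG" * 20
    b = "ATGTGTATATATAGTGAGCTTGCCC" * 20
    print(round(tetra_correlation(a, b), 3))
```

The toy inputs reuse sequence fragments shown on slide 14. Real genomes would be read from FASTA files, and short sequences like these give unstable signatures.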