From Sequence to Knowledge
Computational tools for
phage genome annotation
A helping hand through
The Annotation Bottleneck
Ramy K. Aziz
Professor of Microbiology and Immunology,
Children’s Cancer Hospital (Egypt 57357) &
Faculty of Pharmacy, Cairo University
Twitter: @azizrk
PRELUDE
6 August 2021 Phage Genomics - Evergreen 2021
A bit of history…
• Since 2009, a Genomics Workshop has
become an essential part of the world-famous
Biennial Evergreen phage meeting
• The challenge was: how to meet
needs/expectations that are so many and
so diverse, in ~4 hours
• The next-level challenging request =
objectively keeping up with all excellent
tools that are being developed
MOTIVATION
WHY?
“The analysis bottleneck”
• Observation:
– We generate more data than we can analyze.
– We generate sequence data faster
than we can analyze them.
• Opinion:
– Bottlenecks are not
created equal!
– It is important to define the question(s)
before working on the answer(s)!
Source: Roux et al. Nature Biotech 2019
“The analysis bottleneck”
• The Lavigne paradox (2013)
EXPECTATIONS
Attendees’ expectations
• How many Evergreen/annotation workshops have you attended?
• Have you:
– annotated at least one phage genome?
– compared several phage genomes?
– worked on a viral metagenome?
– used the command line (Unix, Linux,
Mac Terminal) for sequence analysis?
• To optimize the content, let’s
take this survey on SOCRATIVE
(http://socrative.com)
– Enter ROOM: AZIZ15
What biologists want?
• A flawless, fully automated machine that reads
the scientist’s mind, takes sequence as input, and
converts it into publishable knowledge
Charlie Chaplin - Feeding Machine - Modern Times
What life offers?
What working in genomics
really is: “It takes two to tango”
Biologist
(aka human)
Computer
(aka machine)
Give me
everything
tonight!
Garbage IN → Garbage OUT
DEFINING THE QUESTION(S)
“Begin with the end in mind”
What you want … is
- complete
- accurate
From genome (credit: Andrew Kropinski) and from metagenome (credit: Bas Dutilh):
- incomplete
- frameshift
- faulty assembly
A process of reconstruction
• Experimentally
• Computationally
DNA: “Any phage one can get!” / “eDNA”
GTCTCTCTNNNTCTCTTG
GTCTCTCTNNNTCTCTTG
THE PROCESS / PIPELINE
From Sequence to Knowledge
From raw sequence data to genome submission/publication
• Assembly
• Gene finding/ORF calling
• tRNA calling
• Annotation (assigning functions)
• Orienting
• Validation
• Fixing frameshifts
• Introns and inteins
• Subsystem assignment
• Refinement/secondary annotation (loop)
• Special purpose: toxins, morons, integrases, lifestyle prediction
• Regulatory elements (promoters, terminators)
• Output: files and graphics
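To make the gene finding/ORF calling step concrete, here is a minimal sketch of a naive open reading frame scan. It is a teaching example only: the function names (`find_orfs`, `reverse_complement`) and the `min_codons` threshold are illustrative assumptions, it accepts only ATG starts, and real gene callers rely on statistical models rather than this simple rule.

```python
# Naive ORF scan: a teaching sketch, not a replacement for a real gene caller.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (start, end, strand) for ATG..stop ORFs on both strands.

    Coordinates are 0-based, end-exclusive, and always given on the
    forward strand. The codon count includes the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in this frame opens the ORF
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((start, end, "+"))
                        else:
                            # Map reverse-strand coordinates back to forward.
                            orfs.append((n - end, n - start, "-"))
                    start = None
    return sorted(orfs)
```

For example, `find_orfs(genome_sequence, min_codons=30)` returns a sorted list of candidate ORFs; every candidate still needs the validation steps listed above.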
Countless tools and databases
• Galaxy Apollo (Jason Gill et al.)
– https://cpt.tamu.edu/galaxy-pub/
• VIGA (by Enrique Gonzalez Tortuero)
– https://github.com/EGTortuero/viga
– (https://www.biorxiv.org/content/10.1101/277509v1)
– Email: E.GonzalezTortuero@salford.ac.uk
Material/Resources
• Data & links:
– http://egybio.net/tutorial
• Slides
– http://bit.ly/annotation2018, http://bit.ly/phageRAST2018
– http://bit.ly/phagePATRIC
– Old tutorials (more detailed, but missing the latest):
• Evergreen 2011: http://slidesha.re/phantome1
• http://slidesha.re/phiRAST1 (by Karin Holmfeldt)
• Evergreen 2013: http://bit.ly/phantome2
• Evergreen 2015: http://bit.ly/phantome3
• VoM 2016: http://bit.ly/annotation2016, http://bit.ly/phantome4
• Evergreen 2017: http://bit.ly/phigenomics2017, http://bit.ly/phantome2017
I. ANNOTATION OVERVIEW
Desired outcome
A well-characterized genome in which, ideally, we know:
• the location & function of all the genes
• the location of promoters & terminators
• the correct taxonomy
Figure: genome map labeled with genes 20–33 (including 26A), two PstI sites, and a 30.0 kb marker
Taxonomy: Viruses; dsDNA viruses, no RNA stage; Caudovirales; Siphoviridae; T1virus
Desired outcome:
Create GenBank submission
• Complete, accurate description of the
genome and its taxonomy
Desired outcome (2)
Desired outcome (3)
Desired outcome (4)
• Protein products of concern, particularly
for those interested in phage therapy:
– Integrases
– Toxins
– Antimicrobial resistance genes
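A first-pass screen for such products can be sketched as a keyword search over the product names of an annotated genome. This is a minimal sketch under stated assumptions: the category names, the patterns in `CONCERN_PATTERNS`, and the function `flag_products` are hypothetical, and a name-based screen is only as good as the annotation it reads. A real safety check would also need sequence-based searches against curated databases.

```python
import re

# Illustrative keyword patterns only (assumed, not a curated list).
CONCERN_PATTERNS = {
    "integrase": re.compile(r"\bintegrase\b", re.IGNORECASE),
    "toxin": re.compile(r"\btoxin\b", re.IGNORECASE),
    "antimicrobial resistance": re.compile(
        r"\b(beta-lactamase|resistance)\b", re.IGNORECASE),
}


def flag_products(products):
    """Map each concern category to the product names that match it."""
    hits = {category: [] for category in CONCERN_PATTERNS}
    for name in products:
        for category, pattern in CONCERN_PATTERNS.items():
            if pattern.search(name):
                hits[category].append(name)
    return hits
```

An empty result does not prove the genome is safe: a gene of concern annotated as "hypothetical protein" would pass this screen unnoticed.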
II. CLASSIFICATION
Classification
• The phage sequence space (Lima-Mendez et al.)
• The phage proteomic tree (Edwards & Rohwer)
• New: ViPTree http://www.genome.jp/viptree
III. AUTOMATED ANNOTATION
THE CONCEPTS BEHIND THE TOOLS
How to?
It is all about:
• Matching/comparing
• Classifying
From:
Current Opinion in Biotechnology
2003, 14:303–310
What to count?
What to count? How to bin?
• How to classify these?
• Assembly or long-reads
• “Truth”
• Similarity, variability, and functional prediction
• Counting genes/gene families (protein families)…
• Counting domains/motifs
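Two simple things one can count are short sequence words (k-mers) and shared gene families. The sketch below is an illustration under assumptions: the function names `kmer_counts` and `jaccard`, the default `k=4`, and the choice of the Jaccard index as the similarity measure are all hypothetical choices, not the methods behind any particular tool named in this talk.

```python
from collections import Counter


def kmer_counts(seq, k=4):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(families_a, families_b):
    """Shared gene families divided by all families seen in either genome."""
    a, b = set(families_a), set(families_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Once each genome is reduced to counts or sets like these, binning and classifying become a matter of comparing profiles.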
From Sequence to Knowledge
From raw sequence data to genome submission/publication
• Assembly
• Gene finding/ORF calling
• tRNA calling
• Annotation (assigning functions)
• Orienting
• Validation (segmenter)
• Fixing frameshifts
• Introns and inteins
• Subsystem assignment
• Refinement/secondary annotation (loop)
• Special purpose: toxins, morons, integrases, lifestyle prediction
• Regulatory elements (promoters, terminators)
• Output: files and graphics
Nomenclature Sins
• hypothetical protein → DNA polymerase, with no or poor-quality evidence, is far worse than:
• DNA polymerase → hypothetical protein
• Be cautious about using BLASTP hits in naming gene products (gps): is there additional evidence to back the designation up?
• All of these describe homologs of the
product of the coliphage T4 rIIA gene!
rIIA protector from prophage-induced early lysis
protector from prophage-induced early lysis
protector from prophage-induced early lysis rIIA
membrane-associated affects host membrane ATPase
rIIA membrane-associated affects host membrane ATPase
phage rIIA lysis inhibitor
rIIA protector
rIIA
rIIA protein
membrane integrity protector
hypothetical protein
unnamed protein product !!!!!!
protein of unknown function
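One way to see the scale of the problem in your own data is to tally the names attached to members of a single homolog family. The sketch below assumes the family has already been defined by sequence similarity; `name_variants` is a hypothetical helper that only normalizes case and whitespace before counting.

```python
from collections import Counter


def name_variants(products):
    """Count distinct product names after case/whitespace normalization."""
    normalized = (" ".join(p.lower().split()) for p in products)
    return Counter(normalized)
```

Any family that returns more than one key is a candidate for manual curation toward a single consistent name.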
Consistent Nomenclature
What do I call my gene product
(i.e. protein)?
• “phage hypothetical protein” – redundant
• “gp87” (gp = gene product) → hypothetical protein
• gp200 describes radically different proteins in
Listeria, Enterococcus, Mycobacterium,
Rhodococcus, Sphingomonas, Pseudomonas,
Bacillus and Synechococcus phage genomes
• Add /note=“similar to gp43 of Escherichia coli
phage T4”
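The naming advice above can be sketched as a small clean-up step. This is an illustration under assumptions: `suggest_product` and `similarity_note` are hypothetical helpers, and the rules cover only the two cases named on the slide (bare "gpNN" names and the redundant "phage hypothetical protein").

```python
import re

# Matches bare gene-product labels such as "gp87" or "GP200".
GP_NAME = re.compile(r"gp\d+", re.IGNORECASE)


def suggest_product(name):
    """Replace uninformative product names with 'hypothetical protein'."""
    cleaned = " ".join(name.split())
    if GP_NAME.fullmatch(cleaned):
        return "hypothetical protein"
    if cleaned.lower() == "phage hypothetical protein":
        return "hypothetical protein"
    return cleaned


def similarity_note(gene_product, phage):
    """Build a /note qualifier that records the similarity instead."""
    return f'/note="similar to {gene_product} of {phage}"'
```

The point of the design is that the product name stays conservative while the evidence for similarity is kept in the note.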
Bottom line:
Manual vs. Automated
• “Tortoises can tell you more about the road
than rabbits… ” Khalil Gibran
• “… but they may never reach the end!”
• The best approach?
– Human expert-based annotation
PATRIC/SEED/RAST: Main concept
One genome
All genomes
“Subsystems-based technologies were developed in the SEED with the view that
the interpretation of one genome can be made more efficient and consistent if
hundreds of genomes are simultaneously annotated in one subsystem at a time”
RAST: automated annotation
Subsystems-based tools
(Extended RAST family)
• (At least) Five ways to annotate a genome via RAST:
– RAST (http://rast.nmpdr.org)
• annotates online, saves your genome on server
– Use your favorite gene caller then upload gbk file to RAST
– myRAST (local)
• uses the server, but you can edit offline
– RASTtk (second-generation RAST)
• modular
• batch upload
– PATRIC
– PHANOTATE
RAST (http://rast.nmpdr.org)
RASTtk (RAST toolkit)
RAST Video demos available
• Find & watch:
– http://tutorial.theseed.org
PATRIC (https://www.patricbrc.org/)
IV. COMPARATIVE GENOMICS
Genomic pairwise comparisons
• EMBOSS Stretcher: http://emboss.bioinformatics.nl/cgi-bin/emboss/stretcher (N.B. genomes must be collinear)
• BLASTN (NCBI)
• ANI (Average Nucleotide Identity): http://enve-omics.ce.gatech.edu/ani/
• GGDC 2.0 (Genome to Genome Distance Calculator): http://ggdc.dsmz.de/distcalc2.php
• jSpeciesWS (ANI): http://jspecies.ribohost.com/jspeciesws/
Proteomic pairwise comparisons
• CoreGenes: http://binf.gmu.edu:8080/CoreGenes3.5/
• tBLASTX
• Remember that protein sequence is more conserved than DNA sequence, so protein-level comparison is probably more useful for more distant relationships.
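The last point can be illustrated with a toy pair of coding sequences that differ only at synonymous positions. This sketch assumes the two sequences are already aligned, gap-free, and of equal length; `translate` and `percent_identity` are hypothetical helpers, and the codon table is the standard genetic code. The servers listed above perform their own alignments and use more refined measures.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate frame 1 of a DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


def percent_identity(a, b):
    """Percent of identical positions in two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


gene_a = "ATGGCTCTGAAA"  # toy sequence, encodes M-A-L-K
gene_b = "ATGGCACTTAAG"  # same protein, three synonymous changes
dna_identity = percent_identity(gene_a, gene_b)
protein_identity = percent_identity(translate(gene_a), translate(gene_b))
```

Here the DNA sequences are 75% identical while the encoded proteins are identical, which is why protein-level comparison keeps its signal over longer evolutionary distances.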
V. METADATA
Standards for metadata reporting
Roux, et al.
VI. VISUALIZATION
Servers & software
• BLAST Ring Image Generator (http://brig.sourceforge.net)
• CGView (http://wishart.biology.ualberta.ca/cgview)
• CGView Comparison Tool (http://stothard.afns.ualberta.ca/downloads/CCT)
• Circos (http://circos.ca)
• DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter)
• Easyfig (http://mjsull.github.io/Easyfig/)
• GenomeVx (http://wolfe.ucd.ie/GenomeVx)
• GView Server (https://server.gview.ca)
• progressiveMauve and ACT
EasyFig
CGView Comparison Tool
BLAST Ring Image Generator
Conclusion: The Kropinski wisdom
1. Always use more than one tool.
2. Never blindly trust any automated (or manual)
process.
3. Use your eyes and hands: visual inspection/
manual proofreading, re-annotation
– Every apparently complicated file is still editable in
your favorite text editor (e.g., Notepad).
4. If you don’t know a gene’s function (if you
can’t convince your grandma), better keep it
unnamed than contribute to error propagation.
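Rule 1 becomes actionable once the outputs of two tools are compared side by side. The sketch below assumes each gene call has been parsed into a `(start, end, strand)` tuple with 1-based inclusive coordinates; `compare_calls` is a hypothetical helper, and it assumes each tool reports at most one call per stop codon.

```python
def compare_calls(calls_a, calls_b):
    """Sort gene calls from two tools into agreement classes.

    Calls that share a stop codon but differ in start are reported
    as start conflicts, which are good targets for manual inspection.
    """
    def stop_key(call):
        start, end, strand = call
        # The stop codon sits at `end` on "+" and at `start` on "-".
        return (end if strand == "+" else start, strand)

    a, b = set(calls_a), set(calls_b)
    exact = a & b
    stops_a = {stop_key(c): c for c in a - exact}
    stops_b = {stop_key(c): c for c in b - exact}
    shared = stops_a.keys() & stops_b.keys()
    return {
        "agree": sorted(exact),
        "start_conflicts": sorted((stops_a[k], stops_b[k]) for k in shared),
        "only_a": sorted(c for k, c in stops_a.items() if k not in shared),
        "only_b": sorted(c for k, c in stops_b.items() if k not in shared),
    }
```

Everything outside `agree` is a shortlist for rule 3: look at it with your own eyes before trusting either tool.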
From Sequence to Knowledge (Tools for Phage Genome Annotation)

  • 1. From Sequence to Knowledge Computational tools for phage genome annotation A helping hand through The Annotation Bottleneck Ramy K. Aziz Professor of Microbiology and Immunology, Children’s Cancer Hospital (Egypt 57357) & Faculty of Pharmacy, Cairo University Twitter: @azizrk
  • 2. PRELUDE 6 August 2021 Phage Genomics - Evergreen 2021
  • 3. A bit of history… • Since 2009, a Genomics Workshop has been an essential part of the world-famous Biennial Evergreen phage meeting • The challenge was how to meet needs/expectations that are so many and so diverse in ~4 hours • The next-level challenge: objectively keeping up with all the excellent tools that are being developed
  • 4. MOTIVATION WHY? 6 August 2021 Phage Genomics - Evergreen 2021
  • 5. “The analysis bottleneck” • Observation: – We generate more data than we can analyze. – We generate sequence data faster than we can analyze them. • Opinion: – Bottlenecks are not created equal! – It is important to define the question(s) before working on the answer(s)! 6 August 2021 Phage Genomics - Evergreen 2021
  • 6. “The analysis bottleneck” 6 August 2021 Phage Genomics - Evergreen 2021 Roux et al. Nature Biotech 2019
  • 7. “The analysis bottleneck” • The Lavigne paradox (2013) 6 August 2021 Phage Genomics - Evergreen 2021
  • 8. “The analysis bottleneck” • The Lavigne paradox (2013) 6 August 2021 Phage Genomics - Evergreen 2021
  • 9. EXPECTATIONS 6 August 2021 Phage Genomics - Evergreen 2021
  • 10. Attendees’ expectations • How many Evergreen/Annotation workshops have you attended? • Have you: – annotated at least one phage genome? – compared several phage genomes? – worked on a viral metagenome? – used the command line (Unix, Linux, Mac Terminal) for sequence analysis? • To optimize the content, let’s take this survey on SOCRATIVE (http://socrative.com) – Enter ROOM: AZIZ15
  • 11. What biologists want? • A flawless, fully automated machine that reads scientists’ minds, takes sequence as input, and converts it into publishable knowledge (Image: Charlie Chaplin, Feeding Machine, Modern Times)
  • 12. What life offers? 6 August 2021 Phage Genomics - Evergreen 2021
  • 13. What working in genomics really is: “It takes two to tango” • Biologist (aka human) • Computer (aka machine) • Give me everything tonight! • Garbage IN → Garbage OUT
  • 14. DEFINING THE QUESTION(S) “Begin with the end in mind” 6 August 2021 Phage Genomics - Evergreen 2021
  • 15. What you want …... is from genome from metagenome 6 August 2021 Phage Genomics - Evergreen 2021 Incomplete frameshift - complete - accurate Credit: Andrew Kropinski Credit: Bas Dutilh faulty assembly
  • 16. What you want …... is from genome from metagenome 6 August 2021 Incomplete faulty assembly frameshift - complete - accurate Phage Genomics - Evergreen 2021 Credit: Andrew Kropinski Credit: Bas Dutilh
  • 17. A process of reconstruction 6 August 2021 Phage Genomics - Evergreen 2021
  • 18. A process of reconstruction • Experimentally 6 August 2021 Phage Genomics - Evergreen 2021 DNA GTCTCTCTNNNTCTCTTG
  • 19. A process of reconstruction • Experimentally • Computationally 6 August 2021 Phage Genomics - Evergreen 2021 DNA GTCTCTCTNNNTCTCTTG GTCTCTCTNNNTCTCTTG
  • 20. A process of reconstruction • Experimentally • Computationally 6 August 2021 Phage Genomics - Evergreen 2021 “Any phage one can get!” “eDNA” GTCTCTCTNNNTCTCTTG GTCTCTCTNNNTCTCTTG
  • 21. THE PROCESS / PIPELINE 6 August 2021 Phage Genomics - Evergreen 2021
  • 22. From Sequence to Knowledge: from raw sequence data to genome submission/publication • Assembly • Gene finding/ORF calling • tRNA calling • Annotation (assigning functions) • Orienting • Validation • Fixing frameshifts • Introns and inteins • Subsystem assignment • Refinement/secondary annotation loop • Special purpose: toxins, morons, integrases, lifestyle prediction • Regulatory elements (promoters, terminators) • Output: files and graphics
  • 23. Countless tools and databases 6 August 2021 Phage Genomics - Evergreen 2021
  • 24. Countless tools and databases • Galaxy Apollo (Jason Gill et al.) – https://cpt.tamu.edu/galaxy-pub/ • VIGA (by Enrique Gonzalez Tortuero) – https://github.com/EGTortuero/viga – (https://www.biorxiv.org/content/10.1101/277509v1) – Email: E.GonzalezTortuero@salford.ac.uk 6 August 2021 Phage Genomics - Evergreen 2021
  • 25. Material/Resources • Data & links: – http://egybio.net/tutorial • Slides: – http://bit.ly/annotation2018, http://bit.ly/phageRAST2018 – http://bit.ly/phagePATRIC – Old tutorials (more detailed, but missing the latest): • Evergreen 2011: http://slidesha.re/phantome1 • http://slidesha.re/phiRAST1 (by Karin Holmfeldt) • Evergreen 2013: http://bit.ly/phantome2 • Evergreen 2015: http://bit.ly/phantome3 • VoM 2016: http://bit.ly/annotation2016, http://bit.ly/phantome4 • Evergreen 2017: http://bit.ly/phigenomics2017, http://bit.ly/phantome2017
  • 26. Material/Resources 6 August 2021 Phage Genomics - Evergreen 2021
  • 27. I. ANNOTATION OVERVIEW 6 August 2021 Phage Genomics - Evergreen 2021
  • 28. Desired outcome • A well-characterized genome in which, ideally, we know: – the location & function of all the genes – the location of promoters & terminators – the correct taxonomy • [Figure: genome map with PstI sites, genes 20–33 (including 26A), 30.0 kb] • Viruses; dsDNA viruses, no RNA stage; Caudovirales; Siphoviridae; T1virus
  • 29. Desired outcome: Create GenBank submission • Complete, accurate description of the genome and its taxonomy 6 August 2021 Phage Genomics - Evergreen 2021
  • 30. Desired outcome (2) 6 August 2021 Phage Genomics - Evergreen 2021
  • 31. Desired outcome (3) 6 August 2021 Phage Genomics - Evergreen 2021
  • 32. Desired outcome (4) • Protein products of concern, particularly for those interested in phage therapy: – Integrases – Toxins – Antimicrobial resistance genes • [Figure: genome map with PstI sites, genes 20–33 (including 26A), 30.0 kb]
  • 33. II. CLASSIFICATION 6 August 2021 Phage Genomics - Evergreen 2021
  • 34. Classification • The phage sequence space (Lima-Mendez et al.) • The phage proteomic tree (Edwards & Rohwer) • New: VIP tree http://www.genome.jp/viptree 6 August 2021 Phage Genomics - Evergreen 2021
  • 35. III. AUTOMATED ANNOTATION 6 August 2021 Phage Genomics - Evergreen 2021
  • 36. THE CONCEPTS BEHIND THE TOOLS How to?
  • 37. It is all about Matching/ Comparing Classifying 6 August 2021 Phage Genomics - Evergreen 2021 From: Current Opinion in Biotechnology 2003, 14:303–310
  • 38. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 39. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 40. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 41. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 42. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 43. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 44. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 45. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 46. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 47. What to count? 6 August 2021 Phage Genomics - Evergreen 2021
  • 48. 6 August 2021 What to count? How to bin? How to classify these? Phage Genomics - Evergreen 2021
  • 49. 6 August 2021 What to count? How to bin? Assembly or long-reads Phage Genomics - Evergreen 2021
  • 50. 6 August 2021 What to count? How to bin? “Truth” Phage Genomics - Evergreen 2021
  • 51. 6 August 2021 What to count? How to bin? Similarity, variability, and functional prediction Phage Genomics - Evergreen 2021
  • 52. 6 August 2021 What to count? How to bin? Counting genes/ gene families (protein families)… Counting domains/ motifs Phage Genomics - Evergreen 2021
  • 53. From Sequence to Knowledge: from raw sequence data to genome submission/publication • Assembly • Gene finding/ORF calling • tRNA calling • Annotation (assigning functions) • Orienting • Validation (segmenter) • Fixing frameshifts • Introns and inteins • Subsystem assignment • Refinement/secondary annotation loop • Special purpose: toxins, morons, integrases, lifestyle prediction • Regulatory elements (promoters, terminators) • Output: files and graphics
  • 54. Nomenclature Sins • hypothetical protein → DNA polymerase, with no or poor-quality evidence, is far worse than DNA polymerase → hypothetical protein • Be cautious about using BLASTP hits in naming gene products (gps): is there additional evidence to back the designation up?
  • 55. Consistent Nomenclature • All of these describe homologs of the product of the coliphage T4 rIIA gene! rIIA protector from prophage-induced early lysis protector from prophage-induced early lysis protector from prophage-induced early lysis rIIA membrane-associated affects host membrane ATPase rIIA membrane-associated affects host membrane ATPase phage rIIA lysis inhibitor rIIA protector rIIA rIIA protein membrane integrity protector hypothetical protein unnamed protein product !!!!!! protein of unknown function
  • 56. What do I call my gene product (i.e. protein)? • “phage hypothetical protein” – redundant • “gp87” (gp = gene product) • hypothetical protein • gp200 describes radically different proteins in Listeria, Enterococcus, Mycobacterium, Rhodococcus, Sphingomonas, Pseudomonas, Bacillus and Synechococcus phage genomes • Add /note=“similar to gp43 of Escherichia coli phage T4”
  • 57. What do I call my gene product (i.e. protein)? • “phage hypothetical protein” – redundant • “gp87” (gp = gene product) • hypothetical protein • gp200 describes radically different proteins in Listeria, Enterococcus, Mycobacterium, Rhodococcus, Sphingomonas, Pseudomonas, Bacillus and Synechococcus phage genomes • Add /note=“similar to gp43 of Escherichia coli phage T4”
  • 58. Bottom line: Manual vs. Automated • “Tortoises can tell you more about the road than rabbits… ” Khalil Gibran • “… but they may never reach the end!” • The best approach? – Human expert-based annotation 6 August 2021 Phage Genomics - Evergreen 2021
  • 59. PATRIC/SEED/RAST: Main concept One genome All genomes 6 August 2021 Phage Genomics - Evergreen 2021
  • 60. PATRIC/SEED/RAST: Main concept One genome All genomes 6 August 2021 Phage Genomics - Evergreen 2021 “Subsystems-based technologies were developed in the SEED with the view that the interpretation of one genome can be made more efficient and consistent if hundreds of genomes are simultaneously annotated in one subsystem at a time”
  • 61. RAST: automated annotation 6 August 2021 Phage Genomics - Evergreen 2021
  • 62. Subsystems-based tools (Extended RAST family) • (At least) five ways to annotate a genome via RAST: – RAST (http://rast.nmpdr.org) • annotates online, saves your genome on the server – Use your favorite gene caller, then upload the gbk file to RAST – myRAST (local) • uses the server, but you can edit offline – RASTtk (second-generation RAST) • modular • batch upload – PATRIC – Phanotator
  • 63. RAST (http://rast.nmpdr.org) 6 August 2021 Phage Genomics - Evergreen 2021
  • 64. RASTtk (RAST toolkit) 6 August 2021 Phage Genomics - Evergreen 2021
  • 65. RAST Video demos available • Find & watch: – http://tutorial.theseed.org 6 August 2021 Phage Genomics - Evergreen 2021
  • 66. PATRIC (https://www.patricbrc.org/) 6 August 2021 Phage Genomics - Evergreen 2021
  • 67. PATRIC (https://www.patricbrc.org/) 6 August 2021 Phage Genomics - Evergreen 2021
  • 68. IV. COMPARATIVE GENOMICS 6 August 2021 Phage Genomics - Evergreen 2021
  • 69. Genomic pairwise comparisons • EMBOSS Stretcher: http://emboss.bioinformatics.nl/cgi-bin/emboss/stretcher (N.B. genomes must be collinear) • BLASTN (NCBI) • ANI (Average Nucleotide Identity): http://enve-omics.ce.gatech.edu/ani/ • GGDC 2.0 (Genome to Genome Distance Calculator): http://ggdc.dsmz.de/distcalc2.php • jSpeciesWS – ANI: http://jspecies.ribohost.com/jspeciesws/
  • 70. Proteomic pairwise comparisons • CoreGenes: http://binf.gmu.edu:8080/CoreGenes3.5/ • tBLASTX • Remember that protein sequence is more conserved than DNA sequence; probably useful for more distant relationships.
  • 71. V. METADATA 6 August 2021 Phage Genomics - Evergreen 2021
  • 72. Standards for metadata reporting 6 August 2021 Phage Genomics - Evergreen 2021 Roux, et al.
  • 73. VI. VISUALIZATION 6 August 2021 Phage Genomics - Evergreen 2021
  • 74. From Sequence to Knowledge: from raw sequence data to genome submission/publication • Assembly • Gene finding/ORF calling • tRNA calling • Annotation (assigning functions) • Orienting • Validation (segmenter) • Fixing frameshifts • Introns and inteins • Subsystem assignment • Refinement/secondary annotation loop • Special purpose: toxins, morons, integrases, lifestyle prediction • Regulatory elements (promoters, terminators) • Output: files and graphics
  • 75. Servers & software • BLAST Ring Image Generator (http://brig.sourceforge.net) • CGView (http://wishart.biology.ualberta.ca/cgview) • CGView Comparison Tool (http://stothard.afns.ualberta.ca/downloads/CCT) • Circos (http://circos.ca) • DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter) • Easyfig (http://mjsull.github.io/Easyfig/) • GenomeVx (http://wolfe.ucd.ie/GenomeVx) • GView Server (https://server.gview.ca) • progressiveMauve and ACT
  • 76. EasyFig 6 August 2021 Phage Genomics - Evergreen 2021
  • 77. CGView Comparison Tool 6 August 2021 Phage Genomics - Evergreen 2021
  • 78. BLAST Ring Image Generator 6 August 2021 Phage Genomics - Evergreen 2021
  • 79. Conclusion: The Kropinski wisdom 6 August 2021 Phage Genomics - Evergreen 2021
  • 80. The Kropinski wisdom 1. Always use more than one tool. 2. Never blindly trust any automated (or manual) process. 3. Use your eyes and hands: visual inspection, manual proofreading, re-annotation – Every apparently complicated file is still editable in your favorite text editor (e.g., Notepad). 4. If you don’t know a gene’s function (if you can’t convince your grandma), better to leave it unnamed than to contribute to error propagation.
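
The first rule above ("Always use more than one tool") can be illustrated with a small Python sketch that cross-checks the gene calls of two gene callers on the same genome. Nothing here comes from the slides: the `Gene` record, the `compare_calls` function, and the example coordinates are all hypothetical, and matching calls by their 3' end (the stop codon position) is an assumed heuristic, chosen because gene callers tend to disagree about start codons more often than about stops. Treat the output as a list of places to inspect by eye, not as a verdict.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """A gene call; coordinates are 1-based and inclusive (assumed convention)."""
    start: int
    stop: int
    strand: str  # "+" or "-"


def three_prime_end(gene):
    """Return the coordinate of the 3' end plus the strand, used as a match key."""
    end = gene.stop if gene.strand == "+" else gene.start
    return (end, gene.strand)


def compare_calls(calls_a, calls_b):
    """Group gene calls by how well two (hypothetical) gene callers agree."""
    ends_a = {three_prime_end(g): g for g in calls_a}
    ends_b = {three_prime_end(g): g for g in calls_b}
    report = {"identical": [], "different_start": [], "only_a": [], "only_b": []}
    for key, gene_a in ends_a.items():
        gene_b = ends_b.get(key)
        if gene_b is None:
            report["only_a"].append(gene_a)          # called by tool A only
        elif gene_a == gene_b:
            report["identical"].append(gene_a)       # same start, stop, strand
        else:
            report["different_start"].append((gene_a, gene_b))  # same 3' end only
    report["only_b"] = [g for key, g in ends_b.items() if key not in ends_a]
    return report


# Made-up coordinates, for illustration only.
tool_a = [Gene(1, 300, "+"), Gene(400, 900, "+"), Gene(1000, 1500, "-")]
tool_b = [Gene(1, 300, "+"), Gene(460, 900, "+"), Gene(2000, 2300, "+")]

report = compare_calls(tool_a, tool_b)
for category, genes in report.items():
    print(category, len(genes))
```

Calls in the `different_start`, `only_a`, and `only_b` groups are the ones that most need the manual proofreading recommended in rule 3.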

Editor's Notes

  1. Gp200 from Pseudomonas phage 201phi2-1 is related to phiKZ gp120 and EL gp78
  2. Gp200 from Pseudomonas phage 201phi2-1 is related to phiKZ gp120 and EL gp78
  3. "Shifting the genomic gold standard for the prokaryotic species definition", Michael Richter and Ramon Rosselló-Móra, PNAS vol. 106 no. 45 pp. 19126–19131, doi: 10.1073/pnas.0906412106. JSpeciesWS is a quick and easy-to-use online service for measuring the probability that two or more (draft) genomes belong to the same species, by pairwise comparison of (1) their Average Nucleotide Identity (ANI) and/or (2) correlation indexes of their tetranucleotide signatures.
  4. Star - online
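
Note 3 mentions correlating tetranucleotide signatures between genomes. The Python sketch below shows the general idea under loud assumptions: it counts overlapping 4-mers on one strand and computes a plain Pearson correlation of the raw frequencies. It is not the JSpeciesWS implementation, which may normalize the counts differently, and the function names and toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Relative frequency of each tetranucleotide (one strand, overlapping windows)."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skips windows containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for very short input
    return [counts[k] / total for k in KMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


# Toy sequences, far too short for a meaningful signature; real genomes are needed.
genome_1 = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATTACA"
genome_2 = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATTACC"
print(round(pearson(tetra_freqs(genome_1), tetra_freqs(genome_2)), 3))
```

A value close to 1 indicates similar tetranucleotide usage. Any species-level interpretation should rely on the thresholds published for the tool actually used, not on this sketch.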