The document discusses whole genome shotgun sequencing. It explains that shotgun sequencing involves breaking the genome into random fragments, sequencing the fragments, and then assembling the sequences back into the original genome. It notes that this approach was pioneered by The Institute for Genomic Research and has revolutionized genome sequencing by allowing rapid determination of entire microbial genomes.
Talk for UC Davis Applied Phylogenetics Course at Bodega Bay
20. Genome Sequences Have
Revolutionized Microbiology
• Predictions of metabolic processes
• Better vaccine and drug design
• New insights into mechanisms of evolution
• Genomes serve as template for functional
studies
• New enzymes and materials for engineering
and synthetic biology
Tuesday, March 8, 2011
21. General Steps in Analysis of
Complete Genomes
• Identification/prediction of genes
• Characterization of gene features
• Characterization of genome features
• Prediction of gene function
• Prediction of pathways
• Integration with known biological
data
• Comparative genomics
25. Why Completeness is Important
• Improves characterization of genome
features
– Gene order, replication origins
• Better comparative genomics
– Genome duplications, inversions
• Presence and absence of particular genes
can be very important
• Missing sequence might be important (e.g.,
centromere)
• Allows researchers to focus on biology not
sequencing
29. Phylogenomic Analysis
• Evolutionary reconstructions greatly
improve genome analyses
• Genome analysis greatly improves
evolutionary reconstructions
• There is a feedback loop such that these
should be integrated
31. Outline
• Phylogenomic Tales
– Selecting genomes for sequencing
– Species evolution
– Predicting functions of genes
– Uncultured microbes
– Searching for novel organisms and genes
• All of these will be told in the context of a
recent project, “A Genomic Encyclopedia of
Bacteria and Archaea” (aka GEBA)
33. Major Microbial Sequencing
Efforts
• Coordinated, top-down efforts
– Fungal Genome Initiative (Broad/Whitehead)
– Gordon and Betty Moore Foundation Marine Microbial Genome
Sequencing Project
– Sanger Center Pathogen Sequencing Unit
– NHGRI Human Gut Microbiome Project
– NIH Human Microbiome Program
• White paper or grant systems
– NIAID Microbial Sequencing Centers
– DOE/JGI Community Sequencing Program
– DOE/JGI BER Sequencing Program
– NSF/USDA Microbial Genome Sequencing
• Covers lots of ground and biological diversity
35. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002. Tip labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
39. Need for Tree Guidance Well Established
• Common approach within some eukaryotic
groups
• Many small projects funded to fill in some
bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal
projects commonly lamented in literature
40. [Figure: tree of bacterial phyla, Proteobacteria through OP11]
• NSF-funded Tree of Life Project
– Eisen, Ward, Robb, Nelson, et al.
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
42. [Figure: tree of bacterial phyla, Proteobacteria through OP11]
• NSF-funded Tree of Life Project
– Eisen & Ward, PIs
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree
44. [Figure: tree of bacterial phyla, Proteobacteria through OP11]
• NSF-funded Tree of Life Project
– Eisen & Ward, PIs
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
45. [Figure: tree of bacterial phyla, Proteobacteria through OP11]
• NSF-funded Tree of Life Project
– Eisen & Ward, PIs
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
46. [Figure: tree of bacterial phyla, Proteobacteria through OP11]
• NSF-funded Tree of Life Project
– Eisen & Ward, PIs
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
47. [Figure: tree of bacterial phyla, Proteobacteria through OP11]
• GEBA
– Eisen & Ward, PIs
• A genomic encyclopedia of bacteria and archaea
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree
49. GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus,
Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et
al)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor
Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik
D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N.
Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
50. rRNA Tree of Life
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
54. GEBA Pilot Target List
[Figure: bar chart of # of genomes (y-axis, 0 to 35) by phylum (x-axis). X-axis labels, as far as they can be recovered. Bacteria (B): Actinobacteria (High GC), Aminanaerobia, Aquificae, Bacteroidetes, Chloroflexi, Deferribacteres, Deinococci, Delta Proteobacteria, Epsilon Proteobacteria, Firmicutes, Fusobacteria, Gamma Proteobacteria, Gemmatimonadetes, Haloanaerobiales, Planctomycetes, Spirochaetes, Thermodesulfobacteria, Thermodesulfobia, Thermovenabulae. Archaea (A): Halobacteria, Archaeoglobi, Methanobacteria, Methanomicrobia, Thermococci, Thermoprotei.]
55. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for
which no genomes are available
• Identify those with a cultured representative
in DSMZ
• DSMZ grew > 200 of these and prepped
DNA
• Sequence and finish 200+
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
• First paper: Wu et al., Nature, Dec 2009
56. GEBA Phylogenomic Lesson 1
The rRNA Tree of Life is a Useful Tool
for Identifying Phylogenetically Novel
Genomes
57. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science
276:734-740
62. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
77. GEBA Phylogenomic Lesson 3
Phylogenetics guided genome
selection (and phylogenetics in
general) improves genome annotation
78. Predicting Function
• Key step in genome projects
• More accurate predictions help guide
experimental and computational analyses
• Many diverse approaches
• All are improved by “phylogenomic” type
analyses that integrate evolutionary
reconstructions with an understanding of how
new functions evolve
79. From Eisen et
al. 1997 Nature
Medicine 3:
1076-1078.
80. Blast Search of H. pylori “MutS”
• Blast search pulls up Syn. sp MutS#2 with a much more significant p
value than other MutS homologs
• Based on this, TIGR predicted this species had mismatch
repair
• Assumes functional constancy
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
81. Predicting Function
• Identification of motifs
– Short regions of sequence similarity that are indicative of
general activity
– e.g., ATP binding
• Homology/similarity based methods
– Gene sequence is searched against databases of other
sequences
– If significantly similar genes are found, their functional
information is used
• Problem
– Genes frequently have similarity to hundreds of motifs
and multiple genes, not all with the same function
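As a minimal sketch of the motif approach (not the method used in the talk), the snippet below scans a protein sequence for the Walker A (P-loop) pattern, a widely used indicator of ATP/GTP binding. The regular expression follows the PROSITE-style consensus [AG]-x(4)-G-K-[ST]; the function name and the example sequence are hypothetical.

```python
import re

# Walker A (P-loop) consensus, PROSITE style: [AG]-x(4)-G-K-[ST]
WALKER_A = re.compile(r"[AG].{4}GK[ST]")

def find_walker_a(protein_seq):
    """Return (start, matched_text) for each non-overlapping Walker A hit."""
    seq = protein_seq.upper()
    return [(m.start(), m.group()) for m in WALKER_A.finditer(seq)]

# Hypothetical sequence fragment, for illustration only.
example = "MKLTGPPGSGKSTLLRAI"
hits = find_walker_a(example)
for start, text in hits:
    print(f"Walker A-like motif at position {start}: {text}")
```

A hit like this only suggests a general activity (nucleotide binding); it says nothing about which specific process the protein takes part in, which is the limitation the slide points to.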
82. MutL??
From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html
83. Phylogenetic Tree of MutS Family
[Figure: phylogenetic tree of MutS family proteins; tips are labeled with species abbreviations (e.g., Ecoli, Neigo, Bacsu, Synsp, Strpy, Aquae, Deira, Helpy, Borbu, Metth, Yeast, Human, Mouse, Arath, Celeg).]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
84. MutS Subfamilies
[Figure: MutS family tree with subfamilies labeled: MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6.]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
85. Overlaying Functions onto Tree
[Figure: MutS family tree with subfamilies labeled (MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6), used as the scaffold for overlaying known functions.]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
86. Functional Prediction Using Tree
• MutS1 - Bacterial Mismatch and Loop Repair
• MutS2 - Unknown Functions
• MSH1 - Mitochondrial Repair
• MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair
• MSH3 - Nuclear Repair Of Loops
• MSH4 - Meiotic Crossing Over
• MSH5 - Meiotic Crossing Over
• MSH6 - Nuclear Repair Of Mismatches
[Figure: MutS family tree with these subfamily functions overlaid.]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
88. PHYLOGENETIC PREDICTION OF GENE FUNCTION
[Figure: flowchart of the method, illustrated with Example A (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2, and 3, with a possible duplication) and Example B (genes 1 to 6). The final panel shows ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN).]
• CHOOSE GENE(S) OF INTEREST
• IDENTIFY HOMOLOGS
• ALIGN SEQUENCES
• CALCULATE GENE TREE
• OVERLAY KNOWN FUNCTIONS ONTO TREE
• INFER LIKELY FUNCTION OF GENE(S) OF INTEREST (may be ambiguous)
Based on Eisen, 1998 Genome Res 8: 163-167.
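The last two steps of this method (overlaying known functions onto the gene tree, then inferring the function of the gene of interest) can be sketched as follows. This is a simplified illustration, not the published procedure: the gene tree is assumed to be already computed and is written as nested tuples, the gene names follow Example A, and the function labels and helper names are hypothetical.

```python
def leaves(tree):
    """Return all leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]

def clades_containing(tree, query):
    """Yield clades that contain the query, smallest clade first, root last."""
    if isinstance(tree, str):
        return
    for child in tree:
        if query in leaves(child):
            yield from clades_containing(child, query)
    if query in leaves(tree):
        yield tree

def predict_function(tree, query, known):
    """Use the smallest clade around the query that has annotated relatives."""
    for clade in clades_containing(tree, query):
        found = {known[n] for n in leaves(clade) if n != query and n in known}
        if len(found) == 1:
            return found.pop()
        if len(found) > 1:
            return "ambiguous"
    return "unknown"

# Toy gene tree: an ancient duplication gave rise to the A and B subfamilies.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
# Hypothetical experimentally characterized functions.
known = {"1A": "repair", "3A": "repair", "1B": "recombination", "3B": "recombination"}

print(predict_function(gene_tree, "2A", known))  # uses the A subfamily only
print(predict_function(gene_tree, "2B", known))  # uses the B subfamily only
```

The design choice is that a gene inherits a function from its own subfamily rather than from whichever sequence happens to score best in a similarity search. A real analysis would also weigh branch support and where duplications fall before trusting a clade.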
89. Phylogenetic Prediction of Function
• Termed phylogenomics (Eisen et al., 1997)
• Greatly improves accuracy of functional
predictions compared to similarity based
methods (e.g., blast)
• Automated methods now available
– Sean Eddy, Steven Brenner, Kimmen Sjölander,
etc.
• But …
91. Example 3: Non homology
methods
• Many genes have homologs in other species
but no homologs have ever been studied
experimentally
• Non-homology methods can make functional
predictions for these
• Example: phylogenetic profiling
92. Phylogenetic profiling basis
• Microbial genes are lost rapidly when not
maintained by selection
• Genes can be acquired by lateral transfer
• Gain and loss frequently occur for entire
pathways/processes
• Thus it might be possible to use correlated
presence/absence information to identify genes with
similar functions
93. Non-Homology Predictions:
Phylogenetic Profiling
• Step 1: Search all genes in
organisms of interest against all
other genomes
• Ask: Yes or No, is each gene
found in each other species
• Cluster genes by distribution
patterns (profiles)
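The profiling steps above can be sketched as follows. This is a toy illustration under stated assumptions: the yes/no homolog calls are taken as given (in practice they would come from similarity searches against each genome), genes are grouped only when their profiles are identical, and all gene and genome names are hypothetical.

```python
from collections import defaultdict

def build_profiles(hits, genomes):
    """Map each gene to a tuple of 1/0 flags: is a homolog present in each genome?"""
    return {
        gene: tuple(int(genome in present) for genome in genomes)
        for gene, present in hits.items()
    }

def cluster_by_profile(profiles):
    """Group genes that share exactly the same presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in sorted(profiles.items()):
        clusters[profile].append(gene)
    return dict(clusters)

# Hypothetical search results: gene -> set of genomes with a detectable homolog.
genomes = ["sp1", "sp2", "sp3", "sp4"]
hits = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp1", "sp3"},
    "geneC": {"sp2", "sp4"},
    "geneD": {"sp1", "sp2", "sp3", "sp4"},
}

clusters = cluster_by_profile(build_profiles(hits, genomes))
for profile, genes in clusters.items():
    print(profile, genes)
```

Here geneA and geneB share a profile, so they would be candidates for acting in the same pathway. Real analyses usually cluster by a distance between profiles rather than requiring exact identity.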
94. Carboxydothermus hydrogenoformans
• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO
(Carbon Monoxide)
• Produces hydrogen gas
• Low GC Gram positive
(Firmicute)
• Genome Determined (Wu et al.
2005 PLoS Genetics 1: e65. )
99. GEBA Lesson 3:
Phylogeny driven genome selection (and
phylogenetics) improves genome annotation
• Took 56 GEBA genomes and compared results vs. 56
randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary”
based predictions
• Conversion of hypothetical into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
100. GEBA Lesson 4:
Metadata Important
101. GEBA Phylogenomic Lesson 5
Phylogeny-driven genome selection
helps discover new genetic diversity
102. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
103. Protein Family Rarefaction
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
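A rarefaction curve of this kind can be sketched as below. The sketch assumes the protein families have already been identified (for example by MCL, as on the slide) and are given as a set of family IDs per genome; it averages the cumulative family count over random genome orderings. The genome names, family IDs, and function name are hypothetical, and plotting is left out.

```python
import random

def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Average number of distinct protein families after adding 1..N genomes."""
    rng = random.Random(seed)
    names = list(genome_families)
    totals = [0] * len(names)
    for _ in range(n_permutations):
        rng.shuffle(names)  # random order in which genomes are added
        seen = set()
        for i, name in enumerate(names):
            seen |= genome_families[name]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]

# Hypothetical input: genome -> set of protein family IDs found in it.
genome_families = {
    "genome1": {"fam1", "fam2"},
    "genome2": {"fam2", "fam3"},
    "genome3": {"fam1", "fam2", "fam3", "fam4"},
}

curve = rarefaction_curve(genome_families)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

A curve that keeps climbing as genomes are added indicates that new genomes are still contributing previously unseen protein families.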
104-108. Wu et al. 2009 Nature 462, 1056-1060
111. Structural Novelty
• Of the 17,000 protein families in the GEBA56, 1,800
are novel in sequence (Wu)
• Structural modeling suggests many are structurally
novel too (D’Haeseleer)
• 372 being crystallized by the PSI (Kerfeld)
112. GEBA Phylogenomic Lesson 6
Improves analysis of genome data
from uncultured organisms
114. Great Plate Count Anomaly
Culturing Count <<<< Microscope Count
116. rRNA Phylotyping
• Collect DNA from
environment
• PCR amplify rRNA
genes using broad
(so-called universal) primers
• Sequence
• Align to others
• Infer evolutionary tree
• Unknowns “identified”
by placement on tree
• Some use BLAST, but
not as good as phylogeny
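Between the "align" and "infer evolutionary tree" steps, distance-based tree methods (such as neighbor joining) start from a matrix of pairwise distances. The sketch below computes simple p-distances from sequences that are assumed to be already aligned; the tree-building step itself is omitted, and the sequences and names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences, skipping gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        differing += a != b
    return differing / compared if compared else 0.0

def distance_matrix(alignment):
    """Return {(name_x, name_y): p-distance} for every pair in the alignment."""
    names = sorted(alignment)
    return {(x, y): p_distance(alignment[x], alignment[y]) for x in names for y in names}

# Hypothetical aligned rRNA fragments (toy length, for illustration only).
alignment = {
    "unknown": "ACGU-ACGUA",
    "refA": "ACGUAACGUA",
    "refB": "ACCUAAGGUU",
}

distances = distance_matrix(alignment)
print(distances[("unknown", "refA")], distances[("unknown", "refB")])
```

Picking the closest reference from such a matrix is essentially the similarity shortcut the slide warns about; the matrix is meant as input to a tree-building method, with the unknown then identified by where it falls on the tree.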
117. rRNA PCR
• The Hidden Majority (Hugenholtz 2002)
• Richness estimates (Bohannan and Hughes 2003)
120. rRNA phylotyping issues
• Massive amounts of data
– 1 x 10^6 new partial sequences with new 454
– 2 x 10^6 full length sequences in DB
• Alignments of new sequences not always
straightforward
• Solutions:
– Reliance on similarity scores (bad)
– High throughput automated phylogenetic tools
• STAP
• WATERS