Microbes run the planet - Jonathan Eisen slides from #scifoo 2006
1. Microbes Can Grow On Anything
• Energy
– Light
– Organic and inorganic chemicals
• Carbon
– Organic degradation
– Inorganic “fixation”
• CO2, CO, CH4
• Control global cycling of most nutrients
– N, S, P
– Can manipulate just about every form
6. How to Survive at 100°C
• Change amino acid composition of all proteins
• Change composition of membranes
• Add enzymes to repair heat-specific damage (e.g., deamination of DNA)
• Change which metals are used as cofactors in biological processes
• Cell wall coatings
10. How to Survive at High Salt
• High external salt draws water out of the cell
• Compensate by increasing solute concentrations in the cell
• Many organisms use different solutes
• Extreme halophiles also fill the inside of the cell with salts
• Enzymes from these organisms work well in industrial applications where salts are present
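The water balance described on this slide can be put into rough numbers with the van 't Hoff approximation for dilute solutions, Π = i·c·R·T. The sketch below is illustrative only: the concentrations are hypothetical, the function names are invented for this example, and real cytoplasm is not an ideal dilute solution.

```python
R = 8.314  # gas constant, J/(mol*K)

def osmotic_pressure_pa(molar_conc, vant_hoff_i, temp_k=298.15):
    """van 't Hoff estimate, Pi = i * c * R * T, with c converted from mol/L to mol/m^3."""
    return vant_hoff_i * (molar_conc * 1000.0) * R * temp_k

def extra_solute_needed(outside_osmolar, inside_osmolar):
    """Osmolarity (mol/L of dissolved particles) the cell must add to stop net water loss."""
    return max(0.0, outside_osmolar - inside_osmolar)

# Hypothetical values: 2 M NaCl outside (about 2 particles per formula unit),
# 0.3 osmol/L of solutes already inside the cell.
outside = 2.0 * 2
inside = 0.3
gap = extra_solute_needed(outside, inside)
pressure_gap = osmotic_pressure_pa(gap, 1)
print(f"solute gap: {gap:.1f} osmol/L, about {pressure_gap / 1e6:.1f} MPa")
```

Whether the gap is closed with organic compatible solutes or with salts is the difference between the strategies in the bullets above.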
19. rRNA Revolution
• Morphology and physiology evolve too rapidly
• Molecular systematics is the only way
• 16S rRNA is the choice
• Three domains discovered
26. Metagenomics by Large Inserts
• Isolate, by filtration, all microbes in a sample
• Extract total DNA in very large pieces
• Clone those pieces as BACs into E. coli to get enough DNA
• ID BACs of interest (e.g., containing rRNA)
• Sequence and analyze the BACs like a bacterial genome
[Figure: Sample → Filter/concentrate → Extract DNA → Clone into BACs → Sequence → Gene list]
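The "ID BACs of interest" step can be sketched in code as a screen for a conserved rRNA marker. This is a minimal illustration, not the method used in the talk: it assumes the widely used 515F primer site as a stand-in for a 16S rRNA probe, the clone names and sequences are made up, and real screens rely on hybridization, PCR, or dedicated rRNA-detection tools.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]", "Y": "[CT]", "K": "[GT]",
    "V": "[ACG]", "H": "[ACT]", "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}

# Assumed probe: the 515F primer site in the 16S rRNA gene (M = A or C).
PROBE_16S = "GTGCCAGCMGCCGCGGTAA"

def revcomp(seq):
    """Reverse complement that also handles IUPAC degenerate codes."""
    table = str.maketrans("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN")
    return seq.translate(table)[::-1]

def probe_regex(probe):
    """Compile a degenerate probe into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in probe))

def bacs_with_marker(bac_seqs, probe=PROBE_16S):
    """Return names of clones whose insert matches the probe on either strand."""
    fwd = probe_regex(probe)
    rev = probe_regex(revcomp(probe))
    return [name for name, seq in bac_seqs.items()
            if fwd.search(seq.upper()) or rev.search(seq.upper())]

# Hypothetical toy library; real BAC inserts are on the order of 100 kb.
bacs = {
    "BAC_001": "TTTT" + "GTGCCAGCAGCCGCGGTAA" + "ACGT",
    "BAC_002": "ACGTACGTACGTACGT",
    "BAC_003": revcomp("GTGCCAGCCGCCGCGGTAA") + "GG",
}
hits = bacs_with_marker(bacs)
print(hits)
```

Only the flagged clones would then go on to full sequencing, which is what keeps the large-insert approach targeted.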
30. Limits of Large Insert Approach
• Large insert libraries are less random and less representative than small inserts
• Lower throughput
• Requires some thinking
34. Baumannia cicadellinicola genome project: 1° symbionts of the Glassy-winged Sharpshooter
• Sap-feeding insects
• Carriers of Xylella fastidiosa, which causes Pierce’s disease of grapevines
• There are >20,000 sharpshooter species, within which intracellular symbiotic bacteria are widespread
[Figure: Glassy-winged Sharpshooter]
37. Sargasso Sea Shotgun Sequencing
[Figure: shotgun sequence]
Analysis led by Venter Institute.
Eisen lab contributions by Dongying Wu, Martin Wu, Jonathan Badger
41. taxonomic content per SHOTGUN 16S
[Figure: chart with y-axis 0% to 100% and x-axis Station (GS-02 to GS-36)]
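A per-station taxonomic profile of this kind comes down to normalizing counts of classified 16S reads to percentages. The sketch below shows only that arithmetic: the station labels follow the figure's axis, but the read assignments are hypothetical and the function name is invented for this example.

```python
from collections import Counter

def percent_composition(assignments):
    """Turn a list of per-read taxon labels into percentages that sum to 100."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}

# Hypothetical phylum assignments for 16S-containing shotgun reads at two stations.
stations = {
    "GS-02": ["Proteobacteria"] * 6 + ["Cyanobacteria"] * 3 + ["Firmicutes"],
    "GS-03": ["Proteobacteria"] * 2 + ["Cyanobacteria"] * 2,
}
profile = {station: percent_composition(labels) for station, labels in stations.items()}

for station, shares in profile.items():
    print(station, {taxon: round(pct, 1) for taxon, pct in shares.items()})
```

Normalizing within each station makes stations with different sequencing depth comparable on the same 0% to 100% axis.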
48.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: tree of bacterial phyla, labeled: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
49.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: same tree of bacterial phyla as slide 48]
50.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: sequence more phyla
[Figure: same tree of bacterial phyla as slide 48]
Editor's Notes
One example of why predicting function is difficult comes from our work on Deinococcus radiodurans, the most radiation-resistant organism known. When TIGR sequenced the genome of this species (just before I got to TIGR), I helped look at the predicted DNA repair genes in the genome, in the hope that this analysis would tell us something about the species' radiation resistance. I was very interested in this because my Ph.D. was on comparative studies of DNA repair, especially in extremophiles.
Extension of rRNA analysis to uncultured organisms using PCR
Phylogenetic analysis of rRNAs led to the discovery of archaea
Functional prediction using a gene tree is just like predicting the biology of a species using a species tree
This is a tree of an rRNA gene found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in the tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be dominant in many ocean ecosystems.
Metagenomics involves cloning large DNA fragments from environmental samples and then selecting specific fragments for further sequencing. The selection of specific fragments can be based on mapping rRNA genes to them, mapping functional genes, or by first doing end-sequencing and then selecting fragments with interesting end-sequences.
Metagenomic analysis led to the discovery of a new form of phototrophy in the ocean
PCR primers designed from the metagenomic sequences were then used to discover additional forms of the proteorhodopsin that is the basis for the new form of phototrophy
This is a tree of an rRNA gene found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in the tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be dominant in many ocean ecosystems. clone from the Sargasso Sea. This shows that this