Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequencing, DNA Synthesis and Single Cell Genomics, DOE Joint Genome Institute, Eddy Rubin, Copenhagenomics 2012
1. Gene and genome discovery through metagenomics, gene synthesis and single cell genomics
Eddy Rubin
Lawrence Berkeley National Lab
Berkeley CA
3. Do We Need Biofuels and Why Not Just Batteries?
Energy Density
100-fold more energy per kilo in petroleum than in a kilo of the very best battery
• A 747 jet needs 200,000 kilos of fuel to fly from SF to Copenhagen.
With a 20,000,000 kilo battery, a 747 jet won’t get off the ground
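The slide's arithmetic can be checked directly: if a battery holds roughly 1/100 of the energy per kilo that petroleum does, a battery storing the energy of 200,000 kilos of fuel must weigh about 100 times as much. A minimal Python sketch using only the slide's round numbers; it treats the 100-fold ratio as exact and ignores that fuel is burned off during the flight.

```python
# Round figures taken from the slide; the 100-fold ratio is approximate.
FUEL_MASS_KG = 200_000      # fuel for a 747 flying SF -> Copenhagen
ENERGY_DENSITY_RATIO = 100  # energy per kilo: petroleum vs. the best battery


def equivalent_battery_mass(fuel_mass_kg: float, ratio: float) -> float:
    """Mass of battery storing the same energy as fuel_mass_kg of petroleum."""
    return fuel_mass_kg * ratio


battery_kg = equivalent_battery_mass(FUEL_MASS_KG, ENERGY_DENSITY_RATIO)
print(f"Equivalent battery mass: {battery_kg:,.0f} kg")  # Equivalent battery mass: 20,000,000 kg
```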
5. Cellulosic Biofuel Production
Biomass → Deconstruction (enzymes: inefficient & expensive) → Fermentation and Fuel Synthesis → Biofuels
Need to discover enzymes of greater diversity with new properties
7. Cow Rumen Highly Efficient at Biomass Deconstruction
Fistulated Cow
8. Biomass (switchgrass), 36 hr in fistulated cow: ~55% reduction in cellulose
Fiber-attached microbes → metagenomic DNA → Illumina HiSeq: 4 billion reads (200 bp, 3 kb, 5 kb)
~½ terabase of sequence data
9. ½ Tb of Rumen Sequence
[Bar chart of sequence data: this study, ~½ Tb; termite hindgut (Warnecke 07, Sanger), 71 Mb; tammar wallaby, 80 Mb]
10. Total Assembly Based Pipeline
Assembly (179,092 scaffolds) → Gene prediction (2.5 M genes) → HMMSearch (glycoside hydrolase & carbohydrate-binding module domains) → Thousands of potential cellulolytic enzymes
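The HMMSearch step amounts to scanning every predicted gene against profile HMMs for glycoside hydrolase and CBM domains and keeping the significant hits. A minimal sketch, assuming the search was run with HMMER's hmmsearch and its `--tblout` option (whitespace-delimited rows, `#` comment lines); the profile names and the E-value cutoff below are hypothetical stand-ins, not the pipeline's actual settings.

```python
from __future__ import annotations

# Hypothetical: Pfam-style profile names and cutoff chosen for illustration only.
TARGET_PROFILES = {"Glyco_hydro_1", "CBM_1"}
EVALUE_CUTOFF = 1e-5


def candidate_genes(tblout_text: str,
                    profiles: set[str] = TARGET_PROFILES,
                    cutoff: float = EVALUE_CUTOFF) -> set[str]:
    """Collect predicted genes with a significant hit to any target profile.

    Expects hmmsearch --tblout rows: column 1 is the target (gene) name,
    column 3 the query (profile) name, column 5 the full-sequence E-value.
    """
    genes = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        gene, profile, evalue = fields[0], fields[2], float(fields[4])
        if profile in profiles and evalue <= cutoff:
            genes.add(gene)
    return genes


# Made-up rows in tblout layout (accessions shown as "-").
example = """\
# target name  accession  query name  accession  E-value ...
gene_0001 - Glyco_hydro_1 - 3.2e-40 140.1 0.2 4.1e-40 139.8 0.2 1.0 1 0 0 1 1 1 1 -
gene_0002 - Glyco_hydro_1 - 0.5 5.0 0.1 0.7 4.6 0.1 1.1 1 0 0 1 1 1 0 -
gene_0003 - Other_family - 1e-30 100.0 0.1 2e-30 99.0 0.1 1.0 1 0 0 1 1 1 1 -
"""
print(sorted(candidate_genes(example)))  # ['gene_0001']
```

gene_0002 hits the right family but misses the cutoff, and gene_0003 is significant but for a family outside the target set, so only gene_0001 survives.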
11. Genes Known to Deconstruct Cellulose
Cellulose: β-(1-4)
Endoglucanase: EC 3.2.1.4
Cellobiohydrolase: EC 3.2.1.91 & EC 3.2.1.150
β-Glucosidase: EC 3.2.1.21 & EC 3.2.1.74
[Chart: previously known vs. newly discovered in total rumen data; values shown: 3350, 11, ~556]
12. Many “computational” enzymes predicted
But are the in silico predictions real & functional?
Digital information (sequence data) → ? → Functional information (biochemical activity)
Large-scale gene synthesis
13. From Sequence Data to Function Info
Sample → Sequence (next-gen sequencing) → Data analysis → Synthesis (oligo synthesis and assembly) → Expression system (expression vectors) → Validate
14. GH1 Functional Analysis
Cellobiose → GH1 → Glucose
Industrial process requires:
-Activity at 70 degrees C
-Stability at pH 4.5
15. Select 300 Candidates for Synthesis and Functional Characterization Maximizing Phylogenetic Space
Cellobiose → GH1 → Glucose
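The slide does not say how the 300 candidates were spread across phylogenetic space. One common heuristic, shown here as an assumption rather than the method used, is greedy max-min (farthest-point) selection over a pairwise distance matrix, for example distances taken from a GH1 tree or alignment. The function name and toy matrix are hypothetical.

```python
from __future__ import annotations


def farthest_point_selection(dist: list[list[float]], k: int, start: int = 0) -> list[int]:
    """Greedy max-min selection of k items from a symmetric distance matrix.

    Each step adds the candidate whose distance to its nearest already-chosen
    item is largest, so the picks spread out across the tree.
    """
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of candidates")
    chosen = [start]
    chosen_set = {start}
    # nearest[i]: distance from candidate i to its closest chosen candidate
    nearest = list(dist[start])
    while len(chosen) < k:
        nxt = max((i for i in range(n) if i not in chosen_set), key=lambda i: nearest[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        nearest = [min(nearest[i], dist[nxt][i]) for i in range(n)]
    return chosen


# Toy example: two tight clades {0, 1} and {2, 3}, far apart.
dist = [
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 0.1, 0.0],
]
print(farthest_point_selection(dist, 2))  # [0, 2]
```

With a budget of two, the sketch takes one member from each clade instead of two near-identical sequences, which is the intuition behind covering phylogenetic space with a fixed synthesis budget.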
16. Synthetic GH1 activity profiles (Temperature & pH)
Set temperature optima
17 enzymes identified as active at pH 4.5 & 70 degrees C
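Picking out the enzymes that meet the industrial condition from the temperature and pH profiles is a simple filter. A minimal sketch with made-up activity values; the profile layout (relative activity keyed by pH and temperature), the enzyme names and the 0.3 cutoff are all hypothetical.

```python
from __future__ import annotations

# Made-up relative activities (fraction of each enzyme's own maximum),
# keyed by (pH, temperature in degrees C). Not data from the study.
profiles: dict[str, dict[tuple[float, int], float]] = {
    "GH1_a": {(4.5, 70): 0.82, (7.0, 37): 1.00},
    "GH1_b": {(4.5, 70): 0.05, (7.0, 37): 0.90},
    "GH1_c": {(4.5, 70): 0.40, (7.0, 37): 0.75},
}


def active_under(profiles: dict[str, dict[tuple[float, int], float]],
                 ph: float, temp_c: int, min_rel_activity: float = 0.3) -> list[str]:
    """Names of enzymes whose measured activity at (ph, temp_c) reaches the cutoff.

    Enzymes never assayed at that condition are treated as inactive.
    """
    return sorted(
        name for name, profile in profiles.items()
        if profile.get((ph, temp_c), 0.0) >= min_rel_activity
    )


print(active_under(profiles, 4.5, 70))  # ['GH1_a', 'GH1_c']
```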
18. How can we use this information
to improve bioenergy yield?
[Figure: corn, amylase, biofuel]
20. Conclusion
Trawling deep metagenomic data is a successful
strategy to massively add to the diversity of enzymes
with desired activities
Can we assemble genomes from deep metagenomic data?
24. Information in Genomes vs Genes
Pathways, which organism is doing what, capabilities of a particular organism…
26. What We Have Are Scaffolds but No Genomes
Binning of Scaffolds into Draft Genomes by Tetranucleotide Frequency and Sequence Coverage
Fragments (scaffolds) of DNA from the same bacterial species have a similar tetranucleotide frequency
[Figure: scaffold bins labeled Fibrobacteres, Proteobacteria, Cyanobacteria]
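The tetranucleotide signal on this slide can be computed directly from scaffold sequence. A minimal sketch, assuming each 4-mer is merged with its reverse complement (a common convention, since a scaffold may come from either strand) and Euclidean distance between frequency vectors; real binning also uses sequence coverage and a clustering step, both omitted here. The function names are hypothetical.

```python
from __future__ import annotations

import math
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


# Each 4-mer is merged with its reverse complement: 136 canonical tetranucleotides.
CANONICAL = sorted({min(k, revcomp(k)) for k in map("".join, product("ACGT", repeat=4))})


def tnf(seq: str) -> dict[str, float]:
    """Tetranucleotide frequency vector of one scaffold."""
    counts = dict.fromkeys(CANONICAL, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= _VALID:  # skip windows with N or other ambiguity codes
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values())
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


def tnf_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Euclidean distance between two TNF vectors; smaller means more similar."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in CANONICAL))


print(len(CANONICAL))  # 136
print(tnf_distance(tnf("AAAAAAAA"), tnf("TTTTTTTT")))  # 0.0 (same sequence, opposite strands)
```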
27. 15 Draft Genomes Assembled from the Cow Rumen Microbiome, None of Which Have Been Previously Reported (1.8-3.3 Mb)
28. Proof that in silico assembled genomes of hard-to-culture organisms, built without a reference genome, are real?
29. Single Cell Genome Sequencing
72h rumen community → FACS → single cells → Multiple Displacement Amplification (MDA) of DNA from isolated single cells → shotgun sequencing and genome assembly of single amplified genomes
30. Metagenomic versus Single Cell Derived Genome
Binned metagenomic scaffolds (draft in silico 3.1 Mb genome) vs. single cell genome reads
Single cell genome reads map to every metagenomic scaffold
Suggests that the scaffolds that bin together are from the same organism
>90% of single cell genome reads map to this single draft genome
Suggests that the draft genome is fairly complete
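The two checks on this slide can be expressed as a small function over read-mapping results. A minimal sketch, assuming the single cell reads have already been mapped (by any aligner) and reduced to one scaffold name per read, or None for unmapped reads; the function and scaffold names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def validate_bin(read_hits: list[str | None], bin_scaffolds: set[str]) -> tuple[float, set[str]]:
    """Compare single cell reads against a binned draft genome.

    read_hits[i] is the scaffold that read i mapped to, or None if unmapped.
    Returns (fraction of reads landing in the bin, bin scaffolds with no reads).
    """
    if not read_hits:
        raise ValueError("no reads to evaluate")
    per_scaffold = Counter(hit for hit in read_hits if hit in bin_scaffolds)
    fraction_in_bin = sum(per_scaffold.values()) / len(read_hits)
    scaffolds_without_reads = bin_scaffolds - set(per_scaffold)
    return fraction_in_bin, scaffolds_without_reads


# Toy data: 10 reads, 9 of which map to the bin's three scaffolds.
reads = ["s1"] * 4 + ["s2"] * 3 + ["s3"] * 2 + [None]
fraction, missing = validate_bin(reads, {"s1", "s2", "s3"})
print(fraction, missing)  # 0.9 set()
```

An empty second value corresponds to the first observation (every scaffold receives single cell reads); a high first value, such as the >90% reported, corresponds to the second.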
31. Conclusion
Ultra deep metagenomic sequencing, even with short reads, is likely to increasingly become a method of choice to identify genes and characterize the genomes of uncultured organisms
This will enable the exploitation of the diverse capabilities present in environmental organisms to offer biotech solutions
32. Voxelation
Gene expression tomography (GET)
Alex Sczyrba, Matthias Hess, Tanja Woyke
Lo thruput (3D info) + Hi thruput (0D info) = Hi thruput (3D info)
Crump Institute for Molecular Imaging
At JGI we have looked at quite a few organisms that live on plant material. These include herbivorous mammals, birds, insects and even mollusks. They share one thing in common: none of them carry cellulolytic enzyme genes in their own genome; they rely on the microbial communities in their guts for biomass degradation. Here we chose the cow to study the microbiome in its rumen; the reason why we did this is on the next slide.
Compared with two similar enzyme discovery studies of biomass-degrading communities done previously at JGI, we obtained a massive amount of data. Four years ago, the termite hindgut study, using Sanger sequencing, got 71 Mb. Last year, the wallaby foregut microbial community study, using a combination of Sanger and 454, got about 800 Mb of sequence. With more data we expected to find more enzymes encoded by the rumen microbes, but first we had to deal with this terabyte-scale data beast.
With this strategy we assembled 15 in silico draft genomes. With the single cell data available, we can now ask whether the draft genomes are real by matching them against the single cell genome: does this represent a real genome? The first thing we see is that each scaffold has a significant number of hits from the single cell data, which indicates that the set of scaffolds in this draft genome bin is derived from the same organism. The second observation is that the majority of reads from the single cell genome map to this draft genome, suggesting that our draft does include most of the active genome of this organism. This indicates