At JGI we have looked at quite a few organisms that live on plant material. These include herbivorous mammals, birds, insects and even mollusks. They share one thing in common: none of them carries cellulolytic enzyme genes in its own genome; they rely on the microbial communities in their guts for biomass degradation. Here we chose the cow, to study the microbiome in its rumen; the reason why is on the next slide.
Compared with two similar enzyme discovery studies of biomass-degrading communities done previously at JGI, we obtained a massive amount of data. Four years ago, the termite hindgut study, using Sanger sequencing, produced 71 Mb. Last year, the wallaby foregut microbial community study, using a combination of Sanger and 454 sequencing, produced about 800 Mb of sequence. With more data we expected to find more enzymes encoded by the rumen microbes, but before anything else we have to deal with this terabyte-scale data beast.
From this strategy we assembled 15 in silico draft genomes. With the single cell data available, we can now ask whether a draft genome is real, that is, whether it matches the single cell genome. The first thing we see is that every scaffold has a significant number of hits from the single cell data, which indicates that the set of scaffolds in this draft genome bin is derived from the same organism. The second observation is that the majority of reads from the single cell genome map to this draft genome, suggesting that our draft includes most of the genome of this organism.
Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequencing, DNA Synthesis and Single Cell Genomics, DOE Joint Genome Institute, Eddy Rubin, Copenhagenomics 2012
Gene and genome discovery through metagenomics, gene synthesis and single cell genomics. Eddy Rubin, Lawrence Berkeley National Lab, Berkeley, CA
Do We Need Biofuels and Why Not Just Batteries? Energy density: 100-fold more energy in a kilo of petroleum than in a kilo of the very best battery. • A 747 jet needs 200,000 kilos of fuel to fly from SF to Copenhagen. With a 20,000,000 kilo battery a 747 jet won’t get off the ground.
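The battery figure on this slide follows directly from the two numbers it quotes. A minimal sketch of that arithmetic, assuming the battery must hold the same total energy as the fuel and ignoring any difference in engine or motor efficiency (the function name is illustrative, not from the talk):

```python
# Back-of-the-envelope check of the slide's battery figure.
FUEL_KG = 200_000            # fuel for a 747, SF to Copenhagen (from the slide)
ENERGY_DENSITY_RATIO = 100   # petroleum vs. best battery, per kilo (from the slide)


def battery_mass_kg(fuel_kg, density_ratio):
    """Mass of battery holding the same energy as fuel_kg of petroleum.

    Assumes equal total energy is needed; efficiency differences are ignored.
    """
    return fuel_kg * density_ratio


print(battery_mass_kg(FUEL_KG, ENERGY_DENSITY_RATIO))  # 20000000
```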
Cellulosic Biofuels. [Diagram labels: CO2, Biomass, Land Use]
Cellulosic Biofuel Production. Biomass -> Deconstruction (Inefficient & Expensive Enzymes) -> Fermentation and Fuel Synthesis -> Biofuels. Need to discover enzymes of greater diversity with new properties.
Biomass degradation communities being studied by the JGI: Marsupials, Birds, Insects, Mollusks, Hoatzin (Stink Bird), Ruminants
Cow Rumen: Highly Efficient at Biomass Deconstruction. [Image: Fistulated Cow]
Biomass (switchgrass), 36 hr in fistulated cow: ~55% reduction in cellulose. Fiber-attached microbes -> Metagenomic DNA -> Illumina HiSeq, 4 billion reads (200 bp, 3 kb, 5 kb) -> ~½ Terabase of Sequence Data
½ Tb of Rumen Sequence. [Bar chart, sequence data in Gb: Termite hindgut (Warnecke 07, Sanger), 71Mb; Tammar Wallaby, 80Mb; This study, 1/2 of a Tb]
Total Assembly Based Pipeline. Assembly (179,092 scaffolds) -> Gene prediction (2.5 M genes) -> HMM Search (Glycoside Hydrolase & Carbohydrate-Binding Module domains) -> Thousands of Potential Cellulolytic Enzymes
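The last step of this pipeline amounts to keeping only the predicted genes that carry a domain of interest. A minimal sketch of that filter, assuming domain hits have already been computed by some HMM search tool; the gene names, domain labels, E-values and the 1e-5 cutoff below are all hypothetical and not taken from the talk:

```python
# Hypothetical domain hits per predicted gene: (domain family, E-value).
hits = {
    "gene_0001": [("GH5", 1e-40), ("CBM2", 3e-12)],
    "gene_0002": [("ABC_tran", 1e-30)],
    "gene_0003": [("GH1", 2e-3)],   # weak hit, should be rejected
    "gene_0004": [("GH9", 5e-25)],
}

TARGET_PREFIXES = ("GH", "CBM")  # glycoside hydrolase / carbohydrate-binding module
MAX_EVALUE = 1e-5                # assumed significance cutoff


def candidate_genes(hits, prefixes=TARGET_PREFIXES, max_evalue=MAX_EVALUE):
    """Return genes with at least one significant hit to a target domain family."""
    return sorted(
        gene
        for gene, domains in hits.items()
        if any(name.startswith(prefixes) and evalue <= max_evalue
               for name, evalue in domains)
    )


candidates = candidate_genes(hits)
print(candidates)  # ['gene_0001', 'gene_0004']
```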
Genes Known to Deconstruct Cellulose. Cellulose, β-(1-4). Endoglucanase (EC [garbled]); Cellobiohydrolase (EC [garbled] & EC [garbled]); β-Glucosidase (EC [garbled] & EC [garbled]). Previously Newly Discovered in Known Total Rumen Data 3350 11 ~556
Many “computational” enzymes predicted. But are the in silico predictions real & functional? Digital Information (Sequence Data) -> ? -> Functional Information (Biochemical Activity). Large Scale Gene Synthesis
From Sequence Data to Functional Info. SAMPLE -> SEQUENCE (Next-Gen Sequencing) -> DATA ANALYSIS -> SYNTHESIS (Oligo synthesis and assembly) -> EXPRESSION SYSTEM (Expression Vectors) -> VALIDATE
GH1 Functional Analysis. Cellobiose -> (GH1) -> Glucose. Industrial process requires: - Activity at 70 degrees C - Stability at pH 4.5
Select 300 Candidates for Synthesis and Functional Characterization, Maximizing Phylogenetic Space. Cellobiose -> (GH1) -> Glucose
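The slide does not say how the 300 candidates were chosen to maximize phylogenetic space. One common way to spread a selection across a tree is greedy max-min (farthest-point) selection on pairwise distances; the sketch below uses that technique as a stand-in, with a made-up 5 x 5 distance matrix and a hypothetical function name:

```python
def select_diverse(dist, k, seed=0):
    """Greedy max-min selection: start from `seed`, then repeatedly add the
    item whose distance to its nearest already-chosen item is largest."""
    n = len(dist)
    chosen = [seed]
    # min_dist[i] = distance from item i to the nearest chosen item so far
    min_dist = list(dist[seed])
    while len(chosen) < min(k, n):
        nxt = max((i for i in range(n) if i not in chosen),
                  key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(a, b) for a, b in zip(min_dist, dist[nxt])]
    return chosen


# Toy pairwise distances: two tight clusters {0, 1} and {2, 3}, plus item 4.
dist = [
    [0.0, 0.1, 0.8, 0.9, 0.5],
    [0.1, 0.0, 0.8, 0.9, 0.5],
    [0.8, 0.8, 0.0, 0.1, 0.6],
    [0.9, 0.9, 0.1, 0.0, 0.6],
    [0.5, 0.5, 0.6, 0.6, 0.0],
]

picked = select_diverse(dist, 3)
print(picked)  # [0, 3, 4]
```

With k = 3 the selection takes one member from each cluster plus the intermediate item, rather than two near-identical sequences from the same cluster.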
Synthetic GH1 activity profiles (temperature & pH). [Chart panel: temperature optima] 17 enzymes identified active at pH 4.5 & 70 degrees C
How can we use this information to improve bioenergy yield? [Diagram labels: Biofuel, Amylase, Corn]
Conclusion: Trawling deep metagenomic data is a successful strategy to massively add to the diversity of enzymes with desired activities. Can we assemble genomes from deep metagenomic data?
Information in Genomes vs Genes: pathways, which organism is doing what, capabilities of a particular organism…
What we have are scaffolds but no genomes. Bin scaffolds into draft genomes by tetranucleotide frequency and sequence coverage. Fragments (scaffolds) of DNA from the same bacterial species have similar tetranucleotide frequencies. [Plot labels: Fibrobacteres, Proteobacteria, Cyanobacteria]
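The composition signal used for binning can be sketched as follows: count every overlapping 4-mer in a scaffold, normalize to frequencies, and compare scaffolds by the distance between their frequency vectors. This is only an illustration under simplifying assumptions: the sequences are synthetic, Euclidean distance is an arbitrary choice, and the coverage signal and reverse-complement merging that a real binning workflow would likely use are left out.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq):
    """Return the 256-element tetranucleotide frequency vector of a sequence.
    Windows containing characters other than A, C, G, T are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def distance(p, q):
    """Euclidean distance between two frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Synthetic scaffolds: a and b share a composition, c does not.
scaffold_a = "ACGTTGCA" * 30
scaffold_b = "ACGTTGCA" * 20
scaffold_c = "GGGCCCGC" * 30

fa, fb, fc = (tetra_freq(s) for s in (scaffold_a, scaffold_b, scaffold_c))
print(distance(fa, fb) < distance(fa, fc))  # True
```

Scaffolds a and b differ in length but not in composition, so their vectors nearly coincide, which is the property that lets fragments of one genome fall into the same bin.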
Assembled from Cow Rumen Microbiome: 15 Draft Genomes (1.8-3.3 Mb), None of Which Have Ever Been Previously Reported
Proof that in silico assembled genomes of hard-to-culture organisms are real, without a reference genome?
Single Cell Genome Sequencing. 72h rumen community -> FACS (single cells) -> Multiple Displacement Amplification (MDA) of DNA from isolated single cell -> Shotgun sequencing and genome assembly of single amplified genomes
Metagenomic versus Single Cell Derived Genome. Binned Metagenomic Scaffolds (Draft In Silico 3.1 Mb Genome) vs. Single Cell Genome Reads. Single cell genome reads map to every metagenomic scaffold: suggests that the scaffolds that bin together are from the same organism. >90% of single cell genome reads map to this single draft genome: suggests that the draft genome is fairly complete.
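The two observations on this slide correspond to two simple statistics over the read mappings: the fraction of single cell reads that map to the draft genome, and the fraction of its scaffolds that receive at least one read. A minimal sketch, assuming reads have already been aligned by some read mapper and each read is reduced to the scaffold it hit (or None); the scaffold names and toy data are invented for illustration:

```python
from collections import Counter


def mapping_summary(read_hits, scaffolds):
    """Summarize single cell read mappings against one draft genome bin.

    read_hits: one entry per read, the scaffold id it mapped to, or None.
    scaffolds: ids of the scaffolds in the draft genome bin.
    Returns (fraction of reads mapped, fraction of scaffolds hit).
    """
    scaffold_set = set(scaffolds)
    mapped = [h for h in read_hits if h in scaffold_set]
    per_scaffold = Counter(mapped)
    frac_reads = len(mapped) / len(read_hits) if read_hits else 0.0
    frac_scaffolds = (
        sum(1 for s in scaffold_set if per_scaffold[s] > 0) / len(scaffold_set)
        if scaffold_set else 0.0
    )
    return frac_reads, frac_scaffolds


# Toy example: 10 reads, 3 scaffolds, one unmapped read.
scaffolds = ["s1", "s2", "s3"]
read_hits = ["s1", "s1", "s2", "s3", None, "s2", "s3", "s1", "s2", "s3"]

summary = mapping_summary(read_hits, scaffolds)
print(summary)  # (0.9, 1.0)
```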
Conclusion: Ultra deep metagenomic sequencing, even with short reads, is likely to increasingly become a method of choice to identify genes and characterize the genomes of uncultured organisms. Will enable the exploitation of the diverse capabilities present in environmental organisms to offer biotech solutions.
Alex Sczyrba, Matthias Hess, Tanja Woyke. Voxelation, Gene expression tomography (GET): Lo thruput 3D info + Hi thruput 0D info = Hi thruput 3D info. Crump Institute for Molecular Imaging