Until recently, the composition and properties of the planet's microbiota were largely a black box. Next-generation sequencing (NGS) has proven to be an invaluable tool for investigating diverse environmental and host-associated microbial communities, generating enormous new data sets that can be mined for information on the composition and functional properties of vast numbers of microbial communities.
2. "Genome sequencing has come of age, and genomics will become central to microbiology's future." (Carl Woese)
3. The application of microbial sequencing
Farm animal health, aquaculture, vegetable and fruit production, etc.
Identifying drug targets, overcoming antibiotic resistance and some diseases, etc.
Novel energy production systems, pollution control, etc.
Food industries (such as the fermented foods and liquors industries), some emerging industries, etc.
7. The weapon: next-generation sequencing

Platform: Roche 454
Chemistry: sequencing-by-synthesis (pyrosequencing)
PCR amplification: emPCR
Read length: >400
Reads/run: 1,000,000
Throughput (TH)/run; run time: 0.4-0.6 Gb; 7-10 h
Disadvantages: PCR biases; asynchronous synthesis; homopolymer runs; base insertion and deletion errors; emPCR is cumbersome and technically challenging
Applications: de novo genome sequencing, RNA-seq, resequencing/targeted resequencing

Platform: Illumina
Chemistry: polymerase-based sequencing-by-synthesis
PCR amplification: bridge amplification
Read length: 75/2×100a
Reads/run: 40,000,000
Throughput (TH)/run; run time: 3-6/200* Gb; 3-4 days
Disadvantages: PCR biases; low multiplexing capability of samples
Applications: de novo genome sequencing, RNA-seq, resequencing/targeted resequencing, metagenomics, ChIP

Platform: SOLiD
Chemistry: ligation-based sequencing
PCR amplification: emPCR
Read length: 35-40
Reads/run: 85,000,000
Throughput (TH)/run; run time: 10-20 Gb; 7 days
Disadvantages: emPCR is cumbersome and technically challenging; PCR biases; long run time
Applications: transcript counting, mutation detection, ChIP, RNA-seq, etc.
10. The Applications of Sequence Data
1. De novo assembly of entire genomes to generate primary genetic sequences for the detailed genetic analysis of a microbial organism.
2. Resequencing for the discovery of variants that differ from the known genome sequences of a closely related strain.
3. Species classification and novel gene discovery in genomic surveys of microbial communities.
4. “Seq-based” assays that determine the genomic content and abundance of mRNAs, small RNAs, and non-coding RNAs (RNA-seq), or measure profiles of DNA-protein complexes, methylation sites, and DNase I hypersensitive sites.
17. Applications of metagenomics
1. Determination of microbial diversity
2. Discovery of novel pathways
3. Exploration of the diversity of targeted genes
4. Identification of traits of uncultured microbes
5. Patterns of community versus population diversity
18. The challenges of metagenomics
1. The complexity of sequencing data
2. The recovery of sufficiently purified high-molecular-weight DNA without bias
3. The unequal abundance of community members
4. The assembly of genomes
Hello, and welcome to CD Genomics’s video about microbial genomics. First, do you know what microbes are? When you hear this word, what comes to your mind?
Carl Woese, the microbiologist who defined the domain Archaea, once said, “Genome sequencing has come of age, and genomics will become central to microbiology’s future.” Microbial genomics is an important field that uses nucleic acid sequencing technologies to investigate the genomic information of a single microbial species or of microbial communities from a wide spectrum of environments.
Microbial genomics directly helps us explore the origins, evolution, and catalysts linked to disease outbreaks, and provides insights that contribute to basic research and to industrial and environmental innovation.
Next, we are going to look at microbial genomics from three aspects: its background, its applications and classifications, and its prospects.
A microorganism, or microbe, is a microscopic organism that may exist as a single cell or as a colony of cells. Many microbes are about one tenth the size of a typical human cell. They are found all around us, and even inside our bodies. The category “microbes” includes prokaryotic microbes (such as bacteria and archaea), eukaryotic microbes (such as fungi, protozoa, and algae), and acellular microbes (such as viruses, virusoids, viroids, and prions).
It’s hard to calculate how many microbial species exist on Earth. Using mathematical scaling laws, researchers have estimated that there might be one trillion species out there. However, only about 100,000 microbial species have been sequenced, and only about 10,000 have been grown in labs. That is to say, a vast number of microbes remain to be discovered and fully understood.
Since the first commercial next-generation sequencing (NGS) platform, Roche 454, was launched in 2005, NGS has revolutionized microbial taxonomy and classification and has altered the landscape of microbial genome projects. It produces sequence data around one hundred times faster and more cheaply than the Sanger approach. The common next-generation sequencing platforms include Roche 454, Illumina, and SOLiD.
Competing third-generation technologies are already in production. They are characterized by portability (such as MinION), speed, longer reads, and the ability to detect epigenetic markers directly. Third-generation sequencing has unique advantages in ctDNA and single-cell sequencing. The representatives of the third generation are PacBio's SMRT and Oxford Nanopore's MinION. The GenoCare prototype was developed by DirectGenomics in 2015 as the first clinical third-generation sequencing platform.
Although third-generation sequencing technology is largely limited by a relatively high error rate and high cost, its speed is important and promising in the clinical setting, where it allows for timely diagnosis and clinical action. The relatively long reads permit near-complete sequencing of a viral genome directly from a primary clinical sample with high accuracy (about 97-99% identity). In addition, third-generation sequencing technologies (both the MinION and SMRT platforms) have been used for 16S ribosomal RNA gene sequencing, and it turns out that the error rate of PacBio is comparable to that of next-generation sequencing platforms such as 454 and Illumina MiSeq.
Sequence data allow de novo assembly of entire genomes, resequencing for the discovery of variants, species classification and novel gene discovery, and “Seq-based” assays.
According to Solieri et al. (2012), the applications of microbial genomics can be classified into community genome sequencing (also called metagenomics), whole genome sequencing, and genotype-phenotype association mapping.
Whole genome sequencing here refers to de novo sequencing of microbial genomes. The availability of complete genome sequences of closely related organisms makes it possible to reconstruct events of genome evolution. Roche 454 has generally been considered better suited to genomes containing abundant repeated sequences because it produces longer reads of over 400 bp. However, Illumina/Solexa platforms have been improved for eukaryotic de novo sequencing by (i) longer read lengths, obtained with better extension reagents and chemistries; (ii) the development of paired-end tag sequencing, in which both ends of each fragment are sequenced to offer more information; and (iii) novel assembly algorithms that deal with numerous short reads.
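To give a feel for how short-read assemblers cope with many overlapping reads, here is a minimal sketch of one common idea, the de Bruijn graph. It is a toy under stated assumptions, not the algorithm of any particular assembler: the reads are error-free, come from one strand, and the function names `build_de_bruijn` and `walk_unambiguous` are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:  # stop instead of looping around a cycle
            break
        contig += next_node[-1]
        seen.add(next_node)
        node = next_node
    return contig


# Three made-up overlapping reads (illustrative data only).
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)
```

With these three reads the walk joins them into the single contig ATGGCGTGCAAT. Repeats longer than k-1 bases would create branches in the graph and stop the walk, which is one reason longer reads and paired-end information help with repeat-rich genomes.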
NGS platforms can be applied at three levels to explore genotype-phenotype relationships. The first is the detection of individual genetic variation (SNPs, small insertions or deletions) and large-scale structural variation (such as copy number variations) within a population for which a reference genome is available (resequencing). The second and third levels are transcriptomics (RNA-seq) and the genome-wide analysis of DNA-protein interactions (ChIP-seq).
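The resequencing idea can be sketched in a few lines: pile up the bases of aligned reads at each reference position and report positions where a non-reference base dominates. This is a simplified illustration, not a production variant caller. It assumes the reads are already aligned without gaps, uses 0-based positions, and the function name `call_snps` and the thresholds `min_depth` and `min_fraction` are hypothetical.

```python
from collections import Counter, defaultdict


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Return (position, ref_base, alt_base, depth) for confident mismatches.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first base of an ungapped read.
    """
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    snps = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, n = counts.most_common(1)[0]
        if depth >= min_depth and base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


# Made-up reference and reads; three reads carry a T at position 5.
reference = "ACGTACGTAC"
aligned_reads = [(0, "ACGTA"), (2, "GTATGT"), (3, "TATGTA"), (4, "ATGTAC")]
snps = call_snps(reference, aligned_reads)
print(snps)  # [(5, 'C', 'T', 3)]
```

Real pipelines also weigh base and mapping quality and handle insertions and deletions, which this sketch leaves out.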
RNA-seq has become a central approach to profiling mRNA populations. Traditional techniques, such as microarrays, have several limitations. First, they cannot detect low-abundance transcripts. Second, their ability to discover novel transcripts is limited. RNA-seq on NGS platforms overcomes these drawbacks. This methodology allows us to annotate transcripts, including protein-coding and non-coding sequences; determine the transcriptional structure of genes; and quantify the expression level of transcripts under specific conditions.
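As one illustration of what quantifying expression levels can mean in practice, the sketch below computes transcripts per million (TPM), a widely used normalization that corrects read counts for transcript length and sequencing depth. The video does not say which measure is used, so treat TPM as an assumption here; the gene names and counts are invented.

```python
def tpm(read_counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    read_counts and lengths_bp are dicts keyed by transcript name.
    """
    # Reads per kilobase of transcript corrects for transcript length.
    rates = {t: read_counts[t] / (lengths_bp[t] / 1000) for t in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in rates}
    # Scaling so the values sum to one million corrects for sequencing depth.
    return {t: rate / total * 1_000_000 for t, rate in rates.items()}


# Hypothetical transcripts with made-up counts and lengths.
read_counts = {"geneA": 100, "geneB": 300, "geneC": 50}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 250}

expression = tpm(read_counts, lengths_bp)
for name, value in expression.items():
    print(f"{name}: {value:.0f} TPM")
```

In this example geneB has three times the reads of geneA but is also three times as long, so both come out at 250,000 TPM, while the short geneC comes out at 500,000 TPM.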
Chromatin immunoprecipitation sequencing (ChIP-seq) can provide whole-genome mapping of the binding sites of DNA-binding proteins. This approach has become an indispensable tool for investigating gene regulation and epigenetic mechanisms.
The inability to isolate and cultivate many types of microorganisms has long limited microbial studies. The cultivable fraction of microbes is low, less than 1%, a remarkable phenomenon known as “The Great Plate Count Anomaly”. Metagenomics, also called community genomics, offers a way around the cultivation bottleneck by assessing the genetic content of uncultured microbes. In brief, the process includes the isolation of DNA from environmental samples, the cloning of that DNA into bacterial artificial chromosomes (BACs), plasmids, or fosmids, and subsequent sequencing.
This technology can be oriented toward several goals. The first is the determination of microbial diversity. Both shotgun and targeted sequencing can be used for microbial diversity analysis, though targeted sequencing is faster, more affordable, and easier to analyze. For example, 16S rRNA gene-based sequencing is commonly used in phylogenetic studies. The 16S rRNA gene is a highly conserved component of bacterial and archaeal genomes, providing a viable marker gene for species identification and taxonomic determination. Analogously, 18S rRNA and ITS (internal transcribed spacer) are efficient markers for species identification and phylogenetic studies in fungi. However, PCR-based amplification methods are limited by amplification bias. PCR-free library preparation, such as the Illumina PCR-free procedures, omits the PCR step and gives a better view of low-abundance mutations.
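Once 16S reads have been assigned to taxa, diversity is often summarized with an index. The sketch below uses the Shannon index as one common choice; the video does not name a specific measure, so this is an assumption, and the two communities are invented counts.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per taxon for two samples with four taxa each.
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]

h_even = shannon_index(even_community)
h_skewed = shannon_index(skewed_community)
print(f"even: {h_even:.3f}, skewed: {h_skewed:.3f}")
```

Both samples contain four taxa, but the even community reaches the maximum value for four taxa (ln 4), while the community dominated by one taxon scores much lower. The index therefore captures evenness as well as richness.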
Second, the discovery of novel pathways. By cloning large DNA segments, entire pathways may be identified and recovered for expression in a heterologous host. Such pathways are potential sources of important pharmaceutical products.
Third, the exploration of the diversity of targeted genes. Community genomics can be used to screen genes for desired functions and properties. Some important ecological functions, such as pollutant degradation, biogeochemical cycling, and pathogenesis, are of particular interest. The diversity and evolution of genes for particular processes can therefore be analyzed with this approach.
Fourth, the identification of traits of uncultured microbes. Cloning large fragments of environmental DNA makes it possible to identify traits in unculturable microbes. If the cloning is comprehensive enough, a reasonable picture of the gene content of the microbial community under investigation can be obtained.
Fifth, the patterns of community versus population diversity. A microbial community is made up of different species, and within them there is also considerable subspecies variation. Understanding how the environment shapes the distribution and dynamics of species versus subspecies diversity is important for microbial ecology, because it informs the expected outcomes of succession and, eventually, evolution. The degree of subspecies variation is also important for community genome assembly.
Metagenomics faces several challenges. First, the complexity of sequencing data. Metagenomics generates unprecedented amounts of sequence data, which in turn makes bioinformatics analysis more difficult. Second, the recovery of sufficiently purified high-molecular-weight DNA without bias. Recovery of high-molecular-weight DNA is important for genome sequence assembly. Rigorous cell lysis is generally needed to decrease recovery bias, but the lysis process may shear the DNA into fragments. Third, the unequal abundance of community members.
Although random shotgun sequencing is a powerful technique for community genomics, the coverage is excessive for the dominant members and low or absent for infrequent members. Fourth, the assembly of genomes. The assembly of genomes in mixed communities is another challenge, especially when there are more than twenty species.
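A back-of-the-envelope calculation shows why unequal abundance is such a problem for shotgun sequencing. The sketch below assumes that reads are sampled in proportion to each member's share of the community DNA, so expected coverage is total sequenced bases times that share, divided by genome size. The run size, the DNA fractions, and the genome sizes are all hypothetical, and only two members of the community are listed.

```python
def expected_coverage(total_bases, dna_fraction, genome_size_bp):
    """Average fold-coverage of one community member in a shotgun run."""
    return total_bases * dna_fraction / genome_size_bp


total_bases = 1_000_000_000  # hypothetical 1 Gb of shotgun sequence

# Two hypothetical members; the rest of the community is omitted.
community = {
    "dominant_member": {"dna_fraction": 0.90, "genome_size_bp": 5_000_000},
    "rare_member": {"dna_fraction": 0.001, "genome_size_bp": 5_000_000},
}

coverage = {
    name: expected_coverage(total_bases, m["dna_fraction"], m["genome_size_bp"])
    for name, m in community.items()
}
for name, depth in coverage.items():
    print(f"{name}: {depth:.1f}x")
```

Under these assumptions the dominant member is sequenced to about 180x, far more than assembly needs, while the rare member gets only about 0.2x, meaning most of its genome is never sampled at all.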
The prospects of microbial genomics. First, the strategies. Studies of model organisms alone are far from sufficient for deciphering life, and organisms cannot be fully understood in isolation from their communities. The advent of metagenomics provides a powerful tool for exploring complex communities. Microbial genomics is empowered by both metagenomics and new technologies.
Although powerful, metagenomics is limited by the fragmented nature of metagenome assemblies built from short reads and by the lack of single-cell resolution. Metagenome assemblies often collapse because of strain heterogeneity, a problem that single-cell genomics can overcome.
Therefore, the hope for the future may lie in single-cell genomics. Community metagenomics can be partnered with single-cell genomics. Single-cell sequencing can be used to sequence several individual cell types present in the community in parallel with community sequencing. This would provide representative reference genomes for that community and permit a more integrated understanding of the community and its members.
Second, the computational methods.
Over the last decade, the cost of genomics technologies has fallen by more than 90%. Faced with much more data than we can analyze, we must implement computational methods for comparative analysis. The most promising approach to this bottleneck involves a conceptual change, namely the realization that effective comparative analysis does not require comparing all genes with all other genes.
Third, the future directions.
Since the first complete microbial genome (Haemophilus influenzae) was published in 1995, the number of sequenced microbial species has risen dramatically. The focus of genomics is gradually moving to the proteome and metabolome, in an effort to explore cellular interactions, ecology, and evolution.
Thanks for watching.