Beyond Metagenomics- Integration Of Complementary Approaches For The Study Of Microbial Communities
Cubillos-Ruiz A, Junca H, Baena S, Venegas I, Zambrano MM. 2009. Beyond Metagenomics: Integration of complementary approaches for the study of microbial communities. In Metagenomics: Theory, Methods and Applications - Editor: Diana Marcos. Horizon Scientific Press. ISBN: 978-1-904455-54-7


Beyond metagenomics: Integration of complementary approaches for the study of microbial communities

Andrés Cubillos-Ruiz (1,2), Howard Junca (1,2), Sandra Baena (2,3), Ivonne Venegas (2,4) and María Mercedes Zambrano (1,2)

1 Corpogen Research Center, Carrera 5 No. 66A – 34, Bogotá, Colombia.
2 Colombian Center for Genomics and Bioinformatics of Extreme Environments - Gebix, Carrera 5 No. 66A – 34, Bogotá, Colombia.
3 Department of Biology, Pontificia Universidad Javeriana, POB 56710, Bogotá, Colombia.
4 Department of Microbiology, Pontificia Universidad Javeriana, POB 56710, Bogotá, Colombia.

Abstract
Advances in genomics have had a great impact on the field of microbial ecology. Metagenomics in particular holds great promise for accessing and characterizing microbial communities. However, the high diversity and level of complexity present in microbial communities represent an obstacle to understanding these assemblages given the current approaches. The integration of microbial community structure with function, taking into account uncultured microbes in diverse environments, remains particularly challenging. The anticipated increase in metagenomic data will require high-throughput methods for data management and analysis of these large and complex microbial communities. Integration of complementing technologies like microarrays, high-throughput sequencing and bioinformatics, and of novel tools and “meta” approaches such as metaproteomics, metatranscriptomics and meta-metabolomics, will be required to understand the role of microbes in different ecological habitats. In spite of the many challenges, the field offers promising perspectives for achieving a more comprehensive view of microbial communities and how microorganisms adapt to and function within their ecosystems.
Introduction
The field of genomics has led to a conceptual shift in the way we approach biological systems by enabling researchers to go beyond studies of isolated components and address global functions and complex ecosystem interactions (Bertin et al., 2008). Recent technological advances have also paved the way for novel experimental approaches to the study of microbial communities that seemed largely implausible less than a decade ago. The rapidly growing area of metagenomics has applied the tools of genomics to analyze complex microbial assemblages and has become a powerful strategy for exploring and characterizing microbial communities in diverse settings. The appeal behind the metagenomics approach lies largely in its ability to bypass cultivation and offer a unique opportunity to directly sample and gain new insights regarding natural microbial assemblages. Metagenomic explorations therefore enable examination of complex communities and microorganisms of difficult access, providing a more comprehensive view of the populations present that can go from more extensive phylogenetic descriptions to valuable information regarding metabolic potential (Xu, 2006).

One of the major challenges in the field of microbial ecology is to understand how microorganisms in a community interact with each other and how the community structure is related to ecosystem function. Research in microbial diversity and technological advances over the last decades have led to a new appreciation of the diversity of microbiological life on our planet and provided tools for accessing a broad spectrum of microbial communities. The use of culture-independent methods has been crucial to our understanding and estimates of microbial diversity, which now greatly surpass original calculations that were limited by culture-dependent methods. Modern molecular tools have therefore been fundamental to our growing recognition of the extent of microbial diversity and the capacity of microorganisms to influence global ecosystem functioning (Schmidt, 2006). However, much remains to be learned regarding microorganisms and their roles in
particular environments. Studies aimed at understanding complex communities require novel and more holistic approaches as well as integration of methodologies in order to understand the ecology of populations and factors that control their activities. In this respect metagenomics, coupled to complementing high-throughput strategies for studying expression profiles and microbial metabolic potential, offers a unique opportunity for examining uncultured microbes and assessing their role in an ecosystem (Turnbaugh and Gordon, 2008).

Metagenomics holds an undisputed advantage in terms of accessing and examining complex and difficult-to-study natural microbial communities. However, the metagenomic approach that studies the entire DNA content of a community is still limited in its scope and capacity to derive ecologically meaningful information regarding the complex interactions that drive and shape communities. Difficulties inherent to this strategy, from problems associated with extraction of genomic material to loss of relevant information regarding the microorganisms and the ecosystem, necessarily limit the information that can be obtained from a particular study. Problems related to limited recovery of DNA have been addressed recently by amplifying the isolated material using multiple displacement amplification (MDA), a strategy that can also be applied to single cells (Lasken, 2007). MDA relies on the isothermal, proofreading, strand-displacing activity of phi29 DNA polymerase, an enzyme discovered almost 30 years ago that has now been recognized as a powerful means for obtaining up to micrograms of DNA from minute amounts of starting material (Binga et al., 2008). This enzyme has been used for amplification of metagenomic DNA and tested on soil DNA templates probed against microarrays (Gonzalez et al., 2005; Wu et al., 2006). Since metagenomics involves direct isolation of DNA from the environment, information regarding particular phenotypic traits is lost together with the capacity to carry out additional analyses regarding the physiology of specific microbes. Depending on the questions being addressed, simplification of the microbial community might be a viable alternative in order to facilitate interpretation of the data and the reconstruction of genomic information. This could be achieved either through enrichment of certain populations or by following diverse cultivation strategies aimed at recovering microorganisms that can be further analyzed in the lab. The study of isolates or the reconstruction of genomes from simplified communities could provide relevant information in terms of understanding the role of microbes within their particular niche (Steward and Rappe, 2007; Tyson et al., 2004). More sophisticated approaches, such as cell sorting and microfluidics, have also been tried (Cardenas and Tiedje, 2008; Warnecke and Hugenholtz, 2007).

Another major drawback of metagenomics is that gene discovery is carried out at the expense of genomic context and in the absence of information regarding the organisms themselves. Deriving useful genomic data thus relies on the capacity of bioinformatics to reassemble and make sense of the massive amount of sequence information generated. The taxonomic classification of metagenomic sequences, which could greatly help in assessing community composition and dynamics and in assigning roles to encoded proteins, depends on available information stored in the databases. Thus our capacity to derive information from metagenomic samples is also constrained by our current knowledge regarding gene sequences and proteins, most of which comes from sequenced genomes (Pignatelli et al., 2008). One of the most substantial technical improvements is perhaps the recent introduction of massively parallel sequencing technologies that generate large amounts of sequence information at reduced costs. The use of high-throughput approaches will, no doubt, lead to an increase in the generation of metagenomic data that will in turn require additional and more sophisticated bioinformatics tools to manage this information and carry out processes such as assembly, gene prediction, annotation, and metabolic reconstruction (Steward and Rappe, 2007).

Metagenomics is therefore at the point where scientific questions focused on understanding the interaction among microorganisms and their roles in the environment can start to be addressed. This will require coupling genotypic and phenotypic analyses through the implementation of novel, powerful and innovative tools and the concerted integration of other “omic” approaches such as proteomics and transcriptomics (see Figure 1). The formidable plasticity displayed by microorganisms is related to their metabolic versatility, the interaction of complex regulatory networks and their capacity to trigger differential responses that become evident in the expressed metabolic potential. Focusing on the global analysis of all genes and expression profiles can therefore reveal information beyond what can be gathered from studies of individual genes, contributing substantially to our understanding of the physiology and the strategies involved in microbial adaptation to changing environmental conditions (Schweder et al., 2008). The major challenge in the future will be to integrate experimental approaches and formulate questions aimed at deriving relevant ecological information, questions that can only be addressed in the context of intact communities where population requirements and interactions are at work (Turnbaugh and Gordon, 2008).

Figure 1. “Omics” approach to the study of microbial ecology. Microbial communities are influenced and shaped by both biotic and abiotic factors. The “omic” strategies target different levels of the information flux, starting with the metagenome and increasing in complexity. The integration of these approaches can provide a more comprehensive view of a community's structure and function in a defined spatial and temporal setting.
Metatranscriptomics

Definition and origins
Metatranscriptomics is the high-throughput detection and analysis, in sequence diversity and associated functions, of the transcripts (RNA molecules) extracted from samples where more than one microbial genome type is present. It is essentially a transcriptomic study in samples containing multiple cell types, species or operational taxonomic units (OTUs). The word “metatranscriptomic” is derived by analogy with “metagenomic”. In the strict sense of the definition, metatranscriptomics could include all the work involving direct extraction and detection of RNA sequences from environmental samples, i.e. those involving reverse transcription, target amplification, sequencing and analyses of 16S rRNA gene transcripts (Felske et al., 1996a; Nogales et al., 2001b; Small et al., 2001a; Weinbauer et al., 2002). However, if one considers metagenomics mostly as a sequence-based approach (excluding function-based screenings), metatranscriptomics could be restricted to analyses that have a broader scope and encompass total mRNA and/or rRNA transcripts in a sample. This approach is made possible by massive sequencing efforts and ideally does not involve cloning procedures or targeted PCR amplifications. However, the widespread use of 16S rRNA gene amplifications to characterize microbial communities could be considered a special case, since this gene is still extremely useful for exploring diversity and complexity in microbial communities (Tringe and Hugenholtz, 2008).

Metatranscriptomics complements the metagenomic approach by focusing on the expressed subset of genes (metatranscriptome), thus reducing the complexity of the data to be analyzed. This allows, for example, detection of sequences associated with a particular environmental condition that may not be so readily identified in metagenomic studies and increases the chance of detecting ecologically relevant active functions. The discovery of functions being induced in a sample as a response to a certain environmental condition (exerted pressure) also gives insight into processes of adaptation and enriches our understanding of communities previously captured through metagenomic sequence surveys. Thus, this approach gives a composite view of the transcriptionally active subset of the genomes present in a community under the environmental condition sampled. As we will describe below, metatranscriptomics is now possible thanks to the recent integration of various developments in different technical and theoretical fields such as nucleic acid sequencing technologies, hybridization-based (array) transcriptomics, new molecular biology applications of well-characterized enzymes, microbial ecology techniques to improve quantities, stability and detection of RNA molecules, and the emergence of bacterial phylogenomics and related bioinformatics tools customized for metagenomic datasets, among others.

Limitations in analyzing the metatranscriptome
The exploitation of transcriptomics to assess the active subset of genes in a given environmental microbial community metagenome is very recent, with reports appearing only in the last five years. A search carried out in February 2009 for key terms in PubMed, such as metatranscriptomics and related words, retrieved only 10 citations starting in 2006. While this raw search can miss some relevant publications on metatranscriptomic studies, it does suggest that this is a new and emerging field. Reasons for the apparent delay in reports of research in this field, with respect to research in the general area of metagenomics, are essentially related to technical difficulties and previously identified limitations inherent to performing studies using environmental RNA.

The inherent instability of RNA molecules has been one of the most limiting factors for the development of metatranscriptomics. Transcriptional studies had already revealed the complexity of working with RNA, an unstable molecule of rapid turnover and short cellular half-life (seconds to minutes) when compared to the informative and more stable molecules of DNA. The lability of RNA molecules can also contrast with the proteome, which can have variable protein half-lives that are dependent on the specific protein’s biochemical nature and localization. The transient nature of a given RNA population will therefore influence the expression profiles observed, providing at best a snapshot of what are probably highly dynamic patterns of expression (Velculescu et al., 1995). Another factor limiting the capacity for deep sequence-based transcriptomic analyses of metagenomes is the low quantity of transcripts inherently present in and/or recovered from environmental samples. This is due to the substantially lower biomass content found in these samples when compared with a pure bacterial culture (Amann et al., 1995). In addition, components that contaminate samples and are co-extracted with the nucleic acids (Griffiths et al., 2000), such as humic acids in soils, can interfere with additional steps in sample processing like quantification, enzymatic amplification, modification or hybridization (Alm et al., 2000; Roh et al., 2006). These problems, despite being shared with metagenomics, are particularly critical for the demanding methodological steps involved in metatranscriptomic studies. However, improvements in sample recovery and purification in recent years have opened the way for global analyses that involve detection and identification of transcripts from environmental samples.

From 16S rRNA transcript sequencing to total metatranscriptome pyrosequencing
In many cases, the first approach to characterizing an environmental microbial community still relies on a description of the taxonomic composition of the sample, usually based on 16S rRNA gene amplification and sequencing. In the late 90s, some reports described the so-called “active fraction” of the microbial community by extracting RNA, generating cDNA and then determining the sequence complexity in ribosomal genes (Felske et al., 1996b; Nogales et al., 2001a). The community composition differed depending on whether DNA or RNA was used for 16S rRNA gene amplification, with some phylogenetic groups found only in one of the two clone libraries from the same sample. In addition, predominant 16S rRNA types were more evident when RNA was used as template, a reflection of a dominant transcriptionally active species that did not necessarily correlate
with the most abundant type detected using DNA (Nogales et al., 2001b). These studies revealed the discrepancy between observed predominant species or genome types and the transient expression profile of particular microbes within a community. This transient expression is reflected by the amount of rRNA transcripts recovered and is influenced by the conditions at the time of sampling. These initial studies struggled with the technical difficulty of extracting RNA from environmental samples and paved the way for improvements required for the analyses of transcripts from environmental samples (Hurt et al., 2001). Superior protocols and commercial kits thus became available, improving the reproducibility, quality and quantity of nucleic acids being extracted from various environmental sources. Despite these advances, there are still problems inherent to these procedures that require experimental fine-tuning in order to optimize procedures for diverse environmental samples.

The recently developed high-throughput sequencing technologies have obvious advantages in terms of exploring the metatranscriptome. Pyrosequencing, which is based on the detection of the released pyrophosphate, represents a turning point because it dispenses with cloning and provides a fast and economical alternative for obtaining large-scale sequence information. The basic steps involved in the pyrosequencing-based metatranscriptomic approach are: isolation of environmental RNA (eRNA), and generation of complementary DNAs (ecDNAs) by random-primed reverse transcription, which are then treated to produce double-stranded DNA fragments of the environmental cDNAs (ds ecDNA). These ds ecDNAs are then ligated to adaptors, emulsified, and subjected to the 454-sequencing process (Leininger et al., 2006). These DNAs contain information on the expressed ribosomal genes (rRNA: taxonomic and community structure information) and protein-coding genes (mRNA: metabolic functions) within a microbial community and thus provide relevant input for more detailed downstream analyses (protein-based analyses or microarray design) at an unprecedented depth of coverage. This approach, which avoids the well-known biases associated with culturing, primer-probe specificity and sensitivity, PCR amplification, cloning and screening, was used by Urich et al. to rapidly and simultaneously characterize both the structure and in situ function of a soil microbial community (Urich et al., 2008). The simultaneous analysis of both actively transcribed rRNA and mRNA sequences obtained by pyrosequencing was thus useful for taxonomic profiling of the community and assessing actively transcribed genes and functional information.

In some cases it is desirable to focus on protein-coding genes and exclude the ribosomal content from the analysis. This focuses the work on predictions regarding functionality or networking of the possible metabolic pathways present. It also increases coverage and can reveal more diversity associated with a specific function. In microbial transcriptomics and metatranscriptomics, the exclusion of rRNA molecules is presently done by two methods. One method involves capturing and removing the ribosomal content by using probes to target highly conserved regions on the ribosomal subunits, followed by a selective hybridization and removal of the rRNA. Another alternative takes advantage of a difference between mRNA and rRNA, which allows a processive 5´-3´ exonuclease to digest rRNA having a 5´ monophosphate. This strategy was used to analyze the mRNA sequence content by pyrosequencing in marine surface waters (Frias-Lopez et al., 2008; Gilbert et al., 2008). Metatranscriptomic studies that use mRNA decrease the complexity in a meaningful and useful way, offering the advantage of recovering sequences for putative proteins that otherwise can be overlooked or underrepresented in metagenomic surveys.

Future perspectives in metatranscriptomics
Nowadays, metatranscriptomic studies consist of deep sequence surveys of the expressed genes from overwhelmingly complex metagenomes (Raes and Bork, 2008; Urich et al., 2008). Although a powerful approach to understanding functionality, this strategy still gives a relatively isolated and transient picture of what can be an amazingly diverse and largely unknown community. However, metatranscriptomics offers several advantages over the large-scale sequence-based metagenomic approach that seeks broad sequence coverage. By centering the analysis on the functions detected, this approach reduces the sequence complexity and provides a more meaningful alternative to the study of heterogeneous communities. One of the advantages of working with libraries generated from expressed transcriptional units is the increased chance of finding protein-coding, functional sequences and assigning possible roles to these proteins within a metabolic context (Dunlap et al., 2006). Thus metatranscriptomics can facilitate understanding the variations within an ecosystem and the possible correlations between environmental variables and function (Gianoulis et al., 2009). It can also be used to target specific functions of environmental importance (Gilbert et al., 2009; Shrestha et al., 2008) and has the potential of identifying genes that could go undetected in larger metagenomic sequencing datasets. The construction and analysis of cDNA libraries from diverse environments has revealed several unique sequences and the potential to uncover a high degree of novelty within microbial communities (McGrath et al., 2008). From a more pragmatic point of view, metatranscriptomics can be useful for describing the network of activities taking place in an ecosystem in order to obtain, for example, a specific metabolite.

Several improvements and developments are still required in order to more fully exploit this approach. One important aspect for future studies in metatranscriptomics is to define the rates of environmental RNA turnover (Kuechenmeister et al., 2009). This will allow us to fine-tune and correct metatranscriptomic observations, and to assess possible correlations with microbial diversity, composition and functions, as well as with the environmental conditions present. An efficient coupling of metatranscriptomics with other techniques used in environmental microbiology will also become more prevalent. These will include other “omic” approaches, high-throughput sequencing and microarrays, where metatranscriptomics can provide a more
efficient way of feeding microarray probe design to match an ecosystem’s particular genomic and transcriptional content (Parro et al., 2007; Small et al., 2001a, b; Urich et al., 2008). Metatranscriptomics will also be used in conjunction with complementing strategies, such as stable isotope probing on nucleic acids, a technique that detects the incorporation of a supplied isotope into the DNA or RNA of the bacterial species metabolizing the substrate (Lueders et al., 2004). What will probably be very important, however, will be to increase the number of studies that follow the same community across temporal variations in order to have a more accurate notion of the expression dynamics involved. The development of additional data mining tools to better interpret and integrate metatranscriptomics with data derived from complementing strategies should allow us to relate environmental factors with community performance and improve our capacity to detect and predict adaptation and evolution of microbial communities affected by natural or artificial pressures.

Metaproteomics
Metaproteomics has emerged in recent years as a powerful strategy that can contribute significantly towards our understanding of ecosystem functioning in microbial ecology (Wilmes et al., 2008) (Figure 2). It is evident that this ecological information cannot be obtained from the study of the genes alone and that genomics is limited in terms of elucidating critical aspects of microbial interactions (Graves and Haystead, 2002). In fact, an important difference with respect to genomic studies is that proteomics can reflect the dynamics of a system and capture changes driven by shifts in environmental conditions (Hagenstein and Sewald, 2006). The fact that proteins, not genes, are directly responsible for the phenotypes of cells makes proteomics an excellent tool for approaching functionality and revealing changes in protein synthesis and folding that result from rapid physiological responses (Lacerda et al., 2007). These protein expression profiles reflect specific microbial activities in a given ecosystem and can be more informative than identification of either the functional genes present or even their corresponding messenger RNAs (Benndorf et al., 2007; Wilmes and Bond, 2006). Proteomics is also useful because it can identify functional genes of importance within a community and can verify metabolic processes inferred from metagenomic data. In addition, the generation of de novo peptide sequences confers specificity in the identification of proteins and their phylogenetic origin (Wilmes and Bond, 2006). While the rapid progress in technologies for both protein separation and identification, such as chromatography and mass spectrometry, has triggered exciting developments in the field, metaproteomics will surely gain more momentum with the advent and incorporation of additional tools and strategies for exploring microbial communities.

Figure 2. Schematic overview of the metaproteomic approach in microbial ecosystems.

The metaproteomic approach
The term proteomics, which was first used in 1995, can be defined as the large-scale study of the proteome, or the complete protein complement, expressed by a genome under different conditions (Graves and Haystead, 2002). This term is used to represent the array of proteins that are expressed in a biological compartment (cell, tissue, organ or organism) at a particular time under a particular set of conditions (Beranova-Giorgianni, 2003). Because proteins are key structural and functional molecules, molecular characterization of proteomes is important for a complete understanding of biological systems. Therefore proteomic studies, which involve different disciplines such as molecular biology, biochemistry and bioinformatics, can provide a more integrated view of a biological system by detecting modifications of its entire protein fraction. Although proteomics has been used extensively to study microorganisms in pure culture, information derived from these protein profiles may not necessarily reflect processes occurring in complex microbial communities found in natural settings (Wilmes and Bond, 2006). Moreover, the focus of research on microbial ecology goes beyond the individual species to study whole assemblages and ecosystems. In this respect, the metaproteomic approach goes further than single microorganisms to encompass the spectrum of proteins present in a microbial community, giving a glimpse of its functional potential. Information generated using this strategy therefore complements environmental genomic databases and contributes to our understanding of natural ecosystems.

Experimental approach in metaproteomics
A metaproteomic analysis includes several technically challenging steps, beginning with the extraction of microbial proteins from the surrounding matrix and ending with their identification (Maron et al., 2007b). The protein fraction in any ecosystem involves secreted and cellular proteins, some of which can be attached to the cell wall or embedded in membranes (integral proteins). The choice of the protein extraction technique is crucial due to the complexity of native microbial communities, the heterogeneity of natural environments, and the presence of interfering compounds that can affect the efficiency of extraction (Ogunseitan, 2006). Since the
extraction technique can influence recovery, it is often useful to define this step on the basis of the protein fraction being targeted and on the subsequent method of protein analysis (Hecker, 2003). There are many protocols for this purpose, including differential centrifugation, resolving soluble proteins in separate gels, and employing reagents with stronger solubilization power for pellets enriched with membrane proteins (Molloy et al., 2000).

The most commonly used technique in proteomics to separate and resolve complex protein mixtures is polyacrylamide gel electrophoresis (PAGE) either in one (1-DE) or two dimensions (2-DE). 2-DE first uses isoelectric focusing (IEF) in immobilized pH gradients followed by separation based on molecular weight using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) in the second dimension. Despite being widely used for separation of proteins, 2-DE is time-consuming and labor-intensive and is limited in its capacity to resolve all the proteins in complex samples or environments (Graves and Haystead, 2002). In addition, PAGE separation can lead to an underrepresentation of very large, or very small, proteins as well as of integral membrane proteins, and may fail to detect low-abundance proteins. To bypass the limitations of protein electrophoresis, alternative ways of separating proteins have been developed, one of which involves high-performance liquid chromatography (HPLC) (Graves and Haystead, 2002).

Once proteins have been separated, spots resolved in 2D gels are digested with a protease, usually trypsin, and subjected to analysis using mass spectrometry (MS) for protein identification (Domon and Aebersold, 2006). The peptides must be ionized for MS and this is usually achieved by either matrix-assisted laser desorption/ionization (MALDI) or electrospray ionization (ESI) techniques. New ionization methods include desorption electrospray ionization (DESI) and the recently developed surface-assisted laser desorption/ionization (SALDI) method that uses a non-volatile inorganic matrix of germanium on a silicon surface (Seino et al., 2007). Ionization is followed by mass analysis in a mass spectrometer using different analyzers such as the commonly used quadrupole mass analyzers, time-of-flight (TOF) instruments, ion trap mass analyzers that trap molecular ions in a 3-D electric field, and tandem mass spectrometry (MS/MS), which can be used to acquire sequence information. There are several different mass analyzers and the choice of equipment will be defined by several criteria. Triple-quadrupole mass spectrometers, for example, are most commonly used to obtain amino acid sequences while the quadrupole-TOF (qTOF) is used for amino acid sequencing and determination of modifications.

The information generated by MS regarding peptide mass or sequence is then compared against published nucleotide or protein databases in order to predict and identify proteins (Wilmes and Bond, 2006). This identification therefore depends on the information available and relies heavily on bioinformatics tools for comparison and identification of homologues in databases.

Metaproteomics and microbial ecology

The growing number of reports on the characterization of microbial ecosystems in recent years is indicative of the great potential behind the metaproteomic approach. In-depth analyses of metaproteomic expression profiles are fundamental to our understanding of microbial interactions and of the role played by certain microorganisms in global nutrient cycles (Schweder et al., 2008). The first studies on metaproteomics were carried out in microbial habitats with limited microbial diversity, but nowadays the range of habitats studied has increased to include complex microbial communities. To date, metaproteomic analyses have been conducted on microbial communities found in soils, activated sludge, wastewaters, acid mine drainage biofilms, marine ecosystems and even the human gastrointestinal tract (Kan et al., 2005; Klaassens et al., 2007; Schulze et al., 2005; Sowell et al., 2009; Tyson et al., 2004; Wilmes et al., 2008).

In a pioneering study aimed at identifying proteins in dissolved organic matter (DOM) from complex environments such as lake waters, water extracted from soils and soil particles, Schulze et al. showed that, despite the limitations of the approach at the time, specific taxonomic groups could be identified and proteomic composition varied depending on the ecosystem, and that the strategy could be useful for assessing the functionality of an ecosystem (Schulze et al., 2005). More recently, protein fingerprinting has been used to study natural communities and evaluate the correlation between community structure and ecosystem function. In one study, protein fingerprints generated by standard SDS-PAGE and ribosomal DNA fingerprints were used to analyze indigenous microbial communities in freshwater samples. Results showed that variations in the genetic and functional structure were complex and varied depending on the perturbations imposed on the community (Maron et al., 2007a). More recent work using the same strategy to analyze bacterial communities inoculated into sterile soils differing in their physicochemical properties showed a correlation between the functional structure of the community, as assessed by protein fingerprinting, and the physicochemical characteristics of the soil (Maron et al., 2008).
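As an illustration of how trypsin digestion and mass-based database comparison fit together, the following is a minimal Python sketch of peptide mass fingerprinting. It is not the method of any study cited here or of any particular search engine. The function names, the toy database, the example masses and the 0.01 Da tolerance are all assumptions made for illustration. The sketch also assumes complete digestion (no missed cleavages), unmodified peptides, and neutral monoisotopic masses rather than instrument m/z values.

```python
import re

# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide accounts for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tol=0.01):
    """Count observed masses that lie within tol Da of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def rank_candidates(observed, database, tol=0.01):
    """Rank database entries (name -> sequence) by number of matched masses."""
    scores = {name: count_matches(observed, seq, tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-entry database and hypothetical observed masses.
    toy_database = {"protein_A": "MKWVTFISLLR", "protein_B": "GASPVK"}
    observed = [277.147, 1133.661]
    for name, hits in rank_candidates(observed, toy_database):
        print(name, hits)
```

Counting matched masses is the simplest possible score. It also shows why identification depends on the sequences available: a protein that is absent from the database cannot be matched, however good the spectrum.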
MALDI-TOF is usually used for peptide mass fingerprinting, MALDI-QqTOF allows both peptide mass fingerprinting and amino acid sequencing, and FT-ICR (Fourier transform ion cyclotron resonance) is useful because it can achieve higher resolution and accuracy (Graham et al., 2007; Graves and Haystead, 2002). Detection has been improved thanks to developments such as MS/MS and TOF/TOF instrumentation with optimized laser quality or direct analysis in real time (DART) (Lasaosa, 2008).

Both metagenomics, and more recently metaproteomics, have been applied to the study of a natural biofilm community dominated by few species that is associated with acid mine drainage (AMD), an environmental problem that arises largely from microbial activity. By using shotgun cloning and sequencing of the DNA retrieved directly from the environment, Tyson et al. were able to reconstruct almost complete genomes of Leptospirillum group II and Ferroplasma type II, and to partially recover three other genomes from this underground, low-complexity AMD
microbial biofilm (Tyson et al., 2004). While this study unveiled metabolic pathways and provided insight into survival strategies, community proteomics carried out on this AMD biofilm provided information about how these microorganisms function in their natural environment. The combination of mass spectrometry-based proteomics and community genomic analysis revealed key functions and how these were partitioned among community members (Ram et al., 2005). More recently, community genomic data sets were used to identify expressed proteins from the dominant member of an AMD biofilm (Lo et al., 2007). The results showed genome-wide recombination patterns due to genetic exchange between closely related bacterial populations that could underlie the capacity of these microorganisms to survive in this very acidic and metal-rich ecosystem. In this study the capacity to discriminate peptides with slight differences in composition enabled identification of sequence variants from proteomic data. Thus coupling proteomic and genomic data conveyed information both about the genome structure and the activities present in this community. It also highlighted the importance of using such strain-resolved community proteomics to complement culture-independent metagenomics analysis of microbial communities.

The oceans, which cover more than 70% of the Earth’s surface, constitute the largest natural habitat in the world and as such are the subject of intense studies in microbial ecology. Marine microorganisms, which are extremely diverse and play fundamental roles in global biogeochemical processes, are subjected to fluctuating environments due to changes in the water conditions (Thomas et al., 2007). One of the first studies using metaproteomics on natural aquatic microbial assemblages in the Chesapeake Bay established the feasibility of the approach and identified several proteins that corresponded to dominant bacterial groups (Kan et al., 2005). Marine alphaproteobacteria are ubiquitous in marine ecosystems and outstanding in their capacity to persist in oligotrophic waters, an adaptive trait of biological importance and of great interest in marine microbiology (Sowell et al., 2008; Thomas et al., 2007). A proteomic approach was used to identify proteorhodopsin proteins, light-dependent proton pumps predicted to be important in terms of supplying energy for marine microbial metabolism, in the alphaproteobacterium SAR11 strain HTCC1062 (“Pelagibacter ubique”) (Giovannoni et al., 2005). An accurate mass and time (AMT) tag library was then generated for quantitative examination of proteomic profiles of this cultured strain to identify differentially expressed genes and create a comprehensive library of peptide AMT tags to improve further proteomic analyses of this microorganism (Sowell et al., 2008). Subsequent metaproteomic analysis of the communities present in the north-western Sargasso Sea was carried out to understand the mechanisms involved in survival in these oligotrophic waters. The analysis of the metaproteome in surface samples, using capillary liquid chromatography (LC)-tandem mass spectrometry, identified peptides that could be mapped to proteins from the SAR11 clade, followed by Prochlorococcus and Synechococcus, both of which are dominant marine photosynthetic bacteria (Sowell et al., 2009). The results indicated that a large number of the identified SAR11 peptides belonged to periplasmic substrate-binding proteins, consistent with observations that the periplasmic space represents a large proportion of the volume of the extremely small SAR11 cells. Other abundant proteins included proteins mediating oxidative stress and re-folding, as well as nutrient acquisition. These findings indicate that the metaproteomes of SAR11, Prochlorococcus and Synechococcus bacteria reflect adaptation to fluctuating environmental conditions where cells have to survive the damage imposed by light and oxidative stress while competing for limited nutrients (Sowell et al., 2009).

The potential of metaproteomics has also been used for understanding the complex relationship among microorganisms present in wastewater treatment plants (WWTP). The metaproteome of a laboratory-scale activated sludge system optimized for enhanced biological phosphorus removal (EBPR) was first analyzed using 2D PAGE. This work identified highly expressed proteins, possibly from the dominant and uncultured strain Rhodocyclus-type polyphosphate-accumulating organism (PAO), and established the viability of carrying out proteomics on a complex community such as this for which cultivation is difficult (Wilmes and Bond, 2004). Subsequent work compared protein expression in sludge from two EBPR systems with different levels of phosphorus removal (Wilmes et al., 2008). This study was able to identify proteins that were highly expressed by the dominant PAO and revealed several proteins that could be linked to the metabolic activities occurring in these EBPR systems. Another interesting study used metaproteomics to analyze the proteins found in the extracellular polymeric substances (EPS) in full-scale activated sludge systems (Park et al., 2008). Extraction of EPS proteins is technically challenging and was therefore evaluated using three different cation-associated extraction methods, followed by sample fractioning and proteomic analysis. While the results showed that the protein profiles were different for the various extraction methods, several sewage-derived and bacterial proteins were identified, some of which were ubiquitous and therefore potentially useful as biomarkers to monitor operations.

Advanced molecular technologies have also led to interesting applications in areas such as bioremediation, a biological process based on the catabolic capability of microorganisms to degrade and/or eliminate polluting materials from an ecosystem. Increasing our knowledge of the microbial communities involved in key physiological processes and understanding the relationship between microbial diversity and physiological routes involved in biodegradation processes in polluted environments could enhance bioremediation processes. With this in mind, a new protein extraction procedure was developed and applied to a soil microcosm and a contaminated aquifer (Benndorf et al., 2007). The analysis of these metaproteomes was consistent with the bacterial metabolic pathways expected in these ecosystems and showed the potential of using this approach to identify possible biomarkers indicative of biodegradation processes. In another study, proteomics was used to assess the response of a microbial
community after stress by cadmium exposure (Lacerda et al., 2007). The analysis showed significant changes in the microbial physiology and the capacity to detect rapid changes within the community, providing evidence of toxicity and insight into mechanisms of tolerance.

Challenges and future perspectives

It can be generally argued that the analysis of proteins through metaproteomics provides extremely useful functional information regarding microbial communities, more so than metagenomics or even metatranscriptomics (Stenuit et al., 2008). Despite its evident appeal and the great methodological and technical advances in terms of extracting and analyzing proteins directly from environmental samples, the approach is still hampered by several limitations. Some of the inherent limitations of the approach include low protein extraction yields, difficulty in identifying peptides through database searches due to reduced coverage of known protein sequences, and ambiguity in interpreting data in the absence of any corresponding metagenomic information. As a consequence of the diversity of protein function and structure there is no single universal extraction method available. This will require both adjustments to established procedures and improvements in the efficiency of protein extraction, especially from highly contaminated samples. Other major challenges involve protein separation and identification techniques (Maron et al., 2007b) and bioinformatic capacity for analysis and management of the large volumes of data generated (Nesatyy and Suter, 2007; Wilke et al., 2003). Thus improvements in sample preparation, MS techniques and data capture and analysis will have to be paralleled by advances in bioinformatics tools designed for both organizing and processing proteomics and metaproteomics data (Yang and Zhang, 2008).

Another major problem with metaproteomic studies is that assignment of peptide masses determined by MS relies on known peptide sequences in databases. Despite the increasing amount of available microbial peptide sequences, most of the proteins derived from environmental microorganisms still lack reference sequences in databases (Schweder et al., 2008). Thus the limited number of organisms represented in the protein and gene sequence databases constrains the efficient application of cutting-edge high-throughput proteomics to environmental samples (Nesatyy and Suter, 2007). In addition, the high genetic variation in natural populations, as well as environmental changes that affect the organisms’ responses, could hamper the interpretation of protein expression levels from environmental samples. Another critical aspect in the approach is the reproducibility of the results. The difficulty associated with efforts at reducing the sources of variability has been made evident by the discrepancy in results obtained in different laboratories involved in the analysis of the same protein mixture (Tao, 2008). One additional and also very important challenge in the field will always be that of testing and validating the functional information obtained.

In spite of the many limitations, metaproteomics still provides a powerful tool to study the functional diversity of environmental microbial communities. With the capacity to sample the total protein pool of a given natural population, the metaproteomics strategy provides a unique opportunity to obtain functional information regarding natural communities and link this information to population structure. The identification of peptide sequences, based on information from sequenced microorganisms and metagenomes, will improve in the years to come, offering more precise identification of specific enzymes and putative functions and helping our understanding of the adaptations and responses to changing conditions. It can be anticipated that environmental proteomics will prove extremely useful on several fronts. For example, the identification of conserved proteins could provide markers for specific habitats. Proteins that change upon environmental perturbation could be used as indicators of stress on natural populations and ecosystems (Maron et al., 2007b). In addition to identification of protein biomarkers, metaproteomics can also be very useful in the field of ecotoxicology by detecting minor changes in the proteome or metaproteome and quantifying the effects of stressors on natural populations, communities, and ecosystems (Nesatyy and Suter, 2007). Environmental proteomics can also lead to the identification of known or novel biochemical functions involved in complex biogeochemical processes and can help to address the role played by the succession of populations within an ecosystem. As techniques and databases become more robust, the likelihood of assigning phylogenetic affiliation and possible catalytic function to proteins from complex environments will increase (Rodriguez-Valera, 2004). Finally, metaproteomics can complement other meta-approaches in addressing fundamental questions in microbial ecology such as the relationship between community structure and function and how these communities contribute to ecosystem dynamics and stability.

Metagenomics and metabolomics

Metabolomics in short

Metabolomics, which has been defined as the study of global metabolite profiles in a biological system under a given set of conditions, is one of the most recent technologies introduced in the systems biology approach (Goodacre et al., 2004). This rapidly expanding area of scientific research faces many technological challenges in its aim to encompass one of the outermost levels of the information flux that displays greater complexity than do the genome, the transcriptome or the proteome. While genomics and proteomics study macromolecular building blocks (DNA and proteins, respectively), metabolomics deals with structurally and physicochemically diverse small-molecule metabolites (typically <1000 Da) (Han et al., 2008). As a consequence of this complexity, there is no single method that enables a comprehensive metabolomic analysis. Despite this limitation many analytical methods can be applied to examine metabolites from different chemical classes and have provided invaluable information about the metabolome of model microorganisms (Mashego et al., 2007). Metabolomic analysis typically is carried out by mass spectrometry (MS), usually coupled to a separation methodology such as liquid
chromatography (LC-MS), gas chromatography (GC-MS) or capillary electrophoresis (CE-MS). The stand-alone nuclear magnetic resonance (NMR) technique has also been widely used. A complete review of the methodologies used in metabolomics has been recently published (Oldiges et al., 2007). The analysis of metabolites varies depending on the aims of the research and has been done using three different strategies (Peric-Concha and Long, 2003): i) Metabolite fingerprinting uses spectra obtained either from NMR or MS analyses to create a fingerprint of the metabolites that are produced by a biological system; it is not quantitative and usually does not provide information about specific metabolites. ii) Metabolite profiling is the semi-quantitative analysis of a group of specific metabolites (e.g. carbohydrates or polyketides). iii) Metabolite target analysis is the quantitative analysis of metabolites and is targeted to a subset of molecules that participate in a specific aspect of metabolism.

Metabolome of an ecosystem

One of the aims of the metagenomic approach is to reveal the microbial gene diversity present in the ecosystem, a step that constitutes investigation at the lowest level of the genetic information flux (metagenome) of a microbial community. This metagenome is more stable when compared with levels of information that are further downstream, such as RNA and proteins, since it is the result of evolutionary processes over members in a given population and is not as fluctuating and transient as the transcriptome, the proteome or the metabolome (Han et al., 2008). It has now become evident that the fraction of genes available from culturable microorganisms is minimal in comparison with the global microbial gene pool present in the environment. Commensurate with this idea, microbial communities in natural ecosystems should be expected to harbor a broad collection of metabolites that are synthesized in response to environmental cues. Some of these metabolites might not be present in the current set of culturable microorganisms or they might not have been detected due to the lack of knowledge regarding specific signals required under standard laboratory conditions to elicit their production. There should therefore be a startling variety of unexplored metabolites produced in natural environments, many of which might be produced by non-culturable microorganisms in an environment-dependent manner. For this reason, the metabolome of a microbial community (meta-metabolome) is extended to include the complete set of metabolites formed by the whole community as a result of its interaction with the biotic and abiotic factors present in a given niche. In the systems biology approach it has long been known that metabolomic data represent integrative information. According to the metabolic control theory (also known as Metabolic Control Analysis, MCA) (Cascante et al., 2002), small changes in the transcriptome and the proteome have only minor effects on the overall metabolic fluxes but have significant effects on the concentration of metabolite intermediates of the pathway. For instance, the reduction in the activity of an enzyme can trigger an increase in the concentration of substrates for that enzyme, so that the overall balance of the pathway can be maintained. Such responses have been made evident from MCA studies where the perturbation of the system in response to a mutation is measured by determining the sensitivity coefficients of fluxes and metabolite concentrations. These coefficients are consistently higher for metabolites than for fluxes, demonstrating that perturbations of the system are more accurately measured when the metabolome is analyzed (Cascante et al., 2002). This control of the metabolism is possible because the individual components of metabolic networks are tightly connected, ensuring that the flux alters only slightly (Nielsen, 2003). As a consequence, the measurement of all the metabolites in a system comprises and amplifies any perturbation of the levels lying upstream (proteome or transcriptome) (Mendes et al., 1996; Urbanczyk-Wochniak et al., 2003) and as such is more sensitive to the physiological responses of complex biological systems than either transcriptomics or proteomics (Kell, 2006).

Metabolites are not merely the end product of gene expression but rather result from the interaction of the genome constituents with the environment. Thus investigating the full extent of the meta-metabolome is not possible by just inspecting the metabolic potential encoded in the metagenome. So far, metagenomics studies have inferred habitat-specific metabolic demands on the basis of the identification of predominant gene families, but experimental confirmation for complex systems remains elusive because of the lack of a robust analytical methodology for deconvoluting all the metabolites present in complex mixtures (Hollywood et al., 2006). In spite of the technical challenges, current methodologies for analyzing the metabolome can contribute to our understanding of microbial community function and to the discovery of new interesting bioactive metabolites.

Metagenomics and metabolomics for natural products prospection

Microbial secondary metabolism produces a wealth of small molecules collectively known as natural products that are used in natural environments for interspecies competition and communication. These small molecules have been an important source of therapeutically useful agents such as antibiotics, antifungals, immunosuppressive agents and anticancer agents (Clardy and Walsh, 2004). Nearly all known natural products have been discovered by growing organisms as isolated species and analyzing their extracts for small molecules. It is estimated that with this traditional strategy only 10-20% of the culturable bacterial natural product repertoire, and only 1-2% of the small molecules potentially produced by the global microbial population have been discovered (Baltz, 2006; Watve et al., 2001). Bacterial genome sequencing efforts have only recently focused on Actinomycetales, one of the most prolific groups of small-molecule antimicrobial producers. Examination of the natural product repertoire encoded in the 26 currently available Actinomycetes genomes revealed that, on average, there are two or three dozen gene clusters potentially capable of producing a small molecule. However, only a few of these molecules have actually been identified for each of these
strains (Ikeda et al., 2003; Omura et al., 2001; Peric-Concha and Long, 2003). The potential for secondary metabolite production revealed in these bacterial genomes suggests that the current strategy of analyzing isolated microbial species is insufficient for exploiting their metabolic potential. In fact, most secondary metabolites are not produced constitutively but, quite the contrary, are encoded by “cryptic” genes that are triggered only in response to environmental cues (Peric-Concha and Long, 2003). The biosynthetic pathways of secondary metabolites are highly complex and can involve gene clusters that can comprise up to 100 kb of DNA sequence that encodes refined molecular machines known as polyketide synthases (PKS) and nonribosomal peptide synthetases (NRPS) (Fischbach et al., 2008). For a complete review of these genetic elements and their distribution throughout bacterial lineages, please see Donadio et al. (2007).

Recent surveys of diverse environments using metagenomics and other molecular approaches have increased our awareness regarding the extent of microbial diversity present in various ecosystems, diversity that should also harbor a remarkable variety of novel and yet to be exploited natural products. There is a discrepancy, however, between the number of identified gene clusters that potentially encode small molecules and the relatively small number of these molecules that have been discovered. This discrepancy results most probably from our outdated view of microorganisms as isolated entities separated from their natural environments. Bacterial genomics of model culturable organisms and metagenomics of uncultured bacterial consortia present in association with marine sponges and soil communities have revealed numerous gene clusters of PKS and NRPS for which no molecules have been identified (Donadio et al., 2007; Ginolhac et al., 2004; Kim and Fuerst, 2006; Piel et al., 2004; Schirmer et al., 2005). The probability of these gene clusters being junk DNA in microbial genomes is very low since the metabolic cost of maintaining such massive biosynthetic systems is high and the selective pressure for maintenance must be correspondingly strong (Fischbach et al., 2008). Thus our inability to detect the corresponding molecule must be related to our poor understanding of the underlying regulatory networks and to the lack of knowledge regarding the environmental signals required to elicit production.

How can we access this extensive reservoir of natural products? Heterologous expression of metagenomic DNA libraries in Escherichia coli has allowed detection of biological activities and provided a proof of principle that transcription and translation of biosynthetic pathways are possible (MacNeil et al., 2001; Rondon et al., 2000a, b). Nevertheless, this approach is greatly limited by the fact that most genes may not be expressed in domesticated hosts since cloned genes from environmental organisms have to be compatible with the host’s genetic machinery. In an attempt to overcome this limitation, heterologous expression has been successfully achieved in additional hosts such as Pseudomonas, Ralstonia, Streptomyces and related actinomycete species (Craig et al., 2009; Martinez et al., 2004a, b; Wang et al., 2000). The advantage of using bacterial hosts with diverse genetic backgrounds lies in their capacity to supply a variety of promoters and transcriptional, regulatory and post-translational machineries that extend the capability to express exogenous DNA. Furthermore, some of these strains are themselves natural product producers and therefore might already have the biosynthetic apparatus and necessary primary precursors to support the synthesis of heterologous small molecules (Peric-Concha and Long, 2003). Despite these efforts, the frequency of detecting any given activity from metagenomic libraries is low and high-throughput screening of thousands of clones is usually required in order to obtain a small number with the desired biological activity (Henne et al., 2000; Rondon et al., 2000a). While functional screens for antibiosis or enzyme action are commonplace, a broader search for novel chemical entities in metagenomic libraries, particularly in the absence of a biological screen, will require comprehensive assays that directly measure the total chemical complement, or the metabolome, of the expression host (Peric-Concha and Long, 2003). Carrying out a metabolomics-based screen using a metagenomic library should theoretically meet two fundamental conditions: it has to be scalable to process thousands of clones in a high-throughput manner and it has to be sufficiently sensitive to detect any change produced in the metabolite profile of the host cell as a consequence of harboring the environmental DNA. The implementation of such screens may reveal silent phenotypes (i.e. functions conferred by the expression of heterologous DNA that do not display evident biological activity, but that modify the overall behavior of the metabolome) of metagenomic clones that are able to overcome the barrier of heterologous gene expression.

To efficiently exploit the metabolic potential of microbial communities, we must abandon the outdated paradigm of isolating microorganisms or genes from their natural environment and shift towards an eco-systems biology approach where the ecological role of the molecules is the principal biological question. In accordance with this ecology-based approach, the combination of metagenomics, metatranscriptomics and meta-metabolomics is strongly needed to unveil the function of secondary metabolites in situ. Here we provide a view of how these three approaches can be combined in order to study the natural product repertoire of microbial communities present in a given ecosystem.

First, metagenomics through cloning-independent sequencing of the metagenome can determine the diversity (richness and abundance) of its members by using ribosomal DNA markers and can also provide sequence information of the entire collection of genes contained in a population. Discovery of novel biosynthetic gene clusters is the first goal of this line of work. Based on the catalytic rules of studied assembly line enzymes it is possible to combine bioinformatics and knowledge-based predictions to identify scaffolds corresponding to natural products. Furthermore, predictions regarding the structure and physicochemical properties, based on the organization of genes encoding enzyme modules, can assist with the selection and tracking of products in the environment that may be interesting in the search for novel bioactivities. For instance, novel bioinformatics packages are able to
screen genes encoding type I PKS in metagenomics shotgun data (Foerstner et al., 2008). The program package ClustScan can annotate gene clusters encoding modular biosynthetic enzymes, including PKS, NRPS, and hybrid (PKS/NRPS) enzymes, and is also able to predict some chemical structures and make inferences about domain specificities and function of the predicted small-molecule products (Starcevic et al., 2008). However, information based merely on gene clusters is limited and does not yet faithfully predict end product structures. This can be particularly true for clusters with multiple tailoring enzymes, hidden biosynthetic genes or genes for novel small molecules produced by assembly line enzymes that operate in an unconventional way (Sattely et al., 2008).

The prediction of the biosynthetic pathways and the hypothetical structure of secondary metabolites is the first step towards the identification and understanding of natural products in the ecosystem. Once a comprehensive list is made of the gene clusters found in the microbial community, a metatranscriptomic analysis of the ecosystem can then be carried out to analyze the expression dynamics of the genes making up the predicted clusters. This analysis can shed light on how spatial and temporal conditions influence differential expression of secondary pathways (Raes and Bork, 2008). Subsequent linking of identified gene clusters and expression profiles to microbial species within an ecosystem is an important but difficult task that has nevertheless been achieved by co-cloning of a phylogenetic marker (Beja et al., 2000). Nowadays, the use of single-cell isolation and sequencing technologies provides promising alternatives to this seemingly daunting endeavor (Walker and Parkhill, 2008).

Thus the identification of actively transcribed gene clusters encoding small molecules uses both metagenomic and metatranscriptomic approaches and is based on bioinformatic tools to predict metabolite scaffold structure and reveal information regarding physicochemical properties. Using these data, the metabolomics approach can be directed to identify a fraction of the molecules known to be expressed from gene clusters in a defined spatial and temporal environmental setting. Additional information regarding hypothetical chemical properties also narrows the search space in the overall metabolite profile of the community. This type of identification will require specialized extraction protocols for the meta-metabolome and extremely sensitive analytical tools in order to deconvolute the hundreds of similar low-concentration metabolites found in such a complex chemical background. Much hope rests on the application of ultrahigh-field Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS), which has been useful for profiling over 400 metabolites in a short period of time (Han et al., 2008). The combination of all of these eco-systems biology approaches will help us to mine and understand the metabolic potential concealed in microbial populations (Raes and Bork, 2008).

Microarrays

Microarrays are a powerful high-throughput technique for the simultaneous analysis of thousands of target molecules that has incredible potential for the detection of activities and monitoring the dynamics of microbial communities. Microarrays, which have been used extensively for analysis of gene expression, are being adapted for use in environmental samples (Gentry et al., 2006). They have the advantage of providing rapid information on a great number of genes and supplying quantification data without having to clone DNA. There have been spectacular advances in microarray design and commercial availability, improving the coverage, density and limit of detection of gene or transcript copies (Bouchie, 2002). In environmental setups, microarray technology has not been as extensively used as for genomic or transcriptomic comparisons of single organisms. This is due to the relatively high amounts of nucleic acids needed to detect a signal and to the complexity underlying the design of multiple probes to target and cover an uncharacterized diversity. Arrays designed for environmental applications therefore contain probes for detection of well-defined gene families of known environmental bacterial functions (Iwai et al., 2008; Taroncher-Oldenburg et al., 2003; Wu et al., 2006). Due to the difficulty in recovering large amounts of environmental DNA, these arrays in many cases require PCR amplification of specific genes prior to hybridization, a step that can introduce biases. Alternatives to avoid biases associated with PCR amplification include either extraction from larger amounts of sample or the amplification of genomic material using the phi29 polymerase (Binga et al., 2008). In the case of protein coding genes, the use of arrays can substantially increase our capacity to detect small variants within the context of a particular gene family since all known possible variations can be targeted simultaneously. However, the detection of environmental mRNA is particularly cumbersome due to the low amount of single gene transcripts, which even for highly expressed genes can still be 100 times less abundant than rRNAs. Various arrays have been developed for the study of microbial communities and these include: 1) phylogenetic arrays based on 16S rRNA, 2) community arrays with signature genes and 3) functional gene arrays with information for genes involved in metabolic pathways.

The most extensively used phylogenetic marker in microbial ecology is undoubtedly the 16S rRNA gene. This is an ideal marker for community profiling given the large amount of sequence data, coupled with the intrinsic characteristics of this molecule. Phylogenetic arrays have only recently begun to be used to study microbial communities in diverse settings, with some of the first reports appearing in recent years (Loy et al., 2002) and later extended to include analysis of either DNA or RNA obtained from the environment (Adamczyk et al., 2003; El Fantroussi et al., 2003; Gentry et al., 2006). A recently developed high-density 16S rRNA PhyloChip that targets 8741 bacterial and archaeal taxa has been used to compare coverage with respect to clone libraries and to inspect diversity in environmental communities (DeSantis et al., 2007; Yergeau et al., 2007; Yergeau et al., 2009). Functional arrays that contain genes involved in key biogeochemical processes, including a comprehensive array called GeoChip, have also been developed and used for detecting activities in microbial communities (He et al., 2007; Leigh et al., 2007; Rhee et al., 2004; Steward et al., 2004; Yergeau et al., 2007).

Despite the great potential of applying microarray technology for the specific, quantitative and rapid assessment of microbial communities, the analysis of environmental samples presents several challenges. As occurs with other strategies, microarrays detect the most abundant organisms or molecules present in a given ecosystem and can therefore have problems associated with low sensitivity. There are also difficulties related to the recovery of genetic material due to low biomass present in the sample or problems with extraction procedures. In addition, the results can be difficult to interpret due to the large amount of array data generated, information which can occasionally also be misleading due to signals generated by cross-hybridization with related sequences. Finally, and perhaps most importantly, microarrays rely on previously gathered information for probe design and will therefore miss any novel genes found in the community that are not represented in the array (Gentry et al., 2006; Wagner et al., 2007). Thus exploratory studies using microarrays may overlook functions residing in environmental populations that have not yet been described and which might very likely represent a large fraction of the community (Pignatelli et al., 2008).

Future perspectives

The field of microbial ecology has made substantial progress thanks to novel molecular and genomic approaches that allow estimations and explorations of the vast majority of uncultured microorganisms on our planet. Metagenomics is now facing new challenges precipitated by ongoing developments and novel tools for research of complex microbial communities. As evidenced by recent reports, the focus of these studies has started to shift from mere descriptions of ecosystems to the generation of more comprehensive and complex datasets aimed at deriving relevant ecological information. Technological innovations, the development of more economical, efficient and high-throughput strategies and modifications to existing methodologies will most probably continue to flourish in the near future. This will probably lead to increased access and application of these technologies, prompting research into a broader spectrum of environments. We will probably see “meta” strategies being used successfully for investigating diverse microbial consortia and addressing the role of uncultured microbes in their natural settings. Tackling some of the fundamental and interesting questions driving research in microbial ecology will however require the integration of diverse fields of study, such as geochemistry, biochemistry, and genetics, among others, and techniques that expand on the basic metagenomics strategy and move beyond towards a more integrative eco-systems biology approach. Thus multidisciplinary teams and complementation with additional “meta” approaches, such as metaproteomics, transcriptomics and metabolomics to capture the expressed potential of microbial populations, will surely lead to a more global and comprehensive picture of the evolution, complexity and functionality of environmental microbial communities (Maron et al., 2007b; Raes and Bork, 2008). The incorporation of additional technologies like cell sorting and microfluidics, together with advances in isolation techniques, will prove extremely useful for complementing these studies using isolates or more simplified communities. Thus multifaceted approaches will probably become more extensively used when engaging in comprehensive explorations of in situ communities. In addition to providing novel genomic and physiological information, these novel approaches will also prove to be fundamental for the search and discovery of novel bacterial functions for biotechnological or clinical applications. Altogether, the field promises stimulating new developments that will very likely reshape our vision of microbial interactions and communities in their natural settings.

Despite these exciting prospects, some of the inherent difficulties associated with “omic” approaches to study whole communities, such as efficient isolation of nucleic acids and proteins from environmental samples, still hamper progress and thus need to be overcome for the efficient integration of various disciplines. It is anticipated, however, that the involvement of more research groups will precipitate innovations and the capacity to overcome many of these difficulties, paving the way for more in-depth studies of microbial communities and diversity. One of the key concerns for the future of any “meta” and “omic” approach is how to handle and make sense of the vast amount of sequence data that will be generated from such explorations (Chen and Pachter, 2005). The use of massively parallel sequencing technologies, coupled with reduced costs, is expected to expand our capacity to generate data. Therefore, the development of novel and sophisticated bioinformatics tools will become essential for data management and analysis of metagenomic data involving assembly, identification and assignment of functions to expressed proteins and phylogenetic affiliation to sequence reads. Another aspect of importance in the field should involve reproducibility of results and functional experimental validation of sequence-derived information, an important point that has been largely neglected in the post-genomic era, given the experimental challenges involved.

The capacity to explore ecosystems at an unprecedented depth will undoubtedly lead to improvements in our current survey of microbial diversity. The deeper resolution obtained by the new sequencing technologies, coupled with explorations using “omic” approaches, will not only allow us to assess less abundant organisms and yield clues regarding the prevalence and distribution of particular groups of organisms, but will also lead to key information about niche adaptation. One especially interesting development in recent years has been the unprecedented capacity of metagenomics to reveal viral diversity. Viruses, which are abundant and harbor an immense genetic diversity, affect microbial community dynamics and are therefore an integral part of microbial ecology. It is expected that in the future the application of “meta” approaches will broaden our view of this viral diversity and include analyses regarding their ecological role (Allen and Wilson, 2008). Thus, as has occurred in the recent past, the development of new technologies will open the way for more in-depth and large-scale environmental explorations. The integration of strategies and methodologies will add new dimensions to the study of microbial communities, expand our appreciation of microbial diversity and allow us to answer more sophisticated questions regarding the role of microorganisms within a community. These composite explorations will therefore prove to be pivotal in our search for a more comprehensive understanding of microbial community dynamics and function.

References

Adamczyk, J., Hesselsoe, M., Iversen, N., Horn, M., Lehner, A., Nielsen, P.H., Schloter, M., Roslev, P., and Wagner, M. (2003). The isotope array, a new tool that employs substrate-mediated labeling of rRNA for determination of microbial community structure and function. Appl. Environ. Microbiol. 69, 6875-6887.
Allen, M.J., and Wilson, W.H. (2008). Aquatic virus diversity accessed through omic techniques: a route map to function. Curr. Opin. Microbiol. 11, 226-232.
Alm, E.W., Zheng, D., and Raskin, L. (2000). The presence of humic substances and DNA in RNA extracts affects hybridization results. Appl. Environ. Microbiol. 66, 4547-4554.
Amann, R.I., Ludwig, W., and Schleifer, K.H. (1995). Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59, 143-169.
Baltz, R.H. (2006). Marcel Faber Roundtable: is our antibiotic pipeline unproductive because of starvation, constipation or lack of inspiration? J. Ind. Microbiol. Biotechnol. 33, 507-513.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., Jovanovich, S.B., Gates, C.M., Feldman, R.A., Spudich, J.L., et al. (2000). Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289, 1902-1906.
Benndorf, D., Balcke, G.U., Harms, H., and von Bergen, M. (2007). Functional metaproteome analysis of protein extracts from contaminated soil and groundwater. ISME J. 1, 224-234.
Beranova-Giorgianni, S. (2003). Proteome analysis by two-dimensional gel electrophoresis and mass spectrometry: strengths and limitations. Trends Analyt. Chem. 22, 273-281.
Bertin, P.N., Medigue, C., and Normand, P. (2008). Advances in environmental genomics: towards an integrated view of micro-organisms and ecosystems. Microbiology 154, 347-359.
Binga, E.K., Lasken, R.S., and Neufeld, J.D. (2008). Something from (almost) nothing: the impact of multiple displacement amplification on microbial ecology. ISME J. 2, 233-241.
Bouchie, A. (2002). Shift anticipated in DNA microarray market. Nat. Biotechnol. 20, 8.
Cardenas, E., and Tiedje, J.M. (2008). New tools for discovering and characterizing microbial diversity. Curr. Opin. Biotechnol. 19, 544-549.
Cascante, M., Boros, L.G., Comin-Anduix, B., de Atauri, P., Centelles, J.J., and Lee, P.W. (2002). Metabolic control analysis in drug discovery and disease. Nat. Biotechnol. 20, 243-249.
Chen, K., and Pachter, L. (2005). Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput. Biol. 1, 106-112.
Clardy, J., and Walsh, C. (2004). Lessons from natural molecules. Nature 432, 829-837.
Craig, J.W., Chang, F.Y., and Brady, S.F. (2009). Natural products from environmental DNA hosted in Ralstonia metallidurans. ACS Chem. Biol. 4, 23-28.
DeSantis, T.Z., Brodie, E.L., Moberg, J.P., Zubieta, I.X., Piceno, Y.M., and Andersen, G.L. (2007). High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment. Microb. Ecol. 53, 371-383.
Domon, B., and Aebersold, R. (2006). Mass spectrometry and protein analysis. Science 312, 212-217.
Donadio, S., Monciardini, P., and Sosio, M. (2007). Polyketide synthases and nonribosomal peptide synthetases: the emerging view from bacterial genomics. Nat. Prod. Rep. 24, 1073-1109.
Dunlap, W.C., Jaspars, M., Hranueli, D., Battershill, C.N., Peric-Concha, N., Zucko, J., Wright, S.H., and Long, P.F. (2006). New methods for medicinal chemistry--universal gene cloning and expression systems for production of marine bioactive metabolites. Curr. Med. Chem. 13, 697-710.
El Fantroussi, S., Urakawa, H., Bernhard, A.E., Kelly, J.J., Noble, P.A., Smidt, H., Yershov, G.M., and Stahl, D.A. (2003). Direct profiling of environmental microbial populations by thermal dissociation analysis of native rRNAs hybridized to oligonucleotide microarrays. Appl. Environ. Microbiol. 69, 2377-2382.
Felske, A., Engelen, B., Nubel, U., and Backhaus, H. (1996a). Direct ribosome isolation from soil to extract bacterial rRNA for community analysis. Appl. Environ. Microbiol. 62, 4162-4167.
Felske, A., Engelen, B., Nubel, U., and Backhaus, H. (1996b). Direct ribosome isolation from soil to extract bacterial rRNA for community analysis. Appl. Environ. Microbiol. 62, 4162-4167.
Fischbach, M.A., Walsh, C.T., and Clardy, J. (2008). The evolution of gene collectives: How natural selection drives chemical innovation. Proc. Natl. Acad. Sci. U S A 105, 4601-4608.
Foerstner, K.U., Doerks, T., Creevey, C.J., Doerks, A., and Bork, P. (2008). A computational screen for type I polyketide synthases in metagenomics shotgun data. PLoS ONE 3, e3515.
Frias-Lopez, J., Shi, Y., Tyson, G.W., Coleman, M.L., Schuster, S.C., Chisholm, S.W., and Delong, E.F. (2008). Microbial community gene expression in ocean surface waters. Proc. Natl. Acad. Sci. U S A 105, 3805-3810.
Gentry, T.J., Wickham, G.S., Schadt, C.W., He, Z., and Zhou, J. (2006). Microarray applications in microbial ecology research. Microb. Ecol. 52, 159-175.
Gianoulis, T.A., Raes, J., Patel, P.V., Bjornson, R., Korbel, J.O., Letunic, I., Yamada, T., Paccanaro, A., Jensen, L.J., Snyder, M., et al. (2009). Quantifying environmental adaptation of metabolic pathways in metagenomics. Proc. Natl. Acad. Sci. U S A 106, 1374-1379.
Gilbert, J.A., Field, D., Huang, Y., Edwards, R., Li, W., Gilna, P., and Joint, I. (2008). Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE 3, e3042.
Gilbert, J.A., Thomas, S., Cooley, N.A., Kulakova, A., Field, D., Booth, T., McGrath, J.W., Quinn, J.P., and Joint, I. (2009). Potential for phosphonoacetate utilization by marine bacteria in temperate coastal waters. Environ. Microbiol. 11, 111-125.
Ginolhac, A., Jarrin, C., Gillet, B., Robe, P., Pujic, P., Tuphile, K., Bertrand, H., Vogel, T.M., Perriere, G., Simonet, P., et al. (2004). Phylogenetic analysis of polyketide synthase I domains from soil metagenomic libraries allows selection of promising clones. Appl. Environ. Microbiol. 70, 5522-5527.
Giovannoni, S.J., Bibbs, L., Cho, J.C., Stapels, M.D., Desiderio, R., Vergin, K.L., Rappe, M.S., Laney, S., Wilhelm, L.J., Tripp, H.J., et al. (2005). Proteorhodopsin in the ubiquitous marine bacterium SAR11. Nature 438, 82-85.
Gonzalez, J.M., Portillo, M.C., and Saiz-Jimenez, C. (2005). Multiple displacement amplification as a pre-polymerase chain reaction (pre-PCR) to process difficult to amplify samples and low copy number sequences from natural environments. Environ. Microbiol. 7, 1024-1028.
Goodacre, R., Vaidyanathan, S., Dunn, W.B., Harrigan, G.G., and Kell, D.B. (2004). Metabolomics by numbers: acquiring and understanding global metabolite data. Trends Biotechnol. 22, 245-252.
Graham, R.L.J., Graham, C., and McMullan, G. (2007). Microbial proteomics: a mass spectrometry primer for biologists. Microb. Cell Fact. 6, 26.
Graves, P.R., and Haystead, T.A. (2002). Molecular biologist's guide to proteomics. Microbiol. Mol. Biol. Rev. 66, 39-63.
Griffiths, R.I., Whiteley, A.S., O'Donnell, A.G., and Bailey, M.J. (2000). Rapid method for coextraction of DNA and RNA from natural environments for analysis of ribosomal DNA- and rRNA-based microbial community composition. Appl. Environ. Microbiol. 66, 5488-5491.
Hagenstein, M.C., and Sewald, N. (2006). Chemical tools for activity-based proteomics. J. Biotechnol. 124, 56-73.
Han, J., Danell, R.M., Patel, J.R., Gumerov, D.R., Scarlett, C.O., Speir, J.P., Parker, C.E., Rusyn, I., Zeisel, S., and Borchers, C.H. (2008). Towards high-throughput metabolomics using ultrahigh-field Fourier transform ion cyclotron resonance mass spectrometry. Metabolomics 4, 128-140.
He, Z., Gentry, T.J., Schadt, C.W., Wu, L., Liebich, J., Chong, S.C., Huang, Z., Wu, W., Gu, B., Jardine, P., et al. (2007). GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J. 1, 67-77.
Hecker, M. (2003). A proteomic view of cell physiology of Bacillus subtilis--bringing the genome sequence to life. Adv. Biochem. Eng. Biotechnol. 83, 57-92.
Henne, A., Schmitz, R.A., Bomeke, M., Gottschalk, G., and Daniel, R. (2000). Screening of environmental DNA libraries for the presence of genes conferring lipolytic activity on Escherichia coli. Appl. Environ. Microbiol. 66, 3113-3116.
Hollywood, K., Brison, D.R., and Goodacre, R. (2006). Metabolomics: current technologies and future trends. Proteomics 6, 4716-4723.
