Rick Stevens: Prospects for a Systematic Exploration of Earth's Microbial Diversity

Rick Stevens' opening keynote for the 1st Earth Microbiome Project meeting in Shenzhen


  1. 1. Rick Stevens<br />Argonne National Laboratory<br />The University of Chicago<br />
  2. 2.
  3. 3.
  4. 4. Institute for Computing in Science (ICiS) 2010 Summer Session, Snowbird, Utah<br />July 17-24 | Computational Methods and Terabase Metagenomics | J. Gilbert, F. Meyer, R. Stevens<br />Participants: 13 University, 9 Government and 3 Industry; 13 sessions<br />These discussions became the first meeting of the Earth Microbiome Project and enabled the definition of a working committee, an implementation group, and a three-year plan.<br />July 24-31 | Future of the Field | F. Streitz and A. White<br />Participants: 18 University, 4 Government and 6 Industry; 15 sessions<br />Steering committee members and a select group of participants met to assess the state of the art in scientific computing and identified areas for future programs.<br />July 31-Aug. 7 | Optimization in Energy Systems | M. Anitescu and J. Meza<br />Participants: 16 University, 10 Government and 3 Industry; 24 sessions<br />Researchers from different areas discussed the major challenges facing the energy sector, and in particular, problems arising in optimization.<br />Aug. 7-14 | Integrating, Representing, and Reasoning over Human Knowledge | J. Evans, I. Foster, A. Rzhetsky<br />Participants: 18 University, 4 Government and 6 Industry; 16 sessions<br />Participants were encouraged to think broadly about opportunities for transformative changes in knowledge that may become possible as data, computing, and collaboration are harnessed at exceptionally large scales.<br />
  5. 5. A Core Group Emerged<br />Jack Gilbert<br />Folker Meyer<br />Rob Knight<br />Jonathan Eisen<br />Jed Fuhrman<br />Janet Jansson<br />Bin Hu<br />Mark Bailey<br />Rick Stevens<br />
  6. 6. We need a new Idea<br />Sequencing is getting cheap.. VERY cheap<br />Terabase project becoming increasingly feasible<br />Diversity studies are limited by sampling depth<br />Need combination of breadth and depth<br />Computing is scaling up to handle large data<br />Supercomputing capabilities will keep scaling for a while<br />Interest in range of metagenomics questions<br />Thousands of uncoordinated studies<br />Crowdsourcing of samples increasingly feasible<br />But how to agree on protocols<br />
  7. 7. EMP High-Level Concept<br />Goal: A community effort to systematically characterize microbial life on Earth<br />Strategy: a combination of extremely deep metagenomic sequencing and very-large-scale horizontal surveys to refine our understanding of:<br />Global microbial diversity, dispersion and biogeography<br />Microbial community structure and dynamics<br />Microbial contributions to the global nutrient cycles<br />
  8. 8. Big Science?<br />Earth Microbiome Project<br />Map % fraction of microbiological habitats<br />Volume > 100x larger<br />> 1 PB of data<br />~1M samples<br />> 100K new genomes<br />Millions of novel proteins<br />Largest reference collection of metagenomics, field guide to the microbial universe used by scientists for decades to come<br />Sloan Digital Sky Survey<br />Mapped ¼ sky<br />Volume 100x larger<br />15 TB data<br />Position/Brightness of > 100M objects<br />Distance to 100K quasars<br />New types of objects<br />The SDSS will be a new reference point, a field guide to the universe that will be used by scientists for decades to come.<br />
  9. 9. But it's not a Complete Parallel<br />EMP will have distributed sampling<br />EMP will have distributed sequencing<br />EMP will have distributed analysis<br />EMP will have common protocols<br />EMP will have common standards<br />EMP might have centralized archive of data<br />EMP might have repository of samples<br />
  10. 10. What is the EMP model?<br />A framework of standard practices that enables massively comparable meta-analyses of independent projects<br />A network-oriented organizational model to advance large-scale microbial ecology research: establishing and coordinating projects proposed by the community that can be advanced using the EMP framework of standards and access to partner Centers<br />
  11. 11.
  12. 12. Infrastructure for Coordination<br />
  13. 13. Common standards for:<br />Sampling -> Methods tailored to environment<br />Georeferenced metadata<br />DNA Extraction -> MoBio kit<br />Sequencing -> 515/806 for 16S, Illumina PE<br />Analysis -> QIIME (16S), MG-RAST/IMG, etc.<br />Concept: begin with defined, open (though imperfect) protocols, bless with “EMP seal of approval” new protocols that show equivalence<br />
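One way to picture the common standards on the slide above is as a per-sample record that carries the georeferenced metadata and the protocol choices together. The following is a minimal sketch in Python, not an EMP schema: the protocol values (MoBio kit, 515/806 primers, Illumina PE) come from the slide, while the class name `SampleRecord`, its field names, and the `problems` check are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Protocol values are taken from the slide; the keys and the checking
# logic are illustrative assumptions, not an official EMP specification.
EMP_DEFAULTS = {
    "extraction_kit": "MoBio",
    "primers_16s": "515/806",
    "sequencing": "Illumina PE",
}


@dataclass
class SampleRecord:
    """Hypothetical record for one georeferenced sample."""
    sample_id: str
    environment: str
    latitude: float   # decimal degrees, -90..90
    longitude: float  # decimal degrees, -180..180
    extraction_kit: str = EMP_DEFAULTS["extraction_kit"]
    primers_16s: str = EMP_DEFAULTS["primers_16s"]
    sequencing: str = EMP_DEFAULTS["sequencing"]

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        if not -90.0 <= self.latitude <= 90.0:
            issues.append("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            issues.append("longitude out of range")
        # A non-default protocol is flagged for review rather than rejected,
        # since the slide allows new protocols that show equivalence.
        for name, expected in EMP_DEFAULTS.items():
            actual = getattr(self, name)
            if actual != expected:
                issues.append(f"{name} is {actual!r}, expected {expected!r}: needs equivalence review")
        return issues


record = SampleRecord("soil-001", "soil", 41.7, -87.98)
print(record.problems())
```

Keeping the protocol fields on the record itself, rather than in a separate document, is a design choice that would let a later meta-analysis filter or group samples by protocol without consulting each project.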
  14. 14. Why do we need the EMP?<br />Microbial life is vast<br />10^30 organisms on Earth<br />10^6 – 10^9 or more species, massive gene/protein diversity<br />Requires a systematic approach with a common framework<br />Reduce duplication, maximize coverage, improve comparability between studies<br />Structures existing studies led by different PIs into clusters of Driving Projects<br />EMP standard protocols allow much better comparability between projects<br />Leverage community structures and crowd sourcing<br />
  15. 15. EMP Pilot Projects<br />High-Impact science targets<br />Large-scale survey projects to identify diversity hotspots and plan deeper studies<br />Small number of very deep demonstrations<br />Hypothesis driven programmatic problems<br />Technical targets to debug the EMP approach<br />Community sourcing with standard protocols<br />High-levels of multiplexed sequencing<br />Environmental parameter characterization <br />Metadata and sample database<br />Analysis pipelines<br />
  16. 16. Earth Microbiome Project: Attacking Basic Science Questions<br />Coordination of community efforts to address long-standing issues in environmental microbiology<br />How much diversity is there, what is driving it and where do we find it?<br />Are there diversity hotspots?<br />Does microbial biogeography exist, if so what patterns are present and can we predict the patterns?<br />Are some taxa endemic and if so how unique are they?<br />Does global dispersal happen, how much and between where, and is there support for the Baas Becking hypothesis?<br />Are the long tails of community distributions convergent in taxa?<br />Are rare taxa somewhere abundant?<br />How many places do we have to look to capture X diversity?<br />How do the patterns in microbial communities relate to macro ecological patterns?<br />
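The question "How many places do we have to look to capture X diversity?" can be made concrete with a small sketch. The code below is an illustration under strong assumptions, not a method from the talk: it treats each site as a set of already-observed taxa and greedily picks sites until a chosen fraction of all observed taxa is covered. The function name `sites_needed` and the toy site names are hypothetical, and the approach says nothing about taxa that no site has sampled yet, which is the harder part of the real question.

```python
from typing import Dict, List, Set


def sites_needed(site_taxa: Dict[str, Set[str]], fraction: float) -> List[str]:
    """Greedily choose sites until `fraction` of all observed taxa is covered."""
    all_taxa: Set[str] = set().union(*site_taxa.values())
    target = fraction * len(all_taxa)
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(site_taxa)
    while len(covered) < target and remaining:
        # Pick the site adding the most unseen taxa; ties go to the
        # alphabetically first site name so the result is deterministic.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no remaining site adds anything new
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen


# Toy data: taxon labels are placeholders, not real observations.
sites = {
    "soil_A": {"t1", "t2", "t3"},
    "soil_B": {"t3", "t4"},
    "marine_C": {"t5"},
    "lake_D": {"t1", "t5", "t6"},
}
print(sites_needed(sites, 0.8))
```

A greedy choice gives a small set of sites but not necessarily the smallest one; for planning purposes a randomized accumulation curve over many site orderings would be the more conventional summary.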
  17. 17. Curtis and Sloan on Microbial Diversity<br />Perhaps patterns in global microbial diversity affect community composition, stability and functionality at a local level. <br />If, as we argue, diversity matters, then patterns in global diversity could have a substantial effect on studies that seek to link community function and structure, strategies for seeking new drugs, for probiotics, bioaugmentation or studies to determine the persistence of chemicals.<br />Curtis and Sloan, Current Opinion in Microbiology 2004, 7:221-226<br />
  18. 18. Curtis and Sloan Continued,<br />To understand a microbial system at a local level we will have to understand something of the metacommunity from which it is drawn.<br />Moreover, we will have to correctly understand the relationship between random factors and deterministic factors.<br />Curtis and Sloan, Current Opinion in Microbiology 2004, 7:221-226<br />
  19. 19. What can we learn from extremely Deep Sequencing?<br />Latitude, pH, Mineral Content, Rainfall, Mean Temperature, Insolation, etc.<br />
  20. 20. Estimates of Global Diversity<br />NT/Nmax ~ 10 for soil<br />NT/Nmax ~ 4 for aquatic<br />Curtis, T.P. et al. (2002) Estimating prokaryotic diversity and its limits. Proc. Natl. Acad. Sci. U. S. A. 99, 10494–10499<br />
  21. 21. Pedros-Alio 2006 <br />Are Most microbial taxa rare? Possibly Inactive?<br />
  22. 22. Does a microbial biogeography exist?<br />If yes can we map it?<br />From Martiny et al 2006 “Microbial biogeography Review”<br />
  23. 23. How Cosmopolitan are Microbes?<br />From Martiny et al 2006 “Microbial biogeography Review”<br />
  24. 24. Earth Microbiome Project: Attacking Programmatic Questions<br />Improve understanding of microbial processes underlying the global carbon and nitrogen cycles<br />Support process model development and uncertainty analysis for DOE mission-critical environments (e.g. permafrost, oceans, subsurface)<br />Discovery of novel microbially mediated global carbon pathways<br />Improve our understanding of community structure/diversity/productivity/stability relationships<br />Support community engineering and community design for targeting applications<br />Search for novel biological functions relevant to bioprocessing, biofuels and bioremediation<br />Targeted searching for organisms and communities containing DOE-relevant synthesis and degradation pathways<br />Novel pathway discovery<br />
  25. 25. The abundance of prokaryotic carbon and other elements may be compared with the statement of Kluyver that about one-half of the ‘‘living protoplasm’’ on earth is microbial (2). <br />Because most of the plant biomass is made up of extracellular material such as cell walls and structural polymers, the protoplasmic biomass of prokaryotes probably far exceeds that of plants, and Kluyver’s well-accepted estimate is probably much too conservative.<br />
  26. 26. Integrating Microbial Processes into Global Climate Models <br />
  27. 27. Relative Metabolic Flux – Community Level Prediction<br /><ul><li> Predicting the metabolome from metagenomics data!
  28. 28. RMF returns a list of metabolites and whether those metabolites are more or less likely to be consumed or synthesized in one environment relative to another.
  29. 29. When linked to Model-SEED – provides information relevant for ecologists</li></li></ul><li>Integrating Microbial Metabolism into Soil Ecology Models<br />Combining physicochemical descriptions of soil content and structure with microbial models in agent-based simulations<br />Metagenomic data collection<br />Air and water<br />CO2 and organic matter<br />Collecting<br />samples<br />Integrating these models into a flux balance community model:<br />Sequencing<br />Soil nutrients<br />Sequence fragments<br />Forming flux balance models of individual microbe metabolism<br />Biomass<br />Associating fragments to taxonomical groups<br />Assembly of most prevalent microbes into complete genomes<br />
  30. 30. Unmapped World of Microbial Uses of Metals<br />
  31. 31. EMP at the Right Time<br />Leverages the availability of continued advances in sequencing capacity<br />Terabases to Petabases and beyond<br />Evolution of sequencing center Models<br />Push towards aggregation of projects (i.e. scale up)<br />Community driven but coordinated<br />Open, Real-time coordination, immediate data availability<br />Novel approaches to address the scaling issues in sample collection and prep<br />Crowd sourced samples, distributed prep?<br />Targets both wide survey and deep sampling<br />“Mapping”  followed by targeted attacks<br />
  32. 32. EMP Products and Deliverables<br />Metagenomics datasets from many thousands of environments with standardized metadata<br />Georeferenced inventory of global microbial 16S sequences<br />Reference genomes recovered from the shotgun metagenomics datasets<br />Community structure profiles for many thousands of communities<br />Microbial protein catalog capturing global protein and gene diversity<br />
  33. 33.
  34. 34. <ul><li> Explore fundamental principles governing the distribution of global diversity
  35. 35. Projects explore environmental gradients:
  36. 36. Temperature – Antarctica, Brazil, North America, Arctic Tundra, Hydrothermal vents
  37. 37. Light availability – Water columns in the Pacific and Atlantic from surface to the abyssal plain.
  38. 38. pH – UK, North America, China biogeographic soils
  39. 39. Nutrients and O2 - Temperate Bog Lakes</li></li></ul><li><ul><li> Determining whether everything has the potential to be everywhere.
  40. 40. Projects request deep 16S rRNA sequencing of representative samples:
  41. 41. Globally distributed soil samples from China, Australia, India, Argentina, Peru, USA and Antarctica
  42. 42. Globally distributed time series samples from English Channel, Barrier Reef in Australia, Bermudan North Atlantic, Temperate Pacific and Tropical Pacific
  43. 43. Zoo-animal microbiota from China, Chicago and San Diego</li></li></ul><li><ul><li> Identify and model the role of microbial communities in carbon partitioning in different ecosystems.
  44. 44. Projects using deep shotgun metagenomics to explore modeled metabolomics:
  45. 45. Temporally and spatially distributed samples from the Gulf oil spill
  46. 46. Samples spanning the northern tundra belt from Canada, USA, Russia, Sweden
  47. 47. Water column and time-series samples from coastal and open ocean marine observatories</li></li></ul><li>
  48. 48. EMP Open Standards<br />DP1<br />DP2<br />DP3<br />DP4<br /><ul><li>Multiple layers
  49. 49. At the bottom individual or consortium led hypothesis driven proposals
  50. 50. Individual projects cluster into proposed Driver Projects (DPs)
  51. 51. EMP standard protocols enable comparability across projects</li></li></ul><li>What Does EMP Need?<br />EMP <br />Community<br />Sampling<br />Downstream<br />Applications<br />Sample<br />Preparation<br />Sequencing<br />Quality<br />Assurance<br />Three rate limiters<br />Sample collection and handling<br />Prep-Sequencing-QA<br />Analysis<br />
  52. 52. Earth Microbiome Project Potential Dataflows<br />Annotation &<br />Statistical Analysis<br />16S/18S rRNA<br />Metagenomics<br />Metatranscriptomics<br />Genome Assembly<br />modelSEED & <br />RMF<br />Environmental Parameters<br />Metagenome Datasets<br />(1,000’s of Campaigns)<br />Provision of targets for novel enzymes<br />Model Metabolome<br />Characterization of Novel Proteins<br />Metametabolomics<br />GC/MS & NMR<br />Gap-filling for model<br />
  53. 53. EMP needs new kinds of interfaces to Sequencing workflows<br />Large-scale community projects will by necessity develop internal tracking systems<br />Sampling, LIMS etc.<br />Transacting with Seq Centers could be enhanced by interfacing between the internal/external tracking and LIMS systems<br />Large-scale EMP pilots could help develop this<br />Services partners will also need this type of interface<br />
  54. 54. What would change this strategy?<br />Availability of “direct” interrogation of complex microbial environments<br />Geochemical environmental mapping (nm->um)<br />Environmental metabolomics and proteomics<br />Roving cellular scale reporters and probes<br />Dramatic improvements in microbial microcosm experimental capabilities<br />Artificial community construction<br />Time dependent high-resolution measurements<br />
  55. 55. Phases of EMP<br />Timeline<br />2011<br />Expert-Group consensus on EMP standards: sampling, extraction, sequencing, informatics<br />Building the Global Environmental Sample Database (GESD)<br />Pilot Project:<br />10,000 samples acquired, extracted, sequenced and analyzed by five core centers (ANL, LBNL, UC-Boulder, JGI, and BGI).<br />2012 and beyond: Ongoing EMP<br />Biological Driver Projects “collect” individual science-driven sequencing proposals (e.g. JGI-CSP, BGI, ANL, etc.)<br />EMP acts as a conceptual framework to allow comparative analysis within and between Driver Projects.<br />
  56. 56. Thanks to the EMP Leadership<br />Jack Gilbert<br />Folker Meyer<br />Rob Knight<br />Jonathan Eisen<br />Jed Fuhrman<br />Janet Jansson<br />Bin Hu<br />Mark Bailey<br />
  57. 57. Argonne National Laboratory Institute for Genomic and Systems Biology<br />
