2011Field talk at iEVOBIO 2011


A keynote talk at the iEVOBIO 2011 meeting (http://ievobio.org/). It has been a great meeting.

Published in: Technology, Education

  1. 1. The role of grass-roots data sharing communities, standards and megasequencing projects in the genomics revolution. Dawn Field, NERC Centre for Ecology and Hydrology
  2. 2. Opportunities and Challenges
       • The era of genomics is just beginning...
       • ...how will we cope with the data?
       • ...how will we gain the most knowledge from this investment in data?
  3. 3. PARADIGM SHIFT (Nikos Kyrpides): 1960-1990 16S rRNA; 1990-2010 Genomes; 2010-2020 Pangenomes
  4. 4. GREAT CHALLENGES (P. Chain et al. Science, 2009; Nikos Kyrpides)
                   1995-2009   2010-2015
       Finished    1000        3000
       Draft       1000        10000
  5. 6. Nikos Kyrpides Culturable Unculturable
  6. 7. The trend is now increasingly geared towards ever more ambitious megasequencing projects...
  7. 9. And democratization of access to sequencing power... Just one example....
  8. 10. Metagenomics: Putting data generating capacity into perspective with an example from Bergen
       • (1) 1 metagenome, Sargasso Sea, Sanger sequencing (Venter et al, 2005)
       • (~80) 41 metagenomes, “Global Ocean Survey”, Sanger sequencing (Rusch et al, 2007)
       • (~120) 4 metagenomes & 4 metatranscriptomes, Bergen mesocosm experiment, pyrosequencing (Gilbert et al, 2008)
       Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I. (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE. Aug 22;3(8):e3042.
  9. 11. The Bergen ocean acidification study produced 19% of the reads produced in the GOS study and 5% of the total basepairs of sequence. Further evidence for the “Unknown Genome” and the Dark Matter of the Tree of Life
  10. 12. The Data - Flood - Tsunami - Deluge ?
  11. 13. the data bonanza
  12. 14. To exploit fully the promise of these data we need both scientific innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all.   Requires the evolution of our scientific, technological and sociological thinking....
  13. 15. SuperMarket The Genome Catalogue
  14. 16. DataMarket Norman Morrison
  15. 17. Packaging data
  16. 18. Labels for data <phenotype> <environmental context>
  17. 19. Standards. Principles:
       • Not everything should be ‘standardized’
       • Aggregation of data, information, and knowledge requires standard ways of doing things
       • Standards provide foundations; standards should drive innovation (think of electrical plugs or the internet)
       • Pick the right concepts to standardize – at the right time, with the right people
       • Requires good ‘group think’ – or ‘systems thinking’
  18. 20. Community-driven solutions
       • The Common Path:
           • Identify the problem
           • Define a community to address it
           • Define scope of the solution
           • Implement solution
           • Gain adoption of solution
  19. 21. The Genomic Standards Consortium: Innovation through Collaboration
       • GSC 10, Argonne, 2010
       • GSC 11, Hinxton, 2010
       • GSC 12, Bremen, 2011
       • GSC 13, BGI, 2012
  20. 22. The GSC’s Mission
       • the implementation of new genomic standards
       • methods of capturing and exchanging metadata
       • harmonization of metadata collection and analysis efforts across the wider genomics community
  21. 23. The GSC fulfills its mission by
       • Organizing meetings
       • Forming working groups
       • Creating Consensus Products
  22. 24. Pelin Yilmaz et al 2011
  23. 26. Use of MIGS/MIMS/MIENS
       • Please provide this minimum information when you publish
           • a genome
           • a metagenome
           • a gene marker study (e.g. ribosomal genes)
       • GenBank, EMBL and DDBJ now accept this information and encourage its submission to their public DNA databases
  24. 27. Labels for data <MIGS> <MIMS>
  25. 29. The Microbial Earth Project (GEBA, HMP)
       • Goal: International effort to sequence a reference genome for every cultured Archaeal and Bacterial organism (~9,000 microbes)
       • Phase I: Sequence one representative from every characterized microbial type species
  26. 30. Source: Jack A. Gilbert Argonne National Labs http://earthmicrobiome.org
  27. 31. Field et al, unpublished work on a Metadata Coverage Index (MCI). MCI > 50
  28. 32. GSC 5 at the EBI 2008
  29. 35. Total genome publications (1995 - 2011). Top ten journals publishing genome reports: J Bacteriology, PNAS, Nature, Science, SIGS, PLoS ONE, Genome Research, PLoS Genetics, Nat Biotech, BMC Genomics. Total: 1160 genome publications in 60 peer-reviewed publications. Source: GenomesOnline Database, May 28, 2011
  30. 36. Incentives for compliance
  31. 37. MIGS compliant marine phage genomes
  32. 38. GSC 9 at the JCVI – April 2010
  33. 40. Darwin Core vs GSC MIxS standard (Peter Dawyndt)
  34. 41. Darwin Core vs GSC MIxS standard (Peter Dawyndt). Categories shown: Taxon, Identification, Occurrence, IPR related info, Event, Location, GeologicalContext, SamplingProtocol, EnvironmentalConditions
  35. 42. Preliminary (first) conclusions
       • DC & GSC checklist more complementary than overlapping
           • how can we make these standards completely orthogonal?
  36. 44. More Information about the GSC... http://gensc.org
  37. 45. Feast of the Mind
  38. 46. Labels for data <soil> <water>
  39. 47. http://environmentontology.org Member of OBO Foundry http://obofoundry.org
  40. 48. http://ontogrator.org – building on ontologies (Morrison et al, 2011 SIGS)
       Users:
       1) Pick terms
       2) View hits
       3) Browse
       4) Follow links to primary data
  41. 49. Ontogrator approach depends on quality of
       • Data Resources
       • Knowledge Organization Systems (KOS) used
       Can we use this approach to improve both?
       Can we complete the virtuous cycle?
  42. 51. Field et al. 2009. Science 326:234-236. http://biosharing.org
  43. 53. Conclusions
       • The era of genomics is just beginning…
       • Self-organization by the scientific community can pay dividends (e.g. consensus building, large-scale co-ordination)
           • Standards are keys to unlocking data
           • Group thinking overcomes the tragedy of the commons
       • Emerging key players from the molecular domain – “one stop shops”
           • Genomic Standards Consortium
           • BioSharing – driving cross-community collaborations
  44. 54. Feast of the Mind
  45. 55. Future
       • Analysis – proof that sharing is beneficial
       • Making the field of data sharing more quantitative
           • Objective measures of consensus
           • Useful metrics, e.g. Metadata Coverage Index (MCI)
           • Modelling, e.g. how best to incentivize data sharing?
       • Further shared concepts
           • Minimum Information about a Sampling Site (MISS)
           • Minimum Data Policy
           • PubData?
  46. 56. Acknowledgements
       • Bergen and L4 metagenomics
       • Jack Gilbert, Sue Huse
       • Ian Joint, Paul Swift
       • Paul Somerfield, Rob Knight
       • NEBC
       • Bela Tiwari
       • Tim Booth
       • Mesude Bicak
       • CEH
       • Norman Morrison
       • Dave Hancock
       • University of Manchester
       • Henning Hermjakob
       • Chris Taylor
       • European Bioinformatics Institute
       • Susanna Sansone
       • Philippe Rocca-Serra
       • Eamonn Maguire
       • Oxford University
       • Genomic Standards Consortium
       • Peter Sterk
  47. 57. Acknowledgements: GSC Funding
       • Coordination, workshops, working groups, infrastructure and exchange visits
       • Additional workshop funds
       • Local Hosts of GSC workshops
       • Sponsors of GSC 9 and GSC 10
       • RCN4GSC
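
Slides 26 and 31 mention minimum-information checklists and a Metadata Coverage Index (MCI), but the transcript does not define the metric. The following is a minimal sketch in Python, assuming MCI means the percentage of checklist fields that carry a value in a record. That definition, the field names, the helper functions, and the example record are illustrative assumptions. They are not taken from the talk or from the official MIGS/MIMS/MIENS checklists.

```python
from typing import List, Mapping, Optional, Sequence

# Hypothetical checklist: illustrative field names only,
# not the official MIGS/MIMS/MIENS field list.
CHECKLIST_FIELDS = (
    "investigation_type",
    "project_name",
    "lat_lon",
    "collection_date",
    "environment",
    "sequencing_method",
)


def is_filled(value: Optional[str]) -> bool:
    """A field counts as filled if it holds a non-blank string."""
    return value is not None and value.strip() != ""


def metadata_coverage_index(
    record: Mapping[str, Optional[str]],
    fields: Sequence[str] = CHECKLIST_FIELDS,
) -> float:
    """Percentage (0-100) of checklist fields that are filled in the record.

    Assumed definition; the slide only names the metric.
    """
    if not fields:
        raise ValueError("checklist must contain at least one field")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def missing_fields(
    record: Mapping[str, Optional[str]],
    fields: Sequence[str] = CHECKLIST_FIELDS,
) -> List[str]:
    """Checklist fields that are absent or blank, in checklist order."""
    return [name for name in fields if not is_filled(record.get(name))]


# Made-up record for illustration; two fields are deliberately left empty.
example_record = {
    "investigation_type": "metagenome",
    "project_name": "Bergen mesocosm experiment",
    "lat_lon": "",
    "collection_date": None,
    "environment": "water",
    "sequencing_method": "pyrosequencing",
}

if __name__ == "__main__":
    mci = metadata_coverage_index(example_record)
    print(f"MCI = {mci:.1f}")
    print("passes MCI > 50:", mci > 50)
    print("missing:", missing_fields(example_record))
```

A score like this only measures whether fields are populated, not whether the values are correct, so it would complement curation rather than replace it.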