Three challenges for metabolomics study databases
Kees van Bochove
June 2011, Metabolomics Society Meeting
Metabolomics Society meeting 2011 - presentation by Kees

  1. Three challenges for metabolomics study databases
     Kees van Bochove
     June 2011, Metabolomics Society Meeting
  2. Metabolomics database
     • If you search for ‘metabolomics database’ you get 400K+ results, most of them recent
     • By far most of these databases are compound-centric; few have real study data
     • Of the metabolomics study databases, most are GC-MS, many NMR, almost none LC-MS
  3. Outline
     • Storage of study metadata
         • How to represent the biological context of samples in the database
     • Representation of data preprocessing, identification and quantification
         • How to represent the assumed identities for targeted and untargeted analyses
         • How to represent quantification and internal standard samples
     • Connection to other ‘omics’ data
         • How to make sure that metabolomics results can be tied into genetics, transcriptomics etc.
     Netherlands Metabolomics Centre
     • Major metabolomics labs:
         • DCL (mammalian focus), at Leiden University
         • PRI (plant focus), at Wageningen University
         • TNO
         • DSM
         • Unilever
     • Data Support Platform: infrastructure project to create a metabolomics data sharing platform covering all types of samples, running 2008-2012
  4. Data Support Platform: website
  5. DSP: Open Source strategy
     • We are not the only consortium storing data!
     • Reach sustainability by working together with active open source projects like dbNP and Galaxy
     • Anyone can start their own database using the same open source technology; in fact, we use this strategy internally
  6. Challenge 1: Study metadata
     • Without a proper and comprehensive description of the biological context of the sample, a metabolomics results database is useless
     • Especially for mammalian studies, study designs are often complex, involving multiple factors, timepoints, samples etc.
     • NMC strategy: partner with database initiatives from neighbor projects such as NuGO (nutrigenomics), NBIC (bioinformatics) and NTC (toxicogenomics) in the dbNP initiative, http://dbnp.org
  7. Data levels in DSP
     • Linea study, code 06-E6P, inclusion criteria, ..
     • Female human, 46 years old, BMI 26.4
     • 5 ml blood was taken 4 weeks after start of study
     • Blood sample
     • Metabolomics LC-MS lipidomics assay
     • { LPC17:0: RT 1,416 Area 5469406 , … }
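The data levels on this slide nest inside each other, from the study down to the measured values. A minimal sketch of that nesting follows. The class names (`Study`, `Subject`, `SamplingEvent`, `Sample`, `Assay`) and the helper `peak_areas` are assumptions chosen for illustration and are not taken from the DSP code base; the example values are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    name: str
    # feature label -> raw values as imported (e.g. retention time, peak area)
    data: dict = field(default_factory=dict)


@dataclass
class Sample:
    material: str
    assays: list = field(default_factory=list)


@dataclass
class SamplingEvent:
    description: str
    samples: list = field(default_factory=list)


@dataclass
class Subject:
    description: str
    events: list = field(default_factory=list)


@dataclass
class Study:
    title: str
    code: str
    subjects: list = field(default_factory=list)


def peak_areas(study):
    """Walk study -> subject -> event -> sample -> assay and yield peak areas."""
    for subject in study.subjects:
        for event in subject.events:
            for sample in event.samples:
                for assay in sample.assays:
                    for feature, values in assay.data.items():
                        yield (study.code, sample.material, assay.name,
                               feature, values["area"])


# Values copied from the slide. The retention time is kept as the raw string
# because the slide does not state its unit or number format.
assay = Assay("Metabolomics LC-MS lipidomics assay",
              {"LPC17:0": {"rt": "1,416", "area": 5469406}})
sample = Sample("Blood sample", [assay])
event = SamplingEvent("5 ml blood taken 4 weeks after start of study", [sample])
subject = Subject("Female human, 46 years old, BMI 26.4", [event])
study = Study("Linea study", "06-E6P", [subject])

rows = list(peak_areas(study))
```

Because every measured value is reachable only through its sample, event and subject, the biological context travels with the data by construction.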
  8. Study wizard allows for complex designs
  9. Example of a study timeline
  10. Allow for flexible study description: ‘templates’ for metadata fields
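One way to read ‘templates’ for metadata fields is as a named set of typed fields against which each record is checked. The sketch below illustrates that idea only; `TemplateField`, `Template`, `validate` and the example field names are hypothetical and do not describe how the DSP implements templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateField:
    name: str
    kind: type            # expected Python type of the value
    required: bool = False


@dataclass(frozen=True)
class Template:
    name: str
    fields: tuple

    def validate(self, values):
        """Return a list of human-readable problems; empty means valid."""
        errors = []
        known = {f.name for f in self.fields}
        for f in self.fields:
            if f.name not in values:
                if f.required:
                    errors.append(f"missing required field: {f.name}")
            elif not isinstance(values[f.name], f.kind):
                errors.append(f"{f.name}: expected {f.kind.__name__}")
        for key in values:
            if key not in known:
                errors.append(f"unknown field: {key}")
        return errors


# Hypothetical template for human subjects.
human_subject = Template("Human subject", (
    TemplateField("sex", str, required=True),
    TemplateField("age_years", int),
    TemplateField("bmi", float),
))

ok = human_subject.validate({"sex": "female", "age_years": 46, "bmi": 26.4})
bad = human_subject.validate({"age_years": "46", "strain": "C57BL/6"})
```

A plant study could define a different template with its own fields, which is what keeps the study description flexible without giving up validation.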
  11. Excel importer
  12. Challenge 2: representation of metabolomics data
     • Preprocessing
     • Identification
     • Quantification
  13. How to implement preprocessing?
     • We chose not to in the end
     • We supplied an mzMatch pipeline at an earlier stage, but preprocessing is often too intertwined with the measurement SOP
     • Moving from vendor-specific software to general frameworks such as XCMS, mzMatch, MZmine etc. would benefit comparability of data, but in practice requires a lot of effort and tuning
  14. How to implement metabolite identity?
     • Consensus at standardization workshops: InChI key to identify structure
     • It is not always clear which structure(s) a peak represents, and with untargeted metabolomics we might have no clue
     • So we store ‘features’, which are specific to the measurement SOP and preprocessing SOP, and link those to metabolite identity records
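The feature-to-identity link can be sketched as a feature record that carries zero, one or several identity records keyed by InChIKey. Everything named here (`Feature`, `MetaboliteIdentity`, `status`, the SOP labels and the feature labels) is an assumption for illustration. The regular expression checks only the standard 27-character InChIKey layout, not whether the key belongs to a real compound. The caffeine key is used purely as a familiar example.

```python
import re
from dataclasses import dataclass, field

# Layout of a standard InChIKey: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


@dataclass
class MetaboliteIdentity:
    name: str
    inchikey: str

    def __post_init__(self):
        if not INCHIKEY_RE.match(self.inchikey):
            raise ValueError(f"not a well-formed InChIKey: {self.inchikey!r}")


@dataclass
class Feature:
    label: str
    measurement_sop: str      # a feature only has meaning within these SOPs
    preprocessing_sop: str
    identities: list = field(default_factory=list)

    @property
    def status(self):
        if not self.identities:
            return "unidentified"
        if len(self.identities) == 1:
            return "identified"
        return "ambiguous"


caffeine = MetaboliteIdentity("caffeine", "RYYVLZVUVIJVGH-UHFFFAOYSA-N")

# Hypothetical SOP names and feature labels.
targeted = Feature("peak_0042", "LCMS-SOP-A", "PREP-SOP-1", [caffeine])
unknown = Feature("peak_0517", "LCMS-SOP-A", "PREP-SOP-1")
```

Keeping the identity list separate from the feature means an untargeted feature can be stored first and linked to a structure later, or to several candidate structures.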
  15. How to implement quantification?
     • At the moment, we store only peak area or intensity; any Internal Standard and Quality Control sample data is stored along with the biological sample data
     • We expect that preprocessing and quality control are done before data import
     • We are now working on adding more levels of quantification, i.e. concentration
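Storing internal standard and QC data next to the biological data makes simple normalizations possible afterwards. As a sketch of one such step, the code below divides each peak area by the internal standard area measured in the same sample. This response ratio is a common intermediate and is not necessarily the approach the DSP takes. `PeakRecord`, `response_ratios`, the feature labels and all numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakRecord:
    sample_id: str
    sample_kind: str                  # "biological" or "qc"
    feature: str
    area: float
    is_internal_standard: bool = False


def response_ratios(records):
    """Map (sample_id, feature) -> peak area / internal standard area."""
    is_area = {r.sample_id: r.area for r in records if r.is_internal_standard}
    ratios = {}
    for r in records:
        if r.is_internal_standard:
            continue
        reference = is_area.get(r.sample_id)
        if reference:                 # skip samples without a usable standard
            ratios[(r.sample_id, r.feature)] = r.area / reference
    return ratios


# Invented example values, not measurements.
records = [
    PeakRecord("S1", "biological", "IS-1", 2_000_000.0, is_internal_standard=True),
    PeakRecord("S1", "biological", "feature-A", 500_000.0),
    PeakRecord("QC1", "qc", "IS-1", 1_000_000.0, is_internal_standard=True),
    PeakRecord("QC1", "qc", "feature-A", 250_000.0),
    PeakRecord("S2", "biological", "feature-A", 300_000.0),  # no standard stored
]

ratios = response_ratios(records)
```

Going from a ratio to an absolute concentration would additionally need calibration data, which this sketch leaves out.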
  16. Imported metabolomics dataset
  17. And again – Excel import!
  18. Challenge 3: embedding of data
     • Metabolomics is often not the only analysis performed on samples
     • It is important to cross-link to other environmental and genetic data
     • Thanks to our partners NuGO, NBIC etc., there are also modules for next generation sequencing, transcriptomics, and clinical chemistry data
     • All this data is cross-queryable
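Cross-querying relies on every module referring to the same sample identifiers. The sketch below shows the idea with plain dictionaries: select samples by their metadata, then collect whatever each module holds for them. The function `cross_query`, the module names, the sample identifiers, the gene label and all values are hypothetical and do not reflect the actual query composer.

```python
def cross_query(samples, modules, predicate):
    """For each sample whose metadata satisfies `predicate`, collect the
    data that every module holds for that sample."""
    result = {}
    for sample_id, metadata in samples.items():
        if predicate(metadata):
            result[sample_id] = {
                name: data[sample_id]
                for name, data in modules.items()
                if sample_id in data
            }
    return result


# Invented sample metadata and module contents.
samples = {
    "S1": {"sex": "female", "material": "blood"},
    "S2": {"sex": "male", "material": "blood"},
}
modules = {
    "metabolomics": {"S1": {"feature-A": 510_000}, "S2": {"feature-A": 410_000}},
    "transcriptomics": {"S1": {"GENE_A": 7.2}},
}

hits = cross_query(samples, modules, lambda m: m["sex"] == "female")
```

The shared sample identifier is the only coupling between modules, so a new module can be added without changing the existing ones.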
  19. Transcriptomics module
  20. Next Generation Sequencing module
  21. Query composer
  22. Query results on sample level
  23. Next focus
     • We have several tools developed within NMC, such as a spectral tree analysis tool
     • Reach sustainability by merging those tools into one analytical platform
     • Use an existing bioinformatics open source project: Galaxy
     • Re-use existing projects from collaborators: MetaboAnalyst from the Human Metabolome Project, Alberta, Canada (David Wishart)
  24. Galaxy (toolbox / visualization)
  25. Distributed deployment of NMC DSP
     • Study owners host study metadata at their own institution
     • Metabolomics labs host metabolomics modules
     • Data access is governed by study owners
     • Diagram nodes: TNO studies; DSM studies; TNO clinical chemistry; PRI studies; shared processing & evaluation toolbox; WUR transcriptomics; DCL metabolomics; PRI metabolomics; etc.
  26. Conclusion
     • Many compound databases, few databases with actual study data
     • Very hard to represent LC-MS measurements in a meaningful way
     • Storing study design and sample metadata is key to analysis
     • Many benefits of open collaboration, as opposed to closed-source in-house solutions
     • Test it: http://test.nmcdsp.org (log in with username ‘nmc’ and password ‘noordwijkerhout’)
     • Suggestions/remarks to kees@thehyve.nl
  27. Acknowledgements
     Tjeerd Abma, Adem Bilican, Jildau Bouwman, Christine Chichester, Sudeshna Das, Marjan van Erk, Chris Evelo, Prasad Gajula, Roeland van Ham, Thomas Hankemeier, Margriet Hendriks, Guido Hooiveld, Robert Horlings, Peter Horvatovich, Rob Hooft, Machiel Jansen, Jim Kaput, Kostas Karasavvas, Bart Keijser, Matthew Lange, Scott Marshall, Barend Mons, Ben van Ommen, Linette Pellis, Janneke van der Ploeg, Marijana Radonjic, Theo Reijmers, Erik Roos, Marco Roos, Frans Paul Ruzius, Jahn Saito, Susanna Sansone, Siemen Sikkema, Rob Stierum, Eugene van Someren, Morris Swertz, Chris Taylor, Michael van Vliet, Jeroen Wesbeek, Katy Wolstencroft, Suzan Wopereis, Gooitzen Zwanenburg
