Swertz Molgenis Bosc2009

Speaker notes:
  • International data standards: adopt them, but preserve flexibility.
  • Flexible (standard) data models: ease extension 'beyond' today for new research.
  • Dynamic software infrastructure: how to ensure a long life (30 years!).

Slide 1. MOLGENIS and the eXtensible Genotype And Phenotype database project (xgap)
Morris A. Swertz et al.
DAM & BOSC SIGs, Stockholm, June 27, 2009
EBI Biobanking platform
Slide 2. Outline
  • MOLGENIS database generator
      ◦ A free toolbox of automated best practices: auto-generate useful data applications (SQL, Java, R, SOAP) from simple models
      ◦ An open, pluggable platform that harmonizes data syntax, programming interfaces and user interaction
  • Demo
  • eXtensible Genotype And Phenotype (xgap)
      ◦ Stores various high-throughput genotype and phenotype data in a harmonized way
      ◦ Serves as a platform for collaboration and analysis tools
  • Current work
Slide 3. MOLGENIS, why and how
[Figure: the biological challenges facing biologists, drawn as a QTL workflow. Steps: inbreed, genotype, hybridize, preprocess, normalize, map, correlate. Data: strains, individuals, markers, genotypes, probes, microarrays, expressions, QTL profiles, genome, networks. The number of items per step ranges from 10 to over 1,000,000.]
Slide 4. MOLGENIS, why and how
[Figure: the same QTL workflow; meeting its biological challenges requires suitable infrastructure, built by bioinformaticians and software engineers at a cost (€).]
Slide 5. MOLGENIS, why and how
[Figure: a second workflow for a new type of experiment (individuals genotyped on SNP arrays, LC/MS mass peaks preprocessed into aligned peaks, QTL profiles, networks) again needs its own infrastructure from bioinformaticians and software engineers. The result: reinventing wheels, wasting time, and systems that are hard to integrate.]
Slide 6. Alternative strategy
Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243; http://www.molgenis.org
Slide 7. MOLGENIS, why and how: platform and generators
A "little language" blueprint model, plus a bioinformatician and software engineer, yields the platform and generators. Example model:

    <!-- entity organization -->
    <entity name="Experiment" label="Experiment">
      <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
      <field name="Medium" type="xref" xref_field="Medium.name"/>
      <field name="Protocol" label="Experiment Protocol"/>
      <field name="Temperature" type="int"/>
    </entity>

[Figure: the generated platform supports the biologists' QTL workflow from slide 3.]
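To make the "model plus generators" idea concrete, here is a toy generator written in Python for illustration only. It reads the Experiment entity above and emits a MySQL-style CREATE TABLE statement. The type mapping, the VARCHAR(255) default for untyped fields, treating an untyped key field as an auto-numbered integer, and mapping `xref` to a foreign key are all assumptions of this sketch, not the rules of the real MOLGENIS generators.

```python
import xml.etree.ElementTree as ET

# The Experiment entity from the blueprint model on slide 7.
MODEL = """<entity name="Experiment" label="Experiment">
  <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
  <field name="Medium" type="xref" xref_field="Medium.name"/>
  <field name="Protocol" label="Experiment Protocol"/>
  <field name="Temperature" type="int"/>
</entity>"""

# Assumed mapping from model field types to SQL column types (illustrative only).
TYPE_MAP = {"int": "INTEGER", "string": "VARCHAR(255)", "xref": "VARCHAR(255)"}


def generate_create_table(entity: ET.Element) -> str:
    """Emit one CREATE TABLE statement for a single <entity> element."""
    columns, constraints = [], []
    for fld in entity.findall("field"):
        name = fld.get("name")
        ftype = fld.get("type")
        if ftype is None and fld.get("key") == "1":
            sql_type = "INTEGER AUTO_INCREMENT"  # assumption: untyped key = autonumber
        else:
            sql_type = TYPE_MAP.get(ftype or "string", "VARCHAR(255)")
        columns.append(f"  {name} {sql_type}")
        if fld.get("key") == "1":
            constraints.append(f"  PRIMARY KEY ({name})")
        if ftype == "xref":
            # xref_field has the form "<Entity>.<field>"
            ref_table, ref_column = fld.get("xref_field").split(".")
            constraints.append(
                f"  FOREIGN KEY ({name}) REFERENCES {ref_table}({ref_column})")
    body = ",\n".join(columns + constraints)
    return f"CREATE TABLE {entity.get('name')} (\n{body}\n);"


sql = generate_create_table(ET.fromstring(MODEL))
print(sql)
```

The point of the design is that the SQL is derived from the model: changing the model and regenerating replaces hand-editing the schema.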
Slide 8. Upgrade to new research: platform and generators
The same blueprint model and generators (see the Experiment entity on slide 7), maintained by a bioinformatician and software engineer, are reused for new biology.
[Figure: the LC/MS and SNP-array workflow from slide 5 (individuals, genotypes, mass peaks, aligned peaks, QTL profiles, networks).]
Slide 9. Upgrade to new software tools: platform and software generators
The same blueprint model (see the Experiment entity on slide 7) is combined with the software generators by a bioinformatician and software engineer.
[Figure: the LC/MS and SNP-array workflow from slide 5.]
Slide 10. Demo
Slide 11. Step 1: model*
[Figure: an example model with entities such as individuals, expressions and probes.]
*The model can also be extracted automatically from an existing database.
m.a.swertz@rug.nl
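The footnote says a model can also be extracted from an existing database. Here is a minimal sketch of how such reverse-engineering could work, assuming an SQLite source and a hypothetical `<molgenis>`/`<entity>`/`<field>` output layout that mirrors the model syntax on slide 7. The type mapping is illustrative, and this is not the MOLGENIS extractor itself. It reads column metadata with SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed mapping from SQLite declared column types to model field types.
SQL_TO_MODEL = {"INTEGER": "int", "TEXT": "string", "REAL": "decimal"}


def extract_model(conn: sqlite3.Connection) -> ET.Element:
    """Build a model (<molgenis> with <entity>/<field> children) from SQLite tables."""
    root = ET.Element("molgenis")
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        entity = ET.SubElement(root, "entity", name=table)
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match).
        # Table names come from sqlite_master, so interpolating them is acceptable in
        # a sketch; never do this with untrusted input.
        xrefs = {row[3]: f"{row[2]}.{row[4]}"
                 for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, decl_type, _notnull, _default, pk in conn.execute(
                f"PRAGMA table_info({table})"):
            attrs = {"name": column}
            if column in xrefs:
                attrs.update(type="xref", xref_field=xrefs[column])
            else:
                attrs["type"] = SQL_TO_MODEL.get(decl_type.upper(), "string")
            if pk:
                attrs["key"] = "1"
            ET.SubElement(entity, "field", attrs)
    return root


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Medium (name TEXT PRIMARY KEY);
    CREATE TABLE Experiment (
        ExperimentID INTEGER PRIMARY KEY,
        Medium TEXT REFERENCES Medium(name),
        Temperature INTEGER
    );
""")
model = extract_model(conn)
print(ET.tostring(model, encoding="unicode"))
```

The extracted model can then be edited by hand and fed to the generators like any other model file.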
Slide 12. [Image only; no text]
Slide 13. Step 2: generate
Download and customize: from the XML model file, generators produce APIs in Java, R, web services and HTTP, with hooks for your own scripts (MyScript) and plugins.
  • User interaction infrastructure: FormGen, MenuGen, TreeGen, PluginGen, MatrixGen
  • Data infrastructure: JTypeGen, JDBCMapGen, JListGen, JReadCsvGen, HSQLGen, JDatabaseGen, MySQLGen
  • Communication infrastructure: RMatrixGen, WSGen, RListGen
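Step 2 describes one model file feeding many generators, each responsible for part of the data, user-interaction or communication infrastructure. Here is a minimal Python sketch of that design: the model is parsed once, and every generator registered in a table runs over the same in-memory model. The two generators (`sql` and `java`) are hypothetical stand-ins inspired by names such as MySQLGen and JTypeGen. Their output is not what those generators actually produce.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Field:
    name: str
    type: str


@dataclass
class Entity:
    name: str
    fields: List[Field]


def parse_model(xml_text: str) -> List[Entity]:
    """Parse <entity>/<field> elements once into an in-memory model."""
    root = ET.fromstring(xml_text)
    return [Entity(e.get("name"),
                   [Field(f.get("name"), f.get("type", "string")) for f in e.findall("field")])
            for e in root.findall("entity")]


# Registry of generator targets: name -> function(Entity) -> generated text
GENERATORS: Dict[str, Callable[[Entity], str]] = {}


def generator(name: str):
    """Decorator that registers a generator target under a name."""
    def register(fn: Callable[[Entity], str]) -> Callable[[Entity], str]:
        GENERATORS[name] = fn
        return fn
    return register


@generator("sql")
def sql_generator(entity: Entity) -> str:
    sql_types = {"int": "INTEGER"}  # anything else becomes VARCHAR(255) (assumption)
    cols = ", ".join(f"{f.name} {sql_types.get(f.type, 'VARCHAR(255)')}" for f in entity.fields)
    return f"CREATE TABLE {entity.name} ({cols});"


@generator("java")
def java_generator(entity: Entity) -> str:
    java_types = {"int": "Integer"}  # anything else becomes String (assumption)
    members = " ".join(f"private {java_types.get(f.type, 'String')} {f.name};" for f in entity.fields)
    return f"public class {entity.name} {{ {members} }}"


def generate_all(xml_text: str) -> Dict[str, List[str]]:
    """Run every registered generator over every entity of the model."""
    model = parse_model(xml_text)
    return {name: [gen(e) for e in model] for name, gen in GENERATORS.items()}


MODEL = """<molgenis>
  <entity name="Medium">
    <field name="name"/>
  </entity>
  <entity name="Experiment">
    <field name="Medium" type="xref"/>
    <field name="Temperature" type="int"/>
  </entity>
</molgenis>"""

for target, files in generate_all(MODEL).items():
    print(target, files)
```

Adding a new target then means registering one more function; the model and the existing generators stay untouched.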
Slide 14. Step 3: use the result
  • Let's see
Slide 15. eXtensible Genotype And Phenotype database for QTL and GWAS experiments
Slide 16. Example projects
  • Locus-specific database
  • Clinical trial metabase
  • NextGen sequencing
  • Proteomics/metabolomics
  • Animal observations
Slide 17. XGAP - DAM challenges
Challenges:
  • Share data between QTL collaborators
  • Variety of species/methods
  • Reuse the ad-hoc analysis protocols
Aim:
  • A simple common data model and format
  • Common interaction layers (R, SOAP)
  • A platform for reusable protocols/tools
  • Reuse between individual projects
[Figure: the QTL workflow from slide 3 (strains, individuals, markers, genotypes, probes, microarrays, expressions, QTL profiles, networks). Legend: main workflow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files.]
Slide 18. First objective
[Figure: researchers share raw and processed data, with annotations, through a common genotype-and-phenotype database ("my GaP").]
Slide 19. 1. Data model: genotype data
[Figure: a matrix of DATA ELEMENTS with markers along the trait axis and strains along the subject axis (TRAIT × SUBJECT).]
Designed by looking at standards and existing data sets; simple enough for everybody to create.
Slide 20. 1. Data model: what about QTL data?
[Figure: a QTL matrix with probes on one axis and markers on the other; both axes are traits.]
Slide 21. 1. Model
[Figure: DATA (a matrix) made up of DATA ELEMENTs, each indexed by a TRAIT and a SUBJECT.]
Slide 22. 1. Model
Each DATA ELEMENT links a TRAIT and a SUBJECT; subjects and traits form the row and column dimensions of the DATA matrix.
  • Subject annotations: Individual, Strain, Sample, …
  • Trait annotations: Phenotype, Probe, Marker, Mass Peak, …
  • Data: PhenotypeValues, Raw, QTLs, other
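The core of the model, a DATA matrix whose elements are each addressed by one TRAIT and one SUBJECT, can be sketched as a small Python class. This illustrates the idea and is not XGAP's generated code. The class and method names are hypothetical, and putting traits on rows and subjects on columns is an arbitrary choice, since the slide does not fix the orientation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

Value = Union[float, str]  # e.g. a phenotype value or a genotype call


@dataclass(frozen=True)
class Trait:
    name: str  # e.g. a phenotype, probe, marker or mass peak


@dataclass(frozen=True)
class Subject:
    name: str  # e.g. an individual, strain or sample


@dataclass
class Data:
    """A DATA matrix holding at most one DATA ELEMENT per (trait, subject) pair."""
    name: str
    traits: List[Trait]
    subjects: List[Subject]
    elements: Dict[Tuple[Trait, Subject], Value] = field(default_factory=dict)

    def set(self, trait: Trait, subject: Subject, value: Value) -> None:
        if trait not in self.traits or subject not in self.subjects:
            raise KeyError("trait and subject must belong to this matrix")
        self.elements[(trait, subject)] = value

    def get(self, trait: Trait, subject: Subject) -> Optional[Value]:
        return self.elements.get((trait, subject))

    def row(self, trait: Trait) -> List[Optional[Value]]:
        """All values for one trait, in subject order; missing elements are None."""
        return [self.get(trait, s) for s in self.subjects]

    def column(self, subject: Subject) -> List[Optional[Value]]:
        """All values for one subject, in trait order; missing elements are None."""
        return [self.get(t, subject) for t in self.traits]


height, weight = Trait("height"), Trait("weight")
s1, s2 = Subject("strain1"), Subject("strain2")
pheno = Data("PhenotypeValues", [height, weight], [s1, s2])
pheno.set(height, s1, 12.5)
pheno.set(weight, s2, 3.1)
print(pheno.row(height))  # [12.5, None]
print(pheno.column(s2))   # [None, 3.1]
```

Genotype data, phenotype values and QTL results all fit this one shape; only the kinds of traits and subjects change.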
Slide 23. Extensions for new experiments
Each DATA ELEMENT links a TRAIT and a SUBJECT; both can be extended with new types:
  • Trait types
      ◦ PROBE: Name, Gene, Chromosome, Locus
      ◦ MARKER: Name, Allele, Chromosome, Locus
      ◦ MASSPEAK: Name, MZ, RetentionTime
      ◦ And so on …
  • Subject types
      ◦ Panel: Name, Type (CSS, RIL, …), Parent Panels
      ◦ INDIVIDUAL: Name, Strain, Mother, Father, Sex
      ◦ SAMPLE: Name, Individual, Tissue
      ◦ And so on …
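Extending the model for a new experiment type means adding new kinds of traits or subjects, not changing the matrix itself. Here is a minimal Python sketch of that idea using subclassing, with the attributes listed on the slide. The Python types chosen for each attribute (for example a float for Locus) are assumptions, since the slide gives only attribute names, and `names_by_kind` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Union


@dataclass
class Trait:
    name: str


@dataclass
class Subject:
    name: str


# Trait extensions (one per measurement technology)
@dataclass
class Probe(Trait):
    gene: Optional[str] = None
    chromosome: Optional[str] = None
    locus: Optional[float] = None  # position; units not given on the slide


@dataclass
class Marker(Trait):
    allele: Optional[str] = None
    chromosome: Optional[str] = None
    locus: Optional[float] = None


@dataclass
class MassPeak(Trait):
    mz: Optional[float] = None
    retention_time: Optional[float] = None


# Subject extensions
@dataclass
class Panel(Subject):
    type: Optional[str] = None  # e.g. "CSS" or "RIL"
    parent_panels: List["Panel"] = field(default_factory=list)


@dataclass
class Individual(Subject):
    strain: Optional[str] = None
    mother: Optional["Individual"] = None
    father: Optional["Individual"] = None
    sex: Optional[str] = None


@dataclass
class Sample(Subject):
    individual: Optional[Individual] = None
    tissue: Optional[str] = None


def names_by_kind(items: Iterable[Union[Trait, Subject]]) -> Dict[str, List[str]]:
    """Generic code written against the base classes keeps working for every extension."""
    grouped: Dict[str, List[str]] = {}
    for item in items:
        grouped.setdefault(type(item).__name__, []).append(item.name)
    return grouped


traits = [Probe("probe_1", gene="geneA"),
          Marker("m1", chromosome="1"),
          MassPeak("peak_7", mz=301.2, retention_time=12.4)]
print(names_by_kind(traits))
```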
Slide 24. Protocol graph from FuGE
  • XGAP extends the FuGE standard (Jones et al., Nature Biotechnology 25, 1127-1133, 2007)
[Figure: a FuGE protocol graph. DATA items (expression data, genotype data, QTL data) are connected by protocol applications, each of which uses a protocol, software and equipment. Examples: the Affy M430 protocol on the Affy M430 platform with Bioconductor normalization; the Illumina protocol with Bead Studio; an SNP array; a QTL mapping protocol with R software.]
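The protocol graph records how each DATA item was produced. Here is a minimal Python sketch of that provenance idea, loosely modelled on FuGE's protocol / protocol-application pattern; the class names are simplified and hypothetical, not the FuGE schema. The function `lineage` walks backwards from a result to list the protocols that contributed to it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protocol:
    name: str
    software: List[str] = field(default_factory=list)
    equipment: List[str] = field(default_factory=list)


@dataclass
class Data:
    name: str
    # repr=False avoids infinite recursion through the back-reference
    produced_by: Optional["ProtocolApplication"] = field(default=None, repr=False)


@dataclass
class ProtocolApplication:
    """One run of a protocol: consumes input DATA and produces output DATA."""
    protocol: Protocol
    inputs: List[Data]
    outputs: List[Data] = field(default_factory=list, repr=False)

    def produce(self, name: str) -> Data:
        out = Data(name, produced_by=self)
        self.outputs.append(out)
        return out


def lineage(data: Data) -> List[str]:
    """Names of all protocols upstream of `data` (depth-first, each visited once)."""
    steps: List[str] = []
    seen = set()
    frontier = [data]
    while frontier:
        current = frontier.pop()
        app = current.produced_by
        if app is None or id(app) in seen:
            continue
        seen.add(id(app))
        steps.append(app.protocol.name)
        frontier.extend(app.inputs)
    return steps


raw = Data("raw expression data")   # measured directly, no upstream protocol
genotypes = Data("genotype data")
normalise = ProtocolApplication(Protocol("normalization", software=["Bioconductor"]), [raw])
expression = normalise.produce("expression data")
mapping = ProtocolApplication(Protocol("QTL mapping", software=["R"]), [expression, genotypes])
qtl = mapping.produce("QTL data")
print(lineage(qtl))  # ['QTL mapping', 'normalization']
```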
Slide 25. UML: XGAP extends FuGE
[Figure: UML diagram. A uniform core eases sharing of data and tools; various traits and various subjects can be added for new research.]
Slide 26. 2. Model, run MOLGENIS
Slide 27.
  • Connect to R statistics
  • Workflow-ready web services
  • UML documentation of your model
  • Edit and trace your data
  • Import/export to Excel
  • Plug in your own scripts (R/QTL)
Tech keywords: object-oriented data models, multi-platform Java, Tomcat/GlassFish web server, MySQL/PostgreSQL database, Eclipse/NetBeans IDE, Java API, WSDL/SOAP API, R-project API, MVC, FreeMarker templates and CSS for custom layout, open source.
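Import/export to Excel and the connection to R both rely on a simple tabular exchange format. The slides do not specify it, so the following Python sketch uses an assumed layout: a CSV file with subject names in the header row and one row per trait, which Excel opens directly and R can read with `read.csv(file, row.names = 1)`. The sketch writes such a matrix and reads it back; `export_matrix` and `import_matrix` are hypothetical names.

```python
import csv
import io
from typing import Dict, List, Tuple

Matrix = Dict[Tuple[str, str], str]  # (trait, subject) -> value


def export_matrix(traits: List[str], subjects: List[str], values: Matrix) -> str:
    """Write a trait x subject matrix as CSV; empty cells mark missing values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + subjects)  # header row: blank corner cell, then subjects
    for trait in traits:
        writer.writerow([trait] + [values.get((trait, s), "") for s in subjects])
    return buf.getvalue()


def import_matrix(text: str) -> Tuple[List[str], List[str], Matrix]:
    """Parse the CSV layout written by export_matrix back into its three parts."""
    rows = list(csv.reader(io.StringIO(text)))
    subjects = rows[0][1:]
    traits: List[str] = []
    values: Matrix = {}
    for row in rows[1:]:
        trait, cells = row[0], row[1:]
        traits.append(trait)
        for subject, cell in zip(subjects, cells):
            if cell != "":
                values[(trait, subject)] = cell
    return traits, subjects, values


genotypes = {("m1", "strain1"): "A", ("m2", "strain2"): "B"}
text = export_matrix(["m1", "m2"], ["strain1", "strain2"], genotypes)
print(text)
```

A round trip through `import_matrix` returns the same traits, subjects and values, so the file can be edited in a spreadsheet and loaded again.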
Slide 28. Proof of the pudding
Slide 29. Ongoing work
Slide 30. Next step: add processing
Slides thanks to Joeri van der Velde and Danny Arends.
Generalize for all MOLGENIS instances:
  (1) Extend the MOLGENIS model for tool integration:
      <tool name="rqtl"> <input name="data" entity="data"/> … </tool>
  (2) Integrate workflow definition and execution, building on the Taverna/Galaxy models and APIs…
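The proposed `<tool>` element would let a generator check that a tool's inputs match entities in the model before wiring the tool into a workflow. Here is a minimal Python sketch of such a check. It assumes the element has only the shape shown on the slide (a `name` attribute and `<input name="…" entity="…"/>` children). The example model and the `check_tool` function are hypothetical, since the full tool syntax is elided on the slide.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical model with two entities.
MODEL = """<molgenis>
  <entity name="data"/>
  <entity name="individual"/>
</molgenis>"""

# Only the input line shown on slide 30; the rest of the tool definition is elided there.
TOOL = """<tool name="rqtl">
  <input name="data" entity="data"/>
</tool>"""


def check_tool(model_xml: str, tool_xml: str) -> List[str]:
    """Return problems found when binding a tool's inputs to model entities (empty = OK)."""
    entities = {e.get("name") for e in ET.fromstring(model_xml).iter("entity")}
    tool = ET.fromstring(tool_xml)
    problems = []
    for inp in tool.findall("input"):
        if inp.get("entity") not in entities:
            problems.append(
                f"{tool.get('name')}: input '{inp.get('name')}' "
                f"refers to unknown entity '{inp.get('entity')}'")
    return problems


print(check_tool(MODEL, TOOL))
```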
Slide 31. Next step: semantics
  • The Pheno-OM project
  • Integrating mouse and man
Generalize for all MOLGENIS instances. Next: add MOLGENIS components to integrate:
  (1) Ontology browsing, extending on the BioPortal/OLS frameworks?
  (2) A semantic integration layer???
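One way a semantic layer could help integrate mouse and man is to annotate each phenotype trait with an ontology term and match traits across species through shared terms. The Python sketch below illustrates only that matching step. The trait names and the `TERM:000n` identifiers are hypothetical placeholders, not real ontology terms, and the slide leaves the actual design of the integration layer open.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical annotations: (species, trait name) -> ontology term ID.
ANNOTATIONS: Dict[Tuple[str, str], str] = {
    ("mouse", "body_weight"): "TERM:0001",
    ("human", "weight_kg"): "TERM:0001",
    ("mouse", "tail_length"): "TERM:0002",
    ("human", "bmi"): "TERM:0003",
}


def match_across_species(annotations: Dict[Tuple[str, str], str],
                         species_a: str, species_b: str) -> List[Tuple[str, str, str]]:
    """Pair traits of two species that are annotated with the same ontology term."""
    by_term: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for (species, trait), term in annotations.items():
        by_term[term][species].append(trait)
    matches = []
    for term in sorted(by_term):
        for trait_a in by_term[term][species_a]:
            for trait_b in by_term[term][species_b]:
                matches.append((term, trait_a, trait_b))
    return matches


print(match_across_species(ANNOTATIONS, "mouse", "human"))
```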
Slide 32.
  • Exploit standard generated interfaces
  • Distribute big data and tools
  • Meta-analysis
Federation? Cloud computing?
Slide 33. Acknowledgements
Joeri van der Velde, Joris Lops, Tomasz Adamusiak, Danny Arends, Martijn Dijkstra, Matthijs Kattenberg, Tjeerd Abma, Ate Boerema, Henrikki Almusa, Rudi Alberts, Damian Smedley, Katy Wolstencroft, Andrew R. Jones, Bruno M. Tesson, Richard A. Scheltema, Gonzalo Vera Rodriguez, Rene Oostergo, Helen E. Parkinson, Ritsert C. Jansen, Cisca Wijmenga, Carole Goble, Marco Roos, M. Scott Marshall, Paul Schofield, John M. Hancock, Juha Muilu, Klaus Schughart, Engbert O. de Brock, Hans Hillege, the LifeLines consortium, the Trial Coordination Center, the GEN2PHEN consortium, the CASIMIR consortium, the NBIC/BioAssist consortium.
Slide 34.
  • See us at the NBIC booth
  • Add generator targets
      ◦ We have funding for several positions (PhD, SE)
  • Read more:
      ◦ MOLGENIS for data integration: Smedley et al 2009, Brief. in Bioinformatics 9(6):532
      ◦ Review of MOLGENIS-type systems (dated): Swertz & Jansen 2007, Nature Rev. Genetics 8(3):235
      ◦ The first MOLGENIS, at that time in PHP: Swertz et al 2004, Bioinformatics 20(4):2075
Questions? http://www.molgenis.org http://www.xgap.org