Swertz Molgenis Bosc2009


    Notes on slide 1

    • International data standards: adopt them, but preserve flexibility
    • Flexible (standard) data models: ease extension ‘beyond’ today for new research
    • Dynamic software infrastructure: how to ensure a long life (30 years!)


    Swertz Molgenis Bosc2009 - Presentation Transcript

    1. MOLGENIS and the eXtensible Genotype And Phenotype database project (XGAP). Morris A. Swertz et al., DAM & BOSC SIGs, Stockholm, June 27, 2009. EBI Biobanking platform
    2. Outline
      • MOLGENIS database generator
          • Free toolbox of automated best practices: auto-generates useful data applications (SQL, Java, R, SOAP) from simple models
          • An open, pluggable platform that harmonizes data syntax, programming interfaces and user interaction
      • Demo
      • eXtensible Genotype And Phenotype (xgap)
          • Stores diverse high-throughput genotype and phenotype data in a harmonized way
          • Serves as a platform for collaboration and analysis tools
      • Current work
    3. MOLGENIS, why and how: the biologist's biological challenges
       [Figure: example workflow (inbreed strains, genotype individuals at markers, hybridize microarrays, preprocess and normalize probe expressions, map QTL profiles, correlate into networks); numbers give the data scale at each step]
    4. MOLGENIS, why and how: biological challenges need suitable infrastructure, built by bioinformaticians and software engineers (€)
       [Figure: the same workflow as slide 3]
    5. MOLGENIS, why and how: reinventing wheels, wasting time, hard to integrate
       [Figure: a second workflow (inbreed strains, genotype individuals on SNP arrays, measure LC/MS mass peaks, preprocess to aligned peaks, map QTL profiles, correlate into networks) supported by bioinformaticians and software engineers (€)]
    6. Alternative strategy (http://www.molgenis.org; Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243)
    7. MOLGENIS, why and how: platform and generators
       Little language + blueprint model + bioinformatician/software engineer:
         <!-- entity organization -->
         <entity name="Experiment" label="Experiment">
           <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
           <field name="Medium" type="xref" xref_field="Medium.name"/>
           <field name="Protocol" label="Experiment Protocol"/>
           <field name="Temperature" type="int"/>
           <!-- further fields cut off on the slide -->
         </entity>
       [Figure: the generated infrastructure supports the biologist's workflow of slide 3]
       http://www.molgenis.org; Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243
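The blueprint model is the input that the generators consume. The following minimal sketch shows the idea in Python. It is not MOLGENIS's own generator. It reads the entity XML with the standard library, emits SQL CREATE TABLE statements, and then checks them against an in-memory SQLite database. Several details are assumptions made so the example is self-contained:
- the type mapping;
- treating key="1" as the primary key;
- the type="int" on ExperimentID;
- the hypothetical Medium entity.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed type mapping; real MOLGENIS generators define their own per-database mappings.
SQL_TYPES = {"string": "VARCHAR(255)", "int": "INTEGER"}

# Blueprint model from the slide, plus a hypothetical Medium entity so the xref resolves.
# type="int" on ExperimentID is an assumption; label/readonly are ignored by this sketch.
MODEL = """
<molgenis name="example">
  <entity name="Medium">
    <field name="name" key="1"/>
  </entity>
  <entity name="Experiment" label="Experiment">
    <field name="ExperimentID" type="int" key="1" readonly="true"/>
    <field name="Medium" type="xref" xref_field="Medium.name"/>
    <field name="Protocol" label="Experiment Protocol"/>
    <field name="Temperature" type="int"/>
  </entity>
</molgenis>
"""


def column_type(field, fields_by_entity):
    """SQL type of a field; an xref takes the type of the field it points to."""
    ftype = field.get("type", "string")  # assumption: untyped fields are strings
    if ftype == "xref":
        ref_entity, ref_field = field.get("xref_field").split(".")
        return column_type(fields_by_entity[ref_entity][ref_field], fields_by_entity)
    return SQL_TYPES[ftype]


def generate_sql(model_xml):
    """Emit one CREATE TABLE statement per <entity> in the model."""
    root = ET.fromstring(model_xml)
    fields_by_entity = {
        e.get("name"): {f.get("name"): f for f in e.findall("field")}
        for e in root.findall("entity")
    }
    statements = []
    for entity in root.findall("entity"):
        columns, constraints, keys = [], [], []
        for f in entity.findall("field"):
            name = f.get("name")
            columns.append(f"  {name} {column_type(f, fields_by_entity)}")
            if f.get("key") == "1":
                keys.append(name)
            if f.get("type") == "xref":
                ref_entity, ref_field = f.get("xref_field").split(".")
                constraints.append(f"  FOREIGN KEY ({name}) REFERENCES {ref_entity}({ref_field})")
        if keys:
            constraints.insert(0, f"  PRIMARY KEY ({', '.join(keys)})")
        body = ",\n".join(columns + constraints)
        statements.append(f"CREATE TABLE {entity.get('name')} (\n{body}\n);")
    return "\n".join(statements)


if __name__ == "__main__":
    ddl = generate_sql(MODEL)
    print(ddl)
    # Sanity check: the generated DDL must be accepted by a real SQL engine.
    sqlite3.connect(":memory:").executescript(ddl)
```

Generating code from one small model, rather than hand-writing each layer, is what lets the same description also drive other targets (Java, R, SOAP) as the slides describe.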
    8. Upgrade to new research: platform and generators
       The same little language, blueprint model (as on slide 7) and bioinformatician/software engineer, now applied to new biology.
       [Figure: the second workflow of slide 5 (SNP arrays, LC/MS mass peaks, aligned peaks, QTL profiles, networks)]
       http://www.molgenis.org; Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243
    9. Upgrade to new software tools: platform and software generators
       The same little language, blueprint model (as on slide 7) and bioinformatician/software engineer.
       [Figure: the workflow of slide 5]
       http://www.molgenis.org; Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243
    10. Demo
    11. Step 1: model* (m.a.swertz@rug.nl)
       Example entities: individuals, probes, expressions
       *The model can also be extracted automatically from an existing database.
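The footnote says a model can also be extracted from an existing database. The following minimal sketch does this for a SQLite database. It uses only the standard library's PRAGMA table_info and PRAGMA foreign_key_list, and writes a MOLGENIS-style entity/field XML. The function names are hypothetical. The reverse type mapping (int/decimal/string, loosely following SQLite's type-affinity rules) is an assumption. This is not the MOLGENIS extractor.

```python
import sqlite3
import xml.etree.ElementTree as ET


def model_type(sql_type):
    """Assumed reverse mapping from a SQL column type to a model field type."""
    t = sql_type.upper()
    if "INT" in t:
        return "int"
    if any(k in t for k in ("REAL", "FLOA", "DOUB")):
        return "decimal"
    return "string"


def extract_model(con):
    """Build a <molgenis> model with one <entity> per user table in the database."""
    root = ET.Element("molgenis", name="extracted")
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        entity = ET.SubElement(root, "entity", name=table)
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = {row[3]: f"{row[2]}.{row[4]}"
               for row in con.execute(f'PRAGMA foreign_key_list("{table}")')}
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, col, sql_type, _notnull, _default, pk in con.execute(
                f'PRAGMA table_info("{table}")'):
            attrs = {"name": col}
            if col in fks:
                attrs["type"] = "xref"
                attrs["xref_field"] = fks[col]
            else:
                attrs["type"] = model_type(sql_type)
            if pk:
                attrs["key"] = "1"
            ET.SubElement(entity, "field", attrs)
    return root


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Individual (name VARCHAR(255) PRIMARY KEY);
        CREATE TABLE Expression (individual VARCHAR(255) REFERENCES Individual(name), value REAL);
    """)
    print(ET.tostring(extract_model(con), encoding="unicode"))
```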
    12. [no text on slide]
    13. Step 2: generate
       Download and customize the model file (XML), then generate APIs in Java, R, web services and HTTP, plus MyScript plugins.
       Generators, together covering data infrastructure, user interaction infrastructure and communication infrastructure: FormGen, MenuGen, TreeGen, PluginGen, MatrixGen, JTypeGen, JDBCMapGen, JListGen, JReadCsvGen, HSQLGen, JDatabaseGen, MySQLGen, RMatrixGen, WSGen, RListGen
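The slide lists many small generators, each producing one kind of output from the same model file. The following minimal sketch shows that plug-in pattern in Python: a registry of generator functions, each of which maps a parsed model to a set of output files. The two generators (CsvTemplateGen, JavaTypeGen) and their outputs are hypothetical. They only echo the slide's *Gen naming and are not the real MOLGENIS generators.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

MODEL = """<molgenis name="example">
  <entity name="Individual"><field name="name"/><field name="sex"/></entity>
  <entity name="Probe"><field name="name"/><field name="chromosome" type="int"/></entity>
</molgenis>"""

# Registry: generator name -> function(model root) -> {file name: file contents}.
GENERATORS: Dict[str, Callable[[ET.Element], Dict[str, str]]] = {}


def generator(name):
    """Decorator that registers a generator under a name."""
    def register(func):
        GENERATORS[name] = func
        return func
    return register


@generator("CsvTemplateGen")  # hypothetical
def csv_templates(model):
    """One empty CSV import template (header row only) per entity."""
    return {f"{e.get('name')}.csv": ",".join(f.get("name") for f in e.findall("field")) + "\n"
            for e in model.findall("entity")}


@generator("JavaTypeGen")  # hypothetical
def java_types(model):
    """A plain Java class per entity, with one private field per model field."""
    java = {"int": "Integer", "string": "String"}
    out = {}
    for e in model.findall("entity"):
        lines = [f"public class {e.get('name')} {{"]
        for f in e.findall("field"):
            lines.append(f"    private {java[f.get('type', 'string')]} {f.get('name')};")
        lines.append("}")
        out[f"{e.get('name')}.java"] = "\n".join(lines) + "\n"
    return out


def generate_all(model_xml):
    """Run every registered generator on the same model and collect all files."""
    model = ET.fromstring(model_xml)
    files = {}
    for gen in GENERATORS.values():
        files.update(gen(model))
    return files
```

Adding a new output target then means registering one more function, without touching the model or the other generators.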
    14. Step 3: use result
      • Let's see
    15. eXtensible Genotype And Phenotype database for QTL and GWAS experiments
      Example projects:
      • Locus-specific database
      • Clinical trial metabase
      • Next-generation sequencing
      • Proteomics/metabolomics
      • Animal observations
    16. XGAP - DAM Challenges
      • Challenges:
          • Share data between QTL collaborators
          • Variety of species/methods
          • Reuse the ad-hoc analysis protocols
      • Aim:
          • Simple common data model and format
          • Common interaction layers (R, SOAP)
          • Platform for reusable protocols/tools
          • Reuse between individual projects
      [Figure: main QTL workflow (inbreed, genotype, hybridize, preprocess/normalize, map, correlate); legend: data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files]
    17. First objective: a shared "my GaP" database holding researchers' annotations and their raw and processed data
      • Genotype data
      1. Data model: traits are MARKERS, subjects are STRAINS, and each DATA ELEMENT links a TRAIT to a SUBJECT
      Looking at standards and existing data sets; simple enough for everybody to create
    18. 1. Data model
      • What about QTL data? Traits can be PROBES as well as MARKERS
    19. 1. Model: DATA is a matrix with a TRAIT dimension and a SUBJECT dimension; each DATA ELEMENT links one TRAIT to one SUBJECT
    20. 1. Model: DATA ELEMENT, TRAIT, SUBJECT
      • Subject annotations: Individual, Strain, Sample
      • Trait annotations: Phenotype, Probe, Marker, Mass Peak
      • Data: PhenotypeValues, Raw, QTLs, other
      [Figure: traits and subjects are the row and column dimensions of the data matrix; each cell is a data ELEMENT]
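The matrix model above can be made concrete with a small data structure. In the following minimal sketch, one DATA set has a trait type and a subject type, and its elements are indexed by (trait, subject) pairs. The class and method names and the example marker/strain identifiers are hypothetical. They are not the XGAP schema itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DataMatrix:
    """Hypothetical XGAP-style DATA set: rows are traits, columns are subjects."""
    name: str
    trait_type: str    # e.g. "Probe", "Marker", "MassPeak", "Phenotype"
    subject_type: str  # e.g. "Individual", "Strain", "Sample"
    traits: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    elements: Dict[Tuple[str, str], object] = field(default_factory=dict)

    def put(self, trait: str, subject: str, value: object) -> None:
        """Store one data element, registering new traits/subjects in first-seen order."""
        if trait not in self.traits:
            self.traits.append(trait)
        if subject not in self.subjects:
            self.subjects.append(subject)
        self.elements[(trait, subject)] = value

    def get(self, trait: str, subject: str) -> Optional[object]:
        return self.elements.get((trait, subject))

    def row(self, trait: str) -> List[Optional[object]]:
        """All values of one trait across subjects, None where missing."""
        return [self.elements.get((trait, s)) for s in self.subjects]


if __name__ == "__main__":
    genotypes = DataMatrix("genotypes", trait_type="Marker", subject_type="Strain")
    genotypes.put("marker1", "strainA", "A")
    genotypes.put("marker1", "strainB", "B")
    genotypes.put("marker2", "strainA", "B")
    print(genotypes.row("marker2"))
```

The same structure holds expression data (Probe × Sample) or QTL results, which is the point of a single generic matrix.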
    21. Extensions for new experiments (DATA ELEMENT, TRAIT, SUBJECT)
      • Traits:
          • PROBE: Name, Gene, Chromosome, Locus
          • MARKER: Name, Allele, Chromosome, Locus
          • MASSPEAK: Name, MZ, RetentionTime
          • And so on …
      • Subjects:
          • Panel: Name, Type (CSS, RIL, …), Parent Panels
          • INDIVIDUAL: Name, Strain, Mother, Father, Sex
          • SAMPLE: Name, Individual, Tissue
          • And so on …
      [Figure: traits and subjects as the row and column dimensions of the data ELEMENT matrix]
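The extensions above specialize two generic core types. The following minimal sketch models them as Python dataclass subclasses of a hypothetical Trait and Subject base. Code written once against the core types then works for every new trait or subject kind. The Python field names are transliterations of the slide's attributes, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Trait:
    name: str


@dataclass
class Subject:
    name: str


@dataclass
class Probe(Trait):
    gene: Optional[str] = None
    chromosome: Optional[str] = None
    locus: Optional[int] = None


@dataclass
class Marker(Trait):
    allele: Optional[str] = None
    chromosome: Optional[str] = None
    locus: Optional[int] = None


@dataclass
class MassPeak(Trait):
    mz: Optional[float] = None
    retention_time: Optional[float] = None


@dataclass
class Panel(Subject):
    panel_type: Optional[str] = None          # e.g. "CSS", "RIL"
    parent_panels: Tuple["Panel", ...] = ()


@dataclass
class Individual(Subject):
    strain: Optional[str] = None
    mother: Optional["Individual"] = None
    father: Optional["Individual"] = None
    sex: Optional[str] = None


@dataclass
class Sample(Subject):
    individual: Optional[Individual] = None
    tissue: Optional[str] = None


def describe(items):
    """Generic code against the core types: works for any extension."""
    return [f"{type(i).__name__}:{i.name}" for i in items]


if __name__ == "__main__":
    print(describe([Probe("p1", gene="geneX"), Marker("m1", allele="A"), MassPeak("mp1", mz=301.1)]))
```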
    22. Protocol graph from FuGE
      • XGAP extends the FuGE standard (Jones et al., Nature Biotechnology 25, 1127-1133, 2007)
      [Figure: FuGE protocol graph linking protocols (Affy M430 protocol, Illumina protocol, QTL mapping protocol), equipment (Affy M430 platform, SNP array), software (Bioconductor normalization, BeadStudio, R) and the resulting expression, genotype and QTL data]
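A protocol graph is a directed acyclic graph from raw material through protocol applications to derived data. Tracing the full history of a result is therefore an upstream graph walk followed by a topological sort. The following minimal sketch uses the standard library's graphlib (Python 3.9+). The node names loosely follow the slide's figure, but the exact edges are an assumption, not the FuGE schema.

```python
from graphlib import TopologicalSorter

# Hypothetical protocol graph: each node maps to the set of nodes it was derived from.
GRAPH = {
    "raw expression data": {"Affy M430 protocol"},
    "expression data": {"raw expression data", "Bioconductor normalization"},
    "genotype data": {"Illumina protocol"},
    "QTL data": {"expression data", "genotype data", "QTL mapping (R)"},
}


def provenance(graph, node):
    """All upstream nodes of `node`, ordered so every node follows its inputs."""
    seen, stack = set(), [node]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    # Restrict the graph to the upstream nodes and sort them topologically.
    upstream = {n: graph.get(n, set()) for n in seen}
    return list(TopologicalSorter(upstream).static_order())


if __name__ == "__main__":
    for step in provenance(GRAPH, "QTL data"):
        print(step)
```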
    23. UML: XGAP extends FuGE
      • Uniform core to ease sharing of data and tools
      • Various traits for new research
      • Various subjects for new research
    24. 2. Model, run MOLGENIS
    25.
      • Connect to R statistics
      • Workflow-ready web services
      • UML documentation of your model
      • Edit and trace your data
      • Import/export to Excel
      • Plug in your own scripts (R/QTL)
      Tech keywords: object-oriented data models, multi-platform Java, Tomcat/GlassFish web server, MySQL/PostgreSQL database, Eclipse/NetBeans IDE, Java API, WSDL/SOAP API, R-project API, MVC, FreeMarker templates and CSS for custom layout, open source.
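Spreadsheet import/export of the trait × subject matrix is essentially a round-trip through a tabular file. As a stand-in for Excel, the following minimal sketch uses the standard csv module with its default "excel" dialect. The layout (a header row of subjects, then one row per trait, with empty cells for missing values) and the function names are assumptions. It is not the MOLGENIS file format. Values come back as strings.

```python
import csv
import io


def export_matrix(traits, subjects, values):
    """Write a trait x subject matrix as CSV: header of subjects, one row per trait."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default "excel" dialect
    writer.writerow([""] + subjects)
    for t in traits:
        row = [values.get((t, s)) for s in subjects]
        writer.writerow([t] + ["" if v is None else v for v in row])
    return buf.getvalue()


def import_matrix(text):
    """Read the CSV back into (traits, subjects, {(trait, subject): value})."""
    rows = list(csv.reader(io.StringIO(text)))
    subjects = rows[0][1:]
    traits, values = [], {}
    for row in rows[1:]:
        traits.append(row[0])
        for s, cell in zip(subjects, row[1:]):
            if cell != "":  # empty cell = missing value
                values[(row[0], s)] = cell
    return traits, subjects, values


if __name__ == "__main__":
    vals = {("marker1", "strainA"): "A", ("marker2", "strainA"): "B"}
    print(export_matrix(["marker1", "marker2"], ["strainA", "strainB"], vals))
```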
    26. Proof of the pudding
    27. Ongoing work
    28. Next step: add processing (slides courtesy of Joeri van der Velde and Danny Arends)
      Generalize for all MOLGENIS instances:
      (1) Extend the MOLGENIS model for tool integration:
            <tool name="rqtl">
              <input name="data" entity="data"/>
              …
            </tool>
      (2) Integrate workflow definition and execution, extending on the Taverna/Galaxy model & APIs…
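One use of declaring tool inputs in the model is type-checking a tool run before it starts. Each declared input must name a known model entity and be supplied with a record of that entity type. The following minimal sketch parses only the parts of the tool definition shown on the slide (tool name, input name, input entity). The function name, the way inputs are supplied, and the error types are hypothetical.

```python
import xml.etree.ElementTree as ET


def bind_inputs(tool_xml, model_entities, provided):
    """Match a tool's declared <input>s against supplied records.

    model_entities: names of entities in the MOLGENIS model.
    provided: input name -> (entity name, record).
    Returns input name -> record, or raises if the binding is invalid.
    """
    tool = ET.fromstring(tool_xml)
    bound = {}
    for inp in tool.findall("input"):
        name, entity = inp.get("name"), inp.get("entity")
        if entity not in model_entities:
            raise ValueError(f"tool {tool.get('name')!r}: unknown entity {entity!r}")
        if name not in provided:
            raise ValueError(f"tool {tool.get('name')!r}: missing input {name!r}")
        given_entity, record = provided[name]
        if given_entity != entity:
            raise TypeError(f"input {name!r} expects {entity!r}, got {given_entity!r}")
        bound[name] = record
    return bound


if __name__ == "__main__":
    tool = '<tool name="rqtl"><input name="data" entity="data"/></tool>'
    print(bind_inputs(tool, {"data", "marker"}, {"data": ("data", "genotypes")}))
```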
    29. Next step: semantics
      • The Pheno-OM project
      • Integrating mouse and man
      Generalize for all MOLGENIS instances. Next: add MOLGENIS components to integrate:
      (1) Ontology browsing, extending on the BioPortal/OLS frameworks?
      (2) Semantic integration layer???
      • Exploit standard generated interfaces
      • Distribute big data and tools
      • Meta-analysis
      Federation? Cloud computing?
    30. Acknowledgements
      • Joeri van der Velde
      • Joris Lops
      • Tomasz Adamusiak
      • Danny Arends
      • Martijn Dijkstra
      • Matthijs Kattenberg
      • Tjeerd Abma
      • Ate Boerema
      • Henrikki Almusa
      • Rudi Alberts
      • Damian Smedley
      • Katy Wolstencroft
      • Andrew R. Jones
      • Bruno M. Tesson
      • Richard A. Scheltema
      • Gonzalo Vera Rodriguez
      • Rene Oostergo
      • Helen E. Parkinson
      • Ritsert C. Jansen
      • Cisca Wijmenga
      • Carole Goble
      • Marco Roos
      • M. Scott Marshall
      • Paul Schofield
      • John M. Hancock
      • Juha Muilu
      • Klaus Schughart
      • Engbert O. de Brock
      • Hans Hillege
      • the LifeLines consortium
      • the Trial Coordination Center
      • the GEN2PHEN consortium
      • the CASIMIR consortium
      • the NBIC/BioAssist consortium
      • See us at the NBIC booth
      • Add generator targets
          • We have funding for several positions (PhD, SE)
      • Read more:
          • MOLGENIS for data integration: Smedley et al. 2009, Briefings in Bioinformatics 9(6):532
          • Review of MOLGENIS-type systems (dated): Swertz & Jansen 2007, Nature Reviews Genetics 8(3):235
          • The first MOLGENIS, then implemented in PHP: Swertz et al. 2004, Bioinformatics 20(4):2075
      Questions? http://www.molgenis.org http://www.xgap.org

    + boscbosc, 4 months ago

    © All Rights Reserved
