Kanterakis bosc2010 molgenis



Published in: Technology
  • The envisioned system should include suitable user interfaces for researchers and programmatic interfaces for analysis protocols and data federation, and it should be easy to extend to accommodate diverging local needs. None of the available (open source) systems seemed to provide this. Meanwhile, GEN2PHEN [2] had started ‘database-in-a-box’ projects, including a microarray system based on the MAGE-TAB file format [3] and the MOLGENIS [4,5] biosoftware platform. BBMRI-NL chose to sponsor this project, with the following results:

    1. Towards a federated microarray gene expression repository using MOLGENIS and MAGE-TAB
       Alexandros Kanterakis, Tomasz Adamusiak, Juha Muilu, Helen Parkinson, Despoina Antonakaki, Morris A. Swertz
    2. About BBMRI-NL
       • Biobank research infrastructure
       • Exploit the wealth of information in microarray and GWAS data
       • Data currently fragmented across individual biobanks (>6500 samples)
    3. Objectives (1/2)
       • Establish: a web-based national repository for microarray gene expression data
       • Populate: with well-annotated microarray experiments
       • Share: the software as a ‘microarray database in-a-box’ so that all BBMRI biobanks can reuse it locally
       Requirements:
       • Interfaces: user interface; programmatic interfaces (analysis protocols, data federation)
       • Extendable: to accommodate diverging local needs
    4. Objectives (2/2)
       Combine gene expression data from multi-platform microarray experiments with GWAS studies in order to create novel eQTL datasets for complex diseases.
    5. MAGE-TAB (1/2)
       MAGE-TAB (2006): simple, human-readable, tab-delimited.
       Consists of 4 parts:
       • Investigation Description Format (IDF): general information, contact details, bibliographic references, ...
       • Array Design Format (ADF): which sequence is located at each position on an array, and how that sequence is annotated.
       • Raw and processed data files: ASCII or binary files.
    6. MAGE-TAB (2/2)
       • Sample and Data Relationship Format (SDRF): relationships between samples, arrays, extracts, hybridizations and other objects used in the investigation.
    7. MAGE-TAB Object Model
       From the MAGE-TAB specification we created a data model* in XML format, and parsers for MAGE-TAB files.
       http://www.mged.org/mage-tab/MAGE-TABv1.0.pdf
       http://magetab-om.sourceforge.net/magetab_idf.xml
       * A data model is the set of definitions of the classes, elements and properties of the data.
    8. Visualization of MAGE-TAB OM
       [Figure: object model diagram with sections labelled SDRF, ADF, data and IDF]
    9. MOLGENIS MAGE-TAB
       From the MAGE-TAB Object Model we created a web environment for managing microarray experiments:
       • 850 lines of maintainable code
       • 60K lines of automatically generated code
    10. MOLGENIS MAGE-TAB
    11. Testing
        For testing and validation purposes we populated the database with data from ArrayExpress:
        • 7665 experiments from Gene Expression Omnibus (GEO), curated by ArrayExpress
        • 3940 non-GEO experiments from ArrayExpress
        • 320,000 samples, 550 species, 2,400 human conditions
    12. Discussion
        Features:
        • APIs: R, Java
        • Web services: SOAP, REST
        • Semantic interfaces: RDF, SPARQL
        • MAGE-TAB parsers, validators and visualization
        Future work:
        • Populate with local data
        • Plug-in analysis tools
        • Data and tool sharing among local installs
        • Privacy-sensitive biobanking community
    13. Thank you
        Acknowledgements: Morris Swertz, Joeri van der Velde, Lude Franke, Danny Arends
        Email: alexandros.kanterakis@gmail.com
        Posters:
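
To make the tab-delimited layout described in the MAGE-TAB slides more concrete, here is a minimal Python sketch of reading an IDF and an SDRF. It is not the parser built for this project (that one is generated from the MAGE-TAB Object Model in MOLGENIS). The sample file contents, the function names `parse_idf`, `parse_sdrf` and `data_files_by_source`, and the simplified column set are all illustrative assumptions. The sketch assumes the IDF is row-oriented (a tag followed by its values) and the SDRF is column-oriented (one header row, then one row per sample-to-data path).

```python
import csv
import io

# Hypothetical miniature IDF: each row is a tag followed by tab-separated values.
IDF_TEXT = (
    "Investigation Title\tExample expression study\n"
    "Person Last Name\tDoe\tRoe\n"
    "Person First Name\tJane\tRichard\n"
    "SDRF File\texample.sdrf.txt\n"
)

# Hypothetical miniature SDRF: one header row, then one row per sample.
SDRF_TEXT = (
    "Source Name\tExtract Name\tHybridization Name\tArray Data File\n"
    "sample1\textract1\thyb1\tsample1.CEL\n"
    "sample2\textract2\thyb2\tsample2.CEL\n"
)


def parse_idf(text):
    """Return {tag: [values, ...]} for a row-oriented IDF document."""
    fields = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank rows and rows without a tag in the first column.
        if not row or not row[0].strip():
            continue
        tag = row[0].strip()
        fields[tag] = [value for value in row[1:] if value != ""]
    return fields


def parse_sdrf(text):
    """Return one dict per SDRF row, keyed by column header.

    Simplification: real SDRF files may repeat column names, which a
    dict-per-row representation would overwrite. This sketch assumes
    every header is unique.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def data_files_by_source(sdrf_rows):
    """Map each source (biological sample) to its array data files."""
    mapping = {}
    for row in sdrf_rows:
        mapping.setdefault(row["Source Name"], []).append(row["Array Data File"])
    return mapping


if __name__ == "__main__":
    idf = parse_idf(IDF_TEXT)
    print(idf["Investigation Title"])
    print(data_files_by_source(parse_sdrf(SDRF_TEXT)))
```

The IDF points to its SDRF through the `SDRF File` tag, so a loader would typically read the IDF first and then follow that reference. A full implementation would also need to handle the ADF and the raw and processed data files, which this sketch leaves out.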