International data standards: adopt them but preserve flexibility
Flexible (standard) data models: ease extension 'beyond' today for new research
Dynamic software infrastructure: how to ensure a long life (30 yrs!)
MOLGENIS and the eXtensible Genotype And Phenotype database project (xgap)
Morris A. Swertz et al
DAM & BOSC sigs, Stockholm, June 27 2009
EBI Biobanking platform
Outline
MOLGENIS database generator
Free toolbox of automated best practices: auto-generate useful data apps (SQL, Java, R, SOAP) from simple models
An open, pluggable platform that harmonizes data syntax, programming interfaces and user interaction
Demo
eXtensible Genotype And Phenotype (xgap)
To store diverse high-throughput genotype and phenotype data in a harmonized way
Step 1: model*
[Figure: example model with the entities individuals, expressions and probes]
*Can also extract automatically from an existing database
m.a.swertz@rug.nl
Step 2: generate
Download and customize...
Model file (XML)
Generate APIs in Java, R, Web services and HTTP
MyScript Plugins
[Figure: generators grouped into data infrastructure, user interaction infrastructure and communication infrastructure. Generators shown: FormGen, MenuGen, TreeGen, PluginGen, MatrixGen, JTypeGen, JDBCMapGen, JListGen, JReadCsvGen, HSQLGen, JDatabaseGen, MySQLGen, RMatrixGen, WSGen, RListGen]
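The generate step can be illustrated with a minimal sketch, assuming a toy model format. The element and attribute names (`entity`, `field`, `type`), the type mapping and the function `generate_sql` are hypothetical and are not the MOLGENIS model language or its generators; the sketch only shows how one small model file can drive the creation of database code.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy model; NOT the real MOLGENIS model syntax.
MODEL_XML = """
<model name="demo">
  <entity name="individual">
    <field name="name" type="string"/>
    <field name="sex" type="string"/>
  </entity>
  <entity name="probe">
    <field name="name" type="string"/>
    <field name="locus" type="int"/>
  </entity>
</model>
"""

# Assumed mapping from toy model types to SQL column types.
SQL_TYPES = {"string": "VARCHAR(255)", "int": "INTEGER"}


def generate_sql(model_xml):
    """Return one CREATE TABLE statement per <entity> in the toy model."""
    root = ET.fromstring(model_xml)
    statements = []
    for entity in root.findall("entity"):
        # Every generated table gets a surrogate key in this sketch.
        columns = ["id INTEGER PRIMARY KEY"]
        for field in entity.findall("field"):
            sql_type = SQL_TYPES[field.get("type")]
            columns.append(f"{field.get('name')} {sql_type}")
        statements.append(
            f"CREATE TABLE {entity.get('name')} ({', '.join(columns)});"
        )
    return statements


if __name__ == "__main__":
    for statement in generate_sql(MODEL_XML):
        print(statement)
```

The same loop over entities could emit other targets (for example a form or an R accessor) from the same model, which is the point of keeping the model separate from the generated code.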
Step 3: use result
Let's see
Example projects
eXtensible Genotype And Phenotype database for QTL and GWAS experiments
Locus-specific database
Clinical trial metabase
NextGen sequencing
Proteomics/Metabolomics
Animal observations
XGAP - DAM Challenges
Challenges:
Share data between QTL collaborators
Variety of species/methods
Reuse the ad-hoc analysis protocols
Aim:
Simple common data model and format
Common interaction layers (R, SOAP)
Platform for reusable protocols/tools
Reuse between individual projects
[Figure: XGAP main workflow. Biomaterials and results: genome, strains, individuals, markers, genotypes, microarrays, probes, expressions, QTL profiles, network. Lab and analysis processes: inbreed, genotype, hybridize, preprocess, normalize expressions, map, correlate. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files. Scale labels range from 10 to about 1,000,000.]
First objective
[Figure: researchers share annotations and raw and processed data through a "my GaP" database]
Genotype data
1. Data model
[Figure: genotype data as a matrix of DATA ELEMENTS, with traits (MARKERS) on one dimension and subjects (STRAINS) on the other; each element links a TRAIT and a SUBJECT]
Looking at standards and existing data sets
Simple enough for everybody to create
1. Data model
What about QTL data?
[Figure: QTL data as a matrix with traits on both dimensions: PROBES against MARKERS]
1. Model
[Figure: a DATA (matrix) consists of DATA ELEMENTs; each dimension of an element refers to a TRAIT or a SUBJECT]
1. Model
[Figure: DATA ELEMENT linked to TRAIT and SUBJECT, with the annotation and data types listed below]
Subject annotations:
Individual,
Strain,
Sample,
…
Trait annotations:
Phenotype
Probe
Marker
Mass Peak
…
Data:
PhenotypeValues
Raw
QTLs
other
[Figure labels: the rows and columns of a data matrix are its dimensions; each cell is an ELEMENT]
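The matrix model on these slides can be sketched as follows. This is a minimal illustration under stated assumptions, not the XGAP schema: the class names `DimensionElement` and `DataMatrix`, the methods `set` and `get`, and the example values are hypothetical. It shows the one idea the slides rely on: a data element is addressed by a row and a column, and either dimension may be a trait or a subject.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DimensionElement:
    """Anything that can label a row or a column of a data matrix."""
    name: str


@dataclass(frozen=True)
class Trait(DimensionElement):
    """For example a marker, probe or phenotype."""


@dataclass(frozen=True)
class Subject(DimensionElement):
    """For example a strain, individual or sample."""


@dataclass
class DataMatrix:
    """A named matrix; each data element is keyed by (row, column)."""
    name: str
    elements: dict = field(default_factory=dict)

    def set(self, row, col, value):
        self.elements[(row, col)] = value

    def get(self, row, col):
        return self.elements[(row, col)]


if __name__ == "__main__":
    # Genotype data: trait (marker) by subject (strain).
    genotypes = DataMatrix("genotypes")
    genotypes.set(Trait("marker1"), Subject("strain1"), "A")

    # QTL data: trait (probe) by trait (marker); the value is made up.
    qtls = DataMatrix("qtl profiles")
    qtls.set(Trait("probe1"), Trait("marker1"), 3.2)

    print(genotypes.get(Trait("marker1"), Subject("strain1")))
    print(qtls.get(Trait("probe1"), Trait("marker1")))
```

Because both matrices use the same structure, tools written against `DataMatrix` would work for genotype, expression and QTL data alike.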
Extensions for new experiments
[Figure: the DATA ELEMENT, TRAIT and SUBJECT core, extended with the trait and subject types listed below]
PROBE
Name
Gene
Chromosome
Locus
MARKER
Name
Allele
Chromosome
Locus
MASSPEAK
Name
MZ
RetentionTime
Panel
Name
Type: CSS, RIL..
Parent Panels
INDIVIDUAL
Name
Strain
Mother
Father
Sex
SAMPLE
Name
Individual
Tissue
And so on …
[Figure labels: the rows and columns of a data matrix are its dimensions; each cell is an ELEMENT]
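The extension mechanism listed above can be sketched as subclassing. This is an assumption-laden illustration: the field names follow the slide, but the field types, the default values and the helper `dimension_kind` are hypothetical. The sketch shows that generic code written against `Trait` and `Subject` keeps working when a new experiment adds its own type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trait:
    name: str


@dataclass
class Subject:
    name: str


@dataclass
class Marker(Trait):
    allele: str = ""
    chromosome: str = ""
    locus: int = 0          # assumed to be an integer position


@dataclass
class MassPeak(Trait):
    mz: float = 0.0
    retention_time: float = 0.0


@dataclass
class Individual(Subject):
    strain: str = ""
    mother: Optional[str] = None
    father: Optional[str] = None
    sex: str = ""


def dimension_kind(element):
    """Generic code only needs to know whether it has a trait or a subject."""
    if isinstance(element, Trait):
        return "trait"
    if isinstance(element, Subject):
        return "subject"
    raise TypeError(f"not a trait or subject: {element!r}")


if __name__ == "__main__":
    for element in (
        Marker("marker1", chromosome="1", locus=1200),
        MassPeak("peak1", mz=445.12, retention_time=12.5),
        Individual("ind1", strain="strain1", sex="F"),
    ):
        print(element.name, dimension_kind(element))
```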
Protocol graph from FuGE
XGAP extends the FuGE standard (Jones et al., Nature Biotechnology 2007)
FuGE: Jones et al., Nature Biotech 25, 1127-1133
[Figure: FuGE protocol graph. DATA nodes (genotype data, expression data, QTL data) are connected by protocol applications. Protocols shown: Illumina protocol, Affy M430 protocol, Bioconductor normalization, QTL mapping protocol. Software shown: Bead Studio, R. Equipment shown: SNP array, Affy array, Affy M430 platform.]
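A protocol graph of this kind can be sketched as a list of protocol applications, each linking input data to output data. The sketch below is a simplification and not the FuGE object model: the class `ProtocolApplication`, the function `upstream_protocols`, the intermediate data names and the exact wiring between the protocols named on the slide are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProtocolApplication:
    """One run of a protocol: which data went in and which came out."""
    protocol: str
    inputs: list
    outputs: list


# Assumed wiring of the protocols named on the slide.
APPLICATIONS = [
    ProtocolApplication("Illumina protocol", ["SNP array scan"], ["genotype data"]),
    ProtocolApplication("Affy M430 protocol", ["Affy array scan"], ["raw expression data"]),
    ProtocolApplication("Bioconductor normalization", ["raw expression data"], ["expression data"]),
    ProtocolApplication("QTL mapping", ["genotype data", "expression data"], ["QTL data"]),
]


def upstream_protocols(target, applications):
    """Breadth-first walk from `target` back through the producing protocols."""
    found = []
    pending = [target]
    seen = set()
    while pending:
        data = pending.pop(0)
        if data in seen:
            continue
        seen.add(data)
        for app in applications:
            if data in app.outputs:
                if app.protocol not in found:
                    found.append(app.protocol)
                pending.extend(app.inputs)
    return found


if __name__ == "__main__":
    print(upstream_protocols("QTL data", APPLICATIONS))
```

Recording applications rather than only results is what makes it possible to trace how a QTL profile was derived from the raw arrays.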
UML: XGAP extends FuGE
Uniform core to ease sharing of data and tools
Various traits for new research
Various subjects for new research
2. Model, run MOLGENIS
Connect to R statistics
Workflow-ready web services
UML documentation of your model
Edit & trace your data
Import/export to Excel
Plug in your own scripts (R/QTL)
Tech keywords: object-oriented data models, multi-platform Java, Tomcat/GlassFish web server, MySQL/PostgreSQL database, Eclipse/NetBeans IDE, Java API, WSDL/SOAP API, R-project API, MVC, FreeMarker templates and CSS for custom layout, open source.
Proof of the pudding
Ongoing work
Next step: add processing
(Sheets thanks to Joeri van der Velde and Danny Arends)
Generalize for all MOLGENIS instances:
(1) Extend MOLGENIS model for tool integration
<tool name="rqtl">
  <input name="data" entity="data"/>
  …
</tool>
(2) Integrate workflow definition and execution
Extending on Taverna/Galaxy model & APIs…
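How a generator might read such a tool definition can be sketched as below. This is only an illustration: the slide elides part of the `<tool>` element, so the fragment here contains just the one `<input>` that is shown, and the function `read_tool` and its return shape are hypothetical rather than part of MOLGENIS.

```python
import xml.etree.ElementTree as ET

# Only the parts of the descriptor shown on the slide; the elided
# content is deliberately left out rather than guessed.
TOOL_XML = '<tool name="rqtl"><input name="data" entity="data"/></tool>'


def read_tool(xml_text):
    """Return the tool name and a mapping of input name to model entity."""
    root = ET.fromstring(xml_text)
    inputs = {
        node.get("name"): node.get("entity")
        for node in root.findall("input")
    }
    return {"name": root.get("name"), "inputs": inputs}


if __name__ == "__main__":
    print(read_tool(TOOL_XML))
```

Binding each input to a model entity is what would let a generated interface offer only data of the right type to the tool.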
Next step: semantics
The Pheno-OM project
Integrating mouse and man
Generalize for all MOLGENIS instances:
Next: add MOLGENIS components to integrate:
(1) Ontology browsing
Extending on BioPortal/OLS frameworks?
(2) Semantic integration layer ???
Exploit standard generated interfaces
Distribute big data and tools
Meta analysis
Federation? Cloud computing?
Acknowledgements
Joeri van der Velde
Joris Lops
Tomasz Adamusiak
Danny Arends
Martijn Dijkstra
Matthijs Kattenberg
Tjeerd Abma
Ate Boerema
Henrikki Almusa
Rudi Alberts
Damian Smedley
Katy Wolstencroft
Andrew R. Jones
Bruno M. Tesson
Richard A. Scheltema
Gonzalo Vera Rodriguez
Rene Oostergo
Helen E. Parkinson
Ritsert C. Jansen
Cisca Wijmenga
Carole Goble
Marco Roos
M. Scott Marshall
Paul Schofield
John M. Hancock
Juha Muilu
Klaus Schughart
Engbert O. de Brock
Hans Hillege
the LifeLines consortium
the Trial Coordination Center
the GEN2PHEN consortium
the CASIMIR consortium
the NBIC/BioAssist consortium
See us at the NBIC booth
Add generator targets
We have funding for several positions (PhD, SE)
Read more:
MOLGENIS for data integration:
Smedley et al 2009, Brief. in Bioinformatics 9(6):532