TranSMART ISA-june2012
    TranSMART ISA-june2012 Presentation Transcript

    • Managing Experimental Metadata using ISA data structures
      Philippe Rocca-Serra, Ph.D., on behalf of the ISA Team, University of Oxford
      http://www.isa-tools.org; http://github.com/ISA-tools; http://isacommons.org/
      philippe.rocca-serra@oerc.ox.ac.uk
      TranSMART-ISA Teleconference, Tuesday, June 19th, 2012
    • Why ISA format and Tools?
      – Capture all salient features of the experimental workflow
      – Make annotation explicit and discoverable
      – Structure the descriptions for consistency, tracking independent variables and dependent variables, using cross references and resolvable identifiers
    • Why ISA format and Tools?
      – Supporting data provenance tracking
      – Node/Edge underlying concept
      – Tabular as a compromise: a presentation layer inspired by Object model (FuGE, MAGE-OM)
      – A generic representation, applied to:
        • microarray based experiments (MAGE)
        • sequencing based experiments (SRA)
        • flow cytometry based experiments (FuGE-Flow Cyt)
        • mass spectrometry and NMR spectroscopy experiments
    • Why ISA format and Tools?
      – investigation: high level concept to link related studies
      – study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays.
      – assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
      – assay(s): pointers to data file names/location
      – data: external files in native or other formats
      [Figure: example table rows linking subject to data file, e.g. H1, H. Sapiens, 35 Years, H1.sample1, Labeling, H1.sample1.labeled, h1-s1.cel]
      ISA metadata specifications:
        • workflow and process orientated
        • compatible with checklist enforcement
        • compatible with external vocabulary resources
        • compatible by design with existing schemas (MAGE-Tab, Pride-xml, SRA-xml)
      Currently finalizing conversion to RDF to explore the growing Linked Data universe (in collaboration with the W3C HCLSIG, Toxbank Consortium)
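The investigation, study and assay levels described on this slide can be pictured as a small nested data structure. The following is only a minimal sketch: the class and attribute names, the study identifier and the measurement label are illustrative assumptions, not the ISA object model itself. The subject and file names are taken from the slide's example rows.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test on material from the subject; it only points to data files by name."""
    measurement: str  # hypothetical label, not an ISA-defined field
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """Central unit: subjects, their characteristics, and the associated assays."""
    identifier: str
    subjects: List[dict] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept that links related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def data_files(self) -> List[str]:
        """Collect every data file referenced by any assay in any study."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


assay = Assay("transcription profiling", ["h1-s1.cel", "h1-s2.cel", "h2-s1.cel"])
study = Study(
    "study-1",  # made-up identifier
    subjects=[
        {"id": "H1", "organism": "H. Sapiens", "age": "35 Years"},
        {"id": "H2", "organism": "H. Sapiens", "age": "33 Years"},
    ],
    assays=[assay],
)
investigation = Investigation("example investigation", [study])
print(investigation.data_files())
```

Keeping only file names in the assay mirrors the slide's point that assays hold pointers to external data files rather than the data itself.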
    • ISA syntax and Table definition
      – Material Transformations: Inputs and Outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name)
      [Diagram: Material Node → Protocol REF → Material Node]
        • Material Node: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
        • Protocol REF: Parameter Value[…], Performer (operator effect), Date (day effect)
    • ISA syntax and Table definition
      – Data Acquisition & Data Transformations: Inputs are Materials or Data, and Outputs are Data Nodes (Raw Data File, Derived Data File, Derived Array Data Matrix File)
      [Diagram: Material Node → Protocol REF → Data File Node]
        • Material Node: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
        • Protocol REF: Parameter Value[…], Performer (operator effect), Date (day effect)
        • Data File Node: Comment[…]
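The node/edge reading of a table row (nodes joined by Protocol REF columns) might be sketched as follows. This is an illustration under stated assumptions, not one of the ISA parsers: the two-row table is a simplified stand-in for a real ISA-Tab file, the protocol name "scanning" is made up, and the function names are hypothetical. Only the column headers come from the slides.

```python
import csv
import io

# Node column headers named on the slides (material nodes and data nodes).
NODE_COLUMNS = {
    "Source Name", "Sample Name", "Extract Name", "Labeled Extract Name",
    "Raw Data File", "Derived Data File", "Derived Array Data Matrix File",
}

# Simplified, hypothetical table; a real ISA-Tab assay file has more columns.
TABLE = (
    "Sample Name\tProtocol REF\tLabeled Extract Name\tProtocol REF\tRaw Data File\n"
    "H1.sample1\tLabeling\tH1.sample1.labeled\tscanning\th1-s1.cel\n"
    "H2.sample1\tLabeling\tH2.sample1.labeled\tscanning\th2-s1.cel\n"
)


def row_to_edges(header, row):
    """Walk one row left to right and emit (input, protocol, output) edges."""
    edges = []
    previous = None  # last node seen on this row
    protocol = None  # protocol applied since that node
    for name, value in zip(header, row):
        if name == "Protocol REF":
            protocol = value
        elif name in NODE_COLUMNS:
            if previous is not None:
                edges.append((previous, protocol, value))
            previous, protocol = value, None
    return edges


def table_to_edges(text):
    """Parse tab-delimited text; headers repeat, so rows are read as lists."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    edges = []
    for row in reader:
        edges.extend(row_to_edges(header, row))
    return edges


edges = table_to_edges(TABLE)
for edge in edges:
    print(edge)
```

Rows are read as plain lists rather than dictionaries because a header such as Protocol REF legitimately appears more than once in the same table.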
    • Who uses ISA format and Tools?
      A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
        • environmental health
        • environmental genomics
        • metabolomics
        • metagenomics
        • nanotechnology
        • proteomics
        • stem cell discovery
        • systems biology
        • transcriptomics
        • toxicogenomics
        • also by communities working to build a library of cellular signatures
      [Logos of public groups/resources and internal projects not preserved in the transcript; one legible name: Nanotechnology Informatics Working Group]
    • Towards interoperable bioscience data (doi:10.1038/ng.1054, Feb 2012)
      Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.
      www.biosharing.org; www.isacommons.org
      [Figure: development timeline, 2007 to 2012, in three tracks]
      – Community involvement and uptake: 1st, 2nd and 3rd ISA-Tab workshops; user workshops/visits start; 1st public instance (Harvard Stem Cell Discovery Engine); other tools implement ISA-Tab; growing number of systems start to adopt ISA-Tab
      – Core developments: strawman ISA-Tab spec; final ISA-Tab spec; ISA software v1; database instance at EBI; conversions to Pride-XML/SRA-XML/MAGE-Tab and more; links to analysis tools start; RDF format starts
      – Publications: workshop reports; ‘Omics data sharing (Science); ISA-Tab and ISA software suite (Bioinformatics); Stem Cell Discovery Engine (NAR); ISA Commons (Nature Genetics)
    • The ISA tools... modular with a suite of supporting tools
      – Configure: Curator creates template
      – Create (isacreator): Experimentalist uses editor to report investigation
      – Validate: Check adherence to template
      – Load: Curator stores metadata in database using BII data management tool
      – Browse: Users browse investigations, query and view experimental metadata, and access associated data files
      – Analyze: Perform analysis of data in context with the metadata using the Galaxy or R analysis engines
      – Convert to ISA (converter): Convert from MAGE-Tab to ISA-Tab. More formats coming soon...
      – Convert from ISA (converter): Convert to MAGE-Tab, PRIDE-XML, SRA-XML for submission to international public repositories
      Requires Configuration XML
    • Create configuration xml files
    • The ISAconfigurator...
    • Use of the configuration xml
      In technical terms, the configuration xml schema (XSD) is consumed by an XML beans goal in maven, and Java stubs are created which are then used to load the XML files into memory.
      1. XML definition(s):
           <xml>
             <field>sample</field>
             <field>protocol ref</field>
             <field>extract name</field>
             <field>label</field>
             ...
           </xml>
      2. Import into Java Object Model using classes created by XML beans (Java Object: TableReferenceObject)
      3. Construct spreadsheet model: columns, rows, etc.
      4. Assign cell editors: ontology terms are given the ontology selection tool as a cell editor, file fields are given a file chooser, etc.
      The configuration is also used to define the form view using a similar mechanism....
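The flow on this slide (XML definition, object model, spreadsheet columns, cell editors) can be illustrated with a short sketch. It deliberately swaps in Python's standard `xml.etree.ElementTree` for the XMLBeans-generated Java stubs that the slide describes, so it shows the idea rather than the ISAcreator implementation. The `<configuration>` root element, the `type` attribute, the "raw data file" field and the editor labels are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; the real ISA configuration schema differs.
CONFIG = """
<configuration>
  <field type="text">sample</field>
  <field type="text">protocol ref</field>
  <field type="text">extract name</field>
  <field type="ontology">label</field>
  <field type="file">raw data file</field>
</configuration>
"""

# Editor chosen per field type, following the rule stated on the slide.
EDITORS = {
    "ontology": "ontology selection tool",
    "file": "file chooser",
}


def build_table_model(xml_text):
    """Turn field definitions into an ordered list of column descriptions."""
    root = ET.fromstring(xml_text.strip())
    columns = []
    for field in root.findall("field"):
        kind = field.get("type", "text")
        columns.append({
            "header": field.text.strip(),
            "editor": EDITORS.get(kind, "plain text editor"),
        })
    return columns


model = build_table_model(CONFIG)
for column in model:
    print(column["header"], "->", column["editor"])
```

Driving the editors from the configuration rather than from code is what lets a curator change a template without touching the application.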
    • isacreator: Create & Edit ISA-Tab
    • Data Reporting Scenarios
      1. Starting from scratch: spreadsheet function
      2. Mapping from 3rd-party tab data: mapping/ETL tool
      3. Templating based on study design information: wizard (*)
      (*) “early intervention is best”
    • The ISAcreator...
      Developed to be a user-friendly way to enter standards-compliant metadata: it has lots of features...
      [Screenshots of features not preserved in the transcript]
      But these are just some of them... we also have a data entry wizard and an import utility...
    • Ontologies in ISAcreator
      We use the NCBO BioPortal and the EBI’s OLS to do searching and browsing on ontologies.
      – Ontology field restriction
      – Ontology browsing & searching
      – Ontology tagging
      Ontology Resource Manager: the resource manager provides seamless searching of ontology resources, regardless of their origins, their underlying data schema or the mechanism (REST, SOAP or local file store) through which they are accessed.
      [Diagram: sources behind the resource manager: NCBO BioPortal (Search, Hierarchy and Annotator services), Ontology Lookup Service (OLS), Plugin]
      ISAcreator manages ontology metadata such as version information as well as individual term accessions, source, URI and so forth.
      Ontology search code is usable outside of ISAcreator. In fact, the ISAconfigurator imports ISAcreator as a maven dependency and reuses its components to do ontology restriction... plugins can also make use of our ontology search and browse functionalities.
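The resource manager idea, one search call regardless of where each ontology lives, could look roughly like the sketch below. It is an assumption-laden illustration, not ISAcreator code: the class names are hypothetical, both sources are in-memory stand-ins (no BioPortal or OLS request is made), and the accessions are invented placeholders.

```python
from typing import Dict, List, Protocol


class OntologySource(Protocol):
    """Anything that can be searched for terms, whatever its access mechanism."""
    name: str

    def search(self, term: str) -> List[Dict[str, str]]: ...


class InMemorySource:
    """Stand-in for a REST, SOAP or local file backend; holds terms in a dict."""

    def __init__(self, name: str, terms: Dict[str, str]):
        self.name = name
        self._terms = terms  # accession -> label

    def search(self, term: str) -> List[Dict[str, str]]:
        needle = term.lower()
        return [
            {"accession": accession, "label": label, "source": self.name}
            for accession, label in self._terms.items()
            if needle in label.lower()
        ]


class OntologyResourceManager:
    """Fans one query out to every registered source and merges the hits."""

    def __init__(self, sources: List[OntologySource]):
        self._sources = list(sources)

    def search(self, term: str) -> List[Dict[str, str]]:
        hits: List[Dict[str, str]] = []
        for source in self._sources:
            hits.extend(source.search(term))
        return hits


# Placeholder accessions, invented for this example.
remote = InMemorySource("demo-remote", {"DEMO:0001": "liver", "DEMO:0002": "liver extract"})
local = InMemorySource("demo-local", {"LOCAL:0001": "kidney", "LOCAL:0002": "liver biopsy"})
manager = OntologyResourceManager([remote, local])

hits = manager.search("liver")
for hit in hits:
    print(hit["source"], hit["accession"], hit["label"])
```

Each hit carries its source name alongside the accession, in the spirit of the slide's note that term source and accession are recorded with every annotation.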
    • Plugins in ISAcreator
      In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.
      – Plugins can be developed for 3 different purposes:
        • Search (adds extra search space for ontology tool)
        • Custom cell editors (for spreadsheet)
        • Extra general functionality (which appears in a plugin menu)
      – 2 examples of ISA plugins:
        • Access to local metadata stores: Novartis Plugin to Ontology Widget
        • Annotation of findings: Metabolite Identification Plugin (Metabolights Repository contribution to ISA project)
    • Plugins... example: Novartis Metastore Search
      Search function on the Novartis Metastore... integrates search results on the metastore in the Ontology search tool.
      So, with the Novartis plugin in your Plugin directory, you’ll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved with recording term source, etc.
    • ISAcreator - Metabolite Identification plugin
      Credits: Kenneth Haug, Metabolights
    • Summary
      – All Open Source, Open Access Project (https://github.com/ISA-tools)
      – OSGi Plugin Architecture: Apache Felix
      – Ontology Support: Select, Browse, Tag from public or private metadata stores
      – Annotation of molecular findings: Metabolite Identification Plugin for ISAcreator
      – Several libraries (Java, Python, Perl, R) for parsing ISA files
      – Integration with R: R-ISATAB package
    • Summary: TranSMART - ISA
      – ISA Study maps to TranSMART:
        • Samples and Timepoint
        • Study Groups
        • Subject Demographics
      – ISA assays map to TranSMART Biomarkers
      – ISA already has configurations supporting OMICS data:
        • microarray
        • NGS: RNA-Seq, ChIP-Seq, MeDIP-Seq
        • microbial diversity
        • protein/metabolite profiling using mass spectrometry
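One way to picture the mapping listed on this slide is a small routing table from ISA column headers to the TranSMART areas it names. The slide does not say which ISA fields feed which area, so every rule below is an assumption made purely for illustration, as is the `route` function name; only the four target names come from the slide.

```python
import re

# Hypothetical rules: the pairing of ISA headers with TranSMART areas is assumed.
RULES = [
    (re.compile(r"^Characteristics\[.*\]$"), "Subject Demographics"),
    (re.compile(r"^Factor Value\[.*\]$"), "Study Groups"),
    (re.compile(r"^Sample Name$"), "Samples and Timepoint"),
    (re.compile(r"^(Raw|Derived) Data File$"), "Biomarkers"),
]


def route(header):
    """Return the TranSMART area for an ISA column header, or None if unmapped."""
    for pattern, target in RULES:
        if pattern.match(header):
            return target
    return None


for header in ["Characteristics[age]", "Factor Value[dose]", "Sample Name",
               "Raw Data File", "Protocol REF"]:
    print(header, "->", route(header))
```

Headers that match no rule return `None`, which makes the unmapped fields easy to list when planning an integration.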
    • Why integrate ISA with TranSMART?
      – Susie Stephens (J&J): "A use case: someone was viewing results of analyses in TranSMART, and then wanted to go back to the raw or processed data and the experimental information in the ISA system. Or where results make a scientist curious to know whether a different/similar data set exists."
      – Michael R. Barnes (Director of Bioinformatics, Queen Mary University of London): "We are now quite bought in to TranSMART as we will be running it for a large funded MRC collaboration. The benefit of interoperability between TranSMART and ISA tools would be self evident. The fewer different standards used in a workflow the better, although TranSMART might be able to integrate diverse data sources, if the sources don't all contain the same fields then combined analysis is reduced to the common denominator fields between data sets. ISA-Tab could be a standard of choice for TranSMART, although it could not be an exclusive standard."
    • Preparing for Linked Open Data
      ✴ ISA2RDF (Toxbank collaboration): contribution to an ecosystem of software tools supporting the ISA syntax
      ✴ reliance on internet-resolvable identifiers
      ✴ W3C bio/life science Note on Gene Expression RDF (PMID: 22449719)
      ✴ TODO: specify comparator groups + analysis methods and resulting measurements and statistical measures
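To give a feel for what an ISA-to-RDF conversion produces, the sketch below turns one input/protocol/output step into N-Triples lines. It is not ISA2RDF and does not reflect its vocabulary: the `http://example.org/isa/` namespace, the predicate names (`hasInput`, `hasOutput`, `executes`) and the function names are all placeholders chosen for this example.

```python
from urllib.parse import quote

BASE = "http://example.org/isa/"  # placeholder namespace, not a real ISA vocabulary


def iri(kind, local):
    """Build an IRI reference in N-Triples syntax, percent-encoding the local part."""
    return f"<{BASE}{kind}/{quote(local, safe='')}>"


def edge_to_triples(source, protocol, output):
    """Describe one protocol application as three triples about a process node."""
    process = iri("process", f"{protocol}-{source}")
    return [
        f"{process} {iri('vocab', 'hasInput')} {iri('node', source)} .",
        f"{process} {iri('vocab', 'hasOutput')} {iri('node', output)} .",
        f"{process} {iri('vocab', 'executes')} {iri('protocol', protocol)} .",
    ]


triples = edge_to_triples("H1.sample1", "Labeling", "H1.sample1.labeled")
for triple in triples:
    print(triple)
```

Giving every node its own IRI is the part that matters here: it is what the slide means by relying on resolvable identifiers, even though these example IRIs do not resolve.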
    • Our next steps...as a community
      – RDF export & Analysis
      – Visualisation
      – Further adoption
      [Figure: workflow visualisations for a "low dose aspirin" study across liver, kidney, blood serum and blood plasma (x5 each), drawn as chains of SAMP, EX, LABEL, HYB, SCAN and TRANS steps. One workflow is annotated "well described process from sample to data file"; another "missing protocols and no information about what was being measured".]
      Making visual comparisons is straightforward using this approach. The longest path is constructed based on all other known datasets in the pool of workflows being compared.
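The comparison against a longest path might work along these lines. The slide does not describe the actual algorithm, so this is a guess at the simplest version: the reference is taken to be the longest workflow in the pool, and each workflow is laid out against it with gaps where a step is missing. The step names come from the figure; the pool contents and function names are hypothetical.

```python
def longest_path(workflows):
    """Pick the longest step sequence in the pool as the reference path."""
    return max(workflows.values(), key=len)


def align(workflow, reference):
    """Lay a workflow out against the reference; None marks a missing step."""
    present = set(workflow)
    return [step if step in present else None for step in reference]


# Hypothetical pool: one fully described workflow, one with missing protocols.
pool = {
    "well described": ["SAMP", "EX", "LABEL", "HYB", "SCAN", "TRANS"],
    "sparse": ["SAMP", "SCAN", "TRANS"],
}

reference = longest_path(pool)
for name, steps in pool.items():
    row = [step or "----" for step in align(steps, reference)]
    print(f"{name:15s}", " ".join(f"{cell:5s}" for cell in row))
```

Printing the rows one above the other puts the gaps in the sparse workflow directly under the steps it skipped, which is the visual comparison the slide refers to.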
    • Thanks for listening... Questions??
      – You can email us: isatools@googlegroups.com
      – View our website: http://www.isa-tools.org
      – View our Git repo & contribute: http://github.com/ISA-tools
      – View our blog: http://isatools.wordpress.com
      – Follow us on Twitter: @isatools