P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
 

Presentation at BOSC2012 by P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe

    Presentation Transcript

    • The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe. BOSC, Long Beach, July 13-14, 2012. Philippe Rocca-Serra (Ph.D.), ISA Team. twitter: @isatools.org, philippe.rocca-serra@oerc.ox.ac.uk, http://www.isa-tools.org. Friday, 13 July 2012
    • MAIN THEME: It is all about structuring experimental information to make it available to computer and software agents to enable mining. But let’s proceed gradually…
        – Notes in Lab Books (information for humans)
        – Spreadsheets and Tables (the compromise)
        – Facts as RDF statements (information for machines)
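
The step from a spreadsheet row to machine-readable facts can be illustrated with a minimal sketch. It is not the ISA team's converter: the `http://example.org/` namespace, the column names `Organism` and `Age`, and both function names are assumptions made for illustration, and only the sample values are borrowed from the example table shown later in the talk.

```python
# Illustrative sketch only: turns one spreadsheet row into
# subject-predicate-object facts and prints them as N-Triples.
BASE = "http://example.org/"  # placeholder namespace, not an ISA vocabulary


def row_to_triples(row, key="Sample Name"):
    """Use the key column as the subject; every other filled cell is one fact."""
    subject = BASE + "sample/" + row[key]
    triples = []
    for column, value in row.items():
        if column == key or value == "":
            continue
        predicate = BASE + "property/" + column.replace(" ", "_")
        triples.append((subject, predicate, value))
    return triples


def to_ntriples(triples):
    """Serialise the facts with plain string literals as objects."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


# Hypothetical column headers; the values come from the slide's example table.
row = {"Sample Name": "H1.sample1", "Organism": "H. Sapiens", "Age": "35 Years"}
triples = row_to_triples(row)
print(to_ntriples(triples))
```

Each cell becomes one statement about the sample, which is what makes the information addressable by software agents instead of only readable by people.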
    • Observations
        • Experiments are expensive and often publicly funded, yet many never see the light of day.
        • Spreadsheets are the most common vehicle for tracking experimental metadata in so-called ‘omics’ (functional genomics).
        • Technology-centric repositories form de facto silos.
        • Conversions are required to allow deposition to public databases.
        • Submitting common information across a series of repositories is inefficient.
    • Case Study
    • Many Ontologies, Many Formats, Many Requirements… Grr… Where are the tools!?! Credits: http://liverpoolsolfed.wordpress.com/resources/image-bank/demonstration/
    • ISA framework overview
    • Why ISA format and Tools?
        – Supporting data provenance tracking
        – Node/Edge underlying concept
        – Tabular as a compromise: a presentation layer inspired by Object model (FuGE, MAGE-OM)
        – A generic representation, applied to:
            • microarray based experiments (MAGE)
            • sequencing based experiments (SRA)
            • flow cytometry based experiments (FuGE-Flow Cyt)
            • mass spectrometry and NMR spectroscopy experiments
    • Why ISA format and Tools?
        – investigation: high level concept to link related studies
        – study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays.
        – assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
        – assay(s): pointers to data file names/location
        – data: external files in native or other formats
        – Example rows from the figure:
            H1   H. Sapiens   35 Years   H1.sample1   Labeling   H1.sample1.labeled   h1-s1.cel
            H1   H. Sapiens   35 Years   H1.sample2   h1-s2.cel
            H2   H. Sapiens   33 Years   H2.sample1   Labeling   H2.sample1.labeled   h2-s1.cel
        – ISA metadata specifications:
            • workflow and process orientated
            • compatible with checklist enforcement
            • compatible with external vocabulary resources
            • compatible by design with existing schemas: MAGE-Tab, Pride-xml, SRA-xml
        – Currently finalizing conversion to RDF to explore the growing Linked Data universe (in collaboration with the W3C HCLSIG, Toxbank Consortium)
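
The investigation / study / assay hierarchy described on this slide can be sketched as three nested record types. This is a hedged illustration, not the ISA object model: the class and field names, the study identifiers, the title and the measurement label are all assumptions, and only the data file names are taken from the slide's example rows.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test producing measurements; holds pointers to data files."""
    measurement: str  # illustrative label, not an ISA-defined term
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """The central unit: subjects under study plus the assays run on them."""
    identifier: str
    subjects: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept linking related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        # Walk investigation -> study -> assay -> data file pointers.
        return [f for s in self.studies for a in s.assays for f in a.data_files]


investigation = Investigation(
    title="Example investigation (hypothetical)",
    studies=[
        Study("study-H1", ["H1"], [Assay("expression profiling", ["h1-s1.cel", "h1-s2.cel"])]),
        Study("study-H2", ["H2"], [Assay("expression profiling", ["h2-s1.cel"])]),
    ],
)
print(investigation.all_data_files())
```

Keeping data files as pointers on the assay, not as embedded content, mirrors the slide's point that the data themselves stay in external files.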
    • ISA syntax and Table definition
        • Material Transformations:
            – Inputs and Outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).
        • Figure (column layout): Material Node → Protocol REF → Material Node, with a Data File Node
            – Material Node: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
            – Protocol REF: Parameter Value[…], Performer (operator effect), Date (day effect)
            – Data File Node: Comment[…]
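
The node/edge reading of such a table can be sketched as follows: material node columns are treated as graph nodes, and each Protocol REF found between two of them labels the edge. This is a simplified sketch and not the ISA parser; the `sampling` protocol name and the `Characteristics[organism]` header are assumptions, while the node column names and the sample values come from the slides.

```python
import csv
import io

# Column headers treated as material nodes (names taken from the slide).
NODE_COLUMNS = {"Source Name", "Sample Name", "Extract Name", "Labeled Extract Name"}

# Hypothetical tab-delimited table; "sampling" is an invented protocol name.
TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\t"
    "Protocol REF\tLabeled Extract Name\n"
    "H1\tH. Sapiens\tsampling\tH1.sample1\tLabeling\tH1.sample1.labeled\n"
    "H1\tH. Sapiens\tsampling\tH1.sample2\t\t\n"
    "H2\tH. Sapiens\tsampling\tH2.sample1\tLabeling\tH2.sample1.labeled\n"
)


def table_to_edges(text):
    """Return sorted (input node, protocol, output node) edges, read left to right."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)  # positional access: "Protocol REF" may repeat
    edges = set()
    for row in reader:
        previous = None  # last material node seen in this row
        protocol = None  # protocol applied since that node
        for column, value in zip(header, row):
            if not value:
                continue
            if column in NODE_COLUMNS:
                if previous is not None:
                    edges.add((previous, protocol, value))
                previous, protocol = value, None
            elif column == "Protocol REF":
                protocol = value
    return sorted(edges, key=lambda e: (e[0], e[2]))


edges = table_to_edges(TABLE)
for source, protocol, target in edges:
    print(f"{source} --[{protocol}]--> {target}")
```

Rows are read by position because a header such as Protocol REF can occur more than once in the same table, which a dictionary keyed on column names would silently collapse.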
    • ISAconfigurator Tables
    • How do ISA tools access Ontology servers?
    • The ISAcreator... Developed to be a user-friendly way to enter standards-compliant metadata: it has lots of features... But these are just some of them... we also have a data entry wizard and an import utility...
    • Select and Annotate in ISAcreator
    • Extending ISAcreator: The Plugin Architecture
    • Plugins in ISAcreator
        In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.
        • Plugins can be developed for 3 different purposes:
            – Search (adds extra search space for ontology tool)
            – Custom cell editors (for spreadsheet)
            – Extra general functionality (which appears in a plugin menu)
        • 2 examples of ISA plugins:
            – Access to local metadata stores: Novartis Plugin to Ontology Widget
            – Annotation of findings: Metabolite Identification Plugin (Metabolights Repository contribution to ISA project)
    • Plugins, example 1: Novartis Metastore Search
        The search function on the Novartis Metastore integrates metastore search results into the Ontology search tool. So, with the Novartis plugin in your Plugin directory, you’ll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved with recording term source, etc.
    • Plugins, example 2: Metabolite Identification plugin. Credits: Kenneth Haug, Metabolights
    • Potential issues and known hurdles
        • The problem of conflicting versions
            – especially high when working with big consortia
            – distributed, decentralized groups of users
        • Lack of version control and history
        • Absence of collaborative features
            – Looking for new solutions while retaining the features!
        • OntoMaton: Bringing Google Doc, NCBO Bioportal and ISA-TAB together!
    • OntoMaton: Searching
    • OntoMaton: Tagging
    • OntoMaton
        • Public release: http://goo.gl/2OKFV
        • Can be used in any Google Spreadsheet document
        • Application:
            – Annotating data records
            – Supporting ontology development (see OBI Quick Term Templates)
    • ISA2RDF: work in progress
        • Use case on W3C HCLS scientific discourse list
            – Deciding on the granularity of representation
            – Building on previous experience
            – Evaluating alternative representations
        • Participation in the Biohackathon 2011
            – http://blogs.openaccesscentral.com/blogs/bmcblog/entry/biohackathon_2011_number_1
            – Discussing best practices
                • PURL URIs and identifiers.org as identifiers
                • Openphacts guidelines (http://www.nanopub.org/guidelines/OpenPHACTS_Nanopublication_Guidlines_v1.8.1.pdf)
    • Preparing for Linked Open Data
        ✴ ISA2RDF (Toxbank collaboration): contribution to an ecosystem of software tools supporting the ISA syntax
        ✴ Reliance on internet-resolvable identifiers
        ✴ W3C bio/life science Note on Gene Expression RDF (PMID: 22449719)
        ✴ TODO:
            ✴ Specify comparator groups + analysis methods and resulting measurements and statistical measures
    • ISA2RDF: work in progress. jeliazkova.nina [toxbank project]
    • ISA2OWL
        • OWLAPI
        • ISA Parser (in-memory BII object store objects)
        • Mapping ISA syntax into target Ontological Space
        • Decoupling Mapping from Conversion Engine
        • Avoid being tied to a semantic framework
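
The idea of decoupling the mapping from the conversion engine can be sketched in a few lines. This is not ISA2OWL, which the slide says builds on OWLAPI and the ISA parser: the sketch is plain Python, the two framework names and every `http://example.org/` IRI are placeholders, and `convert` is a hypothetical function. Only the `rdf:type` IRI is a real W3C identifier.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Mapping tables live apart from the engine: ISA column name -> class IRI.
# Both frameworks and all class IRIs below are placeholders.
MAPPINGS = {
    "framework_a": {
        "Source Name": "http://example.org/a#Source",
        "Sample Name": "http://example.org/a#Sample",
    },
    "framework_b": {
        "Source Name": "http://example.org/b#Material",
        "Sample Name": "http://example.org/b#Material",
        "Extract Name": "http://example.org/b#Material",
    },
}


def convert(nodes, mapping):
    """Generic engine: types each node using whichever mapping it is handed.

    Returns the typing triples plus the ISA types the mapping does not cover.
    """
    triples, unmapped = [], []
    for isa_type, name in nodes:
        target = mapping.get(isa_type)
        if target is None:
            unmapped.append(isa_type)
            continue
        triples.append((f"http://example.org/id/{name}", RDF_TYPE, target))
    return triples, unmapped


nodes = [("Source Name", "H1"), ("Sample Name", "H1.sample1"), ("Extract Name", "H1.extract1")]
triples_a, gaps_a = convert(nodes, MAPPINGS["framework_a"])
triples_b, gaps_b = convert(nodes, MAPPINGS["framework_b"])
print(len(triples_a), gaps_a)
print(len(triples_b), gaps_b)
```

Because the engine only receives a mapping as data, targeting another ontological framework means adding a table, not changing conversion code, and the list of unmapped types makes coverage gaps visible.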
    • ISA2OWL: mapping in the BFO space as starting point
    • ISA2OWL: mapping issues
        • Stability over time
        • Keeping track of resource versions
        • Gaps in coverage
        • Use of local extensions
        • Direct requests/contributions
    • ISA2OWL: development
        • include graph metadata (graph provenance to aid indexing)
        • extend semantic validation of ISA archive
        • augment annotation by suggesting additions
        • facilitate curation work
        • create new mappings to other frameworks (OPML model, SIO)
    • Publication...
        ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Philippe Rocca-Serra; Marco Brandizi; Eamonn Maguire; Nataliya Sklyar; Chris Taylor; Kimberly Begley; Dawn Field; Stephen Harris; Winston Hide; Oliver Hofmann; Steffen Neumann; Peter Sterk; Weida Tong; Susanna-Assunta Sansone. Bioinformatics 2010 26: 2354-2356
    • Acknowledgements
        • Groups and individuals participating in:
            – MIBBI: http://mibbi.org
            – ISA-Tab format: http://isatab.sf.net
            – OBO Foundry: http://obofoundry.org
            – OBI: http://obi-ontology.org/page/Main_Page
        • Collaborators at:
            – Cambridge University
            – EuNuGO
            – Harvard School for Public Health
            – FDAs NCTR
            – Leibniz Plant Institute
            – NERCs NEBC
            – SIDR, INIST
            – Metabolights, EMBL-EBI
        • ISA Infrastructure Team:
            – Alejandra Gonzalez-Beltran (Oxford)
            – Eamonn Maguire (Oxford)
            – Philippe Rocca-Serra (Oxford)
        • Funders:
            – EU Carcinogenomics Project
            – UK BBSRC
    • Groups and individuals participating in:
        – Winston Hide: HSPH
        – Oliver Hoffman: HSPH
        – Shannan Ho Sui: HSPH
        – Brad Chapman: HSPH
        – Christoph Steinbeck: Metabolights
        – Kenneth Haug: Metabolights
        – Paula de Matos: Metabolights
        – Magali Roux: INIST
        – Florian Mazur: INIST
        – Alain Zasadzinki: INIST
        – Marie Christine Jacquemot: INIST
        – Nina Jeliazkova: ToxBank
        And many more who have to forgive us!
    • Questions