P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe


Presentation at BOSC2012 by P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe

Published in: Technology, Education
Transcript of "P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe"

  1. The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
     BOSC, Long Beach, July 13-14, 2012
     Philippe Rocca-Serra (Ph.D.), ISA Team
     twitter: @isatools.org
     philippe.rocca-serra@oerc.ox.ac.uk
     http://www.isa-tools.org
     Friday, 13 July 2012
  5. MAIN THEME: It is all about structuring experimental information to make it available to computers and software agents to enable mining. But let’s proceed gradually…
     • Notes in lab books (information for humans)
     • Spreadsheets and tables (the compromise)
     • Facts as RDF statements (information for machines)
  6. Observations
     • Experiments are expensive and often publicly funded, yet many never see the light of day.
     • Spreadsheets are the most common vehicle for tracking experimental metadata in so-called ‘omics’ (functional genomics).
     • Technology-centric repositories form de facto silos.
     • Conversions are required to allow deposition to public databases.
     • Submitting common information to a series of repositories is inefficient.
  7. Case Study
  8. Many ontologies, many formats, many requirements… Grr… Where are the tools!?! Credits: http://liverpoolsolfed.wordpress.com/resources/image-bank/demonstration/
  9. ISA framework overview
  10. Why ISA format and tools?
      – Supporting data provenance tracking
      – Node/edge underlying concept
      – Tabular as a compromise: a presentation layer inspired by object models (FuGE, MAGE-OM)
      – A generic representation, applied to:
        • microarray-based experiments (MAGE)
        • sequencing-based experiments (SRA)
        • flow cytometry-based experiments (FuGE-Flow Cyt)
        • mass spectrometry and NMR spectroscopy experiments
  11. Why ISA format and tools?
      • investigation: high-level concept to link related studies
      • study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays
      • assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
      • assay(s): pointers to data file names/location
      • data: external files in native or other formats
      Example rows (from the figure):
        H1   H. Sapiens   35 Years   H1.sample1   Labeling   H1.sample1.labeled   h1-s1.cel
        H1   H. Sapiens   35 Years   H1.sample2                                   h1-s2.cel
        H2   H. Sapiens   33 Years   H2.sample1   Labeling   H2.sample1.labeled   h2-s1.cel
      ISA metadata specifications:
      • workflow and process orientated
      • compatible with checklist enforcement
      • compatible with external vocabulary resources
      • compatible by design with existing schemas (MAGE-Tab, Pride-xml, SRA-xml)
      Currently finalizing conversion to RDF to explore the growing Linked Data universe, in collaboration with the W3C HCLSIG and the Toxbank Consortium.
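
The investigation / study / assay layering on this slide can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names and the identifiers `study-1` and `investigation-1` are hypothetical and are not taken from the ISA-Tab specification or from any ISA tool. The sample names and file names come from the example rows on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # An assay holds a pointer to a data file; the file itself stays external.
    sample_name: str
    data_file: str


@dataclass
class Study:
    # The study is the central unit and has associated assays.
    identifier: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # The investigation is the high-level concept linking related studies.
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def data_files(self) -> List[str]:
        """Collect every data file pointer reachable from this investigation."""
        return [a.data_file for s in self.studies for a in s.assays]


# Hypothetical identifiers; sample and file names are the ones on the slide.
study = Study("study-1", [
    Assay("H1.sample1", "h1-s1.cel"),
    Assay("H1.sample2", "h1-s2.cel"),
    Assay("H2.sample1", "h2-s1.cel"),
])
investigation = Investigation("investigation-1", [study])
print(investigation.data_files())
```

Keeping only file pointers in the assay mirrors the slide's point that data stay in external files in native or other formats.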
  16. ISA syntax and table definition
      • Material transformations:
        – Inputs and outputs of protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).
      Diagram: Material Node → Protocol REF → Material Node; Data File Node
        – Material Node: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
        – Protocol REF: Parameter Value[…], Performer (operator effect), Date (day effect)
        – Data File Node: Comment[…]
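
The node/edge reading of a table row can be sketched as follows. This is a minimal illustration, not the ISA parser. It assumes that node columns are the ones whose header ends in "Name" or "File", that a `Protocol REF` column labels the edge to the next node, and that every other column qualifies the node or protocol before it. The slide's figure shows no column headers, so the `headers` list below (including `Array Data File` and the bracketed `Characteristics[...]` names) is an assumption made for the example.

```python
from typing import Dict, List, Tuple

NODE_SUFFIXES = (" Name", " File")  # assumption: how node columns are recognised


def row_to_graph(headers: List[str], row: List[str]) -> Tuple[List[Dict], List[Dict]]:
    """Split one table row into nodes and the edges (protocol applications) between them."""
    nodes: List[Dict] = []
    edges: List[Dict] = []
    protocol = None  # protocol waiting for its output node
    target = None    # attribute dict that qualifier columns currently attach to
    for header, value in zip(headers, row):
        if not value:
            continue  # empty cell: nothing recorded for this column
        if header.endswith(NODE_SUFFIXES):
            node = {"type": header, "id": value, "attributes": {}}
            if nodes:
                edge = protocol or {"protocol": None, "attributes": {}}
                edge["source"] = nodes[-1]["id"]
                edge["target"] = value
                edges.append(edge)
            nodes.append(node)
            protocol = None
            target = node["attributes"]
        elif header == "Protocol REF":
            protocol = {"protocol": value, "attributes": {}}
            target = protocol["attributes"]
        elif target is not None:
            target[header] = value  # e.g. Characteristics[...] or Parameter Value[...]
    return nodes, edges


# Hypothetical headers paired with the first example row from the slides.
headers = ["Source Name", "Characteristics[organism]", "Characteristics[age]",
           "Sample Name", "Protocol REF", "Labeled Extract Name", "Array Data File"]
row = ["H1", "H. Sapiens", "35 Years", "H1.sample1", "Labeling",
       "H1.sample1.labeled", "h1-s1.cel"]
nodes, edges = row_to_graph(headers, row)
for edge in edges:
    print(edge["source"], "->", edge["target"], "via", edge["protocol"])
```

Qualifier columns attach to whatever precedes them, which is why `Parameter Value`, `Performer` and `Date` would end up on the protocol rather than on a material node.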
  18. ISAconfigurator Tables
  19. How do ISA tools access ontology servers?
  20. The ISAcreator: developed to be a user-friendly way to enter standards-compliant metadata. It has lots of features; these are just some of them. We also have a data entry wizard and an import utility.
  21. Select and annotate in ISAcreator
  22. Extending ISAcreator: the plugin architecture
  23. Plugins in ISAcreator
      In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.
      • Plugins can be developed for 3 different purposes:
        • Search (adds extra search space for the ontology tool)
        • Custom cell editors (for the spreadsheet)
        • Extra general functionality (which appears in a plugin menu)
      • 2 examples of ISA plugins:
        • Access to local metadata stores: Novartis plugin to the ontology widget
        • Annotation of findings: Metabolite Identification plugin (Metabolights repository contribution to the ISA project)
  24. Plugins, example 1: Novartis Metastore Search. A search function on the Novartis Metastore that integrates metastore search results into the ontology search tool. With the Novartis plugin in your plugin directory, you can search the Novartis Metastore directly within ISAcreator, and it handles all the tasks involved in recording the term source, etc.
  25. Plugins, example 2: Metabolite Identification plugin. Credits: Kenneth Haug, Metabolights
  26. Potential issues and known hurdles
      • The problem of conflicting versions
        – especially acute when working with big consortia
        – distributed, decentralized groups of users
      • Lack of version control and history
      • Absence of collaborative features
        – Looking for new solutions while retaining the features!
      • OntoMaton: bringing Google Docs, NCBO BioPortal and ISA-TAB together!
  28. OntoMaton: Searching
  29. OntoMaton: Tagging
  30. OntoMaton
      • Public release: http://goo.gl/2OKFV
      • Can be used in any Google Spreadsheet document
      • Application:
        • Annotating data records
        • Supporting ontology development (see OBI Quick Term Templates)
  31. ISA2RDF: work in progress
      • Use case on the W3C HCLS scientific discourse list
        – deciding on the granularity of representation
        – building on previous experience
        – evaluating alternative representations
      • Participation in the Biohackathon 2011
        – http://blogs.openaccesscentral.com/blogs/bmcblog/entry/biohackathon_2011_number_1
        – Discussing best practices
          • PURL URIs and identifiers.org as identifiers
          • OpenPHACTS guidelines (http://www.nanopub.org/guidelines/OpenPHACTS_Nanopublication_Guidlines_v1.8.1.pdf)
  34. Preparing for Linked Open Data
      ✴ ISA2RDF (Toxbank collaboration): contribution to an ecosystem of software tools supporting the ISA syntax
      ✴ Reliance on internet-resolvable identifiers
      ✴ W3C bio/life science Note on Gene Expression RDF (PMID: 22449719)
      ✴ TODO: specify comparator groups, analysis methods, and the resulting measurements and statistical measures
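
To make "facts as RDF statements" with internet-resolvable identifiers concrete, here is a minimal sketch that writes one source, sample and data file chain as N-Triples lines. It is not the ISA2RDF converter. The `example.org` namespace and the predicates `organism`, `derivesFrom` and `hasDataFile` are hypothetical placeholders, and the function name is made up for the example. The organism is identified with the identifiers.org URI for NCBI Taxonomy entry 9606 (Homo sapiens).

```python
from typing import List

BASE = "http://example.org/isa/"                # hypothetical namespace, illustration only
VOCAB = "http://example.org/isa/vocab#"         # hypothetical predicates, not the ISA2RDF vocabulary
TAXON = "http://identifiers.org/taxonomy/9606"  # NCBI Taxonomy 9606 = Homo sapiens


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sample_triples(source: str, sample: str, data_file: str) -> List[str]:
    """Return N-Triples lines for one source -> sample -> data file chain."""
    s_source = uri(BASE + source)
    s_sample = uri(BASE + sample)
    return [
        f"{s_source} {uri(VOCAB + 'organism')} {uri(TAXON)} .",
        f"{s_sample} {uri(VOCAB + 'derivesFrom')} {s_source} .",
        f"{s_sample} {uri(VOCAB + 'hasDataFile')} {literal(data_file)} .",
    ]


# Values taken from the first example row in the slides.
for line in sample_triples("H1", "H1.sample1", "h1-s1.cel"):
    print(line)
```

The organism is a URI rather than the string "H. Sapiens", so a software agent can resolve and compare it, which is the point of relying on resolvable identifiers.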
  36. ISA2RDF: work in progress. jeliazkova.nina [toxbank project]
  37. ISA2OWL
      • OWLAPI
      • ISA parser (in-memory BII object store objects)
      • Mapping ISA syntax into a target ontological space
      • Decoupling the mapping from the conversion engine
      • Avoid being tied to a single semantic framework
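
Decoupling the mapping from the conversion engine can be sketched as below. This is an illustration of the design idea only, not ISA2OWL code (which the slide says builds on the OWLAPI). The names `MAPPING_A`, `MAPPING_B` and `convert`, and the target terms such as `ex:material_entity`, are hypothetical placeholders rather than verified ontology identifiers.

```python
from typing import Dict, List, Tuple

# Mapping tables are plain data, kept apart from the engine below.
# The target terms are placeholders, not verified ontology identifiers.
MAPPING_A: Dict[str, str] = {
    "Source Name": "ex:material_entity",
    "Sample Name": "ex:material_entity",
    "Protocol REF": "ex:planned_process",
}
MAPPING_B: Dict[str, str] = {
    "Source Name": "other:Object",
    "Protocol REF": "other:Procedure",
}


def convert(records: List[Tuple[str, str]],
            mapping: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Type each (column, value) record using whichever mapping is supplied."""
    triples = []
    for column, value in records:
        target = mapping.get(column)
        if target is None:
            continue  # gap in coverage: this mapping has no entry for the column
        triples.append((value, "rdf:type", target))
    return triples


records = [("Source Name", "H1"), ("Sample Name", "H1.sample1"), ("Protocol REF", "Labeling")]
print(convert(records, MAPPING_A))
print(convert(records, MAPPING_B))
```

Swapping `MAPPING_A` for `MAPPING_B` changes the target framework without touching `convert`, and a column the mapping does not cover is simply skipped.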
  39. ISA2OWL: mapping in the BFO space as a starting point
  40. ISA2OWL: mapping issues
      • Stability over time
      • Keeping track of resource versions
      • Gaps in coverage
      • Use of local extensions
      • Direct requests/contributions
  41. ISA2OWL: development
      • Include graph metadata (graph provenance to aid indexing)
      • Extend semantic validation of ISA archives
      • Augment annotation by suggesting additions
      • Facilitate curation work
      • Create new mappings to other frameworks (OPML model, SIO)
  42. Publication: ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Philippe Rocca-Serra; Marco Brandizi; Eamonn Maguire; Nataliya Sklyar; Chris Taylor; Kimberly Begley; Dawn Field; Stephen Harris; Winston Hide; Oliver Hofmann; Steffen Neumann; Peter Sterk; Weida Tong; Susanna-Assunta Sansone. Bioinformatics 2010 26: 2354-2356
  43. Acknowledgements
      Groups and individuals participating in:
      • MIBBI: http://mibbi.org
      • ISA-Tab format: http://isatab.sf.net
      • OBO Foundry: http://obofoundry.org
      • OBI: http://obi-ontology.org/page/Main_Page
      ISA Infrastructure Team: Alejandra Gonzalez-Beltran (Oxford); Eamonn Maguire (Oxford); Philippe Rocca-Serra (Oxford)
      Collaborators at: Cambridge University; EuNuGO; Harvard School for Public Health; FDA's NCTR; Leibniz Plant Institute; NERC's NEBC; SIDR, INIST; Metabolights, EMBL-EBI
      Funders: EU Carcinogenomics Project; UK BBSRC
  44. Groups and individuals participating in:
      • Winston Hide, Oliver Hofmann, Shannan Ho Sui, Brad Chapman: HSPH
      • Christoph Steinbeck, Kenneth Haug, Paula de Matos: Metabolights
      • Magali Roux, Florian Mazur, Alain Zasadzinki, Marie Christine Jacquemot: INIST
      • Nina Jeliazkova: ToxBank
      And many more who have to forgive us!
  45. Questions