Eagle Bioinformatics Symposium: 4. Philippe Rocca-Serra: Don't Forget the Small Data: Experimental Metadata Tracking Using the ISA Infrastructure

Reporting experimental plans should not be a second thought. Instrument output without relevant and accurate descriptors is of little benefit to the community. ISA infrastructure is a suite of tools geared towards facilitating good dataset stewardship by providing the necessary means for data managers to annotate, curate, report and ultimately publish their scientific results. In this presentation, we will highlight key features, ongoing development and collaborations to demonstrate the value and flexibility of the resources.

  1. Don’t forget the “little data”: Context and Provenance are essential
     Philippe Rocca-Serra, Ph.D., University of Oxford e-Research Centre, UK
     philippe.rocca-serra@oerc.ox.ac.uk
     Eagle’s 4th Symposium, Babraham Research Campus, Cambridge, UK, 27 March 2014
  2. Provenance
  8. Main theme: Provenance
     It is all about structuring experimental information to make it available to computers and software agents, to enable traceability, which relates to the notions of:
     • planning, assessment and evaluation
     • accountability, reliability, trust, evidence
     • conservation, preservation, storage, archiving and mining
     But let’s proceed gradually…
     • Notes in lab books (information for humans)
     • Spreadsheets and tables (the compromise)
     • Facts as RDF statements (information for machines)
  12. Contextual Data & Experimental Metadata?
      • “Data about the data”
        – description of the data (descriptive metadata)
      • How much metadata is needed?
        – CNL_MOA1_C2_LD_TP1_EWR.fastq.gz
        – the “it is all in the file name” approach
      • Is this enough to understand what this experiment is about... 5 years from now?
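To make the limits of the “it is all in the file name” approach concrete, here is a minimal Python sketch. It only splits the example file name from the slide into tokens; the point is that nothing in the name records what each token means, so the mapping from token to meaning stays empty unless separate metadata supplies it. The variable names are illustrative assumptions.

```python
# Illustrative sketch: the file name can be split mechanically, but the
# meaning of each token is not recorded anywhere in the name itself.
filename = "CNL_MOA1_C2_LD_TP1_EWR.fastq.gz"

# Drop the extensions, then split on the underscore separator.
stem = filename.split(".", 1)[0]
tokens = stem.split("_")

# Without accompanying metadata, every token is an undocumented code.
token_meanings = {token: None for token in tokens}

for token, meaning in token_meanings.items():
    print(f"{token}: {meaning or 'meaning not recorded'}")
```

Five years on, a reader of this file name has the six tokens and nothing else, which is the gap that explicit experimental metadata is meant to fill.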
  14. ISA commons
      A growing ecosystem of over 30 public and internal resources (among them Stem Cell Commons, the Nanotechnology Informatics Working Group, Novartis and Janssen) using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.
      Users and publications: http://isacommons.org
  15. ISA users
      • Carcinogenomics Project (EU-FP6 IP)
      • Dixa project (EU-FP7 IP)
      • ToxBank Project (FP7-HEALTH-2010-Alternative-Testing-Strategies; TAB format + ISA2RDF tool)
      • Metabolights Repository (EMBL-EBI)
      • ISA-TAB nano for nanoparticle characterisation (NCI caNano)
      • Long-standing relationship with NCTR FDA (Little Rock)
      • Scientific Data (NPG)
  16. Why ISA format and tools?
      Structure: investigation, study, assay(s), data (external files in native or other formats; pointers to data file names/locations)
      • investigation: high-level concept to link related studies
      • study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays
      • assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
      Example from the slide figure:
      • sources: H1 (H. sapiens, 35 years), H2 (H. sapiens, 33 years)
      • samples: H1.sample1, H1.sample2, H2.sample1
      • labeling: H1.sample1.labeled, H2.sample1.labeled
      • data files: h1-s1.cel, h1-s2.cel, h2-s1.cel
      ISA metadata specifications:
      • workflow and process oriented
      • compatible with checklist enforcement
      • compatible with external vocabulary resources
      • compatible by design with existing schemas (MAGE-Tab, Pride-xml, SRA-xml)
      Currently finalizing conversion to RDF to explore the growing Linked Data universe, in collaboration with the W3C HCLSIG and the ToxBank Consortium.
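The investigation / study / assay hierarchy described on this slide can be sketched as nested Python data classes. This is only an illustration of the concept: the class names, fields and the identifiers `study-1` and `assay-1` are assumptions made for the example and are not the API of any ISA tool. The sample and file names are the ones shown in the slide figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assay:
    """A test performed on material from the subject; it points to data files."""
    name: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """The central unit: subjects, the samples taken from them, and assays."""
    identifier: str
    # Maps each sample name to the source (subject) it was taken from.
    samples: Dict[str, str] = field(default_factory=dict)
    assays: List[Assay] = field(default_factory=list)

    def samples_from(self, source: str) -> List[str]:
        """Return the samples derived from one source, in insertion order."""
        return [name for name, src in self.samples.items() if src == source]


@dataclass
class Investigation:
    """High-level concept that links related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)


# Hypothetical identifiers; sample and file names come from the slide figure.
study = Study(
    identifier="study-1",
    samples={"H1.sample1": "H1", "H1.sample2": "H1", "H2.sample1": "H2"},
    assays=[Assay("assay-1", ["h1-s1.cel", "h1-s2.cel", "h2-s1.cel"])],
)
investigation = Investigation(title="example investigation", studies=[study])

print(study.samples_from("H1"))
```

Note that the data files are referenced by name only, mirroring the slide's point that ISA holds pointers to external files rather than the data itself.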
  21. Essentials about ISA syntax
      • 3 types of files
      • Investigation file: at most 1 (think executive summary)
        – Why? general study description
        – How? methods / protocol declaration
        – How? variable declarations (factors and response variables)
        – Who? contact and affiliation information
      • Study file: true table (think sorting, filtering)
        – What? Lists all biological materials collected over the course of the study
      • Assay file: true table (think sorting, filtering)
        – Results! Lists all data files collected by a given assay
        – n files, as many as there are assay types declared
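Because study and assay files are true tables, they can be sorted and filtered with ordinary tabular tooling. The sketch below assumes a small tab-delimited study table held in memory; the column headers follow the ISA-Tab naming style ("Source Name", "Characteristics[...]", "Sample Name"), while the rows are made up for illustration from the H1/H2 example used earlier in the talk.

```python
import csv
import io

# Illustrative study table (tab-delimited); the rows are example data only.
study_table = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[age]\tSample Name\n"
    "H1\tH. sapiens\t35\tH1.sample1\n"
    "H1\tH. sapiens\t35\tH1.sample2\n"
    "H2\tH. sapiens\t33\tH2.sample1\n"
)

# Each row becomes a dict keyed by the column header.
rows = list(csv.DictReader(io.StringIO(study_table), delimiter="\t"))

# Filtering: all samples taken from source H1.
h1_samples = [row["Sample Name"] for row in rows if row["Source Name"] == "H1"]

# Sorting: order the rows by the age characteristic (sorted() is stable).
rows_by_age = sorted(rows, key=lambda row: int(row["Characteristics[age]"]))

print(h1_samples)
print([row["Sample Name"] for row in rows_by_age])
```

The investigation file is not a table of this kind, so it would need a different reader; this sketch covers only the "true table" files.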
  24. Features of ISA model
      • generic constructs to describe inputs and outputs for processes (material processing or data processing)
        – overall, a description of the experimental workflow
      • extensible:
        – allows support of new assays while reusing existing components
        – need for more semantic support for assay descriptions
          • resources such as OBI, BAO and SIO to define endpoints and techniques
          • gaps in semantics remain and need to be tackled
  25. ISA configurations
      Available from: https://github.com/ISA-tools/Configuration-Files
      • Assembling workflow archetypes
      • Setting annotation requirements
        – for compliance with database schemas (SRA, MAGE, PRIDE)
        – for compliance with community-based requirements (MIAME, MIAPE, MIMS...)
      • Guide users
        – provide preassembled templates
        – specify vocabulary support
      ISAconfigurator: supporting tool
      https://github.com/ISA-tools/ISAconfigurator
      http://isatab.sourceforge.net/assets/img/tools/tools-table-images/configurator.png
  26. ISA configurations
      Rely on Biosharing to survey the landscape of community requirements
      ISAconfigurator: supporting tool
      https://github.com/ISA-tools/ISAconfigurator
  28. ISAconfigurator tables
  30. Tools for creating ISA-Tab documents: ISAcreator
  31. ISAcreator
      Developed to be a user-friendly way to enter standards-compliant metadata, it has lots of features... But these are just some of them... we also have a data entry wizard and an import utility.
  32. Select and annotate in ISAcreator
  33. ISAcreator Wizard: automatic template generation
      Prerequisites and conditions of use:
      – supports factorial design experiments, meaning sets of discrete factor levels combined together to define a treatment
        • 2x2 factorial design, as in 2 compounds and 2 time points
        • 2x2x3 factorial design, as in 2 compounds, 2 time points and 3 doses
      – assumes one sample collection event (all samples collected at sacrifice time)
      – supports some but not all currently available assay types
      – supports fractional factorial designs
      – supports unbalanced factor group population sizes (ethical considerations for high-dose toxic exposures)
      – automatically generates sample identifiers, human-readable and meaningful labels and, if requested, barcodes
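As a rough illustration of what generating a template from a factorial design involves, the Python sketch below enumerates the treatment groups of a 2x2x3 design and derives sample identifiers for each group, with a smaller group size at the highest dose to mimic an unbalanced design. The factor levels, group sizes and identifier pattern are all assumptions made for this example; they are not the wizard's actual naming scheme.

```python
from itertools import product

# Hypothetical factor levels for a 2x2x3 design.
factors = {
    "compound": ["compoundA", "compoundB"],
    "time_point": ["4h", "24h"],
    "dose": ["low", "medium", "high"],
}

# A treatment is one combination of levels, one level per factor.
treatments = [
    dict(zip(factors, levels)) for levels in product(*factors.values())
]


def group_size(treatment):
    """Unbalanced design: fewer subjects in high-dose groups (assumed sizes)."""
    return 2 if treatment["dose"] == "high" else 3


# Build human-readable sample identifiers from the factor levels.
sample_ids = [
    f"{t['compound']}.{t['time_point']}.{t['dose']}.s{n}"
    for t in treatments
    for n in range(1, group_size(t) + 1)
]

print(len(treatments), "treatment groups,", len(sample_ids), "samples")
print(sample_ids[0])
```

A fractional factorial design would simply keep a subset of `treatments` before the identifiers are generated.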
  34. ISAcreator features: automatic template generation
  35. ISAcreator features: mapping to third-party tables (ETL function)
  37. Extending ISAcreator: the plugin architecture
  38. How do ISA tools access ontology servers?
  39. Plugins in ISAcreator
      In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.
      • Plugins can be developed for 3 different purposes:
        – Search (adds extra search space for the ontology tool)
        – Custom cell editors (for the spreadsheet)
        – Extra general functionality (which appears in a plugin menu)
      • 2 examples of ISA plugins:
        – Access to local metadata stores: Novartis plugin to the ontology widget
        – Annotation of findings: Metabolite Identification plugin (Metabolights Repository contribution to the ISA project)
  40. Plugins, example 1: Novartis Metastore Search
      Search function on the Novartis Metastore: integrates search results from the metastore into the ontology search tool. With the Novartis plugin in your plugin directory, you’ll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved with recording the term source, etc.
  41. Plugins, example 2: Metabolite Identification plugin
      Credits: Kenneth Haug, Metabolights
  42. ISAcreator features: visualizing experimental workflows
      Work completed during investigation of a new approach for creating glyphs, using a taxonomy for guidance. See Maguire et al., “Taxonomy-Based Glyph Design – with a Case Study on Visualizing Workflows of Biological Experiments”, IEEE Transactions on Visualization and Computer Graphics, 2012.
  43. Making the most of the experimental plan
      • Working prospectively: programmatic creation of ISA tables
      • ISAWizard to quickly create ISA tables
        – a component of ISAcreator
        – to be expanded to accommodate more advanced study designs
      • Use the ISAcreator API to manipulate / create ISA tables
        – more information on GitHub:
          • https://github.com/ISA-tools/ISAcreator/wiki/API
  44. Communication with instrumentation
      • Survey existing software APIs
        – understand input and output
        – are there XML messages that can be harnessed?
        – is it possible to have an instrument read an ISA table?
        – is it possible to have an instrument write to an ISA table?
      • Lemnatec instruments
        – include barcode/QR code reader
          • harness ISAtools’ ability to create barcodes/QR codes
        – devise workflows
          • identify key nodes (objects)
          • identify key data types
          • agreement on patterns
  46. This bit of code indicates that you need to invoke the ISA configuration, which defines the expected table layout, in order to proceed
      • https://github.com/ISA-tools/ISAcreator/wiki/API
  47. OntoMaton: searching and tagging
  50. Risa (R package)
      • R package available since BioConductor 2.11: http://www.bioconductor.org/packages/release/bioc/html/Risa.html
      • Functionality for parsing ISA-Tab datasets into R objects, saving and updating them
      • It bridges the ISA-Tab metadata to analysis pipelines of specific assay types, by building objects for use in other R packages downstream
        – currently considering mass spectrometry (xcms package, xcmsSet) and DNA microarray (Biobase package, ExpressionSet)
      Figure: numbered workflow including experiment design (Arabidopsis thaliana; treatment groups 70%, 90%, 100%), sample collection (SAMPLE 1 to SAMPLE 11), assay runs (FILE 1, FILE 2, ...) and analysis
  52. Ongoing work
  54. Scientific Data (www.nature.com/scientificdata)
      • New open-access, online-only publication for descriptions of scientifically valuable datasets
      • Only content type: Data Descriptor, with narrative and structured parts
      • Initially focused on the life, environmental and biomedical sciences
      • Data Descriptors will be complementary to traditional research journals and data repositories
      • Designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery
  55. Data Descriptors served by Scientific Data (www.nature.com/scientificdata)
      Narrative section: a brief article-like document with:
      • Title
      • Abstract
      • Background & Summary
      • Methods
      • Technical Validation
      • Usage Notes
      • Figures & Tables
      • References
      Structured section: detailed descriptions of the experimental procedures used to produce the data
      • Following community-defined minimum information requirements
        – for a level of detail sufficient to reproduce the experiments
      • Using ontologies and controlled vocabularies
        – to maximise consistency of the descriptions
  59. ISA2OWL: mapping in the OBO Foundry space and SIO
      • Make ISA semantics explicit and serialize the ISA representation as Linked Data
      • Maximize annotation markups and ontology terms
      • Augment ISA semantics with new constructs (study groups and their size), allowing further exploration
      • Semantic validation
      • Make the semantics of ISA-Tab explicit by converting ISA-Tab files into Linked Data (using web standards to connect related data)
      • Triples of <subject, predicate, object> with identifiable entities
        – e.g. <lipoprotein> <participates_in> <inflammatory response>
               <PRO:212342352> <BFO_0000056> <GO:0006954>
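The triple example on this slide can be sketched in plain Python to show how identifiers and human-readable labels relate. This is a minimal illustration using tuples and a lookup table, not the ISA2OWL implementation and not a real RDF library; the identifiers and labels are copied from the slide, and the helper names `readable` and `objects_of` are made up for the example.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Identifier-to-label mapping, taken from the example on the slide.
labels = {
    "PRO:212342352": "lipoprotein",
    "BFO_0000056": "participates_in",
    "GO:0006954": "inflammatory response",
}

# The machine-facing statement uses identifiers only.
triples = [Triple("PRO:212342352", "BFO_0000056", "GO:0006954")]


def readable(triple: Triple) -> str:
    """Render a triple with labels where known, identifiers otherwise."""
    return " ".join(f"<{labels.get(term, term)}>" for term in triple)


def objects_of(subject: str, predicate: str) -> List[str]:
    """Find every object linked to a subject through a given predicate."""
    return [
        t.obj for t in triples
        if t.subject == subject and t.predicate == predicate
    ]


print(readable(triples[0]))
print(objects_of("PRO:212342352", "BFO_0000056"))
```

Because every term is an identifier rather than free text, statements produced from different ISA-Tab files can be matched on identical subjects or objects, which is what makes the data "linked".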
  60. New graph-based web application
  67. Questions?
      • Email us: isatools@googlegroups.com
      • View our blog: http://isatools.wordpress.com
      • Follow us on Twitter: @isatools
      • View our website: http://www.isa-tools.org
      • View our Git repo and contribute: http://github.com/ISA-tools
      Thanks for listening...