4A2B2C-2013

  1. The ISA infrastructure: supporting bio-scientists from experimental design to data publication. Alejandra González-Beltrán, Ph.D., University of Oxford e-Research Centre, UK (alejandra.gonzalezbeltran@oerc.ox.ac.uk). 4to. Congreso Argentino de Bioinformática y Biología Computacional (4CAB2C) & 4ta. Conferencia Internacional de la Sociedad Iberoamericana de Bioinformática (SolBio), 29-31 October 2013, Rosario, Argentina.
  2. http://www.nature.com/news/2011/110111/full/469139a.html
  3. Adds http://www.economist.com/node/215285937
  4. Adds http://www.nytimes.com/2011/07/08/health/research/08genes.html
  5. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-155 (2009). doi:10.1038/ng.295
  7. Experimental workflow (diagram). Stages: Planning; Data Collection (use existing data or perform a new experiment); Data Management; Analysis; Visualization; Publication. The Data Scientist sits at the centre.
  8. Experimental workflow: data + metadata flow between the stages.
  9. Experimental workflow: data + metadata underpin Science Reproducibility.
  13. Experimental workflow: metadata captured retrospectively.
  15. Experimental workflow: metadata captured prospectively, at every stage from Planning onwards.
  16. Experimental workflow: prospective metadata capture supported by a metadata tracking infrastructure.
  18. Two experimental workflows side by side, linked by Data Reusability.
  19. Linked experimental workflows (diagram), annotated with Science Reproducibility, Data Reusability, Provenance, Evidence, Accountability, Assessment, Retrieval and Mining.
  20. Motivation: heterogeneous experimental data (Data Collection).
  21. Motivation: formats and database fragmentation (Publication).
  22. Roadmap: importance of data + metadata availability; the experimental workflow; multi-omic experiments, heterogeneous data and formats; the Investigation/Study/Assay (ISA) infrastructure; the experimental workflow revisited.
  24. Different communities (figure).
  25. Different communities. Goals: allow data to flow from one system to another; use the same term to refer to the same 'thing'; report the same core, essential information.
  26. Different communities. Challenges: lack of interaction and coordination, duplication of effort, fragmentation and uneven coverage; together these hamper interoperability.
  27. Planning and Data Collection.
  31. The ISA infrastructure: a generic format for experimental description and data exchange; community engagement; open-source software tools.
  33. Structure and semantics.
  35. Experimental workflow, graph representation (figure): source H1 (H. sapiens, 35 years) yields samples H1.sample1 and H1.sample2, and source H2 (H. sapiens, 33 years) yields H2.sample1; Labeling produces H1.sample1.labeled and H2.sample1.labeled; Scanning produces the data files h1-s1.cel, h1-s2.cel and h2-s1.cel (some intermediate nodes are shown only as "...").
  36. Experimental workflow, graph representation: the same graph laid out as spreadsheets for end-users, one row per sample (e.g. H1 | H. sapiens | 35 years | H1.sample1 | Labeling | H1.sample1.labeled | Scanning | h1-s1.cel), giving a vocabulary for the description of the experimental workflow.
  37. Experimental workflow, graph representation: the spreadsheet layout provides a vocabulary for the description of the experimental workflow and syntactic interoperability across biological experiments of different types.
  38. ISA-Tab assay table (microarray example):

      Sample Name | Material Type | Hybridization Assay Name | Array Design REF | Array Data File | Protocol REF       | Derived Array Data File
      sample1     | genomic DNA   | assay1                   | A-AFFY-107       | assay1.cel      | data normalization | assay1.txt
      sample2     | genomic DNA   | assay2                   | A-AFFY-107       | assay2.cel      | data normalization | assay2.txt
      sample3     | genomic DNA   | assay3                   | A-AFFY-107       | assay3.cel      | data normalization | assay3.txt

      Material transformations link three kinds of node. Material nodes carry Material Type, Characteristics[…], Factor Value[…] (independent variables) and Comment[…]. Protocol (process) nodes carry Parameter Value[…], Performer (operator effect) and Date (day effect). Data file nodes hold the Raw Data File and the Derived Data File.
  39. A growing ecosystem of over 30 public and internal resources uses the ISA metadata tracking framework (ISA-Tab and/or … format) to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including environmental health, environmental genomics, metabolomics, metagenomics, nanotechnology, proteomics, stem cell discovery, systems biology, transcriptomics and toxicogenomics. It is also used by communities working to build a library of cellular signatures.
  40. ISA implementation at Harvard: http://discovery.hsci.harvard.edu/
  41. Implementation at the European Bioinformatics Institute: http://www.ebi.ac.uk/metabolights
  43. Create template(s) to fit the type of experiments to be described! Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. by configuring the value(s) allowed for each field to be:
      • text (with or without regular-expression testing),
      • ontology terms,
      • numbers, etc.
      We now have GSC-compliant configurations for submission to ENA.
  44. Or describe and curate your experiment using a desktop-based tool! Report and edit the description using this tool (also customized using the templates), with a spreadsheet-like look and feel, packed with functionality such as:
      • ontology search (access via …),
      • term-tagging features,
      • import from spreadsheets, etc.
  45. Data Collection: developed to be a user-friendly way to enter standards-compliant metadata, with many features.
  46. Planning: Design Wizard.
  47. Data Collection. OntoMaton: a BioPortal-powered ontology widget for Google Spreadsheets (Maguire et al., 2013, Bioinformatics):
      • ontology search and automated tagging in Google Spreadsheets (relying on NCBO BioPortal services),
      • collaborative annotation, with support for distributed users,
      • version control and history.
  48. Data Management.
  50. Data Management: shifting towards a new system.
  53. Analysis: the interesting bit, doing something with our data and metadata.
      • Analysis of ISA-Tab data in the R language: brings together the context and the data to enable more meaningful analysis, and suggests packages to use for analysis based on the data types in the ISA-Tab file.
      • Analysis of ISA-Tab data in the Galaxy environment: creates Galaxy Library objects from ISA-Tab files.
      • Analysis of ISA-Tab data in the GenomeSpace environment: load and edit files stored on distributed servers.
      Created by Brad Chapman at the Harvard School of Public Health.
  54. Risa (figure: 1 Experiment Design, 2 Collect Samples, 4 Run Assays, 5 Analysis; Arabidopsis thaliana samples 1 to 11 in 6 treatment groups, each linked to a data file).
      • Parses ISA-Tab datasets into R objects, so that they can be updated and saved again after analysis.
      • Bridges the ISA-Tab metadata to analysis pipelines for specific assay types by building objects for use in other R packages downstream; currently mass spectrometry (xcms package, xcmsSet) and DNA microarray (Biobase package, ExpressionSet).
      • Suggests Bioconductor packages that might be relevant for an assay type, according to their biocViews annotations.
      González-Beltrán et al., The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again. In press.
  55. Publication: data submission.
  58. Publication: getting your work out there. Share, link and reason over experiments with linked data; publish alongside your research articles and in specialised community repositories.
  59. http://www.gigasciencejournal.com/content/1/1/3#B19; http://gigasciencejournal.com
  60. Adds http://gigadb.org/dataset/100035
  61. Scientific Data (www.nature.com/scientificdata):
      • a new open-access, online-only publication for descriptions of scientifically valuable datasets;
      • only content type: the Data Descriptor, with narrative and structured parts;
      • initially focused on the life, environmental and biomedical sciences;
      • Data Descriptors will be complementary to traditional research journals and data repositories;
      • designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery.
  62. Data Descriptors served by Scientific Data (www.nature.com/scientificdata):
      • Narrative section: a brief article-like document with Title, Abstract, Background & Summary, Methods, Technical Validation, Usage Notes, Figures & Tables and References.
      • Structured section: detailed descriptions of the experimental procedures used to produce the data, following community-defined minimum information requirements, at a level of detail sufficient to reproduce the experiments, and using ontologies and controlled vocabularies to maximise consistency of the descriptions.
  64. Workflow (diagram): Planning, Data Collection, Data Management, Analysis, Visualization and Publication around the Data Scientist, supporting Science Reproducibility.
  65. Core ISA team; funders.
  66. Thanks for your attention! Questions?
      • Email us: isatools@googlegroups.com
      • Website: http://www.isa-tools.org
      • Git repo (contributions welcome): http://github.com/ISA-tools
      • Blog: http://isatools.wordpress.com
      • Twitter: @isatools
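
Slides 35 to 37 lay the experimental workflow graph out as spreadsheet rows for end-users. The following is a minimal Python sketch of that flattening, not ISA-tools code: it assumes a hypothetical dict-based encoding of the slide-35 graph (`SOURCES`, `GRAPH`, `paths` and `to_rows` are illustrative names) and emits one row per root-to-leaf path, with the protocol names (Labeling, Scanning) placed between the nodes they connect.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding, not an ISA-tools data structure.
# Edge = (target node, protocol that produces it; None when no protocol is shown).
Edge = Tuple[str, Optional[str]]

# Source characteristics taken from the slide-35 figure.
SOURCES: Dict[str, Tuple[str, str]] = {
    "H1": ("H. sapiens", "35 years"),
    "H2": ("H. sapiens", "33 years"),
}

GRAPH: Dict[str, List[Edge]] = {
    "H1": [("H1.sample1", None), ("H1.sample2", None)],
    "H2": [("H2.sample1", None)],
    "H1.sample1": [("H1.sample1.labeled", "Labeling")],
    "H1.sample1.labeled": [("h1-s1.cel", "Scanning")],
    "H2.sample1": [("H2.sample1.labeled", "Labeling")],
    "H2.sample1.labeled": [("h2-s1.cel", "Scanning")],
    # The nodes downstream of H1.sample2 are elided ("...") in the figure,
    # so this sketch simply stops at the sample.
}


def paths(node: str) -> List[List[str]]:
    """Return every root-to-leaf path below `node` as a list of cells:
    node, protocol, node, ... (protocol cells only where one exists)."""
    children = GRAPH.get(node, [])
    if not children:
        return [[node]]
    result: List[List[str]] = []
    for child, protocol in children:
        prefix = [node] + ([protocol] if protocol else [])
        for tail in paths(child):
            result.append(prefix + tail)
    return result


def to_rows() -> List[List[str]]:
    """Flatten the graph into one spreadsheet row per path, putting the
    source's characteristics right after the source name."""
    rows: List[List[str]] = []
    for source, (organism, age) in SOURCES.items():
        for path in paths(source):
            rows.append([path[0], organism, age] + path[1:])
    return rows


if __name__ == "__main__":
    for row in to_rows():
        print("\t".join(row))
```

Keeping each protocol in its own cell is what lets one fixed column header (source, characteristics, sample, protocol, material, protocol, data file) describe every complete row.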

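
The slide 38 assay table is plain tab-separated text, so reading it needs nothing beyond Python's standard `csv` module. A minimal sketch, assuming the three example rows from that slide; `read_assay_table` and `data_files_by_sample` are hypothetical helpers, not part of any ISA tool:

```python
import csv
import io
from typing import Dict, List

# Example rows adapted from the slide 38 assay table (tab-separated).
# Real ISA-Tab assay files usually carry many more columns.
ASSAY_TSV = (
    "Sample Name\tMaterial Type\tHybridization Assay Name\tArray Design REF\t"
    "Array Data File\tProtocol REF\tDerived Array Data File\n"
    "sample1\tgenomic DNA\tassay1\tA-AFFY-107\tassay1.cel\tdata normalization\tassay1.txt\n"
    "sample2\tgenomic DNA\tassay2\tA-AFFY-107\tassay2.cel\tdata normalization\tassay2.txt\n"
    "sample3\tgenomic DNA\tassay3\tA-AFFY-107\tassay3.cel\tdata normalization\tassay3.txt\n"
)


def read_assay_table(text: str) -> List[Dict[str, str]]:
    """Parse a tab-delimited assay table into one dict per row, keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def data_files_by_sample(rows: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map each sample to its raw and derived data files (hypothetical helper)."""
    return {
        row["Sample Name"]: {
            "raw": row["Array Data File"],
            "derived": row["Derived Array Data File"],
        }
        for row in rows
    }


if __name__ == "__main__":
    files = data_files_by_sample(read_assay_table(ASSAY_TSV))
    for sample, paths in files.items():
        print(sample, paths["raw"], paths["derived"])
```

One caveat on the design: `csv.DictReader` keeps only the last of any repeated header, and real ISA-Tab assay files often repeat headers such as `Protocol REF`, so a fuller parser would work with column positions rather than a dict.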
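
Slide 43 describes templates that restrict each field to text (optionally checked against a regular expression), ontology terms or numbers. The sketch below illustrates that idea in Python under loudly stated assumptions: `FieldRule`, `TEMPLATE` and `validate_row` are hypothetical, this is not the ISA configuration format, and the ontology check is a fixed set of allowed terms standing in for a real ontology lookup (for example through BioPortal).

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical rule for one template field (not the real ISA configuration format)."""
    kind: str                                    # "text", "ontology" or "number"
    pattern: Optional[str] = None                # optional regex for "text" fields
    allowed_terms: FrozenSet[str] = frozenset()  # stand-in for an ontology lookup


def is_valid(rule: FieldRule, value: str) -> bool:
    """Check one value against its rule."""
    if rule.kind == "text":
        # Without a pattern any text is accepted; with one, the whole value must match.
        return rule.pattern is None or re.fullmatch(rule.pattern, value) is not None
    if rule.kind == "ontology":
        return value in rule.allowed_terms
    if rule.kind == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown field kind: {rule.kind}")


# Hypothetical template for a microarray assay; field names and terms are illustrative.
TEMPLATE: Dict[str, FieldRule] = {
    "Sample Name": FieldRule("text", pattern=r"sample\d+"),
    "Material Type": FieldRule("ontology", allowed_terms=frozenset({"genomic DNA", "total RNA"})),
    "Age": FieldRule("number"),
}


def validate_row(row: Dict[str, str]) -> List[str]:
    """Return the names of fields whose values break their template rule."""
    return [name for name, rule in TEMPLATE.items()
            if name in row and not is_valid(rule, row[name])]


if __name__ == "__main__":
    print(validate_row({"Sample Name": "S-1", "Material Type": "DNA", "Age": "35"}))
```

Keeping the rules as data, separate from the checking code, mirrors the slide's point that one tool can be customised for different investigation types just by swapping templates.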