Major challenge: define ‘relations’ among standards                                                                       ...
Example of multi-assays study – how many ‘standards’ are applicable to this?
An exemplar approach to the status quo §  A grass-root collaborative that works to facilitate collection, curation and s...
metadata tracking framework; user community
General-purpose, configurable format, designed to support the use of several standards checklists, terminologies and conversi...
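As a rough illustration of what a general-purpose, tab-delimited description of samples can look like, the sketch below writes one row per sample with Python's standard `csv` module. The column headers are modelled on ISA-Tab-style names, but the sample values and the helper `to_tab_delimited` are hypothetical, and the output is not a complete or validated ISA-Tab file.

```python
import csv
import io

# Illustrative column headers in an ISA-Tab-like style (assumption: a real
# ISA-Tab study file has more columns and its own validation rules).
COLUMNS = [
    "Source Name",
    "Characteristics[strain]",
    "Factor Value[diet]",
    "Sample Name",
]

# Hypothetical sample rows, loosely based on the mouse diet example in the slides.
rows = [
    {
        "Source Name": "mouse-1",
        "Characteristics[strain]": "C57BL/6N",
        "Factor Value[diet]": "low-fat diet",
        "Sample Name": "mouse-1-liver",
    },
]


def to_tab_delimited(rows, columns=COLUMNS):
    """Serialize sample descriptions as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_tab_delimited(rows), end="")
```

Keeping the header names in one list makes it easy to swap in a different template for a different type of experiment.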
ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level (Roc...
Create template(s) to fit the type of experiments to be described: create templates detailing the steps to be repo...
Describe, curate your experiment with geographically-distributed collaborators: report and edit the descri...
Or describe, curate your experiment using a desktop-based tool: report and edit the description using this tool, ...
ISMB tag:                                                                                                                 ...
Perform data analysis	    	    We are building relevant ISA modules for GenomeSpace, 	    R-based BioConductor and Galaxy ...
Share your experiments with the world as Linked Open Data: through conversion to RDF; work in collaboration ...
Submit your experiments to public repositories: directly in ISA-Tab or reformatting using the ISAconverter
Create your own repository: store the investigations in the database, assign access rights and conduct maintenance tasks. S...
Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M. Taxonomy-based Glyph Design -- with a Case Study on Visualizing W...
A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework (ISA-Tab and/or forma...
Implementations at Harvard: data sharing in ISA-Tab; importance of a local community
Implementation at the EBI
Data papers
Extensions of the Nanotechnology Informatics Working Group
Open source code. Community involvement and uptake! 1st ISA-Tab workshop! 3rd ISA-Tab workshop! User workshops/visits - ...
Final remarks        “The buzz around reproducible bioscience data:       the policies, the communities and the standards”...
Your research and all (publicly                                funded) research should make                               ...
…..the biggest possible impact!      http://www.flickr.com/photos/webhamster/2582189977/                                  ...
http://www.flickr.com/photos/andrevanbortel/3745527869/sizes/m/in/photostream/
We must increase the level of annotation   Notes in Lab Books       Spreadsheets and Tables   Facts as RDF statements   (i...
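The step from spreadsheet rows to facts as RDF statements can be sketched as follows. This is a minimal illustration only: the `http://example.org/` namespace, the function name `row_to_triples` and the sample row are assumptions, column names are assumed to be simple identifiers that are safe inside an IRI, and every value is emitted as a plain string literal in an N-Triples-like form.

```python
# Hypothetical namespace used only for this illustration.
BASE = "http://example.org/"


def row_to_triples(sample_id, row):
    """Turn one spreadsheet row (column -> value) into subject-predicate-object lines."""
    subject = f"<{BASE}sample/{sample_id}>"
    triples = []
    for column, value in row.items():
        predicate = f"<{BASE}property/{column}>"
        # Escape backslashes and double quotes so the literal stays well formed.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} {predicate} "{escaped}" .')
    return triples


if __name__ == "__main__":
    spreadsheet_row = {"strain": "C57BL/6N", "organism_part": "liver"}
    for line in row_to_triples("s1", spreadsheet_row):
        print(line)
```

In practice the predicates and values would point at terms from community ontologies rather than at an ad hoc namespace, which is what makes the statements linkable across datasets.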
Transcript of "eScience-School-Oct2012-Campinas-Brazil"

  1. 1. The buzz around reproducible bioscience data: the policies, the communities and the standards. Susanna-Assunta Sansone, PhD, Principal Investigator and Team Leader, University of Oxford e-Research Centre, Oxford, UK. Slides at: http://www.slideshare.net/SusannaSansone SPSAS e-SciBioEnergy: São Paulo School of Advanced Science on e-Science for Bioenergy Research, 22-26 Oct, 2012, Campinas, Brazil
  2. 2. Lab scientist! Data scientist! Team Leader! Consultant!
  3. 3. Oxford e-Research Centre
  4. 4. Oxford e-Research Centre
  5. 5. Oxford e-Research Centre: providing research computing and high-performance computing; integrating with national and international infrastructure; supporting leading-edge facilities through education and training
  6. 6. Oxford e-Research Centre Collaborating with European and wider international groups in, e.g.: •  energy, •  radio astronomy, •  biological data federation, •  life sciences simulation, •  biodiversity, •  computational chemistry, •  neuroscience, •  digital humanities tools, •  digital music analysis Research in •  computation, •  data infrastructure and analysis, •  visualisation
  7. 7. My team’s activities and groups we work with: data management and biocuration; collaborative development of software and databases, standards and ontologies •  environmental genomics •  stem cell discovery •  metabolomics •  systems biology •  metagenomics •  transcriptomics •  nanotechnology •  toxicogenomics •  proteomics •  environmental health (env, agro, tox/pharma, health)
  8. 8. http://www.flickr.com/photos/12308429@N03/4957994485/ CC BY
  9. 9. Outline: “The buzz around reproducible bioscience data: the policies, the communities and the standards”; “The reality from the buzz: how to deliver reproducible bioscience data”
  10. 10. Preserve institutional / corporate memory: harmonize collection across sites; find matching studies; data dissemination; long-term data stewardship
  11. 11. Utilize public data: identify suitable data; retrieve; curate and harmonize; re-analyze
  12. 12. Address reproducibility / reuse of public data
  13. 13. Address reproducibility / reuse of public data
  14. 14. Address reproducibility / reuse of public data. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  15. 15. Address reproducibility / reuse of public data
  16. 16. Address reproducibility / reuse of public data
  17. 17. Address reproducibility / reuse of public data
  18. 18. http://www.flickr.com/photos/notbrucelee/8016189356/ CC BY
  19. 19. COMPREHENSIBLE INTEROPERABLE REPRODUCIBLE REUSABLE http://www.flickr.com/photos/notbrucelee/8016189356/ CC BY
  20. 20. Growing, worldwide movement for reproducible research. Shared, annotated research data and methods offer new discovery opportunities and prevent unnecessary repetition of work. Improved data sharing underpins science of the future. “Publicly-funded research data are a public good, produced in the public interest” “Publicly-funded research data should be openly available to the maximum extent possible” The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone
  21. 21. Growing, worldwide movement for reproducible research. Esoteric formats; lack of sufficient contextual information; ad hoc or proprietary terminology: comprehensible? interoperable? reusable? reproducible? §  Researchers and bioinformaticians in both academic and commercial science, along with funding agencies and publishers, embrace the concept that community-developed standards are pivotal to structure and enrich the annotation of •  entities of interest (e.g., genes, metabolites, phenotypes) and •  experimental steps (e.g., provenance of study materials, technology and measurement types)
  22. 22. Structure and enrich description of the experiments §  Describe and communicate the information in an unambiguous, human and machine readable manner. Example: “Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, RNA prepared…etc.” Annotated elements: age value; unit; strain name; subject of the experiment; type of diet and experimental condition; type of protocol - sample treatment; type of protocol - nucleic acid extraction; anatomy part
  23. 23. Structure and enrich description of the experiments §  Describe and communicate the information in an unambiguous, human and machine readable manner. Figure: credit to OBI consortium
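The mouse example on the slide can be sketched as a small structured record. The field names, the class `SampleAnnotation` and the assignment of each phrase in the sentence to a field are assumptions made for illustration; the slide itself only lists the kinds of elements to annotate.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SampleAnnotation:
    """Structured form of the free-text description (field names are illustrative)."""
    subject: str        # subject of the experiment
    strain: str         # strain name
    age_value: int      # age value
    age_unit: str       # unit
    diet: str           # type of diet and experimental condition
    anatomy_part: str   # anatomy part
    protocols: tuple    # protocol types, in the order they were applied


# "Seven week old C57BL/6N mice were treated with low-fat diet.
#  Liver was dissected out, RNA prepared..."
record = SampleAnnotation(
    subject="mouse",
    strain="C57BL/6N",
    age_value=7,
    age_unit="week",
    diet="low-fat diet",
    anatomy_part="liver",
    protocols=("sample treatment", "nucleic acid extraction"),
)


def to_row(annotation):
    """Flatten the annotation into a dict suitable for a table row."""
    row = asdict(annotation)
    row["protocols"] = "; ".join(row["protocols"])
    return row


if __name__ == "__main__":
    for key, value in to_row(record).items():
        print(f"{key}: {value}")
```

Once each element has its own field, both a person and a program can query the age, strain or tissue without parsing the sentence.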
  24. 24. Reproducible & Reusable Bioscience Research
  25. 25. reasoning, visualization, analysis, browsing, integration, exchange, retrieval; Well-annotated & Structured Data; Reproducible & Reusable Bioscience Research
  26. 26. reasoning, visualization, analysis, browsing, integration, exchange, retrieval; Community Standards; Software Tools; Well-annotated & Structured Data; Reproducible & Reusable Bioscience Research
  27. 27. http://www.flickr.com/photos/lamerentertainment/1581770980/sizes/m/in/photostream/
  28. 28. Today’s bioscience research (publications; experimental and computational data) §  Is interdisciplinary and integrative in character •  need to deal with new and existing datasets •  deal with a variety of data types §  ‘How the organism works’ is the focus •  Twenty years ago data was the center. Source of the figure: EBI website
  29. 29. Source: http://ebbailey.wordpress.com www.ebi.ac.uk/net-project
  30. 30. Example from the toxicogenomics domain: study looking at the effect of a compound inducing liver damage by characterizing/measuring - the metabolic profile by MS and NMR - protein expression in liver by MS - gene expression by DNA microarray - conducting genetic and phenotypic analysis. Information contributing to the construction and validation of systems biology models
  31. 31. Example of experiments by InnoMed PredTox, an FP6 public-private consortium
  32. 32. Structured description of datasets §  Capture all salient features of the experimental workflow §  Make annotation explicit and discoverable §  Structure the descriptions for consistency, tracking independent variables and dependent variables, using cross references and resolvable identifiers
  33. 33. Not too much, not too little, just ‘right’ §  We must strike a balance between •  depth and breadth of information; and •  sufficient information required to reuse the data
  34. 34. Information intensive experiments
  35. 35. Information intensive experiments To make the experiments comprehensible and reusable, underpinning future investigations, we need common ways to report and share the experimental details and the associated data. Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.
  36. 36. Common ways to report and share §  The challenges we face •  Large in volume: lots of data types and metadata! •  Lots of free text descriptions: hard to mine, subject to mistakes! •  Babel of terminologies: lack of definitions, hard to map! •  Heterogeneous file formats: software lock-in! §  Need for reporting standards •  Minimal reporting descriptors - Report the same ‘core essentials’ •  Controlled vocabularies or ontologies - Use the same word and mean the same thing •  Common exchange formats - Make tools interoperable, allow data exchange and integration
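The three kinds of reporting standards named on this slide can be sketched together in a few lines. Everything here is hypothetical: the checklist in `REQUIRED_FIELDS`, the vocabulary in `CONTROLLED_TERMS` and the choice of JSON as exchange format are stand-ins for whatever a given community actually specifies.

```python
import json

# Hypothetical minimal checklist: the 'core essentials' every record must report.
REQUIRED_FIELDS = {"organism", "organism_part", "assay_type"}

# Hypothetical controlled vocabulary: use the same word and mean the same thing.
CONTROLLED_TERMS = {
    "assay_type": {"DNA microarray", "mass spectrometry", "NMR spectroscopy"},
}


def check_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in sorted(REQUIRED_FIELDS):
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    for field, allowed in CONTROLLED_TERMS.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"uncontrolled term for {field}: {value}")
    return problems


def export_record(record):
    """Serialize a valid record to a common exchange format (JSON in this sketch)."""
    problems = check_record(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    good = {"organism": "Mus musculus", "organism_part": "liver",
            "assay_type": "DNA microarray"}
    bad = {"organism": "Mus musculus", "assay_type": "microarrays"}
    print(export_record(good))
    print(check_record(bad))
```

Separating the check from the export means the same checklist can gate submission to a database or a journal regardless of the file format used downstream.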
  37. 37. Reporting standards – the benefits §  Describe and communicate the information to others, in an unambiguous manner §  To unlock the value in the data •  Compare, query and evaluate data - Facilitate scientific validation of the findings •  Understand variability within/between different technologies and protocols - Facilitate technical validation - Enable optimization of the experimental designs - Identify critical checkpoints and develop quality metrics §  To define submission and/or publication requirements •  Journals •  Databases §  To ensure data integrity, reproducibility and (re)use
  38. 38. Escalating number of standardization efforts in bioscience, e.g.: Genome annotation www.geneontology.org; Genomics Standards Consortium (GSC) gensc.org; Functional Genomics Data Society (FGED) www.fged.org; Enzymology data standards www.strenda.org; HUPO-Proteomics Standards Initiative (PSI) http://www.psidev.info; Systems modelling standards www.co.mbine.org; Cheminformatics www.ebi.ac.uk/chebi; Pathways www.biopax.org; Metabolomics Standards Initiative (MSI) http://www.metabolomicssociety.org
  39. 39. Different community, different norms and standards, e.g.: report the same core, essential information; use the same word and refer to the same ‘thing’; allow data to flow from one system to another. Challenges: lack of coordination, fragmentation and uneven coverage
  40. 40. Is this ‘general mobilization’ good or bad? (Report the same core, essential information; use the same word and refer to the same ‘thing’; allow data to flow from one system to another.) §  Difference in structures and processes: •  organization types (open, closed to members, society, WG…) •  standards development (how to design, develop, evaluate, maintain…) •  adoption, uptake, outreach (link to journals, funders, commercial sector…) •  funds (sponsors, memberships, grants, volunteering…)
  41. 41. Is this ‘general mobilization’ good or bad? (Report the same core, essential information; use the same word and refer to the same ‘thing’; allow data to flow from one system to another.) §  Fragmentation of the standards is a major issue •  Being focused on particular communities’ interests, be they individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, the development of (largely arbitrarily) different standards •  This severely hinders the interoperability of databases and tools and ultimately the integration of datasets
  42. 42. Fragmentation of the databases and data, e.g. Access; Storage; Submission. Three EBI omics systems
  43. 43. Fragmentation of the databases and data, e.g. Access; Storage; Submission. Three EBI omics systems
  44. 44. Fragmentation of the databases and data, e.g. Access; Storage; Submission. Three EBI omics systems
  45. 45. Fragmentation of the databases and data, e.g. Access: DIFFERENT download formats. Storage: DIFFERENT - core requirements represented - representation of the studies and related samples - curation practices. Submission: DIFFERENT formats, terminologies and tools. Three EBI omics systems
  46. 46. To integrate data we need interoperable standards. Figure: biologically-delineated views of the world (epidemiology, plant biology, microbiology); generic features (common core): description of source biomaterial, experimental design components; technologically-delineated views of the world (transcriptomics, metabolomics; MS, arrays, gels, NMR, columns, FTIR, scanning)
  47. 47. Need to address the fragmentation §  Promote synergies •  Among basic academic (omics) research but also regulatory- or healthcare-driven initiatives §  Much could be learned from exchange of ideas and practices •  Although regulatory- or healthcare-driven initiatives have far stricter guidelines •  Although SDOs often have ‘closed’ discussions and require membership §  Create interoperable standards •  Fit neatly into a jigsaw, resolving inconsistency and filling gaps §  Overcome several barriers •  Technical •  Funding issues •  Sociological......
  48. 48. Eloquent quotes: “Biologists would rather share their toothbrush than their gene name” Michael Ashburner, Professor of Genetics, University of Cambridge, UK. “Any customer can have a car painted any colour that he wants so long as it is black” Henry Ford, you know who he is…
  49. 49. Standards – an old issue, e.g. engineering in 1850 §  Buying nuts and bolts is easy today •  But in the 19th century it was very complicated!
  50. 50. Standards – an old issue, e.g. engineering in 1850 §  Buying nuts and bolts is easy today •  But in the 19th century it was very complicated! §  Nuts and bolts were custom made •  Products from different shops were incompatible •  Craftsmen liked the monopoly - Customers were ‘locked in’ !! §  In 1864 William Sellers initiated the standardization •  Mass production •  Get interchangeable parts •  Standardized way to make nuts and bolts §  Generally adopted only after WWII, though …. !!
  51. 51. Social engineering
  52. 52. Ownership of open standards can be problematic in broad, grass-root collaborations; it requires improved models to encourage maintenance of and contributions to these efforts, supporting their evolution
  53. 53. The extensive community liaison needs to be managed and funded; rewards and incentives need to be identified for all contributors
  54. 54. The cost of implementing a standards-supported data sharing vision is as large as the number of stakeholders that must operate synchronously
  55. 55. 1. Funders actively developing data policies §  Several data preservation, management and sharing policies have emerged in response to increased funding for omics domains §  Even if in general terms, standards are recognized as necessary ‘tools’ to unambiguously represent, describe and communicate research data
  56. 56. 2. Similar trend in the regulatory arena §  “… lack of standardized data affects CDER’s review processes by curtailing a reviewer’s ability to perform integral tasks such as rapid acquisition, storage, analysis......efficient management of a portfolio of standards projects will require coordinated efforts and clear roles for multiple participants within/outside FDA”
  57. 57. 3. Publishers have become strong advocates §  Continue to support the development of open standards and tools •  to support sharing of sufficiently well annotated datasets •  to enable comprehensible, reusable, reproducible research
  58. 58. ….the rise of data-driven journals, e.g.: partnering with:
  59. 59. The rise of data-driven journals, e.g.: partnering with:
  60. 60. 4. Similar trend in the commercial sector §  R&D has invested heavily in procedures and tools that integrate external information with their own data to enhance the decision-making process •  Now joining forces to streamline non-competitive elements of the life science workflow by the specification of common standards, business terms, relationships and processes
  61. 61. ....their information landscape is evolving. Actors: proprietary content provider, public content provider, big life science companies, CRO, academic group, regulatory authorities, service provider, software vendor. Each row below reads yesterday; today; tomorrow. Innovation model: innovation inside; searching for innovation; heterogeneity of collaborations, part of the wider ecosystem. IT: internal apps & data; struggling with change, security and trust; cloud, services. Data: mostly inside; in and out; distributed. Portfolio: internally driven and owned; partially shared; shared portfolio. Credit to: Pistoia Alliance
  62. 62. http://www.flickr.com/photos/idiolector/289490834/ CC BY
  63. 63. Take home messages: “The buzz around reproducible bioscience data: the policies, the communities and the standards” •  Contribute to the reproducible research movement •  Learn about open community-standards in your area •  Consider data science as a career path
  64. 64. Outline: “The buzz around reproducible bioscience data: the policies, the communities and the standards”; “The reality from the buzz: how to deliver reproducible bioscience data”
  65. 65. How do we achieve this? Is it possible to achieve a common, structured representation of diverse bioscience experiments that: •  follows the appropriate community standards and •  delivers COMPREHENSIBLE, INTEROPERABLE, REPRODUCIBLE, REUSABLE research?
  66. 66. Growing number of reporting standards MAGE-Tab! AAO! miame! GCDML! MIAPA! CHEBI! SRAxml! OBI! MIRIAM! VO! SOFT! MIQAS! FASTA! PATO! MIX! CML! ENVO! REMARK! DICOM! MIGEN! GELML! MOD! SBRML! MIAPE! MIQE! TEDDY! MITAB! MzML! XAO! CIMR! CONSORT! BTO!ISA-Tab! SEDML…! DO PRO! IDO…! MIASE! MISFISHIE….!
  67. 67. Growing number of reporting standards + 303 + 150 + 130 Source: MIBBI, Source: BioPortal EQUATOR Estimated Databases, annotation, curation tools MAGE-Tab! AAO! miame! GCDML! MIAPA! CHEBI! SRAxml! OBI! MIRIAM! VO! SOFT! MIQAS! FASTA! PATO! MIX! CML! ENVO! REMARK! DICOM! MIGEN! GELML! MOD! SBRML! MIAPE! MIQE! TEDDY! MITAB! MzML! XAO! CIMR! CONSORT! BTO!ISA-Tab! SEDML…! DO PRO! IDO…! MIASE! MISFISHIE….!
  68. 68. But how much do we know about these standards MAGE-Tab! AAO! miame! GCDML! MIAPA! CHEBI! SRAxml! OBI! MIRIAM! VO! SOFT! MIQAS! FASTA! PATO! MIX! CML! ENVO! REMARK! DICOM! MIGEN! GELML! MOD! SBRML! MIAPE! MIQE! TEDDY! MITAB! MzML! XAO! CIMR! CONSORT! BTO!ISA-Tab! SEDML…! DO PRO! IDO…! MIASE! MISFISHIE….!
  69. 69. But how much do we know about these standards? “Which tools and databases implement which standards?” “I use high throughput sequencing technologies, which ones are applicable to me?” “How can I get involved to propose extensions or modifications?” “What are the criteria to evaluate their status and value?” “Which ones are mature enough for me to use or recommend?” “I work on plants, are these just for biomedical applications?”
  70. 70. But how much do we know about these standards? §  A bewildering array of standards is available, but •  these are hard to find and at different levels of maturity; in some areas duplications or gaps in coverage also exist §  Standards are just a ‘means to an end’, therefore •  we want to make them discoverable and accessible, maximizing their use to assist the virtuous data cycle, from generation to standardization through publication to subsequent sharing and reuse
  71. 71. (2007) Vol 25 No 11 obofoundry.org
  72. 72. Towards Lego-like ontologies §  Compound terms should be formed out of simpler constituents: •  Body weight = weight (quality ontology, PATO) that inheres_in (relation ontology, RO) whole_organism (anatomy ontology, CARO) •  Xylene contaminated soil = soil (environmental ontology, EnvO) that has_contaminated (relation ontology, RO) xylene (chemical ontology, ChEBI) The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
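The composition idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Term` and `CompoundTerm` and the `describe` method are hypothetical and are not part of any ontology toolkit; only the term labels and ontology names come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A simple term taken from one ontology (hypothetical helper class)."""
    label: str
    ontology: str  # e.g. "PATO", "RO", "CARO"


@dataclass(frozen=True)
class CompoundTerm:
    """A compound term built from three simpler constituents."""
    head: Term
    relation: Term
    filler: Term

    def describe(self) -> str:
        # Render the compound term in the "X that R Y" form used on the slide.
        return (
            f"{self.head.label} ({self.head.ontology}) that "
            f"{self.relation.label} ({self.relation.ontology}) "
            f"{self.filler.label} ({self.filler.ontology})"
        )


# The two examples from the slide, assembled from reusable pieces.
body_weight = CompoundTerm(
    Term("weight", "PATO"), Term("inheres_in", "RO"), Term("whole_organism", "CARO")
)
contaminated_soil = CompoundTerm(
    Term("soil", "EnvO"), Term("has_contaminated", "RO"), Term("xylene", "ChEBI")
)

print(body_weight.describe())
print(contaminated_soil.describe())
```

Because each constituent keeps a reference to its source ontology, the same `Term` can be reused in many compound terms, which is the point of the Lego analogy.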
  73. 73. (2008) Vol 26 No 8 mibbi.org
  74. 74. §  Serves researchers, biocurators, journal editors and reviewers, and funders to §  discover checklists for a particular domain §  monitor progress of extant efforts §  facilitate collaborations
  75. 75. Science (2009), Vol 326, 234-236 http://biosharing.org
  76. 76. A catalogue to map the landscape of standards and the systems implementing them: Over 400 bio-standards (public and in curation). Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
  77. 77. •  A coherent, curated and searchable catalogue of data sharing resources •  Bioscience standards and associated data-sharing policies, publications, tools and databases •  Assessment criteria for usability and popularity of standards •  Relationships among standards •  Encouragement for communication & interaction among groups •  Promoting interoperability & informed decisions about standards
  78. 78. Smith et al, 2007
  79. 79. Smith et al, 2007; Taylor, Field, Sansone et al, 2008
  80. 80. List of databases, linked to standards; a collaboration with Database Issue
  83. 83. Major challenge: define ‘relations’ among standards. The relationship among popular standard formats for pathway information: BioPAX and PSI-MI are designed for data exchange to and from databases and pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems and SBGN represents pathway diagrams. CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
  84. 84. Example of multi-assays study – how many ‘standards’ are applicable to this?
  88. 88. An exemplar approach to the status quo §  A grass-roots collaborative that works to facilitate collection, curation and sharing of experiments using a common, structured representation of the experiments that •  transcends individual biological and technological domains and •  can be ‘configured’ to implement (several of) the community standards. TOWARDS INTEROPERABLE BIOSCIENCE DATA, Feb 2012, doi:10.1038/ng.1054. Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W. www.biosharing.org www.isacommons.org
  90. 90. metadata tracking framework, user community
  91. 91. General-purpose, configurable format, designed to support the use of several standards checklists, terminologies and conversions to (a growing number of) other metadata formats, used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml, SOFT
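To make the idea of a tabular, tab-delimited metadata format concrete, here is a minimal Python sketch that reads a tiny table. It is an assumption-laden illustration: the column names are modeled on ISA-Tab study-file headers but the fragment is invented and incomplete, and `read_table` is a hypothetical helper, not part of the ISA software suite.

```python
import csv
import io

# An invented, illustrative fragment; real ISA-Tab files carry many more columns.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tSample Name\n"
    "mouse_1\tMus musculus\tliver_01\n"
    "mouse_2\tMus musculus\tliver_02\n"
)


def read_table(text: str) -> list:
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = read_table(STUDY_TABLE)

# Link each source material to the sample derived from it.
samples_by_source = {row["Source Name"]: row["Sample Name"] for row in rows}
print(samples_by_source)
```

Keeping the metadata in plain tab-delimited tables is what makes conversions to other formats a matter of mapping columns rather than re-entering data.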
  92. 92. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level (Rocca-Serra et al, 2010); a collaborative effort of international research/service groups: University of Oxford, EBI, Harvard School of Public Health, NERC Environmental Bioinformatics Centre, Genomic Standards Consortium, US FDA Center for Bioinformatics, Leibniz Institute of Plant Biochemistry and more…
  93. 93. Create template(s) to fit the type of experiments to be described. Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be •  text (with/without regular expression testing), •  ontology terms, •  numbers etc. (Step 1)
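The three kinds of field constraint listed on this slide (text with an optional regular expression, ontology terms, numbers) can be sketched as follows. This is a hedged illustration, not the ISA configurator's actual template format: the field names, the allowed-term set, and the helpers `text_field`, `ontology_field`, `number_field` and `validate` are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Set


def text_field(pattern: Optional[str] = None) -> Callable[[str], bool]:
    """Free text, optionally required to match a regular expression in full."""
    if pattern is None:
        return lambda value: True
    return lambda value: re.fullmatch(pattern, value) is not None


def ontology_field(allowed_terms: Set[str]) -> Callable[[str], bool]:
    """Value must be one of a fixed set of ontology term labels."""
    return lambda value: value in allowed_terms


def number_field() -> Callable[[str], bool]:
    """Value must parse as a number."""
    def check(value: str) -> bool:
        try:
            float(value)
        except ValueError:
            return False
        return True
    return check


# A hypothetical template: one checker per field to be reported.
TEMPLATE: Dict[str, Callable[[str], bool]] = {
    "Sample Name": text_field(r"[A-Za-z0-9_-]+"),
    "Organism": ontology_field({"Homo sapiens", "Mus musculus"}),
    "Dose": number_field(),
}


def validate(record: Dict[str, str], template: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Return a list of problems; an empty list means the record complies."""
    errors = []
    for field, check in template.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not check(record[field]):
            errors.append(f"{field}: invalid value {record[field]!r}")
    return errors


good = {"Sample Name": "liver_01", "Organism": "Mus musculus", "Dose": "2.5"}
bad = {"Sample Name": "liver 01", "Organism": "mouse"}
print(validate(good, TEMPLATE))
print(validate(bad, TEMPLATE))
```

Expressing each constraint as a small checker function keeps the template declarative: supporting a new community checklist means writing a new dictionary, not new validation logic.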
  94. 94. Describe, curate your experiment with geographically-distributed collaborators. Report and edit the description of the investigation using customized Google Spreadsheets (importing the ‘template’ created by the ISA configurator) enabled with ontology search and term-tagging features. (Step 2a)
  95. 95. Or describe, curate your experiment using a desktop-based tool. Report and edit the description using this tool (also customized using the templates), with a spreadsheet-like look and feel, packed with functionalities such as •  ontology search (access via ) •  term-tagging features •  import from spreadsheets etc. (Step 2b)
  96. 96. ISMB tag: #PP44 To mint DOIs; empowering researchers to use standards
  97. 97. Perform data analysis. We are building relevant ISA modules for GenomeSpace, R-based BioConductor and Galaxy tools. (Step 3)
  99. 99. Share your experiments with the world as Linked Open Data. Through conversion to RDF; work in collaboration with the W3C HCLSIG. Tim Berners-Lee’s 5-star deployment scheme for Linked Open Data. (Step 4)
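As a rough illustration of what "conversion to RDF" produces, the sketch below serializes a few facts about a sample as N-Triples using only the Python standard library. It does not reproduce the ISA conversion itself: the `to_ntriples` helper and every `example.org` URI are assumptions made for the example; only the `rdfs:label` URI is a real vocabulary term.

```python
from typing import List, Tuple


def to_ntriples(subject_uri: str, properties: List[Tuple[str, str]]) -> str:
    """Serialize (predicate, value) pairs about one subject as N-Triples lines.

    Values starting with http:// or https:// are written as URI references,
    everything else as a plain string literal with basic escaping.
    """
    lines = []
    for predicate_uri, value in properties:
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            escaped = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{subject_uri}> <{predicate_uri}> {obj} .")
    return "\n".join(lines)


# Hypothetical identifiers for one sample in one study.
sample = "http://example.org/study1/sample/liver_01"
facts = [
    ("http://example.org/vocab/organism", "http://example.org/taxon/Mus_musculus"),
    ("http://www.w3.org/2000/01/rdf-schema#label", "liver sample 01"),
]

ntriples = to_ntriples(sample, facts)
print(ntriples)
```

Each output line is one machine-readable statement (subject, predicate, object), so a spreadsheet row becomes a set of facts that can be linked to other datasets through shared URIs.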
  100. 100. Submit your experiments to public repositories. Directly in ISA-Tab or reformatting using the ISAconverter. (Step 5)
  101. 101. Create your own repository. Store the investigations in the database, assign access rights and conduct maintenance tasks. Share, browse, query and view investigations, their descriptions and access associated data files. (Step 6)
  102. 102. Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M. Taxonomy-based Glyph Design -- with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012 (in press)
  103. 103. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework (ISA-Tab and/or format) to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: •  environmental health •  environmental genomics •  metabolomics •  metagenomics •  nanotechnology •  proteomics •  stem cell discovery •  system biology •  transcriptomics •  toxicogenomics •  also by communities working to build a library of cellular signatures
  104. 104. Implementations at Harvard Importance of a local community
  105. 105. Implementations at Harvard: data sharing in ISA-Tab. Importance of a local community
  107. 107. Implementation at the EBI
  108. 108. Data papers
  109. 109. Extensions of the Nanotechnology Informatics Working Group
  110. 110. Development timeline, 2007 2008 2009 2010 2011 2012. Open source code. §  Community involvement and uptake: 1st, 2nd and 3rd ISA-Tab workshops; user workshops/visits start; 1st public instance: Harvard Stem Cell Discovery Engine; other tools implement ISA-Tab; growing number of systems starts to adopt ISA framework §  Core developments: strawman ISA-Tab spec; final ISA-Tab spec; ISA software v1; conversions to Pride-XML/SRA-XML/MAGE-Tab and more; database instance at EBI; links to analysis tools start; RDF format starts §  Publications: Omics data sharing (Science); workshop reports; ISA software suite (Bioinformatics); Stem Cell Discovery Engine (NAR); ISA-Tab and ISA Commons (Nature Genetics)
  111. 111. Final remarks “The buzz around reproducible bioscience data: the policies, the communities and the standards” “The reality from the buzz: how to deliver reproducible bioscience data”
  112. 112. Your research and all (publicly funded) research should make an … impact http://www.flickr.com/photos/equinoxefr/2620239993/ CC BY
  113. 113. … the biggest possible impact! http://www.flickr.com/photos/webhamster/2582189977/ CC BY
  114. 114. http://www.flickr.com/photos/andrevanbortel/3745527869/sizes/m/in/photostream/
  115. 115. We must increase the level of annotation: Notes in Lab Books (information for humans); Spreadsheets and Tables (the compromise); Facts as RDF statements (information for machines) •  Invest in curating and managing data at the source using: •  a common metadata tracking framework, such as ISA •  publicly available and community-developed terminologies •  recording sufficient contextual information about the experimental steps §  Progressively, datasets will become more comprehensible, interoperable, reproducible and (re)usable, underpinning future investigations