Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Susanna-Assunta Sansone talk at the Genomics Standards Consortium meeting in Shenzhen, March 6th 2012


  1. 1. Policies and standards for reproducible research: from theory to practice
     §  How do we make a standards-compliant data sharing culture functional and efficient?
        •  Several data management and sharing policies and plans have emerged; the number of data journals is growing and guidelines to authors for reporting data are being enriched; there are thousands of biological databases and a wealth of community standards
        •  Although funders, journal editors, data producers, consumers and service providers agree in principle that shared, annotated research data and methods offer new discovery opportunities, compliance is challenging in practice
     §  Starting from the genomics domain and extending to other areas of life science, we are looking to highlight the success stories and existing problems
  2. 2. About this session - speakers
     §  Representatives from stakeholders involved in the complete data cycle
        •  from funding and regulation, to production, release and re-use
     §  Setting the scene:
        •  Susanna-Assunta Sansone, University of Oxford, UK
        •  Scott Edmunds, GigaScience, BGI Shenzhen, China
     §  Funders
        •  Rita Colwell, University of Maryland, USA
        •  Paula J. Olsiewski, Sloan Foundation
     §  Service providers and/or data producers
        •  Philippe Rocca-Serra, University of Oxford, UK
        •  Folker Meyer, Argonne National Laboratory, USA
        •  Srikrishna Subramanian, IMTECH, India
     §  Editors
        •  Clare Garvey, Genome Biology/BioMed Central
        •  Craig Mak, Nature Biotechnology
  3. 3. About this session - topics
     §  Data management, preservation and sharing policies: viewpoints
        •  formulation and enforcement, or
        •  uptake and compliance
     §  Reporting standards: experiences and challenges
        •  evolution of standards, costs of compliance, reward for complying etc.
        •  usability of standards when working across disciplines, which all have differing community norms
        •  challenges in integrating data types and how standards can help
     §  Tackling the challenges: approaches and lessons learned
        •  balance needs and expectations (data producers, consumers, reviewers, service providers etc.)
        •  potential role of each stakeholder
        •  new ways forward
  4. 4. the evolving portfolio of data sharing enablers Susanna-Assunta Sansone, PhD University of Oxford, Oxford e-Research Centre, Oxford, UK http://uk.linkedin.com/in/sasansone GSC13th, Shenzhen, China, March 5-7, 2012
  5. 5. From reusable data to reproducible research
     To make the datasets comprehensible, interoperable and reusable, underpinning future investigations, we need common ways to report and share the experimental details and the associated results. Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.
     The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
  6. 6. A ‘general mobilization’ to develop standards, e.g.:
     •  use the same word and refer to the same ‘thing’
     •  allow data to flow from one system to another
     •  report the same core, essential information
  7. 7. A ‘general mobilization’ to develop standards… BUT
     §  Fragmentation of the standards is a major issue!
        •  Focusing on particular communities’ interests, be they individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, to the development of (largely arbitrarily) different standards
        •  This severely hinders the interoperability of databases and tools and ultimately the integration of datasets
  8. 8. Growing number of reporting standards
     MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  9. 9. Growing number of reporting standards
     [Figure labels, as extracted: + 303, + 150, + 130; Source: MIBBI, EQUATOR; Source: BioPortal; Estimated]
     Databases, annotation, curation tools
     MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  10. 10. But how much do we know about these standards?
      MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  11. 11. But how much do we know about these standards?
      •  I use high throughput sequencing technologies, which ones are applicable to me?
      •  Which tools and databases implement which standards?
      •  What are the criteria to evaluate their status and value?
      •  How can I get involved to propose extensions or modifications?
      •  I work on plants, are these just for biomedical applications?
      •  Which ones are mature enough for me to use or recommend?
  12. 12. Often not much…
      •  I use high throughput sequencing technologies, which ones are applicable to me?
      •  Which tools and databases implement which standards?
      •  What are the criteria to evaluate their status and value?
      •  How can I get involved to propose extensions or modifications?
      •  I work on plants, are these just for biomedical applications?
      •  Which ones are mature enough for me to use or recommend?
      Several policy documents and guidelines are inconsistent and/or unclear when recommending use of standards, e.g.: “..recommend use of appropriate standards...where these exists…....mature, stable efforts....MIAME format…..standards from accredited standards organizations…..deposition to public repositories, supporting these standards…...”
  13. 13. 2009
  14. 14.
  15. 15. A coherent, curated and searchable catalogue of data sharing resources that (collaboratively) works to:
      1. Centralize community-developed bioscience standards and make them discoverable, linking to:
         •  data sharing, preservation and management policies
         •  other portals e.g. MIBBI, NCBO’s BioPortal, NIF, BioSiteMaps, OBO Foundry
         •  related open access, published material e.g. BioMed Central, Nature Precedings, F1000
         •  tools and databases implementing the standards e.g. collaboration with NAR Database
      2. Identify and maintain a set of (implicit) criteria for assessing usability and popularity of the standards, including:
         •  implementations by tools and databases
         •  availability of standards-compliant, public datasets
         •  relations among standards
      3. Foster communication among groups, in particular to:
         •  address overlaps and duplication of efforts and enhance interoperability of standards
         •  produce ‘best practice’ guidelines for starting new, or contributing to existing, efforts
      Ø  Will allow stakeholders (funders, journals, service providers and researchers) to make informed decisions on standards
  16. 16.
  17. 17. Over 400 entries (public and in curation)
  18. 18. Smith et al, 2007
  19. 19. Smith et al, 2007; Taylor, Field, Sansone et al, 2008
  20. 20. List of databases, linked to standards; a collaboration with Database Issue
  21. 21. List of databases, linked to standards; a collaboration with Database Issue
  22. 22. List of databases, linked to standards; a collaboration with Database Issue
  23. 23. Define groups and relations among standards
      The relationship among popular standard formats for pathway information: BioPAX and PSI-MI are designed for data exchange to and from databases and pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems and SBGN represents pathway diagrams.
      CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
  24. 24. E.g. in the genomics context: resources from GSC and other communities… INSDC, GCDML, EnvO, GOLD, SRAxml, MixS, EnvO-light, MG-RAST, ISA-Tab, OBI, CAMERA, BIOM (data matrices), SILVA, etc… Disclaimer: draft for illustrative purpose; this is a dynamic environment, work in progress…
  25. 25. E.g. in the genomics context: resources from GSC and other communities… INSDC, GCDML, EnvO, GOLD, SRAxml, MixS, EnvO-light, MG-RAST, ISA-Tab, OBI, CAMERA, BIOM (data matrices), SILVA, etc… Disclaimer: draft for illustrative purpose; this is a dynamic environment, work in progress…
  26. 26. E.g. in the genomics context: resources from GSC and other communities… INSDC, GCDML, EnvO, GOLD, SRAxml, MixS, EnvO-light, MG-RAST, ISA-Tab, OBI, CAMERA, BIOM (data matrices), SILVA, etc… Disclaimer: draft for illustrative purpose; this is a dynamic environment, work in progress…
  27. 27. E.g. in the genomics context: resources from GSC and other communities… INSDC, GCDML, EnvO, GOLD, SRAxml, MixS, EnvO-light, MG-RAST, ISA-Tab, OBI, CAMERA, BIOM (data matrices), SILVA, etc… Disclaimer: draft for illustrative purpose; this is a dynamic environment, work in progress…
  28. 28. E.g. in the genomics context: resources from GSC and other communities… INSDC, GCDML, EnvO, GOLD, SRAxml, MixS, EnvO-light, MG-RAST, ISA-Tab, OBI, CAMERA, BIOM (data matrices), SILVA, etc… Disclaimer: draft for illustrative purpose; this is a dynamic environment, work in progress…
  29. 29. Acknowledgements:
      •  Philippe Rocca-Serra (University of Oxford)
      •  Eamonn Maguire (University of Oxford)
      •  Annapaola Santarsiero (University of Oxford)
      •  Susanna Sansone (University of Oxford)
      •  Chris Taylor (EMBL-EBI)
      •  Dawn Field (NERC-NEBC)
      with contributions from members of our communities and individuals.
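
As a rough illustration of the catalogue idea on slide 15 and the grouping on slide 23, the sketch below models one catalogue record per standard and counts the three kinds of evidence the slide lists (implementations, public compliant datasets, relations among standards). It is not BioSharing's actual data model: the class `StandardEntry`, its field names, the functions `usability_signals` and `group_by_purpose`, and the `related_to` links are all assumptions made for this example. Only the purpose labels are taken from the slide 23 caption.

```python
from dataclasses import dataclass, field


@dataclass
class StandardEntry:
    """Hypothetical catalogue record for one community standard."""
    name: str
    purpose: str  # grouping label, here taken from the slide 23 caption
    implemented_by: list[str] = field(default_factory=list)  # tools and databases
    has_public_datasets: bool = False  # standards-compliant public data exist
    related_to: list[str] = field(default_factory=list)  # relations among standards


def usability_signals(entry: StandardEntry) -> dict[str, int]:
    """Count the evidence types named on slide 15 (an assumed, unofficial tally)."""
    return {
        "implementations": len(entry.implemented_by),
        "public_datasets": int(entry.has_public_datasets),
        "relations": len(entry.related_to),
    }


def group_by_purpose(entries: list[StandardEntry]) -> dict[str, list[str]]:
    """Group standard names by their stated purpose, preserving input order."""
    groups: dict[str, list[str]] = {}
    for entry in entries:
        groups.setdefault(entry.purpose, []).append(entry.name)
    return groups


# Purposes follow the slide 23 caption; the related_to links are illustrative only.
catalogue = [
    StandardEntry("BioPAX", "data exchange and integration", related_to=["PSI-MI"]),
    StandardEntry("PSI-MI", "data exchange and integration", related_to=["BioPAX"]),
    StandardEntry("SBML", "mathematical simulation", related_to=["CellML"]),
    StandardEntry("CellML", "mathematical simulation", related_to=["SBML"]),
    StandardEntry("SBGN", "pathway diagrams"),
]

if __name__ == "__main__":
    for purpose, names in group_by_purpose(catalogue).items():
        print(f"{purpose}: {', '.join(names)}")
    print(usability_signals(catalogue[0]))
```

Keeping the evidence as separate counts, rather than collapsing it into one score, mirrors the slide's wording of "(implicit) criteria": a funder or journal could weigh implementations and dataset availability differently.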
