NPG Scientific Data Overview for GBIF - TDWG meeting Oct 2013


Published in: Technology, Education

  1. Honorary Academic Editor: Susanna-Assunta Sansone, PhD (University of Oxford, UK)
     Managing Editor: Andrew L Hufton, PhD
     Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators
     Tweet @ScientificData
  2. Now open for submissions! Launching May 2014
     Susanna-Assunta Sansone, Honorary Academic Editor
     Andrew L Hufton, Managing Editor
     Ruth Wilson, Publisher
     Supported by
     Advisory Panel:
     ● Michael Huerta (National Institutes of Health, USA)
     ● Mark Thorley (Natural Environment Research Council, UK)
     ● Patricia Cruse (University of California, USA)
     ● Susan Gregurick (Office of Biological and Environmental Research, Department of Energy, USA)
     ● Ioannis Xenarios (Swiss Institute of Bioinformatics, Switzerland)
     ● Chris Bowler (IBENS, France)
     ● Mark Forster (Syngenta, UK)
     ● Anthony Rowe (Johnson & Johnson, USA)
     ● Stephen Chanock (National Cancer Institute, USA)
     ● Weida Tong (National Center for Toxicological Research, FDA, USA)
     ● Albert J. R. Heck (Utrecht University, The Netherlands)
     ● Johanna McEntyre (EMBL-EBI, European Bioinformatics Institute, UK)
     ● Simon Hodson (CODATA, France)
     ● Joseph R. Ecker (Howard Hughes Medical Institute & Salk Institute, USA)
     ● Stephen Friend (Sage Bionetworks, USA)
     ● Jessica Tenenbaum (Duke Translational Medicine Institute, USA)
     ● Anne-Claude Gavin (EMBL, Germany)
     ● David Carr (Wellcome Trust, UK)
     ● Wolfram Horstmann (University of Oxford, UK)
     ● Piero Carninci (RIKEN Omics Science Center, Japan)
     ● Pascale Gaudet (Swiss Institute of Bioinformatics, Switzerland)
     ● Judith A. Blake (The Jackson Laboratory, USA)
     ● Richard H. Scheuermann (J. Craig Venter Institute, USA)
     ● Caroline Shamu (Harvard Medical School, USA)
  3. Now open for submissions! Launching May 2014
     Introducing a new content type: Data Descriptor
  4. Data Descriptor vs. Traditional Article
     ● The Data Descriptor is concerned only with the facts behind the methodology of data generation/collection and processing
     ● A Data Descriptor can be submitted:
       – before the journal article
       – at the same time as the journal article
       – after the journal article
     [Diagram labels: Facts, Analysis, Synthesis, Interpretation, Conclusions. Data Descriptor questions: What is the sample? What did I do to generate the data? How was the data processed? Where is the data? Who did what when? Journal article: Summary of DD.]
  5. Prior Publication Policy
     “Nature-titled journals will not consider prior Data Descriptor publications to compromise the novelty of new manuscript submissions as long as those manuscripts go substantially beyond a descriptive analysis of the data, and report important new scientific findings appropriate for the journal. This policy does not necessarily extend to subsequent journal articles whose primary purpose is to describe a new dataset or resource.”
     See the full text in our Editorial Policies online
  6. Barriers to data sharing and reuse
     ● Datasets are not released
     ● Datasets are not reusable or discoverable
     ● Lack of credit for sharing data and making it reusable
  7. Two sample Data Descriptors now online
  8. Data Descriptor has 2 components
     ● Article or narrative component (PDF and HTML)
     ● Experimental metadata or structured component (in-house curated, machine-readable formats)
  9. Data Descriptor – article
     Sections:
     • Title
     • Abstract
     • Background & Summary
     • Methods
     • Technical Validation
     • Data Records
     • Usage Notes
     • Figures & Tables
     • References
     In traditional publications this information is not provided in sufficient detail. However, it is essential for understanding, reusing, and reproducing datasets.
  10. Data Descriptor – experimental metadata
      ● Submit ISA-Tab* files directly
      OR
      ● Submission tools and simple templates help authors provide the information without special tools
      An in-house curator standardizes the structured content
      *Sansone et al., Nature Genetics, 2012
  11. Discover similar datasets
      Structured content allows users to link, with one click, to other datasets studying the same tissue, disease, organism, or using the same experimental platform
      [Diagram: Scientific Data Data Descriptors (SciData DD), each with structured content, linked by same tissue, same organism, and same assay.]
  12. Get Credit for Sharing Your Data: Publications will be listed in the major indexes and will be citeable
      Open-access: Authors select from three Creative Commons licences for the main Data Descriptor. Each publication is supported by curated CC0 metadata
      Focused on Data Reuse: All the information others need to reuse the data; no interpretative analysis or hypothesis testing
      Peer-reviewed: Rigorous peer-review managed by our Editorial Board of academic researchers ensures data quality and standards
      Promoting Community Data Repositories: Data stored in community data repositories
  13. Complementary to both journal articles and data repositories
      Export to various formats (ISA-Tab, RDF, etc.)
  14. Scientific Data and GBIF: Roadmap
      Partnership between GBIF and NPG Scientific Data
      Vishwas Chavan
      PHASE 1
      • Mapping the DD article and GBIF Metadata Profile
      • Enhancement to GBIF IPT to export the DD article
      • Call for manuscript submissions
      • 1st set of Data Descriptors published
      PHASE 2
      • Mapping the DD experimental metadata and GBIF Metadata Profile
      • Further enhancements to GBIF IPT
      Timeline labels: Q4 2013, Q4 2013, Q42 2014, Q43 2014, Q4 2014
      The two components of the Data Descriptor (DD):
      • DD article or narrative component
      • DD experimental metadata or structured component (ISA-Tab format, progressively others e.g. RDF)
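
The dataset linking described on slide 11 (structured content connecting Data Descriptors that share a tissue, organism, or assay) can be illustrated with a small sketch. This is not how Scientific Data implements it; the slides do not say. The descriptor identifiers, field names, and values below are all hypothetical, chosen only to show how structured metadata makes "same tissue" or "same assay" lookups straightforward.

```python
from collections import defaultdict

# Hypothetical structured metadata for three Data Descriptors.
# Identifiers, field names, and values are invented for illustration.
DESCRIPTORS = {
    "dd-001": {"organism": "Homo sapiens", "tissue": "liver", "assay": "RNA-seq"},
    "dd-002": {"organism": "Mus musculus", "tissue": "liver", "assay": "microarray"},
    "dd-003": {"organism": "Homo sapiens", "tissue": "blood", "assay": "RNA-seq"},
}


def build_index(descriptors):
    """Map each (field, value) pair to the set of descriptor IDs that carry it."""
    index = defaultdict(set)
    for dd_id, metadata in descriptors.items():
        for field, value in metadata.items():
            index[(field, value)].add(dd_id)
    return index


def similar(dd_id, descriptors, index):
    """Return, per field, the other descriptors that share this descriptor's value."""
    links = {}
    for field, value in descriptors[dd_id].items():
        others = index[(field, value)] - {dd_id}
        if others:
            links[field] = sorted(others)
    return links


INDEX = build_index(DESCRIPTORS)
print(similar("dd-001", DESCRIPTORS, INDEX))
```

Because every descriptor uses the same controlled fields, one index lookup per field is enough to find related datasets. Free-text article sections alone would not support this, which is presumably why the slides stress the in-house curation of the structured component.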