Scientific Data overview of Data Descriptors - WT Data-Literature integration, Dec 2013
Presentation Transcript

  • Now open for submissions. Launching May 2014.
    www.nature.com/scientificdata | scientificdata@nature.com | @ScientificData
    •  Susanna-Assunta Sansone, Honorary Academic Editor (University of Oxford, UK)
    •  Andrew L Hufton, Managing Editor
    •  Victoria Newman, Editorial Curator
    •  Ruth Wilson, Publisher
    Supported by: [logos not captured in the transcript]
    Advisory Panel including senior researchers, funders, librarians and curators:
    •  Michael Huerta, National Institutes of Health, USA
    •  Mark Thorley, Natural Environment Research Council, UK
    •  Patricia Cruse, University of California, USA
    •  Susan Gregurick, Office of Biological and Environmental Research, Department of Energy, USA
    •  Ioannis Xenarios, Swiss Institute of Bioinformatics, Switzerland
    •  Chris Bowler, IBENS, France
    •  Mark Forster, Syngenta, UK
    •  Anthony Rowe, Johnson & Johnson, USA
    •  Stephen Chanock, National Cancer Institute, USA
    •  Weida Tong, National Center for Toxicological Research, FDA, USA
    •  Albert J. R. Heck, Utrecht University, The Netherlands
    •  Johanna McEntyre, EMBL-EBI, European Bioinformatics Institute, UK
    •  Simon Hodson, CODATA, France
    •  Joseph R. Ecker, Howard Hughes Medical Institute & Salk Institute, USA
    •  Stephen Friend, Sage Bionetworks, USA
    •  Jessica Tenenbaum, Duke Translational Medicine Institute, USA
    •  Anne-Claude Gavin, EMBL, Germany
    •  David Carr, Wellcome Trust, UK
    •  Wolfram Horstmann, University of Oxford, UK
    •  Piero Carninci, RIKEN Omics Science Center, Japan
    •  Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
    •  Judith A. Blake, The Jackson Laboratory, USA
    •  Richard H. Scheuermann, J. Craig Venter Institute, USA
    •  Caroline Shamu, Harvard Medical School, USA
  • Introducing a new content type: Data Descriptor
    •  Credit for Sharing Your Data
    •  Open-access
    •  Focused on Data Reuse
    •  Peer-reviewed, curated
    •  Promoting Community Data Repositories
  • Session 2: Publishing Data
    Aims of this session: to explore how data is being represented and cited in research articles; to showcase new data publishing products, and consider how the edges between articles and data are joined or defined. How can we maximize integrated utility across the different data resources used by scientists?
    Session 3: Credit, Attribution, Reproducibility and Provenance
    Aims of this session: in an integrated information space, it is essential to have transparency on the sources and methods of scientific outputs. How do scientific articles contribute to this goal? Are they sufficiently addressing requirements, what are the most useful approaches and how might they be actioned?
  • Data Descriptor vs. traditional article
    •  The Data Descriptor is concerned only with the facts behind the methodology of data generation/collection and processing
    •  A Data Descriptor can be:
       –  submitted prior to the journal article
       –  submitted at the same time as the journal article
       –  submitted after the journal article
    [Diagram: the Data Descriptor covers the facts (What is the sample? What did I do to generate the data? How was the data processed? Where is the data? Who did what when?); the journal article covers analysis, synthesis, interpretation and conclusions, with a narrative summary of the Data Descriptor.]
  • Two sample Data Descriptors now online
  • Data Descriptor has 2 components
    •  Article or narrative component (PDF and HTML)
    •  supported by an experimental metadata or structured component (in-house curated, machine-readable formats)
  • Data Descriptor - article
    Sections:
    •  Title
    •  Abstract
    •  Background & Summary
    •  Methods
    •  Technical Validation
    •  Data Records
    •  Usage Notes
    •  Figures & Tables
    •  References
    •  Data Citations
  • Data Descriptor - article: in traditional publications this information is not provided in a sufficiently detailed manner. However, this information is essential for understanding, reusing, and reproducing datasets.
  • Data Descriptor – experimental metadata (CC0)
    funded by: [logos not captured in the transcript]
  • Data Descriptor – experimental metadata (CC0)
    General-purpose, configurable format, designed to support:
    •  description of the experimental workflow, making the annotation explicit and discoverable
    •  provenance tracking
    •  use of community standards, such as minimal reporting guidelines and terminologies
       o  over 300 ‘ontologies’ and over 60 guidelines
    •  conversions to a growing number of other metadata formats
       o  e.g. those used by EBI repositories
       o  and as linked data
  • Data Descriptor – experimental metadata (CC0)
    ISA is implemented by several service providers running systems that are:
    •  local, institute-based
       o  e.g. Harvard Stem Cell Institute
    •  project, consortium-based
       o  e.g. ToxBank, serving a research cluster of seven EU FP7 Health projects
    •  global, international repositories
       o  e.g. EBI’s MetaboLights
       o  and another ‘data journal’, GigaScience, in GigaDB
  • Data Descriptor – experimental metadata (CC0)
    Includes fields describing:
    •  each study, linking to relevant sections of the Data Descriptor article
    •  authors’ details, including ORCID
    •  publications
    •  funding sources and funders’ names, via FundRef
    •  experimental factors
    •  study design
    •  assays
    •  protocols
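The field list above can be pictured as a simple record type. The following is a minimal sketch only: the class and attribute names are hypothetical, chosen to mirror the slide's bullet points, and they are not the ISA-Tab schema or any Scientific Data API. The ORCID value is a placeholder, not a real identifier.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Author:
    name: str
    orcid: Optional[str] = None  # ORCID iD, if the author has one


@dataclass
class StudyMetadata:
    # Hypothetical attribute names mirroring the slide; not the ISA-Tab schema.
    title: str
    article_sections: List[str] = field(default_factory=list)  # links into the article
    authors: List[Author] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    funders: List[str] = field(default_factory=list)
    experimental_factors: List[str] = field(default_factory=list)
    study_design: Optional[str] = None
    assays: List[str] = field(default_factory=list)
    protocols: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]


# Illustrative record with placeholder values.
record = StudyMetadata(
    title="Example study",
    authors=[Author("A. Researcher", orcid="0000-0000-0000-0000")],
    assays=["transcription profiling"],
)
print(record.missing_fields())
```

A check like `missing_fields` is the kind of thing a curation step might run before accepting a submission; whether the in-house tooling works this way is not stated in the slides.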
  • Data Descriptor – experimental metadata (CC0)
    In-house curation team:
    •  assists users in submitting the structured content via simple templates and an internal authoring tool
    •  performs value-added semantic annotation of the experimental metadata
    For advanced users/service providers willing to export ISA-Tab for direct submission, we will release a technical specification.
    [Diagram labels: analysis method; script; data file or record in a database]
  • Discover similar datasets
    Structured content allows users to link, with one click, to other datasets studying the same tissue, disease, organism, or using the same experimental platform.
    [Diagram: Scientific Data Data Descriptors (SciData DD), each with structured content, linked to one another by same tissue, same organism or same assay, and connected to community data repositories]
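One way to picture this kind of linking is an inverted index from annotation terms to datasets. The sketch below is an illustration under assumptions, not a description of how Scientific Data implements the feature: the dataset identifiers, facet names and terms are made up, and `build_index` and `similar_datasets` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Annotation = Tuple[str, str]  # (facet, term), e.g. ("tissue", "liver")


def build_index(datasets: Dict[str, Set[Annotation]]) -> Dict[Annotation, Set[str]]:
    """Map each annotation to the set of dataset ids that carry it."""
    index: Dict[Annotation, Set[str]] = defaultdict(set)
    for dataset_id, annotations in datasets.items():
        for annotation in annotations:
            index[annotation].add(dataset_id)
    return index


def similar_datasets(
    dataset_id: str,
    datasets: Dict[str, Set[Annotation]],
    index: Dict[Annotation, Set[str]],
) -> Dict[str, List[str]]:
    """For every other dataset, list the facets on which it matches dataset_id."""
    shared: Dict[str, Set[str]] = defaultdict(set)
    for facet, term in datasets[dataset_id]:
        for other in index[(facet, term)]:
            if other != dataset_id:
                shared[other].add(facet)
    return {other: sorted(facets) for other, facets in shared.items()}


# Made-up example annotations for three hypothetical Data Descriptors.
datasets = {
    "DD-1": {("organism", "Homo sapiens"), ("tissue", "liver")},
    "DD-2": {("organism", "Homo sapiens"), ("tissue", "kidney")},
    "DD-3": {("organism", "Mus musculus"), ("tissue", "liver"), ("assay", "RNA-Seq")},
}
index = build_index(datasets)
links = similar_datasets("DD-1", datasets, index)
print(links)
```

Here DD-1 links to DD-2 through the shared organism and to DD-3 through the shared tissue. The approach depends on terms being drawn from controlled vocabularies, which is what the curated, structured component is meant to provide.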
  • Complementing both journal articles and data repositories
    Export to various formats (ISA-Tab, RDF, etc.)
  • Other data-related activities at NPG
    •  Figure source data
       –  putting data behind figures/graphs
       –  implemented at Molecular Systems Biology, rolled out at Nature and progressively across all other Nature-branded titles
       –  example shown: Wang et al., Nature, 2013, doi:10.1038/nature12730
    •  Extended data
       –  expandable text and extra figures; rolled out at Nature
    •  Data citation
       –  tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
       –  to be rolled out across all Nature-branded titles and Scientific Data
    •  Code reproducibility
       –  peer review, availability and reuse
    •  Supported community databases
       –  criteria for selection, common list across all NPG titles
    •  NPG’s Linked Data release (CC0)