Presentation by Ruth Wilson on Nature Publishing Group's Scientific Data journal given at the Now and Future of Data Publishing Symposium, 22 May 2013, Oxford, UK
3.
Data, data, data
Two important factors are driving efforts to make research data more available and
reusable:
• To ensure the scientific process is transparent and can be scrutinised and
research results reproduced
• To speed the scientific process, lead to new insights and reduce duplicated
and repeated work
To achieve this, research data needs to be:
– Available
– Findable
– Interpretable
– Re-usable
– Citable
4.
Existing challenges
• Data producers do not necessarily get
appropriate credit for their work
• Traditional publications are focused on
hypothesis/conclusions
• The peer review process at many research
journals is not focused on ensuring data release
and data standards
• Data and information about datasets often end up in
supplementary material
• Potentially valuable datasets are not released
6.
What is Scientific Data?
• Scientific Data is an Open Access, online-only
platform containing data descriptors that
describe and explain datasets, supported by
an APC model.
• Data descriptors are a new type of content
and can be viewed as ‘secondary’ material
aimed at increasing the visibility and usability
of datasets and to aid research reproducibility
• For all types of data the descriptor will be peer
reviewed
7.
What is Scientific Data...?
• As part of the peer review process we will
check that the data is publicly available in an
approved data repository and follows
community guidelines
• All content will be published open access with
the author able to select from a number of
options. In addition the descriptor metadata
will be available under CC0.
• An in-house editorial team and new authoring
tools are being put in place to ensure that the
creation, submission, curation and publication
of data descriptors are as simple as possible
• The external advisory board will represent
different stakeholder views and provide
feedback on key services.
8.
Data Descriptors
A new publication type for describing scientifically valuable
datasets
[Diagram: SciData Data Descriptors (DD) with structured content]
– Export to various formats (ISA-tab, RDF, etc.)
– Datasets, code, workflows
– Interoperate with community resources
– Advanced search and discovery functions
– Link to related content (Nature Methods, Scientific Reports, Nature Genetics)
9.
Narrative content
complements both journal articles and repository records
Includes
– Highly detailed, reproducible methods descriptions
– Quality control & technical validation experiments
– Searchable, machine-readable meta-data
Does Not Include
– In-depth analysis or tests of hypotheses
– New scientific conclusions
– Exploratory analysis (e.g. clustering)
10.
Structured content
It will be based on and compatible with ISA-tab and
undergo technical review by biocuration/standards referees
– Submit ISA-tab files directly, OR
– Submission tools and simple templates help authors provide the information
without special tools
– An in-house curator standardizes the structured content
11.
License types
Data: the raw datasets will reside in public
repositories and are likely to be CC0, similar to
Figshare, Dryad, etc.
DATA DESCRIPTOR
Metadata: as NPG has already done with its
existing Linked Data Portal, the metadata about
data descriptors in Scientific Data will be CC0
Narrative/Figures: the narrative describing the
methodology of data generation/collection and
processing will be licensed under either of the
following, by author choice:
12.
Who are we?
Susanna-Assunta Sansone - Honorary Academic Editor
Andrew L Hufton - Managing Editor
Advisory Panel
Supported by
Joseph R. Ecker, Salk Institute, USA
Mark Forster, Syngenta, UK
Stephen Friend, Sage Bionetworks, USA
Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
Anne-Claude Gavin, EMBL, Germany
Albert J. R. Heck, Utrecht University, The Netherlands
Wolfram Horstmann, University of Oxford, UK
Johanna McEntyre, EMBL-EBI, European Bioinformatics Institute, UK
Anthony Rowe, Johnson & Johnson, USA
Richard H. Scheuermann, J. Craig Venter Institute, USA
Caroline Shamu, Harvard Medical School, USA
Jessica Tenenbaum, Duke Translational Medicine Institute, USA
Weida Tong, National Center for Toxicological Research, FDA, USA
Judith A. Blake, The Jackson Laboratory, USA
Chris Bowler, IBENS, France
Piero Carninci, RIKEN Omics Science Center, Japan
David Carr, Wellcome Trust, UK
Stephen Chanock, National Cancer Institute, USA
Simon Hodson, Jisc, UK
13.
Contacts
Call for submissions Fall 2013
Launching in Spring 2014
• www.nature.com/scientificdata
• Email: scientificdata@nature.com
• Twitter: @ScientificData
15.
Evolution - Supplementary Information (SI)
• Greater accessibility/visibility
• Greater discoverability
• About to be piloted on:
– Nature Structural and Molecular Biology
– Nature Cell Biology
This is a very broad theme and there is not much time, so I will concentrate on two aspects of linking between publications and research data at NPG: one is a new product, Scientific Data, a new data-focused, open access (OA), peer-reviewed platform; the other is the evolution of practices for existing journals.
A small amount of context.
What are the existing challenges? We know that much research data is stored in drawers, if stored at all.
Response to challenges: NPG is launching Scientific Data, focused on data interpretation and reuse.
Calling for submissions in Fall 2013, launching in Spring 2014.
Six Principles.
The Scientific Brand: innovative new publishing brand from NPG. Open-access, community-driven. Feature-rich. Complements the more tradition-bound Nature titles.
New layer in between traditional journal articles and Repositories. We don’t store the data.
Blessing and a curse...
Underutilised: Publishers (including NPG) do little with Supplementary Information (SI) other than present it with the article in PDF form (not being in XML/HTML format makes it hard to index and find).
Growth difficult to manage: Publishers are struggling with the growing amount of SI in the life sciences. Since 2010 the Journal of Neuroscience no longer accepts SI, as it felt it was adversely affecting peer review. In 2009 Cell restricted the number and volume of SI.
Increasing volumes at NPG: The number of pieces of SI in Nature Genetics (NG) grew by 65% between 2008 and 2011. There were 1515 pieces of SI in Nature Genetics (incl. figures and tables) in the first half of 2011, compared to 915 in the same time period in 2008. The volume of SI across NPG grew from 5299 files to 6469 and then 7120 (2008 to 2010; increases of 22% and 10%). These figures do not break out individual figures and tables but instead count the number of PDF files, doc files, xls files etc. Approx. 60% are PDF files.
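The growth percentages quoted in these notes can be checked with a few lines of Python. This is only a minimal sketch: the helper name `pct_growth` is introduced here for illustration, and it assumes the three NPG-wide file counts correspond to 2008, 2009 and 2010 in that order.

```python
def pct_growth(old: int, new: int) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


# Nature Genetics: pieces of SI, first half of 2008 vs first half of 2011
ng_growth = pct_growth(915, 1515)

# NPG-wide SI file counts (assumed order: 2008, 2009, 2010)
npg_first_step = pct_growth(5299, 6469)
npg_second_step = pct_growth(6469, 7120)

print(f"{ng_growth:.1f} {npg_first_step:.1f} {npg_second_step:.1f}")
# prints: 65.6 22.1 10.1
```

The computed values agree with the quoted 65%, 22% and 10% to within rounding.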
Source data has been on EMBO and MSB for some time.
Linked to the Nature journals' updated editorial policies, which aim to improve transparency and reproducibility by:
- Requiring much more precise description of statistics and employing the expertise of a statistics consultant, where needed;
- Increasing the lengths of Methods sections in journals to allow authors to be much more descriptive and facilitate replication of their findings; and
- Publishing source data: first the actual data points, that is, tabular source data, behind figures; next additional forms of source data.
To this point we have been citing data sets in an online Accession Codes section in our articles by listing the repository name and, via the persistent identifier, linking to the data set entry in the repository. We will further formalize Data Citations by having them appear in a similar manner to bibliographic references, including ensuring that data set authors are more granularly credited for their work (and including the date, minimally the year, of data deposition).
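As a rough illustration of what a reference-style data citation might carry, the sketch below gathers the elements named in these notes (data set authors, year of deposition, repository name, persistent identifier) into one record. The class name `DataCitation`, the output layout and all example values are hypothetical placeholders, not NPG's actual citation format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical record holding the citation elements named in the notes."""
    authors: List[str]   # data set authors, credited individually
    year: int            # date of data deposition (minimally the year)
    repository: str      # repository name
    identifier: str      # persistent identifier of the data set entry

    def format(self) -> str:
        # Assumed layout, loosely modelled on a bibliographic reference
        names = ", ".join(self.authors)
        return f"{names} ({self.year}). {self.repository}. {self.identifier}"


# Placeholder values for illustration only
example = DataCitation(
    authors=["A. Author", "B. Author"],
    year=2013,
    repository="Example Repository",
    identifier="doi:10.0000/example",
)
print(example.format())
# prints: A. Author, B. Author (2013). Example Repository. doi:10.0000/example
```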