NPG Scientific Data - Metabolomics Society meeting, Tsuruoka, Japan, 2014 - Susanna-Assunta Sansone
This document provides information about Scientific Data, an online publication from Nature Research that publishes peer-reviewed descriptions of scientifically valuable datasets. It summarizes the goals of Scientific Data, which are to promote data sharing, reuse, and reproducibility. The document outlines the structured format for Data Descriptors, which include both a narrative component and experimental metadata. It describes the peer review process, which focuses on data quality, completeness of description, and potential for reuse rather than novelty of findings. Finally, it provides examples of diverse current content and encourages collaboration with data repositories.
Scientific Data is a data journal. In brief:
Scientific Data publishes structured data descriptors and accompanying research data to promote open and reproducible science. Data Descriptors provide detailed methods and validation to allow other researchers to understand and reuse shared data. Through peer review of data quality and reuse potential, as well as incentives like citations, Scientific Data aims to help address issues like selective reporting and make shared research data more accessible and useful.
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014 - Susanna-Assunta Sansone
- The document discusses the need for open and accessible data in research. It notes that over 50% of studies are not published due to selective reporting of results.
- There is a movement for "FAIR data" in life and medical sciences, where data is findable, accessible, interoperable, and reusable. However, not much data currently meets these standards.
- Publishers can play a role in incentivizing data sharing by implementing policies requiring data availability and format standards for publishing research. This includes supporting data citations and data journals.
The Kaleidoscope of Impact: same data, different perspectives, constantly cha... - Kudos
Scholars, scientists, academic institutions, publishers and funders are all interested in impact. We have different roles and goals, and therefore different reasons for needing to understand impact; we ask different questions about impact, and those questions continue to evolve, much as the concept of impact itself is evolving. To answer our different questions, do we need different data, in separate silos, or are we looking at the same data, from different angles? This session gathered researcher, library, publisher and metrics provider perspectives to consider who has an interest in impact, what data they are interested in, how they use it, and how the situation is evolving as, for example, business models and technical infrastructures shift.
Access to scientific information has changed in ways that the early pioneers of the internet likely never imagined. The quantities of data, the array of tools available to search and analyze them, the range of devices, and the shift in community participation continue to expand, and the pace of change shows no sign of slowing. ChemSpider is one of the chemistry community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day, and it serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts, and help identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of this eScience cheminformatics platform and the nature of the solutions it helps to enable, including structure validation, text mining and semantic markup, the National Chemical Database Service for the United Kingdom, and the development of a chemistry data repository. We will also discuss the possibilities it offers in the domain of crowdsourcing and open data sharing. The future of scientific information and communication will be underpinned by these efforts, shaped by increasing participation from the scientific community and facilitated collaboration, ultimately accelerating scientific progress.
On community-standards, data curation and scholarly communication - Stanford M... - Susanna-Assunta Sansone
This document discusses content standards for better describing scientific data. It notes that while some common features exist across domains, descriptions of experimental context are often inconsistent or duplicated. The author advocates for community-developed content standards to structure, enrich and report dataset descriptions and their experimental context to facilitate discovery, sharing, understanding and reuse of data. Standards should include minimum reporting requirements, controlled vocabularies and conceptual models to allow data to flow between systems. This will help enable better science from better described data.
On community-standards, data curation and scholarly communication - BITS, Ita... - Susanna-Assunta Sansone
The document discusses the vision of a "connected digital research enterprise" where researchers can more easily find and collaborate with others based on shared data and outputs. It describes a scenario where Researcher X discovers commonalities in data with Researcher Y, views Y's datasets and publications, and initiates a collaboration. Their joint work is captured and indexed, and a company utilizes some of the outputs while providing funding back to the researchers. The vision aims to more closely connect scientific work through shared digital resources.
This presentation was provided by Patricia Payton of ProQuest during the NISO webinar, Engineering Access Under the Hood, Part Two, held on November 15, 2017.
Wouter Haak's presentation on open science and research data management from the Elsevier Library Connect Event 2016 "Navigating the new publishing & open science terrain: what librarians need to know." Wouter is Elsevier's Vice President of Research Data Management Solutions.
This document summarizes Catriona MacCallum's presentation on data publishing at PLOS. The key points are:
1) PLOS requires authors to make all underlying data openly available without restriction, with rare exceptions. Authors must provide a Data Availability Statement describing compliance.
2) Over 47,000 PLOS papers have included a data statement. Most data is found within submission files or repositories like Dryad and Figshare. PLOS checks data accessibility and ensures anonymity of clinical datasets.
3) PLOS supports initiatives like CRediT for attributing research contributions and data citation principles for giving credit to data producers. PLOS is also involved in projects beyond traditional publishing like preprints and experimental
The document summarizes the bioCADDIE team's work developing the DATS (Data Tag Suite) metadata model. It describes the iterative development process including collecting use cases, mapping existing schemas, and refining the model. The key features of the DATS model include a set of core and extended metadata elements with defined properties, definitions, and allowed values. Core elements are generic while extended elements include domain-specific elements. Serializations include JSON and JSON-LD using schema.org vocabulary. The goal is to enable scalable discovery and access to datasets.
Overview of Bibliometrics - IAP Course version 1.1 - Micah Altman
Whose articles cite a body of work? Is this a high-impact journal? How might others assess my scholarly impact? Citation analysis is one of the primary methods used to answer these questions.
Rebecca Raworth presented a workshop on research data management. The presentation covered:
- Why research data management plans are important, such as satisfying funder requirements and increasing research efficiency.
- Current requirements for data management plans in Canada.
- Tools for research data management, including Portage for creating data management plans and Dataverse for data storage and access.
- Best practices for organizing, documenting, storing and sharing research data, including using metadata standards, file naming conventions, and choosing appropriate data repositories.
Knowledge graph construction for research & medicine - Paul Groth
1) Elsevier aims to build knowledge graphs to help address challenges in research and medicine like high drug development costs and medical errors.
2) Knowledge graphs link entities like people, concepts, and events to provide answers by going beyond traditional bibliographic descriptions.
3) Elsevier constructs knowledge graphs using techniques like information extraction from text, integrating data sources, and predictive modeling of large patient datasets to identify statistical correlations.
Presentation by Ruth Wilson on Nature Publishing Group's Scientific Data journal given at the Now and Future of Data Publishing Symposium, 22 May 2013, Oxford, UK
Transparency and reproducibility in research - Louise Corti
Talk given at the ESS Summer School: An introduction to using big data in the social sciences, 20-24 July 2020, University of Essex, Colchester, UK.
In the morning we look at publishing and sharing data, the importance of research replication and code sharing, and what methodological issues peer reviewers might look for in a published paper using big data. An increasing number of journals in the sciences and social sciences expect a high degree of transparency, and knowing how best to publish high-quality raw (or processed) data, methodology and code is a useful skill. We show how ‘data papers’ help to elucidate how datasets were constructed, compiled and processed, and help to showcase the value of data beyond the original research.
This document provides an overview of library instruction for nursing students presented by Maletta Payne, the Nursing Library Liaison. The agenda covers accessing the library's website and LibGuides, research strategies like using subject headings and Boolean operators, locating print and electronic materials, and services for graduate students including interlibrary loan and study rooms. Library hours and contact information for research help are also included.
No more waiting! Tools that work Today to reveal dataset use - Heather Piwowar
This document discusses the need to better understand the impact of datasets beyond just citations. It notes that datasets can be engaged with in many ways, such as through views, saves, discussions, and recommendations, by various groups like researchers, teachers, students, and policymakers. It calls for exposing more metrics of engagement, supporting more tools for interacting with datasets at all stages, and making metrics and data more openly available to help reveal how datasets are being used.
F1000Research is an open science publishing platform that aims to provide unrestricted access to research findings including reanalyses, confirmatory, and negative results. It uses open peer review after publication and versioning of "living articles" to increase transparency and reproducibility. Authors are required to include the underlying source data for their results, hosted in open repositories, to allow reanalysis and reduce waste.
Leveraging publication metadata to help overcome the data ingest bottleneck - Todd Vision
This document discusses leveraging publication metadata to help address the data ingest bottleneck in scientific publishing. It proposes integrating data submission with manuscript submission to journals to make data archiving integral to the publication process. This integrated approach would help overcome issues around orphan data and allow linking of publications to underlying data through identifiers. Benefits include increased data findability, reuse, and credit to data creators. Challenges include gaining widespread adoption among journals and developers.
Federal Funding Agency's Public Access Policies and You - Margaret Janz
Slides used for an information session about agency responses to the Feb. 22, 2013 OSTP Memo. Session was held May 7, 2015 in the Science & Engineering Library at Temple University and presented by Margaret Janz.
Focus was on NSF, NASA, and NIH response documents, based on the interests attendees indicated when they RSVP'd to the event.
Notes added 5/13/15.
This presentation was provided by Clara Llebot of Oregon State University, during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
This document outlines best practices for creating research data. It recommends using consistent data organization with standardized formats and descriptive file names; performing quality assurance checks and using scripted programs to analyze data while keeping notes; and thoroughly documenting all aspects of data collection and analysis. Following these practices will improve data usability, sharing, and reproducibility.
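One of the practices summarized above, scripted and repeatable quality-assurance checks, can be sketched in a few lines. This is a minimal illustration, not material from the presentation itself; the column names and the accepted temperature range are invented for the example.

```python
import csv
from io import StringIO

# Hypothetical sample data; in practice this would be read from a file.
raw = StringIO(
    "sample_id,temperature_c\n"
    "S001,21.5\n"
    "S002,19.8\n"
)

# Scripted QA check: every row must have an identifier and a plausible
# temperature. Running checks as code makes them repeatable and loggable,
# in keeping with the "keep notes" advice above.
problems = []
for i, row in enumerate(csv.DictReader(raw), start=1):
    if not row["sample_id"]:
        problems.append(f"row {i}: missing sample_id")
    temperature = float(row["temperature_c"])
    if not (-40.0 <= temperature <= 60.0):  # invented range for illustration
        problems.append(f"row {i}: temperature {temperature} out of range")

print(f"{len(problems)} problem(s) found")
```

Because the checks live in a script rather than in manual inspection, they can be re-run unchanged whenever the data are updated.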
This document discusses the development of the DATS (Data Tag Suite), which is needed for DataMed to index data sources in a scalable way, similar to how JATS indexes literature for PubMed. The DATS model was developed through a community-driven process involving use cases and existing metadata schemas. It includes core and extended elements to describe datasets and other digital research objects. The model is designed around the dataset entity and serialized in JSON and JSON-LD mapped to schema.org to increase visibility, accessibility, and searchability. Efforts are ongoing to further align DATS with schema.org and integrate it with related metadata standards and tools.
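As a rough illustration of the kind of schema.org-mapped JSON-LD record described above, here is a minimal sketch in Python. This is not the actual DATS schema; it is a plain schema.org `Dataset` record in the same spirit, and all field values (title, identifier, creator) are invented placeholders.

```python
import json

# A minimal schema.org Dataset record in JSON-LD. Real DATS records carry
# many more core and extended elements; every value here is a placeholder.
record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example metabolomics dataset",
    "description": "Illustrative record showing the shape of a "
                   "JSON-LD dataset description for discovery indexing.",
    "identifier": "https://doi.org/10.0000/example",  # placeholder DOI
    "keywords": ["metabolomics", "data sharing"],
    "creator": {"@type": "Person", "name": "A. Researcher"},
}

serialized = json.dumps(record, indent=2)
print(serialized)
```

Serializing dataset descriptions this way is what lets generic search infrastructure index them alongside literature, which is the scalability point the DATS work makes.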
The document discusses an organization that works in several areas related to data management including data capture, publication, provenance, ontologies, standards, and software development. They work with various communities and consortia in the UK, Europe, and internationally including those focused on enabling reproducible research and open science. They aim to increase annotation and use of standards in lab notes, spreadsheets, and tables and represent facts as linked data statements to make information accessible to both humans and machines.
This document outlines plans for a new content type called a Data Descriptor to be launched in May 2014 by the journal Scientific Data. A Data Descriptor will consist of an article component and a structured metadata component. It aims to provide all the information needed to understand, reuse, and reproduce datasets in a peer-reviewed publication. This will help address barriers to data sharing by providing credit for sharing data and making datasets more discoverable and reusable through standardized metadata. The Data Descriptor is presented as a complement to traditional journal articles and data repositories.
"Standards landscape" NIF Big Data 2 Knowledge (BD2K) Initiative, Sep, 2013Susanna-Assunta Sansone
Overview of the landscape of standards in life sciences for the NIH BD2K
"Frameworks for Community-Based Standards Efforts" workshop
September 25, 2013 - September 26, 2013
Co-Chairs: Susanna Sansone, PhD, and David Kennedy, PhD.
The overall goal of this workshop is to learn what has worked and what has not worked in community-based standards efforts. Participants will have experience in leading specific community based standards initiatives. Prior to the workshop, participants will be asked to address in writing answers to specific questions regarding formulating, conducting, and maintaining such efforts. This information will be used to facilitate focused and actionable discussion at the workshop. Issuance of a Request for Information soliciting comment from the broader community on some of the key issues addressed in the workshop is currently envisioned.
Contact: BD2Kworkshops@mail.nih.gov
Agenda: Frameworks for Community-Based Standards Efforts (PDF 40.7KB)
Participant List: Roster of Invited Participants (PDF 32KB)
Forum (Join the discussion): http://frameworks.prophpbb.com
Watch Live: http://videocast.nih.gov/summary.asp?live=13088 - See more at: http://bd2k.nih.gov/workshops.html#cbse
This document summarizes Susanna-Assunta Sansone's presentation on open access and open data at Nature Publishing Group. Some key points discussed include:
- The benefits of open data including reducing errors/fraud and increasing return on investment in research. However, barriers also exist such as lack of incentives and standards.
- Recent initiatives at NPG to improve data/reproducibility such as requiring data behind figures and expanding methods sections.
- The role of data journals in increasing credit/visibility for shared data and promoting standards/best practices.
- Market research found researchers want increased visibility, usability, and credit for sharing their data.
Short overview responding to the following 4 questions, as suggested by the RDA Long Tail Data IG:
1. Name and location of institution/service
2. What type of data do you collect and how do you acquire the data?
3. What services do you provide?
4. How do you intend to interoperate with a global ecosystem of research data?
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu... - Susanna-Assunta Sansone
This document outlines the services provided by Scientific Data, a publication from Nature that helps authors publish, discover, and reuse research data. It provides structured metadata and a narrative component for Data Descriptors, which describe datasets in detail without new scientific findings. The publication works with over 50 repositories and provides submission assistance and semantic annotation to help authors find appropriate data archiving locations.
Susanna-Assunta Sansone is a data consultant and honorary academic editor who works on several projects related to making data FAIR (Findable, Accessible, Interoperable, Reusable). She is the associate director of Scientific Data, a peer-reviewed journal focused on publishing data descriptors to describe and provide access to scientifically valuable datasets. The goal of Scientific Data is to help promote open science and data reuse by publishing structured metadata and narratives about datasets alongside traditional research articles.
Susanna-Assunta Sansone is the Principal Investigator and Associate Director of a team that works on data capture, curation, publication, and standards. The team develops databases, works with various communities involved in data, and conducts software development and training. Key areas of focus include data provenance, ontologies, the semantic web, and working with pre-competitive initiatives.
The document discusses challenges in data standards, sharing, and publication in life sciences. It notes there are many reporting standards to describe experiments but issues in identifying, tracking usage, and assessing impact of standards. It proposes creating a registry of standards that is searchable and associates standards with data policies, databases, and metrics to evaluate use. This would help stakeholders identify appropriate standards and credit contributors to maintaining open standards.
This document discusses data management and curation in bioinformatics. It describes Susanna-Assunta Sansone as the principal investigator and team leader at the University of Oxford e-Research Centre, where her team works on data management, biocuration, software development, databases, and community standards and ontologies for various domains including toxicology, health, and agriculture. The document promotes the importance of data standards to enable data sharing and reproducibility in bioscience research.
Update on the BioSharing WG activities at the joint ELIXIR IG and BioSharing WG breakout: https://rd-alliance.org/joint-meeting-ig-elixir-bridging-force-wg-biosharing-registry.html
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015 (Susanna-Assunta Sansone)
This document discusses the importance of digital research objects being not just open, but FAIR (Findable, Accessible, Interoperable, Reusable). It notes that big life science companies have evolved from keeping data and innovation mostly inside the company to now distributing data more openly and collaborating in heterogeneous partnerships across different organizations. However, current academic incentive and evaluation systems do not properly recognize or reward activities like sharing data, software, publications or patents. The document calls for rethinking these systems and designing new career paths for data scientists to better align incentives with open and collaborative research practices.
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ... (Susanna-Assunta Sansone)
Part of the SciDataCon14 workshop on "Data Papers and their applications", run by myself and Brian Hole, to help attendees understand current data-publishing journals and trends, and the editorial processes of NPG's Scientific Data and Ubiquity's Open Health Data.
Scientific Data overview of Data Descriptors - WT Data-Literature integration... (Susanna-Assunta Sansone)
This document introduces Scientific Data, a new peer-reviewed journal for publishing data descriptors from Nature Publishing Group. It will provide structured metadata and narrative articles to describe datasets for reuse. The journal is now open for submissions and will launch in May 2014, featuring an advisory panel and sections for standardized data descriptor articles and experimental metadata. It aims to give proper credit for data sharing and promote open access, reuse and peer review of curated scientific datasets.
The document summarizes a presentation about making scientific data FAIR (Findable, Accessible, Interoperable, Reusable). It discusses the concept of FAIR data and several of the presenter's related projects. Examples are provided of using standards like ISA-Tab to structure metadata and make datasets interoperable. The presentation outlines the presenter's roles in data capture, publication, and standards development efforts to promote FAIR data principles. Scientific Data, a new journal for peer-reviewed data descriptions, is introduced as a way to make datasets more discoverable and reusable.
The document discusses making experimental data and methods more reproducible and accessible by providing structured metadata alongside narrative descriptions. It recommends using community standards and ontologies to semantically tag key information, and machine-readable formats to structure descriptions in a consistent way. Tools are proposed to help authors report structured information and curate it according to these standards to make data fully FAIR (findable, accessible, interoperable, reusable). The goal is to move from experiments that are difficult to reproduce to those that are "born reproducible".
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data (Susanna-Assunta Sansone)
1) The document discusses Susanna-Assunta Sansone's roles and work related to promoting FAIR data standards and practices.
2) It highlights some of her leadership positions with organizations like BioSharing that work to map and promote standards.
3) The document also discusses Scientific Data, a peer-reviewed journal launched by Nature Publishing Group to publish detailed descriptions of scientifically valuable datasets to facilitate reuse.
How to make your published data findable, accessible, interoperable and reusable (Phoenix Bioinformatics)
Seminar Presentation for PMB Department, UC Berkeley for Love Data Week. Subject is how to prepare publications and associated data sets for maximum reuse.
This document summarizes the work of developing a Data Discovery Index prototype that helps users find and access shared biomedical data from various repositories. It ingests metadata from different standards and sources using Elasticsearch. It was presented at the Alan Turing Institute Symposium in April 2016. The project aims to organize data through an aggregator framework and portal. It involves mapping various metadata standards to have maximum coverage of use cases with minimal data elements. More information can be found at the listed websites.
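The mapping of heterogeneous metadata standards onto a minimal common element set could be sketched as follows; the field names, standard names, and example records below are illustrative assumptions, not the project's actual schema:

```python
# Illustrative sketch of the aggregation idea behind a data discovery index:
# records arriving in different community standards are mapped onto a minimal
# common set of elements before indexing. All names here are hypothetical.

# Per-standard mappings from common element -> source field name.
FIELD_MAPS = {
    "standard_a": {"title": "titles", "creator": "creators", "id": "identifier"},
    "standard_b": {"title": "title",  "creator": "authors",  "id": "identifier"},
}

def normalize(record, standard):
    """Map a source metadata record onto the minimal common schema."""
    mapping = FIELD_MAPS[standard]
    return {common: record.get(src) for common, src in mapping.items()}

records = [
    ({"titles": "RNA-seq of X", "creators": "Lab A", "identifier": "doi:10.1/a"}, "standard_a"),
    ({"title": "Soil survey",   "authors": "Lab B",  "identifier": "doi:10.1/b"}, "standard_b"),
]

# The normalized documents would then be sent to a search index.
index = [normalize(rec, std) for rec, std in records]
for doc in index:
    print(doc["id"], "-", doc["title"])
```

In a real deployment the normalized documents would be bulk-loaded into Elasticsearch; the point of the sketch is only the normalization step that makes cross-standard search possible.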
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...) (Akshay Bhagat)
This document discusses the DataBridge project, which aims to enable easier discoverability and use of long tail science data. DataBridge will create a multidimensional network and social network for scientific data by mapping datasets connected by relationships between their metadata, usage, and the methods used to analyze them. This will allow researchers to more easily find relevant datasets by automatically forming communities of similar data. The document outlines DataBridge's vision and progress to date, including the algorithms it is investigating for measuring similarity between datasets in order to facilitate searching for collaborators and discoveries.
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School (Carole Goble)
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the "assets" of data, models, codes, SOPs, workflows. The "FAIR" (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training, and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform, I will show how this work relates to your projects.
[1] Wilkinson et al, The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
Scientific Data is a new category of publication that provides detailed descriptions of scientifically valuable datasets to improve data reproducibility and reuse, with descriptors covering topics like methods, data records, and technical validation. These descriptors undergo a peer review process to assess completeness, consistency, integrity, and experimental rigor. The publication is hosted on Nature.com and aims to improve data discoverability, curation, and peer review through machine-readable metadata and clear links between data, descriptors, and related research papers.
Data Communities - reusable data in and outside your organization (Paul Groth)
Description
Data is critical both to the operation of an organization and as a product. How can you make that data more usable for both internal and external stakeholders? There are a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data (re)use. It can be overwhelming. Based on recent empirical work (analyzing data reuse proxies at scale, understanding data sensemaking, and looking at how researchers search for data), I talk about which practices are a good place to start for helping others to reuse your data. I put this in the context of the notion of data communities, which organizations can use to help foster the use of data both within the organization and externally.
Talk given at the Sciencedigital@UNGA75 on 29th September as part of a series of side events to mark the 75th anniversary of the United Nations General Assembly.
The challenge of sharing data well, how publishers can help (Varsha Khodiyar)
Researchers, academic institutes and funders are increasingly recognizing the importance of data sharing for reproducible science. However, it is not always straightforward and clear to researchers as to how best to share data in a useful way. At Springer Nature we are working on several initiatives to help facilitate the sharing of research data in a reusable way, with our overarching goal being to publish research that is robust and reproducible. I will talk about the effort that goes into our flagship data journal, Scientific Data, to facilitate best practices in publication and sharing of research data, and share some of our experiences publishing Challenge datasets. I will also describe some of the newer Research Data Services that are now available to help all researchers (not only Springer Nature authors) to share their data in a useful way.
A 45min presentation given at the 'Getting published in Nature's Scientific Data journal', hosted by the University of Cambridge Research Data Management team (www.data.cam.ac.uk). Presented on Monday 11th January 2016.
1) Big data standards are needed to make data understandable, reusable, and shareable across different databases and domains.
2) Effective standards require reporting sufficient experimental details and context in both human-readable and machine-readable formats.
3) Developing standards is a collaborative process involving different stakeholder groups to define requirements, vocabularies, and data models through both formal standards bodies and grassroots organizations.
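To illustrate point 2), here is a minimal sketch of the same experimental detail reported in both forms: a human-readable sentence, and structured key-value metadata a machine can parse. The keys and the ontology-style term identifiers are invented for illustration, not taken from any real standard:

```python
import json

# The same experimental context, twice.
# 1) Human-readable free text, as it might appear in a methods section:
human_readable = "Liver samples from 8-week-old mice were profiled by RNA-seq."

# 2) Machine-readable structured metadata. Field names and the
#    "EX:..." term identifiers are hypothetical, for illustration only.
machine_readable = {
    "material": {"value": "liver", "term_source": "EXAMPLE-ONTOLOGY", "term_id": "EX:0001"},
    "organism": {"value": "Mus musculus"},
    "age":      {"value": 8, "unit": "week"},
    "assay":    {"value": "RNA-seq"},
}

print(json.dumps(machine_readable, indent=2))
```

The structured form makes the units, species, and assay type queryable, which is what enables cross-database search and aggregation; the free text alone does not.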
Capturing and Analyzing Publication, Citation and Usage Data for Contextual C... (NASIG)
Libraries have long sought to demonstrate the value of their collections through a variety of usage statistics. Traditionally, a strong emphasis is placed on high usage statistics when evaluating journals in collection development discussions. However, as budget pressures persist, administrators are increasingly concerned with looking beyond traditional usage metrics to determine the real impact of library services and collections. By examining journal usage in the context of scholarly communication, we hope to gain a more holistic understanding of the use and impact of our library’s resources. In this session, we begin by outlining our methodology for gathering comprehensive publication and citation data for authors affiliated with Northwestern University’s Feinberg School of Medicine, utilizing Web of Science as our primary data source and leveraging a custom Python script to manage the data. Using this data we discuss various potential metrics that could be employed to measure and evaluate journals in institutional and field-specific contexts, including but not limited to: number of publications and references per journal, co-citation networks, percentage of references per journal, and increases or decreases of references over time per title. We then consider the development of normalized benchmarks and criteria for creating field-specific core journal lists. We also discuss a process for establishing usage thresholds to evaluate existing journal subscriptions and to highlight potential gaps in the collection. Finally, we apply and compare these metrics to traditional collection development tools like COUNTER usage reports, cost-per-use analysis, Inter-Library Loan statistics and turnaway reports, to determine what correlations or discrepancies might exist. 
We finish by highlighting some use-cases which demonstrate the value of considering publication and citation metrics, and provide suggestions for incorporating these metrics into library collection development practices.
Speakers: Joelen Pastva and Jonathan Shank, Northwestern University
Project GitHub page: https://goo.gl/2C2Pcy
This document discusses challenges around scholarly data, including fragmented and poorly described data. It emphasizes the importance of experimental details, data availability, and data publication for reproducibility. Springer Nature's Scientific Data is highlighted as a new open-access journal for detailed data descriptors. The Scientific Data ISA-explorer is presented as a web application for discovering, exploring and visualizing data descriptors.
This document summarizes a presentation given by Susanna Sansone at the GSC 23rd meeting education day in Bangkok, Thailand on August 7, 2023. The presentation discussed standards across life sciences, including definitions of different types of standards and over 1,600 identified standards. It covered standard organizations and grassroots groups, as well as the FAIRsharing database which catalogs over 2,885 standards and databases and aims to promote their use and value across research.
The "FAIRsharing journey in RDA" document discusses:
1) FAIRsharing's growth and involvement with RDA since 2011, including its Working Group established in 2015 to curate standards, databases, and policies to promote FAIR data.
2) FAIRsharing's current activities and impact, such as its registry of over 4,000 records from many disciplines and usage in various tools and services.
3) Opportunities for further engagement with RDA, such as leveraging their expertise for contributions to the FAIR Cookbook, an open resource providing technical recipes for applying FAIR principles to life science data.
Overview of metadata standards, and how FAIRsharing and the FAIR Cookbook help in selecting and using them. Presentation to the "What is metadata? Common standards and properties" EHP Workshop, November 9, 2022: https://ephconference.eu/pre-conference-programme-441
Pharmas and academia are joining forces to make data FAIR (Findable, Accessible, Interoperable, and Reusable) through the development of the FAIR Cookbook. The FAIR Cookbook provides a growing collection of over 70 recipes that give step-by-step guidance on improving the FAIRness of different data types through the use of tools, technologies, and best practices. It aims to provide practical examples and guidelines to support researchers, data managers, and others in managing data according to FAIR principles. The FAIR Cookbook is an open, community-developed resource overseen by an editorial board, with contributions from nearly 100 life sciences professionals.
FAIR, community standards and data FAIRification: components and recipes (Susanna-Assunta Sansone)
Overview of FAIR, FAIRsharing and the FAIR Cookbook at the ATI event on Knowledge Graphs: https://github.com/turing-knowledge-graphs/meet-ups/blob/main/symposium-2022.md
Presentation to the EOSC workshop on policies (https://eoscfuture.eu/eventsfuture/monitoring-eosc-readiness-fair-data-policies) on what FAIRsharing does for policies, including providing registration, discovery, flexible and clearer descriptions, relationships, machine readability and comparability.
The document summarizes how FAIRsharing assists others with promoting FAIR data principles without directly assessing FAIRness compliance. It does this by (1) providing a lookup service for standards and repositories via its API, (2) serving as a registry for FAIRness tests and indicators to make them discoverable, and (3) enabling communities to create profiles declaring which standards and repositories they use. The document also outlines FAIRsharing's operations, advisory boards, and future plans to further support assessment and tracking of FAIRness improvements over time.
ELIXIR is a European infrastructure that brings together life science resources from across Europe. It offers databases, tools, computing capabilities, and training opportunities. ELIXIR nodes provide these services and connect national data infrastructures. ELIXIR communities connect infrastructure experts to drive service developments. ELIXIR is funded through a mixed model including public sources. It works to sustain important biological data resources and make data FAIR through recommended standards and interoperability resources. ELIXIR also aims to develop a sustainable tools ecosystem and provides training through its portal.
Presentation to the EC Workshop on Maximizing investments in health research: FAIR data for a coordinated COVID-19 response. Workshop III, November 8, 2021.
Presentation to the EC Workshop on Maximizing investments in health research: FAIR data for a coordinated COVID-19 response. Workshop I, October 11, 2021.
The FAIR Cookbook poster, as presented at the ELIXIR-UK Node and the UK Conference of Bioinformatics and Computational Biology 2021: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21
The FAIR Cookbook poster, as presented at the UK Conference of Bioinformatics and Computational Biology 2021: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
1. High quality data publications: drives and needs
Susanna-Assunta Sansone, PhD
@biosharing | @isatools | @scientificdata
B-DEBATE: Big Data in Biomedicine. Challenges and Opportunities, 12 Nov 2014
Data Consultant; Honorary Academic Editor; Associate Director; Principal Investigator
3. Plagued by selective reporting of data and methods
• Over 50% of completed studies in biomedicine do not appear in the published literature
• Often because results do not conform to authors' hypotheses
"Only half the health-related studies funded by the European Union between 1998 and 2006 - an expenditure of €6 billion - led to identifiable reports"
4. Incentivizing individual contributors to share data
• Big science efforts
  o data are often better organized, reported and shared
• Small independent efforts, yielding a rich variety of specialty datasets
  o most of these data (such as null findings) are unpublished
  o these dark data hold a potential wealth of knowledge
5. From made reproducible to born reproducible
"Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results"
10. Data/reproducibility at NPG
Wang et al, Nature, 2013, doi:10.1038/nature12730
• Figure source data
  o putting data behind figures/graphs
11. Data/reproducibility at NPG
• Figure source data
  o putting data behind figures/graphs
• Data citation
  o tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
• Code reproducibility
  o peer review, availability and reuse
• NPG's Linked Data release - CC0
• A new data journal
12. Role of data papers and data journals
• Incentive, credit for sharing
• Peer review focus
• Value of data vs. analysis
• Discoverability and reusability
13. Market research (2011)
• What do researchers want from a data publication?
  o 96% - increased visibility and discovery
  o 95% - increased usability of their research data
  o 93% - credit mechanism for deposit of data
  o 80% - peer review of content/datasets
Respondent characteristics: 387 respondents (329 active researchers)
• Physics (24%)
• Earth and environmental science (21%)
• Biology (20%)
• Chemistry (19%)
• Others (16%)
14. Helping you publish, discover and reuse research data
• Credit for sharing your data
• Focused on reuse and reproducibility
• Peer reviewed, curated
• Promoting community data and code repositories
• Open Access
• Currently covering life, natural and environmental sciences
• Big and small data
  o the power of small data is in their aggregation and integration with other datasets
• New and previously published individual datasets, curated collections and citizen science
  o a fuller, more in-depth look at the data processing steps, additional data files, code etc.
  o tutorial-like information for scientists interested in reusing or integrating the data with their own
15. Introducing a new content type: Data Descriptor
Methods and technical analyses supporting the quality of the measurements:
• What did I do to generate the data?
• How was the data processed?
• Where is the data?
• Who did what, when?
• How can the data be used or reused?
Designed to make data more discoverable, interpretable and reusable
16. Relation with traditional article - content
Scientific hypotheses:
• Synthesis
• Analysis
• Conclusions
Methods and technical analyses supporting the quality of the measurements:
• What did I do to generate the data?
• How was the data processed?
• Where is the data?
• Who did what, when?
• How can the data be used or reused?
17. Relation with traditional article - time
Publish Data:
• AFTER: expand on your research articles, adding further information for reuse of the data
• AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
• OR BEFORE
18. Share your data, get credited and cited
• Code in GitHub
• Data in OpenfMRI
19. Data Descriptor: narrative and structure
• Experimental metadata or structured component (in-house curated, machine-readable formats)
• Article or narrative component (PDF and HTML)
20. Data Descriptor: narrative
Focus on data reuse.
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses.
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group
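A fixed section list like this lends itself to automated completeness checks at submission time. The sketch below is purely illustrative (it is not Scientific Data's actual submission tooling); only the section names come from the slide:

```python
# Required Data Descriptor sections, as listed on the slide above.
REQUIRED_SECTIONS = [
    "Title", "Abstract", "Background & Summary", "Methods",
    "Technical Validation", "Data Records", "Usage Notes",
    "Figures & Tables", "References", "Data Citations",
]

def missing_sections(manuscript):
    """Return the required sections that are absent or empty in a manuscript dict."""
    return [s for s in REQUIRED_SECTIONS if not manuscript.get(s)]

# Hypothetical incomplete draft: only Title and Methods are filled in.
draft = {"Title": "Transcriptome of X", "Methods": "Samples were..."}
print(missing_sections(draft))
```

Running this on the hypothetical draft would list every other required section, which is exactly the kind of feedback an editorial checklist gives authors before peer review.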
21. Data Descriptor: narrative
Focus on data reuse.
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses.
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
In traditional publications this information is not provided in a sufficiently detailed manner. However, this information is essential for understanding, reusing, and reproducing datasets.
22. Data Descriptor: structure (CC0)
In-house editorial curator:
• assists users to submit the structured content via simple templates and an internal authoring tool
• performs value-added semantic annotation of the experimental metadata
For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification.
[Diagram: data file or record in a database; analysis; method; script]
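ISA-Tab, mentioned above as the direct-submission format, is a tab-delimited layout. A study sample table could be sketched roughly as follows; the columns and values are a simplified illustration only, and the real layout is defined by the ISA-Tab specification:

```python
import csv
import io

# Simplified, ISA-Tab-like study sample table: each row traces a sample
# back to its source material through a named protocol. Column names
# follow the general ISA-Tab style but are illustrative here.
header = ["Source Name", "Characteristics[organism]", "Protocol REF", "Sample Name"]
rows = [
    ["patient-1", "Homo sapiens", "sample collection", "sample-1"],
    ["patient-2", "Homo sapiens", "sample collection", "sample-2"],
]

# ISA-Tab files are tab-separated text, so a plain TSV writer suffices.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(buf.getvalue())
```

The value of the format is that each column has a defined meaning, so curators and tools can validate and semantically annotate the metadata rather than parsing free text.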
23. Adding value to research articles and data records
[Diagram: research papers linked to Data Descriptors and data records]
We currently recognize over 60 public data repositories
25. Peer review process focused on quality and reuse
Evaluation is not based on the perceived impact or novelty of the findings, or on the size of the data.
• Experimental rigour and technical data quality
  o methodologically sound
  o technical validation experiments and statistical analyses
  o depth, coverage, size, and/or completeness of data sufficient for the types of applications
• Completeness of the description
  o sufficient details to allow others to reproduce the results, reuse the data or integrate it with other data
  o compliance with relevant minimum information or reporting standards
• Integrity of the data files and repository record
  o data files match the descriptions in the Data Descriptor
  o deposited in the most appropriate available databases