Data, Data Everywhere: What's A Publisher to Do?
1. | 1
Anita de Waard (ORCID: 0000-0002-9034-4119)
VP Research Data Collaborations, Elsevier RDM
a.dewaard@elsevier.com
2. | 2
Are Scientists Sharing Their Data? Why (not)?
Attitudes to data sharing
73% agree that having access to other researchers’ data would benefit their research
… but only 64% agree that they would be willing to share their own research data with other researchers
Reasons for the discrepancy?
Only 26% agreed they had received sufficient training in data management
Only 37% agreed that sharing research data is associated with credit/reward in their field
59% agreed that research data management specialists need to play a role in research data sharing
Perceived benefits of data sharing (top 3)
• More possibilities for collaboration: 55%
• Reproducibility of research: 53%
• Article more likely to be cited: 50%
Perceived drawbacks of data sharing
• Competitors using data before collector has a chance to re-use it
• Use without crediting/citing the data collector
• Cost (time and financial)
• Legal concerns (ownership, misuse, confidentiality)
CWTS-Elsevier Open Data Report (2016): 1,200 researchers responded to a survey of 51,672 individuals randomly selected from the Scopus author database (2.3% response rate). Survey tool: online survey, available in English only. Co-branded with Leiden University (CWTS). Fieldwork took place in June-July 2016.
https://www.elsevier.com/about/open-science/research-data/open-data-report
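The 2.3% response rate follows directly from the two counts given in the survey note:

```latex
\text{response rate} = \frac{1{,}200}{51{,}672} \approx 0.0232 \approx 2.3\%
```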
3. | 3
Publishers Getting Together: AGU FAIR Data Sharing Guidelines
1. https://www.elsevier.com/authors/author-services/research-data/data-guidelines/
2. https://cos.io/our-services/top-guidelines/
AGU FAIR DATA DRAFT
Consolidated proposal for data guidelines for publishers of Earth and Space Science journals:
1. Authors are required to deposit their research data in a relevant data repository:
• Before publication, large data sets (such as microarray data, protein or DNA sequences, atomic coordinates or climate data) must be deposited in an approved database.
• An inventory of suggested and supported data repositories is provided by the Coalition on Publishing Data in the Earth and Space Sciences (COPDESS).
• All data used in the analysis must be available to any researcher for purposes of reproducing or extending the analysis.
2. Authors are required to cite and link to this dataset in their article, following the Force11 Data Citation Principles.
3. If this is not possible, authors are required to submit a statement explaining why the research data cannot be shared.
Truly exceptional circumstances requiring special treatment, such as protecting personal privacy, should be discussed with the editor no later than the manuscript revision stage, and spelled out explicitly in the acknowledgments.
Elsevier + Science + Nature + Wiley + PLoS + Digital Science + AGU
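Guideline 2 asks authors to cite their datasets and link to them. As a minimal sketch, assuming a DataCite-style citation pattern (Creator (Year): Title. Version. Publisher. Identifier), which the guidelines above do not themselves prescribe, a small metadata record can be rendered into a citation with a resolvable DOI link. The `Dataset` class, the `format_citation` helper and all example values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    """Hypothetical minimal metadata record for a deposited dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str               # the repository holding the data
    doi: str                     # persistent identifier, e.g. "10.xxxx/yyyy"
    version: Optional[str] = None


def format_citation(ds: Dataset) -> str:
    """Render Creator (Year): Title. [Version.] Publisher. DOI link."""
    parts = [f"{'; '.join(ds.creators)} ({ds.year}): {ds.title}."]
    if ds.version:
        parts.append(f"Version {ds.version}.")
    parts.append(f"{ds.publisher}.")
    # Give the DOI as a resolver link so both readers and machines can reach the data.
    parts.append(f"https://doi.org/{ds.doi}")
    return " ".join(parts)


if __name__ == "__main__":
    example = Dataset(
        creators=["Doe, J.", "Roe, R."],   # placeholder authors
        year=2017,
        title="Example deformation time series",
        publisher="Example Repository",
        doi="10.1234/placeholder",         # placeholder, not a real DOI
        version="1",
    )
    print(format_citation(example))
```

Keeping the citation generated from structured fields (rather than typed by hand) is one way to make the dataset link machine-readable as well as human-readable.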
4. | 4
Research data is more than just a journal supplement:
All forms of research data, including everything needed to reproduce and reuse experimental and computational results:
• Raw data
• Processed data
• Machine & environment settings
• Protocols, methods, workflows
• Scripts, analyses, algorithms
5. | 5
Mendeley Data Platform:
A modular, cloud-based platform designed for research institutions to manage the entire lifecycle of research data
Components: Search • Repository • Notebook • Manager • Monitor
Research lifecycle stages: Find topic • Identify gaps • Design • Plan & fund • Discover data, people, methods & protocols • Execute research • Collect, analyze & visualize • Prepare, reproduce, re-use & benchmark • Store & share • Publish • Disseminate
1. Lab data
2. Open data: data publicly available
3. ‘Metrics on data’: monitoring and reporting on institutional data (benchmark, rank, evaluate, manage, preserve)
6. | 6
But how do we publish Data Science?
https://projectreporter.nih.gov/project_description.cfm?projectnumber=1R01MH107238-01
1R01MH107238-01 (Arnold, Fraser, Kesselman):
An experimental paradigm to allow dynamic monitoring of the strength and location of every glutamatergic and GABA/glycinergic synapse within the brain of a living organism. This will involve combining three technologies:
1. Recombinant probes, ...
2. 2P-SPIM microscopy, …
3. Software to calculate and store the location and strength of each synapse in such a manner that it can be easily manipulated and analyzed
7. | 7
The computer is a scientist, too:
“intelligent systems for computer-aided discovery can integrate into the insight generation loop in scalable ways…”
Computer-Aided Discovery: Towards Scientific Insight Generation with Machine Support, V. Pankratius, J. Li, M. Gowanlock, D. Blair, C. Rude, T. Herring, F. Lind, P. Erickson, C. Lonsdale, IEEE Intelligent Systems 31(4), pp. 3-10, Jul/Aug 2016
“This work combines time series Principal Component Analysis with InSAR to constrain the space of possible model explanations on current empirical data sets and achieve a better identification of deformation patterns”
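The quoted work applies principal component analysis (PCA) to time series. A minimal sketch of that one step, assuming the InSAR displacements are already arranged as a matrix with one row per acquisition date and one column per pixel; the `time_series_pca` helper and the synthetic data are hypothetical and not taken from the cited paper.

```python
import numpy as np


def time_series_pca(X: np.ndarray, n_components: int):
    """PCA of an (n_times, n_pixels) matrix via the SVD.

    Returns temporal components (n_times, k), spatial patterns (k, n_pixels)
    and the fraction of total variance explained by each of the k components.
    """
    Xc = X - X.mean(axis=0)                 # remove each pixel's temporal mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    temporal = U[:, :n_components] * s[:n_components]
    spatial = Vt[:n_components]
    return temporal, spatial, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 40)           # 40 hypothetical acquisition dates
    pattern = rng.normal(size=500)          # one synthetic spatial pattern over 500 pixels
    # Synthetic data: a single pattern growing linearly in time, plus small noise.
    X = np.outer(t, pattern) + 0.01 * rng.normal(size=(40, 500))
    temporal, spatial, explained = time_series_pca(X, n_components=3)
    print(explained)                        # the first component should dominate
```

Each leading component pairs a temporal signature with a spatial pattern, which is what lets a small number of components summarize candidate deformation signals for comparison against models.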
9. | 9
Moving from a pipeline to a platform model:
A platform:
• Is a nexus of rules and architecture
• Is open, allowing regulated participation
• Actively promotes (positive) interactions among different partners
• Scales much faster than a pipeline.
A network inherently has an external focus.
To have an external focus, you must have a community strategy.
Pipelines, Platforms, and the New Rules of Strategy, Harvard Business Review, April 2016, https://hbr.org/2016/04/pipelines-platforms-and-the-new-rules-of-strategy
10. | 10
Example #1: Mendeley Data Platform to integrate with the Research Data Management ecosystem
(Diagram legend: existing integration / planned integration)
• Index dataset metadata
• Mint DOIs
• Import / export notebooks, experiments
• Import / export datasets
• Repository indexed by OpenAIRE
• Zenodo indexed by DataSearch
• Publish links between articles and datasets
• Datasets indexed by DataSearch
• Long-term preservation of published datasets
• + 22 repositories
• Integrate with machine-readable DMPs (data management plans)
• Open API
11. | 11
Example #2: Collaboration with Seven Bridges Genomics in the NIH Data Commons Pilot
• Goal: develop a cloud-based solution for doing bioinformatics experiments
• Elsevier portion: build a Global Unique Identifier Broker tool
• Based on the Mendeley Data DOI resolver, to be shared as open source (OS) through the NIH platform
http://www.healthcareitnews.com/news/nih-taps-new-partners-build-commons-petabytes-biomedical-data
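An identifier broker accepts identifiers in several schemes and hands each one to the appropriate resolver. A minimal sketch, assuming just two schemes (DOIs resolved via https://doi.org/ and ARKs via https://n2t.net/) and simplified matching patterns; the `resolve_url` function and its scheme patterns are hypothetical and do not describe the actual Elsevier or NIH tool. It only builds resolver URLs and makes no network calls.

```python
import re
from typing import Optional

# Hypothetical, simplified patterns: accept bare, prefixed, or URL forms.
_DOI = re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)
_ARK = re.compile(r"^(?:https?://n2t\.net/)?(ark:/?\w+/\S+)$", re.IGNORECASE)


def resolve_url(identifier: str) -> Optional[str]:
    """Map a DOI or ARK to a canonical resolver URL; return None if unrecognized."""
    identifier = identifier.strip()
    m = _DOI.match(identifier)
    if m:
        return f"https://doi.org/{m.group(1)}"
    m = _ARK.match(identifier)
    if m:
        return f"https://n2t.net/{m.group(1)}"
    return None


if __name__ == "__main__":
    for ident in ["doi:10.1234/placeholder", "ark:/13030/placeholder", "not-an-id"]:
        print(ident, "->", resolve_url(ident))
```

Normalizing every accepted form to one canonical resolver URL is the core of such a broker; a production tool would also need the full syntax rules of each scheme and more schemes than the two assumed here.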
12. | 12
In Summary:
• Publishers are getting together to help store and share data:
- Open Data Report
- AGU FAIR Data Group
- Mendeley Data Platform
• But: all science is becoming data science:
- Scientists are building the tools that other scientists work with
- We are moving ‘beyond download science’, where computers join in
• Publishers need to change:
- We need to enable network effects
- This means moving from a pipeline to a platform model
• Some things we are doing at Elsevier RDM:
- Mendeley Data Platform
- Participating in the NIH Data Commons Pilot
13. | 13
Many Questions Remain…
• How do we best publish Data Science?
• (How) do we connect this to the article/journal model that we all know and love?
• How does scientific software become sustainable, used, connected?
• What role do all the parties play: funding agencies, tool builders (commercial and academic), HPCCs, cloud providers, libraries, standards bodies, publishers?
• How do all of us become connected into a viable network?
• How do we collectively transition from the old models into the new?
16. | 16
Ideas are becoming distributed: computers make hypotheses, too*; citizen science/MOOCs enable ubiquitous access to knowledge
Tools are becoming distributed: easy to create networks of tools to run anywhere (Docker, Jupyter Notebook collections, etc.)
Data is becoming distributed: many sources, formats, owners, types; global, interconnected
* http://ieeexplore.ieee.org/abstract/document/7515118/: Computer-Aided Discovery: Toward Scientific Insight Generation with Machine Support