| 1
Anita de Waard, VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
May 20, 2016
Publishing The Full Research Cycle
To Support Open Science
Container Strategies for Data & Software
Preservation that Promote Open Science
Notre Dame, IN
| 2
Source: JISC: How and why you should manage your research data: a guide for researchers
Caroline Ingram, Published: 7 January 2016
Research Data Life Cycle:
| 3
Collaborate and Analyse: Hivebench
www.hivebench.com
| 4
Manage, Store, Preserve:
Data Rescue: Preserving Data At Risk
http://www.codata.org/task-groups/data-at-risk/dar-workshops
Software Rescue: Preserving Executable Content
https://olivearchive.org/
| 5
Manage, Store, Preserve: Mendeley Data
https://data.mendeley.com/
• Linked to published papers – or not
• Linked to GitHub – or not
• Versioning and provenance
• Allowing different licenses
| 6
Share, Publish: Research Elements
https://www.elsevier.com/books-and-journals/research-elements
[Diagram: Full research paper, with Data articles, Software articles, Method articles, Protocols, Video articles, Hardware articles, Lab resources]
• Brief article types designed to communicate a specific element of the research cycle
• Complementary to full research papers
• Easy to prepare and submit
• Peer-reviewed and indexed
• Receive a DOI and are fully citable
• Allow citable post-publication updates
• Primarily Open Access (CC-BY)
• Published in multidisciplinary and domain-specific journals
| 7
Share, Publish: SoftwareX
http://www.journals.elsevier.com/softwarex/
• Submissions to SoftwareX are composed of
- A short article describing the software, with a focus on the impact of the software in the research community and its re-usability across disciplines
- A “metadata table” containing information about the software and key metrics
- A permanent link to a software repository (GitHub) where the software and code are stored and maintained by Elsevier and made freely available
• Peer Review
- Follows a simple reviewer questionnaire, available from the SoftwareX website, that evaluates usability and scientific impact of the software
- Less attention is placed on the technical quality of the software
| 8
Share, Publish: SoftwareX
[Workflow diagram. Before submission: code/software deposited to GitHub; data uploaded on Mendeley Data. Software article and metadata submitted to SoftwareX, followed by the peer-review process. Once accepted: code/software forked to the journal GitHub repository (open source); data is publicly available on Mendeley Data (CC-BY); software article published (CC-BY), live stats shown. The items are linked with bi-directional links, and software updates feed back into the cycle.]
| 9
Discover: Datasearch
http://datasearchdemo.elsevier.com/indexed#
| 10
Reuse: Reproducibility Papers
• The first Reproducibility Paper was published recently:
http://www.sciencedirect.com/science/article/pii/S0306437915301113
• It is linked to this paper:
http://www.sciencedirect.com/science/article/pii/S0306437915000472
• The data is hosted here: https://data.mendeley.com/datasets/xz6gv65m6d/6
• To reproduce the experiment, the journal requires source code for the software components, together with installation scripts; we suggest that authors host their code on GitHub
• In addition to the source code, we recommend that authors submit a virtual machine in which all appropriate software components are already installed, so that the experiment can be reproduced on a wide variety of platforms. Authors are to submit their experiments using either ReproZip or Docker.
| 11
Discover, Reuse and Cite:
• ICSU-WDS/RDA Publishing Data Services Working Group, merged with National Data Service pilot
• Cross-stakeholder, with support and input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources)
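As a rough illustration of what a data-literature linking service has to store, here is a minimal sketch of a link index that can be queried from either the paper side or the dataset side. Everything in it is an assumption made for illustration: the `Link` fields, the `LinkIndex` class, and the example identifiers are hypothetical and are not taken from the Scholix framework or the prototype API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One literature-to-data link (hypothetical fields, not the Scholix schema)."""
    paper_id: str
    dataset_id: str
    provider: str  # who asserted the link, e.g. a publisher or a repository


class LinkIndex:
    """In-memory index that can be queried from either end of a link."""

    def __init__(self):
        self._by_paper = defaultdict(set)
        self._by_dataset = defaultdict(set)

    def add(self, link):
        # Store the same link under both identifiers so lookups work both ways.
        self._by_paper[link.paper_id].add(link)
        self._by_dataset[link.dataset_id].add(link)

    def datasets_for(self, paper_id):
        # Several providers may assert the same link; report each dataset once.
        return sorted({l.dataset_id for l in self._by_paper.get(paper_id, ())})

    def papers_for(self, dataset_id):
        return sorted({l.paper_id for l in self._by_dataset.get(dataset_id, ())})


# Hypothetical identifiers, for illustration only.
index = LinkIndex()
index.add(Link("paper:example-1", "dataset:example-A", "publisher"))
index.add(Link("paper:example-1", "dataset:example-A", "repository"))
index.add(Link("paper:example-2", "dataset:example-A", "repository"))
```

Keeping the provider on each link reflects the cross-stakeholder setup described above: the same paper-dataset pair may be reported by more than one source, and the index merges those reports when answering a query.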
| 12
Discover, Reuse and Cite:
https://www.elsevier.com/connect/data-citation-is-becoming-real-with-force11-and-elsevier
| 13
Publishing The Full Research Cycle Requires
Networks of Collaboration:
Force11:
- Multi-stakeholder, member-driven organisation
- Unites scholars, tool developers, librarians, publishers, funding agencies, etc.
- E.g. Software citation group, akin to Data Citation Group
National Data Service:
- Multi-stakeholder group, based around supercomputing centres
- Aims to be a ‘connective tissue’ between projects for data creation, curation, storage, etc.
- Inviting Pilots: two or more partners who have not worked together, interested in collaborating on a data-centric project to solve a real-world need
- E.g. Datasearch, Data Linking systems
RDA:
- Co-leading Data publishing, linking group
- Co-leading Cost Recovery group, part of RDA US Sustainability effort
- Active in Chemistry, Earth Science groups, starting IG on Data Search
- SciDataCon, Sept 11-16, Denver, CO
| 14
• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data
Anita de Waard, a.dewaard@elsevier.com
Thank you! Questions?
| 15
Share and Publish, Current Status:
[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows numbered by the steps below]
1. Researcher creates datasets
2. Researcher writes paper & publishes in journal
3. (Sometimes) dataset gets posted to repository
4. Researcher reports (post-hoc) to Institution and Funder
| 16
Share and Publish, Issues:
[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows numbered 1 to 4]
i. Too much work for researchers
ii. Data posting not mandatory
iii. No link between data and paper
iv. Funders/Institutions informed as an afterthought
| 17
Share and Publish, Proposal:
[Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows numbered by the steps below]
1. Researcher creates datasets and posts to repository (under embargo)
2. Funder is automatically notified of dataset publication
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
- NB this also allows release of unused data for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting
i. Less Work!
ii. More Data Stored!
iii. Better Linking!
iv. Better Tracking!
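The four proposed steps can be read as a small state machine around the dataset. The sketch below is one possible way to model that, not a description of any existing Elsevier or Mendeley Data system; the `Dataset` and `Workflow` classes, their method names, and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    dataset_id: str
    embargoed: bool = True              # step 1: posted under embargo
    linked_paper: Optional[str] = None  # filled in when the paper is published


@dataclass
class Workflow:
    """Hypothetical model of the proposed share-and-publish flow."""
    notifications: List[str] = field(default_factory=list)

    def post_dataset(self, dataset_id):
        # Steps 1 and 2: deposit under embargo; the funder is notified automatically.
        dataset = Dataset(dataset_id)
        self.notifications.append(
            f"funder: dataset {dataset_id} deposited (embargoed)"
        )
        return dataset

    def publish_paper(self, dataset, paper_id):
        # Step 3: publication lifts the embargo and links the data to the paper.
        dataset.embargoed = False
        dataset.linked_paper = paper_id
        # Step 4: funder and institution both receive a report.
        for recipient in ("funder", "institution"):
            self.notifications.append(
                f"{recipient}: paper {paper_id} published; "
                f"embargo lifted on {dataset.dataset_id}"
            )


# Hypothetical identifiers, for illustration only.
workflow = Workflow()
dataset = workflow.post_dataset("dataset:example-A")
workflow.publish_paper(dataset, "paper:example-1")
```

In this sketch the researcher makes only two calls, and the notifications and the data-paper link follow from them, which is the point of the "Less Work", "Better Linking" and "Better Tracking" claims.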
| 18
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
A Maslow Hierarchy for Research Data:

Publishing the Full Research Data Lifecycle
