Developing metadata curation processes for data that can’t be shared openly
Rebecca Grant, Graham Smith, Iain Hrynaszkiewicz
Illustration inspired by the work of John Maynard Keynes
1
Developing metadata curation processes for data that can’t be shared openly
CC-BY-ND 2019
The context for curation support
2
Stuart, David; Baynes, Grace; Hrynaszkiewicz, Iain; Allin, Katie;
Penny, Dan; Lucraft, Mithu; Astell, Mathias (2018): Whitepaper:
Practical challenges for researchers in data sharing
https://doi.org/10.6084/m9.figshare.5975011.v1
Practical Challenges for Researchers
in Data Sharing white paper
A global survey of nearly 8000
researchers
3
Global levels of data sharing:
• Poland – 76% (highest)
• Germany – 75%
• UK – 58%
• USA – 55%
Private sharing of data is more common than public sharing
of data
4
• Costs of sharing data: 11.52%
• Lack of time to deposit data: 16.39%
• Not knowing which repository to use: 20.22%
• Unsure about copyright and licensing: 23.03%
• Organising data in a presentable and useful way: 28.24%
Total respondents: 7719
Problems authors face in sharing datasets
5
• Recommended repositories list
• Research Data Helpdesk
• Research Data Policies
Joe Salter, Journal Development Editor
Graham Smith, Senior Research Data Editor
Varsha Khodiyar, Data Curation Manager
Iain Hrynaszkiewicz, Head of Data Publishing
Rebecca Grant, Research Data Manager
Data curation at Springer Nature
6
No one other than the
creator can access the
data, or even knows that
it exists
Supporting data curation: a researcher’s dataset in a
desktop folder
7
Once received, we check to make sure that the dataset is suitable for our curation services. Multiple files in any format are accepted.
Pre-curation data checks:
• The data aren’t sensitive
• The data don’t include direct or indirect human identifiers
• The data shouldn’t be in a community repository
• The data are associated with a trusted publication
After making these checks, we begin the curation process. If necessary we may recommend that the dataset is split into smaller groups or collections.
Before curation begins
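The four pre-curation checks on this slide could be recorded as a simple checklist. The sketch below is only an illustration: the `Submission` class, its field names and the wording of the failure messages are hypothetical and are not part of the Springer Nature workflow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submission:
    """Hypothetical answers a curator records for a received dataset."""
    is_sensitive: bool
    has_human_identifiers: bool            # direct or indirect
    belongs_in_community_repository: bool
    has_trusted_publication: bool


def failed_pre_curation_checks(s: Submission) -> List[str]:
    """Return the slide's checks that the submission does not pass."""
    failures = []
    if s.is_sensitive:
        failures.append("data are sensitive")
    if s.has_human_identifiers:
        failures.append("data include direct or indirect human identifiers")
    if s.belongs_in_community_repository:
        failures.append("data should be in a community repository")
    if not s.has_trusted_publication:
        failures.append("data are not associated with a trusted publication")
    return failures


# An empty list means curation can begin.
suitable = Submission(False, False, False, True)
print(failed_pre_curation_checks(suitable))
```

Returning the list of failed checks, rather than a single yes/no, keeps the reason for declining a dataset visible to the curator.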
8
Working with the researcher’s manuscript or published paper, we draft a comprehensive metadata record for the dataset, which is sent to the researcher for approval before being published. Embargoes can be applied if necessary.
The curated dataset will be published with its own metadata record, which includes rich descriptive information, reuse conditions, licence, DOI, metrics and keywords (this example is https://doi.org/10.6084/m9.figshare.5259415).
Metadata curation output
9
Addressing the challenges of data that can’t be openly
shared
10
• Personally identifiable information.
• Special categories of personal information
(e.g. as specified by the GDPR or other
data protection legislation).
• Data revealing the location of rare,
endangered or commercially-valuable
species.
• Commercially sensitive data, for example
relating to industrial partners or collected
on their behalf.
What makes research data sensitive?
11
Datasets should include no direct identifiers, such as:
• Name
• Fingerprint
• Facial photographs
• Signature
• Biometric records
• Telephone number
Direct identifiers relate directly to an individual: each is information that, on its own, allows the clear identification of that individual.
Assessing sensitivity of personal data: direct identifiers
12
Datasets should include fewer than 3 indirect identifiers, such as:
• Gender
• Place of birth
• Income
• Race or ethnicity
• Unusual features, e.g. rare diseases, uncommon job titles, or a large number of children
Indirect identifiers are information that allows the identification of individuals when combined with other available information.
Assessing sensitivity of personal data: indirect identifiers
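The two thresholds from these slides (no direct identifiers, fewer than 3 indirect identifiers) can be sketched as a first-pass screen over a dataset's variable names. This is a minimal sketch, not the curation team's tool: the function name `assess_identifiers` and the exact-name matching are assumptions, and a name match cannot detect "unusual features", which still need a curator's judgement.

```python
# Identifier examples taken from the slides; matching on exact variable
# names is a simplifying assumption for illustration only.
DIRECT_IDENTIFIERS = {
    "name", "fingerprint", "facial photograph",
    "signature", "biometric record", "telephone number",
}
INDIRECT_IDENTIFIERS = {"gender", "place of birth", "income", "race or ethnicity"}


def assess_identifiers(variable_names):
    """Count listed identifiers and apply the slide thresholds."""
    found = {v.strip().lower() for v in variable_names}
    direct = sorted(found & DIRECT_IDENTIFIERS)
    indirect = sorted(found & INDIRECT_IDENTIFIERS)
    return {
        "direct": direct,
        "indirect": indirect,
        # 0 direct identifiers and fewer than 3 indirect identifiers
        "meets_thresholds": len(direct) == 0 and len(indirect) < 3,
    }


result = assess_identifiers(["Age at diagnosis", "Gender", "Income"])
print(result["meets_thresholds"])
```

A dataset that passes this screen is not necessarily safe to share; the check only flags the obvious cases before a manual review.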
13
Sensitive data can still be shared:
• De-identified and shared publicly, e.g. in a repository.
• Deposited in a controlled access repository.
• Access managed and controlled by the researcher (e.g. “available on request”).
14
Journal data policies may require data sharing, or a data
availability statement describing how data can be accessed.
• Authors may not have the expertise to de-identify data
appropriately; editors may not be able to advise.
• Alternatively, data are deposited in controlled access
repositories (sometimes with minimal metadata).
• Authors may also choose to share data on request (in which case there may be no metadata available at all).
The challenges of sharing sensitive data
15
doi:10.1038/s41523-018-0079-1
Data available on request
16
Working with the curation team to provide editorial support for data sharing:
• Reviewing accepted manuscripts.
• Providing advice on data sharing.
• Creating a metadata catalogue of rich metadata records for every article in the journal’s repository.
• Writing detailed data availability statements.
• Building on existing data sharing practice at the journal and supporting more authors to share.
npj Breast Cancer is an open access, online-
only, multidisciplinary research journal
dedicated to publishing the finest research on
breast cancer research and treatment.
Metadata curation for the journal npj Breast Cancer
17
• Curators connected with authors when an article is
accepted in principle.
• Advice given on de-identification of data.
• Advice given on suitable disciplinary repositories.
• Curator reads paper for data-related information.
• Additional information requested from author.
• Curator creates rich metadata record and DAS.
• Metadata and DAS reviewed and approved by author.
Curation workflow
18
Data reporting checklist
19
Initially the team used a Google form to capture contextual information about the author’s study and accompanying datasets:
• Type/format of data
• Filenames
• Software required
• Funder
• Additional documentation
For sensitive clinical data that aren’t shared openly:
• Sample size
• Cohort size
• Registered trial number
• Access requirements
Problems with the form:
• Authors were not responsive, even though this held up their article during publication.
• The form was not suitable for papers that cited multiple datasets, each of which needed to be described.
Gathering contextual metadata
20
Metadata collection is now based on:
A review of the author’s paper.
+ A short spreadsheet filled out by the author.
+ A review of the author’s datasets available in other repositories.
+ Email directly to the author where necessary.
= Rich contextual information about the datasets
+ A consistent format for the metadata we use to describe studies
Adapting the metadata collection process
21
Example output: metadata record for data available on request
The dataset is available on request only due to commercial sensitivity. The metadata record is stored in the npj Breast Cancer figshare portal and includes:
• Authors
• Title
• A description of the study design
• Data type
• Data format
• Number of files, file names
• Software required
• Access requirements
• Funder information
• Keywords
• Link to associated paper
• Metrics
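The fields listed on this slide could be held in a small, consistent structure before being entered into a repository. The sketch below is an assumption-laden illustration: the `MetadataRecord` class, its field names and all placeholder values are hypothetical, and it does not reproduce figshare's own schema or API. Metrics are left out because they would be generated by the repository rather than supplied by the curator.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MetadataRecord:
    """Hypothetical container for the fields named on the slide."""
    authors: List[str]
    title: str
    study_description: str
    data_type: str
    data_format: str
    file_names: List[str]
    software_required: str
    access_requirements: str
    funder: str
    keywords: List[str] = field(default_factory=list)
    associated_paper: str = ""

    @property
    def number_of_files(self) -> int:
        # Derived from the file list so the two can never disagree.
        return len(self.file_names)


# Placeholder values only; none of this describes a real dataset.
record = MetadataRecord(
    authors=["A. Author", "B. Author"],
    title="Placeholder dataset title",
    study_description="Placeholder description of the study design.",
    data_type="Placeholder data type",
    data_format=".csv",
    file_names=["file_one.csv", "file_two.csv"],
    software_required="Any spreadsheet software",
    access_requirements="Available on request from the corresponding author",
    funder="Placeholder funder",
    keywords=["placeholder"],
    associated_paper="Placeholder DOI of the associated paper",
)

payload = asdict(record)
payload["number_of_files"] = record.number_of_files
print(json.dumps(payload, indent=2))
```

Because the record describes the data without containing it, the same structure works for datasets that are only available on request.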
22
https://doi.org/10.1038/s41523-019-0106-x
Example output: data availability statement
23
doi:10.1038/s41523-018-0079-1
Before: data available on request
24
14 submissions (journal articles) to date:
• 3 were deposited in specialist repositories on the curator’s recommendation (including GEO and dbGaP). There was a potential risk to funding if this was not done.
• 1 was a commercially sensitive dataset which required assessment; the authors were advised not to share it openly. There was a potential risk of legal liability if it was shared without permission.
• 1 paper originally referred to 39 gene expression datasets only through references to articles; the curator used these to create a table including DOIs and accession numbers for each dataset.
• 800+ views of the metadata records in the repository.
Curation impacts so far
25
Improvements to accessibility of all datasets, particularly those only
available on request.
Opportunity for researcher to identify issues in related publications,
e.g. incorrect accession codes.
Allows curation without access to sensitive datasets, capitalising on
knowledge of the researcher and the journal editor.
Increasing accessibility of curation to a larger proportion of researchers
– does not exclude those who cannot share openly.
Demonstrating an approach that’s compatible with generalist or
institutional repositories.
Other impacts
26
The story behind the image
John Maynard Keynes (1883–1946)
John Maynard Keynes was a British economist who
revolutionised the theory and practice of macroeconomics,
reformed economics and had a profound influence on
economic policy. This illustration represents the Keynesian
model which shows that in a monetary economy it is
possible to have periods of high unemployment unless
governments use active monetary and fiscal policy to
stimulate aggregate demand.
Rebecca Grant, Research Data Manager
Researchdata@springernature.com /
Rebecca.Grant@springernature.com
https://go.nature.com/ResearchDataServices
https://researchdata.springernature.com/
Thank you
27
Slide 10: photo by Frida Bredesen on Unsplash
Slide 12: photo by Ashley Edwards on Unsplash
Image credits


Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
ArshadAyub49
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
Rebecca Bilbro
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
keesa2
 
SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
newdirectionconsulta
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
nhutnguyen355078
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
ytypuem
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Do People Really Know Their Fertility Intentions?  Correspondence between Sel...Do People Really Know Their Fertility Intentions?  Correspondence between Sel...
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Xiao Xu
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
9gr6pty
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
hiju9823
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
eudsoh
 
CAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdfCAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdf
frp60658
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
Timothy Spann
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
GeorgiiSteshenko
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 

Recently uploaded (20)

Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Do People Really Know Their Fertility Intentions?  Correspondence between Sel...Do People Really Know Their Fertility Intentions?  Correspondence between Sel...
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
 
CAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdfCAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdf
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 

Developing metadata curation processes for data that can’t be shared openly

  • 1. Developing metadata curation processes for data that can’t be shared openly. Rebecca Grant, Graham Smith, Iain Hrynaszkiewicz. Illustration inspired by the work of John Maynard Keynes.
  • 2. The context for curation support. CC-BY-ND 2019.
  • 3. Practical Challenges for Researchers in Data Sharing white paper: a global survey of nearly 8000 researchers. Stuart, David; Baynes, Grace; Hrynaszkiewicz, Iain; Allin, Katie; Penny, Dan; Lucraft, Mithu; Astell, Mathias (2018): Whitepaper: Practical challenges for researchers in data sharing. https://doi.org/10.6084/m9.figshare.5975011.v1
  • 4. Global levels of data sharing: Poland 76% (highest); Germany 75%; UK 58%; USA 55%. Private sharing of data is more common than public sharing of data.
  • 5. Problems authors face in sharing datasets (total respondents: 7719): costs of sharing data, 11.52%; lack of time to deposit data, 16.39%; not knowing which repository to use, 20.22%; unsure about copyright and licensing, 23.03%; organising data in a presentable and useful way, 28.24%.
  • 6. Data curation at Springer Nature: Recommended repositories list; Research Data Helpdesk; Research Data Policies. Joe Salter (Journal Development Editor), Graham Smith (Senior Research Data Editor), Varsha Khodiyar (Data Curation Manager), Iain Hrynaszkiewicz (Head of Data Publishing), Rebecca Grant (Research Data Manager).
  • 7. Supporting data curation: a researcher’s dataset in a desktop folder. No one other than the creator can access the data, or even knows that it exists.
  • 8. Before curation begins: Multiple files in any format are accepted. Once received, we check to make sure that the dataset is suitable for our curation services. Pre-curation data checks: the data aren’t sensitive; the data don’t include direct or indirect human identifiers; the data shouldn’t be in a community repository; the data are associated with a trusted publication. After making these checks, we begin the curation process. If necessary we may recommend that the dataset is split into smaller groups or collections.
  • 9. Metadata curation output: Working with the researcher’s manuscript or published paper, we draft a comprehensive metadata record for the dataset, which is sent to the researcher for approval before being published. Embargoes can be applied if necessary. The curated dataset will be published with its own metadata record, which includes rich descriptive information, reuse conditions, licence, DOI, metrics and keywords (this example is https://doi.org/10.6084/m9.figshare.5259415).
  • 10. Addressing the challenges of data that can’t be openly shared
  • 11. What makes research data sensitive? Personally identifiable information. Special categories of personal information (e.g. as specified by the GDPR or other data protection legislation). Data revealing the location of rare, endangered or commercially valuable species. Commercially sensitive data, for example relating to industrial partners or collected on their behalf.
  • 12. Assessing sensitivity of personal data: direct identifiers. Direct identifiers relate directly to an individual and are information that, on its own, allows the clear identification of individuals. Datasets should include 0 direct identifiers. Examples: name; fingerprint; facial photographs; signature; biometric records; telephone number.
  • 13. Assessing sensitivity of personal data: indirect identifiers. Indirect identifiers are information that allows the identification of individuals through their combination with other available information. Datasets should include fewer than 3 indirect identifiers. Examples: gender; place of birth; income; race or ethnicity; unusual features, e.g. rare diseases, uncommon job titles, or a large number of children.
  • 14. Sensitive data can still be shared: de-identified and shared publicly, e.g. in a repository; deposited in a controlled access repository; access managed and controlled by the researcher (e.g. “available on request”).
  • 15. The challenges of sharing sensitive data: Journal data policies may require data sharing, or a data availability statement describing how data can be accessed. Authors may not have the expertise to de-identify data appropriately; editors may not be able to advise. Alternatively, data are deposited in controlled access repositories (sometimes with minimal metadata). Authors may also choose to share data on request (e.g. no metadata is available at all).
  • 16. Data available on request: doi:10.1038/s41523-018-0079-1
  • 17. Metadata curation for the journal npj Breast Cancer: npj Breast Cancer is an open access, online-only, multidisciplinary research journal dedicated to publishing the finest research on breast cancer research and treatment. Working with the curation team to provide editorial support for data sharing: reviewing accepted manuscripts; providing advice on data sharing; creating a metadata catalogue of rich metadata records for every article in the journal’s repository; writing detailed data availability statements; building on existing data sharing practice at the journal and supporting more authors to share.
  • 18. Curation workflow: Curators connected with authors when an article is accepted in principle. Advice given on de-identification of data. Advice given on suitable disciplinary repositories. Curator reads paper for data-related information. Additional information requested from author. Curator creates rich metadata record and data availability statement (DAS). Metadata and DAS reviewed and approved by author.
  • 19. Data reporting checklist
  • 20. Gathering contextual metadata: Initially the team used a Google form to capture contextual information about the author’s study and accompanying datasets: type/format of data; filenames; software required; funder; additional documentation. For sensitive clinical data that aren’t shared openly: sample size; cohort size; registered trial number; access requirements. Authors were not responsive, even though this held up their article during publication. The form was not suitable for papers that cited multiple datasets, each of which needed to be described.
  • 21. Adapting the metadata collection process: Metadata collection is now based on a review of the author’s paper + a short spreadsheet filled out by the author + a review of the author’s datasets available in other repositories + email directly to the author where necessary = rich contextual information about the datasets + a consistent format for the metadata we use to describe studies.
  • 22. Example output: metadata record for data available on request. The dataset is available on request only due to commercial sensitivity. The metadata record is stored in the npj Breast Cancer figshare portal and includes: authors; title; a description of the study design; data type; data format; number of files, file names; software required; access requirements; funder information; keywords; link to associated paper; metrics.
  • 23. Example output: data availability statement: https://doi.org/10.1038/s41523-019-0106-x
  • 24. Before: data available on request: doi:10.1038/s41523-018-0079-1
  • 25. Curation impacts so far: 14 submissions (journal articles) to date. 3 were deposited in specialist repositories on the curator’s recommendation (including GEO and dbGaP); potential risk to funding if this was not done. 1 was a commercially sensitive dataset which required assessment and was advised not to share openly; potential risk of legal liability if shared without permission. 1 paper originally consisted of references to articles for 39 gene expression datasets, which the curator used to create a table including DOIs and accession numbers for each. 800+ views of the metadata records in the repository.
  • 26. Other impacts: Improvements to accessibility of all datasets, particularly those only available on request. Opportunity for researcher to identify issues in related publications, e.g. incorrect accession codes. Allows curation without access to sensitive datasets, capitalising on knowledge of the researcher and the journal editor. Increasing accessibility of curation to a larger proportion of researchers; does not exclude those who cannot share openly. Demonstrating an approach that’s compatible with generalist or institutional repositories.
  • 27. Thank you. Rebecca Grant, Research Data Manager. Researchdata@springernature.com / Rebecca.Grant@springernature.com. https://go.nature.com/ResearchDataServices https://researchdata.springernature.com/ The story behind the image: John Maynard Keynes (1883–1946) was a British economist who revolutionised the theory and practice of macroeconomics, reformed economics and had a profound influence on economic policy. This illustration represents the Keynesian model which shows that in a monetary economy it is possible to have periods of high unemployment unless governments use active monetary and fiscal policy to stimulate aggregate demand.
  • 28. Image credits: Slide 10: photo by Frida Bredesen on Unsplash. Slide 12: photo by Ashley Edwards on Unsplash.
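
The field list on slide 22 could be captured in a small data structure. The following is only a sketch under stated assumptions: the class name `DatasetRecord`, its attribute names and the `missing_fields` helper are hypothetical and are not taken from the slides or from any figshare API. Metrics are left out on the assumption that the repository generates them, and the number of files is derived from the list of file names.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical record mirroring the fields listed on slide 22."""
    authors: List[str] = field(default_factory=list)
    title: str = ""
    study_design: str = ""          # a description of the study design
    data_type: str = ""
    data_format: str = ""
    file_names: List[str] = field(default_factory=list)
    software_required: str = ""
    access_requirements: str = ""   # e.g. how to request the data
    funder: str = ""
    keywords: List[str] = field(default_factory=list)
    associated_paper: str = ""      # link to the associated paper

    @property
    def number_of_files(self) -> int:
        # Derived rather than stored, so it cannot drift from file_names.
        return len(self.file_names)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values only; none of this describes a real dataset.
record = DatasetRecord(
    authors=["A. Researcher"],
    title="Example dataset (placeholder)",
    access_requirements="Available on request due to commercial sensitivity",
)
print(record.missing_fields())
```

A curator could use the list of missing fields to decide what additional information to request from the author before the record is sent for approval.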

Editor's Notes

  1. In 2017, Springer Nature surveyed more than 7,700 researchers worldwide, asking specifically about data sharing at the point of submitting an article for publication. The level of response from some regions of interest, notably Japan and China, meant that we could not do detailed analysis, so this year we have begun to extend our research to these territories.
  2. Main findings: Researchers do share and use one another’s data but lack places to put it. They would value a high quality data publication
  3. Mainly covering the left hand side of this list due to time
  4. A natural person is a person that is an individual human being, as opposed to a legal person, which may be a private (i.e., business entity or non-governmental organisation) or public (i.e., government) organisation
  5. If I was trying to identify a person in a dataset, what kind of information would allow me to recognise them uniquely?
  6. Features such as gender or place of birth aren’t usually unique in a dataset, but once they are combined they can become much more identifying. There are far more indirect identifiers than direct, and it can be more difficult to figure out whether they are identifying or not.
  7. A natural person is a person that is an individual human being, as opposed to a legal person, which may be a private (i.e., business entity or non-governmental organisation) or public (i.e., government) organisation
  8. A natural person is a person that is an individual human being, as opposed to a legal person, which may be a private (i.e., business entity or non-governmental organisation) or public (i.e., government) organisation
  9. Note that as well as authors not filling out the form, it was difficult to do for multiple datasets from one study
  10. Process takes around 4 hours of a curator’s time including gathering the information, drafting the record and allowing review
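
The thresholds on slides 12 and 13 (0 direct identifiers, fewer than 3 indirect identifiers) can be illustrated with a short screening sketch. This is an assumption-laden illustration, not the curation team's method: the function name `screen_columns`, the two name sets and the idea of matching on column names are all hypothetical, and name matching is only a crude first pass that cannot replace a curator's judgement.

```python
# Hypothetical column-name lists based on the examples on slides 12 and 13.
DIRECT_IDENTIFIERS = {
    "name", "fingerprint", "facial_photograph",
    "signature", "biometric_record", "telephone_number",
}
INDIRECT_IDENTIFIERS = {
    "gender", "place_of_birth", "income", "ethnicity",
    "rare_disease", "job_title", "number_of_children",
}


def screen_columns(columns):
    """Flag column names that look like direct or indirect identifiers.

    The dataset passes only if it has no direct identifiers and
    fewer than 3 indirect identifiers, as stated on the slides.
    """
    normalised = [c.strip().lower().replace(" ", "_") for c in columns]
    direct = [c for c in normalised if c in DIRECT_IDENTIFIERS]
    indirect = [c for c in normalised if c in INDIRECT_IDENTIFIERS]
    return {
        "direct": direct,
        "indirect": indirect,
        "passes": not direct and len(indirect) < 3,
    }


# Placeholder column names for illustration.
result = screen_columns(["Gender", "Income", "tumour_size"])
print(result)
```

As note 6 points out, indirect identifiers become more identifying in combination, which is why the sketch counts them together rather than judging each column on its own.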