Providing support and services for researchers in good data governance
Robin Rice
University of Edinburgh
Characteristics of Good Data Governance
Royal Statistical Society Ethics SIG
Newcastle
13 May 2019

Overview
• About UoE and Information Services
• Research Data Management Policy
• Research Data Service
• DataVault
• Data Safe Haven
• Data governance for researchers: training, outreach & key messages

About the University of Edinburgh
• Founded 1583
• 39,576 students
• 14,346 postgraduate students
• 6,816 academic staff
• Research income: £253.9 million (2015-16)
• Mission: the creation, dissemination and curation of knowledge
• 20 schools in 3 colleges:
  – Arts, Humanities & Social Sciences
  – Medicine & Veterinary Medicine
  – Science & Engineering
© www.nealesmith.com

About UoE Information Services
• Library & University Collections
• IT Infrastructure
• Applications
• Learning, Teaching & Web
• User Services
• Information Security
• EDINA
• Digital Curation Centre
Argyle House © CoStar

University’s RDM Policy (May 2011)
https://www.ed.ac.uk/is/research-data-policy/
• Commitment to research integrity, data management plans (DMPs) and open data
• Articulates clear responsibilities of the researcher and of the institution
Policy by Nick Youngson CC BY-SA 3.0 Alpha Stock Images

UoE Research Data Service = Tools and support for working across the data lifecycle
https://www.ed.ac.uk/is/research-data-service

Before your research project begins
• DMPOnline: online tool to create a data management plan, based on University and funders’ templates
• Support and DMP review: answer enquiries, review plans and provide advice, with in-depth or quick turnaround
• Sample DMPs: library of successful plans to show researchers in different disciplines

Research in progress
• Discover and re-use data: data portal and data librarian consultancy; help with accessing or purchasing datasets and data subscriptions
• Active data storage (DataStore): central, backed-up storage for all researchers, with individual and shared spaces
• Sensitive data (Data Safe Haven): new, secure facility for working with sensitive data on a remote server; we are pursuing ISO 27001 security certification
• Code versioning (Subversion, GitLab): private or public storage and management of software code; keeps a history of all code changes and allows rollback to prior versions
• Collaboration and data syncing (DataSync): open source tool that allows external partners to access your research data
• Electronic Lab Notebook (RSpace): data management for laboratory-based research; interoperable with local systems

Approaching completion
• Open Access data repository (DataShare): allows researchers to share data publicly and preserve it for the long term
• Long-term retention (DataVault): deposit datasets as an immutable copy for a specified retention period (for example, 10 years)
• Data asset register through the University CRIS (Pure for datasets): record a description of your dataset along with your publications and research projects

Training and support throughout your project
• General RDM support: answer enquiries by email, phone or appointment; track them through a helpdesk system
• Online training (MANTRA and RDMS MOOC): learn online at your own pace or with a cohort of peers through our open educational resources
• Scheduled and bespoke training: sign up for a scheduled workshop or request a special training session for your research group
• Research Data Service website: all the tools and support in one place, increasingly self-serve
• Blog and promotional materials: new developments on our Research Data Blog; service video and brochure
• Dealing with Data annual event & workshop series: annual conference of researchers talking about their data challenges and solutions
• Research Data Workshop series in various settings: compact, catered networking events for researchers to engage with the service and each other about challenging topics

DRILLING DOWN: KEY SERVICES RELATING TO DATA GOVERNANCE
• Data Safe Haven (during)
  – Safe people, safe settings
• Edinburgh DataShare (after)
  – Safe data
• DataVault (after)
  – Safe people, safe projects, safe settings
These labels refer to the Five Safes framework: https://en.wikipedia.org/wiki/Five_safes

Data Safe Haven (1 of 3)
For projects requiring advanced security, the Data Safe Haven (DSH) provides a controlled and secured service environment for undertaking research using sensitive data.
The service provides robust controls and safeguards to enable the secure transfer of sensitive data into a highly secure environment where it can be stored, manipulated and analysed by approved members of a research team.

What does the service provide? (2)
• A virtual desktop environment built to your project-specific needs
• Technical and operational procedures to safeguard the security of sensitive data
• Access restricted to authorised team members
• Gatekeeper roles (trained individuals) who vet and approve all ingress and egress of data
• Two-factor authentication
• End-to-end encryption
• Up to 5 terabytes of data storage
• A virtual computer environment with 1 CPU and 8 GB RAM
• Key data analysis tools and packages (e.g. SPSS, MATLAB, R)

Data that require advanced security (3)
You MUST consider using a Data Safe Haven if:
• you have special category personal data;
• you have data likely to have a significant negative public impact if released;
• you cannot pseudonymise or anonymise the data (e.g. videos of vulnerable people);
• you have confidential data, including data generated or used under a restrictive commercial research funding agreement; or
• your data provider (e.g. the NHS) requires advanced security to protect its data.
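The criteria above combine with a logical OR: any single one is enough to trigger consideration of the Data Safe Haven. As a minimal sketch, assuming a hypothetical project intake form that records one yes/no answer per criterion, the rule could be encoded as below. The `DatasetProfile` class, its field names and the `safe_haven_reasons` function are illustrative only and are not part of any University system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetProfile:
    """Hypothetical intake-form answers, one flag per Data Safe Haven criterion."""
    special_category_personal_data: bool = False
    high_negative_public_impact: bool = False
    cannot_pseudonymise_or_anonymise: bool = False
    confidential_or_commercially_restricted: bool = False
    provider_requires_advanced_security: bool = False


def safe_haven_reasons(profile: DatasetProfile) -> List[str]:
    """Return the criteria that apply.

    A non-empty list means the Data Safe Haven MUST be considered;
    any single criterion is sufficient.
    """
    criteria = [
        ("special category personal data", profile.special_category_personal_data),
        ("significant negative public impact if released", profile.high_negative_public_impact),
        ("cannot be pseudonymised or anonymised", profile.cannot_pseudonymise_or_anonymise),
        ("confidential or commercially restricted data", profile.confidential_or_commercially_restricted),
        ("data provider requires advanced security", profile.provider_requires_advanced_security),
    ]
    return [label for label, applies in criteria if applies]


if __name__ == "__main__":
    # Example: video recordings of vulnerable people that cannot be anonymised.
    profile = DatasetProfile(cannot_pseudonymise_or_anonymise=True)
    reasons = safe_haven_reasons(profile)
    if reasons:
        print("Consider the Data Safe Haven because:", "; ".join(reasons))
```

Returning the matching criteria rather than a bare yes/no keeps a note of why the decision was made, which could then be carried into a data management plan.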

What is DataVault for? (1 of 3)

What is DataVault *not* for? (2)
• Data that are ultimately intended to be made public: these should instead be deposited either in a suitable disciplinary repository or in DataShare (https://datashare.is.ed.ac.uk/), our open access data repository.
  – DataShare deposits may be placed under embargo for up to 5 years, during which the files remain inaccessible.
• Data that need to be retained only for a short period.
• Data in which a student owns the copyright.

What is innovative about DataVault? (3)
• Fills a gap in providing a complete data lifecycle institutional service, helping to fulfil the 2011 RDM policy
• Enables a collection of institutional data assets to be managed by the University
• Incentivises open sharing by pairing with DataShare
• Metadata records are open even though the data are nominally ‘closed’
• Buys time for appraising which data are worthy of further curation
• Combines the paradigms of data centres and digital preservation

TRAINING, OUTREACH: SOME KEY MESSAGES
• Blog posts
• RDM training (online and offline)
• Sample training slides
• Research Data Workshop Series (sensitive data)

Personal data: What does GDPR mean for your research data? (Blog post)
Consent remains key to working with human subjects ethically and legally, but at the University of Edinburgh and other HEIs, the legal basis for processing research data by academic staff may not be consent; it may simply be that research is the public task of the University. This shifts consent into the ethical column, while also ensuring fair, transparent, and lawful processing as part of GDPR principles.
From: http://datablog.is.ed.ac.uk/2018/12/20/personal-data-what-does-gdpr-mean-for-your-research-data/
Quick Guide – Research Data Management and GDPR: Do’s and Don’ts
http://datablog.is.ed.ac.uk/files/2018/12/GDPR-Fact-Sheet-20-12-2018.pdf

RDM Training: Research Data Service
• Research Data Management and Sharing (online)
• Research Data MANTRA (online)
• Creating a data management plan
• Good practice in research data management
• Working with personal & sensitive research data

Why share data? (1 of 5)
From: Journal of Open Archaeology Data, CC-BY 3.0

Security safeguards (2)
• Pseudonymise or anonymise data before storing it.
• Store identifiable data with the identifiers stripped off.
• Store the identifiers in a separate encrypted container.
• Encrypt identifiable data on portable devices, or encrypt the entire device.
• Give access to data only to authorised people.
• Keep identifiable data on a secure, backed-up central server and do not allow copies to proliferate.
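The first three safeguards can be illustrated with a minimal Python sketch, assuming tabular records held as dictionaries. Each record is given a random pseudonym; the direct identifiers move into a separate key table, and the analysis dataset keeps only the pseudonym and the non-identifying fields. The column names and the `pseudonymise` function are hypothetical, and encrypting the key table (the "separate encrypted container") is assumed to be handled by whatever tool your institution provides, so it is not shown.

```python
import secrets
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical direct-identifier columns for this sketch; adapt to your data.
IDENTIFIER_FIELDS = ("name", "email")


def pseudonymise(
    rows: List[Record], identifier_fields=IDENTIFIER_FIELDS
) -> Tuple[List[Record], List[Record]]:
    """Split records into (data, key).

    data: pseudonym plus the non-identifying fields, for day-to-day analysis.
    key:  pseudonym plus the stripped identifiers; store it separately,
          encrypted and with restricted access.
    """
    data: List[Record] = []
    key: List[Record] = []
    used = set()
    for row in rows:
        # Random pseudonym, not derived from the identifiers, so it cannot
        # be linked back to a person without the key table.
        pid = secrets.token_hex(8)
        while pid in used:  # guard against a (very unlikely) collision
            pid = secrets.token_hex(8)
        used.add(pid)
        key.append({"pid": pid, **{f: row[f] for f in identifier_fields}})
        data.append(
            {"pid": pid, **{k: v for k, v in row.items() if k not in identifier_fields}}
        )
    return data, key


if __name__ == "__main__":
    records = [
        {"name": "A. Participant", "email": "a@example.org", "age": "34", "score": "7"},
        {"name": "B. Participant", "email": "b@example.org", "age": "51", "score": "4"},
    ]
    data, key = pseudonymise(records)
    print(data)  # pseudonyms with age and score only
```

Because the key table allows re-identification, data pseudonymised this way still count as personal data under GDPR, so both tables need the safeguards listed above.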

What do we mean by sensitive data? (3)
Generally:
• Data relating to people (i.e. 'personal data' as defined in the General Data Protection Regulation)
• Data generated or used under a restrictive commercial research funding agreement
• Any data posing a threat to others or to national security
• Data relating to rare or endangered species of plants or animals
• Any data likely to have a significant negative public impact if released

Working with Personal and Sensitive Data Overview (4)
• Ethical and legal perspectives on research data
• Legal (GDPR) definitions, principles for research
• Strategies for:
  – Data management plans (DMPs) and data protection impact assessments (DPIAs)
  – Data collection, consent, and transparency
  – Active data management and data security
  – Data sharing: anonymisation & controlling access
• References for further information

UKRI: Why GDPR matters for research (5)
1. Research is in the public interest.
2. Consent to take part in research is important.
3. GDPR recognises that research data is valuable and can be kept long-term.
4. GDPR forces a record of historical decision-making.
5. GDPR safeguards reflect current good research practice.
From https://blog.esrc.ac.uk/2018/05/25/why-gdpr-matters-for-research

UoE Research Data Workshop Series: Sensitive Data Challenges and Solutions (April 2019)
Researchers face a number of technical, ethical and legal challenges in creating, analysing and managing research data, including pressure to increase transparency and conduct research openly. But for those who have collected or are re-using sensitive or confidential data, these challenges can be particularly taxing. …
Researchers attending this workshop will have the opportunity to hear from experienced researchers on related topics.

Sensitive data workshop: Breakout group questions for discussion
A. Working with sensitive data/research
• What does research involving various forms of sensitive data have in common, even across disciplines?
• What do researchers need to learn about working with sensitive data, or are they well prepared through disciplinary knowledge of research methods and ethics?
• Does the GDPR / UK Data Protection Act 2018 change anything about current practices?
B. Requirements from service providers
• Do Data Safe Havens serve a useful purpose for University researchers?
• What kinds of risks are involved in doing this sort of research, and how should the University be helping / be accountable?
C. Cost recovery
• Are funders willing to pay the costs necessary for working with sensitive data?
• What should happen if research projects don’t have sufficient funds to cover the costs of working with sensitive data?

QUESTIONS?
Service home page: https://www.ed.ac.uk/is/research-data-service
Edinburgh Research Data Blog: http://datablog.is.ed.ac.uk/
r.rice@ed.ac.uk
@sparrowbarley

Editor's Notes

  • #27 “The General Data Protection Legislation and new Data Protection Act, which come into force in the UK, will enable greater accountability and transparency by those who process personal data. The new legislation, GDPR for short, offers enhanced rights to individuals whose data is being processed. In the context of research, GDPR has the potential to further benefit research and archiving, helping to improve trust and confidence between the public and universities, and between researchers and their participants.” “GDPR is useful for research, it recognises that research is special and largely conforms, allowing it certain privileges. It legalises much of the current good practice in research, placing people at the centre, something that has formed the cornerstone of ethical research for many years.”