Managing sensitive data in your repository

Natasha Simons
Sharing Health-y and Sensitive Data: Challenges and Solutions Workshop
Perth 3 September 2015

What is a data repository?
A research data repository is a managed environment capable of storing and sharing (largely) digital data. The data repository supports the process of curating, preserving, and sharing research data.

What kinds of data repositories are there?

Are repositories for open data only?
Yes and no: it depends on the repository's purpose and scope.
Repositories can support data that is:
1. Open access only
2. Mediated access only
3. Closed/private only
Most data repositories hold a combination of 1 and 2.
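The three access options can be thought of as a simple rule the repository applies to each download request. The Python sketch below is illustrative only: the names `AccessLevel` and `can_download`, and the assumption that mediated access reduces to "the request has been approved", are hypothetical and not taken from any particular repository platform.

```python
from enum import Enum


class AccessLevel(Enum):
    OPEN = "open"          # 1. anyone may download
    MEDIATED = "mediated"  # 2. access granted case by case (assumed: after approval)
    CLOSED = "closed"      # 3. held by the repository but not shared


def can_download(level: AccessLevel, request_approved: bool = False) -> bool:
    """Hypothetical rule: may a requester download a dataset at this level?"""
    if level is AccessLevel.OPEN:
        return True
    if level is AccessLevel.MEDIATED:
        # Mediated data is released only once a request has been approved.
        return request_approved
    # Closed/private data is never released, even with an approved request.
    return False
```

A repository that mixes options 1 and 2 would store an `AccessLevel` per dataset rather than one for the whole collection.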

Are there health data repositories?
Yes, many!
http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html

What’s the point of data repositories?
Data repositories assist researchers and the research community to:
1. Support data sharing, data discovery and reuse, and data preservation
2. Comply with publisher requirements
3. Comply with funder requirements
4. Comply with institutional or government policy requirements
5. Support institutional goals
Illustration credit: Ainsley Seago. doi:10.1371/journal.pbio.1001779.g001

Can sensitive data be managed in a repository?
Yes!
Ask:
• Can the raw data be made completely open (once de-identified)? Or will access be restricted or mediated?
• What licence should be applied to enable data reuse?
• What metadata elements, links (e.g. to publications) and identifiers (e.g. DOIs, ORCIDs) will aid discovery and reuse of the data?
Source: http://www.slideshare.net/WLSA_ORG/wh2014-workshop-health-data-consortium

Can sensitive data be managed in a repository?
Also ask:
• Can a citation element be added to support attribution and reuse tracking?
• Who or what will be the point of contact for the data?
• Are there other conditions that the data is subject to, e.g. an embargo period before release?
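Taken together, the questions on these two slides work like a deposit checklist. The Python sketch below assumes a hypothetical `DatasetRecord` structure; its field names are illustrative and do not follow any specific metadata schema (such as DataCite). It flags checklist items that are still empty and checks whether an embargo has ended.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical deposit record; field names are illustrative only."""
    title: str
    access: str                               # e.g. "open", "mediated" or "closed"
    licence: Optional[str] = None             # e.g. "CC-BY-4.0"
    doi: Optional[str] = None                 # persistent identifier for the dataset
    creator_orcids: List[str] = field(default_factory=list)
    related_publications: List[str] = field(default_factory=list)
    citation: Optional[str] = None            # preferred citation text
    contact: Optional[str] = None             # person or service to contact
    embargo_until: Optional[date] = None      # no release before this date


def missing_elements(record: DatasetRecord) -> List[str]:
    """Return the names of checklist items that are still empty."""
    checks = {
        "licence": record.licence,
        "doi": record.doi,
        "creator_orcids": record.creator_orcids,
        "citation": record.citation,
        "contact": record.contact,
    }
    # None and empty lists are both treated as "not yet supplied".
    return [name for name, value in checks.items() if not value]


def is_released(record: DatasetRecord, today: date) -> bool:
    """True if there is no embargo or the embargo date has been reached."""
    return record.embargo_until is None or today >= record.embargo_until
```

For example, a mediated-access record with only a licence, a contact and an embargo date would be flagged as missing `doi`, `creator_orcids` and `citation`, and would not count as released until the embargo date.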

Examples of sensitive data in repositories?

What’s really challenging?
“Having longitudinal data on individuals is a part of many observational designs, and is needed for research into outcomes, efficacy and many mechanistic studies. Most repositories thus have longitudinal observations. To build such a database you need some way to link observations on the same identified person. Therefore most repositories contain personally identified data, but, because of privacy concerns, they often release only de-identified data. Difficulties in the de-identification process can cause some data to be omitted in a dataset. A lack of direct identifiers in a data collection or federation could prevent linking of data for some patients.”
From: Wade, T. Traits and Types of Health Data Repositories. Health Information Science and Systems 2014, 2:4. doi:10.1186/2047-2501-2-4
http://www.hissjournal.com/content/2/1/4
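One common way to keep observations linkable while releasing only de-identified data (a general technique, not one described in the quoted paper) is to replace the direct identifier with a keyed-hash pseudonym, so that all records for the same person share a link key. The Python sketch below assumes the secret key is held only by the repository; `linkage_key` and `deidentify` are hypothetical names.

```python
import hashlib
import hmac
from typing import Dict, List


def linkage_key(person_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym for one person using HMAC-SHA256."""
    # Assumed normalisation: ignore surrounding spaces and letter case,
    # so "abc123" and "ABC123 " map to the same link key.
    normalised = person_id.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


def deidentify(observations: List[Dict], secret_key: bytes) -> List[Dict]:
    """Drop the 'person_id' field and add a 'link_key' in its place."""
    released = []
    for obs in observations:
        row = {k: v for k, v in obs.items() if k != "person_id"}
        row["link_key"] = linkage_key(obs["person_id"], secret_key)
        released.append(row)
    return released
```

This removes only the direct identifier. Other fields (dates, postcodes, rare diagnoses) can still allow re-identification and need separate treatment, and anyone holding the key can regenerate the link keys from known identifiers.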

Small group exercise
Discovering sensitive health data in repositories

Acknowledgement
Australian National Data Service is funded by the Commonwealth under the NCRIS Program
31 August, 2015
