Managing sensitive data at the Australian Data Archive

“Making Data Social” webinar series
29 March 2017
Dr. Steven McEachern
Director, Australian Data Archive
ANU Centre for Social Research and Methods
Australian National University

Overview
• Sensitive data and the 5 Safes model
• Access to sensitive data in Australia
• Applying the 5 Safes model at ADA
• Sensitive data and the data lifecycle

Sensitive data
• “Sensitive data are data that can be
used to identify an individual, species, object,
process, or location that introduces a risk
of discrimination, harm, or unwanted
attention.”
– ANDS Guide on Publishing and Sharing Sensitive
Data, p.7
– http://www.ands.org.au/__data/assets/pdf_file/0010/489187/Sensitive-data.pdf

The 5 safes
1. Safe people: Can the researchers be trusted to do the
right thing?
2. Safe projects: Is the data to be used for an appropriate
purpose?
3. Safe settings: Is the environment in which the analysis
takes place safe?
4. Safe data: Is the data appropriately protected?
5. Safe output: Is there a low risk of disclosure in research
outputs?
Desai, T., F. Ritchie and R. Welpton (2016) Five Safes: designing data access for
research. Economics Working Paper Series
1601, University of the West of England.
http://www2.uwe.ac.uk/faculties/BBS/Documents/1601.pdf

What do researchers expect?
(or What is wanted? :-)
• “We emphasize that direct access to micro-data
is critical for success. Alternatives such as access
to synthetic data or submission of computer
programs to agency employees will not address
the key problem of restoring US leadership with
cutting-edge policy-relevant research.”
• Card, Chetty, Feldstein and Saez, 2010 (emphasis
in original)
– http://rajchetty.com/chettyfiles/NSFdataaccess.pdf

What is expected?
• “Here's what you need to do if you want an anonymised 1% sample
of the US Census
– Go to Google and type US Census 1% sample, click on link to the
Census.
– Download each of the state files from the FTP site and merge them
yourself. Or just check things out for one of the states. Whatever you
like.
– Start mucking about to test whether your pet theory is plausible.
• Here's what you need to do if you want an anonymised sample of
the NZ Census, or a Confidentialised Unit Record File (CURF) of any
of big Stats series:
– Go to Stats NZ's site, here.
– Follow the instructions below: …”
• (Followed by several pages of instructions, Application Process,
Assessment Criteria, Methods of Access, …)
Eric Crampton, the New Zealand Initiative, Wellington, formerly University of
Canterbury.
http://offsettingbehaviour.blogspot.com.au/2015/10/curf-and-turf.html

Can we bridge depositor and user
expectations?
I think so. Consider Card et al. again:
“We believe that five conditions must be satisfied to make a data
access program sustainable and efficient:
a) fair and open competition for data access based on scientific merit
b) sufficient bandwidth to accommodate a large number of projects
simultaneously
c) inclusion of younger scholars and graduate students in the
research teams that can access the data
d) direct access to de-identified micro data through local statistical
offices or, more preferably, secure remote connections
e) systematic electronic monitoring to allow immediate disclosure
of statistical results and prevent any disclosure of individual
records”

Current models in Australia
ABS:
• Confidentialised Unit
Record Files (CURFs)
• RADL
• ABSDL
• TableBuilder
ADA:
• Confidentialised Unit
Record Files (CURFs)
Shared (often remote
access) infrastructure:
• AURIN
• SURE (PHRN)
• Data linkage facilities
Ad hoc arrangements:
• “Secure rooms”
• Departmental
arrangements

Applying the 5 Safes
                          People   Projects   Settings   Data   Output
CURFs                     Yes?     Yes?       Yes?       YES    YES
TableBuilder              No       No         YES        YES    YES
RADL                      Yes?     Yes?       YES        YES    YES
ABSDL                     Yes?     Yes?       YES        YES    YES
ABS Remote Data Lab       Yes      Yes?       YES        YES    YES
ADA                       Yes?     No         No         YES    No
AURIN                     No       No         YES        YES    Yes?
SURE (PHRN)               Yes      Yes        YES        No?    ???
Data Linkage facilities   No?      YES        Yes?       YES    ???
Secure rooms              Yes?     Yes?       YES        No?    ???
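
The matrix above can also be held as a small lookup structure, which makes it easy to ask which safes a given access model leaves out. This is only an illustrative sketch: the ratings are transcribed from the slide as written, while the names MATRIX, not_applied and models_applying are hypothetical helpers, and treating any rating that starts with "No" as "not applied" is an assumption of the sketch.

```python
# Column order of the 5 Safes, as on the slide.
SAFES = ("People", "Projects", "Settings", "Data", "Output")

# Ratings transcribed from the slide; "?" marks the presenter's own uncertainty.
MATRIX = {
    "CURFs": ("Yes?", "Yes?", "Yes?", "YES", "YES"),
    "TableBuilder": ("No", "No", "YES", "YES", "YES"),
    "RADL": ("Yes?", "Yes?", "YES", "YES", "YES"),
    "ABSDL": ("Yes?", "Yes?", "YES", "YES", "YES"),
    "ABS Remote Data Lab": ("Yes", "Yes?", "YES", "YES", "YES"),
    "ADA": ("Yes?", "No", "No", "YES", "No"),
    "AURIN": ("No", "No", "YES", "YES", "Yes?"),
    "SURE (PHRN)": ("Yes", "Yes", "YES", "No?", "???"),
    "Data Linkage facilities": ("No?", "YES", "Yes?", "YES", "???"),
    "Secure rooms": ("Yes?", "Yes?", "YES", "No?", "???"),
}


def not_applied(model):
    """Safes rated "No" or "No?" for one access model (assumed reading)."""
    return [safe for safe, rating in zip(SAFES, MATRIX[model])
            if rating.lower().startswith("no")]


def models_applying(safe):
    """Access models rated "Yes", "Yes?" or "YES" for one safe."""
    column = SAFES.index(safe)
    return [model for model, row in MATRIX.items()
            if row[column].lower().startswith("yes")]


if __name__ == "__main__":
    print(not_applied("ADA"))
    print(models_applying("Output"))
```

Ratings of "???" are deliberately matched by neither helper, so unknowns stay visible as unknowns.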

Australian experience
• Safe data
– Confidentialisation: ADA, ABS, DSS (HILDA, etc.)
– Indirect access to data: TableBuilder, ADA
• Safe settings
– Aggregated data: TableBuilder, AURIN
– Remote: RADL, ABS (Remote) Data Lab
– Secure environments: ABS (On site) Data Lab,
secure rooms

5 safes: lesser emphasis on…
• Safe outputs
– Difficult to scale (e.g. data lab output reviews)
– This is changing – e.g. TableBuilder is automated
– But need to consider replication and reproducibility
• Safe researchers and safe projects
– Considered in most models, but not closely monitored
– May be difficult to monitor? (Similar issues to the
reporting of research outputs in universities)
– Universities could (and do!!) provide imprimatur for
their staff and students
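
To make the idea of an automated output check concrete, here is a minimal sketch of one generic disclosure-control rule: suppressing small cells in a frequency table before release. It is an assumption for illustration only and does not describe how TableBuilder or any ABS system works; the function name suppress_small_cells and the threshold of 5 are hypothetical.

```python
def suppress_small_cells(table, min_count=5):
    """Return a copy of a frequency table with small cells suppressed.

    `table` maps a cell label to a count. Any count below `min_count`
    (a hypothetical threshold) is replaced with None so that it is not
    released. Real output checking involves much more than this rule.
    """
    return {label: (count if count >= min_count else None)
            for label, count in table.items()}


if __name__ == "__main__":
    counts = {"Region A": 120, "Region B": 3, "Region C": 47}
    print(suppress_small_cells(counts))
```

A rule like this scales where manual review does not, but a suppressed table cannot always be reproduced exactly by another researcher, which is the replication concern raised above.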

Frameworks for research practice
• There are existing Australian frameworks for
researcher accountability and responsibilities:
– the Australian Code for the Responsible Conduct of
Research (ACRCR), which sets out institutional and
researcher responsibilities for conduct of research
– (Note that this is currently under review)
– Human Research Ethics Committees (HREC)
• Increasingly, professional and journal
requirements for data sharing:
– E.g. PLOS One, AEA, DA-RT (political science)
– https://www.aeaweb.org/aer/data.php
– http://journals.plos.org/plosone/s/data-availability

Relevant content from ACRCR
• S.2: Management of research data and primary materials
– E.g. 2.7 Maintain confidentiality of research data and primary
materials
– Researchers given access to confidential information must
maintain that confidentiality. Primary materials and confidential
research data must be kept in secure storage. Confidential
information must only be used in ways agreed with those who
provided it. Particular care must be exercised when confidential
data are made available for discussion.
• S.4: Publication and dissemination of research findings
– E.g. 4.2.3 Institutions must ensure that the sponsors of research
understand the importance of publication in research and do
not delay publication beyond the time needed to protect
intellectual property and other relevant interests.
• S.9: Breaches of the Code and misconduct in research

ADA model
• Safe data: data is anonymised
(confidentialised) either prior to deposit or by
ADA archivists
• Safe people: virtually all data access is
mediated, and users must be identified and
provide contact and supervisor details
• Safe projects: users provide a project
description
• Safe settings and safe outputs: NOT applied

ABS Remote Data Lab (virtual enclave)
• Safe data: less of a focus – but the lab does not prohibit
use of safe data practices
– Risk: individual researchers can see individual records
– BUT this assumes “unsafe” people (researchers)
• Safe settings: Remote access environment hosted at
ABS
– Challenge: cost of establishing the system
• Safe outputs: outputs limited only to methods
approved through the access environment (i.e. no
printing)
– Risk: photographing the screen, taking notes
– Again, assumes “unsafe” people
– Challenge: managing output checking

• Safe people:
– Institutional support
• Note ACRCR – code of conduct, and HREC
– Training for researchers prior to access
• Intended breaches are uncommon (per background paper)
• Focus therefore on unintended breach
• Highlight also alternate access options to reduce breach due
to limitations of access method
– Assessing people: research background?
• Experience is difficult to evaluate here
• How would you build up a track record?
• Safe projects
– May be necessary for legislative reasons
– Should this matter?
• Basic research might generate just as useful insights

A suite of options
• Different existing models (each a mix of the 5
safes) all have their place
• Safe people can be incorporated into existing
models
– Many current models assume the “intruder”
– International evidence suggests this is not the case
• ADA has 2 “default” options

The principles should enable the right
mix of “safes” for a given data source
Source: http://www.shinyshiny.tv/2009/12/easymix_-_a_mix.html
