Dr Steven McEachern, Director, Australian Data Archive, presenting at the Managing and publishing sensitive data in the Social Sciences webinar on 29/3/17
FULL webinar recording: https://youtu.be/7wxfeHNfKiQ
Webinar description:
1) Dr Steve McEachern (Director, Australian Data Archive) discussed how the Australian Data Archive manages and publishes sensitive social science data.
More about ADA:
-- The Australian Data Archive (ADA) provides a national service for collecting and preserving digital research data and for making these data available for secondary analysis by academic researchers and other users.
-- The ADA comprises seven sub-archives: Social Science, Historical, Indigenous, Longitudinal, Qualitative, Crime & Justice and International.
-- ADA data is free of charge to all users.
-- The archive is managed by the ADA central office, based in the ANU Centre for Social Research and Methods at the Australian National University (ANU). https://www.ada.edu.au/
Managing sensitive data at the Australian Data Archive
1. Managing sensitive data at the
Australian Data Archive
“Making Data Social” webinar series
29 March 2017
Dr. Steven McEachern
Director, Australian Data Archive
ANU Centre for Social Research and Methods
Australian National University
2. Overview
• Sensitive data and the 5 Safes model
• Access to sensitive data in Australia
• Applying the 5 Safes model at ADA
• Sensitive data and the data lifecycle
3. Sensitive data
• “Sensitive data are data that can be
used to identify an individual, species, object,
process, or location that introduces a risk
of discrimination, harm, or unwanted
attention.”
– ANDS Guide on Publishing and Sharing Sensitive
Data, p.7
– http://www.ands.org.au/__data/assets/pdf_file/0010/489187/Sensitive-data.pdf
4. The 5 safes
1. Safe people: Can the researchers be trusted to do the
right thing?
2. Safe projects: Is the data to be used for an appropriate
purpose?
3. Safe settings: Is the environment in which the analysis
takes place safe?
4. Safe data: Is the data appropriately protected?
5. Safe output: Is there a low risk of disclosure in research
outputs?
Desai, T., F. Ritchie and R. Welpton (2016) Five Safes: designing data access for
research. Economics Working Paper Series
1601, University of the West of England.
http://www2.uwe.ac.uk/faculties/BBS/Documents/1601.pdf
5. What do researchers expect?
(or What is wanted? :-)
• “We emphasize that direct access to micro-data
is critical for success. Alternatives such as access
to synthetic data or submission of computer
programs to agency employees will not address
the key problem of restoring US leadership with
cutting-edge policy-relevant research.”
• Card, Chetty, Feldstein and Saez, 2010 (emphasis
in original)
– http://rajchetty.com/chettyfiles/NSFdataaccess.pdf
6. What is expected?
• “Here's what you need to do if you want an anonymised 1% sample
of the US Census
– Go to Google and type US Census 1% sample, click on link to the
Census.
– Download each of the state files from the FTP site and merge them
yourself. Or just check things out for one of the states. Whatever you
like.
– Start mucking about to test whether your pet theory is plausible.
• Here's what you need to do if you want an anonymised sample of
the NZ Census, or a Confidentialised Unit Record File (CURF) of any
of big Stats series:
– Go to Stats NZ's site, here.
– Follow the instructions below: …”
• (Followed by several pages of instructions, Application Process,
Assessment Criteria, Methods of Access, …)
Eric Crampton, the New Zealand Initiative, Wellington, formerly University of
Canterbury. http://offsettingbehaviour.blogspot.com.au/2015/10/curf-and-turf.html
7. Can we bridge depositor and user
expectations?
I think so. Consider Card et al. again:
“We believe that five conditions must be satisfied to make a data
access program sustainable and efficient:
a) fair and open competition for data access based on scientific merit
b) sufficient bandwidth to accommodate a large number of projects
simultaneously
c) inclusion of younger scholars and graduate students in the
research teams that can access the data
d) direct access to de-identified micro data through local statistical
offices or, more preferably, secure remote connections
e) systematic electronic monitoring to allow immediate disclosure
of statistical results and prevent any disclosure of individual
records”
8. Current models in Australia
ABS:
• Confidentialised Unit
Record Files (CURFs)
• RADL
• ABSDL
• TableBuilder
ADA:
• Confidentialised Unit
Record Files (CURFs)
Shared (often remote
access) infrastructure:
• AURIN
• SURE (PHRN)
• Data linkage facilities
Ad hoc arrangements:
• “Secure rooms”
• Departmental
arrangements
9. Applying the 5 Safes
                         People  Projects  Settings  Data  Output
CURFs                    Yes?    Yes?      Yes?      YES   YES
TableBuilder             No      No        YES       YES   YES
RADL                     Yes?    Yes?      YES       YES   YES
ABSDL                    Yes?    Yes?      YES       YES   YES
ABS Remote Data Lab      Yes     Yes?      YES       YES   YES
ADA                      Yes?    No        No        YES   No
AURIN                    No      No        YES       YES   Yes?
SURE (PHRN)              Yes     Yes       YES       No?   ???
Data Linkage facilities  No?     YES       Yes?      YES   ???
Secure rooms             Yes?    Yes?      YES       No?   ???
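The ratings on this slide can also be held as a small lookup structure, which makes it easy to ask which access models cover a given "safe". The sketch below is illustrative only: the ratings are transcribed from the slide, while the names SAFES, RATINGS, models_with and unaddressed are assumptions made for this example and do not correspond to any ADA or ABS system. The slide does not define its notation (YES, Yes, Yes?, No?, ???), so the code matches the strings literally rather than interpreting them.

```python
# Ratings transcribed from the slide; the layout and helper names are
# illustrative assumptions, not part of any ADA or ABS system.
SAFES = ("people", "projects", "settings", "data", "output")

RATINGS = {
    "CURFs":                   ("Yes?", "Yes?", "Yes?", "YES", "YES"),
    "TableBuilder":            ("No",   "No",   "YES",  "YES", "YES"),
    "RADL":                    ("Yes?", "Yes?", "YES",  "YES", "YES"),
    "ABSDL":                   ("Yes?", "Yes?", "YES",  "YES", "YES"),
    "ABS Remote Data Lab":     ("Yes",  "Yes?", "YES",  "YES", "YES"),
    "ADA":                     ("Yes?", "No",   "No",   "YES", "No"),
    "AURIN":                   ("No",   "No",   "YES",  "YES", "Yes?"),
    "SURE (PHRN)":             ("Yes",  "Yes",  "YES",  "No?", "???"),
    "Data Linkage facilities": ("No?",  "YES",  "Yes?", "YES", "???"),
    "Secure rooms":            ("Yes?", "Yes?", "YES",  "No?", "???"),
}


def models_with(safe, rating="YES"):
    """Return the access models whose rating for `safe` equals `rating` exactly."""
    idx = SAFES.index(safe)
    return [model for model, row in RATINGS.items() if row[idx] == rating]


def unaddressed(model):
    """Return the safes rated 'No' or 'No?' for the given access model."""
    return [s for s, r in zip(SAFES, RATINGS[model]) if r.startswith("No")]
```

For example, unaddressed("ADA") lists the three dimensions the slide marks as "No" for ADA, and models_with("settings") lists the models rated "YES" on safe settings.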
10. Australian experience
• Safe data
– Confidentialisation: ADA, ABS, DSS (HILDA, etc.)
– Indirect access to data: TableBuilder, ADA
• Safe settings
– Aggregated data: TableBuilder, AURIN
– Remote: RADL, ABS (Remote) Data Lab
– Secure environments: ABS (On site) Data Lab,
secure rooms
11. 5 safes: lesser emphasis on…
• Safe outputs
– Difficult to scale (e.g. data lab output reviews)
– This is changing – e.g. TableBuilder is automated
– But need to consider replication and reproducibility
• Safe researchers and safe projects
– Considered in most models, but not closely monitored
– May be difficult to monitor? (Similar issues to the
reporting of research outputs in universities)
– Universities could (and do!!) provide imprimatur for
their staff and students
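To make the point about automated output checking concrete, here is a minimal sketch of one generic rule: suppressing small cells in a frequency table before release. It is not a description of how TableBuilder works, which the slides do not cover. The threshold MIN_CELL_COUNT and the function suppress_small_cells are hypothetical names and values chosen for illustration.

```python
# Hypothetical threshold; real services set their own disclosure rules.
MIN_CELL_COUNT = 5


def suppress_small_cells(table, threshold=MIN_CELL_COUNT):
    """Return a copy of a {cell_label: count} table with counts below
    `threshold` replaced by None, so they are withheld from the output."""
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }
```

A rule like this scales because it needs no human reviewer, but it also changes the released numbers, which is one reason replication and reproducibility need to be considered alongside it.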
12. Frameworks for research practice
• There are existing Australian frameworks for
researcher accountability and responsibilities:
– the Australian Code for the Responsible Conduct of
Research (ACRCR), which sets out institutional and
researcher responsibilities for conduct of research
– (Note that this is currently under review)
– Human Research Ethics Committees (HREC)
• Increasingly, professional and journal
requirements for data sharing:
– E.g. PLOS One, AEA, DA-RT (political science)
– https://www.aeaweb.org/aer/data.php
– http://journals.plos.org/plosone/s/data-availability
13. Relevant content from ACRCR
• S.2: Management of research data and primary materials
– E.g. 2.7 Maintain confidentiality of research data and primary
materials
– Researchers given access to confidential information must
maintain that confidentiality. Primary materials and confidential
research data must be kept in secure storage. Confidential
information must only be used in ways agreed with those who
provided it. Particular care must be exercised when confidential
data are made available for discussion.
• S.4: Publication and dissemination of research findings
– E.g. 4.2.3 Institutions must ensure that the sponsors of research
understand the importance of publication in research and do
not delay publication beyond the time needed to protect
intellectual property and other relevant interests.
• S.9: Breaches of the Code and misconduct in research
14. ADA model
• Safe data: data is anonymised
(confidentialised) either prior to deposit or by
ADA archivists
• Safe people: virtually all data access is
mediated, and users must be identified and
provide contact and supervisor details
• Safe projects: users provide a project
description
• Safe settings and safe outputs: NOT applied
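The information collected under this model can be pictured as a simple access-request record that is checked for completeness before a request is mediated. This is only a sketch under stated assumptions: AccessRequest, its field names and missing_fields are hypothetical and do not reflect ADA's actual request form or systems.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical record of what a user supplies when requesting data."""
    user_name: str            # safe people: the user must be identified
    contact_details: str      # safe people: contact details
    supervisor_details: str   # safe people: supervisor details
    project_description: str  # safe projects: description of intended use


def missing_fields(request):
    """Return the names of fields left empty or containing only whitespace."""
    return [name for name, value in vars(request).items() if not value.strip()]
```

A request such as AccessRequest("A. Researcher", "a.researcher@example.org", "", "Secondary analysis of survey data") would be flagged as missing supervisor_details. Nothing in the record covers settings or outputs, which matches the slide: those two safes are not applied in this model.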
15. ABS Remote Data Lab (virtual enclave)
• Safe data: less of a focus – but the lab does not prohibit
use of safe data practices
– Risk: individual researchers can see individual records
– BUT this assumes “unsafe” people (researchers)
• Safe settings: Remote access environment hosted at
ABS
– Challenge: cost of establishing the system
• Safe outputs: outputs limited only to methods
approved through the access environment (i.e. no
printing)
– Risk: photographing the screen, taking notes
– Again, assumes “unsafe” people
– Challenge: managing output checking
16. • Safe people:
– Institutional support
• Note ACRCR – code of conduct, and HREC
– Training for researchers prior to access
• Intended breaches are uncommon (per background paper)
• Focus therefore on unintended breach
• Also highlight alternative access options, to reduce breaches
caused by the limitations of an access method
– Assessing people: research background?
• Experience is difficult to evaluate here
• How would you build up a track record?
• Safe projects
– May be necessary for legislative reasons
– Should this matter?
• Basic research might generate just as useful insights
17. A suite of options
• Different existing models (each a mix of the 5
safes) all have their place
• Safe people can be incorporated into existing
models
– Many current models assume the “intruder”
– International evidence suggests this is not the case
• ADA has 2 “default” options
18. The principles should enable the right
mix of “safes” for a given data source
Source: http://www.shinyshiny.tv/2009/12/easymix_-_a_mix.html