Managing sensitive data at the Australian Data Archive

Dr Steven McEachern, Director, Australian Data Archive, presenting at the "Managing and publishing sensitive data in the Social Sciences" webinar on 29 March 2017.

FULL webinar recording: https://youtu.be/7wxfeHNfKiQ
Webinar description:
1) Dr Steve McEachern (Director, Australian Data Archive) discussed how the Australian Data Archive manages and publishes sensitive social science data.

More about ADA:
-- The Australian Data Archive (ADA) provides a national service for collecting and preserving digital research data and for making these data available for secondary analysis by academic researchers and other users.
-- The ADA comprises seven sub-archives: Social Science, Historical, Indigenous, Longitudinal, Qualitative, Crime & Justice, and International.
-- ADA data are free of charge to all users.
-- The archive is managed by the ADA central office, based in the ANU Centre for Social Research and Methods at the Australian National University (ANU).
https://www.ada.edu.au/

  1. Managing sensitive data at the Australian Data Archive
     “Making Data Social” webinar series, 29 March 2017
     Dr. Steven McEachern, Director, Australian Data Archive, ANU Centre for Social Research and Methods, Australian National University
  2. Overview
     • Sensitive data and the 5 Safes model
     • Access to sensitive data in Australia
     • Applying the 5 Safes model at ADA
     • Sensitive data and the data lifecycle
  3. Sensitive data
     • “Sensitive data are data that can be used to identify an individual, species, object, process, or location that introduces a risk of discrimination, harm, or unwanted attention.”
       – ANDS Guide on Publishing and Sharing Sensitive Data, p.7
       – http://www.ands.org.au/__data/assets/pdf_file/0010/489187/Sensitive-data.pdf
  4. The 5 Safes
     1. Safe people: Can the researchers be trusted to do the right thing?
     2. Safe projects: Is the data to be used for an appropriate purpose?
     3. Safe settings: Is the environment in which the analysis takes place safe?
     4. Safe data: Is the data appropriately protected?
     5. Safe output: Is there a low risk of disclosure in research outputs?
     Desai, T., F. Ritchie and R. Welpton (2016) Five Safes: designing data access for research. Economics Working Paper Series 1601, University of the West of England. http://www2.uwe.ac.uk/faculties/BBS/Documents/1601.pdf
  5. What do researchers expect? (Or: what is wanted? :-) )
     • “We emphasize that direct access to micro-data is critical for success. Alternatives such as access to synthetic data or submission of computer programs to agency employees will not address the key problem of restoring US leadership with cutting-edge policy-relevant research.”
     • Card, Chetty, Feldstein and Saez, 2010 (emphasis in original)
       – http://rajchetty.com/chettyfiles/NSFdataaccess.pdf
  6. What is expected?
     • “Here's what you need to do if you want an anonymised 1% sample of the US Census:
       – Go to Google and type US Census 1% sample, click on link to the Census.
       – Download each of the state files from the FTP site and merge them yourself. Or just check things out for one of the states. Whatever you like.
       – Start mucking about to test whether your pet theory is plausible.
     • Here's what you need to do if you want an anonymised sample of the NZ Census, or a Confidentialised Unit Record File (CURF) of any of big Stats series:
       – Go to Stats NZ's site, here.
       – Follow the instructions below: …”
     • (Followed by several pages of instructions, Application Process, Assessment Criteria, Methods of Access, …)
     Eric Crampton, the New Zealand Initiative, Wellington, formerly University of Canterbury. http://offsettingbehaviour.blogspot.com.au/2015/10/curf-and-turf.html
  7. Can we bridge depositor and user expectations?
     I think so. Consider Card et al. again:
     “We believe that five conditions must be satisfied to make a data access program sustainable and efficient:
       a) fair and open competition for data access based on scientific merit
       b) sufficient bandwidth to accommodate a large number of projects simultaneously
       c) inclusion of younger scholars and graduate students in the research teams that can access the data
       d) direct access to de-identified micro data through local statistical offices or, more preferably, secure remote connections
       e) systematic electronic monitoring to allow immediate disclosure of statistical results and prevent any disclosure of individual records”
  8. Current models in Australia
     ABS:
     • Confidentialised Unit Record Files (CURFs)
     • RADL
     • ABSDL
     • TableBuilder
     ADA:
     • Confidentialised Unit Record Files (CURFs)
     Shared (often remote access) infrastructure:
     • AURIN
     • SURE (PHRN)
     • Data linkage facilities
     Ad hoc arrangements:
     • “Secure rooms”
     • Departmental arrangements
  9. Applying the 5 Safes
                               People    Projects  Settings  Data      Output
      CURFs                    Yes?      Yes?      Yes?      YES       YES
      TableBuilder             No        No        YES       YES       YES
      RADL                     Yes?      Yes?      YES       YES       YES
      ABSDL                    Yes?      Yes?      YES       YES       YES
      ABS Remote Data Lab      Yes       Yes?      YES       YES       YES
      ADA                      Yes?      No        No        YES       No
      AURIN                    No        No        YES       YES       Yes?
      SURE (PHRN)              Yes       Yes       YES       No?       ???
      Data Linkage facilities  No?       YES       Yes?      YES       ???
      Secure rooms             Yes?      Yes?      YES       No?       ???
  10. Australian experience
      • Safe data
        – Confidentialisation: ADA, ABS, DSS (HILDA, etc.)
        – Indirect access to data: TableBuilder, ADA
      • Safe settings
        – Aggregated data: TableBuilder, AURIN
        – Remote: RADL, ABS (Remote) Data Lab
        – Secure environments: ABS (On site) Data Lab, secure rooms
  11. 5 Safes: lesser emphasis on…
      • Safe outputs
        – Difficult to scale (e.g. data lab output reviews)
        – This is changing, e.g. TableBuilder is automated
        – But need to consider replication and reproducibility
      • Safe researchers and safe projects
        – Considered in most models, but not closely monitored
        – May be difficult to monitor? (Similar issues to the reporting of research outputs in universities)
        – Universities could (and do!!) provide an imprimatur for their staff and students
  12. Frameworks for research practice
      • There are existing Australian frameworks for researcher accountability and responsibilities:
        – the Australian Code for the Responsible Conduct of Research (ACRCR), which sets out institutional and researcher responsibilities for the conduct of research (note that this is currently under review)
        – Human Research Ethics Committees (HREC)
      • Increasingly, professional and journal requirements for data sharing:
        – e.g. PLOS One, AEA, DA-RT (political science)
        – https://www.aeaweb.org/aer/data.php
        – http://journals.plos.org/plosone/s/data-availability
  13. Relevant content from ACRCR
      • S.2: Management of research data and primary materials
        – e.g. 2.7 Maintain confidentiality of research data and primary materials
        – Researchers given access to confidential information must maintain that confidentiality. Primary materials and confidential research data must be kept in secure storage. Confidential information must only be used in ways agreed with those who provided it. Particular care must be exercised when confidential data are made available for discussion.
      • S.4: Publication and dissemination of research findings
        – e.g. 4.2.3 Institutions must ensure that the sponsors of research understand the importance of publication in research and do not delay publication beyond the time needed to protect intellectual property and other relevant interests.
      • S.9: Breaches of the Code and misconduct in research
  14. ADA model
      • Safe data: data is anonymised (confidentialised) either prior to deposit or by ADA archivists
      • Safe people: virtually all data access is mediated, and users must be identified and provide contact and supervisor details
      • Safe projects: users provide a project description
      • Safe settings and safe outputs: NOT applied
  15. ABS Remote Data Lab (virtual enclave)
      • Safe data: less of a focus, but the lab does not prohibit use of safe data practices
        – Risk: individual researchers can see individual records
        – BUT this assumes “unsafe” people (researchers)
      • Safe settings: remote access environment hosted at ABS
        – Challenge: cost of establishing the system
      • Safe outputs: outputs limited only to methods approved through the access environment (i.e. no printing)
        – Risk: photographing the screen, taking notes
        – Again, assumes “unsafe” people
        – Challenge: managing output checking
  16. • Safe people:
        – Institutional support
          • Note ACRCR (code of conduct) and HREC
        – Training for researchers prior to access
          • Intended breaches are uncommon (per background paper)
          • Focus therefore on unintended breaches
          • Also highlight alternative access options, to reduce breaches caused by the limitations of an access method
        – Assessing people: research background?
          • Experience is difficult to evaluate here
          • How would you build up a track record?
      • Safe projects
        – May be necessary for legislative reasons
        – Should this matter?
          • Basic research might generate just as useful insights
  17. A suite of options
      • Different existing models (each a mix of the 5 Safes) all have their place
      • Safe people can be incorporated into existing models
        – Many current models assume the “intruder”
        – International evidence suggests this is not the case
      • ADA has 2 “default” options
  18. The principles should enable the right mix of “safes” for a given data source
      Source: http://www.shinyshiny.tv/2009/12/easymix_-_a_mix.html
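
Slide 11 notes that safe outputs are hard to scale when every output is reviewed by hand, and that automation (as in TableBuilder) is changing this. As a rough illustration only, the sketch below applies a generic minimum-cell-count (threshold) rule to a frequency table before release. The threshold of 5, the function name `check_frequency_table`, and the example regions are all hypothetical; this is not a description of how TableBuilder, the ABS Data Lab or ADA actually check outputs.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical minimum cell count: real thresholds vary by agency and dataset.
MIN_CELL_COUNT = 5


@dataclass
class CheckResult:
    released: dict[str, int | None]  # cell label -> count, or None if withheld
    suppressed: list[str]            # labels of the cells that were withheld


def check_frequency_table(table: dict[str, int],
                          threshold: int = MIN_CELL_COUNT) -> CheckResult:
    """Apply a simple threshold rule to a frequency table.

    Any non-zero count below `threshold` is withheld (replaced by None);
    zero counts and counts at or above the threshold are released unchanged.
    """
    released: dict[str, int | None] = {}
    suppressed: list[str] = []
    for label, count in table.items():
        if 0 < count < threshold:
            released[label] = None
            suppressed.append(label)
        else:
            released[label] = count
    return CheckResult(released=released, suppressed=suppressed)


if __name__ == "__main__":
    # Hypothetical counts of survey respondents by region.
    result = check_frequency_table(
        {"Region A": 120, "Region B": 3, "Region C": 48, "Region D": 0}
    )
    print(result.suppressed)  # ['Region B']
    print(result.released)    # {'Region A': 120, 'Region B': None, 'Region C': 48, 'Region D': 0}
```

A single threshold rule is only a starting point: if table totals are also published, one withheld cell can be recovered by subtraction, so practical schemes add secondary suppression or perturb cell values instead.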
