
Overcoming obstacles to sharing data about human subjects



Force11 2016 conference presentation, Portland, Oregon, 18 April 2016


  1. Overcoming obstacles to sharing data about human subjects
     Force11 conference, Portland, Oregon, 18 April 2016
     Robin Rice, EDINA and Data Library, University of Edinburgh, UK
  2. The elephant in the room (David Blackwell on Flickr)
  3. The status quo
     • Most data underlying published research, even publicly funded research, are not shared. How can research claims be verified?
     • Common barriers are well known, and confidentiality concerns are high
     • Qualitative research data and small-scale surveys are not commonly re-used
     • The tendency is to err on the side of caution, given legal & ethical responsibilities
     • As the open science agenda pushes disciplines toward reproducibility, there is a danger of research involving human subjects falling behind
  4. Redressing the imbalance: caution vs open data sharing (Seesaw by harmishhk on Flickr)
  5. What a researcher can do to be able to share
     • Plan for sharing (via a data management plan)
     • Don’t collect personal information that is not needed
     • Follow the principle of informed consent: get consent to share data
     • Document all data processing (inside & outside the analysis package)
     • Attribute, anonymise, or aggregate individuals' data
  6. Anonymise it! (by Greendoula on Flickr)
  7. How to create an anonymised, open dataset
     Numeric data, e.g. surveys:
     • Remove names and identifiers
     • Renumber and re-sort case IDs
     • Group numbers into categories (banding)
     • Top- and bottom-code numbers (age, salaries)
     • Use standard codes (e.g. SOC, SIC) and geographic boundaries at appropriate levels, not fine-grained ones
     • Check for low cell counts in cross-tabs
     Qualitative data, e.g. interviews:
     • Share the edited transcript, not the video or audio, unless the subject has consented
     • Agree a pseudonym with each subject
     • Remind subjects not to disclose personal or sensitive information, e.g. about family members
     • Replace proper nouns in text (names, place names, etc.) using square brackets; don’t blank them out
     • Avoid over-anonymising, or the data will lose value
     • Keep a log of all replacements, generalisations or removals made; store it separately from the anonymised data
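The numeric techniques listed on this slide can be sketched in Python. This is a minimal illustration, not a tool used in the presentation: the record fields (`case_id`, `name`, `age`, `salary`), the 10-year age bands and the salary limits of 10,000 and 100,000 are all hypothetical. Top and bottom coding is done here by clamping extreme values to the limit; replacing them with an open-ended category such as "100,000+" is an equally common choice.

```python
import random

def band(value: int, width: int = 10) -> str:
    """Banding: replace an exact value with a range, e.g. 34 -> '30-39'."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"

def top_bottom_code(value: int, bottom: int, top: int) -> int:
    """Top/bottom coding: clamp outliers so rare extreme values cannot single anyone out."""
    return max(bottom, min(value, top))

def anonymise(records, rng=None):
    """Drop direct identifiers, renumber and re-sort cases, band ages and
    top/bottom-code salaries (hypothetical schema and limits).

    Returns the anonymised rows plus a key linking old to new case IDs;
    that key must be stored separately from the data, or destroyed."""
    rng = rng or random.Random()
    shuffled = list(records)
    rng.shuffle(shuffled)  # break any link between row order and collection order
    key = {}
    rows = []
    for new_id, rec in enumerate(shuffled, start=1):
        key[rec["case_id"]] = new_id
        rows.append({
            "case_id": new_id,                       # new sequential ID
            "age_band": band(rec["age"], width=10),  # 'name' is deliberately not copied
            "salary": top_bottom_code(rec["salary"], bottom=10_000, top=100_000),
        })
    return rows, key

survey = [
    {"case_id": 101, "name": "A. Respondent", "age": 34, "salary": 250_000},
    {"case_id": 102, "name": "B. Respondent", "age": 67, "salary": 5_000},
]
rows, key = anonymise(survey)
```

Because the new IDs follow a random shuffle, the published order reveals nothing about when or where each case was collected.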
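For qualitative data, the advice to replace proper nouns with square-bracketed labels and keep a separate log can be sketched as follows. The helper `bracket_replace` is hypothetical, not a tool named in the presentation, and it does no automatic detection of proper nouns: the researcher still decides which terms to replace and what each label should say. Longer terms are replaced first so that a full place name is not partly overwritten by a shorter one.

```python
import re

def bracket_replace(text: str, replacements: dict[str, str]):
    """Replace each listed term with a bracketed label and log what changed.

    `replacements` maps an original term (e.g. 'Mary') to a generic label
    (e.g. 'sister'). The returned log must be stored separately from the
    anonymised transcript."""
    log = []
    # Longest terms first, so 'Glasgow Royal Infirmary' is handled before 'Glasgow'.
    for term in sorted(replacements, key=len, reverse=True):
        label = f"[{replacements[term]}]"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        # A function replacement stops backslashes in the label being treated as escapes.
        text, count = pattern.subn(lambda _m: label, text)
        if count:
            log.append((term, label, count))
    return text, log

edited, log = bracket_replace(
    "I moved from Leith to Glasgow with Mary in 2009.",
    {"Mary": "sister", "Leith": "neighbourhood", "Glasgow": "city"},
)
# edited == "I moved from [neighbourhood] to [city] with [sister] in 2009."
```

Labels like "[sister]" keep the meaning of the passage, which is why the slide recommends them over blanking the text out.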
  8. Restrict access, if necessary (James Emery on Flickr)
  9. When open data access is not feasible
     • When the potential for harm to research subjects is too great
     • Information that can be used to discriminate requires extra protection
     • When required by the data producer, funder, health authority, etc.
     • Sometimes precautions are required even for anonymised data
     • When anonymisation is either not feasible or would destroy the value of the dataset
     • When the population is too small to be anonymous, e.g. those with a particular genetic condition
  10. Lock it up to keep safe (Eric Parker on Flickr)
  11. Take proportionate precautions; ease the route to access
     • Make documentation and/or code about the dataset openly available
     • Use a template for a data access application & data use agreement
     • Make arrangements for unbiased review of applications for access
     • Transfer data safely, using secure channels and encryption
     • Consider remote access options in preference to on-site-only access
  12. The dangers of data linkage
  13. Data linkage
     • Probabilistic or ‘fuzzy’ matching is one method used to identify individuals by combining information from different datasets
     • This can be done for legitimate research purposes, such as matching cases across different government (administrative) datasets
     • Informed consent is normally impossible with this technique, because the data were collected for a purpose other than the current research proposal
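The idea behind probabilistic matching can be illustrated with a toy version of the Fellegi-Sunter approach, which scores a pair of records by summing log-likelihood weights for each field that agrees or disagrees. Everything specific here is an assumption for illustration: the fields, the m and u probabilities (the chance a field agrees for a true match and for a non-match), the 0.85 name-similarity cut-off and the link threshold of 10 are invented rather than estimated from data. Real linkage systems add blocking, estimate m and u (often with the EM algorithm) and review borderline pairs by hand. The same scoring that links administrative records can also re-identify people in a dataset whose indirect identifiers were left in.

```python
import math
from difflib import SequenceMatcher

# Hypothetical (m, u) per field: m = P(agree | same person), u = P(agree | different people).
FIELDS = {
    "surname": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "postcode": (0.90, 0.001),
}

def agrees(field, a, b):
    if field == "surname":
        # 'Fuzzy' comparison tolerates spelling variants such as Thomson/Thompson.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= 0.85
    return a == b

def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if agrees(field, rec_a[field], rec_b[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def link(left, right, threshold=10.0):
    """Link each left record to its best-scoring right record, if above threshold."""
    links = []
    for a in left:
        best = max(right, key=lambda b: match_weight(a, b))
        weight = match_weight(a, best)
        if weight >= threshold:
            links.append((a["id"], best["id"], round(weight, 1)))
    return links

health = [{"id": "H1", "surname": "Thomson", "birth_date": "1980-02-14", "postcode": "EH8 9LE"}]
social = [
    {"id": "S7", "surname": "Thompson", "birth_date": "1980-02-14", "postcode": "EH8 9LE"},
    {"id": "S8", "surname": "Thomson", "birth_date": "1975-06-01", "postcode": "G12 8QQ"},
]
print(link(health, social))  # H1 links to S7 despite the spelling difference
```

Note that S8 shares the exact surname but still scores negatively, because disagreement on date of birth and postcode outweighs it.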
  14. Information governance to the rescue (Ryan Stevens on Flickr)
  15. Information governance
     • Requires a bigger infrastructure than one researcher can create
     • Has been developed to meet ethical standards where informed consent is not possible and the research is in the public interest
     • Allowed under the current European Data Protection Directive, with a new regulation forthcoming
     • Makes use of the ‘five safes’: safe data, safe researcher, safe project, safe settings, safe outputs
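Checking outputs for disclosure risk before release is one common way to put ‘safe outputs’ into practice, and it is also what the earlier advice to check for low cell counts in cross-tabs amounts to. A minimal sketch, assuming a threshold of 5 (an illustrative value; data services set their own rules) and hypothetical variable names: it builds a frequency cross-tab, flags cells below the threshold and suppresses them. It performs only primary suppression; a suppressed cell can sometimes be recovered from row or column totals, so real output checking also considers secondary suppression.

```python
from collections import Counter

def crosstab(rows, row_var, col_var):
    """Frequency cross-tabulation of two categorical variables."""
    return Counter((r[row_var], r[col_var]) for r in rows)

def low_cells(table, threshold=5):
    """Cells below the threshold (a Counter only holds observed, non-zero cells)."""
    return {cell: n for cell, n in table.items() if n < threshold}

def suppress(table, threshold=5):
    """Primary suppression: replace any cell below the threshold with None."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}

# Hypothetical microdata: 2 + 12 + 9 respondents.
rows = (
    [{"region": "North", "condition": "yes"}] * 2
    + [{"region": "North", "condition": "no"}] * 12
    + [{"region": "South", "condition": "no"}] * 9
)
table = crosstab(rows, "region", "condition")
print(low_cells(table))  # {('North', 'yes'): 2}
```

The two North respondents with the condition form exactly the kind of small group that could be identified, so that cell is withheld from the released table.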
  16. Check out our free educational resources: Research Data Management Training (MANTRA) and the Research Data Management & Sharing MOOC