Overcoming obstacles to sharing data about human subjects


  1. Overcoming obstacles to sharing data about human subjects
     Force11 conference, Portland, Oregon, 18 April 2016
     Robin Rice, EDINA and Data Library, University of Edinburgh, UK
  2. The elephant in the room (David Blackwell on Flickr)
  3. The status quo
     • Most data underlying published research, even publicly funded research, are not shared. How can research claims be verified?
     • Common barriers are well known; confidentiality concerns are high
     • Qualitative research data and small-scale surveys are not commonly re-used
     • The tendency is to err on the side of caution, given legal & ethical responsibilities
     • As the open science agenda pushes disciplines toward reproducibility, there is a danger of human subject-oriented research falling behind
  4. Redressing the imbalance: caution vs open data sharing (Seesaw by harmishhk on Flickr)
  5. What a researcher can do to be able to share
     • Plan for sharing (via a data management plan)
     • Don’t collect personal information that is not needed
     • Principle of informed consent: get consent to share data
     • Document all data processing (inside & outside the analysis package)
     • Attribute, anonymise, or aggregate individuals’ data
  6. Anonymise it! (by Greendoula on Flickr)
  7. How to create an anonymised, open dataset
     Numeric data, e.g. surveys:
     • Remove names and identifiers
     • Renumber and re-sort case ids
     • Group numbers into categories (banding)
     • Top and bottom code numbers (age, salaries)
     • Use standard codes (e.g. SOC, SIC) and geographic boundaries at appropriate levels; not fine-grained
     • Check for low cell counts in cross-tabs
     Qualitative data, e.g. interviews:
     • Share the edited transcript, not video or audio unless consented
     • Agree a pseudonym with each subject
     • Remind subject not to disclose personal or sensitive information, e.g. about family members
     • Replace proper nouns in text (names, placenames etc.) using square brackets; don’t blank out
     • Avoid over-anonymising or data will lose value
     • Keep a log of all replacements, generalisations or removals made; store separately from anonymised data
  8. Restrict access, if necessary (James Emery on Flickr)
  9. When open data access is not plausible
     • When potential for harm to research subjects is too great
     • Information that can be used to discriminate requires extra protection
     • When required by the data producer, funder, health authority, etc.
     • Sometimes precautions are required even for anonymised data
     • When anonymisation is either not feasible or would destroy the value of the dataset
     • Population too small to be anonymous, e.g. those with a genetic condition
  10. Lock it up to keep safe (Eric Parker on Flickr)
  11. Take proportionate precautions; ease route to access
     • Make documentation and/or code about the dataset openly available
     • Use a template for a data access application & data use agreement
     • Make arrangements for unbiased review of applications for access
     • Transfer data safely; use secure channels, encryption
     • Consider options for remote access rather than on-site-only access
  12. The dangers of data linkage
  13. Data linkage
     • Probabilistic or ‘fuzzy’ matching is one method used to identify individuals by combining information from different datasets
     • This can be done for legitimate research purposes, such as matching cases in different government (administrative) datasets
     • Informed consent is normally impossible for this technique; the data were collected for a different purpose than the current research proposal
  14. Information governance to the rescue (Ryan Stevens on Flickr)
  15. Information governance
     • Requires a bigger infrastructure than one researcher can create
     • Has been developed to meet ethical standards where informed consent is not possible and research is in the public interest
     • Allowed by current European Data Directive and new regulation forthcoming
     • Makes use of the ‘five safes’: safe data, safe researcher, safe project, safe settings, safe outputs
  16. Check out our free educational resources: Research Data Management Training (MANTRA); Research Data Management & Sharing MOOC
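
The numeric-data steps on slide 7 (remove identifiers, renumber and re-sort case ids, band values with top and bottom coding, check for low cell counts) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and does not come from the slides: the field names, the ten-year band width, the cut-offs of 20 and 70, the threshold of 3 for a "low" cell, and the toy survey records. Appropriate values depend on the dataset and on any disclosure-control guidance that applies to it.

```python
import random
from collections import Counter


def band_age(age, width=10, bottom=20, top=70):
    """Band an age into a category, with bottom and top coding.

    The width and cut-offs are illustrative assumptions, not recommendations.
    """
    if age < bottom:
        return f"under {bottom}"
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(records, seed=None):
    """Drop names, band ages, and renumber cases in shuffled order."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # re-sort so the new ids do not reflect collection order
    return [
        {"case_id": new_id, "age_band": band_age(r["age"]), "region": r["region"]}
        for new_id, r in enumerate(shuffled, start=1)
    ]


def low_cells(rows, keys, threshold=3):
    """Return cross-tab cells whose count falls below the (assumed) threshold."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {cell: n for cell, n in counts.items() if n < threshold}


# Hypothetical survey records, invented for this example.
survey = [
    {"name": "A. Example", "age": 17, "region": "North"},
    {"name": "B. Example", "age": 34, "region": "North"},
    {"name": "C. Example", "age": 38, "region": "North"},
    {"name": "D. Example", "age": 31, "region": "North"},
    {"name": "E. Example", "age": 82, "region": "South"},
]

released = anonymise(survey, seed=1)  # fixed seed only to make the demo repeatable
flagged = low_cells(released, keys=("age_band", "region"))
print(flagged)
```

Cells that are flagged would usually be merged into broader bands or suppressed before release. In line with slide 7, any record of the changes made (for example, a mapping from old to new case ids) would be kept separately from the anonymised file.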

Editor's Notes

  • Information that can be used to discriminate requires extra protection:
    Racial or ethnic origin
    Political opinions
    Membership of a political association
    Religious beliefs or affiliations
    Membership of a professional or trade association
    Membership of a trade union
    Sexual preferences or practices
    Criminal record
    Health and genetic information
  • Just four bits of information gleaned from a shopper's credit card can be used to identify almost anyone, researchers have found.
    The study in the journal Science analysed three months of credit card records for 1.1m people in an unidentified industrialised country.
    Ninety percent of individuals could be uniquely identified using just four pieces of information, such as where they bought coffee one day or where they purchased a new jumper or pair of shoes.
    In other words, credit card use was just as reliable at identifying someone as mobile phone records, the study found.
    Knowing the price of a transaction could boost the risk of re-identification by 22%.
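
The re-identification point in the note above can be made concrete by counting how many records in a dataset are unique on a given combination of attributes. The following is a minimal Python sketch on invented data. It is not the method used in the Science study, and the field names and records are hypothetical. It only shows why each additional piece of information tends to single out more individuals.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Fraction of records that are the only one with their combination of values."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Hypothetical records, invented for this example.
people = [
    {"postcode_area": "EH1", "birth_year": 1980, "sex": "F"},
    {"postcode_area": "EH1", "birth_year": 1980, "sex": "M"},
    {"postcode_area": "EH1", "birth_year": 1975, "sex": "F"},
    {"postcode_area": "EH2", "birth_year": 1980, "sex": "F"},
]

one_field = unique_fraction(people, ["postcode_area"])
two_fields = unique_fraction(people, ["postcode_area", "birth_year"])
three_fields = unique_fraction(people, ["postcode_area", "birth_year", "sex"])
print(one_field, two_fields, three_fields)  # 0.25 0.5 1.0
```

In this toy dataset one attribute isolates a quarter of the records, two isolate half, and three isolate all of them. Running the same check on a dataset before release is one way to see which combinations of variables may need banding or removal.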