Risk Based De-identification for Sharing Health Data

This presentation describes a methodology, tools, and experiences for the de-identification of health information. The objective is to support data sharing for the purpose of research and public health.

Published in: Health & Medicine

  1. 1. Risk-based De-identification (Khaled El Emam, CHEO RI & uOttawa)
  2. 2. Background
        • Re-identification risk assessment, re-identification attacks, de-identification:
            • Birth registry / newborn screening program
            • Tumor bank
            • Hospital data (discharge abstracts and pharmacy databases) – local, provincial/state, national
            • EMR data
  3. 3. Issues
        • De-identification works well in practice if you adopt a risk-based approach
        • Re-identification attacks are hard
        • It is possible to de-identify data sets and still retain sufficient utility
        • De-identification can be made simple
  4. 4. Re-identification Risk Spectrum
  5. 6. Managing Re-identification Risk
  6. 7. Determining Pr Re-identification Attempts
  7. 8. Determining Risk Threshold to Use
  8. 9. Tradeoffs Made
        • Adjust threshold
        • Adjust amount of suppression that is acceptable
        • Adjust precision of variables
        • Sub-sample
        • Adjust variable weights
  9. 10. Advantages
        • Passage through research ethics is significantly faster for “secondary use” protocols that are certified as low risk
        • Provides an incentive for data recipients to improve their security and privacy practices
        • Provides an incentive for funders to cover the costs of infrastructure for handling data
        • Amount of de-identification is proportionate to the actual risk
  10. 11. Risk Assessment
  11. 12. De-identification
  12. 13. Risk Assessment for REB
  13. 14. Risk Assessment for REB
  14. 15. Risk Assessment for REB
  15. 16. Example – Tumor Bank
        • ‘Rogue researcher’ adversary
        • Search queries considered high risk
        • Combination of sub-sampling and generalization for each tumor site's data
        • Moving towards researcher self-assessments to decide the appropriate level of de-identification
  16. 17. Example – Discharge Abstracts
        • ‘Nosey neighbor’ adversary
        • Creation of a public data file
        • Diagnosis and intervention codes presented difficulties
        • High level of suppression for a public file, but acceptable utility with stronger access controls (higher threshold)
  17. 18. Practical Considerations
        • An audit program is required to ensure compliance with ‘mitigating controls’
        • What if a breach happens?
            • A risk management approach ensures that the data is highly de-identified in situations where breaches are most likely
            • Can demonstrate due diligence
  18. 19. Lessons Learned
        • Geospatial data and longitudinal data always represent challenges because they increase the risk of re-identification
        • Thus far we have never had to decline a data request because of identifiability, nor have we been unable to provide data with sufficient utility for a study
  19. 20. www.ehealthinformation.ca www.ehealthinformation.ca/knowledgebase kelemam@uottawa.ca
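
The tradeoffs listed on slide 9 (threshold, acceptable suppression, precision of variables) can be illustrated with a small sketch. This is not the presenter's tool or method. It assumes, as one common convention that the slides do not confirm, that the risk for a record is 1 divided by the number of records sharing its quasi-identifier values. The field names, the sample records, the helper functions, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical quasi-identifiers; the slides do not name specific variables.
QUASI_IDENTIFIERS = ["age", "sex", "region"]


def class_key(record, quasi_identifiers):
    """Return the tuple of quasi-identifier values that defines a record's group."""
    return tuple(record[q] for q in quasi_identifiers)


def max_risk(records, quasi_identifiers):
    """Assumed risk measure: 1 / size of the smallest group of matching records."""
    sizes = Counter(class_key(r, quasi_identifiers) for r in records)
    return 1.0 / min(sizes.values())


def generalize_age(record, width):
    """Reduce precision: replace an exact age with a band of the given width."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


def suppress_risky_records(records, quasi_identifiers, threshold):
    """Drop records whose group risk exceeds the threshold.

    Returns the kept records and the fraction of records suppressed.
    """
    sizes = Counter(class_key(r, quasi_identifiers) for r in records)
    kept = [
        r for r in records
        if 1.0 / sizes[class_key(r, quasi_identifiers)] <= threshold
    ]
    return kept, 1.0 - len(kept) / len(records)


# Made-up records for illustration only.
records = [
    {"age": 31, "sex": "F", "region": "K1"},
    {"age": 34, "sex": "F", "region": "K1"},
    {"age": 36, "sex": "F", "region": "K1"},
    {"age": 52, "sex": "M", "region": "K2"},
    {"age": 57, "sex": "M", "region": "K2"},
    {"age": 78, "sex": "M", "region": "K2"},
]

raw_risk = max_risk(records, QUASI_IDENTIFIERS)
generalized = [generalize_age(r, 10) for r in records]
kept, suppressed_fraction = suppress_risky_records(
    generalized, QUASI_IDENTIFIERS, threshold=0.5
)
final_risk = max_risk(kept, QUASI_IDENTIFIERS)
```

In this toy data every raw record is unique, so the assumed risk starts at its maximum. Banding ages into decades groups most records together, and suppressing the one remaining unique record brings the data under the hypothetical threshold. A stricter threshold would force coarser bands or more suppression, which is the kind of tradeoff the slide describes.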
