
Risks and mitigations of releasing data


Intro given to a 2015 Responsible Data Forum seminar on development data risks

Published in: Data & Analytics

  1. Risks and mitigations of releasing data. Risk analysis and complexity in de-identifying and releasing data. Sara-Jayne Terp, RDF Discussion
  2. First, Do No Harm. “If you make a dataset public, you have a responsibility, to the best of your knowledge, skills, and advice, to do no harm to the people connected to that dataset. You balance making data available to people who can do good with it against protecting the data subjects, sources, and managers.”
  3. What is risk? What is the risk here?
  4. Risk. “The probability of something happening multiplied by the resulting cost or benefit if it does” (Oxford English Dictionary). Three parts: • Cost/benefit • Probability • Subject (to what/whom)
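The OED definition on this slide is an expected-cost calculation. A minimal sketch, with invented probabilities and costs purely for illustration:

```python
# Sketch of the slide's formula: risk = probability x cost, per harm.
# All probabilities and cost figures here are illustrative assumptions.
def risk_score(probability, cost):
    """Expected cost of a harm: likelihood times impact."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * cost

# Two hypothetical harms from the same release: a likely, moderate one
# and a rare, severe one can end up with comparable expected costs.
reidentification = risk_score(probability=0.3, cost=1_000)
physical_harm = risk_score(probability=0.01, cost=100_000)
```

The point of putting numbers on it is only comparative: a low-probability harm to a data subject can still dominate the analysis if its cost is severe.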
  5. Subjects: Physical. “Witnesses told us that a helicopter had been circling around the area for hours by the time the bakery opened in the afternoon. It had, perhaps, 200 people lined up to get bread. Suddenly, the helicopter dropped a bomb that hit a building…”
  6. Subjects: Reputational
  7. Subjects: Physical
  8. Collectors: Physical
  9. Processors: Legal
  10. Risk of What? • Physical harm • Legal harm (e.g. jail, IP disputes) • Reputational harm • Privacy breach
  11. Risk to Whom? • Data subjects (elections example) • Data collectors (conflict example) • Data processing team (military equipment example) • Person releasing the data (corruption example) • Person using the data
  12. Likelihood of Risk: Low, Medium, High
  13. PII: how I handle it
  14. PII. “Personally identifiable information (PII) is any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered PII.”
  15. Learn to Spot Red Flags • Names, addresses, phone numbers • Locations: lat/long, GIS traces, locality (e.g. home + work as an identifier) • Members of small populations • Untranslated text • Codes (e.g. “41”) • Slang terms • Data that can be combined with other datasets to produce PII
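Some of the red flags above have regular shapes that a release team could scan for mechanically before human review. A hedged sketch of such a first pass, with illustrative patterns that are assumptions rather than a complete PII detector; the softer flags (slang, codes, small populations, untranslated text) still need human reviewers, including locals:

```python
import re

# Illustrative red-flag patterns, not a complete PII detector.
RED_FLAGS = {
    "phone number": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "lat/long pair": re.compile(r"-?\d{1,3}\.\d{3,},\s*-?\d{1,3}\.\d{3,}"),
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def spot_red_flags(text):
    """Return the names of any red-flag patterns found in a free-text field."""
    return [name for name, pattern in RED_FLAGS.items() if pattern.search(text)]
```

A pass like this only flags fields for review; a clean result never means a field is safe to release.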
  16. Consider Partial Release. Release to only some groups: • Academics • People in your organisation • Data subjects. Release at lower granularity: • Town/district level, not street • Subset or sample of data ‘rows’ • Subset of data ‘columns’
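One possible sketch of “release at lower granularity”, assuming invented field names: keep only a subset of columns, round coordinates so they point at a district rather than a building, and release a sample of rows rather than all of them:

```python
import random

def coarsen(record, keep_columns, precision=1):
    """Keep only safe columns; round coordinates to ~10 km precision."""
    out = {k: v for k, v in record.items() if k in keep_columns}
    for coord in ("lat", "lon"):  # field names are illustrative
        if coord in out:
            out[coord] = round(out[coord], precision)
    return out

# Invented records for illustration.
records = [
    {"name": "A. Subject", "lat": 51.5074, "lon": -0.1278, "district": "Westminster"},
    {"name": "B. Subject", "lat": 48.8566, "lon": 2.3522, "district": "Paris 1er"},
]

# Drop the 'name' column, coarsen locations, then release a sample of rows.
released = [coarsen(r, keep_columns={"lat", "lon", "district"}) for r in records]
sample = random.sample(released, k=1)
```

Each step trades analytic value for safety; how far to coarsen depends on the risk analysis above, not on a fixed rule.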
  17. Include Locals. Locals can spot: • Local languages • Local slang • Innocent-looking phrases. Locals might also be the ones to choose which risks to accept
  18. Consider Interactions Between Datasets
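A toy example of why dataset interactions matter: two releases that each look safe on their own can be joined on shared quasi-identifiers. All records below are invented:

```python
# An "anonymised" release: no names, but district and age survive.
health = [
    {"district": "Westminster", "age": 34, "diagnosis": "condition X"},
]

# A separate public dataset: names, but no health data.
voters = [
    {"name": "A. Subject", "district": "Westminster", "age": 34},
]

# Joining on (district, age) -- unique in a small population --
# attaches the "anonymous" diagnosis to a named person.
linked = [
    {**v, **h}
    for h in health
    for v in voters
    if (h["district"], h["age"]) == (v["district"], v["age"])
]
```

This is why the risk analysis has to consider what other datasets already exist, not just the one being released.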
  19. Learn From Experts. Over to you…
  20. Thank You. For questions or suggestions: Responsible Data Forum