risks and mitigations of releasing data

Risks and mitigations of releasing data
Risk analysis and complexity in de-identifying and releasing data.
Sara-Jayne Terp
RDF Discussion

First, Do No Harm
“If you make a dataset public, you have a responsibility, to the best of your knowledge, skills, and advice, to do no harm to the people connected to that dataset. You balance making data available to people who can do good with it and protecting the data subjects, sources, and managers.”
2

What is risk?
What is the risk here?
3

RISK
“The probability of something happening
multiplied by the resulting cost or benefit
if it does” (Oxford English Dictionary)
Three parts:
• Cost/benefit
• Probability
• Subject (to what/whom)
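The three parts above can be written down as a small record and multiplied out. This is a minimal sketch, not a formal risk method: the Risk class, its field names, and every number below are made-up illustrations, and the cost scale is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One risk entry; all fields are illustrative assumptions."""
    subject: str        # to what/whom
    harm: str           # what could happen
    probability: float  # estimated chance, between 0 and 1
    cost: float         # relative cost if it happens (arbitrary units)

    def score(self) -> float:
        # Risk = probability of something happening x resulting cost
        return self.probability * self.cost


# Hypothetical numbers, chosen only to show the arithmetic.
risks = [
    Risk("data collectors", "physical harm", probability=0.1, cost=10.0),
    Risk("data subjects", "privacy breach", probability=0.5, cost=8.0),
]

# Highest score first, so the most pressing risk is reviewed first.
ranked = sorted(risks, key=Risk.score, reverse=True)
for r in ranked:
    print(f"{r.subject}: {r.harm} -> {r.score():.1f}")
```

A single number hides a lot. A rare but severe harm can score the same as a common mild one, so the ranking is a prompt for discussion rather than a decision.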
4

Subjects: Physical
5
“Witnesses told us that a helicopter had been circling around the area for hours by the time the bakery opened in the afternoon. It had, perhaps, 200 people lined up to get bread. Suddenly, the helicopter dropped a bomb that hit a building

Risk of What?
• Physical harm
• Legal harm (e.g. jail, IP disputes)
• Reputational harm
• Privacy breach
10

Risk to Whom?
• Data subjects (elections example)
• Data collectors (conflict example)
• Data processing team (military equipment example)
• Person releasing the data (corruption example)
• Person using the data
11

Likelihood of Risk
Low
Medium
High
12

PII
“Personally identifiable information (PII) is any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered PII.”
14

Learn to spot Red Flags
• Names, addresses, phone numbers
• Locations: lat/long, GIS traces, locality (e.g. home + work as an identifier)
• Members of small populations
• Untranslated text
• Codes (e.g. “41”)
• Slang terms
• Data that can be combined with other datasets to produce PII
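Some of the red flags above can be pre-screened automatically before a human review. The sketch below is only a first pass under stated assumptions: the two regular expressions, the PATTERNS table, and the flag_columns helper are hypothetical, and they will miss names, slang, codes, untranslated text, and small populations, which still need a person (ideally a local) to check.

```python
import re

# Hypothetical patterns; real data will need patterns tuned to its region.
PATTERNS = {
    "phone-like number": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
    "lat/long pair": re.compile(
        r"-?\d{1,2}\.\d{3,}\s*,\s*-?\d{1,3}\.\d{3,}"
    ),
}


def flag_columns(rows):
    """Return {column name: set of red-flag labels seen in that column}."""
    flags = {}
    for row in rows:
        for column, value in row.items():
            for label, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    flags.setdefault(column, set()).add(label)
    return flags


# Made-up sample rows for illustration.
rows = [
    {"note": "call 555 010 9999 after 6", "where": "51.5074, -0.1278"},
    {"note": "no contact given", "where": "town centre"},
]

flags = flag_columns(rows)
for column, labels in flags.items():
    print(column, sorted(labels))
```

A column with no flags is not proven safe. The scan only tells you where to look first.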
15

Consider Partial Release
Release to only some groups
• Academics
• People in your organisation
• Data subjects
Release at lower granularity
• Town/district level, not street
• Subset or sample of data ‘rows’
• Subset of data ‘columns’
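The lower-granularity options above can be combined in a few lines. This is a sketch under assumptions: the partial_release function, the column names (lat, lon, district, name), and the default settings are all hypothetical. Rounding coordinates to one decimal place gives a cell roughly 11 km across north to south, which is nearer town or district level than street level.

```python
import random


def partial_release(rows, keep_columns, decimals=1,
                    sample_fraction=0.5, seed=0):
    """Return a sampled, column-limited, coarsened copy of rows."""
    rng = random.Random(seed)
    # Subset of rows: a fixed-fraction sample, never more than we have.
    k = min(len(rows), max(1, round(len(rows) * sample_fraction)))
    sampled = rng.sample(rows, k)

    released = []
    for row in sampled:
        # Subset of columns: anything not listed is dropped.
        out = {c: row[c] for c in keep_columns}
        # Lower granularity: round coordinates if they are kept at all.
        for c in ("lat", "lon"):
            if c in out:
                out[c] = round(out[c], decimals)
        released.append(out)
    return released


# Made-up records for illustration.
rows = [
    {"name": "A", "district": "North", "lat": 51.5074, "lon": -0.1278},
    {"name": "B", "district": "North", "lat": 51.5155, "lon": -0.1410},
    {"name": "C", "district": "South", "lat": 51.4613, "lon": -0.1156},
    {"name": "D", "district": "South", "lat": 51.4500, "lon": -0.1000},
]

released = partial_release(rows, keep_columns=["district", "lat", "lon"])
print(released)
```

Coarsening reduces risk but does not remove it. In a sparsely populated district, even a rounded location may point to one household.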
16

Include locals
Locals can spot:
• Local languages
• Local slang
• Innocent-looking phrases
Locals might also choose the risk
17

Consider Interactions Between Datasets
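One way to test for interactions is to try the join yourself before someone else does. The sketch below is an illustration under assumptions: the unique_links helper, both datasets, and the choice of linking columns (district, age, job) are invented. It reports released rows that match exactly one row in another dataset, since a unique match is what turns an anonymous record into a named one.

```python
def unique_links(released, other, keys):
    """Pair each released row with its single match in other, if unique."""
    index = {}
    for row in other:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    links = []
    for row in released:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # exactly one candidate: re-identified
            links.append((row, matches[0]))
    return links


# Made-up data: a survey released without names...
survey = [
    {"district": "North", "age": 34, "job": "teacher", "answer": "yes"},
    {"district": "South", "age": 51, "job": "farmer", "answer": "no"},
]
# ...and a separate public list that does carry names.
staff_list = [
    {"name": "A. Example", "district": "North", "age": 34, "job": "teacher"},
    {"name": "B. Example", "district": "South", "age": 51, "job": "farmer"},
    {"name": "C. Example", "district": "South", "age": 51, "job": "farmer"},
]

links = unique_links(survey, staff_list, keys=["district", "age", "job"])
for released_row, match in links:
    print(match["name"], "->", released_row["answer"])
```

Here the teacher is exposed while the farmer is not, because two people share the farmer's profile. The check only covers datasets you know about, so a clean result is not a guarantee.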
18

Learn From Experts
Over to you…
19

THANK YOU
For questions or
suggestions:
Responsible Data Forum
