Data ethics

Mathieu d’Aquin - @mdaquin
Data Science Institute
National University of Ireland Galway
Insight Centre for Data Analytics

Data Ethics
The set of principles and processes that guide the ethical
collection, processing, analysis, use and application of data having
an effect on human lives and society
d’Aquin et al., Towards an “Ethics by Design” methodology for AI research projects, in AIES 2018

Ethics
What is right, what is fair, what is just.
Hosmer, L. T. (1995). "Trust: The Connecting Link between Organizational Theory and Philosophical Ethics". The
Academy of Management Review. 20 (2)

In an ideal world
What is ethical.
(right, fair, just)
What is legal.

In the real world
What is ethical.
(right, fair, just)
What is legal.

What does this
have to do with
data?

- Data protection
- Privacy
- Statistical bias
- Black box decisions
- Uneven access
- Self-governance
- ...

Machine ethics
(https://www.smbc-comics.com/comic/machine-ethics)

Example related to privacy/data protection
In 2014, New York City released data about 173 million taxi trips in the city, in which the licence plates and identifiers of the taxis had been obfuscated for anonymisation purposes.
It was de-anonymised within hours of being released…
… and later cross-referenced with timestamped pictures of celebrities entering taxis in New York to work out their home addresses and how much they tipped.
See e.g. http://gawker.com/the-public-nyc-taxicab-database-that-accidentally-track-1646724546

Example related to privacy/data protection
In this case, it is useful to note that:
- Replacing identifiers with a hash is not anonymisation; it is at best bad pseudonymisation. When the set of possible identifiers is small, the hashes can be reversed simply by hashing every candidate.
- Current data protection regulation in Europe already regulates against this sort of case.
- The upcoming GDPR will make the consequences of this sort of mistake more severe.
- The GDPR defines personal data as “any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly”. Arguably, the unanticipated case of the celebrities falls under this scope… and should therefore have been anticipated.
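To make the first point concrete, here is a minimal sketch of why hashing a small identifier space is not anonymisation. It assumes an unsalted MD5 hash and an invented identifier format (one digit, one letter, two digits); neither detail comes from the slides, and the names `pseudonymise`, `candidate_ids` and `build_lookup` are purely illustrative.

```python
import hashlib


def pseudonymise(identifier: str) -> str:
    """Replace an identifier with its unsalted MD5 digest (assumed scheme)."""
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()


def candidate_ids():
    """Enumerate every value of a hypothetical format: digit, letter, two digits."""
    for digit in "0123456789":
        for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
            for number in range(100):
                yield f"{digit}{letter}{number:02d}"


def build_lookup() -> dict:
    """Hash every candidate once to get a digest -> identifier table."""
    return {pseudonymise(candidate): candidate for candidate in candidate_ids()}


lookup = build_lookup()

# A "released" record only contains the digest of the identifier...
released = pseudonymise("5X41")

# ...but the lookup table maps it straight back to the original value.
recovered = lookup[released]
print(len(lookup), "candidates hashed; recovered identifier:", recovered)
```

With only 26,000 possible values in this made-up format, the whole table is built almost instantly, which is why such a scheme is at best pseudonymisation.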

In addition to:
How do I make what I’m doing compliant with regulation?
But, should also be asking:
What is my impact on society? How can I minimise the risk of negative implications?
(drawing upon critical social science, and regulation as guidelines)

Examples related to bias
Compare the Google search results for “unprofessional hair for work” and “professional hair for work”.

Example related to black-box decisions
Parts of the US justice system rely on a tool to predict, when someone is judged for an offence, how likely that individual is to re-offend.
It is based on many variables, including address, type of offence and past history of offences. (The ProPublica analysis linked below notes that race is not itself one of the inputs, although other variables may correlate with it.)
It has been demonstrated to make significant mistakes, especially through being prone to give overly negative scores to black people.
See https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
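One way to surface this kind of problem is to compare error rates per group rather than looking only at overall accuracy. The sketch below computes a false positive rate for each group, i.e. the share of people who did not re-offend but were still flagged as high risk. The records are entirely made up for illustration and the function name `false_positive_rates` is hypothetical; this is not the ProPublica methodology or data.

```python
from collections import defaultdict

# Made-up toy records: (group, predicted_high_risk, actually_reoffended)
records = [
    ("A", True, False), ("A", True, True), ("A", False, False), ("A", True, False),
    ("B", False, False), ("B", True, True), ("B", False, False), ("B", False, True),
]


def false_positive_rates(rows):
    """Per group: flagged-but-did-not-reoffend / all who did not reoffend."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in rows:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {group: false_positives[group] / negatives[group] for group in negatives}


rates = false_positive_rates(records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: false positive rate = {rate:.2f}")
```

A large gap between groups in a check like this is a signal to investigate further, not a complete fairness audit.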

Notes on those cases
- The algorithm is not biased, the data is. Garbage in,
garbage out.
- Human decisions are not gold standards, and therefore
should not be treated as such in training machine
learning models
- Sometimes, unrelated things just happen to correlate
(see http://www.tylervigen.com/spurious-correlations) - a
machine learning model will rely on those correlations to
make decisions.
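The last point can be illustrated with a small sketch: two series that have nothing to do with each other, but both happen to grow over time, end up strongly correlated. The numbers are invented for the example and the helper `pearson` is just a plain implementation of the Pearson correlation coefficient.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (var_x * var_y) ** 0.5


# Two invented yearly series that both simply trend upwards.
series_a = [10, 12, 13, 15, 18, 19, 22, 24, 25, 28]
series_b = [3.1, 3.0, 3.4, 3.6, 3.5, 3.9, 4.2, 4.1, 4.5, 4.6]

r = pearson(series_a, series_b)
print(f"correlation between unrelated series: {r:.2f}")
```

A model trained on one of these series as a feature would happily use it to predict the other, even though no causal link exists.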

Example related to uneven access and
under-represented cases
Researchers at Georgia Institute of Technology
developed and used a chatbot to act as a TA for
computer science courses (without the students’
knowledge).
It worked very well in most cases…
… but failed dramatically in uncommon, delicate
situations.
Bobbie Eicher et al., Jill Watson Doesn’t Care if You’re Pregnant:
Grounding AI Ethics in Empirical Studies, AIES 2018

Example related to uneven access and
under-represented cases
Notes on this case:
- Another form of bias, not related to spurious
or inaccurate correlations, but to
under-representation of specific parts of the
population.
- Raises the issue of uneven access to the
benefits of the technology, and therefore of
unfairness.
- “The future is already here — it's just not very
evenly distributed” -- William Gibson

Principles for designing ethical data science projects
‘Ethics in Design’ for Data Science
Dialectic
The process is based on a conversational approach between data scientists and critical social scientists throughout the project’s life-cycle.
Reflective
Ethical concerns are not fixed in advance; they may emanate from any stage of the project; thus, constant reflexivity on activities and researchers is needed.
Creative, not disruptive
The objective of this process is to achieve a positive impact on the research, increasing its value by addressing ethics throughout the project’s life-cycle.
All-encompassing
Ethical concerns appear as much in the research activities as in their outcomes, their use and exploitation; the process needs to extend to all stages.

‘Ethics in Design’ for Data Science
Dialectic
Reflective
All-encompassing
Methodology borrowed from design fiction: the use of speculative and often provocative scenarios involving the artifact to be designed (a data process), as a way to explore its possible implications and reflect on their consequences.
Pragmatically, it consists of telling stories, asking and answering “what if” questions (e.g. “what if the student is pregnant? What would happen then?”) and building mockups of the final product to reflect on its behaviour.
See Anthony Dunne and Fiona Raby, Speculative Everything, MIT Press, 2013
and
Joseph Lindley and Paul Coulton, "Back to the Future: 10 Years of Design Fiction". British HCI 2015.

‘Ethics in Design’ for Data Science
Dialectic
Reflective
All-encompassing
i.e. don’t do that:

Some conclusions
Following regulation is insufficient for data ethics.
Ethical issues often appear after the development phase, in scenarios that have not been anticipated.
We need to uncover those scenarios so that ways of mitigating ethical implications can be integrated into the process, balancing social, economic and ethical values.
This cannot be done (currently) by the technologists alone!

Shameless self-promotion
Check
Towards an “Ethics by Design” methodology for AI research projects at the first
conference on AI, Ethics and Society, AIES 2018
The Re-Coding Black Mirror workshop at The Web Conference (WWW 2018) -
https://kmitd.github.io/recoding-black-mirror/
MagnaCartaForData.org
Contacts: mathieu.daquin@insight-centre.ie, mdaquin.net, @mdaquin
