Data and Ethics: Why Data Science Needs One

DATA AND ETHICS
Tim Rich
Director of Data Science
Publicis Worldwide

WHAT IS THIS?
‣ Advertisers and ethics… WTF!
‣ What, me ethical?
‣ Mapping the code.
‣ Why do this at all?

WHAT IS THIS NOT?
‣ An attempt to get you to Tweet about something
‣ A vision for Tim’s perfect future
‣ A shameless plug for any association, business 
or way of thinking

THAT BEING SAID, STICK AROUND
AND GET YOUR MIND BLOWN

ADVERTISING SPENDS THE MONEY
“Follow the money.” -Karl Marx

AND IT’S A LOT...
579 billion = the combined 2015 GDP of Portugal, Vietnam and the Czech Republic (roughly 182 to 199 billion each)
GDP source: IMF - World Bank

BUT WE HAVE A LOT TO LOSE
https://blog.pagefair.com/2015/ad-blocking-report/

AND WE ALSO NEED TO RETHINK
OUR METHODS
Brad Frost - Death to Bullshit

BUT DON’T FEAR – WE HAVE
DATA AND DATA SCIENTISTS!

WHAT IS A DATA SCIENTIST?
‣ Statistics
‣ Data Strategy
‣ Social Science
‣ Coding chops
‣ Good Looks

AND WE SEEM TO HAVE MORE AND MORE
OF THEM IN THE WORLD IN GENERAL

THEY ARE ALSO A YOUNG BUNCH
Reported age, percent of respondents (O’Reilly 2015 Data Science Salary Survey, roughly 600 respondents)
http://duu86o6n09pv.cloudfront.net/reports/2015-data-science-salary-survey.pdf
‣ Under 21: 1%
‣ 21-25: 9%
‣ 26-30: 23%
‣ 31-35: 25%
‣ 36-40: 14%
‣ 41-45: 13%
‣ 46-50: 6%
‣ 51-55: 5%
‣ 56 and over: 4%

AND THAT MAKES SENSE, AS
IT IS A YOUNG PROFESSION
1996: Members of the International Federation of Classification Societies (IFCS) meet in Kobe, Japan.
FIRST USE OF “DATA SCIENCE”
2001: William S. Cleveland publishes “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”
THE PAPER THAT LAUNCHED 1,000 NERDS

MOREOVER, NEW ENTRANTS INTO THE
FIELD ARE NOT GIVEN VERY MUCH
ETHICAL TRAINING
Surveyed Syllabi from 13 Intro to Data Science Courses

ONLY THREE HAVE AT LEAST ONE
MENTION OF AN “ETHICS” COMPONENT
IN THE SYLLABUS

REGARDLESS, DATA SCIENCE IS
AFFECTING ALL OF OUR EVERYDAY
LIVES… OUR ONLINE LIVES

EVEN OUR IPHONES...
– Tim Cook

Earl, I think Data
Science needs a code
of ethics.
Yup.

A CODE OF ETHICS WOULD
‣ Establish credibility and responsibility outside
of nerd-dom
‣ Provide a starting point to act as technology
changes
‣ Galvanize the disparate data practitioner
community

TO BUILD ANYTHING, YOU FIRST HAVE TO
UNDERSTAND WHAT YOU ARE
WORKING WITH

A crash course in codes of ethics:
THAT SHIT HUMANS DO

A TIMELINE OF ETHICAL CODES
‣ ~2300 BCE: Egyptian Code of Ma’at
‣ ~1200 BCE: Jewish Torah
‣ ~500 BCE: Hippocratic Oath
‣ ~1000: Bushido warrior code
‣ ~1600: Pirate’s Code of the Brethren
‣ 1831: French Foreign Legion Code d'honneur
‣ 1914: Journalist’s Creed
‣ 1946: Nuremberg Code
‣ 1981: I.R.B. - Exempt Common Rule
‣ 1985: International Statistical Institute
‣ 1992: Association for Computing Machinery
‣ 1999: American Statistical Association
‣ 2005: Draft Model Bioethicists Code
The later entries show an increase in professional codes.

ETHICAL CODES ARE NOT ALL THE SAME,
BUT THEY HAVE TWO CLASSES OF
CHARACTERISTICS
‣ Inward-facing goals
‣ Outward-facing goals

INWARD FACING GOALS
‣ Provide guidance when norms are not
explicit
‣ Reduce internal conflicts and build a
common purpose
‣ Establish professional behavior
‣ Deter unethical behavior with sanctions and
internal reporting structures

OUTWARD FACING GOALS
‣ Protect vulnerable populations who could be
harmed by the profession’s activities
‣ Establish the profession as a distinct moral
community worthy of autonomy
‣ Serve as a tool for resolving disputes between
members and non-members
‣ Create institutions resilient to external
pressures

PROMOTE POSITIVE ENFORCEMENT
‣ Accept that the distributed nature of
professional communities creates too many
judicial problems for active regulation
‣ Construct the code by consensus,
allowing for broad buy-in
‣ Set boundaries and expectations for the
practicing community, allowing for self-affirming
social control mechanisms

OVERALL, A PROFESSIONAL
CODE OF ETHICS SHOULD:
‣ Mediate internal group needs and external
community interactions
‣ Adapt to future unknown circumstances
‣ Inspire a collective identity that supports
adherence and adoption

OKAY PROFESSOR, SO WHAT IS THE
REAL REASON DATA SCIENCE NEEDS
AN ETHICAL CODE?

"In economics, moral hazard occurs
when one person takes more risks
because someone else bears the
burden of those risks."
– Wikipedia
https://en.wikipedia.org/wiki/Moral_hazard

MORAL HAZARD IN HEALTH CARE
http://www.pnhp.org/facts/single-payer-resources

MORAL HAZARD IN ARMAMENTS
http://www.economist.com/news/world-week/21569742-kals-cartoon

DATA SCIENCE IS STEEPED IN
MORAL HAZARD
‣ Connections between data and the people
it represents are highly abstracted
‣ Digital creations affect people we never
see
‣ Unintended algorithmic consequences are
almost never known or explored
‣ When was the last time an algorithm
“hurt” anybody?

HOW A DATA SCIENCE CODE
MAY BEGIN TO LOOK

“Data can be useful
or anonymous,
but never both.”
–Paul Ohm, “Broken Promises of Privacy: Responding to
the Surprising Failure of Anonymization,”
UCLA Law Review 57, p. 1702

THUS A CODE WOULD NEED
TO MAINTAIN THE UTILITY
OF DATA
WHILE BALANCING
CONTROL OF THAT DATA

A FRAMEWORK FOR A CODE IS
COMPOSED OF THREE CLUSTERS
Data Ethics Code
‣ Community: safety of used data & analysis; protection of subjects; mathematical responsibility
‣ Privacy: bio-information; business applications; 3rd party usage
‣ Identity: ownership; verification; right to be forgotten; incorrect data correction

PRIVACY
‣ 3rd party data: Once you buy or sell data, what are the
ethics around using it? You did ‘buy it’, right?
‣ Business applications: What is the relationship between
the privacy of internet exploration and the advertising of
relevant products?
‣ Bio-information: Is data generated from your body owned
differently?

COMMUNITY
‣ Protection of subjects: How do we protect the people our
analysis affects from negative consequences?
‣ Mathematical responsibility: Is there a system for the
correct use of professional tools and for continuing education?
‣ Safety of used data & analysis: Once data is used, how is
it discarded, and how is sensitive analysis protected?

IDENTITY
‣ Ownership: Is there a need for a centralized personal
data safe?
‣ Validation: How do means of validation affect access,
privacy and safety?
‣ Incorrect data correction: What are the mechanisms to
correct bad data?

THESE COMPONENTS PROVIDE THE
BASIS FOR CONVERSATION, NOT A
HARD STRUCTURE

ARE THERE OTHER THINGS
WE SHOULD THINK ABOUT?

The code cannot be built on personal
conceptions of right and wrong.

It must be general enough to span
cultures, companies and continents.

THE CODE SHOULD EXIST OUTSIDE
ANY FORMAL BUSINESS.
YOU SHOULD NOT MAKE MONEY OFF
THE CODE.

The code should not be created by a small
group; rather, it presents a chance for a
more radical form of democracy.

Whatever the combination, the code will
have to be built by data scientists to have
any chance of adoption.

Ethical codes often emerge after social
disasters. Can we get out in front of this?

Other than that it could be good for
people, why do this at all?

More ethical data treatment lowers
liability and reduces corporate risk.

It’s not a matter of if you get hacked,
it’s a matter of when
(and, frankly, whether you find out).

$252 MILLION
2013: data breach
http://www.techrepublic.com/article/data-breaches-may-cost-less-than-the-security-to-prevent-them/

ESTIMATED $100 MILLION - $500 MILLION
2006 - data theft
http://www.lifehealthpro.com/2015/06/18/the-10-most-expensive-data-breaches?t=regulatory&slreturn=1456110972&page=5

HIGH ESTIMATES: $4 BILLION
2011: breach of marketing data from 75 client companies
http://www.eweek.com/c/a/Security/Epsilon-Data-Breach-to-Cost-Billions-in-WorstCase-Scenario-459480

THE MORAL HIGH GROUND
ALSO SELLS MORE SHIT

PEOPLE WHO ARE CAUGHT
UP IN UNETHICAL BEHAVIOR
ARE USUALLY SACKED

AND IT GETS UGLY
FROM A PROFESSIONAL
POINT OF VIEW

Some folks working on this:
‣ The Council for Big Data, Ethics and Society
‣ Certified Analytics Professionals
‣ Michael McFarland, S.J. - Computer Scientist
‣ Cynthia Dwork - Microsoft Research
‣ Kord Davis - Digital Strategist
READ MORE HERE

The right to be forgotten:
an ideal, or practically achievable?

It seems data is a commodity.
Does that make the data we create a
personal asset?

Ethical in a data decision-making sense?
Edward Snowden

WHO IS LOOKING AFTER
YOUR DATA?
