DATA AND ETHICS
Tim Rich
Director of Data Science
Publicis Worldwide
WHAT IS THIS?
‣ Advertisers and ethics… WTF!
‣ What, me ethical?
‣ Mapping the code.
‣ Why do this at all?
WHAT IS THIS NOT?
‣ An attempt to get you to Tweet about something
‣ A vision for Tim’s perfect future
‣ A shameless plug for any association, business
or way of thinking
THAT BEING SAID, STICK AROUND
AND GET YOUR MIND BLOWN
WHY DOES ADVERTISING
CARE?
ADVERTISING SPENDS THE MONEY
“Follow the money.” -Karl Marx
AND IT’S A LOT...
= 2015 GDP of Portugal, Vietnam and the Czech Republic combined
198 billion + 199 billion + 182 billion = 579 billion
Source: IMF - World Bank
https://blog.pagefair.com/2015/ad-blocking-report/
BUT WE HAVE A LOT TO LOSE
Brad Frost - Death to Bullshit
AND WE ALSO NEED TO RETHINK
OUR METHODS
BUT DON’T FEAR – WE HAVE
DATA AND DATA SCIENTISTS!
WHAT IS A DATA SCIENTIST?
‣ Statistics
‣ Data Strategy
‣ Social Science
‣ Coding chops
‣ Good Looks
AND WE SEEM TO HAVE MORE AND MORE
OF THEM IN THE WORLD IN GENERAL
O’Reilly 2015 Data Science Salary Survey
http://duu86o6n09pv.cloudfront.net/reports/2015-data-science-salary-survey.pdf
of +/- 600 respondents
[Bar chart: percent of respondents by reported age]
<21: 1% | 21-25: 9% | 26-30: 23% | 31-35: 25% | 36-40: 14%
41-45: 13% | 46-50: 6% | 51-55: 5% | 56<: 4%
THEY ARE ALSO A YOUNG BUNCH
AND THAT MAKES SENSE AS
IT IS A YOUNG PROFESSION
1996: FIRST USE OF “DATA SCIENCE”
Members of the International Federation of
Classification Societies (IFCS) meet in Kobe, Japan.
2001: THE PAPER THAT LAUNCHED 1,000 NERDS
William S. Cleveland publishes “Data Science: An Action
Plan for Expanding the Technical
Areas of the Field of Statistics.”
MOREOVER, NEW ENTRANTS INTO THE
FIELD ARE NOT GIVEN VERY MUCH
ETHICAL TRAINING
Surveyed Syllabi from 13 Intro to Data Science Courses
ONLY THREE HAVE AT LEAST ONE
MENTION OF AN “ETHICS” COMPONENT
IN THE SYLLABUS
REGARDLESS, DATA SCIENCE IS
AFFECTING ALL OF OUR EVERYDAY
LIVES… OUR ONLINE LIVES
OUR MOVEMENT…
OUR MEDIA…
OUR MILITARY…
OUR POLITICS…
EVEN OUR IPHONES...
– Tim Cook
Earl, I think Data
Science needs a code
of ethics.
Yup.
A CODE OF ETHICS WOULD
‣ Establish credibility and responsibility outside
of nerd-dom
‣ Provide a starting point to act as technology
changes
‣ Galvanize the disparate data practitioner
community
ALL THAT’S FINE… BUT TO
BUILD ANYTHING YOU FIRST HAVE TO
UNDERSTAND WHAT YOU ARE
WORKING WITH
A crash course in codes of ethics:
THAT SHIT HUMANS DO
A TIMELINE OF ETHICAL CODES
~2300 bce: EGYPTIAN CODE OF MA’AT
~1200 bce: JEWISH TORAH
~500 bce: HIPPOCRATIC OATH
~1000: BUSHIDO WARRIOR CODE
~1600: PIRATE’S CODE OF THE BRETHREN
1831: FRENCH FOREIGN LEGION CODE D'HONNEUR
1914: JOURNALIST’S CREED
1946: NUREMBERG CODE
1981: I.R.B. - EXEMPT COMMON RULE
1985: INTERNATIONAL STATISTICAL INSTITUTE
1992: ASSOCIATION FOR COMPUTING MACHINERY
1999: AMERICAN STATISTICAL ASSOCIATION
2005: DRAFT MODEL BIOETHICISTS CODE
(increase of professional codes in recent decades)
ETHICAL CODES ARE NOT ALL THE SAME
BUT THEY HAVE TWO CLASSES OF
CHARACTERISTICS
Inward facing goals
Outward facing goals
INWARD FACING GOALS
‣ Provide guidance when norms are not
explicit
‣ Reduce internal conflicts and build a
common purpose
‣ Establish professional behavior
‣ Deter unethical behavior with sanctions and
internal reporting structures
OUTWARD FACING GOALS
‣ Protect vulnerable populations who could be
harmed by the profession’s activities
‣ Establish the profession as a distinct moral
community worthy of autonomy
‣ Serve as a tool for disputes between member
and non-member parties
‣ Create institutions resilient to external
pressures
PROMOTE POSITIVE ENFORCEMENT
‣ Accept that the distributed nature of
professional communities creates too many
judicial problems for active regulation
‣ Construct the code with consensus
allowing for broad buy-in
‣ Set boundaries and expectations of the
practicing community, allowing for
self-affirming social control mechanisms
OVERALL A PROFESSIONAL
CODE OF ETHICS SHOULD:
‣ Mediate internal group needs and external
community interactions
‣ Adapt to future unknown circumstances
‣ Inspire collective identity supporting
adherence and adoption
OKAY PROFESSOR, SO WHAT IS THE
REAL REASON DATA SCIENCE NEEDS
AN ETHICAL CODE?
MORAL HAZARD
"In economics, moral hazard occurs
when one person takes more risks
because someone else bears the
burden of those risks."
– Wikipedia
https://en.wikipedia.org/wiki/Moral_hazard
MORAL HAZARD IN LENDING
http://www.pnhp.org/facts/single-payer-resources
MORAL HAZARD IN HEALTH CARE
http://www.economist.com/news/world-week/21569742-kals-cartoon
MORAL HAZARD IN ARMAMENTS
DATA SCIENCE IS STEEPED IN
MORAL HAZARD
‣ Connections between data and the people
it represents are very abstracted
‣ Digital creations affect people we never
see
‣ Unintended algorithmic consequences are
almost never known or explored
‣ When was the last time an algorithm ever
“hurt” anybody?
Well, shit.
HOW A DATA SCIENCE CODE
MAY BEGIN TO LOOK
“Data can be useful
or anonymous,
but never both.”
– Paul Ohm
“Broken Promises of Privacy: Responding to
the Surprising Failure of Anonymization,”
UCLA Law Review 57, p. 1702
THUS A CODE WOULD NEED
TO MAINTAIN THE UTILITY
OF DATA
WHILE BALANCING
CONTROL OF THAT DATA
A FRAMEWORK FOR A CODE IS
COMPOSED OF THREE CLUSTERS
Data Ethics Code
‣ Privacy: 3rd party usage, Business applications, Bio-information
‣ Community: Protection of subjects, Mathematical responsibility,
Safety of used data & analysis
‣ Identity: Ownership, Verification, Right to be forgotten,
Incorrect data correction
PRIVACY
‣ Once you buy or sell data, what are the ethics around
using it? You did ‘buy it’, right?
3rd party data
‣ What is the relationship between privacy of internet
exploration and advertisement of relevant
products?
Business applications
‣ Is data generated from your body owned differently?
Bio-information
COMMUNITY
‣ How do we protect the people our analysis affects
from negative consequences?
Protection of subjects
‣ Is there a system for correct use of professional
tools and continuing education?
Mathematical responsibility
‣ Once data is used, how is it discarded and how is
sensitive analysis protected?
Safety of used data & analysis
IDENTITY
‣ Is there a need for a centralized personal data
safe?
Ownership
‣ How do means of validation affect access, privacy and
safety?
Validation
‣ What are the mechanisms to correct bad data?
Incorrect data correction
THESE COMPONENTS PROVIDE THE
BASIS FOR CONVERSATION NOT A
HARD STRUCTURE
Data Ethics Code
‣ Privacy: 3rd party usage, Business applications, Bio-information
‣ Community: Protection of subjects, Mathematical responsibility,
Safety of used data & analysis
‣ Identity: Ownership, Verification, Right to be forgotten,
Incorrect data correction
ARE THERE OTHER THINGS
WE SHOULD THINK ABOUT?
The code cannot be built on personal
conceptions of right and wrong.
It must be general enough to span
cultures, companies and continents.
THE CODE SHOULD EXIST OUTSIDE
ANY FORMAL BUSINESS.
YOU SHOULD NOT MAKE MONEY OFF
THE CODE.
The code should not be created
by a small group. Rather, it
presents a chance for a more
radical form of democracy
Whatever the
combination, the code
will have to be built by
data scientists to have
any chance at adoption
Often ethical codes
come up after social
disasters. Can we get
out in front of this?
Other than it
could be good
for people, why
do this at all?
IT MAKES GOOD
BUSINESS SENSE
More ethical data
treatment lowers
liability and
reduces
corporate risk
It’s not a matter of if you get
hacked, it is a matter of when
(and frankly, if you find out)
http://www.techrepublic.com/article/data-breaches-may-cost-less-than-the-security-to-prevent-them/
$252 MILLION
2013 - data breach
ESTIMATED $100 MILLION - $500 MILLION
2006 - data theft
http://www.lifehealthpro.com/2015/06/18/the-10-most-expensive-data-breaches?t=regulatory&slreturn=1456110972&page=5
HIGH ESTIMATES: $4 BILLION
2011 - data breach of marketing data for 75 client companies
http://www.eweek.com/c/a/Security/Epsilon-Data-Breach-to-Cost-Billions-in-WorstCase-Scenario-459480
THE MORAL HIGH GROUND
ALSO SELLS MORE SHIT
COVER YOUR ASS
PEOPLE WHO ARE CAUGHT
UP IN UNETHICAL BEHAVIOR
ARE USUALLY SACKED
THEIR PROJECTS ARE SCRAPPED
AND IT GETS UGLY
FROM A PROFESSIONAL
POINT OF VIEW
OK, SO WHAT’S NEXT?
Some folks working on this:
‣ The Council for Big Data, Ethics and Society
‣ Certified Analytics Professionals
‣ Michael McFarland, S.J. - Computer Scientist
‣ Cynthia Dwork - Microsoft Research
‣ Kord Davis - Digital Strategist
READ MORE HERE
TALK AMONGST YOUR FRIENDS
I’LL GIVE YOU A TOPIC
The right to be forgotten:
an ideal or practically achievable?
It seems data is a commodity.
Does that make the data we create a
personal asset?
Edward Snowden:
ethical in a data decision-making sense?
WHO IS LOOKING AFTER
YOUR DATA?
THANK YOU

Data and Ethics: Why Data Science Needs One
