The document discusses the necessity of an ethical framework for data science, highlighting the significant impact data science has on various sectors and the importance of establishing credibility and responsibility within the profession. It emphasizes the concept of moral hazard, where decisions affect individuals indirectly, and underlines the various components that should be included in a potential data ethics code. Additionally, it notes that ethical practices in data handling can mitigate risks and present a competitive advantage in the market.
WHAT IS THIS?
‣ Advertisers and ethics… WTF!
‣ What, me ethical?
‣ Mapping the code.
‣ Why do this at all?
WHAT IS THIS NOT?
‣ An attempt to get you to Tweet about something
‣ A vision for Tim’s perfect future
‣ A shameless plug for any association, business
or way of thinking
WHAT IS A DATA SCIENTIST?
‣ Statistics
‣ Data Strategy
‣ Social Science
‣ Coding chops
‣ Good Looks
AND WE SEEM TO HAVE MORE AND MORE
OF THEM IN THE WORLD IN GENERAL
THEY ARE ALSO A YOUNG BUNCH
O’Reilly 2015 Data Science Salary Survey
http://duu86o6n09pv.cloudfront.net/reports/2015-data-science-salary-survey.pdf
of +/- 600 respondents
Reported age (percent of respondents):
<21: 1%
21-25: 9%
26-30: 23%
31-35: 25%
36-40: 14%
41-45: 13%
46-50: 6%
51-55: 5%
56+: 4%
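As a rough check on the "young bunch" claim, the sketch below tallies the reported age shares from the survey slide above. It assumes the nine percentages map, in order, onto the nine age bins on the chart's axis; the variable names are illustrative only.

```python
# Reported age shares in percent, read off the survey chart.
# Assumption: the nine values map in order onto the nine age bins.
age_share = {
    "<21": 1, "21-25": 9, "26-30": 23, "31-35": 25, "36-40": 14,
    "41-45": 13, "46-50": 6, "51-55": 5, "56+": 4,
}

# Sanity check: the shares should account for all respondents.
total = sum(age_share.values())

# Share of respondents aged 35 or younger.
under_36 = sum(age_share[b] for b in ("<21", "21-25", "26-30", "31-35"))

print(f"total: {total}%")
print(f"35 or younger: {under_36}%")
```

Under this reading, more than half of the respondents report being 35 or younger.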
AND THAT MAKES SENSE AS
IT IS A YOUNG PROFESSION
FIRST USE OF “DATA SCIENCE”
1996 Members of the International Federation of Classification Societies (IFCS) meet in Kobe, Japan.
THE PAPER THAT LAUNCHED 1,000 NERDS
2001 William S. Cleveland publishes “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”
MOREOVER, NEW ENTRANTS INTO THE
FIELD ARE NOT GIVEN VERY MUCH
ETHICAL TRAINING
Surveyed Syllabi from 13 Intro to Data Science Courses
ONLY THREE HAVE AT LEAST ONE
MENTION OF AN “ETHICS” COMPONENT
IN THE SYLLABUS
Earl, I think Data
Science needs a code
of ethics.
Yup.
A CODE OF ETHICS WOULD
‣ Establish credibility and responsibility outside
of nerd-dom
‣ Provide a starting point to act as technology
changes
‣ Galvanize the disparate data practitioner
community
A TIMELINE OF ETHICAL CODES
~2300 BCE  EGYPTIAN CODE OF MA’AT
~1200 BCE  JEWISH TORAH
~500 BCE  HIPPOCRATIC OATH
~1000  BUSHIDO WARRIOR CODE
~1600  PIRATE’S CODE OF THE BRETHREN
1831  FRENCH FOREIGN LEGION CODE D'HONNEUR
1914  JOURNALIST’S CREED
1946  NUREMBERG CODE
1981  I.R.B. - EXEMPT COMMON RULE
1985  INTERNATIONAL STATISTICAL INSTITUTE
1992  ASSOCIATION FOR COMPUTING MACHINERY
1999  AMERICAN STATISTICAL ASSOCIATION
2005  DRAFT MODEL BIOETHICISTS CODE
increase of professional codes
ETHICAL CODES ARE NOT ALL THE SAME
BUT THEY HAVE TWO CLASSES OF
CHARACTERISTICS
Inward
facing goals
Outward
facing goals
INWARD FACING GOALS
‣ Provide guidance when norms are not
explicit
‣ Reduce internal conflicts and build a
common purpose
‣ Establish professional behavior
‣ Deter unethical behavior with sanctions and
internal reporting structures
OUTWARD FACING GOALS
‣ Protect vulnerable populations who could be
harmed by the profession’s activities
‣ Establish the profession as a distinct moral
community worthy of autonomy
‣ Serve as a tool for disputes between member
and non-member parties
‣ Create institutions resilient to external
pressures
PROMOTE POSITIVE ENFORCEMENT
‣ Accept that the distributed nature of
professional communities creates too many
judicial problems for active regulation
‣ Construct the code with consensus
allowing for broad buy-in
‣ Set boundaries and expectations of the
practicing community, allowing for self-
affirming social control mechanisms
OVERALL A PROFESSIONAL
CODE OF ETHICS SHOULD:
‣ Mediate internal group needs and external
community interactions
‣ Adapt to future unknown circumstances
‣ Inspire collective identity supporting
adherence and adoption
OKAY PROFESSOR, SO WHAT IS THE
REAL REASON DATA SCIENCE NEEDS
AN ETHICAL CODE?
"In economics, moral hazard occurs
when one person takes more risks
because someone else bears the
burden of those risks."
– Wikipedia
https://en.wikipedia.org/wiki/Moral_hazard
DATA SCIENCE IS STEEPED IN
MORAL HAZARD
‣ Connections between data and the people
it represents are very abstracted
‣ Digital creations affect people we never
see
‣ Unintended algorithmic consequences are
almost never known or explored
‣ When was the last time an algorithm ever
“hurt” anybody?
“Data can be useful
or anonymous,
but never both.”
– Paul Ohm
“Broken Promises of Privacy: Responding to
the Surprising Failure of Anonymization,”
UCLA Law Review 57, p. 1702
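One way to see the tension in Ohm's quote is a k-anonymity check. The sketch below is a minimal illustration, not anything from the paper itself: it assumes a toy table with names removed but ZIP code, birth year and sex retained. All records, field names and the helper functions `k_anonymity` and `generalize` are made up for this example.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
records = [
    {"zip": "02138", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": 1975, "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1988, "sex": "M", "diagnosis": "flu"},
]

def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())

def generalize(row):
    """Coarsen the quasi-identifiers: keep 3 ZIP digits and the decade of birth."""
    out = dict(row)
    out["zip"] = row["zip"][:3] + "**"
    out["birth_year"] = row["birth_year"] // 10 * 10
    return out

qi = ["zip", "birth_year", "sex"]

# k == 1 means at least one person is uniquely identifiable from these fields.
k_raw = k_anonymity(records, qi)

# After coarsening, every combination is shared by at least two people.
coarse = [generalize(r) for r in records]
k_coarse = k_anonymity(coarse, qi)

print(k_raw, k_coarse)
```

In this toy table, coarsening raises k from 1 to 2, but only by discarding the exact ZIP code and birth year: the data becomes harder to re-identify precisely because it becomes less useful.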
THUS A CODE WOULD NEED
TO MAINTAIN THE UTILITY
OF DATA
WHILE BALANCING
CONTROL OF THAT DATA
A FRAMEWORK FOR A CODE IS
COMPOSED OF THREE CLUSTERS
Data Ethics Code
‣ Privacy: 3rd party usage, business applications, bio-information
‣ Community: protection of subjects, mathematical responsibility, safety of used data & analysis
‣ Identity: ownership, verification, right to be forgotten, incorrect data correction
PRIVACY
‣ 3rd party data: Once you buy or sell data, what are the ethics around using it? You did ‘buy it’, right?
‣ Business applications: What is the relationship between privacy of internet exploration and advertisement of relevant products?
‣ Bio-information: Is data generated from your body owned differently?
COMMUNITY
‣ Protection of subjects: How do we protect the people our analysis affects from negative consequences?
‣ Mathematical responsibility: Is there a system for correct use of professional tools and continuing education?
‣ Safety of used data & analysis: Once data is used, how is it discarded, and how is sensitive analysis protected?
IDENTITY
‣ Ownership: Is there a need for a centralized personal data safe?
‣ Validation: How do means of validation affect access, privacy and safety?
‣ Incorrect data correction: What are the mechanisms to correct bad data?
THESE COMPONENTS PROVIDE THE
BASIS FOR CONVERSATION NOT A
HARD STRUCTURE
ESTIMATED $100 MILLION - $500 MILLION
2006 - data theft
http://www.lifehealthpro.com/2015/06/18/the-10-most-expensive-data-breaches?t=regulatory&slreturn=1456110972&page=5
HIGH ESTIMATES: $4 BILLION
2011 - data breach of 75 client companies (marketing data)
http://www.eweek.com/c/a/Security/Epsilon-Data-Breach-to-Cost-Billions-in-WorstCase-Scenario-459480
Some folks working on this:
‣ The Council for Big Data, Ethics and Society
‣ Certified Analytics Professionals
‣ Michael McFarland, S.J. - Computer Scientist
‣ Cynthia Dwork - Microsoft Research
‣ Kord Davis - Digital Strategist