Ethics in Data Science
and Machine Learning
Thierry Silbermann
> 3,000 members in the group!
1
Disclaimer
• Most of this presentation is taken from this class:
• https://www.edx.org/course/data-science-ethics-
michiganx-ds101x-1
2
3
Machine Learning
What are Ethics?
• Ethics tells us about right and wrong
• They are shared values / societal rules
• Ethics is not law
• Not philosophical questions, but the ethical practice of Data Science
4
Informed Consent
5
Informed Consent
• Human subjects must:
• Be informed about the experiment
• Consent to the experiment voluntarily
• Have the right to withdraw consent at any time
6
Facebook/Cornell
experiment
• Study of “emotional contagion”
7
Facebook/Cornell
experiment
• Experiment on 689,003 people
• One-week period in 2012
• Facebook deliberately changed the content of the news feed
• Group A: negative content removed from the feed
• Group B: positive content removed from the feed
8
Facebook/Cornell
experiment
9
Data Ownership
10
• Most of the time you don’t own the data about you: the data belongs to the company that collected it
• Nevertheless, we may have some control over data that isn’t ours because it is about us
Data Ownership
11
• We need to establish principles for reasoning about this control; that is the main concern of any discussion about the right to privacy
• If the company goes bankrupt, the company buying it should honour the same privacy commitments
Data Ownership
12
Privacy
13
• Privacy, of course, is the first concern that comes to so many minds when we talk about Big Data.
• How do we get the value we would like by collecting,
linking, and analyzing data, while at the same time
avoiding the harms that can occur due to data about us
being collected, linked, analyzed, and propagated?
• Can we define reasonable rules that all would agree to?
• Can we make these tradeoffs, knowing that maintaining
anonymity is very hard?
Privacy
14
• People have different privacy boundaries.
• As society adapts to new technologies, attitudes change
• But different boundaries do not mean no boundaries
Privacy
15
No option to exit
• In the past, one could get a fresh start by:
• moving to a new place
• waiting till the past fades (reputation can rebuild
over time)
• Big Data is universal and never forgets
• Data Science results in major asymmetries in
knowledge
16
Wayback Machine
• Archives pages on the web (https://archive.org/
web/ - 300 billion pages saved over time)
• almost everything that is accessible
• should be retained forever
• If you have an unflattering page written about you, it will survive forever in the archive (even if the original is removed)
17
Right to be forgotten
• Laws are often written to clear a person’s record
after some years.
• Law in the EU and Argentina since 2006
• Impacts search engines (content is not removed completely, but it becomes hard to find)
18
Collection vs Use
• Privacy is usually harmed only upon use of data
• Collection is a necessary first step before use
• Existence of collection can quickly lead to use
• But collection without use may sometimes be right
• E.g. surveillance
• By the time you know what you need, it is too late to
go back and get it
19
Loss of Privacy
• Due to loss of control over personal data
• I am OK with you having certain data about me that I have chosen to share with you or that is public, but I really do not want you to share my data in ways that I do not approve of
20
‘Waste’ Data Collection
• Your ID is taken at the club (name, address, age)
• How is this data being used? Is it stored?
21
Metadata
• Data about the data
• Often distinguished from data content
22
Metadata
• E.g. for a phone call, metadata includes
• Caller
• Callee
• Time and date of call
• Duration
• Location
23
Underestimating Analysis
• A smart meter at your house can recognise
“signatures” of water use every time you flush the
toilet, take a shower, or wash clothes
24
Privacy is a basic human
need
• Even for people who have nothing to hide
25
Sneaky mobile App
• There was a time where App didn’t tell you what kind
of data they were collecting
• Many app asks for far more permissions that they
need
• Might be used for future functionality
• But most of the time just for adware
• Picture management app that needs your location
26
Anonymity
27
On the internet, nobody
knows you are a dog
• You can claim to be whoever you want
• You can claim to be whatever you want
• You can make up a persona
• But today, this is less and less true
28
Many transactions need ID
• You must provide an address to receive goods
• You must give your name for travel booking
• You must reveal your location to get cellular service
• You must disclose intimate details of your health
care and lifestyle to get effective medical care
29
Facebook real name policy
• Robin Kills The Enemy
• Hiroko Yoda
• Phuc Dat Bich (pronounced: Phoo Da Bi)
https://en.wikipedia.org/wiki/Facebook_real-name_policy_controversy
30
Enough history tells all
• A person’s search patterns can reveal their identity
• If we have a log of all your web searches over
some period, we can form a very good idea of who
you are, and quite likely identify you
31
De-identification
• Given zip code, birth date, and sex, about 87% of the US population can be identified uniquely
• Those three fields are usually not considered PII
(Personally Identifiable Information)
32
Netflix Prize
• User_ID, Movie, Ratings, Date
• Merge with data from IMDb
• With only a few ratings, users could be linked across the two systems
• Their movie choices could be used to determine sexual orientation, even if their IMDb reviews revealed no such information
• Bad enough for mere movie recommendations, so what about medical records?
33
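A toy sketch of how such a linkage attack works (hypothetical dataframes and column names; the real attack matched on approximate dates and ratings):

```python
# Toy sketch of a linkage attack in the spirit of the Netflix/IMDb
# de-anonymisation; the dataframes and column names are hypothetical.
import pandas as pd

netflix = pd.DataFrame({  # "anonymised": user_id is a random token
    "user_id": ["u1", "u1", "u2"],
    "movie":   ["Fargo", "Heat", "Fargo"],
    "rating":  [5, 4, 3],
    "date":    ["2005-03-01", "2005-03-04", "2005-03-02"],
})
imdb = pd.DataFrame({     # public reviews, posted under real names
    "name":   ["alice", "alice", "bob"],
    "movie":  ["Fargo", "Heat", "Fargo"],
    "rating": [5, 4, 3],
    "date":   ["2005-03-01", "2005-03-04", "2005-03-02"],
})

# Join on (movie, rating, date); a handful of matches is usually enough
# to pin an "anonymous" user to a named reviewer.
links = netflix.merge(imdb, on=["movie", "rating", "date"])
print(links.groupby(["user_id", "name"]).size())  # u1 <-> alice: 2 matches
```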
Four Types of Leakage
• Reveal identity
• Reveal the value of a hidden attribute
• Reveal a link between two entities
• Reveal group membership
34
Anonymity is Impossible
• Anonymity is virtually impossible, with enough other
data
• Diversity of entity sets can be eliminated through
joining external data
• Aggregation works only if there is no known
structure among entities aggregated
• Faces can be recognised in image data
35
Should we prevent sharing data?
• If anonymity is not possible, the simplest way to
prevent misuse is not to publish the dataset
• E.g. government agencies should not make public
potentially sensitive data
• Yet access to data is crucial for many desirable purposes
• Medical data
• Public watchdog
36
Little game
• For the next questions, assume the following approximate
numbers:
• Population of Brazil = 210,000,000
• Number of states in Brazil = 26
• Number of zip codes in Brazil = 100,000
• Number of days in a year = 350
• Assume also that each person lives for exactly 75 years
• Finally, assume that all distributions are perfectly uniform.
37
Little game
• How many people live in any one zip code in Brazil?
38
Brazil population: 210,000,000 | States: 26 | Zip codes: 100,000 | Days per year: 350 | Life expectancy: 75
Little game
• How many people live in any one zip code in Brazil? Answer: 2,100
39
Brazil population: 210,000,000 | States: 26 | Zip codes: 100,000 | Days per year: 350 | Life expectancy: 75
Little game
• How many people live in any one zip code in Brazil? Answer: 2,100
• How many people in Brazil share the same gender, zip code, and birthday (but not birth year)?
40
Brazil population: 210,000,000 | States: 26 | Zip codes: 100,000 | Days per year: 350 | Life expectancy: 75
Little game
• How many people live in any one zip code in Brazil? Answer: 2,100
• How many people in Brazil share the same gender, zip code, and birthday (but not birth year)? Answer: 3
41
Brazil population: 210,000,000 | States: 26 | Zip codes: 100,000 | Days per year: 350 | Life expectancy: 75
Little game
• How many people live in any one zip code in Brazil? Answer: 2,100
• How many people in Brazil share the same gender, zip code, and birthday (but not birth year)? Answer: 3
• How many people in Brazil share the same zip code and birth date (including birth year)?
42
Brazil population: 210,000,000 | States: 26 | Zip codes: 100,000 | Days per year: 350 | Life expectancy: 75
Little game
• How many people live in any one zip code in Brazil? Answer: 2,100
• How many people in Brazil share the same gender, zip code, and birthday (but not birth year)? Answer: 3
• How many people in Brazil share the same zip code and birth date (including birth year)? Answer: 0
43
Brazil population: 210,000,000 | States: 26 | Zip codes: 100,000 | Days per year: 350 | Life expectancy: 75
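For reference, a quick check of the arithmetic behind these answers, under the slide’s simplifying assumptions:

```python
# The "little game" arithmetic: 210M people, 100k zip codes, 350-day
# years, exactly 75-year lives, perfectly uniform distributions.
population, zip_codes = 210_000_000, 100_000
days_per_year, lifespan = 350, 75

per_zip = population / zip_codes
print(per_zip)                               # 2100.0

# Same gender + zip + birthday: split further by 2 genders * 350 days
print(per_zip / (2 * days_per_year))         # 3.0

# Same zip + full birth date: split by 350 days * 75 birth years
print(per_zip / (days_per_year * lifespan))  # 0.08 -> effectively 0
```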
Data Validity
44
Validity
• Bad Data and Bad Models lead to bad decisions
• If decision making is opaque, results can be bad in
the aggregate, and catastrophic for an individual
• What if someone has a loan denied because of an error in the data analysed?
45
Sources of Error
1. Choice of representative sample
2. Choice of attributes and measures
3. Errors in the Data
4. Errors in Model Design
5. Errors in Data Processing
6. Managing change
46
1) Choice of representative
sample
• Twitterverse (young, tech-savvy, richer than
average population)
• Should make sure that race, gender, and age are well balanced
47
Example: Google Labelling
Error
• Due to a poorly representative sample, Google’s image-recognition technology once classified pictures of black people as gorillas…
• You can rarely anticipate all the ways your model can go wrong
• Most likely the training set contained very few dark-skinned faces
48
2) Choice of attributes and
measures
• Usually limited to what is available
• Additional attributes can sometimes be purchased
or collected
• Requires a cost-to-value tradeoff
• Still, need to think about missing attributes
49
3) Errors in the Data
• In the US in 2012, the FTC found that 26% of consumers had at least one error in their credit report
• 20% of these errors resulted in a substantially lower credit score
• Credit reporting agencies have a correction
process
• By 2015, about 20% still had unresolved errors
50
Third Party Data
• Material decisions can often be made on the basis of public data or data provided by a third party
• There are often errors in these data
• Does the affected subject have a mechanism to correct errors?
• Does the affected subject even know what data were used?
51
Solutions
• Are data sources authoritative, complete, and timely?
• Will subjects have access?
• How are mistakes and unintended consequences detected and corrected?
52
4) Errors in Model Design
• You can end up with invalid conclusions even with perfect inputs, perfect data going in…
• There are many ways a model can be incorrect:
1. Model structure and Extrapolation
2. Feature Selection
3. Ecological Fallacy
4. Simpson’s Paradox
53
4.1) Model Structure and
Extrapolation
• Most machine learning just estimates parameters to fit a pre-determined model structure
• Do you know the model is appropriate?
• Are you trying to fit a linear model to a complex non-linear reality? Or the opposite?
54
4.1) Model Structure and Extrapolation
[Slides 55-57: figure slides showing model fits, from https://ml.berkeley.edu/blog/2017/07/13/tutorial-4/]
Overfitted model: an earthquake of magnitude 9 or more every 13,000 years
Good model: every 300 years
https://ml.berkeley.edu/blog/2017/07/13/tutorial-4/
58
4.1) Model Structure and
Extrapolation
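A minimal numpy sketch of the failure mode, on synthetic data: the overfitted polynomial matches the training points, then diverges on extrapolation, just as the earthquake model above does:

```python
# A high-degree polynomial fits the training points better than a line,
# then blows up once you extrapolate beyond the training range.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 12)
y = 2 * x + 1 + rng.normal(scale=1.0, size=x.size)  # truly linear data

linear = np.polyfit(x, y, deg=1)   # appropriate model
wiggly = np.polyfit(x, y, deg=9)   # overfitted model

for x_new in (6.0, 10.0):          # beyond the training range
    print(x_new,
          np.polyval(linear, x_new),  # stays near 2*x + 1
          np.polyval(wiggly, x_new))  # diverges wildly
```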
4.2) Feature selection
• Did you know that taller people are more likely to
grow beards?
59
4.2) Feature selection
• Did you know that taller people are more likely to
grow beards?
• Women are generally shorter
• And they don’t grow beards
• Sex is the confounder: the height-beard correlation disappears once you condition on it
60
4.3) Ecological Fallacy
• Analysing results for a group and assigning them to individuals
• Districts with higher income have lower crime rates
• Fallacy: concluding that a richer individual is therefore less likely to be a criminal
61
4.4) Other Game
                 Men          Women
Easy University
Hard University
62
Other Game
                 Men          Women
Easy University  7/10 = 0.7   4/5 = 0.8
Hard University  3/10 = 0.3   5/15 = 0.33
63
Other Game
                 Men          Women
Easy University  7/10 = 0.7   4/5 = 0.8
Hard University  3/10 = 0.3   5/15 = 0.33
All              10/20 = 0.5  9/20 = 0.45
64
Simpson’s Paradox
                 Men          Women
Easy University  7/10 = 0.7   4/5 = 0.8
Hard University  3/10 = 0.3   5/15 = 0.33
All              10/20 = 0.5  9/20 = 0.45
What happened? The aggregate reflects the combination of two separate ratios: women mostly applied to the hard university (15 of 20 applications), so their overall rate ends up lower even though they do better at each university.
65
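The slide’s numbers, recomputed in pandas to make the reversal explicit:

```python
# Simpson's Paradox on the slide's admission numbers: women do better at
# each university, yet worse overall.
import pandas as pd

df = pd.DataFrame({
    "university": ["easy", "easy", "hard", "hard"],
    "gender":     ["men", "women", "men", "women"],
    "admitted":   [7, 4, 3, 5],
    "applied":    [10, 5, 10, 15],
})

print(df.assign(rate=df.admitted / df.applied))   # women win per university

total = df.groupby("gender")[["admitted", "applied"]].sum()
print(total["admitted"] / total["applied"])       # men 0.50, women 0.45
```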
5) Errors in Data Processing
• E.g. a wrong data entry
• A bug in the code
66
6) Managing Change
• Systems change continuously
• Is the analysis still valid?
• Most changes may not impact the analysis
• But some do, and we might not know which ones
• Famous case of “Google Flu”
• Predictor worked beautifully for a while
• Then crashed
67
Campbell’s Law
• “The more any quantitative social indicator (or even some qualitative indicator) is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” — Donald Campbell, 1979
68
Equivalent for DS
• If the metrics are known, and they matter, people will work towards the metrics
• If critical analysis inputs can be manipulated, they
will be
69
Algorithmic Fairness
70
Algorithmic Fairness
• Can algorithms be biased?
• Can we make algorithms unbiased?
• Is the training data set representative of the population?
• Is the past population representative of the future population?
• Are observed correlations due to confounding processes?
71
Example: Algorithmic
Vicious Cycle
• Company has only 10% women employees
• Company has a “boys’ club” culture that makes it difficult for women to succeed
• Hiring algorithm trained on current data, based on
current employee success, scores women
candidates lower
• Company ends up hiring fewer women
72
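A hand-rolled simulation sketch of this loop (illustrative numbers, not any real hiring system): a score that rewards resembling the current, mostly male workforce keeps women’s hiring rate far below their share of applicants:

```python
# Feedback-loop sketch: the model is "retrained" each year on the current
# workforce, so the penalty for not resembling it persists.
import numpy as np

rng = np.random.default_rng(0)
women_share = 0.10  # starting workforce composition

for year in range(5):
    is_woman = rng.random(1000) < 0.5        # applicant pool: 50% women
    skill = rng.normal(size=1000)
    # "Trained on current employees": the more male the workforce, the
    # bigger the penalty for not resembling it.
    score = skill + (1 - women_share) * np.where(is_woman, -0.5, 0.5)
    hired = score >= np.quantile(score, 0.95)  # hire the top 5%
    women_share = is_woman[hired].mean()       # stays well under 0.5
    print(year, f"women among new hires: {women_share:.0%}")
```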
Bad Analysis from Good
Data
• Correlated attributes
• Correct but misleading results
• P-Hacking
73
Racial Discrimination
• Universities are prohibited by law from considering race in admissions
• They can find surrogate features that get close, without violating the law
• Lenders are prohibited by law from redlining based on race
• They can find surrogate features
• In general, a proxy can be found
74
Discrimination Intent
• Big Data provides the technology to facilitate such
proxy discrimination
• Whether this technology is used this way becomes
a matter of intent
• It also provides the technology to detect and
address discrimination
75
Unintentional Discrimination
[Slides 76-77: video, https://www.youtube.com/watch?v=hDgXIUM3Rmw]
[Slides 78-82: figures from https://blog.conceptnet.io/2017/07/13/how-to-make-a-racist-ai-without-really-trying/, which builds a sentiment model from off-the-shelf word embeddings and shows it scoring some ethnic names and cuisines more negatively]
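A toy reconstruction of the mechanism behind that blog post (hand-made 2-d “embeddings” stand in for real GloVe/word2vec vectors): names carry no sentiment, but their learned coordinates sit nearer to “good” or “bad” words, and a classifier trained only on sentiment words inherits that geometry:

```python
# Toy sketch (not the blog's code) of unintentional embedding bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 2-d "embeddings"; real vectors are 50-300-d and learned
# from web text, which is where the bias enters.
emb = {
    "excellent": [0.9, 0.1], "wonderful": [0.8, 0.2],
    "terrible": [-0.9, 0.0], "awful": [-0.8, -0.1],
    "emily": [0.4, 0.1], "shaniqua": [-0.3, 0.0],  # names, no sentiment
}

X = np.array([emb[w] for w in ("excellent", "wonderful", "terrible", "awful")])
y = np.array([1, 1, 0, 0])  # 1 = positive word, 0 = negative word
clf = LogisticRegression().fit(X, y)

for name in ("emily", "shaniqua"):
    p = clf.predict_proba(np.array([emb[name]]))[0, 1]
    print(f"P(positive | {name}) = {p:.2f}")  # differs between the names
```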
How to manage?
• https://github.com/columbia/fairtest
83
Ossification / Echo chamber
• When optimising only for a metric, we might forget about other issues
• A common problem in recommendation systems
84
Correct but
misleading
85
Correct but Misleading Results: Visualisation edition
[Slides 86-90: examples of misleading charts, from http://viz.wtf/]
Correct but Misleading
Results: Score Edition
• Hotel A gets average 3.2, from a mix of mostly 3s and 4s
• Hotel B gets average 3.2, from a mix of mostly 1s and 5s
91
Correct but Misleading
Results: Score Edition
• Hotel A gets average 4.5, based on 2 reviews
• Hotel B gets average 4.4, based on 200 reviews
92
Correct but Misleading
Results: Score Edition
• Hotel A gets average 4.5, based on 10 reviews
• Hotel B gets average 4.4, based on 500 reviews
93
Correct but Misleading
Results: Score Edition
• Hotel A gets average 4.5, based on 10 reviews
• Hotel B gets average 4.4, based on 500 reviews
• Hotel A has 5 rooms, while hotel B has 500 rooms.
94
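One common remedy, sketched below with illustrative constants: shrink each hotel’s mean toward a global prior in proportion to its review count, so a 4.5 from 10 reviews no longer automatically beats a 4.4 from 500:

```python
# Bayesian-average sketch: prior_weight behaves like that many phantom
# reviews at prior_mean; both constants are illustrative choices.
def bayesian_average(mean, n, prior_mean=4.0, prior_weight=50):
    return (mean * n + prior_mean * prior_weight) / (n + prior_weight)

print(bayesian_average(4.5, 10))   # ~4.08: 10 reviews barely move the prior
print(bayesian_average(4.4, 500))  # ~4.36: 500 reviews dominate it
```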
Minority Loses
[Slides 95-96: figure slides]
97
Diversity Suppression
(Hiring)
• Use Data Science to find promising prospects
• Criteria are tuned to fit the majority
• Algorithm performs poorly on (some) minorities
• Best minority applicants are not hired
98
Diversity Suppression
(Medical)
• Scenario 1: Group A is the majority
• The drug is found effective with a suitable significance level
• Patients in group B are also given this drug, even though it may not work for them
• Scenario 2: Group B is the majority
• The drug is not found effective with sufficient significance over the whole population
• The drug is not approved, even though minority (group A) patients could have benefitted from it
P-Value Hacking
• Please read extensively about it before reporting any result claiming that your experiment has a p-value < 0.05
• Standard p-value mathematics was developed for
traditional experimental techniques where you
design the experiment first and then collect the
data
99
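A small simulation of why that matters: test enough true-null hypotheses and some will come out “significant” by chance:

```python
# With 20 true-null comparisons, about one is expected to pass p < 0.05
# by chance alone; reporting only the "hits" is p-hacking.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
hits = 0
for _ in range(20):
    a = rng.normal(size=100)  # both samples come from the SAME
    b = rng.normal(size=100)  # distribution: any "effect" is noise
    if stats.ttest_ind(a, b).pvalue < 0.05:
        hits += 1
print(f"{hits} 'significant' results out of 20 true-null tests")
```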
Algorithmic Fairness Conclusion
• Humans have many biases
• No human is perfectly fair even with the best of intentions, and biases are sometimes hard to detect
• Biases in algorithms are usually easier to measure, even if the outcome is no fairer
100
Model interpretability
• People should have a right to understand an algorithmic decision
101
Code of Ethics
102
Code of Ethics
• Doctors have the Hippocratic oath
• Journalists, lawyers, etc. have professional codes
• Regulation?
• Trade associations
• Companies lose if they annoy customers
• They self-regulate
103
Code of Ethics
• Can’t just take data and report whatever the algorithm spits out
• Need to understand the outcomes
• Need to own the outcomes
104
(One) Code of Ethics
• Do not surprise
• Who owns the data?
• What can the data be used for?
• What can you hide in exposed data?
• Own the outcome
• Is the data analysis valid?
• Is the data analysis fair?
• What are the societal consequences?
105
Conclusion: Ethics Matter
• Data Science has great power — to harm or to help
• Data Scientists must care about how this power is
used
• Cannot hide behind a claim of “neutral
technology”
• We are all better off if we voluntarily limit how this
power is used
106
Questions?
• Resources:
• https://www.youtube.com/watch?v=hDgXIUM3Rmw
• https://blog.conceptnet.io/2017/07/13/how-to-make-a-racist-ai-
without-really-trying/
• https://www.edx.org/course/data-science-ethics-michiganx-
ds101x-1
• https://www.socialcooling.com/
• http://www.fatml.org/
• http://viz.wtf
107
https://creativecommons.org/licenses/by-nc-nd/4.0/
The rise of social cooling
• As oil leads to global warming, data leads to social cooling
• If you feel you are being watched, you change your
behaviour
• Your data is turned into thousands of different
scores.
108
[Slides 109-110: figure slides]
Social cooling
• People are starting to realise that this ‘digital
reputation’ could limit their opportunities
• People are changing their behaviour to get better
scores
111