HOW DO WE
TRAIN AI TO BE
ETHICAL AND UNBIASED?
MARK BORG
AI MALTA SUMMIT – 13 JULY 2018
RECENT ACHIEVEMENTS IN AI
2
[Chart: improvements in word error rate over time on the Switchboard conversational speech recognition benchmark]
Credit: Awni Hannun
Automated Speech Recognition results
Credit: Business Insider/Yu Han
RECENT ACHIEVEMENTS IN AI
3
Credit: H. Fang et al. (2015), “From Captions to Visual Concepts and Back”
#1 A woman holding a
camera in a crowd.
Image Captioning
RECENT ACHIEVEMENTS IN AI
4
0 days
AlphaGo Zero has no prior knowledge of the
game and only the basic rules as an input.
3 days
AlphaGo Zero surpasses the abilities of AlphaGo
Lee, the version that beat world champion Lee
Sedol in 4 out of 5 games in 2016.
21 days
AlphaGo Zero reaches the level of AlphaGo
Master, the version that defeated 60 top
professionals online and world champion Ke Jie in
3 out of 3 games in 2017.
40 days
AlphaGo Zero surpasses
all other versions of
AlphaGo and, arguably,
becomes the best Go
player in the world. It
does this entirely from
self-play, with no human
intervention and using no
historical data.
Credit: DeepMind
AlphaGo Zero
WIDESPREAD USE OF AI
• AI now has wide and deep societal influences, permeating every sphere of our lives
• No longer single applications operating in standalone mode
• ML Pipelines, more complex AI systems, operating at Internet Scale
• AI as a Service (AIaaS), Machine Learning as a Service (MLaaS)
• Running “under the hood”, as well as in “human-facing technology”
• High-stake applications, sometimes involving life-and-death decisions
➢ AI-enabled Future
➢ Benefits and Implications
5
BENEFITS AND CONCERNS OF AI
6
• What if an AI algorithm could predict death better than doctors?
• The “dying algorithm” (NY Times)
• Stanford's AI Predicts Death for Better End-of-Life Care (IEEE Spectrum)
• What are the benefits and implications of such a system?
CONCERNS
• A Predictive Policing algorithm unfairly targeted
certain neighbourhoods – Chicago 2013/2014
• Idea: to stop crime before it occurs
• Unintended consequences due to systematic bias in the
data used by these systems
• Saunders et al. (2016), “Predictions put into practice: a quasi-
experimental evaluation of Chicago’s predictive policing project”
• COMPAS assesses a defendant’s risk of re-offending
• used for bail determination by judges
• Issues of reliability and racial bias
• Dressel & Farid (2018), “The Accuracy, Fairness, and Limits of Predicting
Recidivism”
7
Credit: ProPublica
CONCERNS
• YouTube Recommender system
• The algorithm appears to have concluded that
people are drawn to content that is more
extreme than what they started with — or to
incendiary content in general
• Accusations that YouTube is acting as a
“radicalisation agent”
8
Credit: Covington
Recommendations drive 70%
of YouTube’s viewing time
(~200 million
recommendations per day)
YouTube tops a cumulative of 1
billion hours of video per day in
2017
CONCERNS
• Adversarial AI
9
Credit: IBM
Credit: Biggio & Roli
CONCERNS
• Ethical and moral issues
• Self driving cars
10
The Trolley Problem
Credit: Waymo
(Philippa Foot, 1967)
LONG-TERM CONCERNS
• AGI, Superintelligence, existential threat, need for Benevolent AI
• The Sorcerer’s Apprentice problem
• Eliezer Yudkowsky: The Paperclip Maximiser Scenario
11
Credit: Disney
If a machine can think,
it might think more
intelligently than we do,
and then where should we
be? …
This new danger … is
certainly something which
can give us anxiety
Alan Turing, 1951
IMPLICATIONS & CONSEQUENCES OF AI
• To maximise the benefits of AI (saving lives, raising the quality of life, …),
we also need to address its issues and consequences
• the “rough edges of AI” – Eric Horvitz (Microsoft Research)
• Robustness, Ethics, Benevolent AI
• Short-term implications (need solving now)
• Longer term implications (prepare the groundwork…)
• Spans multiple fields: engineering, cognitive science, philosophy, etc.
12
13
AIES
ICAILEP
Conference on Artificial Intelligence:
Law, Ethics, and Policy
IEEE P7008 - Standard for Ethically Driven Nudging for Robotic, Intelligent & Autonomous Systems
IEEE P7009 - Standard for Fail-Safe Design of Autonomous & Semi-Autonomous Systems
IEEE P7010 - Wellbeing Metrics Standard for Ethical Artificial Intelligence & Autonomous Systems
IMPLICATIONS & CONSEQUENCES OF AI
14
Benevolent AI
AI Safety
Robust AI Beneficial AI
Value Alignment
AI Ethics
Roboethics
Machine Ethics
Adversarial AI
Increased complexity
AI transparency
ANI
Artificial Narrow Intelligence
AGI
Artificial General Intelligence
ASI
Artificial Super Intelligence
15
LADDER OF
CAUSATION
Credit: Judea Pearl (2018),
“The Book of Why: The New
Science of Cause and Effect”
AI SAFETY
• Data Bias (Algorithmic Bias)
• Fairness
• AI Robustness & Reliability
• AI Transparency
16
DATA BIAS
• Algorithmic Bias is NOT model bias (bias-variance trade-off, generalisation problem)
• Algorithmic Bias (or Data Bias) – will always be present; need to minimise the impact
• E.g. predictive policing algorithm
• Police-recorded datasets suffer from systematic bias:
• Not a complete census
• Not a representative random sample
• Crime databases do not measure crime; they measure some complex interaction
between criminality, policing strategy, and community-police relationships
17
DATA BIAS
• Data bias is prevalent throughout the whole field of AI
• Unintentional bias vs. intentional bias
• Addressing data bias has particular significance in ML pipelines, complex AI systems,
AIaaS, etc.
• E.g.
• Howard (2017), “Addressing Bias in Machine Learning Algorithms: A Pilot Study on
Emotion Recognition for Intelligent Systems”
• did not perform well for children
• original training dataset had few such cases
18
DATA BIAS
• Unintentional self-created bias (“poisoning your own data”)
• E.g. Google Flu Trends
• Google’s search engine began suggesting flu-related queries to people who did
not have the flu; Google Flu Trends then counted those queries, corrupting its
own dataset with excess flu-related searches and creating a feedback loop
• Despite good intentions, biased data can lead to a far worse result
• E.g. beauty.ai
• a startup organising the world's first AI-driven beauty contest in 2016
• The concept was to remove the social biases of human judges
• problem: image samples used to train the algorithms weren’t balanced in terms
of race and ethnicity.
• so-called 'white guy problem’
19
DATA BIAS
• Naive application of algorithms to everyday problems
could amplify structural discrimination and reproduce
biases present in the data
• Detecting such bias (automatically?) and addressing it is quite difficult –
precisely because AI is data-driven
20
Credit: Buolamwini & Gebru
• Some very recent work on two fronts:
• More balanced datasets, e.g., new facial image dataset released in February 2018 (Pilot Parliaments
Benchmark dataset)
• Buolamwini & Gebru (2018), “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender
Classification”
• Measuring bias and fairness:
• Shaikh et al. (2017), “An End-to-End Machine Learning Pipeline that Ensures Fairness Policies”
• Srivastava & Rossi (2018), “Towards Composable Bias Rating of AI Services”
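
To make the measurement idea above concrete, here is a minimal sketch (my illustration, not from any of the cited papers) of one widely used fairness metric, the demographic parity difference; `y_pred` and `group` are hypothetical stand-ins for model decisions and protected-group membership:

```python
# Demographic parity difference: gap in positive-decision rates between groups.
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Positive-decision rate of group 1 minus that of group 0."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_g1 = y_pred[group == 1].mean()  # P(decision = 1 | group = 1)
    rate_g0 = y_pred[group == 0].mean()  # P(decision = 1 | group = 0)
    return rate_g1 - rate_g0

# Toy example: a model that grants bail to 80% of group 0 but 40% of group 1
y_pred = np.array([1, 1, 1, 1, 0,  1, 0, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 0,  1, 1, 1, 1, 1])
print(demographic_parity_difference(y_pred, group))  # -0.4
```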
DATA BIAS
• creating more balanced (heterogeneous) datasets
• One solution would be to create shared and regulated databases that are in
possession of no single entity, thus preventing any party from unilaterally manipulating
the data to their own favour.
• public datasets curated to be bias free
• One concern is that even when machine-learning systems are programmed to
be blind to race or gender, for example, they may use other signals in the data,
such as the location of a person’s home, as a proxy for them
• E.g. In COMPAS bail system, geographic neighbourhood highly correlated to ethnicity,
thus still suffering from racial discrimination
21
AI ROBUSTNESS & RELIABILITY
• Making AI systems more robust, so that they work as intended, without failing
or getting misused?
• Reliable prediction of performance
• Avoiding overconfidence in AI systems
• How much does the system know about what it does not know?
• Overconfident models make strong predictions that are simply inaccurate
• Classification label accuracy + ROC curve
• Learning to predict confidence
• Current statistical models “tend to assume that the data that they’ll see in the future will
look a lot like the data they’ve seen in the past”
22
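
As a concrete illustration of “learning to predict confidence” and spotting overconfidence, here is a minimal sketch (my illustration, not from the talk) of Expected Calibration Error, which compares a model’s stated confidence with its actual accuracy:

```python
# Expected Calibration Error (ECE): bucket predictions by confidence and
# compare average confidence with empirical accuracy in each bucket.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

# An overconfident model: ~90% confident but only 60% correct
conf = np.array([0.90, 0.92, 0.88, 0.91, 0.90])
hit  = np.array([1, 0, 1, 0, 1])
print(expected_calibration_error(conf, hit))  # large value => overconfident
```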
AI ROBUSTNESS & RELIABILITY
• Blindspots of algorithms (Eric Horvitz, Microsoft Research)
• The “unknown unknowns” (Tom Dietterich, Oregon State University)
• AI algorithms for learning and acting safely in the presence of unknown unknowns.
• learning about blindspots of algorithms:
• Lakkaraju (2016), “Identifying Unknown Unknowns in the Open World: Representations
and Policies for Guided Exploration”
• Ramakrishnan (2018), “Discovering Blind Spots in Reinforcement Learning”
• Human supervision
• human correction to prevent AI failure
23
AI ROBUSTNESS & RELIABILITY
• Watch out for Anomalies?
• Robust anomaly detection
• The BugID system by Tom Dietterich
• A system that learns when there is another, unknown class out there
• automated counting of freshwater macro-invertebrates
• trained on 29 insect classes, detection of novel classes
• monitoring the performance of the system, especially for self-learning systems or local-
based learning
• E.g. Microsoft’s chatbot Tay
• research into adding a “reflection layer” into systems (introspection?)
• Failsafe designs
• Auto-pilot of self-driving cars disengaging suddenly
24
MISUSE OF AI
• Privacy Challenges
• Exclusion – denying services
• Persuasion, and Manipulation of Attention / Behaviour / Beliefs
• Harms
• Hacking of AI systems
• Adversarial AI
25
MISUSE OF AI
• Tay
• Microsoft’s Chatbot
• "The more you talk the smarter Tay gets!"
• March 2016, suspended after 16 hours
• Tay’s conversation extended to racist, inflammatory and political statements
• A main problem was Tay’s “repeat after me” feature
• Intentional misuse of AI (coordinated attack)
• Neff and Nagy (2016), “Talking to Bots: Symbiotic Agency and the Case of Tay”
26
Credit: Microsoft
MISUSE OF AI
• Harnessing AI to increase attention & engagement for a particular application or service
• large-scale personalised targeting
• Persuasion, and Manipulation of Attention / Behaviour / Beliefs
• automated Twitter feed generation that persuades a user to click on links
• data-driven behaviour change
• Intentional / Unintentional
27
YOUTUBE RECOMMENDER SYSTEM
• The recommender system’s goal is to
maximise attention and engagement via
personalised targeting
• Eric Horvitz (Microsoft Research) calls this:
"Adversarial Attacks on Attention"
28
Credit: Eric Horvitz
Recommendations drive
70% of YouTube’s viewing
time (~200 million
recommendations per day)
YouTube tops a cumulative of
1 billion hours of video per
day in 2017
Recommendation system architecture demonstrating the “funnel”
where candidate videos are retrieved and ranked before
presenting only a few to the user
Covington et al. (2016), “Deep Neural Networks for YouTube
Recommendations”
YOUTUBE RECOMMENDER SYSTEM
• Its algorithm seems to have concluded that people are drawn to content that is more
extreme than what they started with — or to incendiary content in general
• a bias toward extreme/divisive/inflammatory/fringe/sensational content
• WSJ investigation (Feb 2018):
• amplifies human bias, fake news, isolate users in "filter bubbles"
• AlgoTransparency.org
• Zeynep Tufekci (sociologist, Univ. of North Carolina):
• Calls YouTube the “Great Radicaliser”
• AI exploiting a natural human desire to "look behind the curtain", to dig deeper into something that
engages us. As we click and click, we are carried along by the exciting sensation of uncovering more
secrets and deeper truths. YouTube leads viewers down a rabbit hole of extremism, while Google racks
up the ad sales.
29
YOUTUBE RECOMMENDER SYSTEM
• But is the algorithm really to blame?
• Main issue due to scale
• Also simplified human behaviour modelling:
• Watching more nuanced content, or videos that diverge from the established
viewing pattern, can be rooted out as noise – pushing interests towards the
more extreme ends of a spectrum rather than towards complex content
catering to views that are harder to define
• Possible solutions?
• YouTube has been applying changes to their algorithm
• Improved human behaviour models
• Changes to the exploration-exploitation strategy adopted by the
recommender system
• Value policies, encoding notion of “time well spent”
30
ADVERSARIAL AI
• Goodfellow et al. (2015), “Explaining and Harnessing Adversarial Examples”
• Szegedy et al. (2013), “Intriguing properties of neural networks” – traversing the manifold to find blind spots in the input space
• DNN can be easily fooled by adversaries
• No need for hand-crafting the adversarial attack
• Can exploit AI to perform an adversarial attack
• One AI deceiving another AI
31
[Image: “panda” (57.7% confidence) + adversarial noise (exaggerated) → “gibbon” (99.3% confidence)]
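
The panda-to-gibbon example above comes from the Fast Gradient Sign Method of Goodfellow et al. (2015). A minimal PyTorch sketch of the attack, assuming `model` is some hypothetical differentiable classifier:

```python
# Fast Gradient Sign Method: one gradient step that maximises the loss.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.007):
    """Perturb input x in the direction that increases the model's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Move each pixel by epsilon in the sign of the gradient
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep a valid image
```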
ADVERSARIAL AI
• Adversarial systems subtly alter normal inputs such that humans doing the same task can
easily recognize what the intended input is, but mislead the AI into giving a predictable
and very different false output.
• Performed by stealth (humans won’t spot the difference)
• Potential Attacks
• Adversarial examples can be printed out on standard paper, then photographed with a
standard smartphone, and still fool AI systems
• Kurakin et al. (2017), “Adversarial examples in the physical world”
32
Credit: Biggio & Roli
ADVERSARIAL AI
• The famous 3D printed Turtle that fooled
Google’s AI
• Adversarial attacks without perturbing the
whole image
33
Athalye et al. (2017), “Synthesizing Robust Adversarial Examples”
• Sharif et al. (2016), “Accessorize to a Crime: Real and
Stealthy Attacks on State-of-the-Art Face Recognition”
• Impersonation attacks
• Invisibility attacks
Credit: Sharif et al. (2016)
ADVERSARIAL AI
• Audio Adversarial attacks
• Carlini and Wagner (2018), “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text”
• Given any speech audio, can produce another that is 99.9% similar to the original, but contains any text
one wants.
• Fools DeepSpeech with a 100% success rate
34
Credit: IBM
ADVERSARIAL AI
• Not limited to Deep Neural Networks
• Papernot et al. (2016), “Transferability in machine learning: from phenomena to black-box
attacks using adversarial samples”
• DNNs, logistic regression, support vector machines, decision trees, nearest neighbour
classifiers, ensembles – all vulnerable to adversarial AI!
• any machine learning classifier can be tricked to give incorrect predictions, and with a little
bit of work, one can get them to give pretty much any result one wants
35
ADVERSARIAL AI
36
[Diagram: adversarial AI adds a perturbation to a “panda” (57.7%) input so that the defended AI model outputs “gibbon” (100%)]
ADVERSARIAL AI
• White-box adversarial attack
37
[Diagram: white-box attack – the adversary uses the defended AI model’s gradient (or its scores: score-based attack, or a substitute AI model) to craft a perturbation that turns “panda” (57.7%) into “gibbon” (100%)]
ADVERSARIAL AI
• Black-box adversarial attack
38
[Diagram: black-box attack – without access to the defended AI model’s scores & gradients, transfer-based or decision-based attacks still turn “panda” into “gibbon”]
DEFENDING AGAINST ADVERSARIAL AI
39
Some countermeasures:
• Smoothing and hiding the gradients
• Randomisation techniques
• image compression
• image blurring
• random image resizing
• employ dropout in neural networks
• Defensive distillation
• Use of ensembles
• Evaluate model’s adversarial resilience
• Metrics available
• Pre-emptive hardening of AI models
• Enhance robustness to tampering
Some libraries and toolkits:
IBM Adversarial Robustness Toolbox (ART)
https://github.com/IBM/adversarial-robustness-toolbox
Cleverhans library
https://github.com/openai/cleverhans
DeepFool
https://github.com/LTS4/DeepFool
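
As one concrete example of the randomisation techniques listed above, here is a minimal sketch (my illustration, in the spirit of random-resizing defences) that randomly resizes and pads inputs before classification, so a perturbation tuned to one exact pixel grid no longer lines up at test time:

```python
# Input randomisation: random resize + random padding before the classifier.
import random
import torch
import torch.nn.functional as F

def random_resize_pad(x, out_size=224, min_size=200):
    """x: image batch of shape (N, C, H, W), already at out_size."""
    new = random.randint(min_size, out_size)
    x = F.interpolate(x, size=(new, new), mode="bilinear", align_corners=False)
    pad_left = random.randint(0, out_size - new)
    pad_top = random.randint(0, out_size - new)
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    return F.pad(x, (pad_left, out_size - new - pad_left,
                     pad_top, out_size - new - pad_top))

# Hypothetical usage: def defended_forward(model, x): return model(random_resize_pad(x))
```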
ADVERSARIAL AI – MODEL INVERSION ATTACKS
40
• Fredrikson et al. (2015), “Model Inversion Attacks that Exploit Confidence Information and Basic
Countermeasures”
• Violating privacy of subjects in the training set
[Diagram: using only a face-recognition model’s confidence scores for the label “Tom”, an adversary reconstructs a recognisable image of Tom from the training set]
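
A minimal sketch of the core idea behind the attack (white-box variant for brevity; `model` and `target_class` are hypothetical): start from a blank image and run gradient ascent on the input to maximise the model’s confidence for the target identity:

```python
# Model inversion via gradient ascent on the input.
import torch

def invert(model, target_class, shape=(1, 1, 64, 64), steps=500, lr=0.1):
    x = torch.zeros(shape, requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = model(x)[0, target_class]
        (-score).backward()       # ascend the target-class score
        opt.step()
        x.data.clamp_(0, 1)       # stay in the valid pixel range
    return x.detach()             # an approximate "average face" for the class
```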
GENERATIVE ADVERSARIAL NETWORKS (GANS)
41
• Leveraging adversarial AI to make a generative model, consisting of two neural networks
competing with each other
• The discriminator tries to distinguish genuine data from forgeries created by the generator
• The generator turns random noise into imitations of the data, in an attempt to fool the
discriminator
[Diagram: the generator maps random noise to fake samples; the discriminator classifies samples as real or fake]
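
A minimal GAN training step, sketched in PyTorch with placeholder architectures (the layer sizes here are arbitrary illustrations, not from any cited work):

```python
# One GAN training step: D learns to separate real from fake, G learns to fool D.
import torch
import torch.nn as nn

z_dim, x_dim = 16, 64
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    n = real.size(0)
    # 1) Discriminator: push real towards label 1, fakes towards label 0
    fake = G(torch.randn(n, z_dim)).detach()
    loss_d = bce(D(real), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) Generator: make D label freshly generated fakes as real
    fake = G(torch.randn(n, z_dim))
    loss_g = bce(D(fake), torch.ones(n, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```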
AI ETHICS & VALUE ALIGNMENT
• Codification of Ethics
• Values, Utility Functions
• Teaching AI to be Ethical
• Reinforcement Learning
• Inverse Reinforcement Learning and beyond
▪ Ethics – comprehending “right” from “wrong”, and behaving in a right way
▪ Value Alignment – ensuring that the goals, behaviours, values and ethics of autonomous
AI systems align with those of humans
42
CODIFICATION OF ETHICS
• Rule-based ethics (deontological ethics)
• Isaac Asimov’s “Three Laws of Robotics”
• And similar sets of rules
43
• Challenges:
• Too rigid
• Asimov’s literature addresses many of these issues:
• conflicts between the 3 laws, conflicts within a law by itself, conflicting orders, etc.
• How to codify the rules?
• How to program the notion of "harm"?
• Often human ethics and values are implicit
• Process of elicitation is very challenging
Isaac Asimov (1942)
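
A toy sketch (my illustration, not a serious proposal) of what rule-based ethics looks like in code, with Asimov-style rules acting as vetoes on candidate actions; the rigidity problem discussed here shows up immediately, since rules must be encoded as crisp predicates and the agent is stuck whenever every action violates some rule:

```python
# Rule-based (deontological) action filtering with hard-coded vetoes.
RULES = [
    ("do not harm a human",        lambda a: not a.get("harms_human", False)),
    ("obey human orders",          lambda a: a.get("ordered", True)),
    ("protect your own existence", lambda a: not a.get("self_destructive", False)),
]

def violations(action):
    """Names of all rules this candidate action would break."""
    return [name for name, ok in RULES if not ok(action)]

def choose(candidate_actions):
    allowed = [a for a in candidate_actions if not violations(a)]
    if not allowed:
        raise RuntimeError("Moral deadlock: every candidate action violates a rule")
    return allowed[0]

# choose([{"harms_human": True}, {"self_destructive": True}])  # -> deadlock
```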
CODIFICATION OF ETHICS
• Pre-programming ethical rules:
• Impossible to program for every scenario
• Fail to address uncertainty and randomness
• Fail to address ambiguous cases, ethical and moral dilemmas
• Rules on their own not enough
• Must be accompanied by very strong accountability mechanisms
• Need moral conflict resolution mechanism
• Values and ethics are dependent on the socio-cultural context
• Difficult to standardise
• Need to account for changes in the values of society, shifts in beliefs, attitudes, etc.
44
CODIFICATION OF ETHICS
45
• Rule-based ethics example:
• specifically & explicitly programme ethical values into self-driving
cars to prioritise the protection of human life above all else
• In the event of an unavoidable accident, the car should be
“prohibited to offset victims against one another”
• A car must not choose whether to kill a person based on individual
features, when a fatal crash is inescapable
Credit: BMVI (www.bmvi.de)
The Trolley Problem
VALUES, UTILITY FUNCTIONS
• Ethics as Utility Functions
• Any system or person who acts or gives advice is using some value system of what is important and
what is not
• Utility-based Agent
• Agent’s Actions
• Agent’s Beliefs
• Agent’s Preferences
• The agent chooses actions based on their outcomes
• Outcomes are what the agent has preference on
• Preferences → Utility → Utility Function
• A policy specifies what an agent should do under all contingencies
• An agent wants to find an optimal policy – one that maximises its expected utility
46
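
In symbols, the standard formulation the slide describes (the textbook utility-based agent): the agent prefers the action whose expected utility over outcomes is highest.

```latex
EU(a) = \sum_{s'} P(s' \mid a)\, U(s'),
\qquad
a^{*} = \arg\max_{a} EU(a)
```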
TEACHING AI TO BE ETHICAL
• Teaching AI ethics, social rules and norms
• Adopt a “blank slate” approach
• Similar to how a human child learns ethics from those around him/her
• Basic values are learnt, and the AI will, in time, be able to apply those principles in unforeseen scenarios
• What machine learning method to use?
47
Credit: GoodAI
TEACHING AI TO BE ETHICAL
48
• Reinforcement Learning
• Has shown promise in learning policies that can solve complex problems
• An agent explores its environment, performing action after action and receiving rewards and punishments
according to the reward function (i.e. utility function)
• As it repeats this, the agent will gradually learn to perform the right actions in the right states so as to
maximise its reward
• Reward = total sum of the actions’ rewards over
time, where future rewards are discounted (treated
as less valuable than present rewards)
• When learning ethics, the reward function will
reward/punish the agent depending on the choice of
action performed, whether “right” or “wrong”
[Diagram: environment model + reward function(s) → reinforcement learning → reward-maximising behaviour]
Kose (2017), “Ethical Artificial Intelligence – An Open Question”
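
Written out, the discounted return the slide describes: future rewards are weighted down geometrically by a discount factor.

```latex
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1},
\qquad 0 \le \gamma < 1
```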
TEACHING AI TO BE ETHICAL
• Reinforcement Learning (RL) challenges:
• Difficulty in setting up ethical scenarios in the environment model of RL
• May take a very long time till the agent manages to fully cover all ethical scenarios, ambiguous cases, etc.
49
• Potential solution:
• Using stories as a way of short-circuiting the
reinforcement learning process
• Employ more complex stories as time goes by
• Riedl et al. (2016), “Using Stories to Teach Human
Values to Artificial Agents”
[Diagram: environment model + reward function(s) → reinforcement learning → reward-maximising behaviour]
Kose (2017), “Ethical Artificial Intelligence – An Open Question”
TEACHING AI TO BE ETHICAL
• Another solution:
• Curriculum-based approach to improve the learning process
• The learning process in humans and animals is enhanced when scenarios are not randomly presented, but
organized in a meaningful order – gradual exposure to an increasing number of concepts, and to more
complex ones
• For teaching ethics, simpler scenarios are presented before more complex and ambiguous cases
• GoodAI’s “School for AI” project is employing a curriculum based approach for enhancing the teaching of
ethics via reinforcement learning
• www.goodai.com/school-for-ai
• Bengio et al. (2009), “Curriculum Learning”
• Weinshall et al. (2018), “Curriculum Learning by Transfer Learning: Theory and Experiments with Deep
Networks”
50
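
A minimal sketch (my illustration) of the curriculum idea: sort scenarios by some difficulty score and widen the training set stage by stage; `difficulty` and `agent.train_on` are hypothetical stand-ins:

```python
# Curriculum learning: present scenarios easy-to-hard instead of randomly.
def curriculum_train(agent, scenarios, difficulty, stages=3):
    ordered = sorted(scenarios, key=difficulty)   # easy -> hard
    step = max(1, len(ordered) // stages)
    for stage in range(stages):
        # Each stage adds harder scenarios; the final stage sees the full set
        visible = ordered if stage == stages - 1 else ordered[:(stage + 1) * step]
        for scenario in visible:
            agent.train_on(scenario)
    return agent
```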
TEACHING AI TO BE ETHICAL
• Crowd-Sourcing Ethics and Morality
• Crowdsourced stories simplify the labour-intensive process of creating stories manually
• Can capture consensus for ambiguous and moral dilemmas (“wisdom of the crowds”)
• Example:
• An AI agent is given several hundred stories about stealing versus not stealing, explores different actions in a
reinforcement learning setting, and learns the consequences and optimal policy based on the rewards /
punishments given. (Mark Riedl, Georgia Tech)
51
TEACHING AI TO BE ETHICAL
• MIT’s “Moral Machine”:
• Crowdsourcing to aid self-driving cars
make better moral decisions in cases
of moral dilemmas (variations of the
Trolley problem)
• http://moralmachine.mit.edu
52
TEACHING AI TO BE ETHICAL
• Reinforcement Learning (RL) requires the manual specification of the reward function
• “Reward engineering” is hard (especially for ethics)
• May be susceptible to “reward cheating” by the AI agent
53
In RL, the reward function is specified by the
user, and then the agent does the acting
What if the agent could instead watch
someone else do the acting, and try to come
up with the reward function by itself?
[Diagram: in RL, the user provides the reward function(s); environment model + reward function(s) → reinforcement learning → reward-maximising behaviour]
TEACHING AI TO BE ETHICAL
• Inverse Reinforcement Learning (IRL)
• IRL is able to learn the underlying reward function
(what is ethical?) from expert demonstrations
(humans solving ethical problems)
• IRL is also called “imitation-based learning”
• Learn from watching good behaviour
54
[Diagram: RL – the user provides the reward function(s); environment model + reward function(s) → reinforcement learning → reward-maximising behaviour]
[Diagram: IRL – the user provides observed behaviour; environment model + observed behaviour → inverse reinforcement learning → reward function(s) → reinforcement learning → reward-maximising behaviour]
TEACHING AI TO BE ETHICAL
• Inverse Reinforcement Learning (IRL)
• Very promising results for AI ethics (value alignment)
• No need to explicitly model rules or the reward function
• Recent works advocating IRL:
• Russell et al. (2016), “Research Priorities for Robust and
Beneficial Artificial Intelligence”
• Abel (2016), “Reinforcement Learning as a Framework for
Ethical Decision Making”
• Challenges of IRL:
• Interpretability of the auto-learnt reward function
• Human bias can creep into the observed behaviour
• Difficulty of making the learnt ethics domain-independent
• Arnold (2017), “Value Alignment or Misalignment – What Will
Keep Systems Accountable?”
55
[Diagram: IRL – the user provides observed behaviour; environment model + observed behaviour → inverse reinforcement learning → reward function(s) → reinforcement learning → reward-maximising behaviour]
BEYOND IRL…
• Cooperative IRL
• What if we reward both the “good behaviour” of the AI while
learning ethics, as well as reward the “good teaching” of the
human?
• Cooperation between AI and humans to accomplish a shared
goal – value alignment
• Generative Adversarial Networks (GANs)
• Hadfield-Menell (2016), “Cooperative Inverse Reinforcement
Learning”
56
[Diagram: IRL – the user provides observed behaviour; environment model + observed behaviour → inverse reinforcement learning → reward function(s) → reinforcement learning → reward-maximising behaviour]
BEYOND IRL…
• Harnessing Counterfactuals
• … “imagination” rung on the ladder of causation
• As perfect knowledge of the world is unavailable, counterfactuals
allow for the revision of one’s belief system, rather than relying solely
on past (data-driven) experience
• It is also through counterfactuals that one ultimately enters into
social appraisals of blame and praise
• Might prove to be one of the key technologies needed both for
the advancement of AI itself on the trajectory towards AGI, as
well as for aligning the values of machines as much as possible
with our values, to achieve benevolent AI
57
BENEVOLENT AI
58
[Diagram: value alignment – the overlap between AI values and our values]
BENEVOLENT AI
59
[Diagram: value alignment – AI values and our values overlapping in mutually beneficial values]
Everything we love about civilisation is a product of intelligence, so amplifying our
human intelligence with artificial intelligence has the potential of helping civilisation
flourish like never before – as long as we manage to keep the technology beneficial.
Max Tegmark, Cosmologist & President of the Future of Life Institute
Thank you
60
More Related Content

What's hot

Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AISeth Grimes
 
AI Governance and Ethics - Industry Standards
AI Governance and Ethics - Industry StandardsAI Governance and Ethics - Industry Standards
AI Governance and Ethics - Industry StandardsAnsgar Koene
 
Fairness and Bias in Machine Learning
Fairness and Bias in Machine LearningFairness and Bias in Machine Learning
Fairness and Bias in Machine LearningSurya Dutta
 
Generative AI Risks & Concerns
Generative AI Risks & ConcernsGenerative AI Risks & Concerns
Generative AI Risks & ConcernsAjitesh Kumar
 
Bias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachBias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachEirini Ntoutsi
 
Measures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairnessMeasures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairnessManojit Nandi
 
Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? University of Minnesota, Duluth
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsKrishnaram Kenthapadi
 
Technology for everyone - AI ethics and Bias
Technology for everyone - AI ethics and BiasTechnology for everyone - AI ethics and Bias
Technology for everyone - AI ethics and BiasMarion Mulder
 
The Ethics of Artificial Intelligence
The Ethics of Artificial IntelligenceThe Ethics of Artificial Intelligence
The Ethics of Artificial IntelligenceKarl Seiler
 
Ethical issues facing Artificial Intelligence
Ethical issues facing Artificial IntelligenceEthical issues facing Artificial Intelligence
Ethical issues facing Artificial IntelligenceRah Abdelhak
 
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Krishnaram Kenthapadi
 
Responsible AI
Responsible AIResponsible AI
Responsible AINeo4j
 
AIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAnimesh Singh
 

What's hot (20)

Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
 
AI Governance and Ethics - Industry Standards
AI Governance and Ethics - Industry StandardsAI Governance and Ethics - Industry Standards
AI Governance and Ethics - Industry Standards
 
Fairness and Bias in Machine Learning
Fairness and Bias in Machine LearningFairness and Bias in Machine Learning
Fairness and Bias in Machine Learning
 
Bias in AI
Bias in AIBias in AI
Bias in AI
 
Generative AI Risks & Concerns
Generative AI Risks & ConcernsGenerative AI Risks & Concerns
Generative AI Risks & Concerns
 
Ethics and AI
Ethics and AIEthics and AI
Ethics and AI
 
Bias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachBias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approach
 
Measures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairnessMeasures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairness
 
Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it?
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML Systems
 
Implementing Ethics in AI
Implementing Ethics in AIImplementing Ethics in AI
Implementing Ethics in AI
 
AI
AIAI
AI
 
Technology for everyone - AI ethics and Bias
Technology for everyone - AI ethics and BiasTechnology for everyone - AI ethics and Bias
Technology for everyone - AI ethics and Bias
 
The Ethics of Artificial Intelligence
The Ethics of Artificial IntelligenceThe Ethics of Artificial Intelligence
The Ethics of Artificial Intelligence
 
Ethical issues facing Artificial Intelligence
Ethical issues facing Artificial IntelligenceEthical issues facing Artificial Intelligence
Ethical issues facing Artificial Intelligence
 
Algorithmic fairness
Algorithmic fairnessAlgorithmic fairness
Algorithmic fairness
 
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
 
Responsible AI
Responsible AIResponsible AI
Responsible AI
 
Responsible AI
Responsible AIResponsible AI
Responsible AI
 
AIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AI
 

Similar to How do we train AI to be Ethical and Unbiased?

Ethics for Conversational AI
Ethics for Conversational AIEthics for Conversational AI
Ethics for Conversational AIVerena Rieser
 
Designing AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in BostonDesigning AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in BostonCarol Smith
 
iConference 2018 BIAS workshop keynote
iConference 2018 BIAS workshop keynoteiConference 2018 BIAS workshop keynote
iConference 2018 BIAS workshop keynoteAnsgar Koene
 
AI Ethics and Implications For Developing Societies.pptx
AI Ethics and Implications For Developing Societies.pptxAI Ethics and Implications For Developing Societies.pptx
AI Ethics and Implications For Developing Societies.pptxIshaku Gayus Bwala
 
A Glimpse Into the Future of Data Science - What's Next for AI, Big Data & Ma...
A Glimpse Into the Future of Data Science - What's Next for AI, Big Data & Ma...A Glimpse Into the Future of Data Science - What's Next for AI, Big Data & Ma...
A Glimpse Into the Future of Data Science - What's Next for AI, Big Data & Ma...Pangea.ai
 
AI Ethical Framework.pptx
AI Ethical Framework.pptxAI Ethical Framework.pptx
AI Ethical Framework.pptxDavid Atkinson
 
20240104 HICSS Panel on AI and Legal Ethical 20240103 v7.pptx
20240104 HICSS  Panel on AI and Legal Ethical 20240103 v7.pptx20240104 HICSS  Panel on AI and Legal Ethical 20240103 v7.pptx
20240104 HICSS Panel on AI and Legal Ethical 20240103 v7.pptxISSIP
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceAnimesh Singh
 
Ethical Dimensions of Artificial Intelligence (AI) by Rinshad Choorappara
Ethical Dimensions of Artificial Intelligence (AI) by Rinshad ChoorapparaEthical Dimensions of Artificial Intelligence (AI) by Rinshad Choorappara
Ethical Dimensions of Artificial Intelligence (AI) by Rinshad ChoorapparaRinshad Choorappara
 
The Ethics of Artificial Intelligence in Digital Ecosystems
The Ethics of Artificial Intelligence in Digital EcosystemsThe Ethics of Artificial Intelligence in Digital Ecosystems
The Ethics of Artificial Intelligence in Digital Ecosystemswashikmaryam
 
KIVI Innovation Drinks - Presentation Philip Brey.pdf
KIVI Innovation Drinks - Presentation Philip Brey.pdfKIVI Innovation Drinks - Presentation Philip Brey.pdf
KIVI Innovation Drinks - Presentation Philip Brey.pdfSreyaseeDasBhattacha1
 
Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020Debmalya Biswas
 
Artificial Intelligence: The Next 5(0) Years
Artificial Intelligence: The Next 5(0) YearsArtificial Intelligence: The Next 5(0) Years
Artificial Intelligence: The Next 5(0) YearsMarlon Dumas
 
Algorithmic fairness
Algorithmic fairnessAlgorithmic fairness
Algorithmic fairnessAnthonyMelson
 

Similar to How do we train AI to be Ethical and Unbiased? (20)

Ethics for Conversational AI
Ethics for Conversational AIEthics for Conversational AI
Ethics for Conversational AI
 
Designing AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in BostonDesigning AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in Boston
 
iConference 2018 BIAS workshop keynote
iConference 2018 BIAS workshop keynoteiConference 2018 BIAS workshop keynote
iConference 2018 BIAS workshop keynote
 
AI Ethics and Implications For Developing Societies.pptx
AI Ethics and Implications For Developing Societies.pptxAI Ethics and Implications For Developing Societies.pptx
AI Ethics and Implications For Developing Societies.pptx
 
A Glimpse Into the Future of Data Science - What's Next for AI, Big Data & Ma...
A Glimpse Into the Future of Data Science - What's Next for AI, Big Data & Ma...A Glimpse Into the Future of Data Science - What's Next for AI, Big Data & Ma...
A Glimpse Into the Future of Data Science - What's Next for AI, Big Data & Ma...
 
AI Ethical Framework.pptx
AI Ethical Framework.pptxAI Ethical Framework.pptx
AI Ethical Framework.pptx
 
RAPIDE
RAPIDERAPIDE
RAPIDE
 
20240104 HICSS Panel on AI and Legal Ethical 20240103 v7.pptx
20240104 HICSS  Panel on AI and Legal Ethical 20240103 v7.pptx20240104 HICSS  Panel on AI and Legal Ethical 20240103 v7.pptx
20240104 HICSS Panel on AI and Legal Ethical 20240103 v7.pptx
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open Source
 
Ethical Dimensions of Artificial Intelligence (AI) by Rinshad Choorappara
Ethical Dimensions of Artificial Intelligence (AI) by Rinshad ChoorapparaEthical Dimensions of Artificial Intelligence (AI) by Rinshad Choorappara
Ethical Dimensions of Artificial Intelligence (AI) by Rinshad Choorappara
 
inte
inteinte
inte
 
The Ethics of Artificial Intelligence in Digital Ecosystems
The Ethics of Artificial Intelligence in Digital EcosystemsThe Ethics of Artificial Intelligence in Digital Ecosystems
The Ethics of Artificial Intelligence in Digital Ecosystems
 
KIVI Innovation Drinks - Presentation Philip Brey.pdf
KIVI Innovation Drinks - Presentation Philip Brey.pdfKIVI Innovation Drinks - Presentation Philip Brey.pdf
KIVI Innovation Drinks - Presentation Philip Brey.pdf
 
Tecnologías emergentes: priorizando al ciudadano
Tecnologías emergentes: priorizando al ciudadanoTecnologías emergentes: priorizando al ciudadano
Tecnologías emergentes: priorizando al ciudadano
 
#AI: In Whose Image?
#AI: In Whose Image?#AI: In Whose Image?
#AI: In Whose Image?
 
Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020
 
Artificial Intelligence: The Next 5(0) Years
Artificial Intelligence: The Next 5(0) YearsArtificial Intelligence: The Next 5(0) Years
Artificial Intelligence: The Next 5(0) Years
 
AI Forum-2019_Nakagawa
AI Forum-2019_NakagawaAI Forum-2019_Nakagawa
AI Forum-2019_Nakagawa
 
Ai titech-virach-20191026
Ai titech-virach-20191026Ai titech-virach-20191026
Ai titech-virach-20191026
 
Algorithmic fairness
Algorithmic fairnessAlgorithmic fairness
Algorithmic fairness
 

Recently uploaded

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 

Recently uploaded (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

How do we train AI to be Ethical and Unbiased?

  • 1. HOW DO WE TRAIN AI TO BE ETHICAL AND UNBIASED? MARK BORG AI MALTA SUMMIT – 13 JULY 2018
  • 2. RECENT ACHIEVEMENTS IN AI 2 Word Error rate Improvements in word error rate over time on the Switchboard conversational speech recognition benchmark. Credit: Awni Hannun Automated Speech Recognition results Credit: Business Insider/Yu Han
  • 3. RECENT ACHIEVEMENTS IN AI 3 Credit: H. Fang et al. (2015), “From Captions to Visual Concepts and Back” #1 A woman holding a camera in a crowd. Image Captioning
  • 4. RECENT ACHIEVEMENTS IN AI 4 0 days AlphaGo Zero has no prior knowledge of the game and only the basic rules as an input. 3 days AlphaGo Zero surpasses the abilities of AlphaGo Lee, the version that beat world champion Lee Sedol in 4 out of 5 games in 2016. 21 days AlphaGo Zero reaches the level of AlphaGo Master, the version that defeated 60 top professionals online and world champion Ke Jie in 3 out of 3 games in 2017. 40 days AlphaGo Zero surpasses all other versions of AlphaGo and, arguably, becomes the best Go player in the world. It does this entirely from self-play, with no human intervention and using no historical data. Credit: DeepMind AlphaGo Zero
  • 5. WIDESPREAD USE OF AI • AI has now wide and deep societal influences, permeating every sphere of our lives • No longer single applications operating in standalone mode • ML Pipelines, more complex AI systems, operating at Internet Scale • AI as a Service (AIaaS), Machine Learning as a Service (MLaaS) • Running “under the hood”, as well as in “human-facing technology” • High-stake applications, sometimes involving life-and-death decisions ➢ AI-enabled Future ➢ Benefits and Implications 5
  • 6. BENEFITS AND CONCERNS OF AI 6 • What if an AI algorithm could predict death better than doctors? • The “dying algorithm” (NY Times) • Stanford's AI Predicts Death for Better End-of-Life Care (IEEE Spectrum) • What are the benefits and implications of such a system?
  • 7. CONCERNS • A Predictive Policing algorithm unfairly targeted certain neighbourhoods – Chicago 2013/2014 • Idea: to stop crime before it occurs • Unintended consequences due to systematic bias in the data used by these systems • Saunders et al. (2016), “Predictions put into practice: a quasi- experimental evaluation of Chicago’s predictive policing project” • COMPAS assesses a defendant’s risk of re-offending • used for bail determination by judges • Issues of reliability and racial bias • Dressel & Farid (2018), “The Accuracy, Fairness, and Limits of Predicting Recividism” 7Credit: ProPublica
  • 8. CONCERNS • YouTube Recommender system • The algorithm appears to have concluded that people are drawn to content that is more extreme than what they started with — or to incendiary content in general • Accusations that YouTube is acting as a “radicalisation agent” 8 Credit: Covington Recommendations drive 70% of YouTube’s viewing time (~200 million recommendations per day) YouTube tops a cumulative of 1 billion hours of video per day in 2017
  • 9. CONCERNS • Adversarial AI 9Credit: IBM Credit: Biggio & Roli
  • 10. CONCERNS • Ethical and moral issues • Self driving cars 10 The Trolley Problem Credit: Waymo (Philippa Foot, 1967)
  • 11. LONG-TERM CONCERNS • GAI, Superintelligence, existential threat, need for Benevolent AI • The Sorcerer’s Apprentice problem • Eliezer Yudkowsky: The Paperclip Maximiser Scenario 11 Credit: Disney If a machine can think, it might think more intelligently than we do, and then where should we be? … This new danger … is certainly something which can give us anxiety Alan Turing, 1951 “ “
  • 12. IMPLICATIONS & CONSEQUENCES OF AI • To maximise the benefits of AI: saving lives, raising the quality of life, … Need also to address issues and consequences • the “rough edges of AI” – Eric Horvitz (Microsoft Research) • Robustness, Ethics, Benevolent AI • Short-term implications (need solving now) • Longer term implications (prepare the groundwork…) • Spans multiple fields: engineering, cognitive science, philosophy, etc. 12
  • 13. 13 AIES ICAILEP Conference on Artificial Intelligence: Law, Ethics, and Policy 7008 - Standard for Ethically Driven Nudging for Robotic, Intelligent & Autonomous Systems 7009 - Standard for Fail-Safe Design of Autonomous & Semi-Autonomous Systems 7010 - Wellbeing Metrics Standard for Ethical Artificial Intelligence & Autonomous Systems
  • 14. IMPLICATIONS & CONSEQUENCES OF AI 14 Benevolent AI AI Safety Robust AI Beneficial AI Value Alignment AI Ethics Roboethics Machine Ethics Adversarial AI Increasedcomplexity AI transparency ANI Artificial Narrow Intelligence AGI Artificial General Intelligence ASI Artificial Super Intelligence
  • 15. 15 LADDER OF CAUSATION Credit: Judea Pearl (2018), “The Book of Why: The New Science of Cause and Effect”
  • 16. AI SAFETY • Data Bias (Algorithmic Bias) • Fairness • AI Robustness & Reliability • AI Transparency 16
  • 17. DATA BIAS • Algorithmic Bias is NOT model bias (bias-variance trade-off, generalisation problem) • Algorithmic Bias (or Data Bias) – will always be present; need to minimise the impact • E.g. predictive policing algorithm • Police-recorded datasets suffer from systematic bias: • Non-complete census • Not a representative random sample • Crime databases do not measure crime; they measure some complex interaction between criminality, policing strategy, and community-police relationships 17
  • 18. DATA BIAS • Data bias is prevalent throughout the whole field of AI • Unintentional bias vs. intentional bias • Addressing data bias has particular significance in ML pipelines, complex AI systems, AIaaS, etc. • E.g. • Howard (2017), “Addressing Bias in Machine Learning Algorithms: A Pilot Study on Emotion Recognition for Intelligent Systems” • did not perform well for children • original training dataset had few such cases 18
  • 19. DATA BIAS • Unintentional self-created bias (“poisoning your own data”) • E.g. Google Flu Trends • began suggesting flu-related queries to people who did not have the flu, and thus Google Flu Trends began itself corrupting the dataset by seeding it with excess flu-related queries, thus creating a feedback loop • Despite good intentions, biased data can lead to a far worse result • E.g. beauty.ai • a startup organising the world's first AI-driven beauty contest in 2016 • The concept is to remove social biases of human judges • problem: image samples used to train the algorithms weren’t balanced in terms of race and ethnicity. • so-called 'white guy problem’ 19
  • 20. DATA BIAS • Naive application of algorithms to everyday problems could amplify structural discrimination and reproduce biases present in the data • Detecting (automatically?) such bias and addressing it? • Quite difficult! • Since AI is data-driven, it’s difficult 20 Credit: Buolamwini & Gebru • Some very recent work on two fronts: • More balanced datasets, e.g., new facial image dataset released in February 2018 (Pilot Parliaments Benchmark dataset) • Buolamwini & Gebru (2018), “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification” • Measuring bias and fairness: • Shaikh et al. (2017), “An End-to-End Machine Learning Pipeline that Ensures Fairness Policies” • Srivastava & Rossi (2018), “Towards Composable Bias Rating of AI Services”
  • 21. DATA BIAS • creating more balanced (heterogeneous) datasets • One solution would be to create shared and regulated databases that are in possession of no single entity, thus preventing any party from unilaterally manipulating the data to their own favour. • public datasets curated to be bias free • One concern is that even when machine-learning systems are programmed to be blind to race or gender, for example, they may use other signals in data such as the location of a person’s home as a proxy for it • E.g. In COMPAS bail system, geographic neighbourhood highly correlated to ethnicity, thus still suffering from racial discrimination 21
  • 22. AI ROBUSTNESS & RELIABILITY • Making AI systems more robust, so that they work as intended, without failing or getting misused? • Reliable prediction of performance • Avoiding overconfidence in AI systems • How much it knows that it does not know? • Make strong predictions that are just inaccurate • Classification label accuracy + ROC curve • Learning to predict confidence • Current statistical models “tend to assume that the data that they’ll see in the future will look a lot like the data they’ve seen in the past” 22
  • 23. AI ROBUSTNESS & RELIABILITY • Blindspots of algorithms (Eric Horvitz, Microsoft Research) • The “unknown unknowns” (Tom Dietterich , Oregon State University) • AI algorithms for learning and acting safely in the presence of unknown unknowns. • learning about blindspots of algorithms: • Lakkaraju (2016), “Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration” • Ramakrishnan (2018), “Discovering Blind Spots in Reinforcement Learning” • Human supervision • human correction to prevent AI failure 23
AI ROBUSTNESS & RELIABILITY
• Watch out for anomalies?
  • Robust anomaly detection (see the sketch below)
  • The BugID system by Tom Dietterich
    • a system that learns when there is another, unknown class out there
    • automated counting of freshwater macro-invertebrates
    • trained on 29 insect classes, with detection of novel classes
• Monitoring the performance of the system, especially for self-learning systems or locally-based learning
  • E.g. Microsoft’s chatbot Tay
  • research into adding a “reflection layer” into systems (introspection?)
• Failsafe designs
  • E.g. the auto-pilot of self-driving cars disengaging suddenly
24
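The sketch below is illustrative (not the BugID system): it flags inputs that do not look like any known training class using scikit-learn's off-the-shelf IsolationForest on feature vectors, so a deployed system could route such inputs to a human instead of misclassifying them silently.

```python
# A minimal sketch of novel-class flagging via anomaly detection.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
known = rng.normal(loc=0.0, scale=1.0, size=(500, 8))  # features of known classes
novel = rng.normal(loc=6.0, scale=1.0, size=(5, 8))    # a novel, unseen class

detector = IsolationForest(contamination=0.01, random_state=0).fit(known)
print(detector.predict(novel))  # -1 means "anomalous: route to a human"
```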
MISUSE OF AI
• Privacy challenges
• Exclusion – denying services
• Persuasion, and manipulation of attention / behaviour / beliefs
• Harms
• Hacking of AI systems
• Adversarial AI
25
MISUSE OF AI
• Tay – Microsoft’s chatbot
  • “The more you talk the smarter Tay gets!”
  • March 2016, suspended after 16 hours
  • Tay’s conversation extended to racist, inflammatory and political statements
  • a main problem was Tay’s “repeat after me” feature
  • intentional misuse of AI (a coordinated attack)
• Neff and Nagy (2016), “Talking to Bots: Symbiotic Agency and the Case of Tay”
Credit: Microsoft
26
MISUSE OF AI
• Harnessing AI to increase attention & engagement for a particular application or service
  • large-scale personalised targeting
• Persuasion, and manipulation of attention / behaviour / beliefs
  • auto-generated Twitter feeds that persuade a user to click on links
  • data-driven behaviour change
• Intentional / unintentional
27
YOUTUBE RECOMMENDER SYSTEM
• The recommender system’s goal is to maximise attention and engagement via personalised targeting
• Eric Horvitz (Microsoft Research) calls this: “Adversarial Attacks on Attention”
• Recommendations drive 70% of YouTube’s viewing time (~200 million recommendations per day)
• YouTube topped a cumulative 1 billion hours of video watched per day in 2017
[Figure: recommendation system architecture demonstrating the “funnel”, where candidate videos are retrieved and ranked before presenting only a few to the user – Covington et al. (2016), “Deep Neural Networks for YouTube Recommendations”]
Credit: Eric Horvitz
28
YOUTUBE RECOMMENDER SYSTEM
• Its algorithm seems to have concluded that people are drawn to content that is more extreme than what they started with — or to incendiary content in general
  • a bias toward extreme / divisive / inflammatory / fringe / sensational content
• WSJ investigation (Feb 2018):
  • amplifies human bias and fake news, isolates users in “filter bubbles”
  • AlgoTransparency.org
• Zeynep Tufekci (sociologist, Univ. of North Carolina) calls YouTube the “Great Radicaliser”:
  • AI exploiting a natural human desire to “look behind the curtain”, to dig deeper into something that engages us. As we click and click, we are carried along by the exciting sensation of uncovering more secrets and deeper truths. YouTube leads viewers down a rabbit hole of extremism, while Google racks up the ad sales.
29
YOUTUBE RECOMMENDER SYSTEM
• But is the algorithm really to blame?
  • The main issue is one of scale
  • Also a simplified model of human behaviour: nuanced videos that diverge from a user’s established viewing pattern may be rooted out as noise, pushing recommendations towards the more extreme ends of a spectrum instead of towards complex content catering to interests that are harder to define
• Possible solutions?
  • YouTube has been applying changes to its algorithm
  • Improved human-behaviour models
  • Changes to the exploration-exploitation strategy adopted by the recommender system (see the sketch below)
  • Value policies, encoding the notion of “time well spent”
30
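To make the exploration-exploitation trade-off concrete, here is a minimal epsilon-greedy sketch. It is illustrative only and bears no relation to YouTube's actual system: mostly exploit the category with the best engagement so far, but occasionally explore others, which can surface more diverse content.

```python
# A minimal epsilon-greedy recommender sketch (categories and engagement
# numbers are made up).
import random

def epsilon_greedy_pick(engagement_by_category, epsilon=0.1):
    """engagement_by_category: dict mapping category -> average watch time."""
    if random.random() < epsilon:                        # explore
        return random.choice(list(engagement_by_category))
    return max(engagement_by_category, key=engagement_by_category.get)  # exploit

stats = {"news": 3.2, "gaming": 5.1, "conspiracy": 7.8, "science": 4.0}
picks = [epsilon_greedy_pick(stats, epsilon=0.3) for _ in range(10)]
print(picks)  # mostly the highest-engagement category, with some exploration
```

Raising epsilon is one crude way to counteract the drift towards whatever maximises raw engagement.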
ADVERSARIAL AI
• Goodfellow et al. (2015), “Explaining and Harnessing Adversarial Examples”
• Szegedy et al. (2013), “Intriguing properties of neural networks” – traversing the manifold to find blind spots in the input space
• DNNs can be easily fooled by adversaries
• No need for hand-crafting the adversarial attack
  • can exploit AI to perform an adversarial attack
  • one AI deceiving another AI (see the sketch below)
[Figure: “panda”, 57.7% confidence + adversarial noise (exaggerated) = “gibbon”, 99.3% confidence]
31
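The Fast Gradient Sign Method (FGSM) from Goodfellow et al. (2015) is the textbook example behind the panda/gibbon figure. The sketch below uses an untrained toy classifier, so the printed labels are illustrative only; the core step is x_adv = x + eps * sign(grad_x loss(x, y)).

```python
# A minimal FGSM sketch in PyTorch with a toy, untrained classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
x = torch.rand(1, 1, 28, 28, requires_grad=True)             # input "image"
y = torch.tensor([3])                                        # true label

loss = F.cross_entropy(model(x), y)
loss.backward()

eps = 0.1                                   # perturbation budget (max-norm)
x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()

print(model(x).argmax(1), model(x_adv).argmax(1))  # labels may now differ
```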
ADVERSARIAL AI
• Adversarial attacks subtly alter normal inputs such that humans doing the same task can easily recognise the intended input, but the AI is misled into giving a predictable and very different false output
• Performed by stealth (humans won’t spot the difference)
• Potential attacks:
  • Adversarial examples can be printed out on standard paper, then photographed with a standard smartphone, and still fool AI systems
  • Kurakin et al. (2017), “Adversarial examples in the physical world”
Credit: Biggio & Roli
32
ADVERSARIAL AI
• The famous 3D-printed turtle that fooled Google’s AI
  • Athalye et al. (2017), “Synthesizing Robust Adversarial Examples”
• Adversarial attacks without perturbing the whole image
  • Sharif et al. (2016), “Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition”
  • impersonation attacks
  • invisibility attacks
Credit: Sharif et al. (2016)
33
ADVERSARIAL AI
• Audio adversarial attacks
  • Carlini and Wagner (2018), “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text”
  • Given any speech audio, can produce another waveform that is 99.9% similar to the original, but transcribes to any text one wants
  • Fools DeepSpeech with a 100% success rate
Credit: IBM
34
ADVERSARIAL AI
• Not limited to deep neural networks
  • Papernot et al. (2016), “Transferability in machine learning: from phenomena to black-box attacks using adversarial samples”
  • DNNs, logistic regression, support vector machines, decision trees, nearest-neighbour classifiers, ensembles – all vulnerable to adversarial AI!
• Any machine learning classifier can be tricked into giving incorrect predictions, and with a little bit of work, one can get it to give pretty much any result one wants
35
ADVERSARIAL AI
• White-box adversarial attack
  • the adversarial AI has full access to the defended model’s scores and gradients (a score-based attack)
  • e.g. following the defended model’s gradient turns a “panda” (57.7%) into a “gibbon” (100%)
37
ADVERSARIAL AI
• Black-box adversarial attack
  • the adversarial AI has no access to the defended model’s scores and gradients
  • transfer-based attack: train a substitute AI model, craft adversarial examples against it, and transfer them to the defended model (again turning “panda” into “gibbon”)
  • decision-based attack: probe only the defended model’s final decisions
38
DEFENDING AGAINST ADVERSARIAL AI
• Some countermeasures:
  • Smoothing and hiding the gradients
  • Randomisation techniques: image compression, image blurring, random image resizing, employing dropout in neural networks (see the sketch below)
  • Defensive distillation
  • Use of ensembles
• Evaluate a model’s adversarial resilience
  • metrics available
• Pre-emptive hardening of AI models
  • enhance robustness to tampering
• Some available tools and libraries:
  • IBM Adversarial Robustness Toolbox (ART) – https://github.com/IBM/adversarial-robustness-toolbox
  • Cleverhans library – https://github.com/openai/cleverhans
  • DeepFool – https://github.com/LTS4/DeepFool
39
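As a concrete instance of the randomisation idea above, here is a minimal sketch (illustrative, not taken from any of the listed toolkits) of an input-randomisation defence: randomly resize and pad each image before classification, so a fixed adversarial perturbation is less likely to line up with the model's gradients. It follows the general idea of Xie et al. (2018), “Mitigating Adversarial Effects Through Randomization”.

```python
# A minimal input-randomisation preprocessing step (random resize + pad).
import random
import torch
import torch.nn.functional as F

def randomize_input(x, out_size=36):
    """x: (N, C, 28, 28) image batch -> randomly resized and padded batch."""
    new_size = random.randint(28, out_size)                 # random resize
    x = F.interpolate(x, size=(new_size, new_size), mode="bilinear",
                      align_corners=False)
    pad = out_size - new_size                               # random padding
    left, top = random.randint(0, pad), random.randint(0, pad)
    return F.pad(x, (left, pad - left, top, pad - top))

x = torch.rand(4, 1, 28, 28)
print(randomize_input(x).shape)  # always torch.Size([4, 1, 36, 36])
```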
ADVERSARIAL AI – MODEL INVERSION ATTACKS
• Fredrikson et al. (2015), “Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures”
• Violating the privacy of subjects in the training set
  • e.g. given a face-recognition model and a name in its training set (“Tom”), the adversarial AI iteratively adjusts a candidate image to drive the model’s confidence for “Tom” upwards (e.g. from 2.3% to 70%), gradually reconstructing a recognisable training face (see the sketch below)
40
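The core mechanism, sketched below with an untrained toy model (so the recovered "image" is illustrative only): gradient ascent on the input to maximise the model's confidence for a target identity, as in Fredrikson et al. (2015).

```python
# A minimal model-inversion sketch: optimise the *input* to maximise the
# model's confidence for a target class.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 5), nn.Softmax(dim=1))
target = 2                                    # index of the target identity

x = torch.zeros(1, 1, 32, 32, requires_grad=True)  # start from a blank image
opt = torch.optim.SGD([x], lr=0.5)

for _ in range(100):
    opt.zero_grad()
    loss = -model(x)[0, target]               # maximise target confidence
    loss.backward()
    opt.step()
    with torch.no_grad():
        x.clamp_(0, 1)                        # keep pixels in a valid range

print(model(x)[0, target].item())  # confidence typically climbs towards 1.0
```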
GENERATIVE ADVERSARIAL NETWORKS (GANS)
• Leveraging adversarial AI to make a generative model, consisting of two neural networks competing with each other
  • The discriminator tries to distinguish genuine data from forgeries created by the generator
  • The generator turns random noise into imitations of the data, in an attempt to fool the discriminator (see the sketch below)
[Figure: random noise → Generator → fake samples; real and fake samples → Discriminator → “real” / “fake”]
41
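A minimal GAN training-loop sketch on one-dimensional toy data (all shapes, sizes and hyperparameters are illustrative): the discriminator D learns to tell real samples from fakes, while the generator G learns to fool D.

```python
# A minimal GAN training loop on toy 1-D data.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real_data = torch.randn(1000, 1) * 0.5 + 3.0   # "genuine" data: N(3, 0.5)

for step in range(2000):
    real = real_data[torch.randint(0, 1000, (64,))]
    fake = G(torch.randn(64, 8))

    # Discriminator step: push real -> 1, fake -> 0.
    opt_d.zero_grad()
    loss_d = (bce(D(real), torch.ones(64, 1)) +
              bce(D(fake.detach()), torch.zeros(64, 1)))
    loss_d.backward()
    opt_d.step()

    # Generator step: try to make D label fakes as real (-> 1).
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

print(G(torch.randn(256, 8)).mean().item())  # mean of fakes; drifts towards ~3.0
```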
AI ETHICS & VALUE ALIGNMENT
▪ Ethics – comprehending “right” from “wrong”, and behaving in a right way
▪ Value Alignment – ensuring that the goals, behaviours, values and ethics of autonomous AI systems align with those of humans
• Codification of ethics
  • values, utility functions
• Teaching AI to be ethical
  • reinforcement learning
  • inverse reinforcement learning and beyond
42
CODIFICATION OF ETHICS
• Rule-based ethics (deontological ethics)
  • Isaac Asimov’s “Three Laws of Robotics” (1942)
  • and similar sets of rules
• Challenges:
  • Too rigid
  • Asimov’s literature addresses many of these issues: conflicts between the 3 laws, conflicts within a law by itself, conflicting orders, etc.
  • How to codify the rules? How to program the notion of “harm”?
  • Human ethics and values are often implicit – the process of elicitation is very challenging
43
CODIFICATION OF ETHICS
• Pre-programming ethical rules:
  • Impossible to program for every scenario
  • Fails to address uncertainty and randomness
  • Fails to address ambiguous cases, ethical and moral dilemmas
• Rules on their own are not enough
  • Must be accompanied by very strong accountability mechanisms
  • Need a moral conflict-resolution mechanism
• Values and ethics depend on the socio-cultural context
  • Difficult to standardise
  • Need to account for changes in the values of society, shifts in beliefs, attitudes, etc.
44
CODIFICATION OF ETHICS
• Rule-based ethics example: specifically & explicitly programme ethical values into self-driving cars to prioritise the protection of human life above all else
  • In the event of an unavoidable accident, the car should be “prohibited to offset victims against one another”
  • A car must not choose whether to kill a person based on individual features, when a fatal crash is inescapable
The Trolley Problem
Credit: BMVI (www.bmvi.de)
45
VALUES, UTILITY FUNCTIONS
• Ethics as utility functions
  • Any system or person who acts or gives advice is using some value system of what is important and what is not
• Utility-based agent
  • the agent’s actions, beliefs and preferences
  • the agent chooses actions based on their outcomes
  • outcomes are what the agent has preferences over
  • Preferences → Utility → Utility Function
• A policy specifies what an agent should do under all contingencies
• An agent wants to find an optimal policy – one that maximises its expected utility (see the sketch below)
46
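A minimal sketch of expected-utility action selection: weight the utility of each possible outcome by the agent's belief in that outcome, then pick the action with the highest expectation. The actions, probabilities and utilities below are all made up for illustration.

```python
# Expected-utility action selection over illustrative driving actions.
def expected_utility(action, beliefs, utilities):
    """beliefs[action]: outcome -> probability; utilities: outcome -> utility."""
    return sum(p * utilities[outcome] for outcome, p in beliefs[action].items())

beliefs = {
    "brake":  {"safe_stop": 0.95, "minor_crash": 0.05},
    "swerve": {"safe_stop": 0.60, "minor_crash": 0.30, "major_crash": 0.10},
}
utilities = {"safe_stop": 10, "minor_crash": -50, "major_crash": -1000}

best = max(beliefs, key=lambda a: expected_utility(a, beliefs, utilities))
print(best)  # "brake": EU = 7.0 vs "swerve": EU = -109.0
```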
TEACHING AI TO BE ETHICAL
• Teaching AI ethics, social rules and norms
• Adopt a “blank slate” approach
  • similar to how a human child learns ethics from those around him/her
  • basic values are learnt, and the AI will, in time, be able to apply those principles in unforeseen scenarios
• What machine learning method to use?
Credit: GoodAI
47
TEACHING AI TO BE ETHICAL
• Reinforcement Learning (RL)
  • Has shown promise in learning policies that can solve complex problems
  • An agent explores its environment, performing action after action and receiving rewards and punishments according to the reward function (i.e. utility function)
  • As it repeats this, the agent gradually learns to perform the right actions in the right states so as to maximise its reward
  • Return = the total sum of the actions’ rewards over time, where future rewards are discounted (treated as less valuable than present rewards); see the sketch below
  • When learning ethics, the reward function rewards/punishes the agent depending on the choice of action performed, whether “right” or “wrong”
[Diagram: Environment Model + Reward Function(s) → Reinforcement Learning → Reward-maximising behaviour]
Kose (2017), “Ethical Artificial Intelligence – An Open Question”
48
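A minimal tabular Q-learning sketch on a made-up toy "ethics" scenario (not from the cited paper): the agent learns, from rewards and punishments alone, that returning a lost wallet beats keeping it.

```python
# Tabular Q-learning on a one-step toy MDP.
import random

states, actions = ["found_wallet"], ["return_it", "keep_it"]
reward = {("found_wallet", "return_it"): +10,   # "right" action
          ("found_wallet", "keep_it"):   -10}   # "wrong" action

Q = {(s, a): 0.0 for s in states for a in actions}
alpha, epsilon = 0.1, 0.2                       # learning rate, exploration

for episode in range(500):
    s = "found_wallet"
    a = (random.choice(actions) if random.random() < epsilon
         else max(actions, key=lambda a: Q[(s, a)]))
    # Episodes here are one step long, so the update has no discounted
    # next-state term; multi-step tasks would add gamma * max_a' Q(s', a').
    Q[(s, a)] += alpha * (reward[(s, a)] - Q[(s, a)])

print(Q)  # Q("found_wallet","return_it") -> ~10, "keep_it" -> ~-10
```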
TEACHING AI TO BE ETHICAL
• Reinforcement Learning (RL) challenges:
  • Difficulty in setting up ethical scenarios in the environment model of RL
  • May take a very long time for the agent to fully cover all ethical scenarios, ambiguous cases, etc.
• Potential solution:
  • Using stories as a way of short-circuiting the reinforcement learning process
  • Employ more complex stories as time goes by
  • Riedl et al. (2016), “Using Stories to Teach Human Values to Artificial Agents”
49
TEACHING AI TO BE ETHICAL
• Another solution: a curriculum-based approach to improve the learning process
  • The learning process in humans and animals is enhanced when scenarios are not randomly presented, but organised in a meaningful order – gradual exposure to an increasing number of concepts, and to more complex ones
  • For teaching ethics, simpler scenarios are presented before more complex and ambiguous cases
• GoodAI’s “School for AI” project is employing a curriculum-based approach for enhancing the teaching of ethics via reinforcement learning
  • www.goodai.com/school-for-ai
• Bengio et al. (2009), “Curriculum Learning”
• Weinshall et al. (2018), “Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks”
50
TEACHING AI TO BE ETHICAL
• Crowd-sourcing ethics and morality
  • Crowdsourced stories simplify the manually-intensive process of creating stories
  • Can capture consensus for ambiguous cases and moral dilemmas (“wisdom of the crowds”)
• Example: an AI agent is given several hundred stories about stealing versus not stealing, explores different actions in a reinforcement learning setting, and learns the consequences and the optimal policy based on the rewards/punishments given. (Mark Riedl, Georgia Tech)
51
TEACHING AI TO BE ETHICAL
• MIT’s “Moral Machine”:
  • Crowdsourcing to aid self-driving cars in making better moral decisions in cases of moral dilemmas (variations of the Trolley Problem)
  • http://moralmachine.mit.edu
52
TEACHING AI TO BE ETHICAL
• Reinforcement Learning (RL) requires the manual specification of the reward function
  • “Reward engineering” is hard (especially for ethics)
  • May be susceptible to “reward cheating” by the AI agent
• In RL, the reward function is specified by the user, and then the agent does the acting
• What if the agent could instead watch someone else do the acting, and try to come up with the reward function by itself?
[Diagram: Environment Model + Reward Function(s) (provided by the user) → Reinforcement Learning → Reward-maximising behaviour]
53
TEACHING AI TO BE ETHICAL
• Inverse Reinforcement Learning (IRL) – see the sketch below
  • IRL learns the underlying reward function (what is ethical?) from expert demonstrations (humans solving ethical problems)
  • IRL is also called “imitation-based learning”
  • Learn from watching good behaviour
[Diagram: in RL, the user provides the reward function; in IRL, the user provides the environment model and observed behaviour, the reward function is inferred, and is then fed into RL to produce reward-maximising behaviour]
54
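Below is a drastically simplified IRL sketch, not a full algorithm such as max-margin IRL: assume the reward is linear in hand-picked action features, and adjust the weights with perceptron-style updates until the expert's demonstrated choice scores highest. Features, actions and the scenario are all made up for illustration.

```python
# A toy "recover the reward from a demonstration" sketch.
import numpy as np

# Each action is described by features: [harm_caused, honesty, helpfulness]
actions = {
    "return_wallet": np.array([0.0, 1.0, 1.0]),
    "keep_wallet":   np.array([0.5, 0.0, 0.0]),
    "ignore_wallet": np.array([0.1, 0.5, 0.0]),
}
expert_choice = "return_wallet"        # demonstration of "good behaviour"

w = np.zeros(3)                        # unknown reward weights to recover
for _ in range(100):
    rival = max((a for a in actions if a != expert_choice),
                key=lambda a: w @ actions[a])
    if w @ actions[expert_choice] <= w @ actions[rival]:
        w += actions[expert_choice] - actions[rival]   # expert must score higher

print(dict(zip(["harm", "honesty", "helpfulness"], w)))
# e.g. harm ends up weighted negatively, honesty/helpfulness positively
```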
TEACHING AI TO BE ETHICAL
• Inverse Reinforcement Learning (IRL)
  • Very promising results for AI ethics (value alignment)
  • No need to explicitly model the rules or the reward function
• Recent works advocating IRL:
  • Russell et al. (2016), “Research Priorities for Robust and Beneficial Artificial Intelligence”
  • Abel (2016), “Reinforcement Learning as a Framework for Ethical Decision Making”
• Challenges of IRL:
  • Interpretability of the auto-learnt reward function
  • Human bias can creep into the observed behaviour
  • Difficulty of making the learnt ethics domain-independent
  • Arnold (2017), “Value Alignment or Misalignment – What Will Keep Systems Accountable?”
55
BEYOND IRL…
• Cooperative IRL
  • What if we reward both the AI’s “good behaviour” while it learns ethics, and the human’s “good teaching”?
  • Cooperation between AI and humans to accomplish a shared goal – value alignment
  • Generative Adversarial Networks (GANs)
  • Hadfield-Menell (2016), “Cooperative Inverse Reinforcement Learning”
56
BEYOND IRL…
• Harnessing counterfactuals
  • … the “imagination” rung on the ladder of causation
  • As perfect knowledge of the world is unavailable, counterfactuals allow for the revision of one’s belief system, rather than relying solely on past (data-driven) experience
  • It is also through counterfactuals that one ultimately enters into social appraisals of blame and praise
  • Might prove to be one of the key technologies needed both for the advancement of AI itself on the trajectory towards AGI, and for aligning as much as possible the values of machines with our values, to achieve benevolent AI
57
BENEVOLENT AI
[Diagram: Value Alignment – the overlap between our values and AI values forms mutually beneficial values]
“Everything we love about civilisation is a product of intelligence, so amplifying our human intelligence with artificial intelligence has the potential of helping civilisation flourish like never before – as long as we manage to keep the technology beneficial.”
– Max Tegmark, Cosmologist & President of the Future of Life Institute
59