Sicurezza dell'Intelligenza Artificiale (Security of Artificial Intelligence)
Federico Cerutti
federico.cerutti@unibs.it
1 October 2020
Background image: Shutterstock
Levi-Montalcini portrait: Age Fotostock
Agenda
Introduction to AI (for security)
• GOFAI
• ML
Guidelines
• Rules
• Guidelines for creating and using AI
Security of AI
• GOFAI
• ML
2
https://commons.wikimedia.org/wiki/File:Parasailing_Tuerkei_2015-09-15_16-33-36_-_0122.JPG
What we will NOT talk about
4
https://commons.wikimedia.org/wiki/File:HAL9000.svg https://commons.wikimedia.org/wiki/File:Terminator_in_Madame_Tussaud_London_(33465711484).jpg https://www.pxfuel.com/en/free-photo-oolxa
What are we actually talking about?
6
https://commons.wikimedia.org/wiki/File:Model_of_a_Turing_machine.jpg
7
Continuously check that the radar works and read its measurement: if the
radar identifies a car
less than 50 meters
away, compute the
speed of the car in
front of you and adapt
your speed.
[Flowchart: Start → Radar check and input → "∃ car(X) s.t. distance(X) < 50?" → yes: Adapt speed, then back to the radar check; no: back to the radar check]
while (1) {
    if (radarcheck() && radar() < 50) {
        adapt();
    }
}
8
Image credit: Shutterstock
// bool v, w, x, y, z, radarcheck;
radarcheck = false;
if (w && x && (!x || y) && y && z && (!(x && y & z) || w) && (!x || !w)) {
    goto A7630;
} else {
    goto A2092;
}
A231:  goto A0928;
A7630: radarcheck = true;
A2092: goto A231;
A0928: assert(radarcheck == false);
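To make the connection with SAT concrete, here is a minimal Python sketch (my own, with variable names mirroring the snippet above) that brute-forces the branch condition: no assignment of w, x, y, z satisfies it, so the line radarcheck = true is unreachable and the assertion always holds — the kind of question a bounded model checker answers symbolically.

from itertools import product

def guard(w, x, y, z):
    # The branch condition from the snippet above, written as a Boolean formula.
    return (w and x and ((not x) or y) and y and z
            and ((not (x and y and z)) or w)
            and ((not x) or (not w)))

# Exhaustive check of all 2^4 assignments: if none satisfies the guard, the
# line "radarcheck = true" is unreachable and the assertion always holds.
models = [v for v in product([False, True], repeat=4) if guard(*v)]
print("satisfiable" if models else "unsatisfiable: the assertion can never fail")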
10
http://tiny.cc/GOFAICyberINGBS20
11
!(w == FALSE || x == FALSE || y == FALSE || z == FALSE ||
  !(w == FALSE) && !(x == FALSE) || !(x == FALSE) && y == FALSE ||
  !(((signed int) y & (signed int) z) == 0) && !(x == FALSE) && w == FALSE
)

!(w == FALSE || x == FALSE || y == FALSE || z == FALSE ||
  !(w == FALSE) && !(x == FALSE) || !(x == FALSE) && y == FALSE ||
  !(y == FALSE || z == FALSE) && !(x == FALSE) && w == FALSE
)
¬(¬w ∨ ¬x ∨ ¬y ∨ ¬z ∨ w ∧ x ∨ x ∧ ¬y ∨ ¬(¬y ∨ ¬z) ∧ x ∧ ¬w)
12
http://tiny.cc/GOFAICyberINGBS20
13
“
[. . . ] Particularly vexing is the realisation that the
error came from a piece of the software that was not
needed. The software involved is part of the Iner-
tial Reference System. [. . . ] After takeoff [. . . ] this
computation is useless. In the Ariane 5 flight, how-
ever, it caused an exception, which was not caught
and—boom.
The exception was due to a floating-point error
during a conversion from a 64-bit floating-point value
[. . . ] to a 16-bit signed integer. [. . . ] There was no
explicit exception handler to catch the exception, so
it followed the usual fate of uncaught exceptions and
crashed the entire software, hence the onboard com-
puters, hence the [(500 million USD, ed.)] mission. ,,
https://www.flickr.com/photos/48213136@N06/8958839420
J.-M. Jézéquel and B. Meyer, “Design by contract: the lessons of Ariane,” in Computer, vol. 30, no. 1, pp.
129-130, Jan. 1997, doi: 10.1109/2.562936.
14
“
We have proved that the initial boot code running in data centers at Amazon Web
Services is memory safe [using, ed.] the C Bounded Model Checker (CBMC). ,,
https://aws.amazon.com/security/provable-security/
http://tiny.cc/AWSINGBS20
B. Cook, et. al., “Model checking boot code from AWS data centers,” in CAV 2018, pp. 457-486, 2018
Federico Cerutti has had no interaction with AWS.
15
SAT solving is one of GOFAI's success stories
16
w ∧ x ∧ y ∧ z ∧ (y ∨ ¬x) ∧ (¬w ∨ ¬x) ∧ (w ∨ ¬x ∨ ¬y ∨ ¬z)

⊤ ∧ ⊤ ∧ ⊤ ∧ ⊤ ∧ (⊤ ∨ ⊥) ∧ (⊥ ∨ ⊥) ∧ (⊤ ∨ ⊥ ∨ ⊥ ∨ ⊥)
Unit propagation
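As an illustration, here is a small Python sketch (my own; variables numbered w=1, x=2, y=3, z=4) of unit propagation on a CNF encoded as lists of signed integers: on the formula above it forces w = x = y = z = ⊤ and then reduces (¬w ∨ ¬x) to the empty clause, exactly the conflict shown on this slide.

def unit_propagate(clauses):
    # Repeatedly assign unit clauses and simplify; returns (assignment, conflict).
    assignment = {}
    changed = True
    while changed:
        changed = False
        simplified = []
        for clause in clauses:
            lits = []
            satisfied = False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    lits.append(lit)            # still undecided
                elif (lit > 0) == val:
                    satisfied = True            # clause already satisfied
                    break
            if satisfied:
                continue
            if not lits:                        # all literals falsified: conflict
                return assignment, True
            if len(lits) == 1:                  # unit clause: forced assignment
                assignment[abs(lits[0])] = lits[0] > 0
                changed = True
            else:
                simplified.append(lits)
        clauses = simplified
    return assignment, False

cnf = [[1], [2], [3], [4], [3, -2], [-1, -2], [1, -2, -3, -4]]
print(unit_propagate(cnf))    # conflict found after propagating w, x, y, z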
17
Propositional SATisfiability Problem is
NP-Complete
18
(Simplified) Needham-Schroeder Protocol
A B
1: {Na, A}Kb
2: {Na, Nb}Ka
3: {Nb}Kb
A is authenticated with B only if A has sent a fresh challenge nonce encrypted with an
appropriate key to B.
B must reply to A’s challenge with the same nonce, again encrypted with a key so that only A
can decrypt it.
All must happen in the right order.
19
R. M. Needham and M. D. Schroeder. 1978. Using encryption for authentication in large networks of computers. Commun. ACM 21, 12 (Dec. 1978), 993–999.
J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
authenticated(A, B, T) :- send(A, B, enc(M1, K1), T1),
    fresh(A, nonce(A, Na), T1),
    part_m(nonce(A, Na), M1),
    key_pair(K1, Kinv1), has(A, K1, T1),
    has(B, Kinv1, T1),
    not has(C, Kinv1, T1) : agent(C) : C != B,
    send(B, A, enc(M2, K2), T2),
    receive(A, B, enc(M2, K2), T),
    part_m(nonce(A, Na), M2),
    key_pair(K2, Kinv2), has(B, K2, T1),
    has(A, Kinv2, T1),
    not has(C, Kinv2, T1) : agent(C) : C != A,
    T1 < T2, T2 < T.

receive(A, B, M, T+1) :- send(B, A, M, T),
    not intercept(M, T+1).
20
J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
believes(A, completed(A, B), T) :- send(A, B,
        enc(m(nonce(C, Na), principal(A)), pub_key(B)), T1),
    receive(A, B, enc(m(nonce(C, Na), nonce(D, Nb)), pub_key(A)), T2),
    send(A, B, enc(m(nonce(D, Nb)), pub_key(B)), T3),
    not intruder(A),
    T1 < T2,
    T2 <= T3,
    T3 <= T,
    A != B,
    C != D.

believes(A, authenticated(A, B), T) :- believes(A, completed(A, B), T),
    A != B.
21
J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
Attacker’s capabilities
It can intercept messages, and it receives the messages it intercepts:

0 { intercept(M, T+1) } 1 :- send(A, B, M, T).

receive(I, A, M, T+1) :- send(A, B, M, T),
    intercept(M, T+1).

It can send messages whenever it wants to, also faking the sender id:

1 { receive(A, B, M, T+1) : principal(B) } 1 :- send(I, A, M, T).
22
J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
Goals
Agents
They should both believe they are authenticated, and they should both actually be authenticated:

goal(A, B, T) :- authenticated(A, B, T),
    believes(A, authenticated(A, B), T),
    authenticated(B, A, T),
    believes(B, authenticated(B, A), T).
Attacker
An agent believes it is authenticated when in fact it is not:

attack :- believes(A, authenticated(A, B), T),
    not authenticated(A, B, T).
23
J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
Problems encoded in this way can be solved
using search algorithms very similar to those
used for SAT
24
ASP handles problems up to ∆^P_3-complete

∆^P_3 = P^(Σ^P_2) = P^(NP^NP)
25
ASP is another success story of GOFAI
26
What about probabilities?
An emerging paradigm: probabilistic logic programming
Just as an example
https://dtai.cs.kuleuven.be/problog/tutorial/various/09_airflap.html
27
These are Knowledge-based approaches
The solving algorithms are general purpose
The real value is in the encoded knowledge
28
For the avoidance of doubt, this is knowledge
w ∧ x ∧ y ∧ z ∧ (y ∨ ¬x) ∧ (¬w ∨ ¬x) ∧ (w ∨ ¬x ∨ ¬y ∨ ¬z)

. . .
authenticated(A, B, T) :-
    send(A, B, enc(M1, K1), T1),
    fresh(A, nonce(A, Na), T1),
    part_m(nonce(A, Na), M1),
    key_pair(K1, Kinv1),
    has(A, K1, T1),
    has(B, Kinv1, T1),
    not has(C, Kinv1, T1) :
        agent(C) : C != B,
    send(B, A, enc(M2, K2), T2),
    receive(A, B, enc(M2, K2), T),
    part_m(nonce(A, Na), M2),
    key_pair(K2, Kinv2),
    has(B, K2, T1),
    has(A, Kinv2, T1),
    not has(C, Kinv2, T1) :
        agent(C) : C != A,
    T1 < T2, T2 < T.
. . .

. . .
0.7::wind(weak);
0.3::wind(strong).
0.25::wind_effect(T, -1);
0.5::wind_effect(T, 0);
0.25::wind_effect(T, 1) :-
    wind(weak).
. . .
flap_position(Time, Pos) :-
    Time > 0,
    attempted_flap_position(Time, Pos),
    legal_flap_position(Pos).
. . .
29
What if we do not have knowledge?
30
We can learn!
31
Agenda
Introduction to AI (for security)
• GOFAI
• ML
Guidelines
• Rules
• Guidelines for creating and using AI
Security of AI
• GOFAI
• ML
32
Training set: N observations of a real-valued input variable x, written x ≡ (x1, . . . , xN)^T, together with
corresponding observations of the values of the real-valued target variable t, denoted t ≡ (t1, . . . , tN)^T.
The goal is to find a function y(x, w) as close as possible to the original function f(x) from
which we obtained the training set.
[Figure: the training data points (x, t) sampled from a sinusoidal curve]
33
Fig. 1.2 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
Let’s approximate with a polynomial with degree M
[Figure: polynomial fits with M = 0 and M = 1 to the training data]
34
Fig. 1.4a and 1.4b of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
Let’s approximate with a polynomial with degree M
[Figure: polynomial fits with M = 3 and M = 9 to the training data]
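A small numpy sketch (my own toy data, in the spirit of Bishop's example: N = 10 noisy samples of sin(2πx)) that typically reproduces the qualitative behaviour of these figures — low-degree polynomials underfit, while M = 9 interpolates the training points but departs from the true curve.

import numpy as np

rng = np.random.default_rng(0)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)   # noisy targets

x_dense = np.linspace(0, 1, 100)
for M in (0, 1, 3, 9):
    w = np.polyfit(x, t, deg=M)                 # least-squares coefficients
    train_rmse = np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))
    curve_rmse = np.sqrt(np.mean((np.polyval(w, x_dense) - np.sin(2 * np.pi * x_dense)) ** 2))
    print(f"M={M}: train RMSE={train_rmse:.3f}, error vs. true curve={curve_rmse:.3f}")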
35
Fig. 1.4c and 1.4d of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
Training set: learning the parameters
Validation set: optimise model complexity
Test set: get the performance of the final model
36
Regression: y(x, w) ∈ ℝ^n
Classification: y(x, w) ∈ ℕ
Supervised learning: training/validation/test sets contain observations of the target variable
Unsupervised learning: no observations of the target variable
Semi-supervised learning: few observations of the target variable; between supervised and
unsupervised
Self-supervised learning: the system uses some automatic techniques for creating some
labelling and (hopefully) improves with time
Online learning: data available in sequence
Reinforcement learning: no training/validation/test provided, just a reward function and
the ability to learn from mistakes
. . .
37
http://tiny.cc/StaticMalwareINGBS20
38
Let X (resp. Y) be a discrete random variable that can take values xi with i = 1, . . . , M (resp. yj
with j = 1, . . . , L).
The probability that X will take the value xi and Y will take the value yj is written
p(X = xi , Y = yj ) and is called the joint probability of X = xi and Y = yj .
39
Sum rule or marginalisation:

p(X = xi) = Σ_{j=1}^{L} p(X = xi, Y = yj)

Product rule:

p(X = xi, Y = yj) = p(Y = yj | X = xi) p(X = xi)
Sum and product rules apply to general random variables, not only discrete ones.
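A tiny numerical check of both rules (the joint distribution values below are made up for illustration):

import numpy as np

# A small joint distribution p(X, Y) over X ∈ {0,1,2}, Y ∈ {0,1} (rows: X, cols: Y).
p_xy = np.array([[0.10, 0.20],
                 [0.15, 0.25],
                 [0.05, 0.25]])

p_x = p_xy.sum(axis=1)                 # sum rule: marginalise over Y
p_y_given_x = p_xy / p_x[:, None]      # conditionals p(Y|X)

# Product rule: p(X, Y) = p(Y|X) p(X) is recovered exactly.
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)
print("p(X) =", p_x)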
40
[Figure: a joint distribution p(X, Y) with Y taking two values, together with the marginals p(X) and p(Y) and the conditional p(X|Y = 1)]
41
Fig. 1.11a–1.11d of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
The weighted average of the function f(x) under a probability distribution p(x), or expectation
of f(x), is:

E[f] = Σ_x p(x) f(x)          (discrete case)

E[f] = ∫ p(x) f(x) dx         (continuous case)

It can be approximated from N points drawn from the distribution:

E[f] ≃ (1/N) Σ_{n=1}^{N} f(xn)

In the case of functions of several variables,

Ex[f(x, y)] = Σ_x p(x) f(x, y)
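A quick numerical illustration of the sample-based approximation (the distribution and the function are arbitrary choices of mine): under a standard normal, E[x²] = 1, and the sample average converges to it as N grows.

import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2

# E[f] under a standard normal is 1; approximate it from N samples.
for N in (10, 1_000, 100_000):
    samples = rng.standard_normal(N)
    print(N, f(samples).mean())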
42
Variance of f(x):

var[f] = E[(f(x) − E[f(x)])²]

var[f] = E[f(x)²] − E[f(x)]²
43
For two random variables, the covariance expresses the extent to which they vary together, and
is defined by:
cov[x, y] = E_{x,y}[xy] − E[x]E[y]

In the case of vectors of random variables:

cov[x, y] = E_{x,y}[x yᵀ] − E[x]E[yᵀ]

Note that cov[x] ≡ cov[x, x].
44
http://tiny.cc/StaticMalwareINGBS20
45
Bayesian Probabilities
Bayes theorem
p(Y|X) = p(X|Y) p(Y) / p(X)

where

p(X) = Σ_Y p(X|Y) p(Y)
46
Suppose we randomly pick one of the boxes and from that
box we randomly select an item of fruit, and having observed
which sort of fruit it is we replace it in the box from which it
came.
We could imagine repeating this process many times. Let us
suppose that in so doing we pick the red box 40% of the time
and we pick the blue box 60% of the time, and that when we
remove an item of fruit from a box we are equally likely to
select any of the pieces of fruit in the box.
We are told that a piece of fruit has been selected and it is an orange.
Which box did it come from?
47
Fig. 1.9 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
p(B = r|F = o) = p(F = o|B = r) p(B = r) / p(F = o)
             = (6/8 · 4/10) / (6/8 · 4/10 + 1/4 · 6/10)
             = 3/4 · 2/5 · 20/9
             = 2/3
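The same computation in a few lines of Python, with the box contents read off the figure (red box: 6 oranges out of 8 fruits; blue box: 1 orange out of 4):

# Prior over boxes and likelihood of drawing an orange from each box.
p_box = {"red": 4/10, "blue": 6/10}
p_orange_given_box = {"red": 6/8, "blue": 1/4}

p_orange = sum(p_orange_given_box[b] * p_box[b] for b in p_box)
p_red_given_orange = p_orange_given_box["red"] * p_box["red"] / p_orange
print(p_red_given_orange)     # 0.666... = 2/3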
48
Fig. 1.9 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
The goal in classification is to take an input vector x and to assign it to one of K discrete
classes Ck .
The input space is thus divided into decision regions whose boundaries are called decision
boundaries or decision surfaces.
Consider first the case of two classes. The posterior probability for class C1 can be written as:
p(C1|x) = p(x|C1) p(C1) / (p(x|C1) p(C1) + p(x|C2) p(C2)) = 1 / (1 + exp(−a)) = σ(a)

with

a = ln [ p(x|C1) p(C1) / (p(x|C2) p(C2)) ]

and σ(a) is the logistic sigmoid function defined by

σ(a) = 1 / (1 + exp(−a))

In the case of more than two classes we obtain the softmax function.
49
[Figure: the logistic sigmoid function σ(a)]

σ(−a) = 1 − σ(a)

dσ(a)/da = σ(a)(1 − σ(a))
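A small sketch (toy numbers of my own) checking that computing the posterior through a and σ(a) agrees with applying Bayes' theorem directly:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def posterior_c1(px_c1, px_c2, p_c1, p_c2):
    # a = ln [ p(x|C1)p(C1) / (p(x|C2)p(C2)) ]; then p(C1|x) = σ(a)
    a = np.log((px_c1 * p_c1) / (px_c2 * p_c2))
    return sigmoid(a)

px_c1, px_c2, p_c1, p_c2 = 0.3, 0.1, 0.5, 0.5
direct = px_c1 * p_c1 / (px_c1 * p_c1 + px_c2 * p_c2)
print(posterior_c1(px_c1, px_c2, p_c1, p_c2), direct)   # both 0.75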
50
The neural network model is a nonlinear function from a set of input variables {xi } to a set of
output variables {yk } controlled by a vector w of adjustable parameters.
[Figure: two-layer network diagram with inputs x0, . . . , xD, hidden units z0, . . . , zM, outputs y1, . . . , yK, first-layer weights w(1) and second-layer weights w(2)]

Hidden units zj = h(aj), with activations

aj = Σ_{i=1}^{D} w(1)_{ji} xi + w(1)_{j0}

Assuming a sigmoid output function:

yk(x, w) = σ( Σ_{j=1}^{M} w(2)_{kj} h( Σ_{i=1}^{D} w(1)_{ji} xi + w(1)_{j0} ) + w(2)_{k0} )
51
Fig. 5.1 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
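A minimal numpy sketch of the forward pass defined above for a single input vector; the dimensions, the tanh hidden nonlinearity and the random weights are arbitrary choices for illustration.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2, h=np.tanh):
    # y_k = σ( Σ_j w2_kj h( Σ_i w1_ji x_i + w1_j0 ) + w2_k0 )
    a = W1 @ x + b1          # hidden activations a_j
    z = h(a)                 # hidden units z_j = h(a_j)
    return sigmoid(W2 @ z + b2)

rng = np.random.default_rng(0)
D, M, K = 3, 5, 2            # input, hidden, output dimensions (illustrative)
x = rng.normal(size=D)
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)
print(forward(x, W1, b1, W2, b2))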
The nonlinear function h(·) can be sigmoidal functions such as the logistic sigmoid, but also
the rectifier linear unit (ReLU)
ReLU(x) = max {0, x}
[Figure: the ReLU activation function]
52
Given an integer q ≥ 1, let K ⊂ ℝ^q be a compact space,∗ f : K → ℝ be continuous, and h : ℝ → ℝ
be continuous but not polynomial. Then for every ε > 0 there exist N ≥ 1, ak, bk ∈ ℝ,
wk ∈ ℝ^q such that:

max_{x∈K} | f(x) − Σ_{k=1}^{N} ak h(wk · x + bk) | ≤ ε

Pinkus, A. (1999). “Approximation theory of the MLP model in neural networks.” In: Acta Numerica,
pp. 143-195. doi:10.1017/S0962492900002919.
∗
A compact space contains all its limit points and has all its points lying within some fixed distance of each
other.
53
Parameters Optimisation
Given a training set comprising a set of input vectors {xn}, n = 1, . . . , N, and a corresponding
set of target vectors {tn}, we want to minimise an error function E(w).
54
[Figure: the error function E(w) as a surface over weight space (w1, w2), with points wA, wB, wC and the local gradient ∇E]

First note that if we make a small step in weight space from w to w + δw then the change in the
error function is δE ≃ δwᵀ ∇E(w), where the vector ∇E(w) points in the direction of greatest
rate of increase of the error function.

w(τ+1) = w(τ) + ∆w(τ)

where τ labels the iteration step.

The simplest approach to using gradient information is to choose the weight update so that:

w(τ+1) = w(τ) − η ∇E(w(τ))

where the parameter η > 0 is the learning rate.
55
Fig. 5.5 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
Batch methods: at each step the weight vector is moved in the direction of the greatest rate of
decrease of the error function, and so this approach is known as gradient descent or steepest
descent.
On-line version aka sequential gradient descent or stochastic gradient descent of gradient
descent: error functions based on maximum likelihood for a set of independent observations
comprise a sum of terms, one for each data point:
E(w) = Σ_{n=1}^{N} En(w)

Update the weight vector based on one data point at a time, so that

w(τ+1) = w(τ) − η ∇En(w(τ))
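A hedged sketch of sequential (stochastic) gradient descent on a toy least-squares problem; the data, the learning rate η and the number of epochs are illustrative choices of mine.

import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0, 0.5])
t = X @ w_true + 0.1 * rng.normal(size=N)

w = np.zeros(D)
eta = 0.05                                    # learning rate η
for epoch in range(20):
    for n in rng.permutation(N):              # one data point at a time
        grad_n = (w @ X[n] - t[n]) * X[n]     # ∇E_n(w) for squared error
        w = w - eta * grad_n                  # w ← w − η ∇E_n(w)
print(w, w_true)                              # the estimate approaches w_true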
56
The value of δ for a particular hidden unit can be obtained by propagating the δ's backward from
units higher up in the network.
[Figure: backward propagation of the errors: δj at hidden unit zj is computed from the δk of the units it sends connections to, via the weights wkj]
57
Fig. 5.7 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
http://tiny.cc/StaticMalwareINGBS20
58
Generative models
Determine p(x|Ck ) for each
class Ck individually; then
infer the prior class
probabilities; then use
Bayes’ theorem to find the
posterior probabilities.
Alternatively, obtain the
posterior probabilities from
the joint distribution
p(x, Ck ).
Analogously for regression.
Discriminative models
Model directly the posterior
class probabilities p(Ck |x)
(analogously p(t|x))
without computing the joint
distribution.
Direct
Find a discriminant function
f (x) which maps each input
x directly onto a class label.
Analogously, find a
regression function y(x)
directly from the training
data.
59
Generative: [Figure: class-conditional densities p(x|C1) and p(x|C2)]

Discriminative/Direct: [Figure: posterior probabilities p(C1|x) and p(C2|x)]
60
Fig. 1.27a-b of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
[Figure: posterior probabilities p(C1|x) and p(C2|x) with a threshold θ defining a reject region]
61
Fig. 1.26 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
The no free lunch theorem for machine learning†
states that, averaged over all possible data
generating distributions, every classification (resp. regression) model has the same error rate
when dealing with previously unobserved points. In other words, no model is universally any
better than any other.
These results hold only when we average over all possible data generating distributions.
However, if we can make assumptions about the kinds of distributions we encounter in
real-world applications, then we can design models that perform well on these distributions.
†
Wolpert, D. H. and Macready, W. G. (1997). “No free lunch theorems for optimization.” In: IEEE
transactions on evolutionary computation 1.1, pp. 67-82.
62
ML + GOFAI
63
[Figure: a hybrid neuro-symbolic pipeline in which the predicted complex event is compared with the ground truth and the error is propagated back via gradient back-propagation]
M. Roig Vilamala, H. Taylor, T. Xing, L. Garcia, M. Srivastava, L. Kaplan, A. Preece, A. Kimmig, F. Cerutti.
A Hybrid Neuro-Symbolic Approach for Complex Event Processing. ICLP 2020.
https://arxiv.org/abs/2009.03420
Agenda
Introduction to AI (for security)
• GOFAI
• ML
Guidelines
• Rules
• Guidelines for creating and using AI
Security of AI
• GOFAI
• ML
65
GDPR
66
GDPR–AI: Conceptual framework
4(1) Personal data
Identification • Re-identification (e.g. pseudo-anonymity) • Identifiability (e.g. fusion
with external data sources)
4(2) Profiling
Inferred personal data is personal data • Data subjects have the right to rectification
independently of whether the inferred information is verifiable or statistical
4(11) Consent
Specificity • Granularity • Freedom (and the problem of clear imbalance)
67
https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
GDPR–AI: Data protection principles
5(1)(a) Fairness, transparency
Data subjects should not be misled • Inference should also be fair (verifiable, etc. . . )
5(1)(b) Purpose limitation
New purposes for data must be compatible • Personal data in training set/model
• Personal data affecting personalised inferences
5(1)(c) Data minimisation
5(1)(d) Accuracy
Personal data must be accurate
5(1)(e) Storage limitation
68
https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
GDPR–AI: Information duties
13/14 Data subjects need to receive relevant information
Information on automated decision-making
Existence of automated decision-making, including profiling • Meaningful information
about the logic involved and the envisaged consequences of such processing for the data
subject
69
https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
GDPR–AI: Data subjects’ rights
15 The right to access
Somewhat ambiguous: it does not seem to imply the need for an individualised explanation of
automated assessments and decisions
17 The right to be forgotten
Data used for constructing a model must be deleted, though the model itself need not be
deleted if it is shown to no longer contain personal data
21 The right to object
Objecting to profiling and direct marketing • Objecting to research and statistical
purposes (except for reasons of public interest)
70
https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
GDPR–AI: Automated decision-making
22(1-2) Prohibition of automated decisions
The data subject shall have the right not to be subject to a decision based solely
on automated processing, including profiling, which produces legal effects concerning
him or her or similarly significantly affects him or her.
Exceptions: Necessary for entering into or performing a contract • Authorised by
EU/member state law with counterbalances • Based on data subject’s explicit consent
22(3) Safeguard measures
the data controller shall implement suitable measures to safeguard the data sub-
ject’s rights and freedoms and legitimate interests, at least the right to obtain human
intervention on the part of the controller, to express his or her point of view and to
contest the decision
22(4) No automated decision on sensitive data
Unless with explicit consent or in the interest of public interest
71
https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
GDPR–AI: Privacy
24 Responsibility of data controller
implement appropriate technical and organisational measures to ensure and to be
able to demonstrate that processing is performed in accordance with this Regulation
25(1) Data protection by design and privacy by default
25(2) Data minimisation
35-36 Data protection impact assessment
In the presence of high risk, consult the supervisory authority (the national data protection authority)
37 Need for data protection officers
40-43 Codes of conduct and certification
72
https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
Agenda
Introduction to AI (for security)
• GOFAI
• ML
Guidelines
• Rules
• Guidelines for creating and using AI
Security of AI
• GOFAI
• ML
73
1. IEEE Ethics in Action and IEEE 7000
The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. Ethically
Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and
Intelligent Systems, First Edition. IEEE, 2019.
https://ethicsinaction.ieee.org/
2. EU coordinated plan 2018 and whitepaper 2020
https://ec.europa.eu/digital-single-market/en/news/
coordinated-plan-artificial-intelligence
https://ec.europa.eu/info/publications/
white-paper-artificial-intelligence-european-approach-excellence-and-trust_
en
74
Credit: Adobe Stock
IEEE Global Initiative on Ethics of A/IS: Principles
1. Human Rights: A/IS‡
shall be created and operated to respect, promote, and protect
internationally recognized human rights.
2. Well-being: A/IS creators shall adopt increased human well-being as a primary success
criterion for development.
3. Data Agency: A/IS creators shall empower individuals with the ability to access and
securely share their data, to maintain people’s capacity to have control over their identity.
4. Effectiveness: A/IS creators and operators shall provide evidence of the effectiveness and
fitness for purpose of A/IS.
‡
Autonomous and Intelligent Systems
76
IEEE Global Initiative on Ethics of A/IS: Principles (cont.)
5. Transparency: The basis of a particular A/IS decision should always be discoverable.
6. Accountability: A/IS shall be created and operated to provide an unambiguous rationale
for all decisions made.
7. Awareness of Misuse: A/IS creators shall guard against all potential misuses and risks of
A/IS in operation.
8. Competence: A/IS creators shall specify and operators shall adhere to the knowledge and
skill required for safe and effective operation.
77
Autonomous and Intelligent System
78
IEEE Global Initiative on Ethics suggests adoption of Millian ethics to overcome general
assumptions of anthropomorphism in A/IS
Determinism: human actions follow necessarily from antecedent conditions and psychological
laws. Laplace’s demon could predict perfectly human behaviour.
However, for example:
1. Antecedent conditions include the education we received
2. We can therefore modify our future actions by self-education
Doctrine of free will: “our will, by influencing some of our circumstances, can modify our
future habits or capabilities of willingӤ
§
J. S. Mill. “Autobiography”. 1873.
79
Which Ethics
Traditional ethics: virtues or moral character (Plato, Aristotle); deontology or duties and
rules (Kant); utilitarianism or consequence of actions (Mill)
Feminist ethics or ethics of care (Noddings): should we care for A/IS?
Globally diverse traditions: Buddhist/Ubuntu/Shinto-Influenced Ethical tradition and their
role in A/IS
80
Policy framework
1. Ensure that A/IS support, promote, and enable internationally recognised legal norms.
2. Develop government expertise in A/IS.
3. Ensure governance and ethics are core components in A/IS research, development,
acquisition, and use.
4. Create policies for A/IS to ensure public safety and responsible A/IS design.
5. Educate the public on the ethical and societal impacts of A/IS.
81
7000. Model Process for Addressing Ethical Concerns During System Design
7001. Transparency of Autonomous Systems
7002. Data Privacy Process
7003. Algorithmic Bias Considerations
7004. Standard on Child and Student Data Governance
7005. Standard on Employer Data Governance
7006. Standard on Personal Data AI Agent Working Group
7007. Ontological Standard for Ethically driven Robotics and Automation Systems
7008. Standard for Ethically Driven Nudging for Robotic, Intelligent and Autonomous Systems
7009. Standard for Fail-Safe Design of Autonomous and Semi-Autonomous Systems
7010. IEEE Recommended Practice for Assessing the Impact of Autonomous and Intelligent
Systems on Human Well-Being
7011. Standard for the Process of Identifying & Rating the Trust-worthiness of News Sources
7012. Standard for Machine Readable Personal Privacy Terms
7013. Standard for Inclusion and Application Standards for Automated Facial Analysis
Technology
7014. Standard for Ethical considerations in Emulated Empathy in Autonomous and Intelligent
Systems
82
https://ethicsinaction.ieee.org/p7000/
IEEE 7010-2020. Well-being Impact Assessment: an iterative process
1. Internal analysis and user and stakeholder engagement
The nature of the AI system • The needs it meets or problems it solves • Who the
users (intended and unintended) are • Who the broader stakeholders might be • The
likelihood of possible positive and negative impacts, and how they can be mitigated.
2. Development and refinement of well-being indicators dashboard
Twelve domains: affect, community, culture, education, economy, environment, health,
human settlements, government, psychological/mental well-being, and work.
3. Data planning and collection
Collection of both baseline data and data over time, allowing changes in well-being
indicators to be assessed over time
4. Data analysis and improvement to AI
Analysis helps determine if an AI does have negative impacts, or if efforts to mitigate
negative impacts or increase positive impacts are successful. Importantly, analysis then
feeds into improvements to AI design, development, assessment, monitoring, and
management.
5. Iteration
83
https://arxiv.org/abs/2005.06620
https://www.flickr.com/photos/9561097@N08/18898145409/
“
For AI made in Europe one key principle will be ethics by design whereby ethical and
legal principles, on the basis of the General Data Protection Regulation, competition
law compliance, absence of data bias are implemented since the beginning of the design
process. When defining the operational requirements, it is also important to take into
account the interactions between humans and AI systems.
Another key principle will be security by design, whereby cybersecurity, the protec-
tion of victims and the facilitation of law enforcement activities should be taken into
account from the beginning of the design process. ,,
85 European Commission, 2018. Annex to COM(2018) 795 final - Coordinated Plan on Artificial Intelligence
AI systems need to be human-centric, resting on a commitment to their use in the service of
humanity and the common good, with the goal of improving human welfare and freedom. We
therefore identify Trustworthy AI as our foundational ambition, since human beings and
communities will only be able to have confidence in the technology's development and its
applications when these are trustworthy.
86
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Trustworthy AI has three components, which should be met throughout the system’s entire life
cycle:
1. it should be lawful, complying with all applicable laws and regulations;
2. it should be ethical, ensuring adherence to ethical principles and values; and
3. it should be robust, both from a technical and social perspective, since, even with good
intentions, AI systems can cause unintentional harm.
Even if an ethical purpose is ensured, individuals and society must also be confident that
AI systems will not cause any unintentional harm. Such systems should perform in a safe,
secure and reliable manner, and safeguards should be foreseen to prevent any unintended
adverse impacts. It is therefore important to ensure that AI systems are robust.
87
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Ethical Principles
• Respect for human autonomy
• Prevention of harm
AI systems should neither cause nor exacerbate harm or otherwise adversely affect human
beings. [...] They must be technically robust and it should be ensured that they are not
open to malicious use. Note that the principle of prevention of harm and the principle of
human autonomy may be in conflict.
• Fairness
• Explicability
Explicability is crucial for building and maintaining users’ trust in AI systems. This means
that processes need to be transparent, the capabilities and purpose of AI systems openly
communicated, and decisions—to the extent possible—explainable to those directly and
indirectly affected. Without such information, a decision cannot be duly contested. An
explanation as to why a model has generated a particular output or decision (and what
combination of input factors contributed to that) is not always possible.
88
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Requirements of Trustworthy AI
Human agency and oversight
Technical robustness and safety
Privacy and data governance
Transparency
Diversity, non-discrimination, and fairness
Societal and environmental wellbeing
Accountability
89
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Technical Robustness and Safety: Resilience to attack and security
AI systems, like all software systems, should be protected against vulnerabilities that can allow
them to be exploited by adversaries, e.g. hacking.
Attacks may target the data (data poisoning), the model (model leakage) or the underlying
infrastructure, both software and hardware. If an AI system is attacked, e.g. in adversarial
attacks, the data as well as system behaviour can be changed, leading the system to make
different decisions, or causing it to shut down altogether.
Systems and data can also become corrupted by malicious intention or by exposure to
unexpected situations.
Insufficient security processes can also result in erroneous decisions or even physical harm.
For AI systems to be considered secure, possible unintended applications of the AI system (e.g.
dual-use applications) and potential abuse of the system by malicious actors should be taken
into account, and steps should be taken to prevent and mitigate these.
90
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Technical Robustness and Safety: Fallback plan and general safety
AI systems should have safeguards that enable a fallback plan in case of problems. This can
mean that AI systems switch from a statistical to a rule-based procedure, or that they ask a
human operator before continuing their action.
It must be ensured that the system will do what it is supposed to do without harming living
beings or the environment. This includes the minimisation of unintended consequences and
errors.
In addition, processes to clarify and assess potential risks associated with the use of AI systems,
across various application areas, should be established. The level of safety measures required
depends on the magnitude of the risk posed by an AI system, which in turn depends on the
system’s capabilities.
Where it can be foreseen that the development process or the system itself will pose
particularly high risks, it is crucial for safety measures to be developed and tested proactively.
91
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Technical Robustness and Safety: Accuracy
Accuracy pertains to an AI system’s ability to make correct judgements, for example to
correctly classify information into the proper categories, or its ability to make correct
predictions, recommendations, or decisions based on data or models.
An explicit and well-formed development and evaluation process can support, mitigate and
correct unintended risks from inaccurate predictions. When occasional inaccurate predictions
cannot be avoided, it is important that the system can indicate how likely these errors are. A
high level of accuracy is especially crucial in situations where the AI system directly affects
human lives.
92
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Technical Robustness and Safety: Reliability and Reproducibility
It is critical that the results of AI systems are reproducible, as well as reliable. A reliable AI
system is one that works properly with a range of inputs and in a range of situations. This is
needed to scrutinise an AI system and to prevent unintended harms.
Reproducibility describes whether an AI experiment exhibits the same behaviour when repeated
under the same conditions. This enables scientists and policy makers to accurately describe
what AI systems do. Replication files can facilitate the process of testing and reproducing
behaviours.
93
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Transparency: Traceability
The data sets and the processes that yield the AI system’s decision, including those of data
gathering and data labelling as well as the algorithms used, should be documented to the best
possible standard to allow for traceability and an increase in transparency.
This also applies to the decisions made by the AI system.
This enables identification of the reasons why an AI-decision was erroneous which, in turn,
could help prevent future mistakes. Traceability facilitates auditability as well as explainability.
94
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Transparency: Explainability
Explainability concerns the ability to explain both the technical processes of an AI system and
the related human decisions (e.g. application areas of a system).
Technical explainability requires that the decisions made by an AI system can be understood
and traced by human beings.
Moreover, trade-offs might have to be made between enhancing a system’s explainability (which
may reduce its accuracy) or increasing its accuracy (at the cost of explainability). Whenever an
AI system has a significant impact on people’s lives, it should be possible to demand a suitable
explanation of the AI system’s decision-making process. Such explanation should be timely and
adapted to the expertise of the stakeholder concerned (e.g. layperson, regulator or researcher).
In addition, explanations of the degree to which an AI system influences and shapes the
organisational decision-making process, design choices of the system, and the rationale for
deploying it, should be available (hence ensuring business model transparency).
95
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Transparency: Communication
AI systems should not represent themselves as humans to users; humans have the right to be
informed that they are interacting with an AI system. This entails that AI systems must be
identifiable as such.
In addition, the option to decide against this interaction in favour of human interaction should
be provided where needed to ensure compliance with fundamental rights. Beyond this, the AI
system’s capabilities and limitations should be communicated to AI practitioners or end-users in
a manner appropriate to the use case at hand. This could encompass communication of the AI
system’s level of accuracy, as well as its limitations.
96
EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
Initially: Make clear what the system can do • Make clear how well the system can do
what it can do
During interaction: Time services based on context • Show contextually relevant
information • Match relevant social norms • Mitigate social biases
When wrong: Support efficient invocation • Support efficient dismissal • Support
efficient correction • Scope services, when in doubt • Make clear why the system did
what it did
Over time: Remember recent interactions • Learn from user behaviour • Update and
adapt cautiously • Encourage granular feedback • Convey the consequences of user
actions • Provide global controls • Notify users about changes
https://aka.ms/aiguidelines
§
S. Amershi et. al., “Guidelines for Human-AI Interaction,” CHI 2019
97
“
Even in simple collaboration scenarios, e.g. those in which an AI system assists a
human operator with predictions, the success of the team hinges on the human correctly
deciding when to follow the recommendations of the AI system and when to override
them. [. . . ]
Extracting benefits from collaboration with the AI system depends on the human
developing insights (i.e., a mental model) of when to trust the AI system with its
recommendations. [. . . ]
If the human mistakenly trusts the AI system in regions where it is likely to err,
catastrophic failures may occur. ,,
§
Bansal, Gagan, et al. “Beyond Accuracy: The Role of Mental Models in Human-AI Team Performance.”
AAAI Conference on Human Computation and Crowdsourcing. 2019.
98
Misclassification of the white side of a trailer as bright sky: this caused a car operating with
automated vehicle control systems (level 2) to crash into a tractor-semitrailer truck near
Williston, Florida, USA on 7 May 2016.
The driver of the car died from the injuries sustained.
The car manufacturer stated that the “camera failed to recognize the white truck against a
bright sky.Ӧ
¶
http://tiny.cc/2tb4uy
99
Image credit: Shutterstock
Agenda
Introduction to AI (for security)
• GOFAI
• ML
Guidelines
• Rules
• Guidelines for creating and using AI
Security of AI
• GOFAI
• ML
101
Knowledge + Search Algorithm
102
Quality of Knowledge
Accuracy
Syntactic • Semantic • Currency • Timeliness • Trustworthiness • Provenance
Completeness
Complete and relevant
Conciseness
Understandability
By humans and machine alike
Accessibility
Availability • Licensing • Interoperability
Consistency
¶
Batini, C., Scannapieco, M. et al.. Data and information quality. Springer (2016).
103
Ex Falso Quodlibet
104
(1) 1 = 0
(2) 3 = 0 (multiplying both sides of (1) by 3)
(3) π = 0 (multiplying both sides of (1) by π)
(4) π = 3 (from (2) and (3))
105
“
If I am a rock (r) then light travels at 1 metre per second (l) ,,
(1) r ⊃ l ≡ ¬r ∨ l
(2) r ≡ ⊥ (I am not a rock)
(3) l ≡ ⊤ makes (1) true

r | l | ¬r | ¬r ∨ l
⊤ | ⊤ | ⊥  | ⊤
⊤ | ⊥ | ⊥  | ⊥
⊥ | ⊤ | ⊤  | ⊤
⊥ | ⊥ | ⊤  | ⊤
106
“
All the elephants in the room are pink ,,
(1) ∀X elephant(X) ⊃ pink(X)
(2) If elephant(X) ≡ ⊥ for every X, then (1) is trivially true (vacuous truth)
107
“
Fish identified previously from Sri Lanka as P. amphibius (Pethiyagoda 1991), are
now recognized as an endemic species, P. kamalika (Silva et al. 2008), which is re-
stricted to the wet zone. ,,
[Diagram: FishXYZ — identified as P. amphibius (Pethiyagoda, 1991)]
¶
Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on
biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322.
108
“
Fish identified previously from Sri Lanka as P. amphibius (Pethiyagoda 1991), are
now recognized as an endemic species, P. kamalika (Silva et al. 2008), which is re-
stricted to the wet zone. ,,
[Diagram: FishXYZ — identified as P. amphibius (Pethiyagoda, 1991) and as P. kamalika (Silva et al. 2008)]
¶
Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on
biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322.
109
“
P. ticto (Hamilton, 1822) was described from Bengal. Deraniyagala (1956) gave a
name to the P. ticto like fish in Sri Lanka, describing it as P. ticto melanomaculatus.
At present P. ticto melanomaculatus is not recognized as a valid taxon (Pethiyagoda
1991). Recent molecular investigations show marked differences between P. ticto and
P. melanomaculatus (Meegaskumbura et al. 2008). ,,
[Diagram: P. ticto — P. ticto melanomaculatus (Deraniyagala, 1956)]
¶
Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on
biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322.
110
“
P. ticto (Hamilton, 1822) was described from Bengal. Deraniyagala (1956) gave a
name to the P. ticto like fish in Sri Lanka, describing it as P. ticto melanomaculatus.
At present P. ticto melanomaculatus is not recognized as a valid taxon (Pethiyagoda
1991). Recent molecular investigations show marked differences between P. ticto and
P. melanomaculatus (Meegaskumbura et al. 2008). ,,
[Diagram: P. ticto — P. ticto melanomaculatus (Deraniyagala, 1956); Pethiyagoda, 1991]
¶
Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on
biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322.
111
“
P. ticto (Hamilton, 1822) was described from Bengal. Deraniyagala (1956) gave a
name to the P. ticto like fish in Sri Lanka, describing it as P. ticto melanomaculatus.
At present P. ticto melanomaculatus is not recognized as a valid taxon (Pethiyagoda
1991). Recent molecular investigations show marked differences between P. ticto and
P. melanomaculatus (Meegaskumbura et al. 2008). ,,
[Diagram: P. ticto — P. ticto melanomaculatus (Deraniyagala, 1956); Pethiyagoda, 1991; Meegaskumbura et al., 2008]
¶
Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on
biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322.
112
113
Agenda
Introduction to AI (for security)
• GOFAI
• ML
Guidelines
• Rules
• Guidelines for creating and using AI
Security of AI
• GOFAI
• ML
114
Training Phase Attacks
Attacks during training time attempt to influence or corrupt the model directly by altering the
dataset used for training.
• Data Injection: The adversary has no access to the training data or the learning algorithm,
but has the ability to add new data to the training set. It can corrupt the target model by
inserting adversarial samples into the training dataset.
• Data Modification: The adversary does not have access to the learning algorithm but has
full access to the training data. It poisons the training data directly by modifying the
data before it is used for training the target model.
• Logic Corruption: The adversary has the ability to meddle with the learning algorithm.
These attacks are referred to as logic corruption. It is very difficult to design a
counter-strategy against adversaries who can alter the learning logic, thereby
controlling the model itself.
115
Testing Phase Attacks
Adversarial attacks at testing time do not tamper with the targeted model but rather force
it to produce incorrect outputs. The effectiveness of such attacks is determined mainly by the
amount of information available to the adversary about the model.
Testing phase attacks can be broadly classified into either White-Box or Black-Box attacks.
116
Testing Phase Attacks: White-Box Attacks
In a white-box attack on a machine learning model, the adversary has total knowledge about the
model used for classification (e.g., type of neural network along with number of layers).
The attacker has information about the algorithm used in training (e.g., gradient-descent
optimization) and can access the training data distribution.
The attacker also knows the parameters of the fully trained model architecture.
The adversary uses the available information to identify the feature space where the model may
be vulnerable, i.e., where the model has a high error rate.
Then the model is exploited by altering an input using an adversarial example crafting method
(more on this later).
117
Black-box Attacks: transferrability
Adversarial sample transferability is the property that adversarial samples produced by training on a
specific model can affect another model, even if they have different architectures.
In black-box attacks, the adversary does not have access to the target model F, and thus trains a
substitute model F′ locally to generate adversarial examples X + δX which can then be transferred to
the victim model.
1. Intra-technique transferability: if models F and F′ are both trained using the same machine learning
technique (e.g. both are neural networks or both are SVMs)
2. Cross-technique transferability: if the learning techniques of F and F′ are different, for example, F is a
neural network and F′ is an SVM.
The attacks have been shown to generalize to non-differentiable target models, like SVMs. Therefore,
differentiable models such as neural networks or logistic regression can be used to learn a substitute
model for models trained with SVM or nearest neighbours.
¶
Papernot, Nicolas, Patrick McDaniel, and Ian Goodfellow. “Transferability in machine learning: from
phenomena to black-box attacks using adversarial samples.” arXiv preprint arXiv:1605.07277 (2016).
118
Testing Phase Attacks: Black-Box Attacks
Non-Adaptive Black-Box Attack
For a target model (f ), a non-adaptive black-box adversary only gets access to the target
model’s training data distribution µ.
The adversary then chooses a training procedure for a model architecture f ′
and trains a local
model over samples from the data distribution µ to approximate the model learned by the
target classifier.
The adversary crafts adversarial examples on the local model f ′
using white-box attack
strategies and applies these crafted inputs to the target model to force mis-classifications.
119
Testing Phase Attacks: Black-Box Attacks
Adaptive Black-Box Attack
For a target model (f ), an adaptive black-box adversary does not have any information
regarding the training process but can access the target model as an oracle (analogous to
chosen-plaintext attack in cryptography).
The adversary issues adaptive oracle queries to the target model and labels a carefully selected
dataset, i.e., for any arbitrarily chosen x the adversary obtains its label y by querying the target
model f .
The adversary then chooses a procedure train′
and model architecture f ′
to train a surrogate
model over tuples (x, y) obtained from querying the target model.
The surrogate model then produces adversarial samples by following white-box attack
technique for forcing the target model to mis-classify malicious data.
120
Testing Phase Attacks: Black-Box Attacks
Strict Black-Box Attack
A black-box adversary sometimes may not have access to the data distribution µ but has the ability to
collect input-output pairs (x, y) from the target classifier.
However, it cannot choose the inputs and observe the corresponding changes in output as in the adaptive
attack procedure.
This strategy is analogous to the known-plaintext attack in cryptography and would most likely
be successful with a large set of input-output pairs.
121
Adversary Goals
• Confidence Reduction: The adversary tries to reduce the confidence of prediction for the
target model. For example, a legitimate image of a ‘stop’ sign can be predicted with
lower confidence, i.e. with a lower probability of belonging to its class.
• Misclassification: The adversary tries to alter the output classification of an input example
to any class different from the original class. For example, a legitimate image of a ‘stop’
sign will be predicted as any other class different from the class of stop sign.
• Targeted Misclassification: The adversary tries to produce inputs that force the output of
the classification model to be a specific target class. For example, any input image to the
classification model will be predicted as a class of images having ‘go’ sign.
• Source/Target Misclassification: The adversary attempts to force the output of
classification for a specific input to be a particular target class. For example, the input
image of ‘stop’ sign will be predicted as ‘go’ sign by the classification model.
122
• Exploratory Attack: These attacks do not influence training dataset. Given black box
access to the model, they try to gain as much knowledge as possible about the learning
algorithm of the underlying system and pattern in training data.
• Evasion Attack: This is the most common type of attack in the adversarial setting. The
adversary tries to evade the system by adjusting malicious samples during testing phase.
This setting does not assume any influence over the training data.
• Poisoning Attack: This type of attack, known as contamination of the training data, takes
place during the training time of the machine learning model. An adversary tries to poison
the training data by injecting carefully designed samples to compromise the whole learning
process eventually.
123
Exploratory Attacks: Model Inversion Attack
Fredrikson et al. consider a linear regression model f that predicted drug dosage using patient
information, medical history and genetic markers; given white-box access to model f and an
instance of data (X = {x1, x2, ..., xn}, y), model inversion infers genetic marker x1.
¶
Fredrikson, Matthew, et al. “Privacy in pharmacogenetics: An end-to-end case study of personalized
warfarin dosing.” 23rd USENIX Security Symposium (USENIX Security 14). 2014.
124
An attacker can produce a recognizable image of a person, given only API access to a facial
recognition system and the name of the person whose face is recognized by it.
¶
Fredrikson, Matt, Somesh Jha, and Thomas Ristenpart. “Model inversion attacks that exploit confidence
information and basic countermeasures.” Proceedings of the 22nd ACM SIGSAC Conference on Computer and
Communications Security. 2015.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation.
125
https://pixabay.com/illustrations/poison-bottle-medicine-old-symbol-1481596/
Poisoning Attacks: Adversarial Examples Generation
• Label Manipulation: The adversary has the capability to modify the training labels only,
and he obtains the most vulnerable label given the full or partial knowledge of the learning
model.
A basic approach is to randomly perturb the labels, i.e., select new labels for a subset of
training data by picking from a random distribution.
• Input Manipulation: In this scenario, the adversary is more powerful and can corrupt the
input features of training points analyzed by the learning algorithm, in addition to its
labels.
This scenario also assumes that the adversary has the knowledge of the learning algorithm.
127
Image credit: Adobe Stock
Evasion Attacks
• White-box attacks: two steps
1. Direction Sensitivity Estimation
2. Perturbation Selection
• Black-box attacks
129
White Box attacks. Step 1: Direction Sensitivity Estimation
The adversary evaluates the sensitivity of a class change to each input feature by identifying
directions in the data manifold around sample X in which the model F is most sensitive and
likely to result in a class change
130
Fast Gradient Method (FGM): calculates the gradient of the cost function with respect to the
input of the neural network. The adversarial examples are generated using the following
equation:

X* = X + ε · sign(∇X J(X, ytrue))

Here, J is the cost function of the trained model, ∇X denotes the gradient of the cost with
respect to a normal sample X with correct label ytrue, and ε denotes the input variation
parameter which controls the perturbation's amplitude.
¶
I. Goodfellow, J. Shlens, C. Szegedy 2015 Explaining and Harnessing Adversarial Examples. In ICLR 2015.
https://arxiv.org/abs/1412.6572
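To make the recipe concrete without relying on any deep-learning library, here is a sketch (my own) that applies the same equation to a toy logistic-regression “network”, whose input gradient ∇X J has a closed form; all numbers are illustrative.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy differentiable model: p(y=1|x) = σ(w·x + b), with cross-entropy cost J.
rng = np.random.default_rng(0)
w, b = rng.normal(size=10), 0.0
x, y_true = rng.normal(size=10), 1.0

def grad_J_wrt_x(x, y):
    return (sigmoid(w @ x + b) - y) * w       # closed-form ∇_X J for this model

eps = 0.1                                     # perturbation amplitude ε
x_adv = x + eps * np.sign(grad_J_wrt_x(x, y_true))
print("clean p(y=1|x)      =", sigmoid(w @ x + b))
print("adversarial p(y=1|x)=", sigmoid(w @ x_adv + b))   # pushed towards misclassification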
131
White Box attacks. Step 2: Perturbation Selection
The adversary then exploits the knowledge of sensitive information to select a perturbation δX
among the input dimensions in order to obtain an adversarial perturbation which is most
efficient.
132
Perturb all the input dimensions with a small quantity in the direction of the sign of the
gradient calculated using the FGM method.
This method efficiently minimizes the Euclidean distance between the original and the
corresponding adversarial samples.
133
Perturb selected input dimensions: select only a limited number of input dimensions to
perturb, by identifying which combination of input dimensions, if perturbed, will contribute to
the adversarial goals.
This method effectively reduces the number of input features perturbed while crafting
adversarial examples.
For choosing the input dimensions which form the perturbation, all the dimensions are sorted
in decreasing order of their contribution to the adversarial goal.
Input components are added to the perturbation δx in decreasing order until the resulting
sample x* = x + δx is misclassified by the model F.
134
[Image: a STOP sign with physical adversarial perturbations] LISA-CNN could interpret this as Speed Limit 45
¶
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash,
Tadayoshi Kohno, Dawn Song Robust Physical-World Attacks on Deep Learning Visual Classification Computer
Vision and Pattern Recognition (CVPR 2018)
Permission is granted to use and reproduce the images for publications or for research with acknowledgement
135
Adversarial Training
Inject adversarial examples into the training set: the defender generates many adversarial
examples and augments the training data with these perturbed samples while training the targeted model.
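A minimal sketch of the augmentation step, reusing the closed-form FGM gradient of the earlier toy model; grad_fn, the model and the data are placeholders for illustration.

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)                               # toy linear model weights (illustrative)

def grad_fn(x, y):
    # Placeholder input gradient of the loss for the toy model (not a real target model).
    sigma = 1.0 / (1.0 + np.exp(-(w @ x)))
    return (sigma - y) * w

def fgm(x, y, eps=0.1):
    return x + eps * np.sign(grad_fn(x, y))          # FGM adversarial example

# Augment the training set with one perturbed copy of each point, keeping the labels.
X = rng.normal(size=(5, 4)); Y = rng.integers(0, 2, size=5).astype(float)
X_aug = np.concatenate([X, np.stack([fgm(x, y) for x, y in zip(X, Y)])])
Y_aug = np.concatenate([Y, Y])
print(X_aug.shape, Y_aug.shape)                      # (10, 4) (10,)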
137
Gradient Hiding
A natural defense against gradient-based attacks and attacks using adversarial crafting methods
such as FGM could consist in hiding information about the model's gradient from the
adversary.
For instance, if the model is non-differentiable (e.g., an SVM, or a Nearest Neighbour Classifier),
gradient-based attacks are rendered ineffective.
However, this defense is easily fooled by learning a surrogate black-box model that does have a
gradient and crafting adversarial examples using it (cf. adversarial sample transferability).
138
Feature Squeezing
By reducing the complexity of representing the data, the adversarial perturbations disappear
because of low sensitivity.
Examples: reducing the quantization levels, or the sampling frequencies.
Though these techniques work well in preventing adversarial attacks, they have the collateral
effect of worsening the accuracy of the model on legitimate examples.
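For instance, a bit-depth reduction (one common form of feature squeezing) can be sketched as follows; the inputs and the choice of 3 bits are illustrative.

import numpy as np

def squeeze_bit_depth(x, bits):
    # Reduce quantization levels: keep only 2**bits levels per feature in [0, 1].
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

rng = np.random.default_rng(0)
x = rng.random(8)                                            # clean input in [0, 1]
x_adv = np.clip(x + rng.normal(scale=0.01, size=8), 0, 1)    # tiny perturbation

# Small perturbations are often absorbed by the coarser quantization.
print(np.array_equal(squeeze_bit_depth(x, 3), squeeze_bit_depth(x_adv, 3)))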
139
Blocking the Transferrability: Null Labelling
The main idea behind the proposed approach is to add a new NULL label to the dataset
and train the classifier to reject adversarial examples by classifying them as NULL.
Three steps:
1. Initial training of the target classifier on the clean dataset;
2. Computing the NULL probabilities: the probability of belonging to the NULL class is then
calculated using a function f for adversarial examples generated with different amounts
of perturbation;
3. Adversarial training: each clean sample is then used to re-train the original classifier together
with differently perturbed versions of the sample. The label for the training data is decided
based on the NULL probabilities obtained in the previous step.
¶
Hosseini, Hossein, et al. “Blocking transferability of adversarial examples in black-box learning systems.”
arXiv preprint arXiv:1703.04318 (2017).
140
Uncertainty-Awareness
Change the loss function so that the network outputs pieces of evidence in favour of the
different classes, which are then combined through a Bayesian update resulting in a Dirichlet
distribution.
¶
Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification
uncertainty.” Advances in Neural Information Processing Systems. 2018.
141
Given the parameters of our model w, we can capture our assumptions about w, before
observing the data, in the form of a prior probability distribution p(w). The effect of the
observed data D = {t1, . . . , tN } is expressed through the conditional p(D|w), hence Bayes
theorem takes the form:
p(w|D) = p(D|w) p(w) / p(D),   i.e.   posterior ∝ likelihood · prior

where p(D|w) is the likelihood, p(w) is the prior, and the normalising constant is

p(D) = ∫ p(D|w) p(w) dw

It ensures that the posterior distribution on the left-hand side is a valid probability density and integrates to one.
142
Frequentist paradigm
• w is considered to be a fixed parameter,
whose value is determined by some form
of estimator, e.g. maximum likelihood,
in which w is set to the value that
maximises p(D|w)
• Error bars on this estimate are obtained
by considering the distribution of possible
data sets D.
• The negative log of the likelihood
function is called an error function: the
negative log is a monotonically decreasing
function hence maximising the likelihood
is equivalent to minimising the error.
Bayesian paradigm
• There is only one single data set D (the
one observed) and the uncertainty in the
parameters is expressed through a
probability distribution over w.
• The inclusion of prior knowledge arises
naturally: suppose that a fair-looking coin
is tossed three times and lands heads
each time. A classical maximum
likelihood estimate of the probability of
landing heads would give 1.
There are cases where one wants to reduce
the dependence on the prior, hence the use
of noninformative priors.
143
Binary variable: Bernoulli
Let us consider a single binary random variable x ∈ {0, 1}, e.g. a coin flip with a coin that is not
necessarily fair; the probability is therefore governed by a parameter 0 ≤ µ ≤ 1:

p(x = 1|µ) = µ

The probability distribution over x is known as the Bernoulli distribution:

Bern(x|µ) = µ^x (1 − µ)^{1−x}

E[x] = µ
144
Now suppose that we have a data set of observations x = (x1, . . . , xN)^T drawn independently
from a Bernoulli distribution (i.i.d.) whose mean µ is unknown, and we would like to determine
this parameter from the data set.

p(D|µ) = ∏_{n=1}^{N} p(xn|µ) = ∏_{n=1}^{N} µ^{xn} (1 − µ)^{1−xn}

Let us maximise the (log-)likelihood to identify the parameter (the log simplifies the algebra
and reduces the risk of underflow):

ln p(D|µ) = ∑_{n=1}^{N} ln p(xn|µ) = ∑_{n=1}^{N} { xn ln µ + (1 − xn) ln(1 − µ) }
145
The log likelihood depends on the N observations xn only through their sum ∑_n xn; hence the
sum provides an example of a sufficient statistic for the data under this distribution:
no other statistic that can be calculated from the same sample provides any additional information as to
the value of the parameter.
146
Setting the derivative of the log likelihood to zero:

d/dµ ln p(D|µ) = 0

∑_{n=1}^{N} ( xn/µ − (1 − xn)/(1 − µ) ) = 0

∑_{n=1}^{N} (xn − µ) / ( µ(1 − µ) ) = 0

∑_{n=1}^{N} xn = Nµ

µ_ML = (1/N) ∑_{n=1}^{N} xn

i.e. the sample mean. Risk of overfitting: consider tossing the coin three times and observing
heads each time, which gives µ_ML = 1.
147
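A quick numerical illustration of the estimator and of the overfitting risk; the data here are made up for the example.

import numpy as np

tosses = np.array([1, 1, 1])     # three coin tosses, all heads
mu_ml = tosses.mean()            # the maximum-likelihood estimate is the sample mean
print(mu_ml)                     # 1.0: the ML estimate claims that heads is certain

rng = np.random.default_rng(0)
many_tosses = rng.binomial(1, 0.5, size=1000)
print(many_tosses.mean())        # with more data the estimate approaches the true 0.5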
In order to develop a Bayesian treatment of the overfitting problem of the maximum likelihood
estimator for the Bernoulli, note that the likelihood takes the form of a product of factors of
the form µ^x (1 − µ)^{1−x}: if we choose a prior proportional to powers of µ and (1 − µ), then the
posterior distribution, proportional to the product of the prior and the likelihood, will have the
same functional form as the prior. This property is called conjugacy.
148
Binary variables: Beta distribution
Beta(µ|a, b) = [ Γ(a + b) / ( Γ(a) Γ(b) ) ] µ^{a−1} (1 − µ)^{b−1}

with

Γ(x) ≡ ∫_0^∞ u^{x−1} e^{−u} du

E[µ] = a / (a + b)        var[µ] = ab / ( (a + b)^2 (a + b + 1) )

a and b are hyperparameters controlling the distribution of the parameter µ.
149
[Plots of Beta(µ|a, b) as a function of µ for (a, b) = (0.1, 0.1), (1, 1), (2, 3) and (8, 4).]
150
Fig. 2.2a-d of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
Considering a beta distribution prior and the binomial likelihood function, and given l = N − m
(with m the number of observations of x = 1):

p(µ|m, l, a, b) ∝ µ^{m+a−1} (1 − µ)^{l+b−1}

Hence p(µ|m, l, a, b) is another beta distribution and we can rearrange the normalisation
coefficient as follows:

p(µ|m, l, a, b) = [ Γ(m + a + l + b) / ( Γ(m + a) Γ(l + b) ) ] µ^{m+a−1} (1 − µ)^{l+b−1}
[Plots over µ of the prior, the likelihood function for a single observation, and the resulting posterior.]
151
Fig. 2.3a-c of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
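A numerical illustration of the conjugate update, with an arbitrary Beta(2, 2) prior and the three-heads data set from the overfitting example; the prior hyperparameters are an assumption made for the example.

a, b = 2.0, 2.0      # Beta prior hyperparameters (an arbitrary, mildly informative choice)
m, l = 3, 0          # observed heads and tails: the three-heads data set from before

a_post, b_post = a + m, b + l                 # conjugacy: the posterior is again a Beta
posterior_mean = a_post / (a_post + b_post)
print(posterior_mean)                         # 5/7 ≈ 0.71, pulled back from the ML estimate of 1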
Epistemic vs Aleatoric uncertainty
Aleatoric uncertainty
Variability in the outcome of an experiment
which is due to inherently random effects (e.g.
flipping a fair coin): no additional source of
information, short of Laplace's demon, can
reduce such variability.
Epistemic uncertainty
Uncertainty due to the epistemic state of the
agent using the model, i.e. its lack of
knowledge, which in principle can be reduced
on the basis of additional data samples.
It is a general property of Bayesian learning
that, as we observe more and more data, the
epistemic uncertainty represented by the
posterior distribution will steadily decrease
(the variance decreases).
152
Multinomial variables: categorical distribution
Let us suppose we roll a die with K = 6 faces. An observation of this variable x corresponding
to x3 = 1 (e.g. face number 3 up) can be written as:

x = (0, 0, 1, 0, 0, 0)^T

Note that such vectors must satisfy ∑_{k=1}^{K} xk = 1.

p(x|µ) = ∏_{k=1}^{K} µk^{xk}

where µ = (µ1, . . . , µK)^T, and the parameters µk are such that µk ≥ 0 and ∑_k µk = 1.

Generalisation of the Bernoulli
153
p(D|µ) = ∏_{n=1}^{N} ∏_{k=1}^{K} µk^{xnk}

The likelihood depends on the N data points only through the K quantities

mk = ∑_n xnk

which represent the number of observations of xk = 1 (e.g. with k = 3, the third face of the
die). These are called the sufficient statistics for this distribution.
154
Finding the maximum likelihood requires a Lagrange multiplier λ: maximise

∑_{k=1}^{K} mk ln µk + λ ( ∑_{k=1}^{K} µk − 1 )

Hence

µk^ML = mk / N

which is the fraction of the N observations for which xk = 1.
155
Multinomial variables: the Dirichlet distribution
The Dirichlet distribution is the generalisation of the beta distribution to K dimensions.
Dir(µ|α) = [ Γ(α0) / ( Γ(α1) · · · Γ(αK) ) ] ∏_{k=1}^{K} µk^{αk−1}

such that ∑_k µk = 1, α = (α1, . . . , αK)^T, αk ≥ 0 and

α0 = ∑_{k=1}^{K} αk
156
Considering a Dirichlet distribution prior and the categorical likelihood function, the posterior is
then:
p(µ|D, α) = Dir(µ|α + m) = [ Γ(α0 + N) / ( Γ(α1 + m1) · · · Γ(αK + mK) ) ] ∏_{k=1}^{K} µk^{αk+mk−1}

The uniform prior is given by Dir(µ|1) and the Jeffreys' non-informative prior is given by
Dir(µ|(0.5, . . . , 0.5)^T).
The marginals of a Dirichlet distribution are beta distributions.
157
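A numerical illustration contrasting the maximum-likelihood estimate mk/N with the posterior mean under a uniform Dirichlet prior; the die rolls are made up for the example.

import numpy as np

K = 6
rolls = np.array([3, 1, 3, 5, 3, 2, 6, 3, 1, 4])    # ten die rolls (faces 1..6), made up
m = np.bincount(rolls - 1, minlength=K)             # sufficient statistics m_k

mu_ml = m / m.sum()                                 # maximum likelihood: m_k / N

alpha_prior = np.ones(K)                            # uniform prior Dir(mu | 1)
alpha_post = alpha_prior + m                        # posterior Dir(mu | alpha + m)
posterior_mean = alpha_post / alpha_post.sum()

print(mu_ml)            # [0.2 0.1 0.4 0.1 0.1 0.1]
print(posterior_mean)   # smoothed towards the uniform distribution by the prior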
From Evidence to Dirichlet
Let us now assume a Dirichlet distribution over K classes that is the result of a Bayesian
update with N observations, starting from a uniform prior:

Dir(µ | α) = Dir(µ | e1 + 1, e2 + 1, . . . , eK + 1)

where ek is the number of observations (the evidence) for class k, and ∑_k ek = N.
158
Dirichlet and Epistemic Uncertainty
The epistemic uncertainty associated with a Dirichlet distribution Dir(µ | α) is given by

u = K / S

with K the number of classes and S = α0 = ∑_{k=1}^{K} αk the Dirichlet strength.

Note that if the Dirichlet has been computed as the result of a Bayesian update from a uniform
prior, then 0 ≤ u ≤ 1, and u = 1 implies that we are considering the uniform distribution
(an extreme case of Dirichlet distribution).

Let us denote with µk the expected probability αk / S.
159
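A small sketch of these quantities for an assumed vector of per-class evidence.

import numpy as np

K = 3
evidence = np.array([12.0, 2.0, 1.0])   # e_k: evidence collected for each class (illustrative)
alpha = evidence + 1.0                  # Dirichlet parameters after updating a uniform prior
S = alpha.sum()                         # Dirichlet strength

u = K / S                               # epistemic uncertainty
mu = alpha / S                          # expected class probabilities

print(u, mu)                            # u = 3/18 ≈ 0.17
print(K / np.ones(K).sum())             # with no evidence at all, u = K/K = 1 (uniform case)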
Loss function
If we then consider Dir(µi | αi) as the prior for a multinomial p(yi | µi), we can then compute
the expected squared error (aka Brier score):

E[ ||yi − µi||₂² ] = ∑_{k=1}^{K} E[ y_{i,k}² − 2 y_{i,k} µ_{i,k} + µ_{i,k}² ]
 = ∑_{k=1}^{K} ( y_{i,k}² − 2 y_{i,k} E[µ_{i,k}] + E[µ_{i,k}²] )
 = ∑_{k=1}^{K} ( y_{i,k}² − 2 y_{i,k} E[µ_{i,k}] + E[µ_{i,k}]² + var[µ_{i,k}] )
 = ∑_{k=1}^{K} ( (y_{i,k} − E[µ_{i,k}])² + var[µ_{i,k}] )
 = ∑_{k=1}^{K} ( (y_{i,k} − α_{i,k}/S_i)² + α_{i,k}(S_i − α_{i,k}) / ( S_i²(S_i + 1) ) )
 = ∑_{k=1}^{K} ( (y_{i,k} − µ_{i,k})² + µ_{i,k}(1 − µ_{i,k}) / (S_i + 1) )

where in the last line µ_{i,k} = α_{i,k}/S_i denotes the expected probability of class k.
The loss over a batch of training samples is the sum of the loss for each sample in the batch.
Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification
uncertainty.” Advances in Neural Information Processing Systems. 2018.
160
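A sketch of the per-batch loss above in NumPy; the labels and Dirichlet parameters are illustrative, and the function is a direct transcription of the final expression rather than the authors' released code.

import numpy as np

def edl_mse_loss(y, alpha):
    # Expected squared error (Brier score) under Dir(mu_i | alpha_i), summed over the batch.
    # y: one-hot labels of shape (N, K); alpha: Dirichlet parameters of shape (N, K).
    S = alpha.sum(axis=1, keepdims=True)
    mu = alpha / S
    squared_error = (y - mu) ** 2                          # (y_ik - alpha_ik / S_i)^2
    variance = alpha * (S - alpha) / (S ** 2 * (S + 1.0))  # var[mu_ik] = mu(1 - mu)/(S + 1)
    return (squared_error + variance).sum()

y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
alpha = np.array([[10.0, 1.0, 1.0],    # confident, correct prediction: small loss
                  [ 1.0, 1.0, 1.0]])   # "I don't know": larger loss, dominated by the error term
print(edl_mse_loss(y, alpha))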
Learning to say “I don’t know”
To avoid generating evidence for all the classes when the network cannot classify a given
sample (epistemic uncertainty), we introduce a term in the loss function that penalises the
divergence from the uniform distribution:
L = ∑_{i=1}^{N} E[ ||yi − µi||₂² ] + λt ∑_{i=1}^{N} KL( Dir(µi | α̃i) || Dir(µi | 1) )

where:
• λt is another hyperparameter; the suggestion is to make it depend on the number of training
epochs, e.g. λt = min(1, t/CONST) with t the index of the current training epoch, so that the
effect of the KL divergence is gradually increased, avoiding premature convergence to the
uniform distribution in the early epochs, when the learning algorithm still needs to explore the
parameter space;
• α̃i = yi + (1 − yi) ⊙ αi are the Dirichlet parameters the neural network, in a forward pass, has
put on the wrong classes, and the idea is to minimise them as much as possible.
Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification
uncertainty.” Advances in Neural Information Processing Systems. 2018.
161
KL recap
Consider some unknown distribution p(x) and suppose that we have modelled this using q(x).
If we use q(x) instead of p(x) to represent the true values of x, the average additional amount
of information required is:
KL(p||q) = − ∫ p(x) ln q(x) dx − ( − ∫ p(x) ln p(x) dx )
         = − ∫ p(x) ln ( q(x) / p(x) ) dx
         = −E[ ln ( q(x) / p(x) ) ]
This is known as the relative entropy or Kullback-Leibler divergence, or KL divergence between
the distributions p(x) and q(x).
Properties:
• KL(p||q) ≢ KL(q||p), i.e. the KL divergence is not symmetric;
• KL(p||q) ≥ 0 and KL(p||q) = 0 if and only if p = q
162
KL( Dir(µi | α̃i) || Dir(µi | 1) ) =
 ln [ Γ( ∑_{k=1}^{K} α̃_{i,k} ) / ( Γ(K) ∏_{k=1}^{K} Γ(α̃_{i,k}) ) ]
 + ∑_{k=1}^{K} (α̃_{i,k} − 1) [ ψ(α̃_{i,k}) − ψ( ∑_{j=1}^{K} α̃_{i,j} ) ]

where ψ(x) = d/dx ln ( Γ(x) ) is the digamma function
Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification
uncertainty.” Advances in Neural Information Processing Systems. 2018.
163
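A sketch of this KL term using SciPy's gammaln and digamma; the α values are illustrative, and α̃ is formed as on the previous slides by removing the evidence assigned to the true class.

import numpy as np
from scipy.special import gammaln, digamma

def kl_dirichlet_vs_uniform(alpha):
    # KL( Dir(mu | alpha) || Dir(mu | 1) ) for a single sample; alpha has shape (K,).
    K = alpha.shape[0]
    S = alpha.sum()
    log_norm = gammaln(S) - gammaln(K) - gammaln(alpha).sum()
    geom_term = ((alpha - 1.0) * (digamma(alpha) - digamma(S))).sum()
    return log_norm + geom_term

y = np.array([1.0, 0.0, 0.0])               # one-hot label
alpha = np.array([8.0, 3.0, 2.0])           # illustrative network output
alpha_tilde = y + (1.0 - y) * alpha         # remove the evidence assigned to the true class

print(kl_dirichlet_vs_uniform(alpha_tilde))   # > 0: evidence remains on the wrong classes
print(kl_dirichlet_vs_uniform(np.ones(3)))    # 0 for the uniform distribution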
http://tiny.cc/EDLINGBS20
164
EDL + GAN for adversarial training
M. Sensoy, L. Kaplan, F. Cerutti, M. Saleki, “Uncertainty-Aware Deep Classifiers using Generative Models.”
AAAI 2020
165
VAE + GAN
[Architecture diagram with a generator G and discriminators D and D′; figure showing original training samples (top) and generated samples.]
Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification
uncertainty.” Advances in Neural Information Processing Systems. 2018.
166
Robustness against FGS
Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification
uncertainty.” Advances in Neural Information Processing Systems. 2018.
167
Anomaly detection
(mnist) (cifar10)
Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification
uncertainty.” Advances in Neural Information Processing Systems. 2018.
168
EDL adopted in industrial settings
169
Conclusions
Introduction to AI (for security)
• GOFAI
• ML
Guidelines
• Rules
• Guidelines for creating and using AI
Security of AI
• GOFAI
• ML
170
More Related Content

What's hot

Pythonic Math
Pythonic MathPythonic Math
Pythonic Math
Kirby Urner
 
Speaker Diarization
Speaker DiarizationSpeaker Diarization
Speaker Diarization
HONGJOO LEE
 
Crypto cs36 39
Crypto cs36 39Crypto cs36 39
Crypto cs36 39
sravanbabu
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
HONGJOO LEE
 
A survey on Fully Homomorphic Encryption
A survey on Fully Homomorphic EncryptionA survey on Fully Homomorphic Encryption
A survey on Fully Homomorphic Encryption
iosrjce
 
1
11
PECCS 2014
PECCS 2014PECCS 2014
PECCS 2014
Benoit Lopez
 
Elliptic Curve Cryptography
Elliptic Curve CryptographyElliptic Curve Cryptography
Elliptic Curve Cryptography
Kelly Bresnahan
 
Partial Homomorphic Encryption
Partial Homomorphic EncryptionPartial Homomorphic Encryption
Partial Homomorphic Encryption
securityxploded
 
Lattice-Based Cryptography: CRYPTANALYSIS OF COMPACT-LWE
Lattice-Based Cryptography: CRYPTANALYSIS OF COMPACT-LWELattice-Based Cryptography: CRYPTANALYSIS OF COMPACT-LWE
Lattice-Based Cryptography: CRYPTANALYSIS OF COMPACT-LWE
Priyanka Aash
 
13slide graphics
13slide graphics13slide graphics
13slide graphics
Dorothea Chaffin
 
B.Sc.IT: Semester - VI (April - 2015) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (April - 2015) [IDOL - Revised Course | Question Paper]B.Sc.IT: Semester - VI (April - 2015) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (April - 2015) [IDOL - Revised Course | Question Paper]
Mumbai B.Sc.IT Study
 
B.Sc.IT: Semester - VI (May - 2018) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (May - 2018) [IDOL - Revised Course | Question Paper]B.Sc.IT: Semester - VI (May - 2018) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (May - 2018) [IDOL - Revised Course | Question Paper]
Mumbai B.Sc.IT Study
 
Discrete Logarithmic Problem- Basis of Elliptic Curve Cryptosystems
Discrete Logarithmic Problem- Basis of Elliptic Curve CryptosystemsDiscrete Logarithmic Problem- Basis of Elliptic Curve Cryptosystems
Discrete Logarithmic Problem- Basis of Elliptic Curve Cryptosystems
NIT Sikkim
 
Rsa Signature: Behind The Scenes
Rsa Signature: Behind The Scenes Rsa Signature: Behind The Scenes
Rsa Signature: Behind The Scenes
acijjournal
 
20101017 program analysis_for_security_livshits_lecture02_compilers
20101017 program analysis_for_security_livshits_lecture02_compilers20101017 program analysis_for_security_livshits_lecture02_compilers
20101017 program analysis_for_security_livshits_lecture02_compilers
Computer Science Club
 
Atomic algorithm and the servers' s use to find the Hamiltonian cycles
Atomic algorithm and the servers' s use to find the Hamiltonian cyclesAtomic algorithm and the servers' s use to find the Hamiltonian cycles
Atomic algorithm and the servers' s use to find the Hamiltonian cycles
IJERA Editor
 
My presentation at University of Nottingham "Fast low-rank methods for solvin...
My presentation at University of Nottingham "Fast low-rank methods for solvin...My presentation at University of Nottingham "Fast low-rank methods for solvin...
My presentation at University of Nottingham "Fast low-rank methods for solvin...
Alexander Litvinenko
 
Nonlinear analysis of fixed support beam with hinge by hinge method in c prog...
Nonlinear analysis of fixed support beam with hinge by hinge method in c prog...Nonlinear analysis of fixed support beam with hinge by hinge method in c prog...
Nonlinear analysis of fixed support beam with hinge by hinge method in c prog...
Salar Delavar Qashqai
 
Tensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi chaTensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi cha
Donghwi Cha
 

What's hot (20)

Pythonic Math
Pythonic MathPythonic Math
Pythonic Math
 
Speaker Diarization
Speaker DiarizationSpeaker Diarization
Speaker Diarization
 
Crypto cs36 39
Crypto cs36 39Crypto cs36 39
Crypto cs36 39
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 
A survey on Fully Homomorphic Encryption
A survey on Fully Homomorphic EncryptionA survey on Fully Homomorphic Encryption
A survey on Fully Homomorphic Encryption
 
1
11
1
 
PECCS 2014
PECCS 2014PECCS 2014
PECCS 2014
 
Elliptic Curve Cryptography
Elliptic Curve CryptographyElliptic Curve Cryptography
Elliptic Curve Cryptography
 
Partial Homomorphic Encryption
Partial Homomorphic EncryptionPartial Homomorphic Encryption
Partial Homomorphic Encryption
 
Lattice-Based Cryptography: CRYPTANALYSIS OF COMPACT-LWE
Lattice-Based Cryptography: CRYPTANALYSIS OF COMPACT-LWELattice-Based Cryptography: CRYPTANALYSIS OF COMPACT-LWE
Lattice-Based Cryptography: CRYPTANALYSIS OF COMPACT-LWE
 
13slide graphics
13slide graphics13slide graphics
13slide graphics
 
B.Sc.IT: Semester - VI (April - 2015) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (April - 2015) [IDOL - Revised Course | Question Paper]B.Sc.IT: Semester - VI (April - 2015) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (April - 2015) [IDOL - Revised Course | Question Paper]
 
B.Sc.IT: Semester - VI (May - 2018) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (May - 2018) [IDOL - Revised Course | Question Paper]B.Sc.IT: Semester - VI (May - 2018) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (May - 2018) [IDOL - Revised Course | Question Paper]
 
Discrete Logarithmic Problem- Basis of Elliptic Curve Cryptosystems
Discrete Logarithmic Problem- Basis of Elliptic Curve CryptosystemsDiscrete Logarithmic Problem- Basis of Elliptic Curve Cryptosystems
Discrete Logarithmic Problem- Basis of Elliptic Curve Cryptosystems
 
Rsa Signature: Behind The Scenes
Rsa Signature: Behind The Scenes Rsa Signature: Behind The Scenes
Rsa Signature: Behind The Scenes
 
20101017 program analysis_for_security_livshits_lecture02_compilers
20101017 program analysis_for_security_livshits_lecture02_compilers20101017 program analysis_for_security_livshits_lecture02_compilers
20101017 program analysis_for_security_livshits_lecture02_compilers
 
Atomic algorithm and the servers' s use to find the Hamiltonian cycles
Atomic algorithm and the servers' s use to find the Hamiltonian cyclesAtomic algorithm and the servers' s use to find the Hamiltonian cycles
Atomic algorithm and the servers' s use to find the Hamiltonian cycles
 
My presentation at University of Nottingham "Fast low-rank methods for solvin...
My presentation at University of Nottingham "Fast low-rank methods for solvin...My presentation at University of Nottingham "Fast low-rank methods for solvin...
My presentation at University of Nottingham "Fast low-rank methods for solvin...
 
Nonlinear analysis of fixed support beam with hinge by hinge method in c prog...
Nonlinear analysis of fixed support beam with hinge by hinge method in c prog...Nonlinear analysis of fixed support beam with hinge by hinge method in c prog...
Nonlinear analysis of fixed support beam with hinge by hinge method in c prog...
 
Tensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi chaTensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi cha
 

Similar to Security of Artificial Intelligence

Triggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphsTriggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphs
INSA Lyon - L'Institut National des Sciences Appliquées de Lyon
 
RSA SIGNATURE: BEHIND THE SCENES
RSA SIGNATURE: BEHIND THE SCENESRSA SIGNATURE: BEHIND THE SCENES
RSA SIGNATURE: BEHIND THE SCENES
acijjournal
 
A Signature Algorithm Based On Chaotic Maps And Factoring Problems
A Signature Algorithm Based On Chaotic Maps And Factoring ProblemsA Signature Algorithm Based On Chaotic Maps And Factoring Problems
A Signature Algorithm Based On Chaotic Maps And Factoring Problems
Sandra Long
 
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group TestingFast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Rakuten Group, Inc.
 
Spreading Rumors Quietly and the Subgroup Escape Problem
Spreading Rumors Quietly and the Subgroup Escape ProblemSpreading Rumors Quietly and the Subgroup Escape Problem
Spreading Rumors Quietly and the Subgroup Escape Problem
Aleksandr Yampolskiy
 
Gate-Cs 2006
Gate-Cs 2006Gate-Cs 2006
Gate-Cs 2006
Ravi Rajput
 
Actors for Behavioural Simulation
Actors for Behavioural SimulationActors for Behavioural Simulation
Actors for Behavioural Simulation
ClarkTony
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysis
Dr. Rajdeep Chatterjee
 
Pointcuts and Analysis
Pointcuts and AnalysisPointcuts and Analysis
Pointcuts and Analysis
Wiwat Ruengmee
 
Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithms
Eellekwameowusu
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdf
Polytechnique Montréal
 
Midterm
MidtermMidterm
Noise Contrastive Estimation-based Matching Framework for Low-Resource Securi...
Noise Contrastive Estimation-based Matching Framework for Low-Resource Securi...Noise Contrastive Estimation-based Matching Framework for Low-Resource Securi...
Noise Contrastive Estimation-based Matching Framework for Low-Resource Securi...
Tu Nguyen
 
構文や語彙意味論の分析成果をプログラムとして具現化する言語 パターンマッチAPIの可能性
構文や語彙意味論の分析成果をプログラムとして具現化する言語パターンマッチAPIの可能性構文や語彙意味論の分析成果をプログラムとして具現化する言語パターンマッチAPIの可能性
構文や語彙意味論の分析成果をプログラムとして具現化する言語 パターンマッチAPIの可能性
kktctk
 
Error Control coding
Error Control codingError Control coding
Error Control coding
Dr Naim R Kidwai
 
Declarative Datalog Debugging for Mere Mortals
Declarative Datalog Debugging for Mere MortalsDeclarative Datalog Debugging for Mere Mortals
Declarative Datalog Debugging for Mere Mortals
Bertram Ludäscher
 
Scalable inference for a full multivariate stochastic volatility
Scalable inference for a full multivariate stochastic volatilityScalable inference for a full multivariate stochastic volatility
Scalable inference for a full multivariate stochastic volatility
SYRTO Project
 
Functions Practice Sheet.docx
Functions Practice Sheet.docxFunctions Practice Sheet.docx
Functions Practice Sheet.docx
SwatiMishra364461
 
Midterm sols
Midterm solsMidterm sols
Midterm sols
Robert Edwards
 
Robust fuzzy-observer-design-for-nonlinear-systems
Robust fuzzy-observer-design-for-nonlinear-systemsRobust fuzzy-observer-design-for-nonlinear-systems
Robust fuzzy-observer-design-for-nonlinear-systems
Cemal Ardil
 

Similar to Security of Artificial Intelligence (20)

Triggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphsTriggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphs
 
RSA SIGNATURE: BEHIND THE SCENES
RSA SIGNATURE: BEHIND THE SCENESRSA SIGNATURE: BEHIND THE SCENES
RSA SIGNATURE: BEHIND THE SCENES
 
A Signature Algorithm Based On Chaotic Maps And Factoring Problems
A Signature Algorithm Based On Chaotic Maps And Factoring ProblemsA Signature Algorithm Based On Chaotic Maps And Factoring Problems
A Signature Algorithm Based On Chaotic Maps And Factoring Problems
 
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group TestingFast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
 
Spreading Rumors Quietly and the Subgroup Escape Problem
Spreading Rumors Quietly and the Subgroup Escape ProblemSpreading Rumors Quietly and the Subgroup Escape Problem
Spreading Rumors Quietly and the Subgroup Escape Problem
 
Gate-Cs 2006
Gate-Cs 2006Gate-Cs 2006
Gate-Cs 2006
 
Actors for Behavioural Simulation
Actors for Behavioural SimulationActors for Behavioural Simulation
Actors for Behavioural Simulation
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysis
 
Pointcuts and Analysis
Pointcuts and AnalysisPointcuts and Analysis
Pointcuts and Analysis
 
Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithms
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdf
 
Midterm
MidtermMidterm
Midterm
 
Noise Contrastive Estimation-based Matching Framework for Low-Resource Securi...
Noise Contrastive Estimation-based Matching Framework for Low-Resource Securi...Noise Contrastive Estimation-based Matching Framework for Low-Resource Securi...
Noise Contrastive Estimation-based Matching Framework for Low-Resource Securi...
 
構文や語彙意味論の分析成果をプログラムとして具現化する言語 パターンマッチAPIの可能性
構文や語彙意味論の分析成果をプログラムとして具現化する言語パターンマッチAPIの可能性構文や語彙意味論の分析成果をプログラムとして具現化する言語パターンマッチAPIの可能性
構文や語彙意味論の分析成果をプログラムとして具現化する言語 パターンマッチAPIの可能性
 
Error Control coding
Error Control codingError Control coding
Error Control coding
 
Declarative Datalog Debugging for Mere Mortals
Declarative Datalog Debugging for Mere MortalsDeclarative Datalog Debugging for Mere Mortals
Declarative Datalog Debugging for Mere Mortals
 
Scalable inference for a full multivariate stochastic volatility
Scalable inference for a full multivariate stochastic volatilityScalable inference for a full multivariate stochastic volatility
Scalable inference for a full multivariate stochastic volatility
 
Functions Practice Sheet.docx
Functions Practice Sheet.docxFunctions Practice Sheet.docx
Functions Practice Sheet.docx
 
Midterm sols
Midterm solsMidterm sols
Midterm sols
 
Robust fuzzy-observer-design-for-nonlinear-systems
Robust fuzzy-observer-design-for-nonlinear-systemsRobust fuzzy-observer-design-for-nonlinear-systems
Robust fuzzy-observer-design-for-nonlinear-systems
 

More from Federico Cerutti

Introduction to Evidential Neural Networks
Introduction to Evidential Neural NetworksIntroduction to Evidential Neural Networks
Introduction to Evidential Neural Networks
Federico Cerutti
 
Argumentation and Machine Learning: When the Whole is Greater than the Sum of...
Argumentation and Machine Learning: When the Whole is Greater than the Sum of...Argumentation and Machine Learning: When the Whole is Greater than the Sum of...
Argumentation and Machine Learning: When the Whole is Greater than the Sum of...
Federico Cerutti
 
Human-Argumentation Experiment Pilot 2013: Technical Material
Human-Argumentation Experiment Pilot 2013: Technical MaterialHuman-Argumentation Experiment Pilot 2013: Technical Material
Human-Argumentation Experiment Pilot 2013: Technical Material
Federico Cerutti
 
Probabilistic Logic Programming with Beta-Distributed Random Variables
Probabilistic Logic Programming with Beta-Distributed Random VariablesProbabilistic Logic Programming with Beta-Distributed Random Variables
Probabilistic Logic Programming with Beta-Distributed Random Variables
Federico Cerutti
 
Supporting Scientific Enquiry with Uncertain Sources
Supporting Scientific Enquiry with Uncertain SourcesSupporting Scientific Enquiry with Uncertain Sources
Supporting Scientific Enquiry with Uncertain Sources
Federico Cerutti
 
Introduction to Formal Argumentation Theory
Introduction to Formal Argumentation TheoryIntroduction to Formal Argumentation Theory
Introduction to Formal Argumentation Theory
Federico Cerutti
 
Handout: Argumentation in Artificial Intelligence: From Theory to Practice
Handout: Argumentation in Artificial Intelligence: From Theory to PracticeHandout: Argumentation in Artificial Intelligence: From Theory to Practice
Handout: Argumentation in Artificial Intelligence: From Theory to Practice
Federico Cerutti
 
Argumentation in Artificial Intelligence: From Theory to Practice
Argumentation in Artificial Intelligence: From Theory to PracticeArgumentation in Artificial Intelligence: From Theory to Practice
Argumentation in Artificial Intelligence: From Theory to Practice
Federico Cerutti
 
Handout for the course Abstract Argumentation and Interfaces to Argumentative...
Handout for the course Abstract Argumentation and Interfaces to Argumentative...Handout for the course Abstract Argumentation and Interfaces to Argumentative...
Handout for the course Abstract Argumentation and Interfaces to Argumentative...
Federico Cerutti
 
Argumentation in Artificial Intelligence: 20 years after Dung's work. Left ma...
Argumentation in Artificial Intelligence: 20 years after Dung's work. Left ma...Argumentation in Artificial Intelligence: 20 years after Dung's work. Left ma...
Argumentation in Artificial Intelligence: 20 years after Dung's work. Left ma...
Federico Cerutti
 
Argumentation in Artificial Intelligence: 20 years after Dung's work. Right m...
Argumentation in Artificial Intelligence: 20 years after Dung's work. Right m...Argumentation in Artificial Intelligence: 20 years after Dung's work. Right m...
Argumentation in Artificial Intelligence: 20 years after Dung's work. Right m...
Federico Cerutti
 
Argumentation in Artificial Intelligence
Argumentation in Artificial IntelligenceArgumentation in Artificial Intelligence
Argumentation in Artificial Intelligence
Federico Cerutti
 
Algorithm Selection for Preferred Extensions Enumeration
Algorithm Selection for Preferred Extensions EnumerationAlgorithm Selection for Preferred Extensions Enumeration
Algorithm Selection for Preferred Extensions Enumeration
Federico Cerutti
 
Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an ...
Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an ...Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an ...
Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an ...
Federico Cerutti
 
Argumentation Extensions Enumeration as a Constraint Satisfaction Problem: a ...
Argumentation Extensions Enumeration as a Constraint Satisfaction Problem: a ...Argumentation Extensions Enumeration as a Constraint Satisfaction Problem: a ...
Argumentation Extensions Enumeration as a Constraint Satisfaction Problem: a ...
Federico Cerutti
 
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
Federico Cerutti
 
Cerutti-AT2013-Graphical Subjective Logic
Cerutti-AT2013-Graphical Subjective LogicCerutti-AT2013-Graphical Subjective Logic
Cerutti-AT2013-Graphical Subjective Logic
Federico Cerutti
 
Cerutti-AT2013-Trust and Risk
Cerutti-AT2013-Trust and RiskCerutti-AT2013-Trust and Risk
Cerutti-AT2013-Trust and Risk
Federico Cerutti
 
Cerutti -- TAFA2013
Cerutti -- TAFA2013Cerutti -- TAFA2013
Cerutti -- TAFA2013
Federico Cerutti
 
Cerutti--Introduction to Argumentation (seminar @ University of Aberdeen)
Cerutti--Introduction to Argumentation (seminar @ University of Aberdeen)Cerutti--Introduction to Argumentation (seminar @ University of Aberdeen)
Cerutti--Introduction to Argumentation (seminar @ University of Aberdeen)
Federico Cerutti
 

More from Federico Cerutti (20)

Introduction to Evidential Neural Networks
Introduction to Evidential Neural NetworksIntroduction to Evidential Neural Networks
Introduction to Evidential Neural Networks
 
Argumentation and Machine Learning: When the Whole is Greater than the Sum of...
Argumentation and Machine Learning: When the Whole is Greater than the Sum of...Argumentation and Machine Learning: When the Whole is Greater than the Sum of...
Argumentation and Machine Learning: When the Whole is Greater than the Sum of...
 
Human-Argumentation Experiment Pilot 2013: Technical Material
Human-Argumentation Experiment Pilot 2013: Technical MaterialHuman-Argumentation Experiment Pilot 2013: Technical Material
Human-Argumentation Experiment Pilot 2013: Technical Material
 
Probabilistic Logic Programming with Beta-Distributed Random Variables
Probabilistic Logic Programming with Beta-Distributed Random VariablesProbabilistic Logic Programming with Beta-Distributed Random Variables
Probabilistic Logic Programming with Beta-Distributed Random Variables
 
Supporting Scientific Enquiry with Uncertain Sources
Supporting Scientific Enquiry with Uncertain SourcesSupporting Scientific Enquiry with Uncertain Sources
Supporting Scientific Enquiry with Uncertain Sources
 
Introduction to Formal Argumentation Theory
Introduction to Formal Argumentation TheoryIntroduction to Formal Argumentation Theory
Introduction to Formal Argumentation Theory
 
Handout: Argumentation in Artificial Intelligence: From Theory to Practice
Handout: Argumentation in Artificial Intelligence: From Theory to PracticeHandout: Argumentation in Artificial Intelligence: From Theory to Practice
Handout: Argumentation in Artificial Intelligence: From Theory to Practice
 
Argumentation in Artificial Intelligence: From Theory to Practice
Argumentation in Artificial Intelligence: From Theory to PracticeArgumentation in Artificial Intelligence: From Theory to Practice
Argumentation in Artificial Intelligence: From Theory to Practice
 
Handout for the course Abstract Argumentation and Interfaces to Argumentative...
Handout for the course Abstract Argumentation and Interfaces to Argumentative...Handout for the course Abstract Argumentation and Interfaces to Argumentative...
Handout for the course Abstract Argumentation and Interfaces to Argumentative...
 
Argumentation in Artificial Intelligence: 20 years after Dung's work. Left ma...
Argumentation in Artificial Intelligence: 20 years after Dung's work. Left ma...Argumentation in Artificial Intelligence: 20 years after Dung's work. Left ma...
Argumentation in Artificial Intelligence: 20 years after Dung's work. Left ma...
 
Argumentation in Artificial Intelligence: 20 years after Dung's work. Right m...
Argumentation in Artificial Intelligence: 20 years after Dung's work. Right m...Argumentation in Artificial Intelligence: 20 years after Dung's work. Right m...
Argumentation in Artificial Intelligence: 20 years after Dung's work. Right m...
 
Argumentation in Artificial Intelligence
Argumentation in Artificial IntelligenceArgumentation in Artificial Intelligence
Argumentation in Artificial Intelligence
 
Algorithm Selection for Preferred Extensions Enumeration
Algorithm Selection for Preferred Extensions EnumerationAlgorithm Selection for Preferred Extensions Enumeration
Algorithm Selection for Preferred Extensions Enumeration
 
Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an ...
Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an ...Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an ...
Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an ...
 
Argumentation Extensions Enumeration as a Constraint Satisfaction Problem: a ...
Argumentation Extensions Enumeration as a Constraint Satisfaction Problem: a ...Argumentation Extensions Enumeration as a Constraint Satisfaction Problem: a ...
Argumentation Extensions Enumeration as a Constraint Satisfaction Problem: a ...
 
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
A SCC Recursive Meta-Algorithm for Computing Preferred Labellings in Abstract...
 
Cerutti-AT2013-Graphical Subjective Logic
Cerutti-AT2013-Graphical Subjective LogicCerutti-AT2013-Graphical Subjective Logic
Cerutti-AT2013-Graphical Subjective Logic
 
Cerutti-AT2013-Trust and Risk
Cerutti-AT2013-Trust and RiskCerutti-AT2013-Trust and Risk
Cerutti-AT2013-Trust and Risk
 
Cerutti -- TAFA2013
Cerutti -- TAFA2013Cerutti -- TAFA2013
Cerutti -- TAFA2013
 
Cerutti--Introduction to Argumentation (seminar @ University of Aberdeen)
Cerutti--Introduction to Argumentation (seminar @ University of Aberdeen)Cerutti--Introduction to Argumentation (seminar @ University of Aberdeen)
Cerutti--Introduction to Argumentation (seminar @ University of Aberdeen)
 

Recently uploaded

BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
Celine George
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
taiba qazi
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
Priyankaranawat4
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
RitikBhardwaj56
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Fajar Baskoro
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5
sayalidalavi006
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 

Recently uploaded (20)

BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 

Security of Artificial Intelligence

  • 1. Sicurezza del- l'Intelligenza Artificiale Federico Cerutti federico.cerutti@unibs.it 1 Ottobre 2020 Background image: Shutterstock Levi-Montalcini portrait: Age Fotostock
  • 2. Agenda Introduction to AI (for security) • GOFAI • ML Guidelines • Rules • Guidelines for creating and using AI Security of AI • GOFAI • ML 2
  • 4. What we will NOT talk about 4
  • 6. What are we actually talking about? 6
  • 8. Continuously check that the radar works and its measure: if the radar identifies a car less than 50 meters away, compute the speed of the car in front of you and adapt your speed. Start Radar Check and Input ∃car(X) s.t. distance(X) < 50 Adapt speed yes no while (1){ i f ( radarcheck () && radar () < 50){ adapt ( ) ; } } 8
  • 10. // bool v , w, x , y , z , radarcheck ; radarcheck = f a l s e ; i f ( w && x && ( ! x | | y ) && y && z && ( ! ( x && y & z ) | | w) && ( ! x | | !w) ) { goto A7630 ; } e l s e { goto A2092 ; } A231 : goto A0928 ; A7630 : radarcheck = t r u e ; A2092 : goto A231 ; A0928 : a s s e r t ( radarcheck == f a l s e ) ; 10
  • 12. ! (w == FALSE | | x == FALSE | | y == FALSE | | z == FALSE | | ! (w == FALSE) && ! ( x == FALSE) | | ! ( x == FALSE) && y == FALSE | | ! ( ( ( signed i n t ) y & ( signed i n t ) z ) == 0)} && ! ( x == FALSE) && w == FALSE ) ! (w == FALSE | | x == FALSE | | y == FALSE | | z == FALSE | | ! (w == FALSE) && ! ( x == FALSE) | | ! ( x == FALSE) && y == FALSE | | ! ( y == FALSE | | z == FALSE) && ! ( x == FALSE) && w == FALSE ) ¬(¬w ∨ ¬x ∨ ¬y ∨ ¬z ∨ w ∧ x ∨ x ∧ ¬y ∨ ¬(¬y ∨ ¬z) ∧ x ∧ ¬w) 12
  • 14. “ [. . . ] Particularly vexing is the realisation that the error came from a piece of the software that was not needed. The software involved is part of the Iner- tial Reference System. [. . . ] After takeoff [. . . ] this computation is useless. In the Ariane 5 flight, how- ever, it caused an exception, which was not caught and—boom. The exception was due to a floatin gpoint error during a conversion from a 64- bit floating-point value [. . . ] to a 16-bit signed integer. [. . . ] There was no explicit exception handler to catch the exception, so it followed the usual fate of uncaught exceptions and crashed the entire software, hence the onboard com- puters, hence the [(500 million USD, ed.)] mission.,, https://www.flickr.com/photos/48213136@N06/8958839420 J. -. Jazequel and B. Meyer, “Design by contract: the lessons of Ariane,” in Computer, vol. 30, no. 1, pp. 129-130, Jan. 1997, doi: 10.1109/2.562936. 14
  • 15. “ We have proved that the initial boot code running in data centers at Amazon Web Services is memory safe [using, ed.] the C Bounded Model Cheker (CBMC). ,, https://aws.amazon.com/security/provable-security/ http://tiny.cc/AWSINGBS20 B. Cook, et. al., “Model checking boot code from AWS data centers,” in CAV 2018, pp. 457-486, 2018 Federico Cerutti has had no interaction with AWS. 15
  • 16. SAT solving is one of GOFAI success stories 16
  • 17. w ∧ x ∧ y ∧ z ∧ (y ∨ ¬x) ∧ (¬w ∨ ¬x) ∧ (w ∨ ¬x ∨ ¬y ∨ ¬z) ⊤ ∧ ⊤ ∧ ⊤ ∧ ⊤ ∧ (⊤ ∨ ⊥) ∧ (⊥ ∨ ⊥) ∧ (⊤ ∨ ⊥ ∨ ⊥ ∨ ⊥) Unit propagation 17
  • 19. (Simplified) Needham-Schroeder Protocol A B 1: {Na, A}Kb 2: {Na, Nb}Ka 3: {Nb}Kb A is authenticated with B only if A has sent a fresh challenge nonce encrypted with an appropriate key to B. B must reply to A’s challenge with the same nonce, again encrypted with a key so that only A can decrypt it. All must happen in the right order. 19 R. M. Needham and M. D. Schroeder. 1978. Using encryption for authentication in large networks of computers. Commun. ACM 21, 12 (Dec. 1978), 993–999. J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
  • 20. authenticated (A, B, T) :− send (A, B, enc (M1, K1) , T1) , fresh (A, nonce (A, Na) , T1) , part m ( nonce (A, Na) , M1) , key pair (K1 , Kinv1 ) , has (A, K1 , T1) , has (B, Kinv1 , T1) , not has (C, Kinv1 , T1) : agent (C) : C != B, send (B, A, enc (M2, K2) , T2) , receive (A, B, enc (M2, K2) , T) , part m ( nonce (A, Na) , M2) , key pair (K2 , Kinv2 ) , has (B, K2 , T1) , has (A, Kinv2 , T1) , not has (C, Kinv2 , T1 ) : agent (C) : C != A, T1 < T2 , T2 < T. receive (A, B, M, T+1) :− send (B, A, M, T) , not intercept (M, T+1). 20 J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
  • 21. believes (A, completed (A, B) , T) :− send (A, B, enc (m( nonce (C, Na) , p r i n c i p a l (A)) , pub key (B)) , T1) , receive (A, B, enc (m( nonce (C, Na) , nonce (D, Nb)) , pub key (A)) , T2) , send (A, B, enc (m( nonce (D, Nb)) , pub key (B)) , T3) , not intruder (A) , T1 < T2 , T2 <= T3 , T3 <= T, A != B, C != D. believes (A, authenticated (A, B) , T) :− believes (A, completed (A, B) , T) , A != B. 21 J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
  • 22. Attacker’s capabilities It can intercept messages and it receives the messages that it intercepts 0 { intercept (M, T+1) } 1 :− send (A, B, M, T) . receive ( I , A, M, T+1) :− send (A, B, M, T) , intercept (M, T+1). Send messages whenever it wants to, also faking the sender id 1 { receive (A, B, M, T+1) : p r i n c i p a l (B) } 1 :− send ( I , A, M, T) . 22 J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
  • 23. Goals Agents They should both believe to be authenticated, and they should both are goal (A, B, T) :− authenticated (A, B, T) , believes (A, authenticated (A, B) , T) , authenticated (B, A, T) , believes (B, authenticated (B, A) , T) . Attacker An agent believes to be authenticated when it is not true attack :− believes (A, authenticated (A, B) , T) , not authenticated (A, B, T) . 23 J. P. Delgrande, T. Grote, and A. Hunter. 2009. A General Approach to the Verification of Cryptographic Protocols Using Answer Set Programming. In LPNMR ’09, 355–367.
  • 24. Problems encoded in this way can be solved using search algorithms very similar to the SAT one 24
  • 25. ASP handles problems up-to ∆P 3 -complete PΣP 2 = PNPNP 25
  • 26. ASP is another success story of GOFAI 26
  • 27. What about probabilities? An emerging paradigm: probabilistic logic programming Just as an example https://dtai.cs.kuleuven.be/problog/tutorial/various/09_airflap.html 27
  • 28. These are Knowledge-based approaches The solving algorithms are general purpose The real value is in the encoded knowledge 28
  • 29. For the avoidance of doubt, this is knowledge w ∧ x ∧ y ∧ z ∧ (y ∨ ¬x) ∧ (¬w ∨ ¬x) ∧ (w ∨ ¬x ∨ ¬y ∨ ¬z) . . . authenticated (A, B, T) :− send (A, B, enc (M1, K1) , T1) , f r e s h (A, nonce (A, Na) , T1) , part m ( nonce (A, Na) , M1) , k e y p a i r (K1 , Kinv1 ) , has (A, K1 , T1) , has (B, Kinv1 , T1) , not has (C, Kinv1 , T1) : agent (C) : C != B, send (B, A, enc (M2, K2) , T2) , r e c e i v e (A, B, enc (M2, K2) , T) , part m ( nonce (A, Na) , M2) , k e y p a i r (K2 , Kinv2 ) , has (B, K2 , T1) , has (A, Kinv2 , T1) , not has (C, Kinv2 , T1) : agent (C) : C != A, T1 < T2 , T2 < T. . . . . . . 0 . 7 : : wind ( weak ) ; 0 . 3 : : wind ( s t r o n g ) . 0 . 2 5 : : w i n d e f f e c t (T, −1); 0 . 5 : : w i n d e f f e c t (T, 0 ) ; 0 . 2 5 : : w i n d e f f e c t (T, 1 ) :− wind ( weak ) . . . . f l a p p o s i t i o n (Time , Pos ) :− Time > 0 , a t t e m p t e d f l a p p o s i t i o n (Time , Pos ) , l e g a l f l a p p o s i t i o n ( Pos ) . . . . 29
  • 30. What if we do not have knowledge? 30
  • 32. Agenda Introduction to AI (for security) • GOFAI • ML Guidelines • Rules • Guidelines for creating and using AI Security of AI • GOFAI • ML 32
  • 33. Training set: N observations of a real-valued input variable x, x ≡ (x1, . . . , xN )T together with corresponding observations of the values of the real-valued target variable t, denoted t ≡ (t1, . . . , tN )T . The goal is to find a function y(x, w) as close as possible to the original function f (x) from which we obtained the training set. x t 0 1 −1 0 1 33 Fig. 1.2 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 34. Let’s approximate with a polynomial with degree M x t M = 0 0 1 −1 0 1 x t M = 1 0 1 −1 0 1 34 Fig. 1.4a and 1.4b of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 35. Let’s approximate with a polynomial with degree M x t M = 3 0 1 −1 0 1 x t M = 9 0 1 −1 0 1 35 Fig. 1.4c and 1.4d of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 36. Training set: learning the parameters Validation set: optimise model complexity Validation set: get the performance of the final model 36
  • 37. Regression: y(x, w) ∈ Rn Classification: y(x, w) ∈ N Supervised learning: training/validation/test set contain observations of the target variable Unsupervised learning: no observations of the target variable Semi-Supervised learning: few target variable observation, between supervised and unsupervised Self-supervised learning: the system uses some automatic techniques for creating some labelling and (hopefully) improves with time Online learning: data available in sequence Reinforcement learning: no training/validation/test provided, just a reward function and the ability to learn from mistakes . . . 37
  • 39. X (resp. Y ) be a discrete random variable that can take values xi with i = 1, . . . , M (resp. yj with j = 1, . . . , L). The probability that X will take the value xi and Y will take the value yj is written p(X = xi , Y = yj ) and is called the joint probability of X = xi and Y = yj . 39
  • 40. Sum rule or marginalisation: p(X = xi ) = L j=1 p(X = xi , Y = xj ) Product rule: p(X = xi , Y = yj ) = p(Y = yj |X = xi )p(X = xi ) Sum and product rules apply to general random variables, not only discrete ones. 40
  • 41. p(X,Y ) X Y = 2 Y = 1 p(Y ) p(X) X X p(X|Y = 1) 41 Fig. 1.11a–1.11d of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 42. The weighted average of the function f(x) under a probability distribution p(x), or expectation of f(x), is: E[f] = Σ_x p(x) f(x) in the discrete case, or E[f] = ∫ p(x) f(x) dx for continuous variables. It can be approximated from N points drawn from the distribution: E[f] ≃ (1/N) Σ_{n=1}^{N} f(x_n). In the case of functions of several variables, E_x[f(x, y)] = Σ_x p(x) f(x, y). 42
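A minimal sketch (my addition) of the Monte Carlo approximation E[f] ≃ (1/N) Σ_n f(x_n); the choice of distribution and function below is purely illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Approximate E[f] = integral of p(x) f(x) dx by averaging f over samples drawn from p.
# Illustrative choice: p = standard normal, f(x) = x^2, whose exact expectation is 1.
samples = rng.normal(0.0, 1.0, size=100_000)
estimate = np.mean(samples ** 2)   # E[f] ~= (1/N) sum_n f(x_n)
print(estimate)                    # close to 1, and improves as N grows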
  • 43. Variance of f(x): var[f] = E[(f(x) − E[f(x)])²] = E[f(x)²] − E[f(x)]² 43
  • 44. For two random variables, the covariance expresses the extent to which they vary together, and is defined by: cov[x, y] = E_{x,y}[xy] − E[x]E[y]. In the case of vectors of random variables: cov[x, y] = E_{x,y}[x y^T] − E[x]E[y^T]. Note that cov[x] ≡ cov[x, x]. 44
  • 46. Bayesian Probabilities Bayes theorem: p(Y|X) = p(X|Y) p(Y) / p(X), where p(X) = Σ_Y p(X|Y) p(Y) 46
  • 47. Suppose we randomly pick one of the boxes and from that box we randomly select an item of fruit, and having observed which sort of fruit it is we replace it in the box from which it came. We could imagine repeating this process many times. Let us suppose that in so doing we pick the red box 40% of the time and we pick the blue box 60% of the time, and that when we remove an item of fruit from a box we are equally likely to select any of the pieces of fruit in the box. We are told that a piece of fruit has been selected and it is an orange. Which box did it come from? 47 Fig. 1.9 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 48. p(B = r|F = o) = p(F = o|B = r) p(B = r) / p(F = o) = (6/8 · 4/10) / (6/8 · 4/10 + 1/4 · 6/10) = 3/4 · 2/5 · 20/9 = 2/3 48 Fig. 1.9 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
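The same posterior can be checked numerically; a minimal sketch (my addition) using the numbers from the slide (red box: 6 oranges out of 8 fruits; blue box: 1 orange out of 4; p(B = r) = 0.4, p(B = b) = 0.6):

# Likelihoods p(F = o | B) and prior p(B)
p_o_given_r, p_o_given_b = 6 / 8, 1 / 4
p_r, p_b = 0.4, 0.6

p_o = p_o_given_r * p_r + p_o_given_b * p_b   # marginal p(F = o) = 9/20
p_r_given_o = p_o_given_r * p_r / p_o         # Bayes theorem
print(p_r_given_o)                            # 0.666... = 2/3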
  • 49. The goal in classification is to take an input vector x and to assign it to one of K discrete classes C_k. The input space is thus divided into decision regions whose boundaries are called decision boundaries or decision surfaces. Consider first the case of two classes. The posterior probability for class C1 can be written as: p(C1|x) = p(x|C1)p(C1) / ( p(x|C1)p(C1) + p(x|C2)p(C2) ) = 1 / (1 + exp(−a)) = σ(a), with a = ln [ p(x|C1)p(C1) / ( p(x|C2)p(C2) ) ], where σ(a) is the logistic sigmoid function defined by σ(a) = 1 / (1 + exp(−a)). In the case of more than two classes we obtain the softmax function. 49
  • 50. [Plot of the logistic sigmoid σ(a).] σ(−a) = 1 − σ(a); d/da σ(a) = σ(a)(1 − σ(a)) 50
  • 51. The neural network model is a nonlinear function from a set of input variables {x_i} to a set of output variables {y_k} controlled by a vector w of adjustable parameters. [Network diagram: inputs x_0, . . . , x_D, hidden units z_0, . . . , z_M, outputs y_1, . . . , y_K, connected by first-layer weights w^(1) and second-layer weights w^(2).] Hidden units: z_j = h(a_j), with activations a_j = Σ_{i=1}^{D} w^(1)_{ji} x_i + w^(1)_{j0}. Assuming a sigmoid output function: y_k(x, w) = σ( Σ_{j=1}^{M} w^(2)_{kj} h( Σ_{i=1}^{D} w^(1)_{ji} x_i + w^(1)_{j0} ) + w^(2)_{k0} ) 51 Fig. 5.1 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
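A minimal NumPy sketch (my addition) of the forward pass above; the dimensions, weight initialisation and choice of tanh for the hidden nonlinearity are illustrative assumptions.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2, h=np.tanh):
    # y_k(x, w) = sigma( sum_j w2_kj * h( sum_i w1_ji x_i + w1_j0 ) + w2_k0 )
    a_hidden = W1 @ x + b1       # activations a_j of the hidden units
    z = h(a_hidden)              # hidden units z_j = h(a_j)
    return sigmoid(W2 @ z + b2)  # outputs y_k

# Toy dimensions: D = 3 inputs, M = 4 hidden units, K = 2 outputs
rng = np.random.default_rng(0)
D, M, K = 3, 4, 2
W1, b1 = rng.normal(size=(M, D)), np.zeros(M)
W2, b2 = rng.normal(size=(K, M)), np.zeros(K)
print(forward(rng.normal(size=D), W1, b1, W2, b2))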
  • 52. The nonlinear function h(·) can be a sigmoidal function such as the logistic sigmoid, but also the rectified linear unit (ReLU): ReLU(x) = max{0, x}. [Plot of the ReLU function.] 52
  • 53. Given an integer q ≥ 1, let K ⊂ ℝ^q be a compact space,∗ f : K → ℝ be continuous, and h : ℝ → ℝ be continuous but not polynomial. Then for every ε > 0 there exist N ≥ 1, a_k, b_k ∈ ℝ, w_k ∈ ℝ^q such that: max_{x∈K} | f(x) − Σ_{k=1}^{N} a_k h(w_k · x + b_k) | ≤ ε Pinkus, A. (1999). “Approximation theory of the MLP model in neural networks.” In: Acta Numerica, pp. 143-195. doi:10.1017/S0962492900002919. ∗ A compact space contains all its limit points and has all its points lying within some fixed distance of each other. 53
  • 54. Parameters Optimisation Given a training set comprising a set of input vectors {x_n}, n = 1, . . . , N, and a corresponding set of target vectors {t_n}, we want to minimise an error function E(w). 54
  • 55. [Error surface E(w) over weight space (w_1, w_2), with minima w_A and w_B and the gradient vector ∇E at a point w_C.] First note that if we make a small step in weight space from w to w + δw then the change in the error function is δE ≃ δw^T ∇E(w), where the vector ∇E(w) points in the direction of greatest rate of increase of the error function. w^(τ+1) = w^(τ) + ∆w^(τ), where τ labels the iteration step. The simplest approach to using gradient information is to choose the weight update so that: w^(τ+1) = w^(τ) − η ∇E(w^(τ)), where the parameter η > 0 is the learning rate. 55 Fig. 5.5 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 56. Batch methods: at each step the weight vector is moved in the direction of the greatest rate of decrease of the error function, and so this approach is known as gradient descent or steepest descent. On-line version of gradient descent, aka sequential gradient descent or stochastic gradient descent: error functions based on maximum likelihood for a set of independent observations comprise a sum of terms, one for each data point: E(w) = Σ_{n=1}^{N} E_n(w). Update the weight vector based on one data point at a time, so that w^(τ+1) = w^(τ) − η ∇E_n(w^(τ)) 56
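A minimal sketch (my addition) contrasting the batch and stochastic updates on a toy least-squares problem; the data, learning rate and number of iterations are illustrative assumptions, and the batch gradient is scaled by 1/N purely to keep the step size comparable.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: t_n = w_true . x_n + noise
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.5, -2.0, 0.5])
t = X @ w_true + rng.normal(0, 0.1, N)

def grad_En(w, n):
    # Gradient of the per-point sum-of-squares error E_n(w) = 0.5 * (w.x_n - t_n)^2
    return (X[n] @ w - t[n]) * X[n]

eta = 0.05

w_batch = np.zeros(D)
for tau in range(100):
    # Batch gradient descent: each step uses the gradient of the full error E(w) = sum_n E_n(w)
    w_batch -= eta * sum(grad_En(w_batch, n) for n in range(N)) / N

w_sgd = np.zeros(D)
for epoch in range(5):
    # Stochastic (sequential) gradient descent: one update per data point
    for n in rng.permutation(N):
        w_sgd -= eta * grad_En(w_sgd, n)

print(w_true, w_batch, w_sgd)   # both estimates end up close to w_true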
  • 57. The value of δ for a particular hidden unit can be obtained by propagating the δ’s backward from units higher up in the network. [Diagram: unit z_i feeds hidden unit z_j through weight w_{ji}; z_j feeds units with errors δ_1, . . . , δ_k through weights w_{kj}, from which δ_j is computed.] 57 Fig. 5.7 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 59. Generative models Determine p(x|Ck ) for each class Ck individually; then infer the prior class probabilities; then use Bayes’ theorem to find the posterior probabilities. Alternatively, obtain the posterior probabilities from the joint distribution p(x, Ck ). Analogously for regression. Discriminative models Model directly the posterior class probabilities p(Ck |x) (analogously p(t|x)) without computing the joint distribution. Direct Find a discriminant function f (x) which maps each input x directly onto a class label. Analogously, find a regression function y(x) directly from the training data. 59
  • 60. [Left: class-conditional densities p(x|C1) and p(x|C2) over x — the Generative view. Right: posterior probabilities p(C1|x) and p(C2|x) — the Discriminative/Direct view.] 60 Fig. 1.27a-b of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 61. [Posterior probabilities p(C1|x) and p(C2|x) with a threshold θ defining a reject region, in which the classifier abstains from making a decision.] 61 Fig. 1.26 of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 62. The no free lunch theorem for machine learning† states that, averaged over all possible data generating distributions, every classification (resp. regression) model has the same error rate when dealing with previously unobserved points. In other words, no model is universally any better than any other. These results hold only when we average over all possible data generating distributions. However, if we can make assumptions about the kinds of distributions we encounter in real-world applications, then we can design models that perform well on these distributions. † Wolpert, D. H. and Macready, W. G. (1997). “No free lunch theorems for optimization.” In: IEEE transactions on evolutionary computation 1.1, pp. 67-82. 62
  • 64. [Architecture diagram: the predicted complex event is compared against the ground truth and the error is propagated back through the whole pipeline (gradient back-propagation).] M. Roig Vilamala, H. Taylor, T. Xing, L. Garcia, M. Srivastava, L. Kaplan, A. Preece, A. Kimmig, F. Cerutti. A Hybrid Neuro-Symbolic Approach for Complex Event Processing. ICLP 2020. https://arxiv.org/abs/2009.03420
  • 65. Agenda Introduction to AI (for security) • GOFAI • ML Guidelines • Rules • Guidelines for creating and using AI Security of AI • GOFAI • ML 65
  • 67. GDPR–AI: Conceptual framework 4(1) Personal data Identification • Re-identification (e.g. pseudonymity) • Identifiability (e.g. fusion with external data sources) 4(2) Profiling Inferred personal data is personal data • Data subjects have the right to rectification independently of whether inferred information is verifiable or statistical 4(11) Consent Specificity • Granularity • Freedom (and the problem of clear imbalance) 67 https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
  • 68. GDPR–AI: Data protection principles 5(1)(a) Fairness, transparency Data subjects should not be misled • Inference should also be fair (verifiable, etc. . . ) 5(1)(b) Purpose limitation New purposes for data must be compatible • Personal data in training set/model • Personal data affecting personalised inferences 5(1)(c) Data minimisation 5(1)(d) Accuracy Personal data must be accurate 5(1)(e) Storage limitation 68 https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
  • 69. GDPR–AI: Information duties 13/14 Data subjects need to receive relevant information Information on automated decision-making Existence of automated decision-making, including profiling • Meaningful information about the logic involved and the envisaged consequences of such processing for the data subject 69 https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
  • 70. GDPR–AI: Data subjects’ rights 15 The right to access A little ambiguous: it seems not to imply the need for an individualised explanation of automated assessments and decisions 17 The right to be forgotten Delete data used for constructing a model, albeit not deleting the model itself if it is shown to no longer contain personal data 21 The right to object Objecting to profiling and direct marketing • Objecting to research and statistical purposes (except for reasons of public interest) 70 https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
  • 71. GDPR–AI: Automated decision-making 22(1-2) Prohibition of automated decisions The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her. Exceptions: Necessary for entering into or performing a contract • Authorised by EU/member state law with counterbalances • Based on data subject’s explicit consent 22(3) Safeguard measures the data controller shall implement suitable measures to safeguard the data sub- ject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision 22(4) No automated decision on sensitive data Unless with explicit consent or for reasons of substantial public interest 71 https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
  • 72. GDPR–AI: Privacy 24 Responsibility of data controller implement appropriate technical and organisational measures to ensure and to be able to demonstrate that processing is performed in accordance with this Regulation 25(1) Data protection by design and privacy by default 25(2) Data minimisation 35-36 Data protection impact assessment In presence of high-risk, ask the supervisory authority (national data protection authority) 37 Need for data protection officers 40-43 Codes of conduct and certification 72 https://www.europarl.europa.eu/thinktank/en/document.html?reference=EPRS_STU(2020)641530
  • 73. Agenda Introduction to AI (for security) • GOFAI • ML Guidelines • Rules • Guidelines for creating and using AI Security of AI • GOFAI • ML 73
  • 74. 1. IEEE Ethics in Action and IEEE 7000 The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. Ethically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems, First Edition. IEEE, 2019. https://ethicsinaction.ieee.org/ 2. EU coordinated plan 2018 and whitepaper 2020 https://ec.europa.eu/digital-single-market/en/news/ coordinated-plan-artificial-intelligence https://ec.europa.eu/info/publications/ white-paper-artificial-intelligence-european-approach-excellence-and-trust_ en 74
  • 76. IEEE Global Initiative on Ethics of A/IS: Principles 1. Human Rights: A/IS‡ shall be created and operated to respect, promote, and protect internationally recognized human rights. 2. Well-being: A/IS creators shall adopt increased human well-being as a primary success criterion for development. 3. Data Agency: A/IS creators shall empower individuals with the ability to access and securely share their data, to maintain people’s capacity to have control over their identity. 4. Effectiveness: A/IS creators and operators shall provide evidence of the effectiveness and fitness for purpose of A/IS. ‡ Autonomous and Intelligent Systems 76
  • 77. IEEE Global Initiative on Ethics of A/IS: Principles (cont.) 5. Transparency: The basis of a particular A/IS decision should always be discoverable. 6. Accountability: A/IS shall be created and operated to provide an unambiguous rationale for all decisions made. 7. Awareness of Misuse: A/IS creators shall guard against all potential misuses and risks of A/IS in operation. 8. Competence: A/IS creators shall specify and operators shall adhere to the knowledge and skill required for safe and effective operation. 77
  • 79. IEEE Global Initiative on Ethics suggests adoption of Millian ethics to overcome general assumptions of anthropomorphism in A/IS Determinism: human actions follow necessarily from antecedent conditions and psychological laws. Laplace’s demon could predict perfectly human behaviour. However, for example: 1. Antecedent conditions include the education we received 2. We can therefore modify our future actions by self-education Doctrine of free will: “our will, by influencing some of our circumstances, can modify our future habits or capabilities of willing”§ § J. S. Mill. “Autobiography”. 1873. 79
  • 80. Which Ethics Traditional ethics: virtues or moral character (Plato, Aristotle); deontology or duties and rules (Kant); utilitarianism or consequence of actions (Mill) Feminist ethics or ethics of care (Noddings): should we care for A/IS? Globally diverse traditions: Buddhist/Ubuntu/Shinto-Influenced Ethical tradition and their role in A/IS 80
  • 81. Policy framework 1. Ensure that A/IS support, promote, and enable internationally recognised legal norms. 2. Develop government expertise in A/IS. 3. Ensure governance and ethics are core components in A/IS research, development, acquisition, and use 4. Create policies for A/IS to ensure public safety and responsible A/IS design. 5. Educate the public on the ethical and societal impacts of A/IS. 81
  • 82. 7000. Model Process for Addressing Ethical Concerns During System Design 7001. Transparency of Autonomous Systems 7002. Data Privacy Process 7003. Algorithmic Bias Considerations 7004. Standard on Child and Student Data Governance 7005. Standard on Employer Data Governance 7006. Standard on Personal Data AI Agent Working Group 7007. Ontological Standard for Ethically driven Robotics and Automation Systems 7008. Standard for Ethically Driven Nudging for Robotic, Intelligent and Autonomous Systems 7009. Standard for Fail-Safe Design of Autonomous and Semi-Autonomous Systems 7010. IEEE Recommended Practice for Assessing the Impact of Autonomous and Intelligent Systems on Human Well-Being 7011. Standard for the Process of Identifying & Rating the Trust-worthiness of News Sources 7012. Standard for Machine Readable Personal Privacy Terms 7013. Standard for Inclusion and Application Standards for Automated Facial Analysis Technology 7014. Standard for Ethical considerations in Emulated Empathy in Autonomous and Intelligent Systems 82 https://ethicsinaction.ieee.org/p7000/
  • 83. IEEE 7010-2020. Well-being Impact Assessment: an iterative process 1. Internal analysis and user and stakeholder engagement The nature of the AI system • The needs it meets or problems it solves • Who the users (intended and unintended) are • Who the broader stakeholders might be • The likelihood of possible positive and negative impacts, and how they can be mitigated. 2. Development and refinement of well-being indicators dashboard Twelve domains: affect, community, culture, education, economy, environment, health, human settlements, government, psychological/mental well-being, and work. 3. Data planning and collection Collection of both baseline data and data over time, allowing changes in well-being indicators to be assessed over time 4. Data analysis and improvement to AI Analysis helps determine if an AI does have negative impacts, or if efforts to mitigate negative impacts or increase positive impacts are successful. Importantly, analysis then feeds into improvements to AI design, development, assessment, monitoring, and management. 5. Iteration 83 https://arxiv.org/abs/2005.06620
  • 85. “ For AI made in Europe one key principle will be ethics by design whereby ethical and legal principles, on the basis of the General Data Protection Regulation, competition law compliance, absence of data bias are implemented since the beginning of the design process. When defining the operational requirements, it is also important to take into account the interactions between humans and AI systems. Another key principle will be security by design, whereby cybersecurity, the protection of victims and the facilitation of law enforcement activities should be taken into account from the beginning of the design process. ,, 85 European Commission, 2018. Annex to COM(2018) 795 final - Coordinated Plan on Artificial Intelligence
  • 86. AI systems need to be human-centric, resting on a commitment to their use in the service of humanity and the common good, with the goal of improving human welfare and freedom. We therefore identify Trustworthy AI as our foundational ambition, since human beings and communities will only be able to have confidence in the technology’s development. 86 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 87. Trustworthy AI has three components, which should be met throughout the system’s entire life cycle: 1. it should be lawful, complying with all applicable laws and regulations; 2. it should be ethical, ensuring adherence to ethical principles and values; and 3. it should be robust, both from a technical and social perspective, since, even with good intentions, AI systems can cause unintentional harm. Even if an ethical purpose is ensured, individuals and society must also be confident that AI systems will not cause any unintentional harm. Such systems should perform in a safe, secure and reliable manner, and safeguards should be foreseen to prevent any unintended adverse impacts. It is therefore important to ensure that AI systems are robust. 87 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 88. Ethical Principles • Respect for human autonomy • Prevention of harm AI systems should neither cause nor exacerbate harm or otherwise adversely affect human beings. [...] They must be technically robust and it should be ensured that they are not open to malicious use. Note that the principle of prevention of harm and the principle of human autonomy may be in conflict. • Fairness • Explicability Explicability is crucial for building and maintaining users’ trust in AI systems. This means that processes need to be transparent, the capabilities and purpose of AI systems openly communicated, and decisions—to the extent possible—explainable to those directly and indirectly affected. Without such information, a decision cannot be duly contested. An explanation as to why a model has generated a particular output or decision (and what combination of input factors contributed to that) is not always possible. 88 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 89. Requirements of Trustworthy AI Human agency and oversight Technical robustness and safety Privacy and data governance Transparency Diversity, non-discrimination, and fairness Societal and environmental wellbeing Accountability 89 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 90. Technical Robustness and Safety: Resilience to attack and security AI systems, like all software systems, should be protected against vulnerabilities that can allow them to be exploited by adversaries, e.g. hacking. Attacks may target the data (data poisoning), the model (model leakage) or the underlying infrastructure, both software and hardware. If an AI system is attacked, e.g. in adversarial attacks, the data as well as system behaviour can be changed, leading the system to make different decisions, or causing it to shut down altogether. Systems and data can also become corrupted by malicious intention or by exposure to unexpected situations. Insufficient security processes can also result in erroneous decisions or even physical harm. For AI systems to be considered secure, possible unintended applications of the AI system (e.g. dual-use applications) and potential abuse of the system by malicious actors should be taken into account, and steps should be taken to prevent and mitigate these. 90 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 91. Technical Robustness and Safety: Fallback plan and general safety AI systems should have safeguards that enable a fallback plan in case of problems. This can mean that AI systems switch from a statistical to rule-based procedure, or that they ask for a human operator before continuing their action. It must be ensured that the system will do what it is supposed to do without harming living beings or the environment. This includes the minimisation of unintended consequences and errors. In addition, processes to clarify and assess potential risks associated with the use of AI systems, across various application areas, should be established. The level of safety measures required depends on the magnitude of the risk posed by an AI system, which in turn depends on the system’s capabilities. Where it can be foreseen that the development process or the system itself will pose particularly high risks, it is crucial for safety measures to be developed and tested proactively. 91 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 92. Technical Robustness and Safety: Accuracy Accuracy pertains to an AI system’s ability to make correct judgements, for example to correctly classify information into the proper categories, or its ability to make correct predictions, recommendations, or decisions based on data or models. An explicit and well-formed development and evaluation process can support, mitigate and correct unintended risks from inaccurate predictions. When occasional inaccurate predictions cannot be avoided, it is important that the system can indicate how likely these errors are. A high level of accuracy is especially crucial in situations where the AI system directly affects human lives. 92 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 93. Technical Robustness and Safety: Reliability and Reproducibility It is critical that the results of AI systems are reproducible, as well as reliable. A reliable AI system is one that works properly with a range of inputs and in a range of situations. This is needed to scrutinise an AI system and to prevent unintended harms. Reproducibility describes whether an AI experiment exhibits the same behaviour when repeated under the same conditions. This enables scientists and policy makers to accurately describe what AI systems do. Replication files can facilitate the process of testing and reproducing behaviours. 93 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 94. Transparency: Traceability The data sets and the processes that yield the AI system’s decision, including those of data gathering and data labelling as well as the algorithms used, should be documented to the best possible standard to allow for traceability and an increase in transparency. This also applies to the decisions made by the AI system. This enables identification of the reasons why an AI-decision was erroneous which, in turn, could help prevent future mistakes. Traceability facilitates auditability as well as explainability. 94 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 95. Transparency: Explainability Explainability concerns the ability to explain both the technical processes of an AI system and the related human decisions (e.g. application areas of a system). Technical explainability requires that the decisions made by an AI system can be understood and traced by human beings. Moreover, trade-offs might have to be made between enhancing a system’s explainability (which may reduce its accuracy) or increasing its accuracy (at the cost of explainability). Whenever an AI system has a significant impact on people’s lives, it should be possible to demand a suitable explanation of the AI system’s decision-making process. Such explanation should be timely and adapted to the expertise of the stakeholder concerned (e.g. layperson, regulator or researcher). In addition, explanations of the degree to which an AI system influences and shapes the organisational decision-making process, design choices of the system, and the rationale for deploying it, should be available (hence ensuring business model transparency). 95 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 96. Transparency: Communication AI systems should not represent themselves as humans to users; humans have the right to be informed that they are interacting with an AI system. This entails that AI systems must be identifiable as such. In addition, the option to decide against this interaction in favour of human interaction should be provided where needed to ensure compliance with fundamental rights. Beyond this, the AI system’s capabilities and limitations should be communicated to AI practitioners or end-users in a manner appropriate to the use case at hand. This could encompass communication of the AI system’s level of accuracy, as well as its limitations. 96 EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence.
  • 97. Initially: Make clear what the system can do • Make clear how well the system can do what it can do During interaction: Time services based on context • Show contextually relevant information • Match relevant social norms • Mitigate social biases When wrong: Support efficient invocation • Support efficient dismissal • Support efficient correction • Scope services, when in doubt • Make clear why the system did what it did Over time: Remember recent interactions • Learn from user behaviour • Update and adapt cautiously • Encourage granular feedback • Convey the consequences of user actions • Provide global controls • Notify users about changes https://aka.ms/aiguidelines § S. Amershi et. al., “Guidelines for Human-AI Interaction,” CHI 2019 97
  • 98. “ Even in simple collaboration scenarios, e.g. those in which an AI system assists a human operator with predictions, the success of the team hinges on the human correctly deciding when to follow the recommendations of the AI system and when to override them. [. . . ] Extracting benefits from collaboration with the AI system depends on the human developing insights (i.e., a mental model) of when to trust the AI system with its recommendations. [. . . ] If the human mistakenly trusts the AI system in regions where it is likely to err, catastrophic failures may occur. ,, § Bansal, Gagan, et al. “Beyond Accuracy: The Role of Mental Models in Human-AI Team Performance.” AAAI Conference on Human Computation and Crowdsourcing. 2019. 98
  • 99. Misclassification of the white side of a trailer as bright sky: this caused a car operating with automated vehicle control systems (level 2) to crash against a tractor-semitrailer truck near Williston, Florida, USA on 7th May 2016. The car driver died from the injuries sustained. The car manufacturer stated that the “camera failed to recognize the white truck against a bright sky.”¶ ¶ http://tiny.cc/2tb4uy 99
  • 101. Agenda Introduction to AI (for security) • GOFAI • ML Guidelines • Rules • Guidelines for creating and using AI Security of AI • GOFAI • ML 101
  • 102. Knowledge + Search Algorithm 102
  • 103. Quality of Knowledge Accuracy Syntactic • Semantic • Currency • Timeliness • Trustworthiness • Provenance Completeness Complete and relevant Conciseness Understandability By humans and machine alike Accessibility Availability • Licensing • Interoperability Consistency ¶ Batini, C., Scannapieco, M. et al.. Data and information quality. Springer (2016). 103
  • 105. (1) 1 = 0 (2) 3 = 0 (multiplying both sides of (1) by 3) (3) π = 0 (multiplying both sides of (1) by π) (4) π = 3 (from (2) and (3)) 105
  • 106. “ If I am a rock (r) then light travels at 1 metre per second (l) ,, (1) r ⊃ l ≡ ¬r ∨ l (2) r ≡ ⊥ (I am not a rock) (3) l ≡ ⊤ makes (1) true. Truth table for ¬r ∨ l:
r = ⊤, l = ⊤: ¬r = ⊥, ¬r ∨ l = ⊤
r = ⊤, l = ⊥: ¬r = ⊥, ¬r ∨ l = ⊥
r = ⊥, l = ⊤: ¬r = ⊤, ¬r ∨ l = ⊤
r = ⊥, l = ⊥: ¬r = ⊤, ¬r ∨ l = ⊤
106
  • 107. “ All the elephants in the room are pink ,, (1) ∀X elephant(X) ⊃ pink(X) (2) If elephant(X) = ⊥ for every X (there are no elephants in the room), then (1) is trivially true (vacuous truth) 107
  • 108. “ Fish identified previously from Sri Lanka as P. amphibius (Pethiyagoda 1991), are now recognized as an endemic species, P. kamalika (Silva et al. 2008), which is re- stricted to the wet zone. ,, P. amphibius FishXYZ Pethiyagoda, 1991 ¶ Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322. 108
  • 109. “ Fish identified previously from Sri Lanka as P. amphibius (Pethiyagoda 1991), are now recognized as an endemic species, P. kamalika (Silva et al. 2008), which is re- stricted to the wet zone. ,, P. amphibius FishXYZ Pethiyagoda, 1991 Silva et al. 2008 P. kamalika ¶ Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322. 109
  • 110. “ P. ticto (Hamilton, 1822) was described from Bengal. Deraniyagala (1956) gave a name to the P. ticto like fish in Sri Lanka, describing it as P. ticto melanomaculatus. At present P. ticto melanomaculatus is not recognized as a valid taxon (Pethiyagoda 1991). Recent molecular investigations show marked differences between P. ticto and P. melanomaculatus (Meegaskumbura et al. 2008). ,, P. ticto P.ticto melanomaculatus Deraniyagala 1956 ¶ Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322. 110
  • 111. “ P. ticto (Hamilton, 1822) was described from Bengal. Deraniyagala (1956) gave a name to the P. ticto like fish in Sri Lanka, describing it as P. ticto melanomaculatus. At present P. ticto melanomaculatus is not recognized as a valid taxon (Pethiyagoda 1991). Recent molecular investigations show marked differences between P. ticto and P. melanomaculatus (Meegaskumbura et al. 2008). ,, P. ticto P.ticto melanomaculatus Deraniyagala 1956 Pethiyagoda, 1991 ¶ Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322. 111
  • 112. “ P. ticto (Hamilton, 1822) was described from Bengal. Deraniyagala (1956) gave a name to the P. ticto like fish in Sri Lanka, describing it as P. ticto melanomaculatus. At present P. ticto melanomaculatus is not recognized as a valid taxon (Pethiyagoda 1991). Recent molecular investigations show marked differences between P. ticto and P. melanomaculatus (Meegaskumbura et al. 2008). ,, P. ticto P.ticto melanomaculatus Deraniyagala 1956 Pethiyagoda, 1991 Meegaskumbura et al., 2008 ¶ Bahir, M., & Gabadage, D. (2009). Taxonomic and scientific inaccuracies in a consultancy report on biodiversity: a cautionary note. Journal of Threatened Taxa, 1(6), 317-322. 112
  • 113. 113
  • 114. Agenda Introduction to AI (for security) • GOFAI • ML Guidelines • Rules • Guidelines for creating and using AI Security of AI • GOFAI • ML 114
  • 115. Training Phase Attacks Attacks during training time attempt to influence or corrupt the model directly by altering the dataset used for training. • Data Injection: The adversary does not have any access to the training data or to the learning algorithm, but has the ability to add new data to the training set. He can corrupt the target model by inserting adversarial samples into the training dataset. • Data Modification: The adversary does not have access to the learning algorithm but has full access to the training data. He poisons the training data directly by modifying the data before it is used for training the target model. • Logic Corruption: The adversary has the ability to meddle with the learning algorithm. These attacks are referred to as logic corruption. It becomes very difficult to design a counter strategy against adversaries who can alter the learning logic, thereby controlling the model itself. 115
  • 116. Testing Phase Attacks Adversarial attacks at testing time do not tamper with the targeted model but rather force it to produce incorrect outputs. The effectiveness of such attacks is determined mainly by the amount of information available to the adversary about the model. Testing phase attacks can be broadly classified into either White-Box or Black-Box attacks. 116
  • 117. Testing Phase Attacks: White-Box Attacks In a white-box attack on a machine learning model, an adversary has total knowledge about the model used for classification (e.g., type of neural network along with number of layers). The attacker has information about the algorithm used in training (e.g., gradient-descent optimization) and can access the training data distribution. He also knows the parameters of the fully trained model architecture. The adversary utilizes the available information to identify the feature space where the model may be vulnerable, i.e., for which the model has a high error rate. Then the model is exploited by altering an input using an adversarial example crafting method (more on this later). 117
  • 118. Black-box Attacks: transferability Adversarial sample transferability is the property that adversarial samples produced by training on a specific model can affect another model, even if they have different architectures. In black-box attacks, the adversary does not have access to the target model F, and thus trains a substitute model F′ locally to generate adversarial examples X + δX which can then be transferred to the victim model. 1. Intra-technique transferability: If models F and F′ are both trained using the same machine learning technique (e.g. both are NN or SVM) 2. Cross-technique transferability: If the learning techniques in F and F′ are different, for example, F is a neural network and F′ is a SVM. The attacks have been shown to generalize to non-differentiable target models, like SVMs. Therefore, differentiable models such as neural networks or logistic regression can be used to learn a substitute model for models trained with SVM or nearest neighbours. ¶ Papernot, Nicolas, Patrick McDaniel, and Ian Goodfellow. “Transferability in machine learning: from phenomena to black-box attacks using adversarial samples.” arXiv preprint arXiv:1605.07277 (2016). 118
  • 119. Testing Phase Attacks: Black-Box Attacks Non-Adaptive Black-Box Attack For a target model (f ), a non-adaptive black-box adversary only gets access to the target model’s training data distribution µ. The adversary then chooses a training procedure for a model architecture f ′ and trains a local model over samples from the data distribution µ to approximate the model learned by the target classifier. The adversary crafts adversarial examples on the local model f ′ using white-box attack strategies and applies these crafted inputs to the target model to force mis-classifications. 119
  • 120. Testing Phase Attacks: Black-Box Attacks Adaptive Black-Box Attack For a target model (f ), an adaptive black-box adversary does not have any information regarding the training process but can access the target model as an oracle (analogous to chosen-plaintext attack in cryptography). The adversary issues adaptive oracle queries to the target model and labels a carefully selected dataset, i.e., for any arbitrarily chosen x the adversary obtains its label y by querying the target model f . The adversary then chooses a procedure train′ and model architecture f ′ to train a surrogate model over tuples (x, y) obtained from querying the target model. The surrogate model then produces adversarial samples by following white-box attack technique for forcing the target model to mis-classify malicious data. 120
  • 121. Testing Phase Attacks: Black-Box Attacks Strict Black-Box Attack A black-box adversary sometimes may not have access to the data distribution µ but has the ability to collect input-output pairs (x, y) from the target classifier. However, he cannot change the inputs to observe the changes in output as in an adaptive attack procedure. This strategy is analogous to the known-plaintext attack in cryptography and is most likely to be successful for a large set of input-output pairs. 121
  • 122. Adversary Goals • Confidence Reduction: The adversary tries to reduce the confidence of prediction for the target model. For example, a legitimate image of a ‘stop’ sign can be predicted with lower confidence, i.e. with a lower probability of belonging to its class. • Misclassification: The adversary tries to alter the output classification of an input example to any class different from the original class. For example, a legitimate image of a ‘stop’ sign will be predicted as any other class different from the class of stop sign. • Targeted Misclassification: The adversary tries to produce inputs that force the output of the classification model to be a specific target class. For example, any input image to the classification model will be predicted as a class of images having ‘go’ sign. • Source/Target Misclassification: The adversary attempts to force the output of classification for a specific input to be a particular target class. For example, the input image of ‘stop’ sign will be predicted as ‘go’ sign by the classification model. 122
  • 123. • Exploratory Attack: These attacks do not influence training dataset. Given black box access to the model, they try to gain as much knowledge as possible about the learning algorithm of the underlying system and pattern in training data. • Evasion Attack: This is the most common type of attack in the adversarial setting. The adversary tries to evade the system by adjusting malicious samples during testing phase. This setting does not assume any influence over the training data. • Poisoning Attack: This type of attack, known as contamination of the training data, takes place during the training time of the machine learning model. An adversary tries to poison the training data by injecting carefully designed samples to compromise the whole learning process eventually. 123
  • 124. Exploratory Attacks: Model Inversion Attack Fredrikson et al. consider a linear regression model f that predicted drug dosage using patient information, medical history and genetic markers; given white-box access to model f and an instance of data (X = {x1, x2, ..., xn}, y), model inversion infers genetic marker x1. ¶ Fredrikson, Matthew, et al. “Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing.” 23rd USENIX Security Symposium (USENIX Security 14). 2014. 124
  • 125. An attacker can produce a recognizable image of a person, given only API access to a facial recognition system and the name of the person whose face is recognized by it. ¶ Fredrikson, Matt, Somesh Jha, and Thomas Ristenpart. “Model inversion attacks that exploit confidence information and basic countermeasures.” Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 2015. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation. 125
  • 127. Poisoning Attacks: Adversarial Examples Generation • Label Manipulation: The adversary has the capability to modify the training labels only, and he obtains the most vulnerable label given the full or partial knowledge of the learning model. A basic approach is to randomly perturb the labels, i.e., select new labels for a subset of training data by picking from a random distribution. • Input Manipulation: In this scenario, the adversary is more powerful and can corrupt the input features of training points analyzed by the learning algorithm, in addition to its labels. This scenario also assumes that the adversary has the knowledge of the learning algorithm. 127
  • 129. Evasion Attacks • White-box attacks: two steps 1. Direction Sensitivity Estimation 2. Perturbation Selection • Black-box attacks 129
  • 130. White Box attacks. Step 1: Direction Sensitivity Estimation The adversary evaluates the sensitivity of a class change to each input feature by identifying directions in the data manifold around sample X in which the model F is most sensitive and likely to result in a class change 130
  • 131. Fast Gradient Method (FGM): calculates the gradient of the cost function with respect to the input of the neural network. The adversarial examples are generated using the following equation: X* = X + ε · sign(∇_X J(X, y_true)). Here, J is the cost function of the trained model, ∇_X denotes the gradient of the model with respect to a normal sample X with correct label y_true, and ε denotes the input variation parameter which controls the perturbation’s amplitude. ¶ I. Goodfellow, J. Shlens, C. Szegedy 2015 Explaining and Harnessing Adversarial Examples. In ICLR 2015. https://arxiv.org/abs/1412.6572 131
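To make the attack concrete, here is a minimal sketch (my addition, not from the slides) of the fast gradient method applied to a plain logistic-regression classifier, for which the gradient of the cross-entropy cost with respect to the input can be written in closed form; with a neural network one would obtain ∇_X J via automatic differentiation instead. All weights and inputs below are made-up values.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fgm_attack(x, y_true, w, b, eps):
    # X* = X + eps * sign(grad_X J(X, y_true)) for the model p = sigma(w.x + b)
    p = sigmoid(w @ x + b)
    grad_x = (p - y_true) * w      # gradient of the cross-entropy cost w.r.t. the input x
    return x + eps * np.sign(grad_x)

# Illustrative model and input
w = np.array([2.0, -1.0, 0.5])
b = 0.1
x = np.array([0.2, -0.3, 0.8])
y_true = 1.0

x_adv = fgm_attack(x, y_true, w, b, eps=0.3)
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))  # confidence in the true class drops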
  • 132. White Box attacks. Step 2: Perturbation Selection The adversary then exploits the knowledge of sensitive information to select a perturbation δX among the input dimensions in order to obtain an adversarial perturbation which is most efficient. 132
  • 133. Perturb all the input dimensions with a small quantity in the direction of the sign of the gradient calculated using the FGM method. This method efficiently minimizes the Euclidean distance between the original and the corresponding adversarial samples. 133
  • 134. Perturb selected input dimensions: select only a limited number of input dimensions to perturb, by identifying which combination of input dimensions, if perturbed, will contribute to the adversarial goals. This method effectively reduces the number of input features perturbed while crafting adversarial examples. For choosing the input dimensions which form the perturbation, all the dimensions are sorted in decreasing order of their contribution to the adversarial goal. Input components are added to the perturbation δx in decreasing order until the resulting sample x* = x + δx is misclassified by the model F. 134
  • 135. LISA-CNN could interpret this as Speed Limit 45 ¶ Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, Dawn Song Robust Physical-World Attacks on Deep Learning Visual Classification Computer Vision and Pattern Recognition (CVPR 2018) Permission is granted to use and reproduce the images for publications or for research with acknowledgement 135
  • 136.
  • 137. Adversarial Training Inject adversarial examples into the training set: the defender generates a lot of adversarial examples and augments the training set with these perturbed data while training the targeted model. 137
  • 138. Gradient Hiding A natural defense against gradient-based attacks and attacks using adversarial crafting methods such as FGM could consist in hiding information about the model’s gradient from the adversary. For instance, if the model is non-differentiable (e.g., a SVM or a Nearest Neighbour Classifier), gradient-based attacks are rendered ineffective. However, this defense is easily fooled by learning a surrogate black-box model that does have a gradient and crafting examples using it (cf. adversarial sample transferability). 138
  • 139. Feature Squeezing By reducing the complexity of representing the data, the adversarial perturbations disappear because of low sensitivity. Examples: reducing the quantization levels, or the sampling frequencies. Though these techniques work well in preventing adversarial attacks, they have the side effect of worsening the accuracy of the model on clean examples. 139
  • 140. Blocking the Transferability: Null Labelling The main idea behind the proposed approach is to augment the dataset with a new NULL label and train the classifier to reject the adversarial examples by classifying them as NULL. Three steps: 1. Initial Training of the target classifier on the clean dataset; 2. Computing the NULL probabilities: The probability of belonging to the NULL class is then calculated using a function f for the adversarial examples generated with different amounts of perturbation; 3. Adversarial Training: Each clean sample is then re-trained with the original classifier along with different perturbed inputs for the sample. The label for the training data is decided based on the NULL probabilities obtained in the previous step. ¶ Hosseini, Hossein, et al. “Blocking transferability of adversarial examples in black-box learning systems.” arXiv preprint arXiv:1703.04318 (2017). 140
  • 141. Uncertainty-Awareness Change the loss function so as to output pieces of evidence in favour of the different classes, which are then combined through a Bayesian update resulting in a Dirichlet distribution ¶ Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018. 141
  • 142. Given the parameters of our model w, we can capture our assumptions about w, before observing the data, in the form of a prior probability distribution p(w). The effect of the observed data D = {t_1, . . . , t_N} is expressed through the conditional p(D|w), hence Bayes theorem takes the form: p(w|D) = p(D|w) p(w) / p(D), i.e. posterior ∝ likelihood · prior, where p(D) = ∫ p(D|w) p(w) dw ensures that the posterior distribution on the left-hand side is a valid probability density and integrates to one. 142
  • 143. Frequentist paradigm • w is considered to be a fixed parameter, whose values is determined by some form of estimator, e.g. the maximum likelihood in which w is set to the value that maximises p(D|w) • Error bars on this estimate are obtained by considering the distribution of possible data sets D. • The negative log of the likelihood function is called an error function: the negative log is a monotonically decreasing function hence maximising the likelihood is equivalent to minimising the error. Bayesian paradigm • There is only one single data set D (the one observed) and the uncertainty in the parameters is expressed through a probability distribution over w. • The inclusion of prior knowledge arises naturally: suppose that a fair-looking coin is tossed three times and lands heads each time. A classical maximum likelihood estimate of the probability of landing heads would give 1. There are cases where you want to reduce the dependence on the prior, hence using noninformative priors. 143
  • 144. Binary variable: Bernoulli Let us consider a single binary random variable x ∈ {0, 1}, e.g. flipping a coin, not necessarily fair, hence the probability is conditioned by a parameter 0 ≤ µ ≤ 1: p(x = 1|µ) = µ. The probability distribution over x is known as the Bernoulli distribution: Bern(x|µ) = µ^x (1 − µ)^(1−x), with E[x] = µ 144
  • 145. Now suppose that we have a data set of observations x = (x_1, . . . , x_N)^T drawn independently from a Bernoulli distribution (iid) whose mean µ is unknown, and we would like to determine this parameter from the data set. p(D|µ) = ∏_{n=1}^{N} p(x_n|µ) = ∏_{n=1}^{N} µ^{x_n} (1 − µ)^{1−x_n}. Let’s maximise the (log-)likelihood to identify the parameter (the log simplifies the algebra and reduces the risk of underflow): ln p(D|µ) = Σ_{n=1}^{N} ln p(x_n|µ) = Σ_{n=1}^{N} { x_n ln µ + (1 − x_n) ln(1 − µ) } 145
  • 146. The log likelihood depends on the N observations x_n only through their sum Σ_n x_n, hence the sum provides an example of a sufficient statistic for the data under this distribution. No other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter. 146
  • 147. Setting the derivative of the log likelihood to zero:
d/dµ ln p(D|µ) = 0
Σ_{n=1}^{N} ( x_n/µ − (1 − x_n)/(1 − µ) ) = 0
Σ_{n=1}^{N} (x_n − µ) / ( µ(1 − µ) ) = 0
Σ_{n=1}^{N} x_n = Nµ
µ_ML = (1/N) Σ_{n=1}^{N} x_n, aka the sample mean. Risk of overfitting: consider tossing the coin three times and getting heads each time. 147
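A tiny sketch (my addition) of the maximum likelihood estimate and of the overfitting problem mentioned above; the data are made up.

import numpy as np

def bernoulli_mle(x):
    # Maximum likelihood estimate: mu_ML = (1/N) sum_n x_n (the sample mean)
    return np.mean(x)

print(bernoulli_mle(np.array([1, 0, 1, 1, 0, 0, 1, 0])))  # 0.5
print(bernoulli_mle(np.array([1, 1, 1])))                 # 1.0: three heads in a row
# The second estimate claims the coin can never land tails -- the overfitting problem
# that the Bayesian treatment with a Beta prior addresses on the following slides.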
  • 148. We now develop a Bayesian treatment of the overfitting problem of the maximum likelihood estimator for the Bernoulli. Since the likelihood takes the form of a product of factors of the form µ^x (1 − µ)^(1−x), if we choose a prior proportional to powers of µ and (1 − µ), then the posterior distribution, proportional to the product of the prior and the likelihood, will have the same functional form as the prior. This property is called conjugacy. 148
  • 149. Binary variables: Beta distribution Beta(µ|a, b) = Γ(a + b) / ( Γ(a)Γ(b) ) µ^{a−1} (1 − µ)^{b−1}, with Γ(x) ≡ ∫_0^∞ u^{x−1} e^{−u} du. E[µ] = a / (a + b), var[µ] = ab / ( (a + b)²(a + b + 1) ). a and b are hyperparameters controlling the distribution of the parameter µ. 149
  • 150. [Plots of the Beta distribution over µ for (a, b) = (0.1, 0.1), (1, 1), (2, 3), and (8, 4).] 150 Fig. 2.2a-d of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
  • 151. Considering a beta distribution prior and the binomial likelihood function, and given l = N − m: p(µ|m, l, a, b) ∝ µ^{m+a−1} (1 − µ)^{l+b−1}. Hence p(µ|m, l, a, b) is another beta distribution, and we can rearrange the normalisation coefficient as follows: p(µ|m, l, a, b) = Γ(m + a + l + b) / ( Γ(m + a)Γ(l + b) ) µ^{m+a−1} (1 − µ)^{l+b−1}. [Plots of the prior, the likelihood function, and the resulting posterior.] 151 Fig. 2.3a-c of C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag. c 2006 C. M. Bishop. Permission is given to reproduce the figures for non-commercial purposes including education and research.
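A minimal sketch (my addition) of the conjugate update above, using SciPy; the prior hyperparameters and observations are illustrative.

from scipy.stats import beta

# Beta(a, b) prior over mu; after observing m heads and l = N - m tails,
# the posterior is Beta(a + m, b + l).
a, b = 2, 2                      # illustrative prior hyperparameters
m, l = 3, 0                      # three heads and no tails: the overfitting example above

posterior = beta(a + m, b + l)
print(posterior.mean())          # (a + m) / (a + m + b + l) = 5/7 ~ 0.71, not 1.0 as with ML
print(posterior.var())           # shrinks as more observations arrive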
  • 152. Epistemic vs Aleatoric uncertainty Aleatoric uncertainty Variability in the outcome of an experiment which is due to inherently random effects (e.g. flipping a fair coin): no additional source of information (short of Laplace’s demon) can reduce such variability. Epistemic uncertainty Epistemic state of the agent using the model, hence its lack of knowledge that—in principle—can be reduced on the basis of additional data samples. It is a general property of Bayesian learning that, as we observe more and more data, the epistemic uncertainty represented by the posterior distribution will steadily decrease (the variance decreases). 152
  • 153. Multinomial variables: categorical distribution Let us suppose we roll a die with K = 6 faces. An observation of this variable x equivalent to x_3 = 1 (e.g. the number 3 with face up) can be written: x = (0, 0, 1, 0, 0, 0)^T. Note that such vectors must satisfy Σ_{k=1}^{K} x_k = 1. p(x|µ) = ∏_{k=1}^{K} µ_k^{x_k}, where µ = (µ_1, . . . , µ_K)^T, and the parameters µ_k are such that µ_k ≥ 0 and Σ_k µ_k = 1. Generalisation of the Bernoulli 153
  • 154. p(D|µ) = ∏_{n=1}^{N} ∏_{k=1}^{K} µ_k^{x_{nk}}. The likelihood depends on the N data points only through the K quantities m_k = Σ_n x_{nk}, which represent the number of observations of x_k = 1 (e.g. with k = 3, the third face of the die). These are called the sufficient statistics for this distribution. 154
  • 155. Finding the maximum likelihood solution requires maximising, with a Lagrange multiplier λ, Σ_{k=1}^{K} m_k ln µ_k + λ( Σ_{k=1}^{K} µ_k − 1 ). Hence µ_k^ML = m_k / N, which is the fraction of the N observations for which x_k = 1. 155
  • 156. Multinomial variables: the Dirichlet distribution The Dirichlet distribution is the generalisation of the beta distribution to K dimensions. Dir(µ|α) = Γ(α_0) / ( Γ(α_1) · · · Γ(α_K) ) ∏_{k=1}^{K} µ_k^{α_k − 1}, such that Σ_k µ_k = 1, α = (α_1, . . . , α_K)^T, α_k ≥ 0 and α_0 = Σ_{k=1}^{K} α_k 156
  • 157. Considering a Dirichlet distribution prior and the categorical likelihood function, the posterior is then: p(µ|D, α) = Dir(µ|α + m) = Γ(α_0 + N) / ( Γ(α_1 + m_1) · · · Γ(α_K + m_K) ) ∏_{k=1}^{K} µ_k^{α_k + m_k − 1}. The uniform prior is given by Dir(µ|1) and the Jeffreys’ non-informative prior is given by Dir(µ|(0.5, . . . , 0.5)^T). The marginals of a Dirichlet distribution are beta distributions. 157
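A small sketch (my addition) of the Dirichlet-categorical update, with made-up counts for the die example:

import numpy as np

def dirichlet_posterior(alpha_prior, counts):
    # Posterior Dir(mu | alpha + m) for a categorical likelihood with counts m_k
    return alpha_prior + counts

alpha_uniform = np.ones(6)                 # uniform prior Dir(mu | 1) over K = 6 die faces
m = np.array([2, 0, 5, 1, 1, 1])           # observed counts m_k (illustrative)
alpha_post = dirichlet_posterior(alpha_uniform, m)
print(alpha_post / alpha_post.sum())       # posterior expected probabilities E[mu_k]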
  • 158. From Evidence to Dirichlet Let us now assume a Dirichlet distribution over K classes that is the result of a Bayesian update with N observations, starting from a uniform prior: Dir(µ | α) = Dir(µ | e_1 + 1, e_2 + 1, . . . , e_K + 1), where e_k is the number of observations (evidence) for class k, and Σ_k e_k = N. 158
  • 159. Dirichlet and Epistemic Uncertainty The epistemic uncertainty associated to a Dirichlet distribution Dir(µ | α) is given by u = K / S, with K the number of classes and S = α_0 = Σ_{k=1}^{K} α_k the Dirichlet strength. Note that if the Dirichlet has been computed as the result of a Bayesian update from a uniform prior, then 0 < u ≤ 1, and u = 1 implies that we are considering the uniform distribution (an extreme case of Dirichlet distribution). Let us denote µ_k = α_k / S. 159
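A tiny sketch (my addition) computing u = K/S and the expected class probabilities from a vector of evidence; the evidence counts are made up.

import numpy as np

def dirichlet_uncertainty(evidence):
    # u = K / S, with alpha_k = e_k + 1 and S = sum_k alpha_k (the Dirichlet strength)
    alpha = np.asarray(evidence, dtype=float) + 1.0
    K, S = alpha.size, alpha.sum()
    return K / S, alpha / S          # epistemic uncertainty u and expected probabilities mu_k

print(dirichlet_uncertainty([0, 0, 0]))      # no evidence: u = 1 (the uniform distribution)
print(dirichlet_uncertainty([95, 2, 3]))     # strong evidence for class 1: u ~ 0.03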
  • 160. Loss function If we then consider Dir(µ_i | α_i) as the prior for a multinomial p(y_i | µ_i), we can compute the expected squared error (aka Brier score):
E[ ||y_i − µ_i||²_2 ] = Σ_{k=1}^{K} E[ y_{i,k}² − 2 y_{i,k} µ_{i,k} + µ_{i,k}² ]
= Σ_{k=1}^{K} ( y_{i,k}² − 2 y_{i,k} E[µ_{i,k}] + E[µ_{i,k}]² + var[µ_{i,k}] )
= Σ_{k=1}^{K} ( (y_{i,k} − E[µ_{i,k}])² + var[µ_{i,k}] )
= Σ_{k=1}^{K} ( (y_{i,k} − α_{i,k}/S_i)² + α_{i,k}(S_i − α_{i,k}) / ( S_i²(S_i + 1) ) )
= Σ_{k=1}^{K} ( (y_{i,k} − µ_{i,k})² + µ_{i,k}(1 − µ_{i,k}) / (S_i + 1) )
The loss over a batch of training samples is the sum of the loss for each sample in the batch. Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018. 160
  • 161. Learning to say “I don’t know” To avoid generating evidence for all the classes when the network cannot classify a given sample (epistemic uncertainty), we introduce a term in the loss function that penalises the divergence from the uniform distribution: L = Σ_{i=1}^{N} E[ ||y_i − µ_i||²_2 ] + λ_t Σ_{i=1}^{N} KL( Dir(µ_i | α̃_i) || Dir(µ_i | 1) ), where: • λ_t is another hyperparameter, and the suggestion is to make it depend on the number of training epochs, e.g. λ_t = min(1, t/CONST) with t the current training epoch, so that the effect of the KL divergence is gradually increased; this avoids premature convergence to the uniform distribution in the early epochs, when the learning algorithm still needs to explore the parameter space; • α̃_i = y_i + (1 − y_i) ⊙ α_i are the Dirichlet parameters that the neural network, in a forward pass, has put on the wrong classes, and the idea is to minimise them as much as possible. Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018. 161
  • 162. KL recap Consider some unknown distribution p(x) and suppose that we have modelled it using q(x). If we use q(x) instead of p(x) to represent the true values of x, the average additional amount of information required is: KL(p||q) = −∫ p(x) ln q(x) dx − ( −∫ p(x) ln p(x) dx ) = −∫ p(x) ln ( q(x)/p(x) ) dx = −E[ ln ( q(x)/p(x) ) ]. This is known as the relative entropy or Kullback-Leibler divergence, or KL divergence, between the distributions p(x) and q(x). Properties: • KL(p||q) ≢ KL(q||p) (it is not symmetric); • KL(p||q) ≥ 0 and KL(p||q) = 0 if and only if p = q 162
  • 163. KL( Dir(µ_i | α̃_i) || Dir(µ_i | 1) ) = ln [ Γ( Σ_{k=1}^{K} α̃_{i,k} ) / ( Γ(K) ∏_{k=1}^{K} Γ(α̃_{i,k}) ) ] + Σ_{k=1}^{K} (α̃_{i,k} − 1) [ ψ(α̃_{i,k}) − ψ( Σ_{j=1}^{K} α̃_{i,j} ) ], where ψ(x) = d/dx ln Γ(x) is the digamma function Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018. 163
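Putting the expected squared error and the KL regulariser together, here is a minimal NumPy/SciPy sketch (my addition) of the per-sample loss described on the previous slides; here α_i is treated as given, whereas in practice it is produced by the network (so gradients flow through it via an autodiff framework), and all variable names and numbers are illustrative assumptions.

import numpy as np
from scipy.special import gammaln, digamma

def edl_loss(y, alpha, lam):
    # Expected Brier score + lam * KL( Dir(alpha_tilde) || Dir(1) ) for one sample
    S = alpha.sum()
    mu = alpha / S
    # Expected squared error: sum_k (y_k - mu_k)^2 + mu_k (1 - mu_k) / (S + 1)
    err = np.sum((y - mu) ** 2 + mu * (1.0 - mu) / (S + 1.0))

    # Remove the evidence assigned to the true class: alpha_tilde = y + (1 - y) * alpha
    a = y + (1.0 - y) * alpha
    K = a.size
    # KL( Dir(alpha_tilde) || Dir(1) ) using log-Gamma and the digamma function psi
    kl = (gammaln(a.sum()) - gammaln(K) - gammaln(a).sum()
          + np.sum((a - 1.0) * (digamma(a) - digamma(a.sum()))))
    return err + lam * kl

y = np.array([0.0, 1.0, 0.0])          # one-hot label, K = 3
alpha = np.array([1.5, 9.0, 2.5])      # Dirichlet parameters output by the network (illustrative)
t, const = 5, 10                       # annealing: lam_t = min(1, t / CONST)
print(edl_loss(y, alpha, lam=min(1.0, t / const)))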
  • 165. EDL + GAN for adversarial training M. Sensoy, L. Kaplan, F. Cerutti, M. Saleki, “Uncertainty-Aware Deep Classifiers using Generative Models.” AAAI 2020 165
  • 166. VAE + GAN [Architecture diagrams with a generator G and discriminators D and D′. Figure 2 (caption truncated in the source): original training samples (top), generated samples.] Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018. 166
  • 167. Robustness against FGS Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018. 167
  • 168. Anomaly detection (mnist) (cifar10) Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018. 168
  • 169. EDL adopted in industrial settings 169
  • 170. Conclusions Introduction to AI (for security) • GOFAI • ML Guidelines • Rules • Guidelines for creating and using AI Security of AI • GOFAI • ML 170