INTRO TO EDL
Federico Cerutti
Educational material for students at Cardiff University, UK, and the University of Brescia, Italy.
a primer in bayesian analysis
Bayesian Probabilities

Bayes theorem:

$$p(Y \mid X) = \frac{p(X \mid Y)\,p(Y)}{p(X)} \qquad (1)$$

where

$$p(X) = \sum_{Y} p(X \mid Y)\,p(Y) \qquad (2)$$
Suppose we randomly pick one of the boxes and from that
box we randomly select an item of fruit, and having observed
which sort of fruit it is we replace it in the box from which it
came.
We could imagine repeating this process many times. Let us
suppose that in so doing we pick the red box 40% of the time
and we pick the blue box 60% of the time, and that when we
remove an item of fruit from a box we are equally likely to
select any of the pieces of fruit in the box.
We are told that a piece of fruit has been selected and it is an orange. Which box did it come from?
Image from Bishop, C. M. Pattern Recognition and Machine Learning. (Springer-Verlag, 2006).
$$p(B = r \mid F = o) = \frac{p(F = o \mid B = r)\,p(B = r)}{p(F = o)} = \frac{\frac{6}{8}\cdot\frac{4}{10}}{\frac{6}{8}\cdot\frac{4}{10} + \frac{1}{4}\cdot\frac{6}{10}} = \frac{3}{4}\cdot\frac{2}{5}\cdot\frac{20}{9} = \frac{2}{3} \qquad (3)$$
Image from Bishop, C. M. Pattern Recognition and Machine Learning. (Springer-Verlag, 2006).
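A minimal Python sketch of this calculation (the box and fruit probabilities are those stated above):

```python
# Fruit-box example: posterior probability that the box is red, given an orange was drawn.
p_box = {"red": 0.4, "blue": 0.6}                   # p(B)
p_orange_given_box = {"red": 6 / 8, "blue": 1 / 4}  # p(F = o | B)

# Evidence: p(F = o) = sum_B p(F = o | B) p(B)
p_orange = sum(p_orange_given_box[b] * p_box[b] for b in p_box)

# Bayes theorem: p(B = r | F = o)
print(p_orange_given_box["red"] * p_box["red"] / p_orange)  # 0.666... = 2/3
```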
Given the parameters of our model w, we can capture our assumptions about w, before
observing the data, in the form of a prior probability distribution p(w). The effect of the
observed data D = {t1, . . . , tN } is expressed through the conditional p(D |w), hence Bayes
theorem takes the form:
$$p(\mathbf{w} \mid \mathcal{D}) = \frac{\overbrace{p(\mathcal{D} \mid \mathbf{w})}^{\text{likelihood}}\;\overbrace{p(\mathbf{w})}^{\text{prior}}}{p(\mathcal{D})} \qquad (4)$$

$$\text{posterior} \propto \text{likelihood} \cdot \text{prior} \qquad (5)$$

$$p(\mathcal{D}) = \int p(\mathcal{D} \mid \mathbf{w})\,p(\mathbf{w})\,\mathrm{d}\mathbf{w} \qquad (6)$$

The denominator p(D) ensures that the posterior distribution on the left-hand side is a valid probability density and integrates to one.
Frequentist paradigm
• w is considered to be a fixed parameter, whose value is determined by some form of estimator, e.g. maximum likelihood, in which w is set to the value that maximises p(D |w).
• Error bars on this estimate are obtained
by considering the distribution of
possible data sets D .
• The negative log of the likelihood function is called an error function: since the negative log is a monotonically decreasing function, maximising the likelihood is equivalent to minimising the error.
Bayesian paradigm
• There is a single data set D (the one observed), and the uncertainty in the parameters is expressed through a probability distribution over w.
• The inclusion of prior knowledge arises
naturally: suppose that a fair-looking
coin is tossed three times and lands
heads each time. A classical maximum
likelihood estimate of the probability of
landing heads would give 1.
There are cases where you want to reduce the dependence on the prior, hence the use of noninformative priors.
Binary variable: Bernoulli
Let us consider a single binary random variable x ∈ {0, 1}, e.g. flipping a coin, not necessarily fair, hence the probability is conditioned on a parameter 0 ≤ µ ≤ 1:

$$p(x = 1 \mid \mu) = \mu \qquad (7)$$

The probability distribution over x is known as the Bernoulli distribution:

$$\mathrm{Bern}(x \mid \mu) = \mu^{x}(1-\mu)^{1-x} \qquad (8)$$

$$\mathbb{E}[x] = \mu \qquad (9)$$
Binomial distribution
The distribution of the number m of observations of x = 1, given the dataset size N, is the Binomial distribution:

$$\mathrm{Bin}(m \mid N, \mu) = \binom{N}{m}\mu^{m}(1-\mu)^{N-m} \qquad (10)$$

with

$$\mathbb{E}[m] \equiv \sum_{m=0}^{N} m\,\mathrm{Bin}(m \mid N, \mu) = N\mu \qquad (11)$$

and

$$\mathrm{var}[m] \equiv \sum_{m=0}^{N} (m - \mathbb{E}[m])^{2}\,\mathrm{Bin}(m \mid N, \mu) = N\mu(1-\mu) \qquad (12)$$
Image from Bishop, C. M. Pattern Recognition and Machine Learning. (Springer-Verlag, 2006).
How many times, over N = 10 runs, would you see x = 1 if µ = 0.25?
[Figure: histogram of Bin(m | N = 10, µ = 0.25), with m = 0, …, 10 on the horizontal axis and probability (0–0.3) on the vertical axis.]
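As a quick check, a minimal sketch of this distribution using scipy.stats (assuming scipy is available):

```python
from scipy.stats import binom

N, mu = 10, 0.25
for m in range(N + 1):
    print(m, round(binom.pmf(m, N, mu), 3))   # probability of m successes out of N

print(binom.mean(N, mu), binom.var(N, mu))    # N*mu = 2.5 and N*mu*(1-mu) = 1.875
```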
Let’s go back to the Bernoulli distribution
Now suppose that we have a data set of observations x = (x1, . . . , xN )T
drawn independently
from a Bernoulli distribution (iid) whose mean µ is unknown, and we would like to
determine this parameter from the data set.
$$p(\mathcal{D} \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu) = \prod_{n=1}^{N}\mu^{x_n}(1-\mu)^{1-x_n} \qquad (13)$$

Let's maximise the (log-)likelihood to identify the parameter (the log simplifies the algebra and reduces the risk of underflow):

$$\ln p(\mathcal{D} \mid \mu) = \sum_{n=1}^{N}\ln p(x_n \mid \mu) = \sum_{n=1}^{N}\left\{x_n \ln\mu + (1 - x_n)\ln(1-\mu)\right\} \qquad (14)$$
The log likelihood depends on the N observations xₙ only through their sum $\sum_n x_n$; hence the sum provides an example of a sufficient statistic for the data under this distribution,

“hence no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter” (Fisher, 1922)
$$\frac{\mathrm{d}}{\mathrm{d}\mu}\ln p(\mathcal{D} \mid \mu) = 0$$

$$\sum_{n=1}^{N}\left(\frac{x_n}{\mu} - \frac{1 - x_n}{1 - \mu}\right) = 0$$

$$\sum_{n=1}^{N}\frac{x_n - \mu}{\mu(1-\mu)} = 0$$

$$\sum_{n=1}^{N} x_n = N\mu$$

$$\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n$$

a.k.a. the sample mean. Risk of overfitting: consider tossing the coin three times and obtaining heads each time.
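A minimal sketch of the maximum likelihood estimate; the flips below are hypothetical:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # hypothetical coin flips (1 = heads)
mu_ml = x.mean()                         # mu_ML = (1/N) * sum_n x_n, the sample mean
print(mu_ml)                             # 0.625

# Overfitting risk: three tosses, all heads, gives mu_ML = 1.0
print(np.array([1, 1, 1]).mean())
```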
In order to develop a Bayesian treatment of the overfitting problem of the maximum likelihood estimator for the Bernoulli, note that the likelihood takes the form of a product of factors of the form $\mu^{x}(1-\mu)^{1-x}$. If we choose a prior proportional to powers of µ and (1 − µ), then the posterior distribution, being proportional to the product of the prior and the likelihood, will have the same functional form as the prior. This property is called conjugacy.
Binary variables: Beta distribution
$$\mathrm{Beta}(\mu \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\,\mu^{a-1}(1-\mu)^{b-1}$$

with

$$\Gamma(x) \equiv \int_{0}^{\infty} u^{x-1}e^{-u}\,\mathrm{d}u$$

$$\mathbb{E}[\mu] = \frac{a}{a+b} \qquad \mathrm{var}[\mu] = \frac{ab}{(a+b)^{2}(a+b+1)}$$

a and b are hyperparameters controlling the distribution of parameter µ.
[Figure: Beta(µ | a, b) densities for (a, b) = (0.1, 0.1), (1, 1), (2, 3), and (8, 4), each plotted for µ ∈ [0, 1].]
Images from Bishop, C. M. Pattern Recognition and Machine Learning. (Springer-Verlag, 2006).
Considering a beta distribution prior and the binomial likelihood function, and given
l = N − m
$$p(\mu \mid m, l, a, b) \propto \mu^{m+a-1}(1-\mu)^{l+b-1}$$

Hence p(µ|m, l, a, b) is another beta distribution and we can rearrange the normalisation coefficient as follows:

$$p(\mu \mid m, l, a, b) = \frac{\Gamma(m + a + l + b)}{\Gamma(m + a)\,\Gamma(l + b)}\,\mu^{m+a-1}(1-\mu)^{l+b-1}$$
[Figure: prior, likelihood function, and posterior over µ, each plotted for µ ∈ [0, 1].]
Images from Bishop, C. M. Pattern Recognition and Machine Learning. (Springer-Verlag, 2006).
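A minimal sketch of the conjugate update (the prior hyperparameters and observation counts below are hypothetical):

```python
from scipy.stats import beta

a, b = 2, 2          # hypothetical Beta prior hyperparameters
m, N = 3, 4          # m observations of x = 1 out of N
l = N - m

posterior = beta(a + m, b + l)            # Beta(mu | a + m, b + l)
print(posterior.mean())                   # (a + m) / (a + m + b + l)
print(posterior.interval(0.95))           # 95% credible interval for mu
```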
Epistemic vs Aleatoric uncertainty
Aleatoric uncertainty
Variability in the outcome of an experiment that is due to inherently random effects (e.g. flipping a fair coin): no additional source of information, short of Laplace's daemon, can reduce such variability.
Epistemic uncertainty
Epistemic state of the agent using the model,
hence its lack of knowledge that—in
principle—can be reduced on the basis of
additional data samples.
It is a general property of Bayesian learning
that, as we observe more and more data, the
epistemic uncertainty represented by the
posterior distribution will steadily decrease
(the variance decreases).
See notebook at https://nbviewer.jupyter.org/federicocerutti/UncertaintyAwarenessResources/blob/master/notebooks/Beta.ipynb
Multinomial variables: categorical distribution
Let us suppose we roll a die with K = 6 faces. An observation of this variable x, e.g. x₃ = 1 (the number 3 face up), can be represented as:

$$\mathbf{x} = (0, 0, 1, 0, 0, 0)^{T}$$

Note that such vectors must satisfy $\sum_{k=1}^{K} x_k = 1$.

$$p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K}\mu_k^{x_k}$$

where $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_K)^{T}$, and the parameters $\mu_k$ are such that $\mu_k \geq 0$ and $\sum_k \mu_k = 1$.

This is a generalisation of the Bernoulli distribution.
$$p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{n=1}^{N}\prod_{k=1}^{K}\mu_k^{x_{nk}}$$

The likelihood depends on the N data points only through the K quantities

$$m_k = \sum_{n} x_{nk}$$

which represent the number of observations of $x_k = 1$ (e.g. with k = 3, the third face of the die). These are called the sufficient statistics for this distribution.
Finding the maximum likelihood solution requires a Lagrange multiplier λ, maximising

$$\sum_{k=1}^{K} m_k \ln \mu_k + \lambda\left(\sum_{k=1}^{K}\mu_k - 1\right)$$

Hence

$$\mu_k^{ML} = \frac{m_k}{N}$$

which is the fraction of the N observations for which $x_k = 1$.
Multinomial variables: the Dirichlet distribution
The Dirichlet distribution is the generalisation of the beta distribution to K dimensions.
$$\mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k - 1}$$

such that $\sum_k \mu_k = 1$, $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_K)^{T}$, $\alpha_k \geq 0$ and

$$\alpha_0 = \sum_{k=1}^{K}\alpha_k$$
Considering a Dirichlet distribution prior and the categorical likelihood function, the
posterior is then:
$$p(\boldsymbol{\mu} \mid \mathcal{D}, \boldsymbol{\alpha}) = \mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha} + \mathbf{m}) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1)\cdots\Gamma(\alpha_K + m_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k + m_k - 1}$$

The uniform prior is given by Dir(µ|1) and the Jeffreys' non-informative prior is given by Dir(µ|(0.5, …, 0.5)ᵀ).

The marginals of a Dirichlet distribution are beta distributions.
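A minimal sketch of this update for a six-faced die (the counts below are hypothetical):

```python
import numpy as np

alpha_prior = np.ones(6)                  # uniform prior Dir(mu | 1)
counts = np.array([3, 1, 0, 2, 4, 0])     # hypothetical counts m_k for each face
alpha_post = alpha_prior + counts         # posterior Dir(mu | alpha + m)

print(alpha_post)
print(alpha_post / alpha_post.sum())      # posterior mean E[mu_k] = alpha_k / alpha_0
```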
neural networks and uncertainty awareness
Change the loss function so as to output pieces of evidence in favour of the different classes, which are then combined through a Bayesian update, resulting in a Dirichlet distribution.
Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification
uncertainty.” Advances in Neural Information Processing Systems. 2018.
From Evidence to Dirichlet
Let us now assume a Dirichlet distribution over K classes that is the result of a Bayesian update with N observations, starting from a uniform prior:

$$\mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) = \mathrm{Dir}(\boldsymbol{\mu} \mid e_1 + 1, e_2 + 1, \ldots, e_K + 1)$$

where $e_k$ is the number of observations (evidence) for class $k$, and $\sum_k e_k = N$.
Dirichlet and Epistemic Uncertainty
The epistemic uncertainty associated with a Dirichlet distribution Dir(µ | α) is given by

$$u = \frac{K}{S}$$

with K the number of classes and $S = \alpha_0 = \sum_{k=1}^{K}\alpha_k$ the Dirichlet strength.

Note that if the Dirichlet has been computed as the result of a Bayesian update from a uniform prior, then 0 ≤ u ≤ 1, and u = 1 implies that we are considering the uniform distribution (an extreme case of Dirichlet distribution).

Let us denote with $\mu_k = \frac{\alpha_k}{S}$ the expected probability of class k.
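A minimal sketch of this computation (the evidence vectors below are hypothetical):

```python
import numpy as np

def dirichlet_uncertainty(evidence):
    """Epistemic uncertainty u = K / S for a Dirichlet built from evidence via a uniform prior."""
    alpha = np.asarray(evidence, dtype=float) + 1.0  # Bayesian update from Dir(1, ..., 1)
    K, S = alpha.size, alpha.sum()                   # number of classes, Dirichlet strength
    return K / S, alpha / S                          # uncertainty u and expected probabilities mu_k

print(dirichlet_uncertainty([0, 0, 0]))    # no evidence: u = 1 (uniform distribution)
print(dirichlet_uncertainty([30, 1, 1]))   # strong evidence for class 1: u is small
```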
Loss function
If we then consider Dir(µi | αi) as the prior for a multinomial p(yi | µi), we can then compute the expected squared error (a.k.a. Brier score):

$$\mathbb{E}\left[\|\mathbf{y}_i - \boldsymbol{\mu}_i\|_2^2\right] = \sum_{k=1}^{K}\mathbb{E}\left[y_{i,k}^2 - 2 y_{i,k}\mu_{i,k} + \mu_{i,k}^2\right] = \sum_{k=1}^{K}\left(y_{i,k}^2 - 2 y_{i,k}\mathbb{E}[\mu_{i,k}] + \mathbb{E}[\mu_{i,k}^2]\right)$$

$$= \sum_{k=1}^{K}\left(y_{i,k}^2 - 2 y_{i,k}\mathbb{E}[\mu_{i,k}] + \mathbb{E}[\mu_{i,k}]^2 + \mathrm{var}[\mu_{i,k}]\right) = \sum_{k=1}^{K}\left(\left(y_{i,k} - \mathbb{E}[\mu_{i,k}]\right)^2 + \mathrm{var}[\mu_{i,k}]\right)$$

$$= \sum_{k=1}^{K}\left(\left(y_{i,k} - \frac{\alpha_{i,k}}{S_i}\right)^2 + \frac{\alpha_{i,k}(S_i - \alpha_{i,k})}{S_i^2(S_i + 1)}\right) = \sum_{k=1}^{K}\left((y_{i,k} - \mu_{i,k})^2 + \frac{\mu_{i,k}(1 - \mu_{i,k})}{S_i + 1}\right)$$

The loss over a batch of training samples is the sum of the losses of the individual samples in the batch.

Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018.
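A minimal numpy sketch of this expected squared error, assuming the network outputs a non-negative evidence vector per sample (the numbers below are hypothetical):

```python
import numpy as np

def edl_mse_loss(evidence, y_onehot):
    """Expected Brier score for Dir(mu_i | alpha_i) with alpha_i = evidence_i + 1 (Sensoy et al., 2018)."""
    alpha = evidence + 1.0
    S = alpha.sum(axis=-1, keepdims=True)    # Dirichlet strength S_i per sample
    mu = alpha / S                           # expected class probabilities mu_{i,k}
    err = (y_onehot - mu) ** 2               # squared-error term
    var = mu * (1.0 - mu) / (S + 1.0)        # variance term
    return (err + var).sum(axis=-1).sum()    # sum over classes, then over the batch

evidence = np.array([[10.0, 0.5, 0.2], [0.1, 0.2, 0.1]])  # hypothetical outputs, 2 samples, 3 classes
y = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(edl_mse_loss(evidence, y))
```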
Learning to say “I don’t know”
To avoid generating evidence for all the classes when the network cannot classify a given
sample (epistemic uncertainty), we introduce a term in the loss function that penalises the
divergence from the uniform distribution:
$$\mathcal{L} = \sum_{i=1}^{N}\mathbb{E}\left[\|\mathbf{y}_i - \boldsymbol{\mu}_i\|_2^2\right] + \lambda_t \sum_{i=1}^{N}\mathrm{KL}\left(\mathrm{Dir}(\boldsymbol{\mu}_i \mid \tilde{\boldsymbol{\alpha}}_i)\,\big\|\,\mathrm{Dir}(\boldsymbol{\mu}_i \mid \mathbf{1})\right)$$

where:
• λt is another hyperparameter; the suggestion is to make it depend on the number of training epochs, e.g. λt = min(1, t / CONST) with t the current training epoch, so that the effect of the KL divergence is gradually increased, avoiding premature convergence to the uniform distribution in the early epochs, when the learning algorithm still needs to explore the parameter space;
• α̃i = yi + (1 − yi) · αi (element-wise) are the Dirichlet parameters with the evidence for the correct class removed, i.e. the evidence the neural network has put on the wrong classes in a forward pass; the idea is to minimise this as much as possible.

Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018.
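A minimal sketch of the annealing coefficient; the constant (here called anneal_epochs) is a hypothetical choice:

```python
def kl_annealing_coefficient(epoch, anneal_epochs=10):
    """lambda_t = min(1, t / CONST): gradually switch on the KL regulariser."""
    return min(1.0, epoch / anneal_epochs)

print([round(kl_annealing_coefficient(t), 2) for t in range(0, 15, 2)])
```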
KL recap
Consider some unknown distribution p(x) and suppose that we have modelled this using
q(x). If we use q(x) instead of p(x) to represent the true values of x, the average additional
amount of information required is:
$$\mathrm{KL}(p\,\|\,q) = -\int p(x)\ln q(x)\,\mathrm{d}x - \left(-\int p(x)\ln p(x)\,\mathrm{d}x\right) = -\int p(x)\ln\frac{q(x)}{p(x)}\,\mathrm{d}x = -\mathbb{E}_{p}\left[\ln\frac{q(x)}{p(x)}\right] \qquad (15)$$

This is known as the relative entropy or Kullback–Leibler divergence, or KL divergence, between the distributions p(x) and q(x).

Properties:
• KL(p‖q) ≢ KL(q‖p): the KL divergence is not symmetric;
• KL(p‖q) ≥ 0 and KL(p‖q) = 0 if and only if p = q.
$$\mathrm{KL}\left(\mathrm{Dir}(\boldsymbol{\mu}_i \mid \boldsymbol{\alpha}_i)\,\big\|\,\mathrm{Dir}(\boldsymbol{\mu}_i \mid \mathbf{1})\right) = \ln\frac{\Gamma\!\left(\sum_{k=1}^{K}\alpha_{i,k}\right)}{\Gamma(K)\prod_{k=1}^{K}\Gamma(\alpha_{i,k})} + \sum_{k=1}^{K}(\alpha_{i,k} - 1)\left[\psi(\alpha_{i,k}) - \psi\!\left(\sum_{j=1}^{K}\alpha_{i,j}\right)\right]$$

where $\psi(x) = \frac{\mathrm{d}}{\mathrm{d}x}\ln\Gamma(x)$ is the digamma function.

Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification uncertainty.” Advances in Neural Information Processing Systems. 2018.
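A minimal sketch of this divergence using scipy.special (the parameter vectors below are hypothetical):

```python
import numpy as np
from scipy.special import gammaln, digamma

def kl_dirichlet_vs_uniform(alpha):
    """KL( Dir(mu | alpha) || Dir(mu | 1) ) for a single parameter vector alpha."""
    alpha = np.asarray(alpha, dtype=float)
    K, S = alpha.size, alpha.sum()
    log_norm = gammaln(S) - gammaln(K) - gammaln(alpha).sum()   # ln Gamma(S) / (Gamma(K) prod_k Gamma(alpha_k))
    return log_norm + np.sum((alpha - 1.0) * (digamma(alpha) - digamma(S)))

print(kl_dirichlet_vs_uniform([1.0, 1.0, 1.0]))    # 0: identical to the uniform Dirichlet
print(kl_dirichlet_vs_uniform([10.0, 1.0, 1.0]))   # > 0: concentrated away from uniform
```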
EDL and robustness to the Fast Gradient Sign (FGS) attack
Sensoy, Murat, Lance Kaplan, and Melih Kandemir. “Evidential deep learning to quantify classification
uncertainty.” Advances in Neural Information Processing Systems. 2018.
EDL + GAN for adversarial training
Sensoy, Murat, et al. “Uncertainty-Aware Deep Classifiers using Generative Models.” AAAI 2020
VAE + GAN

For each data point in latent space, we generate a new noisy sample, which is similar to it to some extent. Hence, we avoid the mode-collapse problem. If the noise distribution is too close to the data distribution, the density ratio would be trivially one and the learning will be deprived.

G: generator in the latent space of the VAE; D′: discriminator in the latent space; D: discriminator in the input space.

[Figure: original training samples (top), samples reconstructed by the VAE (middle), and samples generated by the proposed method (bottom) over a number of epochs.]

Sensoy, Murat, et al. “Uncertainty-Aware Deep Classifiers using Generative Models.” AAAI 2020
Robustness against the Fast Gradient Sign (FGS) attack
Sensoy, Murat, et al. “Uncertainty-Aware Deep Classifiers using Generative Models.” AAAI 2020
Anomaly detection
[Figure: anomaly detection results on MNIST and CIFAR-10.]
Sensoy, Murat, et al. “Uncertainty-Aware Deep Classifiers using Generative Models.” AAAI 2020