Bayesian Deep Learning
김규래
January 18, 2019
Sogang University SPS Lab.
Bayesian Deep Learning Preview
∙ Weights are random variables instead of fixed scalars
Classic Deep Learning
∙ A classification model is expressed as f(x) = p(y ∈ c | x, θ)
  "The probability that y belongs to the class c, predicted from the observation x"
∙ Training a model is defined as θ∗ = arg min_θ (1/N) ∑_{i=1}^{N} L(x_i, y_i, θ)
  "Finding the parameter θ∗ that minimizes the loss metric L"
Likelihood
A dataset is denoted as D = {(x_i, y_i)}

L(D, θ) = − log p(D|θ)

∙ How likely the data is under the distribution p
∙ Minimizing L is maximum likelihood estimation (MLE)
∙ The negative log probability density function (PDF) of p is often used as the loss, e.g.
∙ Binary cross entropy (BCE) loss
∙ Ordinary least squares (OLS) loss
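The BCE loss above is exactly such a negative log-likelihood; as a minimal sketch (numpy assumed; the labels and predictions are made up), computing BCE directly from the Bernoulli PDF:

```python
import numpy as np

def bce_as_nll(y, p_hat, eps=1e-12):
    """Negative log-likelihood of y ~ Bernoulli(p_hat), i.e. the BCE loss."""
    p_hat = np.clip(p_hat, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

y = np.array([1.0, 0.0, 1.0, 1.0])       # observed labels (made up)
p_hat = np.array([0.9, 0.2, 0.7, 0.6])   # model's predicted probabilities (made up)
print(bce_as_nll(y, p_hat))              # lower = data more likely under the model
```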
Maximum Likelihood Estimation
¹Figure from https://blogs.sas.com/content/iml/2011/10/12/maximum-likelihood-estimation-in-sasiml.html
Maximum Likelihood Estimation
For fitting a Gaussian distribution to the data, minimize over θ:

L(x, y, θ) = − log p(x, y | θ, σ)
           = − log( (1/√(2πσ²)) exp( −(f(x, θ) − y)² / (2σ²) ) )
           = − log( 1/√(2πσ²) ) + (1/(2σ²)) (f(x, θ) − y)²
           ∝ (f(x, θ) − y)²

Over the whole dataset: L(X, Y, θ) = ‖ f(X, θ) − Y ‖²₂
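A quick numerical check of this derivation, as a sketch (numpy assumed; synthetic data with f(x, θ) = θx): the Gaussian negative log-likelihood and the squared error select the same θ.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)  # true θ = 2.0

def gaussian_nll(theta, sigma=0.5):
    r = theta * x - y
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2) + r**2 / (2 * sigma**2))

def squared_error(theta):
    return np.sum((theta * x - y) ** 2)

thetas = np.linspace(1.0, 3.0, 2001)
print(thetas[np.argmin([gaussian_nll(t) for t in thetas])])   # ~2.0
print(thetas[np.argmin([squared_error(t) for t in thetas])])  # same minimizer
```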
Bayes Rule
Regularized Log Likelihood
L(D, θ) = −(log p(D|θ) + log p(θ))

∙ Uses Bayes' rule to incorporate 'prior knowledge' into the problem
∙ Also called maximum a posteriori (MAP) estimation

p(θ|D) = p(D|θ) p(θ) / p(D) ∝ p(D|θ) p(θ)

L(D, θ) = − log p(θ|D)
        ∝ − log( p(D|θ) p(θ) )
        = −(log p(D|θ) + log p(θ))
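To make the prior term concrete, a minimal sketch (numpy assumed, synthetic data): with Gaussian noise and a Gaussian prior p(θ) = N(0, τ²I), the MAP estimate is exactly ridge regression, and λ = σ²/τ² is the L2 weight the prior induces.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.3, size=50)

sigma2, tau2 = 0.3**2, 1.0   # noise variance and prior variance (assumed known)
lam = sigma2 / tau2          # L2 weight induced by the Gaussian prior

# MAP: argmin ||Xθ − y||² + λ||θ||²  (ridge regression, closed form)
theta_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
theta_mle = np.linalg.solve(X.T @ X, X.T @ y)  # λ = 0 recovers plain MLE
print(theta_map, theta_mle)
```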
MAP and MLE Estimation
θ∗_MAP = arg min_θ [ − log p(D|θ) − log p(θ) ]

θ∗_MLE = arg min_θ [ − log p(D|θ) ]

∙ MLE and MAP estimation only produce a single fixed θ
∙ The resulting predictions are fixed probability values
∙ In reality, θ might be better expressed as a 'distribution'

f(x) = p(y | x, θ∗_MAP) ∈ R
Bayesian Inference
E_θ[ p(y|x, D) ] = ∫ p(y|x, D, θ) p(θ|D) dθ

∙ Integrates across all probable values of θ (marginalization)
∙ Solving the integral treats θ as a distribution
∙ For a typical modern deep learning network, θ ∈ R^1000000 or more
∙ Integrating over all possible values of θ is intractable
Bayesian Methods
Instead of directly solving the integral

p(y|x, D) = ∫ p(y|x, D, θ) p(θ|D) dθ

we approximate it and compute
∙ The expectation E[ p(y|x, D) ]
∙ The variance V[ p(y|x, D) ]
using...
∙ Monte Carlo sampling
∙ Variational inference (VI)
Output Distribution
The predicted distribution p(y|x, D) can be visualized as
∙ Grey region: the confidence interval computed from V[ p(y|x, D) ]
∙ Blue line: the mean of the prediction, E[ p(y|x, D) ]
Why Bayesian Inference?
Modelling uncertainty is becoming important in failure-critical domains
∙ Autonomous driving
∙ Medical diagnostics
∙ Algorithmic stock trading
∙ Public security
Decision Boundary and Misprediction
∙ MLE and MAP estimations lead to a fixed decision boundary
∙ ’Distant samples’ are often mispredicted with very high confidence
∙ Learning a ’distribution’ can fix this problem
Adversarial Attacks
∙ Changing even a single pixel can lead to misprediction
∙ These mispredictions are made with very high confidence²

²Su, Jiawei, Danilo Vasconcellos Vargas, and Sakurai Kouichi. "One pixel attack for fooling deep neural networks." arXiv preprint arXiv:1710.08864 (2017).
Autonomous Driving
³Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NIPS 2017.
Monte Carlo Integration

p(y|x, D) = ∫ p(y|x, D, θ) p(θ|D) dθ
          ≈ (1/S) ∑_{s=1}^{S} p(y|x, D, θ_s)

where the θ_s are samples from p(θ|D)

∙ Samples are drawn directly from p(θ|D)
∙ In case sampling from p is not possible, use MCMC
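A toy sketch of the estimator (numpy assumed; the posterior here is an artificial stand-in, since for a real network p(θ|D) is not available in closed form):

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend the posterior over a scalar weight is known: θ | D ~ N(1.5, 0.2²),
# with likelihood y | x, θ ~ N(θx, 0.3²).  Both are illustrative assumptions.
def predictive_density(y, x, theta, sigma=0.3):
    return np.exp(-(y - theta * x) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

S = 10_000
theta_samples = rng.normal(1.5, 0.2, size=S)            # θ_s ~ p(θ|D)
x, y = 2.0, 3.1
p_hat = predictive_density(y, x, theta_samples).mean()  # (1/S) Σ_s p(y|x,θ_s)
print(p_hat)
```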
Variational Inference
∙ Variational inference converts an inference problem into an optimization problem
∙ Instead of using a complicated distribution such as p(θ|D), we find a tractable approximation q(θ; λ) parameterized by λ
∙ This is equivalent to minimizing the KL divergence between q and p
∙ Using a distribution q very different from p leads to bad solutions

minimize_λ KL( q(θ; λ) || p(θ|D) )
Variational Inference
KL( q(θ; λ) || p(θ|D) )
= − ∫ q(θ; λ) log [ p(θ|D) / q(θ; λ) ] dθ
= − ∫ q(θ; λ) log p(θ|D) dθ + ∫ q(θ; λ) log q(θ; λ) dθ
= − ∫ q(θ; λ) log [ p(θ, D) / p(D) ] dθ + ∫ q(θ; λ) log q(θ; λ) dθ
= − ∫ q(θ; λ) log p(θ, D) dθ + ∫ q(θ; λ) log p(D) dθ + ∫ q(θ; λ) log q(θ; λ) dθ
= E_q[ − log p(θ, D) + log q(θ; λ) ] + log p(D)

where p(D) = ∫ p(D|θ) p(θ) dθ
Evidence Lower Bound (ELBO)
Because the evidence term p(D) is intractable, optimizing the KL divergence directly is hard. However, by reformulating the problem,

KL( q(θ; λ) || p(θ|D) ) = E_q[ − log p(θ, D) + log q(θ; λ) ] + log p(D)

log p(D) = KL( q(θ; λ) || p(θ|D) ) + E_q[ log p(θ, D) − log q(θ; λ) ]

log p(D) ≥ E_q[ log p(θ, D) − log q(θ; λ) ]

∵ KL( q(θ; λ) || p(θ|D) ) ≥ 0
Evidence Lower Bound (ELBO)
maximize_λ L[q(θ; λ)] = E_q[ log p(θ, D) − log q(θ; λ) ]

∙ Maximizing the evidence lower bound is equivalent to minimizing the KL divergence
∙ At the optimum the KL divergence is zero and the ELBO equals the log evidence log p(D)
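A minimal sketch of ELBO maximization (numpy assumed; the joint log p(θ, D) is a toy unnormalized Gaussian, and the variational parameters λ = (m, s) are found by grid search instead of gradients):

```python
import numpy as np

rng = np.random.default_rng(3)

def log_joint(theta):
    # Toy target: log p(θ, D) up to a constant, a Gaussian centered at 2, scale 0.5
    return -(theta - 2.0) ** 2 / (2 * 0.5**2)

def elbo(m, s, S=2000):
    theta = m + s * rng.normal(size=S)  # θ ~ q(θ; λ) = N(m, s²)
    log_q = -np.log(s * np.sqrt(2 * np.pi)) - (theta - m) ** 2 / (2 * s**2)
    return np.mean(log_joint(theta) - log_q)  # E_q[log p(θ, D) − log q(θ; λ)]

candidates = [(elbo(m, s), m, s)
              for m in np.linspace(0, 4, 41) for s in np.linspace(0.1, 1.5, 29)]
print(max(candidates))  # the best (m, s) should land near (2.0, 0.5)
```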
Variational Inference
Variational inference (VI) and Monte Carlo methods, or even a combination of both, can yield very powerful solutions.
Dropout Regularization
∙ Very popular deep learning regularization method before batch normalization (9000 citations!)
∙ Sets weights W_ij = 0 following a Bernoulli(p) distribution⁴

⁴Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
Dropout Regularization
∙ Regularization effect, less prone to overfitting
∙ The weight distribution is much sparser, which is good for network compression
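What dropout does in a forward pass, as a sketch (numpy assumed, arbitrary shapes), using the common 'inverted dropout' rescaling:

```python
import numpy as np

rng = np.random.default_rng(4)

def dense_with_dropout(W, b, h, p_drop=0.5, train=True):
    """Affine layer whose outputs are zeroed by a Bernoulli mask during training."""
    y = W @ h + b
    if train:
        mask = rng.random(y.shape) >= p_drop  # keep each unit with prob 1 − p_drop
        y = y * mask / (1.0 - p_drop)         # rescale so E[y] matches test time
    return y

W, b, h = rng.normal(size=(4, 3)), np.zeros(4), rng.normal(size=3)
print(dense_with_dropout(W, b, h))
```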
Dropout As Variational Approximation
Solving MLE or MAP with dropout applied is variational inference. (Yarin Gal, PhD thesis, 2016)

The distribution of the weights p(W|D) is approximated with q(W; p), the distribution of the weights W with dropout applied:

y_i = (W_i y_{i−1} + b_i) r_i where r_i ∼ Bern(p)

Since the L2 loss and L2 regularization assume W ∼ N(µ, σ²), the resulting distribution q is

q(W_ij; p) ∼ p N(µ_ij, σ²_ij) + (1 − p) N(0, σ²_ij)
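Sampling weights from this mixture is straightforward; a sketch (numpy assumed; the shapes and the shared σ are illustrative simplifications):

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_q(mu, sigma, p_keep):
    """Draw W_ij from q(W_ij; p) = p·N(μ_ij, σ²) + (1 − p)·N(0, σ²)."""
    keep = rng.random(mu.shape) < p_keep       # mixture component per weight
    return rng.normal(np.where(keep, mu, 0.0), sigma)

mu = rng.normal(size=(4, 3))
print(sample_q(mu, sigma=0.1, p_keep=0.8))
```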
Dropout As Variational Approximation
The ELBO gives the optimization objective:

maximize_{W,p} L[q(W; p)]
= E_q[ log p(W, D) − log q(W; p) ]
∝ E_q[ log p(D|W) ] − (p/2) ‖ W ‖²₂
= (1/N) ∑_{i∈D} log p(y_i | x_i, W) − (p/(2σ²)) ‖ W ‖²₂

∙ If p approaches 1 or 0, q(W; p) collapses to a constant (degenerate) distribution
Monte Carlo Inference
E_θ[ p(y|x, D) ] = ∫ p(y|x, D, θ) p(θ|D) dθ
                 ≈ ∫ p(y|x, D, θ) q(θ; p) dθ
                 = E_q[ p(y|x, D) ]
                 ≈ (1/T) ∑_{t=1}^{T} p(y|x, D, θ_t),  θ_t ∼ q(θ; p)

∙ Prediction is done with dropout turned on, averaging multiple evaluations
∙ This is equivalent to Monte Carlo integration by sampling from the variational distribution
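In PyTorch this amounts to keeping dropout active at prediction time; a minimal sketch (the architecture and data are placeholders), which also returns the sample variance used on the next slide:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(64, 1))

def mc_dropout_predict(model, x, T=100):
    model.train()  # .train() keeps the dropout masks stochastic at test time
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(T)])  # T stochastic passes
    return samples.mean(dim=0), samples.var(dim=0)  # estimates of E_q and V_q

x = torch.randn(10, 1)
mean, var = mc_dropout_predict(model, x)
print(mean.shape, var.shape)
```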
Monte Carlo Inference
V_θ[ p(y|x, D) ] ≈ (1/S) ∑_{s=1}^{S} ( p(y|x, D, θ_s) − E_θ[ p(y|x, D) ] )²

Uncertainty is the variance of the samples taken from the variational distribution.
Monte Carlo Dropout
Examples from the Mauna Loa CO₂ dataset⁶

⁶Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." ICML 2016.
Monte Carlo Dropout Example
Prediction using only 10 samples⁷

⁷Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." ICML 2016.
Monte Carlo Dropout Example
Semantic class segmentation⁸

⁸Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NIPS 2017.
Monte Carlo Dropout Example
Spatial depth regression⁹

⁹Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NIPS 2017.
Medical Diagnostics Example
∙ Green: true positive, Red: false positive¹⁰

¹⁰DeVries, Terrance, and Graham W. Taylor. "Leveraging Uncertainty Estimates for Predicting Segmentation Quality." arXiv preprint arXiv:1807.00502 (2018).
Medical Diagnostics Example
∙ Green: true positive, Blue: false negative¹¹

¹¹DeVries, Terrance, and Graham W. Taylor. "Leveraging Uncertainty Estimates for Predicting Segmentation Quality." arXiv preprint arXiv:1807.00502 (2018).
Possible Medical Applications
∙ Statistically correct uncertainty quantification
∙ Bandit-setting clinical treatment planning (reinforcement learning)
Possible Applications: Bandit Setting
Maximizing the outcome from multiple slot machines with estimated reward distributions.
Possible Applications: Bandit Setting
Choose the arm with the highest predicted outcome, or explore arms with high prediction uncertainty? (Exploration-exploitation tradeoff)
Mice Skin Tumor Treatment
Mice with induced cancer tumors.
Treatment options:
∙ No treatment
∙ 5-FU (100 mg/kg)
∙ Imiquimod (8 mg/kg)
∙ Combination of imiquimod and 5-FU
Upper Confidence Bound
Treatment selection policy

a_t = arg max_{a∈A} [ µ_a(x_t) + β σ²_a(x_t) ]

Quality measure (cumulative regret)

R(T) = ∑_{t=1}^{T} [ max_{a∈A} µ_a(x_t) − µ_{a_t}(x_t) ]

where A is the set of possible treatments, and µ_a(x), σ²_a(x) are the predicted mean and variance at x
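A sketch of the selection rule (numpy assumed; the per-treatment means and variances are made-up stand-ins for, e.g., Gaussian process or MC dropout predictions):

```python
import numpy as np

def ucb_select(mu, var, beta=1.0):
    """Pick a_t = argmax_a [μ_a(x_t) + β·σ²_a(x_t)]."""
    return int(np.argmax(mu + beta * var))

mu = np.array([0.20, 0.50, 0.40, 0.45])   # predicted outcome per treatment
var = np.array([0.30, 0.01, 0.20, 0.05])  # predictive uncertainty per treatment
print(ucb_select(mu, var))  # picks arm 2: high uncertainty outweighs a lower mean
```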
Upper Confidence Bound
Treatment based on a Bayesian method (a Gaussian process) led to the longest life expectancy.¹²

¹²Durand, A., C. Achilleos, D. Iacovides, K. Strati, G. D. Mitsis, and J. Pineau. "Contextual Bandits for Adapting Treatment in a Mouse Model of de Novo Carcinogenesis." MLHC 2018.
References
∙ Murphy, Kevin P. "Machine learning: a probabilistic perspective." MIT Press (2012).
∙ Gal, Yarin. "Uncertainty in Deep Learning." PhD thesis (2016).
∙ Blundell, Charles, et al. "Weight uncertainty in neural networks." arXiv preprint arXiv:1505.05424 (2015).
∙ Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." ICML 2016.
∙ Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NIPS 2017.
∙ Leibig, Christian, et al. "Leveraging uncertainty information from deep neural networks for disease detection." Scientific Reports 7.1 (2017): 17816.
∙ Durand, A., C. Achilleos, D. Iacovides, K. Strati, G. D. Mitsis, and J. Pineau. "Contextual Bandits for Adapting Treatment in a Mouse Model of de Novo Carcinogenesis." Machine Learning for Healthcare Conference (MLHC), 2018.
∙ Su, Jiawei, Danilo Vasconcellos Vargas, and Sakurai Kouichi. "One pixel attack for fooling deep neural networks." arXiv preprint arXiv:1710.08864 (2017).