Monte Carlo Methods
Lecture slides for Chapter 17 of Deep Learning
www.deeplearningbook.org
Ian Goodfellow
Last updated 2017-12-29
(Goodfellow 2017)
Roadmap
• Basics of Monte Carlo methods
• Importance Sampling
• Markov Chains
(Goodfellow 2017)
Randomized Algorithms
• Las Vegas algorithms give an exact answer, but run for a random amount of time (until the answer is found)
• Monte Carlo algorithms give an answer with a random amount of error, in a runtime chosen by the user (longer runtime gives less error)
(Goodfellow 2017)
Estimating sums / integrals with samples
17.1.2 Basics of Monte Carlo Sampling
When a sum or an integral cannot be computed exactly (for example, the sum has an exponential number of terms, and no exact simplification is known), it is often possible to approximate it using Monte Carlo sampling. The idea is to view the sum or integral as if it were an expectation under some distribution and to approximate the expectation by a corresponding average. Let

$s = \sum_x p(x) f(x) = E_p[f(x)]$   (17.1)

or

$s = \int p(x) f(x)\, dx = E_p[f(x)]$   (17.2)

be the sum or integral to estimate, rewritten as an expectation, with the constraint that p is a probability distribution (for the sum) or a probability density (for the integral) over random variable x.

We can approximate s by drawing n samples $x^{(1)}, \ldots, x^{(n)}$ from p and then forming the empirical average

$\hat{s}_n = \frac{1}{n} \sum_{i=1}^{n} f(x^{(i)})$.   (17.3)

This approximation is justified by a few different properties. The first trivial observation is that the estimator $\hat{s}_n$ is unbiased, since $E[\hat{s}_n] = \frac{1}{n} \sum_{i=1}^{n} E[f(x^{(i)})] = \frac{1}{n} \sum_{i=1}^{n} s = s$.
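As a concrete illustration, here is a minimal sketch of this estimator in Python. The choices of p and f are illustrative, not from the slides: we take $p = N(0, 1)$ and $f(x) = x^2$, so the exact answer is $E[x^2] = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_estimate(f, sampler, n):
    """Return the empirical average (1/n) * sum_i f(x_i) over n draws from p."""
    x = sampler(n)       # x^(1), ..., x^(n) drawn i.i.d. from p
    return f(x).mean()   # unbiased estimate of E_p[f(x)]

# Illustrative choice: p = N(0, 1) and f(x) = x**2, so the exact answer is 1.
s_hat = monte_carlo_estimate(lambda x: x**2,
                             lambda n: rng.standard_normal(n),
                             n=100_000)
print(s_hat)  # close to 1.0
```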
(Goodfellow 2017)
Justification
• Unbiased:
• The expected value for finite n is equal to the correct value
• The value for any specific n samples will have random
error, but the errors for different sample sets cancel out
• Low variance:
• Variance is O(1/n) (see the numerical check after this list)
• For very large n, the error converges “almost surely” to 0
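A quick numerical check of the O(1/n) variance claim, reusing the illustrative $p = N(0, 1)$, $f(x) = x^2$ setup from the previous sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Repeat the estimator many times at each n and measure the spread of the
# estimates; the variance should shrink roughly as 1/n.
for n in (100, 1_000, 10_000):
    estimates = [(rng.standard_normal(n) ** 2).mean() for _ in range(1_000)]
    print(n, np.var(estimates))  # drops by about 10x each time n grows by 10x
```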
(Goodfellow 2017)
Roadmap
• Basics of Monte Carlo methods
• Importance Sampling
• Markov Chains
(Goodfellow 2017)
Non-unique decomposition
Say we want to compute $\int a(x)\, b(x)\, c(x)\, dx$. Which part is p? Which part is f?
p = a and f = bc? p = ab and f = c? etc.
There is no unique decomposition: we can always pull part of any p into f, as the sketch below checks numerically.
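A discrete sketch with made-up factors a, b, c over three states, showing that two different choices of p and f estimate the same sum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up factors over the states {0, 1, 2}; the target is sum_x a(x)b(x)c(x).
a = np.array([0.2, 0.5, 0.3])   # a happens to be normalized already
b = np.array([1.0, 2.0, 3.0])
c = np.array([0.5, 0.5, 2.0])
exact = (a * b * c).sum()       # = 2.4

# Decomposition 1: p = a, f = b * c
x = rng.choice(3, size=200_000, p=a)
est1 = (b[x] * c[x]).mean()

# Decomposition 2: p proportional to a * b (renormalized by Z), f = Z * c
Z = (a * b).sum()
x = rng.choice(3, size=200_000, p=a * b / Z)
est2 = (Z * c[x]).mean()

print(exact, est1, est2)  # all three agree up to Monte Carlo error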
(Goodfellow 2017)
Importance Sampling
An important step in the decomposition in equation 17.2 is deciding which part of the integrand should play the role of the probability p(x) and which part should play the role of the quantity f(x) whose expected value (under that probability distribution) is to be estimated. There is no unique decomposition, because p(x)f(x) can always be rewritten as

$p(x) f(x) = q(x) \, \frac{p(x) f(x)}{q(x)}$,   (17.8)

where we now sample from q and average $\frac{p f}{q}$. In many cases, we wish to compute an expectation for a given p and an f, and the fact that the problem is specified from the start as an expectation suggests that this p and f would be a natural choice of decomposition.

Here q is our new p: it is the distribution we will draw samples from. The ratio $\frac{p(x) f(x)}{q(x)}$ is our new f: we evaluate it at each sample.
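A minimal importance-sampling sketch under illustrative assumptions (target $p = N(0, 1)$ and $f(x) = x^2$ as before, proposal $q = N(0, 2^2)$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def importance_estimate(n):
    x = rng.normal(0.0, 2.0, size=n)           # draw x ~ q = N(0, 2^2)
    w = norm.pdf(x, 0, 1) / norm.pdf(x, 0, 2)  # importance weights p(x)/q(x)
    return np.mean(w * x**2)                   # average of p*f/q under q

print(importance_estimate(100_000))  # close to 1.0: unbiased, same target as before
```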
(Goodfellow 2017)
Why use importance sampling?
• Maybe it is feasible to sample from q but not from p
• This is how GANs work
• A good q can reduce the variance of the estimate
• Importance sampling is still unbiased for every valid q (any q with q(x) > 0 wherever p(x)f(x) ≠ 0)
(Goodfellow 2017)
Optimal q
• Determining the optimal q requires solving the original integral,
so not useful in practice
• Useful to understand intuition behind importance sampling
• This q minimizes the variance
• Places more mass on points where the weighted function is larger
The variance of the importance sampling estimator is

$\mathrm{Var}[\hat{s}_q] = \mathrm{Var}\left[\frac{p(x) f(x)}{q(x)}\right] / n$.   (17.12)

The minimum variance occurs when q is

$q^*(x) = \frac{p(x) |f(x)|}{Z}$,   (17.13)

where Z is the normalization constant, chosen so that q*(x) sums or integrates to 1 as appropriate. Better importance sampling distributions put more weight where the integrand is larger. In fact, when f(x) does not change sign, $\mathrm{Var}[\hat{s}_{q^*}] = 0$, meaning that a single sample is sufficient when the optimal distribution is used. Of course, this is only because the computation of q* has essentially solved the original problem, so it is usually not practical to use this approach of drawing a single sample from the optimal distribution. Any choice of sampling distribution q is valid (in the sense of yielding the correct expected value), and q* is the optimal one (in the sense of yielding minimum variance).
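A discrete sanity check of equation 17.13, with made-up values of p and f: when f does not change sign, every weighted draw from q* equals the exact answer, so the estimator has zero variance.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.2, 0.5, 0.3])         # target distribution over {0, 1, 2}
f = np.array([1.0, 4.0, 2.0])         # f >= 0, so it never changes sign
s = np.dot(p, f)                      # exact E_p[f] = 2.8

q_star = p * f / s                    # q* = p|f| / Z; here Z = s
i = rng.choice(3, size=10, p=q_star)  # draw from the optimal proposal
weighted = p[i] * f[i] / q_star[i]    # each term is p*f/q* = s exactly
print(weighted)                       # [2.8 2.8 ... 2.8] -> zero variance
```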
(Goodfellow 2017)
Roadmap
• Basics of Monte Carlo methods
• Importance Sampling
• Markov Chains
(Goodfellow 2017)
Sampling from p or q
• So far we have assumed we can sample from p or q
easily
• This is true when p or q has a directed graphical
model representation
• Use ancestral sampling (sketched after this list)
• Sample each node given its parents, moving from
roots to leaves
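A minimal ancestral-sampling sketch for a hypothetical three-node chain a → b → c; the conditional probabilities are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def ancestral_sample():
    """Sample (a, b, c) from a toy directed model a -> b -> c, root first."""
    a = rng.binomial(1, 0.5)                # root: P(a = 1) = 0.5
    b = rng.binomial(1, 0.9 if a else 0.2)  # P(b = 1 | a)
    c = rng.binomial(1, 0.7 if b else 0.1)  # P(c = 1 | b)
    return a, b, c

print([ancestral_sample() for _ in range(5)])  # fair joint samples in one pass
```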
(Goodfellow 2017)
Sampling from undirected
models
• Sampling from undirected models is more difficult
• Can’t get a fair sample in one pass
• Use a Monte Carlo algorithm that incrementally updates samples, coming closer to sampling from the right distribution at each step
• This is called a Markov Chain
(Goodfellow 2017)
Simple Markov Chain: Gibbs
sampling
• Repeatedly cycle through all variables
• For each variable, randomly sample that variable
given its Markov blanket
• For an undirected model, the Markov blanket is
just the neighbors in the graph
• Block Gibbs trick: conditionally independent
variables may be sampled simultaneously
(Goodfellow 2017)
Gibbs sampling example
• Initialize a, s, and b
• For n repetitions
• Sample a from P(a|s) and b from P(b|s)
• Sample s from P(s|a,b)
[Figure 16.6(a) from Chapter 16: the graph a — s — b, in which a and b interact only through s.]
Block Gibbs trick lets us
sample a and b in parallel
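A sketch of this Gibbs sampler for the a — s — b model above. The binary conditionals are made up, since the slide does not specify the actual distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up conditionals for binary a, s, b; only their form matters here.
def p_a_given_s(s):     return 0.8 if s else 0.3          # P(a = 1 | s)
def p_b_given_s(s):     return 0.6 if s else 0.2          # P(b = 1 | s)
def p_s_given_ab(a, b): return 0.9 if a and b else 0.25   # P(s = 1 | a, b)

a, s, b = 0, 0, 0                          # initialize a, s, and b
for _ in range(1_000):                     # n repetitions
    a = rng.binomial(1, p_a_given_s(s))    # a and b are independent given s,
    b = rng.binomial(1, p_b_given_s(s))    # so these two draws can run in parallel
    s = rng.binomial(1, p_s_given_ab(a, b))
print(a, s, b)  # approximately a sample from the joint, once the chain has mixed
```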
(Goodfellow 2017)
Equilibrium
• Running a Markov Chain long enough causes it to
mix
• After mixing, it samples from an equilibrium
distribution
• Sample before update comes from distribution π(x)
• Sample after update is a different sample, but still
from distribution π(x)
(Goodfellow 2017)
Downsides
• Generally infeasible to…
• …know ahead of time how long mixing will take
• …know how far a chain is from equilibrium
• …know whether a chain is at equilibrium
• Usually in deep learning we just run for n steps, for
some n that we think will be big enough, and hope for
the best
(Goodfellow 2017)
Trouble in Practice
• Mixing can take an infeasibly long time
• This is especially true for
• High-dimensional distributions
• Distributions with strong correlations between
variables
• Distributions with multiple highly separated modes
(Goodfellow 2017)
Difficult Mixing
Figure 17.1: Paths followed by Gibbs sampling for three distributions, with the Markov chain initialized at the mode in each case. (Left) A multivariate normal distribution with two independent variables. Gibbs sampling mixes well because the variables are independent. (Center) A multivariate normal distribution with highly correlated variables. The correlation between variables makes it difficult for the Markov chain to mix. Because the update for each variable must be conditioned on the other variable, the correlation reduces the rate at which the Markov chain can move away from the starting point. (Right) A mixture of Gaussians with widely separated modes that are not axis aligned. Gibbs sampling mixes very slowly because it is difficult to change modes while altering only one variable at a time.
(Goodfellow 2017)
Difficult Mixing in Deep
Generative Models
Energy-based models may be augmented with an extra parameter β controlling how sharply peaked the distribution is:

$p_\beta(x) \propto \exp(-\beta E(x))$.   (17.26)

The β parameter is often described as being the reciprocal of the temperature, reflecting the origin of energy-based models in statistical physics. When the temperature falls to zero, and β rises to infinity, the energy-based model becomes deterministic.
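A small numerical illustration of equation 17.26 over a finite state space; the energy values are made up:

```python
import numpy as np

E = np.array([0.0, 1.0, 4.0])   # made-up energies for three states
for beta in (0.1, 1.0, 10.0):
    p = np.exp(-beta * E)
    p /= p.sum()                # normalize p_beta(x) ∝ exp(-beta * E(x))
    print(beta, p)              # large beta concentrates all mass on argmin E
```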
Figure 17.2: An illustration of the slow mixing problem in deep probabilistic models. Each panel should be read left to right, top to bottom. (Left) Consecutive samples from Gibbs sampling applied to a deep Boltzmann machine trained on the MNIST dataset. Consecutive samples are similar to each other. Because the Gibbs sampling is performed in a deep graphical model, this similarity is based more on semantic than raw visual features, but it is still difficult for the Gibbs chain to transition from one mode of the distribution to another, for example, by changing the digit identity. (Right) Consecutive ancestral samples from a generative adversarial network. Because ancestral sampling generates each sample independently from the others, there is no mixing problem.
(Goodfellow 2017)
For more information…