SlideShare a Scribd company logo
Gradient Estimation Using Stochastic
Computation Graphs
Yoonho Lee
Department of Computer Science and Engineering
Pohang University of Science and Technology
May 9, 2017
Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs
Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs
Stochastic Gradient Estimation
R(θ) = Ew∼Pθ(dw)[f (θ, w)]
Approximate θR(θ) can be used for:
Computational Finance
Reinforcement Learning
Variational Inference
Explaining the brain with neural networks
Stochastic Gradient Estimation
R(θ) = Ew∼Pθ(dw)[f (θ, w)] =
Ω
f (θ, w)Pθ(dw)
Stochastic Gradient Estimation
R(θ) = Ew∼Pθ(dw)[f (θ, w)] =
Ω
f (θ, w)Pθ(dw)
=
Ω
f (θ, w)
Pθ(dw)
G(dw)
G(dw)
Stochastic Gradient Estimation
R(θ) = Ew∼Pθ(dw)[f (θ, w)] =
Ω
f (θ, w)Pθ(dw)
=
Ω
f (θ, w)
Pθ(dw)
G(dw)
G(dw)
θR(θ) = θ
Ω
f (θ, w)
Pθ(dw)
G(dw)
G(dw)
=
Ω
θ(f (θ, w)
Pθ(dw)
G(dw)
)G(dw)
= Ew∼G(dw)[ θf (θ, w)
Pθ(dw)
G(dw)
+ f (θ, w)
θPθ(dw)
G(dw)
]
Stochastic Gradient Estimation
So far we have
θR(θ) = Ew∼G(dw)[ θf (θ, w)
Pθ(dw)
G(dw)
+ f (θ, w)
θPθ(dw)
G(dw)
]
Stochastic Gradient Estimation
So far we have
θR(θ) = Ew∼G(dw)[ θf (θ, w)
Pθ(dw)
G(dw)
+ f (θ, w)
θPθ(dw)
G(dw)
]
Letting G = Pθ0 and evaluating at θ = θ0, we get
θR(θ)
θ=θ0
= Ew∼Pθ0
(dw)[ θf (θ, w)
Pθ(dw)
Pθ0 (dw)
+ f (θ, w)
θPθ(dw)
Pθ0 (dw)
]
θ=θ0
= Ew∼Pθ0
(dw)[ θf (θ, w) + f (θ, w) θlnPθ(dw)]
θ=θ0
Stochastic Gradient Estimation
So far we have
R(θ) = Ew∼Pθ(dw)[f (θ, w)]
θR(θ)
θ=θ0
= Ew∼Pθ0
(dw)[ θf (θ, w) + f (θ, w) θlnPθ(dw)]
θ=θ0
Stochastic Gradient Estimation
So far we have
R(θ) = Ew∼Pθ(dw)[f (θ, w)]
θR(θ)
θ=θ0
= Ew∼Pθ0
(dw)[ θf (θ, w) + f (θ, w) θlnPθ(dw)]
θ=θ0
1. Assuming w fully determines f :
θR(θ)
θ=θ0
= Ew∼Pθ0
(dw)[f (w) θlnPθ(dw)]
θ=θ0
2. Assuming Pθ(dw) is independent of θ:
θR(θ) = Ew∼P(dw)[ θf (θ, w)]
Stochastic Gradient Estimation
SF vs PD
R(θ) = Ew∼Pθ(dw)[f (θ, w)]
Score Function (SF) Estimator:
θR(θ)
θ=θ0
= Ew∼Pθ0
(dw)[f (w) θlnPθ(dw)]
θ=θ0
Pathwise Derivative (PD) Estimator:
θR(θ) = Ew∼P(dw)[ θf (θ, w)]
Stochastic Gradient Estimation
SF vs PD
R(θ) = Ew∼Pθ(dw)[f (θ, w)]
Score Function (SF) Estimator:
θR(θ)
θ=θ0
= Ew∼Pθ0
(dw)[f (w) θlnPθ(dw)]
θ=θ0
Pathwise Derivative (PD) Estimator:
θR(θ) = Ew∼P(dw)[ θf (θ, w)]
SF allows discontinuous f .
SF requires sample values f ; PD requires derivatives f .
SF takes derivatives over the probability distribution of w; PD
takes derivatives over the function f .
SF has high variance on high-dimensional w.
PD has high variance on rough f .
Stochastic Gradient Estimation
Choices for PD estimator
blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation
Graphs (NIPS 2015)
John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
Stochastic Computation Graphs
SF estimator:
θR(θ)
θ=θ0
= Ex∼Pθ0
(dx)[f (x) θlnPθ(dx)]
θ=θ0
PD estimator:
θR(θ) = Ez∼P(dz)[ θf (x(θ, z))]
Stochastic Computation Graphs
Stochastic computation graphs are a design choice rather
than an intrinsic property of a problem
Stochastic nodes ’block’ gradient flow
Stochastic Computation Graphs
Neural Networks
Neural Networks are a special case of stochastic computation
graphs
Stochastic Computation Graphs
Formal Definition
Stochastic Computation Graphs
Deriving Unbiased Estimators
Condition 1. (Differentiability Requirements) Given input node
θ ∈ Θ, for all edges (v, w) which satisfy θ D v and θ D w, the
following condition holds: if w is deterministic, ∂w
∂v exists, and if w
is stochastic, ∂
∂v p(w|PARENTSw ) exists.
Theorem 1. Suppose θ ∈ Θ satisfies Condition 1. The following
equations hold:
∂
∂θ
E
c∈C
c = E
w∈S
θ D w
∂
∂θ
logp(w|DEPSw ) ˆQw +
c∈C
θ D c
∂
∂θ
c(DEPSc)
= E
c∈C
ˆc
w∈S
θ D w
∂
∂θ
logp(w|DEPSw ) +
c∈C
θ D c
∂
∂θ
c(DEPSc)
Stochastic Computation Graphs
Surrogate loss functions
We have this equation from Theorem 1:
∂
∂θ
E
c∈C
c = E
w∈S
θ D w
∂
∂θ
logp(w|DEPSw ) ˆQw +
c∈C
θ D c
∂
∂θ
c(DEPSc)
Note that by defining
L(Θ, S) :=
w∈S
logp(w|DEPSw ) ˆQw +
c∈C
c(DEPSc)
we have
∂
∂θ
E
c∈C
c = E
∂
∂θ
L(Θ, S)
Stochastic Computation Graphs
Surrogate loss functions
Stochastic Computation Graphs
Implementations
Making stochastic computation graphs with neural network
packages is easy
tensorflow.org/api_docs/python/tf/contrib/bayesflow/stochastic_gradient_estimators
github.com/pytorch/pytorch/blob/6a69f7007b37bb36d8a712cdf5cebe3ee0cc1cc8/torch/autograd/
variable.py
github.com/blei-lab/edward/blob/master/edward/inferences/klqp.py
Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs
MDP
The MDP formulation itself is a stochastic computation graph
Policy Gradient
Surrogate Loss
1
The policy gradient algorithm is Theorem 1 applied to the
MDP graph
1
Richard S Sutton et al. “Policy Gradient Methods for Reinforcement Learning with Function Approximation”.
In: NIPS (1999).
Stochastic Value Gradient
SVG(0)
Learn Qφ, and update2
∆θ ∝ θ
T
t=1
Qφ(st, πθ(st, t))
2
Nicolas Heess et al. “Learning Continuous Control Policies by Stochastic Value Gradients”. In: NIPS (2015).
Stochastic Value Gradient
SVG(1)
Learn Vφ and dynamics model f such that st+1 = f (st, at) + δt.
Given transition (st, at, st+1), infer noise δt.3
∆θ ∝ θ
T
t=1
rt + γVφ(f (st, at) + δt)
3
Nicolas Heess et al. “Learning Continuous Control Policies by Stochastic Value Gradients”. In: NIPS (2015).
Stochastic Value Gradient
SVG(∞)
Learn f , infer all noise variables δt. Differentiate through whole
graph.4
4
Nicolas Heess et al. “Learning Continuous Control Policies by Stochastic Value Gradients”. In: NIPS (2015).
Deterministic Policy Gradient
5
5
David Silver et al. “Deterministic Policy Gradient Algorithms”. In: ICML (2014).
Evolution Strategies
6
6
Tim Salimans et al. “Evolution Strategies as a Scalable Alternative to Reinforcement Learning”. In: arXiv
(2017).
Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs
Neural Variational Inference
7
Where
L(x, θ, φ) = Eh∼Q(h|x)[logPθ(x, h) − logQφ(h|x)]
Score function estimator
7
Andriy Mnih and Karol Gregor. “Neural Variational Inference and Learning in Belief Networks”. In: ICML
(2014).
Variational Autoencoder
8
Pathwise derivative estimator applied to the exact same graph
as NVIL
8
Diederik P Kingma and Max Welling. “Auto-Encoding Variational Bayes”. In: ICLR (2014).
DRAW
9
Sequential version of VAE
Same graph as SVG(∞) with (known) deterministic
dynamics(e=state, z=action, L=reward)
Similarly, VAE can be seen as SVG(∞) with 1 timestep and
deterministic dynamics
9
Karol Gregor et al. “DRAW: A Recurrent Neural Network For Image Generation”. In: ICML (2015).
GAN
10
Connection to SVG(0) has been established in [10]
Actor has no access to state. Observation is a random choice
between real or fake image. Reward is 1 if real, 0 if fake. D
learns value function of observation. G’s learning signal is D.
10
David Pfau and Oriol Vinyals. “Connecting Generative Adversarial Networks and Actor-Critic Methods”. In:
(2016).
Bayes by backprop
11
and our cost function f is:
f (D, θ) = Ew∼q(w|θ)[f (w, θ)]
= Ew∼q(w|θ)[−logp(D|w) + logq(w|θ) − logp(w)]
From the point of view of stochastic computation graphs,
weights and activations are not different
Estimate ELBO gradient of activation with PD estimator:
VAE
Estimate ELBO gradient of weight with PD estimator: Bayes
by Backprop
11
Charles Blundell et al. “Weight Uncertainty in Neural Networks”. In: ICML (2015).
Summary
Stochastic computation graphs explain many different
algorithms with one gradient estimator(Theorem 1)
Stochastic computation graphs give us insight into the
behavior of estimators
Stochastic computation graphs reveal equivalencies:
NVIL[7]=Policy gradient[12]
VAE[6]/DRAW[4]=SVG(∞)[5]
GAN[3]=SVG(0)[5]
References I
[1] Charles Blundell et al. “Weight Uncertainty in Neural
Networks”. In: ICML (2015).
[2] Michael Fu. “Stochastic Gradient Estimation”. In: (2005).
[3] Ian J. Goodfellow et al. “Generative Adversarial Networks”.
In: NIPS (2014).
[4] Karol Gregor et al. “DRAW: A Recurrent Neural Network
For Image Generation”. In: ICML (2015).
[5] Nicolas Heess et al. “Learning Continuous Control Policies
by Stochastic Value Gradients”. In: NIPS (2015).
[6] Diederik P Kingma and Max Welling. “Auto-Encoding
Variational Bayes”. In: ICLR (2014).
[7] Andriy Mnih and Karol Gregor. “Neural Variational Inference
and Learning in Belief Networks”. In: ICML (2014).
References II
[8] David Pfau and Oriol Vinyals. “Connecting Generative
Adversarial Networks and Actor-Critic Methods”. In: (2016).
[9] Tim Salimans et al. “Evolution Strategies as a Scalable
Alternative to Reinforcement Learning”. In: arXiv (2017).
[10] John Schulman et al. “Gradient Estimation Using Stochastic
Computation Graphs”. In: NIPS (2015).
[11] David Silver et al. “Deterministic Policy Gradient
Algorithms”. In: ICML (2014).
[12] Richard S Sutton et al. “Policy Gradient Methods for
Reinforcement Learning with Function Approximation”. In:
NIPS (1999).
Thank You

More Related Content

What's hot

ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Means
tthonet
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence
Masahiro Suzuki
 
Nearly optimal average case complexity of counting bicliques under seth
Nearly optimal average case complexity of counting bicliques under sethNearly optimal average case complexity of counting bicliques under seth
Nearly optimal average case complexity of counting bicliques under seth
Nobutaka Shimizu
 
Q-Metrics in Theory And Practice
Q-Metrics in Theory And PracticeQ-Metrics in Theory And Practice
Q-Metrics in Theory And Practice
guest3550292
 
Sparse Kernel Learning for Image Annotation
Sparse Kernel Learning for Image AnnotationSparse Kernel Learning for Image Annotation
Sparse Kernel Learning for Image AnnotationSean Moran
 
Large scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkLarge scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using spark
Mila, Université de Montréal
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance
charlesmartin14
 
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Universitat Politècnica de Catalunya
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep Learning
Sebastian Ruder
 
Iwsm2014 an analogy-based approach to estimation of software development ef...
Iwsm2014   an analogy-based approach to estimation of software development ef...Iwsm2014   an analogy-based approach to estimation of software development ef...
Iwsm2014 an analogy-based approach to estimation of software development ef...
Nesma
 
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
Masahiro Suzuki
 
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid ParallelismDS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
Parameswaran Raman
 
Problem Understanding through Landscape Theory
Problem Understanding through Landscape TheoryProblem Understanding through Landscape Theory
Problem Understanding through Landscape Theory
jfrchicanog
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
Elvis DOHMATOB
 
Matching networks for one shot learning
Matching networks for one shot learningMatching networks for one shot learning
Matching networks for one shot learning
Kazuki Fujikawa
 
Recursive algorithms
Recursive algorithmsRecursive algorithms
Recursive algorithms
subhashchandra197
 
Approximation algorithms
Approximation algorithmsApproximation algorithms
Approximation algorithms
Ganesh Solanke
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes Family
Kota Matsui
 
Simplified Runtime Analysis of Estimation of Distribution Algorithms
Simplified Runtime Analysis of Estimation of Distribution AlgorithmsSimplified Runtime Analysis of Estimation of Distribution Algorithms
Simplified Runtime Analysis of Estimation of Distribution Algorithms
PK Lehre
 

What's hot (19)

ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Means
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence
 
Nearly optimal average case complexity of counting bicliques under seth
Nearly optimal average case complexity of counting bicliques under sethNearly optimal average case complexity of counting bicliques under seth
Nearly optimal average case complexity of counting bicliques under seth
 
Q-Metrics in Theory And Practice
Q-Metrics in Theory And PracticeQ-Metrics in Theory And Practice
Q-Metrics in Theory And Practice
 
Sparse Kernel Learning for Image Annotation
Sparse Kernel Learning for Image AnnotationSparse Kernel Learning for Image Annotation
Sparse Kernel Learning for Image Annotation
 
Large scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkLarge scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using spark
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance
 
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep Learning
 
Iwsm2014 an analogy-based approach to estimation of software development ef...
Iwsm2014   an analogy-based approach to estimation of software development ef...Iwsm2014   an analogy-based approach to estimation of software development ef...
Iwsm2014 an analogy-based approach to estimation of software development ef...
 
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
 
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid ParallelismDS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
 
Problem Understanding through Landscape Theory
Problem Understanding through Landscape TheoryProblem Understanding through Landscape Theory
Problem Understanding through Landscape Theory
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
Matching networks for one shot learning
Matching networks for one shot learningMatching networks for one shot learning
Matching networks for one shot learning
 
Recursive algorithms
Recursive algorithmsRecursive algorithms
Recursive algorithms
 
Approximation algorithms
Approximation algorithmsApproximation algorithms
Approximation algorithms
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes Family
 
Simplified Runtime Analysis of Estimation of Distribution Algorithms
Simplified Runtime Analysis of Estimation of Distribution AlgorithmsSimplified Runtime Analysis of Estimation of Distribution Algorithms
Simplified Runtime Analysis of Estimation of Distribution Algorithms
 

Similar to Gradient Estimation Using Stochastic Computation Graphs

Lec7 deeprlbootcamp-svg+scg
Lec7 deeprlbootcamp-svg+scgLec7 deeprlbootcamp-svg+scg
Lec7 deeprlbootcamp-svg+scg
Ronald Teo
 
Solving inverse problems via non-linear Bayesian Update of PCE coefficients
Solving inverse problems via non-linear Bayesian Update of PCE coefficientsSolving inverse problems via non-linear Bayesian Update of PCE coefficients
Solving inverse problems via non-linear Bayesian Update of PCE coefficients
Alexander Litvinenko
 
Litvinenko nlbu2016
Litvinenko nlbu2016Litvinenko nlbu2016
Litvinenko nlbu2016
Alexander Litvinenko
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
vsssuresh
 
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Ono Shigeru
 
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Alexander Litvinenko
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Jack Clark
 
Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space
Eliezer Silva
 
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
The Statistical and Applied Mathematical Sciences Institute
 
Introduction to modern Variational Inference.
Introduction to modern Variational Inference.Introduction to modern Variational Inference.
Introduction to modern Variational Inference.
Tomasz Kusmierczyk
 
Logistic Regression in Case-Control Study
Logistic Regression in Case-Control StudyLogistic Regression in Case-Control Study
Logistic Regression in Case-Control Study
Satish Gupta
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
Christian Robert
 
Nonparametric testing for exogeneity with discrete regressors and instruments
Nonparametric testing for exogeneity with discrete regressors and instrumentsNonparametric testing for exogeneity with discrete regressors and instruments
Nonparametric testing for exogeneity with discrete regressors and instruments
GRAPE
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningbutest
 
ABC workshop: 17w5025
ABC workshop: 17w5025ABC workshop: 17w5025
ABC workshop: 17w5025
Christian Robert
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
Iosif Itkin
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)
Masahiro Suzuki
 
Triggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphsTriggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphs
INSA Lyon - L'Institut National des Sciences Appliquées de Lyon
 
The Magic of Auto Differentiation
The Magic of Auto DifferentiationThe Magic of Auto Differentiation
The Magic of Auto Differentiation
Sanyam Kapoor
 
Connection between inverse problems and uncertainty quantification problems
Connection between inverse problems and uncertainty quantification problemsConnection between inverse problems and uncertainty quantification problems
Connection between inverse problems and uncertainty quantification problems
Alexander Litvinenko
 

Similar to Gradient Estimation Using Stochastic Computation Graphs (20)

Lec7 deeprlbootcamp-svg+scg
Lec7 deeprlbootcamp-svg+scgLec7 deeprlbootcamp-svg+scg
Lec7 deeprlbootcamp-svg+scg
 
Solving inverse problems via non-linear Bayesian Update of PCE coefficients
Solving inverse problems via non-linear Bayesian Update of PCE coefficientsSolving inverse problems via non-linear Bayesian Update of PCE coefficients
Solving inverse problems via non-linear Bayesian Update of PCE coefficients
 
Litvinenko nlbu2016
Litvinenko nlbu2016Litvinenko nlbu2016
Litvinenko nlbu2016
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
 
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
 
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space
 
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
 
Introduction to modern Variational Inference.
Introduction to modern Variational Inference.Introduction to modern Variational Inference.
Introduction to modern Variational Inference.
 
Logistic Regression in Case-Control Study
Logistic Regression in Case-Control StudyLogistic Regression in Case-Control Study
Logistic Regression in Case-Control Study
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
Nonparametric testing for exogeneity with discrete regressors and instruments
Nonparametric testing for exogeneity with discrete regressors and instrumentsNonparametric testing for exogeneity with discrete regressors and instruments
Nonparametric testing for exogeneity with discrete regressors and instruments
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
ABC workshop: 17w5025
ABC workshop: 17w5025ABC workshop: 17w5025
ABC workshop: 17w5025
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)
 
Triggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphsTriggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphs
 
The Magic of Auto Differentiation
The Magic of Auto DifferentiationThe Magic of Auto Differentiation
The Magic of Auto Differentiation
 
Connection between inverse problems and uncertainty quantification problems
Connection between inverse problems and uncertainty quantification problemsConnection between inverse problems and uncertainty quantification problems
Connection between inverse problems and uncertainty quantification problems
 

More from Yoonho Lee

On First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsOn First-Order Meta-Learning Algorithms
On First-Order Meta-Learning Algorithms
Yoonho Lee
 
New Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodNew Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient Method
Yoonho Lee
 
Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
Gradient-Based Meta-Learning with Learned Layerwise Metric and SubspaceGradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
Yoonho Lee
 
Parameter Space Noise for Exploration
Parameter Space Noise for ExplorationParameter Space Noise for Exploration
Parameter Space Noise for Exploration
Yoonho Lee
 
Meta Learning Shared Hierarchies
Meta Learning Shared HierarchiesMeta Learning Shared Hierarchies
Meta Learning Shared Hierarchies
Yoonho Lee
 
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Yoonho Lee
 
The Predictron: End-to-end Learning and Planning
The Predictron: End-to-end Learning and PlanningThe Predictron: End-to-end Learning and Planning
The Predictron: End-to-end Learning and Planning
Yoonho Lee
 
Dueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement LearningDueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement Learning
Yoonho Lee
 
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep NetworksModel-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Yoonho Lee
 
Modular Multitask Reinforcement Learning with Policy Sketches
Modular Multitask Reinforcement Learning with Policy SketchesModular Multitask Reinforcement Learning with Policy Sketches
Modular Multitask Reinforcement Learning with Policy Sketches
Yoonho Lee
 
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Evolution Strategies as a Scalable Alternative to Reinforcement LearningEvolution Strategies as a Scalable Alternative to Reinforcement Learning
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Yoonho Lee
 

More from Yoonho Lee (11)

On First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsOn First-Order Meta-Learning Algorithms
On First-Order Meta-Learning Algorithms
 
New Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodNew Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient Method
 
Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
Gradient-Based Meta-Learning with Learned Layerwise Metric and SubspaceGradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
 
Parameter Space Noise for Exploration
Parameter Space Noise for ExplorationParameter Space Noise for Exploration
Parameter Space Noise for Exploration
 
Meta Learning Shared Hierarchies
Meta Learning Shared HierarchiesMeta Learning Shared Hierarchies
Meta Learning Shared Hierarchies
 
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
 
The Predictron: End-to-end Learning and Planning
The Predictron: End-to-end Learning and PlanningThe Predictron: End-to-end Learning and Planning
The Predictron: End-to-end Learning and Planning
 
Dueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement LearningDueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement Learning
 
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep NetworksModel-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
 
Modular Multitask Reinforcement Learning with Policy Sketches
Modular Multitask Reinforcement Learning with Policy SketchesModular Multitask Reinforcement Learning with Policy Sketches
Modular Multitask Reinforcement Learning with Policy Sketches
 
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Evolution Strategies as a Scalable Alternative to Reinforcement LearningEvolution Strategies as a Scalable Alternative to Reinforcement Learning
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
 

Recently uploaded

María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
eCommerce Institute
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
Vladimir Samoylov
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Access Innovations, Inc.
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
IP ServerOne
 
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
Howard Spence
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Sebastiano Panichella
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
gharris9
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
faizulhassanfaiz1670
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
OWASP Beja
 
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXOBitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Matjaž Lipuš
 
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
OECD Directorate for Financial and Enterprise Affairs
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
khadija278284
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 Presentation
Access Innovations, Inc.
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
Faculty of Medicine And Health Sciences
 
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Sebastiano Panichella
 
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Orkestra
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
Sebastiano Panichella
 

Recently uploaded (17)

María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
 
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
 
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXOBitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXO
 
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 Presentation
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
 
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
 
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
 

Gradient Estimation Using Stochastic Computation Graphs

  • 1. Gradient Estimation Using Stochastic Computation Graphs Yoonho Lee Department of Computer Science and Engineering Pohang University of Science and Technology May 9, 2017
  • 2. Outline Stochastic Gradient Estimation Stochastic Computation Graphs Stochastic Computation Graphs in RL Other Examples of Stochastic Computation Graphs
  • 3. Outline Stochastic Gradient Estimation Stochastic Computation Graphs Stochastic Computation Graphs in RL Other Examples of Stochastic Computation Graphs
  • 4. Stochastic Gradient Estimation R(θ) = Ew∼Pθ(dw)[f (θ, w)] Approximate θR(θ) can be used for: Computational Finance Reinforcement Learning Variational Inference Explaining the brain with neural networks
  • 5. Stochastic Gradient Estimation R(θ) = Ew∼Pθ(dw)[f (θ, w)] = Ω f (θ, w)Pθ(dw)
  • 6. Stochastic Gradient Estimation R(θ) = Ew∼Pθ(dw)[f (θ, w)] = Ω f (θ, w)Pθ(dw) = Ω f (θ, w) Pθ(dw) G(dw) G(dw)
  • 7. Stochastic Gradient Estimation R(θ) = Ew∼Pθ(dw)[f (θ, w)] = Ω f (θ, w)Pθ(dw) = Ω f (θ, w) Pθ(dw) G(dw) G(dw) θR(θ) = θ Ω f (θ, w) Pθ(dw) G(dw) G(dw) = Ω θ(f (θ, w) Pθ(dw) G(dw) )G(dw) = Ew∼G(dw)[ θf (θ, w) Pθ(dw) G(dw) + f (θ, w) θPθ(dw) G(dw) ]
  • 8. Stochastic Gradient Estimation So far we have θR(θ) = Ew∼G(dw)[ θf (θ, w) Pθ(dw) G(dw) + f (θ, w) θPθ(dw) G(dw) ]
  • 9. Stochastic Gradient Estimation So far we have θR(θ) = Ew∼G(dw)[ θf (θ, w) Pθ(dw) G(dw) + f (θ, w) θPθ(dw) G(dw) ] Letting G = Pθ0 and evaluating at θ = θ0, we get θR(θ) θ=θ0 = Ew∼Pθ0 (dw)[ θf (θ, w) Pθ(dw) Pθ0 (dw) + f (θ, w) θPθ(dw) Pθ0 (dw) ] θ=θ0 = Ew∼Pθ0 (dw)[ θf (θ, w) + f (θ, w) θlnPθ(dw)] θ=θ0
  • 10. Stochastic Gradient Estimation So far we have R(θ) = Ew∼Pθ(dw)[f (θ, w)] θR(θ) θ=θ0 = Ew∼Pθ0 (dw)[ θf (θ, w) + f (θ, w) θlnPθ(dw)] θ=θ0
  • 11. Stochastic Gradient Estimation So far we have R(θ) = Ew∼Pθ(dw)[f (θ, w)] θR(θ) θ=θ0 = Ew∼Pθ0 (dw)[ θf (θ, w) + f (θ, w) θlnPθ(dw)] θ=θ0 1. Assuming w fully determines f : θR(θ) θ=θ0 = Ew∼Pθ0 (dw)[f (w) θlnPθ(dw)] θ=θ0 2. Assuming Pθ(dw) is independent of θ: θR(θ) = Ew∼P(dw)[ θf (θ, w)]
  • 12. Stochastic Gradient Estimation SF vs PD R(θ) = Ew∼Pθ(dw)[f (θ, w)] Score Function (SF) Estimator: θR(θ) θ=θ0 = Ew∼Pθ0 (dw)[f (w) θlnPθ(dw)] θ=θ0 Pathwise Derivative (PD) Estimator: θR(θ) = Ew∼P(dw)[ θf (θ, w)]
  • 13. Stochastic Gradient Estimation SF vs PD R(θ) = Ew∼Pθ(dw)[f (θ, w)] Score Function (SF) Estimator: θR(θ) θ=θ0 = Ew∼Pθ0 (dw)[f (w) θlnPθ(dw)] θ=θ0 Pathwise Derivative (PD) Estimator: θR(θ) = Ew∼P(dw)[ θf (θ, w)] SF allows discontinuous f . SF requires sample values f ; PD requires derivatives f . SF takes derivatives over the probability distribution of w; PD takes derivatives over the function f . SF has high variance on high-dimensional w. PD has high variance on rough f .
  • 14. Stochastic Gradient Estimation Choices for PD estimator blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
  • 15. Outline Stochastic Gradient Estimation Stochastic Computation Graphs Stochastic Computation Graphs in RL Other Examples of Stochastic Computation Graphs
  • 16. Gradient Estimation Using Stochastic Computation Graphs (NIPS 2015) John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
  • 17. Stochastic Computation Graphs SF estimator: θR(θ) θ=θ0 = Ex∼Pθ0 (dx)[f (x) θlnPθ(dx)] θ=θ0 PD estimator: θR(θ) = Ez∼P(dz)[ θf (x(θ, z))]
  • 18. Stochastic Computation Graphs Stochastic computation graphs are a design choice rather than an intrinsic property of a problem Stochastic nodes ’block’ gradient flow
  • 19. Stochastic Computation Graphs Neural Networks Neural Networks are a special case of stochastic computation graphs
  • 21. Stochastic Computation Graphs Deriving Unbiased Estimators Condition 1. (Differentiability Requirements) Given input node θ ∈ Θ, for all edges (v, w) which satisfy θ D v and θ D w, the following condition holds: if w is deterministic, ∂w ∂v exists, and if w is stochastic, ∂ ∂v p(w|PARENTSw ) exists. Theorem 1. Suppose θ ∈ Θ satisfies Condition 1. The following equations hold: ∂ ∂θ E c∈C c = E w∈S θ D w ∂ ∂θ logp(w|DEPSw ) ˆQw + c∈C θ D c ∂ ∂θ c(DEPSc) = E c∈C ˆc w∈S θ D w ∂ ∂θ logp(w|DEPSw ) + c∈C θ D c ∂ ∂θ c(DEPSc)
  • 22. Stochastic Computation Graphs Surrogate loss functions We have this equation from Theorem 1: ∂ ∂θ E c∈C c = E w∈S θ D w ∂ ∂θ logp(w|DEPSw ) ˆQw + c∈C θ D c ∂ ∂θ c(DEPSc) Note that by defining L(Θ, S) := w∈S logp(w|DEPSw ) ˆQw + c∈C c(DEPSc) we have ∂ ∂θ E c∈C c = E ∂ ∂θ L(Θ, S)
  • 24. Stochastic Computation Graphs Implementations Making stochastic computation graphs with neural network packages is easy tensorflow.org/api_docs/python/tf/contrib/bayesflow/stochastic_gradient_estimators github.com/pytorch/pytorch/blob/6a69f7007b37bb36d8a712cdf5cebe3ee0cc1cc8/torch/autograd/ variable.py github.com/blei-lab/edward/blob/master/edward/inferences/klqp.py
  • 25. Outline Stochastic Gradient Estimation Stochastic Computation Graphs Stochastic Computation Graphs in RL Other Examples of Stochastic Computation Graphs
  • 26. MDP The MDP formulation itself is a stochastic computation graph
  • 27. Policy Gradient Surrogate Loss 1 The policy gradient algorithm is Theorem 1 applied to the MDP graph 1 Richard S Sutton et al. “Policy Gradient Methods for Reinforcement Learning with Function Approximation”. In: NIPS (1999).
  • 28. Stochastic Value Gradient SVG(0) Learn Qφ, and update2 ∆θ ∝ θ T t=1 Qφ(st, πθ(st, t)) 2 Nicolas Heess et al. “Learning Continuous Control Policies by Stochastic Value Gradients”. In: NIPS (2015).
  • 29. Stochastic Value Gradient SVG(1) Learn Vφ and dynamics model f such that st+1 = f (st, at) + δt. Given transition (st, at, st+1), infer noise δt.3 ∆θ ∝ θ T t=1 rt + γVφ(f (st, at) + δt) 3 Nicolas Heess et al. “Learning Continuous Control Policies by Stochastic Value Gradients”. In: NIPS (2015).
  • 30. Stochastic Value Gradient SVG(∞) Learn f , infer all noise variables δt. Differentiate through whole graph.4 4 Nicolas Heess et al. “Learning Continuous Control Policies by Stochastic Value Gradients”. In: NIPS (2015).
  • 31. Deterministic Policy Gradient 5 5 David Silver et al. “Deterministic Policy Gradient Algorithms”. In: ICML (2014).
  • 32. Evolution Strategies 6 6 Tim Salimans et al. “Evolution Strategies as a Scalable Alternative to Reinforcement Learning”. In: arXiv (2017).
  • 33. Outline Stochastic Gradient Estimation Stochastic Computation Graphs Stochastic Computation Graphs in RL Other Examples of Stochastic Computation Graphs
  • 34. Neural Variational Inference 7 Where L(x, θ, φ) = Eh∼Q(h|x)[logPθ(x, h) − logQφ(h|x)] Score function estimator 7 Andriy Mnih and Karol Gregor. “Neural Variational Inference and Learning in Belief Networks”. In: ICML (2014).
  • 35. Variational Autoencoder 8 Pathwise derivative estimator applied to the exact same graph as NVIL 8 Diederik P Kingma and Max Welling. “Auto-Encoding Variational Bayes”. In: ICLR (2014).
  • 36. DRAW 9 Sequential version of VAE Same graph as SVG(∞) with (known) deterministic dynamics(e=state, z=action, L=reward) Similarly, VAE can be seen as SVG(∞) with 1 timestep and deterministic dynamics 9 Karol Gregor et al. “DRAW: A Recurrent Neural Network For Image Generation”. In: ICML (2015).
  • 37. GAN 10 Connection to SVG(0) has been established in [10] Actor has no access to state. Observation is a random choice between real or fake image. Reward is 1 if real, 0 if fake. D learns value function of observation. G’s learning signal is D. 10 David Pfau and Oriol Vinyals. “Connecting Generative Adversarial Networks and Actor-Critic Methods”. In: (2016).
  • 38. Bayes by backprop 11 and our cost function f is: f (D, θ) = Ew∼q(w|θ)[f (w, θ)] = Ew∼q(w|θ)[−logp(D|w) + logq(w|θ) − logp(w)] From the point of view of stochastic computation graphs, weights and activations are not different Estimate ELBO gradient of activation with PD estimator: VAE Estimate ELBO gradient of weight with PD estimator: Bayes by Backprop 11 Charles Blundell et al. “Weight Uncertainty in Neural Networks”. In: ICML (2015).
  • 39. Summary Stochastic computation graphs explain many different algorithms with one gradient estimator(Theorem 1) Stochastic computation graphs give us insight into the behavior of estimators Stochastic computation graphs reveal equivalencies: NVIL[7]=Policy gradient[12] VAE[6]/DRAW[4]=SVG(∞)[5] GAN[3]=SVG(0)[5]
  • 40. References I [1] Charles Blundell et al. “Weight Uncertainty in Neural Networks”. In: ICML (2015). [2] Michael Fu. “Stochastic Gradient Estimation”. In: (2005). [3] Ian J. Goodfellow et al. “Generative Adversarial Networks”. In: NIPS (2014). [4] Karol Gregor et al. “DRAW: A Recurrent Neural Network For Image Generation”. In: ICML (2015). [5] Nicolas Heess et al. “Learning Continuous Control Policies by Stochastic Value Gradients”. In: NIPS (2015). [6] Diederik P Kingma and Max Welling. “Auto-Encoding Variational Bayes”. In: ICLR (2014). [7] Andriy Mnih and Karol Gregor. “Neural Variational Inference and Learning in Belief Networks”. In: ICML (2014).
  • 41. References II [8] David Pfau and Oriol Vinyals. “Connecting Generative Adversarial Networks and Actor-Critic Methods”. In: (2016). [9] Tim Salimans et al. “Evolution Strategies as a Scalable Alternative to Reinforcement Learning”. In: arXiv (2017). [10] John Schulman et al. “Gradient Estimation Using Stochastic Computation Graphs”. In: NIPS (2015). [11] David Silver et al. “Deterministic Policy Gradient Algorithms”. In: ICML (2014). [12] Richard S Sutton et al. “Policy Gradient Methods for Reinforcement Learning with Function Approximation”. In: NIPS (1999).