Introduction to Bayesian modelling and inference with Pyro for a meetup group. Part of the presentation is hands-on, with some examples available here: https://github.com/ahmadsalim/2019-meetup-pyro-intro
13. Limitations
● Only a single point estimate θ′ is given as output
● Not possible to incorporate domain knowledge
● Deep neural networks are not human-interpretable and require lots of data
17. Probabilistic Programming Language Frameworks
● HackPPL - Facebook
● Pyro - Uber and Linux Foundation
● TensorFlow Probability - Google
● Infer.NET - Microsoft
● PyMC3 - NumFOCUS
26. Variational Inference
● Posit a guide q(θ;λ) to approximate the posterior p(θ|d)
● Find the optimal λ′ so that q(θ;λ′) is a good fit to p(θ|d) (see the sketch below)
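A minimal sketch of what this looks like in Pyro, assuming a hypothetical coin-flip model (the model, data, and names are illustrative, not from the talk): the model defines the joint p(d,θ) and the guide defines q(θ;λ) with learnable variational parameters λ = (α, β).

import torch
import pyro
import pyro.distributions as dist

data = torch.tensor([1., 1., 0., 1., 0., 1.])  # hypothetical observations d

def model(d):
    # Prior over the latent parameter θ
    theta = pyro.sample("theta", dist.Beta(1., 1.))
    # Likelihood p(d|θ): one Bernoulli draw per observation
    with pyro.plate("data", len(d)):
        pyro.sample("obs", dist.Bernoulli(theta), obs=d)

def guide(d):
    # Variational parameters λ = (α, β), constrained positive
    alpha = pyro.param("alpha", torch.tensor(1.),
                       constraint=dist.constraints.positive)
    beta = pyro.param("beta", torch.tensor(1.),
                      constraint=dist.constraints.positive)
    # q(θ;λ): a Beta approximation to the posterior p(θ|d)
    pyro.sample("theta", dist.Beta(alpha, beta))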
27. Stochastic Variational Inference (Hoffman et al. 2013)
● Minimize the exclusive KL divergence by maximizing the ELBO:
KL(q(θ;λ) || p(θ|d)) = log p(d) − ELBO
● The ELBO can be estimated by stochastic sampling of the parameters:
ELBO = E_q[log p(d,θ) − log q(θ;λ)]
● Optimization works well if the gradient estimates have low variance
○ Relies on tricks like reparametrization, exploiting dependency structure, and Rao-Blackwellization (see the SVI sketch below)
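A sketch of running SVI on the model/guide pair from the previous slide (the optimizer and hyperparameters are illustrative choices, not prescribed by the talk). Trace_ELBO builds the stochastic ELBO estimate; since Pyro's Beta distribution supports reparameterized sampling, the gradient estimates have low variance.

import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

pyro.clear_param_store()
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(2000):
    svi.step(data)  # one stochastic gradient step on -ELBO

print(pyro.param("alpha"), pyro.param("beta"))  # the fitted λ′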
28. Amortized Inference (Ritchie et al. 2016)
● Assume that the guide factorizes into global and local parameters:
q(θ;λ) = q(θ_G;λ_G) ∏_i q(θ_i;λ_i, λ_G)
● Learn a function f that maps data to the local parameters:
q(θ|d;λ) = q(θ_G;λ_G) ∏_i q(θ_i;f(d_i), λ_G)
● The function f can be a deep neural network! (see the sketch below)
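A minimal sketch of an amortized guide in Pyro (the encoder architecture and names are illustrative assumptions; a matching model with one local latent θ_i per data point d_i is presumed). The neural network plays the role of f, mapping each d_i to the parameters of q(θ_i;f(d_i)).

import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist

# f: maps a data point d_i to (loc, log_scale) of q(θ_i; f(d_i))
encoder = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 2))

def amortized_guide(d):
    pyro.module("encoder", encoder)  # register f's weights with Pyro
    with pyro.plate("data", len(d)):
        loc, log_scale = encoder(d.unsqueeze(-1)).unbind(-1)
        # Local latent θ_i drawn from q(θ_i; f(d_i))
        pyro.sample("theta_local", dist.Normal(loc, log_scale.exp()))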
29. Reflections on Variational Inference
Pros
● Works with expressive probabilistic programming models
● Quantifies uncertainty
● Scalable to large datasets
Cons
● Discrete variables can introduce high variance
● Can be over-confident in predictions
● No guarantees on approximation correctness
30. Sampling
● Estimate the posterior p(θ|d) by drawing many high-probability samples of the parameters θ_1, …, θ_n
● Samples can be proposed from a custom distribution q(θ) and accepted when p(θ_i|d) is sufficiently high (see the sketch below)
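This propose/accept scheme is essentially Metropolis-Hastings. A minimal random-walk sketch (assumed, not from the talk; log_joint is a hypothetical stand-in for the unnormalized log posterior log p(d,θ)):

import math
import random

def log_joint(theta):
    # Hypothetical unnormalized log posterior: a standard normal
    return -0.5 * theta ** 2

def metropolis(n_samples, step=0.5):
    theta, samples = 0.0, []
    for _ in range(n_samples):
        proposal = theta + random.gauss(0.0, step)  # propose from q(θ)
        # Accept with probability min(1, p(proposal|d) / p(theta|d))
        if math.log(random.random()) < log_joint(proposal) - log_joint(theta):
            theta = proposal
        samples.append(theta)
    return samples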
31. Hamiltonian Monte Carlo (Duane et al. 1987)
● Markov chain Monte Carlo-based
● Relies on gradient information
● The No-U-Turn Sampler (NUTS) automates tuning of the step size and trajectory length (Hoffman and Gelman 2011)
[Diagram: successive samples θ_0 → θ_1 → θ_2 → θ_3; animation: https://chi-feng.github.io/mcmc-demo/]
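A sketch of running NUTS in Pyro on the coin-flip model from the earlier sketch (the sample counts are illustrative); the step size is adapted automatically during warmup:

from pyro.infer import MCMC, NUTS

nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=1000, warmup_steps=200)
mcmc.run(data)
posterior_theta = mcmc.get_samples()["theta"]  # samples θ_1, …, θ_n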
32. Reflections on Sampling
Pros
● Most flexible technique for inference
● Captures precise correlations between parameters
● Provides a precise characterization of the true posterior with enough samples
Cons
● Not efficiently scalable to large datasets
● Sensitive to the landscape of the true posterior distribution
● Discrete latent variables can pose a problem for gradient-based samplers
34. Probabilistic Programming
● Probabilistic programming is important because it allows incorporating domain knowledge and quantifying uncertainty
● Pyro allows specifying both classical and deep probabilistic models
● Modern inference algorithms are very powerful, and easily accessible in Pyro