Introduction to Bayesian modelling and inference with Pyro for a meetup group. Part of the presentation is hands-on, with some examples available here: https://github.com/ahmadsalim/2019-meetup-pyro-intro
13. Limitations
● Only a single point estimate θ′ is given as output
● Not possible to incorporate domain knowledge
● Deep neural networks are not human-interpretable and require lots of data
17. Probabilistic Programming Language Frameworks
● HackPPL - Facebook
● Pyro - Uber and Linux Foundation
● TensorFlow Probability - Google
● Infer.NET - Microsoft
● PyMC3 - NumFOCUS
26. Variational Inference
● Posit a guide q(θ;λ) to approximate the posterior p(θ|d)
● Find the optimal λ′ so that q(θ;λ′) is a good fit to p(θ|d) (see the sketch below)
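A minimal sketch of what this looks like in Pyro, assuming a hypothetical coin-flip model (the model, data, and names are illustrative, not from the talk): the model defines the joint p(d,θ) and the guide defines q(θ;λ) with learnable variational parameters λ = (α, β).

import torch
import pyro
import pyro.distributions as dist

data = torch.tensor([1., 1., 0., 1., 0., 1.])  # hypothetical observations d

def model(d):
    # Prior over the latent parameter θ
    theta = pyro.sample("theta", dist.Beta(1., 1.))
    # Likelihood p(d|θ): one Bernoulli draw per observation
    with pyro.plate("data", len(d)):
        pyro.sample("obs", dist.Bernoulli(theta), obs=d)

def guide(d):
    # Variational parameters λ = (α, β), constrained positive
    alpha = pyro.param("alpha", torch.tensor(1.),
                       constraint=dist.constraints.positive)
    beta = pyro.param("beta", torch.tensor(1.),
                      constraint=dist.constraints.positive)
    # q(θ;λ): a Beta approximation to the posterior p(θ|d)
    pyro.sample("theta", dist.Beta(alpha, beta))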
27. Stochastic Variational Inference (Hoffman et al. 2013)
● Minimize the exclusive KL divergence by maximizing the ELBO:
KL(q(θ;λ) || p(θ|d)) = log p(d) − ELBO
● The ELBO can be estimated by stochastic sampling of the parameters:
ELBO = E_q[log p(d,θ) − log q(θ;λ)]
● Optimization works well if the gradient estimates have low variance
○ Relies on tricks like reparametrization, exploiting dependency structure, and Rao-Blackwellization (see the SVI sketch below)
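A sketch of running SVI on the model/guide pair from the previous slide (the optimizer and hyperparameters are illustrative choices, not prescribed by the talk). Trace_ELBO builds the stochastic ELBO estimate; since Pyro's Beta distribution supports reparameterized sampling, the gradient estimates have low variance.

import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

pyro.clear_param_store()
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(2000):
    svi.step(data)  # one stochastic gradient step on -ELBO

print(pyro.param("alpha"), pyro.param("beta"))  # the fitted λ′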
28. Amortized Inference (Ritchie et al. 2016)
● Assume that the guide factorizes into global and local parameters:
q(θ;λ) = q(θ_G;λ_G) ∏_i q(θ_i;λ_i, λ_G)
● Learn a function f that maps data to the local parameters:
q(θ|d;λ) = q(θ_G;λ_G) ∏_i q(θ_i;f(d_i), λ_G)
● The function f can be a deep neural network! (see the sketch below)
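A minimal sketch of an amortized guide in Pyro (the encoder architecture and names are illustrative assumptions; a matching model with one local latent θ_i per data point d_i is presumed). The neural network plays the role of f, mapping each d_i to the parameters of q(θ_i;f(d_i)).

import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist

# f: maps a data point d_i to (loc, log_scale) of q(θ_i; f(d_i))
encoder = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 2))

def amortized_guide(d):
    pyro.module("encoder", encoder)  # register f's weights with Pyro
    with pyro.plate("data", len(d)):
        loc, log_scale = encoder(d.unsqueeze(-1)).unbind(-1)
        # Local latent θ_i drawn from q(θ_i; f(d_i))
        pyro.sample("theta_local", dist.Normal(loc, log_scale.exp()))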
29. Reflections on Variational Inference
Pros
● Works with expressive probabilistic programming models
● Quantifies uncertainty
● Scalable to large datasets
Cons
● Discrete variables can introduce high variance
● Can be over-confident in predictions
● No guarantees on approximation correctness
30. Sampling
● Estimate the posterior p(θ|d) by drawing many high-probability samples of the parameters θ_1, …, θ_n
● Samples can be proposed from a custom distribution q(θ) and accepted when p(θ_i|d) is sufficiently high (see the sketch below)
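This propose/accept scheme is essentially Metropolis-Hastings. A minimal random-walk sketch (assumed, not from the talk; log_joint is a hypothetical stand-in for the unnormalized log posterior log p(d,θ)):

import math
import random

def log_joint(theta):
    # Hypothetical unnormalized log posterior: a standard normal
    return -0.5 * theta ** 2

def metropolis(n_samples, step=0.5):
    theta, samples = 0.0, []
    for _ in range(n_samples):
        proposal = theta + random.gauss(0.0, step)  # propose from q(θ)
        # Accept with probability min(1, p(proposal|d) / p(theta|d))
        if math.log(random.random()) < log_joint(proposal) - log_joint(theta):
            theta = proposal
        samples.append(theta)
    return samples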
31. Hamiltonian Monte Carlo (Duane et al. 1987)
● Markov chain Monte Carlo-based
● Relies on gradient information
● The No-U-Turn Sampler (NUTS) automates tuning of the step size and trajectory length (Hoffman and Gelman 2011)
[Diagram: successive samples θ_0 → θ_1 → θ_2 → θ_3; animation: https://chi-feng.github.io/mcmc-demo/]
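A sketch of running NUTS in Pyro on the coin-flip model from the earlier sketch (the sample counts are illustrative); the step size is adapted automatically during warmup:

from pyro.infer import MCMC, NUTS

nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=1000, warmup_steps=200)
mcmc.run(data)
posterior_theta = mcmc.get_samples()["theta"]  # samples θ_1, …, θ_n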
32. Reflections on Sampling
Pros
● Most flexible technique for inference
● Captures precise correlations between parameters
● Provides a precise characterization of the true posterior with enough samples
Cons
● Not efficiently scalable to large datasets
● Sensitive to the landscape of the true posterior distribution
● Discrete latent variables can pose a problem for gradient-based samplers
34. Probabilistic Programming
● Probabilistic programming is important because it allows incorporating domain knowledge and quantifying uncertainty
● Pyro allows specifying both classical and deep probabilistic models
● Modern inference algorithms are very powerful, and easily accessible in Pyro