2. Frequentist vs Bayesian
In a debugging problem, a frequentist function given the argument 'My code
passed all X tests; is my code bug free?' would return YES
A Bayesian function given the argument 'Often my code has bugs. My code
passed all X tests; is my code bug free?' would return YES with prob 0.8
and NO with prob 0.2
The additional argument in the Bayesian version – 'Often my code has bugs' –
is called the prior
The prior is our belief about the situation
3. Why probabilistic
Number of instances as evidence – N
As N → ∞, Bayesian results (often) align with frequentist results
For small N, inference is unstable: frequentist estimates have higher
variance and wider confidence intervals, and Bayesian inference excels
For large N, to quote Andrew Gelman – N can never be large enough
Bayes' formula is fundamental
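The debugging example above is just Bayes' formula at work. A minimal sketch in plain Python; the prior and likelihood numbers are assumed for illustration (only the 0.8/0.2 conclusion appears in the slides):

```python
# Bayes' rule: posterior = likelihood * prior / evidence.
# All input probabilities below are illustrative assumptions.
p_bug = 0.5                 # prior: "often my code has bugs" (assumed)
p_pass_given_bug = 0.25     # buggy code can still pass all X tests (assumed)
p_pass_given_ok = 1.0       # bug-free code always passes

# Evidence: total probability of passing all tests
p_pass = p_pass_given_ok * (1 - p_bug) + p_pass_given_bug * p_bug

# Posterior probability the code is bug free given it passed
p_ok_given_pass = p_pass_given_ok * (1 - p_bug) / p_pass
print(round(p_ok_given_pass, 2))  # 0.8 -> "YES with prob 0.8, NO with prob 0.2"
```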
6. Metropolis Rule
[Figure: a jump from (θ0, P0) towards (θ1, P1) in a high-P region]
• If P1 > P0, accept the jump
• If P1 < P0, make the jump with probability P1/P0
• Successful jumps form a chain called a Markov chain
• The algorithm is endless – it orbits the true solution but never stops at it
• How to make jumps? θ1 = θ0 + N(0, Δθ)
Metropolis–Hastings – jump with probability min(q, 1), where
q = [P(θ1) / J(θ1 | θ0)] / [P(θ0) / J(θ0 | θ1)]
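The rule above can be sketched as a minimal random-walk Metropolis sampler in plain Python; the standard-normal target, the seed, and the step size are illustrative assumptions, not from the slides:

```python
import random, math

def metropolis(log_p, theta0, step, n_samples):
    """Random-walk Metropolis: propose theta1 = theta0 + N(0, step),
    accept with probability min(1, P(theta1)/P(theta0))."""
    theta, chain = theta0, []
    for _ in range(n_samples):
        proposal = theta + random.gauss(0.0, step)
        # log acceptance ratio log(P1/P0); symmetric jumps cancel the J terms
        if math.log(random.random()) < log_p(proposal) - log_p(theta):
            theta = proposal          # accept the jump
        chain.append(theta)           # the chain of states is a Markov chain
    return chain

# Target: standard normal (an unnormalised log-density is enough)
random.seed(0)
chain = metropolis(lambda t: -0.5 * t * t, theta0=0.0, step=1.0, n_samples=20000)
mean = sum(chain) / len(chain)
print(round(mean, 1))  # close to 0, the mean of the target
```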
7. Gibbs Sampling
Assume θ1, θ2, θ3 are the parameters of the posterior
• Define P(θ1, θ2, θ3)
• Sample θ1^0, θ2^0, θ3^0 from the prior
• For t in 1:T
  • θ1^t ~ P(θ1 | θ2^(t-1), θ3^(t-1))
  • θ2^t ~ P(θ2 | θ1^t, θ3^(t-1))
  • θ3^t ~ P(θ3 | θ1^t, θ2^t)
• We need to know these conditional probability distributions
• Gibbs sampling is thus a special case of the Metropolis rule
• Successful jumps form a chain called a Markov chain
• The algorithm is endless – it orbits the true solution but never stops at it
• Practically not always feasible
It is mathematically proven that this algorithm asymptotically converges to the solution
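The loop above can be sketched for a two-parameter case. A minimal plain-Python example, assuming a bivariate Gaussian target with correlation rho, chosen because its full conditionals are known Gaussians (the target and seed are my assumptions):

```python
import random

# Gibbs sampling for a 2-D standard Gaussian with correlation rho:
# each full conditional is itself Gaussian, so we can sample it directly.
rho = 0.8
random.seed(1)
x, y = 0.0, 0.0           # theta^0 (here just a fixed start point)
xs, ys = [], []
for t in range(20000):
    # x^t ~ P(x | y^(t-1)) = N(rho*y, 1 - rho^2)
    x = random.gauss(rho * y, (1 - rho ** 2) ** 0.5)
    # y^t ~ P(y | x^t) = N(rho*x, 1 - rho^2)
    y = random.gauss(rho * x, (1 - rho ** 2) ** 0.5)
    xs.append(x)
    ys.append(y)

# Sample estimate of E[xy]; means are ~0 and variances ~1, so this is ~rho
n = len(xs)
corr = sum(a * b for a, b in zip(xs, ys)) / n
print(round(corr, 1))  # close to rho = 0.8
```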
8. Variational Inference
Information Theory
• Information = - log(p(x))
• Entropy = - ∑ p(x)log(p(x))
• Differential Entropy = - ∫ p(x)log(p(x))dx
KL Divergence
• Measures the "distance" between two probability distributions (not a true metric)
• KL(p||q) = [− ∑ p(x) log q(x)] − [− ∑ p(x) log p(x)] = − ∑ p(x) log(q(x)/p(x))
• KL ≥ 0
• KL(p||q) ≠ KL(q||p)
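The two properties are easy to verify numerically. A minimal sketch for discrete distributions (the example distributions p and q are my choice):

```python
import math

def kl(p, q):
    """KL(p||q) = -sum p(x) log(q(x)/p(x)) for discrete distributions."""
    return -sum(pi * math.log(qi / pi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]   # fair coin
q = [0.9, 0.1]   # heavily biased coin
print(round(kl(p, q), 3), round(kl(q, p), 3))  # both >= 0, and not equal
```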
Using KL Divergence
• We have p(x, z) but we want to know p(z|x) → call it p′
• So we create an approximating distribution q(z)
• KL(q||p′) + L = log p(x)
• L (the evidence lower bound) is a function of p(x, z) and q(z)
• Minimizing the KL is the same as maximizing L
• After some neat math, q comes out in the exponential family → neat and
convenient
VI vs MCMC
• VI is deterministic and is an approximation
• MCMC is a sampling-based solution; with finitely many samples it is also an
approximation, but it is asymptotically exact
• Generally, MCMC solutions are considered more accurate
9. Coin Toss
You toss a coin 100 times and see 60 heads. Is it a fair coin?
>>> x_train
array([1, 1, 0, 1, 0, 1, 1, ……., 0, 1], dtype=int32)
>>> sum(x_train == 0)
40
>>> sum(x_train == 1)
60
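With a Uniform(0, 1) prior on the fairness, this question has a closed-form answer, since Uniform(0, 1) is Beta(1, 1) and the Beta prior is conjugate to the Bernoulli. A minimal sketch in plain Python; the Monte Carlo check of P(pheads > 0.5) is my addition, not from the slides:

```python
import random

# Conjugate analysis: Uniform(0,1) prior = Beta(1,1); after 60 heads and
# 40 tails the posterior on pheads is Beta(1 + 60, 1 + 40) = Beta(61, 41).
heads, tails = 60, 40
a, b = 1 + heads, 1 + tails

post_mean = a / (a + b)
print(round(post_mean, 3))  # 0.598: the coin leans towards heads

# Monte Carlo estimate of the posterior probability that pheads > 0.5
random.seed(0)
draws = [random.betavariate(a, b) for _ in range(50000)]
p_biased = sum(d > 0.5 for d in draws) / len(draws)
print(round(p_biased, 2))  # roughly 0.98: a fair coin looks unlikely
```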
10. How do we build a probabilistic model?
Posit a generative model
Start with a simple story about how the data is generated
What probability distribution could explain coin tosses like the ones
observed? – Forward Thinking
Infer Model Parameters
Infer the specifics of the story based on observations
Given the model and the data, how likely is it that the coin has a particular fairness? –
Backward Thinking
Criticize the model
Can the simple story explain the observations? Can we improve the story?
11. Model Building
Process of stating our beliefs about how the data could have been
generated
Models are simplified descriptions of the data
Models can be declared as abstract mathematical descriptions or as code
Models allow for simulation of data
12. Generative Model for a coin toss
Model expressed in terms of probability distribution
P(params, data) = p(params) x p(data | params)
params: fairness of the coin
data: coin tosses
p(params): prior probability of a certain fairness
p(data|params): conditional probability that the data is observed given that the coin
has a certain fairness
p(params, data): joint probability that both the data is observed and the coin has a
certain fairness
13. Generative Model for coin toss
Our story in Edward:
>>> pheads = Uniform(low=0., high=1.)
>>> c = Bernoulli(probs=pheads, sample_shape=100)
Uniform: prior
Bernoulli: conditional
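The same story can be simulated without a PPL, which makes the factorization p(params) × p(data|params) concrete. A minimal plain-Python sketch (the seed and variable names are mine):

```python
import random

# Forward simulation of the generative story:
random.seed(0)
pheads = random.uniform(0.0, 1.0)   # draw fairness from the prior p(params)
# 100 Bernoulli tosses conditional on that fairness: p(data|params)
tosses = [int(random.random() < pheads) for _ in range(100)]
print(pheads, sum(tosses))  # the simulated fairness and the number of heads
```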
14. Real Life Scenarios
We take an open dataset on the climate of major cities across the world from
Kaggle. Click here for the dataset
Using Bayesian inference, we try to answer the following questions:
Do the major cities show temperature variability above a threshold in their
weather patterns?
What is the probability that a randomly chosen year has a temperature
increase greater than a threshold?
PPL framework – PyMC3
Solution available in a Jupyter Notebook – Click here for the Notebook
The Notebook will continue to be edited
15. Real Life Scenarios (Contd.)
We take an open dataset on district-wise education metrics in India from
Kaggle. Click here for the dataset
Using Bayesian ML, we perform regression and classification on the above
problem
PPL frameworks – PyMC3 and Edward
Solutions available in Jupyter Notebooks – Click here and here for the
Notebooks
The Notebooks will continue to be edited
19. Probabilistic – Pros / Criticism
Pros:
• Can work with small / medium data
• Research on black-box interpretability
• Nuanced risk functions – good inference and decision theory, not just prediction
Criticism:
• No free lunch
• Bad throughput
• Skill sets required – statistical analysis, ML/DL, advanced probability and statistics
20. Closing tips for theoretical and practical starting points
• Best is to build models from scratch
• Or use the weights of an existing model as priors and continue
• Hypothesis testing in a probabilistic way
• Outlier analysis
• Comparison of a traditional vs a probabilistic NN
• PyMC3 and Edward
• Stan is getting popular
• Edward 2.0 will be compatible with the latest TF
• PyMC4 will be built on TF
• Learn MCMC
• MCMC algorithms like:
  • Gibbs Sampling
  • Metropolis–Hastings
• Variational Inference
• How to create priors
• Model evaluation