This document introduces probabilistic programming and model-based machine learning. It provides an example case study of using these approaches to analyze the daily travel times of a bicyclist. The key steps are: (1) defining the variables and their relationships using a probabilistic graphical model, (2) specifying the joint distribution as the product of priors and likelihoods, (3) choosing appropriate distributions for the average travel time and uncertainty, and (4) using conjugate priors and normal likelihoods for inference. The goal is to learn the distributions of average travel time and day-to-day variability from real travel time data.
Introduction to Model-Based Machine Learning for Transportation
1. Introduction Probabilistic Programming Case Study
Introduction to Model-Based Machine Learning
A Webinar to TRB ADB40 Big Data Initiative
by
Daniel Emaasit
Ph.D. Student
Department of Civil and Environmental Engineering
University of Nevada, Las Vegas, USA
emaasit@unlv.nevada.edu
September 27, 2016
Acknowledgments
Prof. Francisco C. Pereira, Dr. Filipe Rodrigues
Machine Learning for Mobility group, DTU: Tutorial from Summer school on Big Data, Mobility Patterns, Transport Analytics, July 1-3, 2016, Filipe Rodrigues and Francisco Pereira
Current Challenges in Adopting Machine Learning
Generally, current challenges in adopting ML:
Overwhelming number of traditional ML methods to learn
Deciding which algorithm to use or why
Some custom problems may not fit with any existing
algorithm
What is Model-Based Machine Learning?
A different viewpoint for machine learning proposed by Bishop (2013) and Winn et al. (2015)
Goal: provide a single development framework which supports the creation of a wide range of bespoke models
The core idea: all assumptions about the problem domain are made explicit in the form of a model
Bishop, C. M. (2013). Model-Based Machine Learning. Philosophical Transactions of the Royal Society A, 371, pp 1–17
Winn, J., Bishop, C. M., Diethe, T. (2015). Model-Based Machine Learning. Microsoft Research Cambridge. http://www.mbmlbook.com.
What is a Model in MBML?
A Model:
is a set of assumptions, expressed in mathematical/graphical form
expresses all parameters and variables as random variables
shows the dependencies between variables
Figure 2: Description of a model
Key Ideas of MBML
MBML is built upon 3 key ideas
the use of Probabilistic Graphical Models (PGM)
the adoption of Bayesian ML
the application of fast, deterministic inference algorithms
Key Idea 1: Probabilistic Graphical Models
Combine probability theory with graphs (e.g. factor graphs)
Key Idea 2: Bayesian Machine Learning
Everything follows from two simple rules of probability
theory
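The slide does not spell the two rules out; they are presumably the sum rule and the product rule, as in Bishop (2013). Written in LaTeX for two generic random variables x and y (symbols chosen here only for illustration), with Bayes' theorem as the consequence that drives inference:

```latex
\begin{align}
p(x) &= \sum_{y} p(x, y) && \text{(sum rule)} \\
p(x, y) &= p(y \mid x)\, p(x) && \text{(product rule)} \\
p(y \mid x) &= \frac{p(x \mid y)\, p(y)}{p(x)} && \text{(Bayes' theorem, from the two rules)}
\end{align}
```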
Key Idea 3: Inference Algorithms
the application of fast, approximate inference algorithms by
local message passing
Variational Bayes
Belief Propagation, Loopy Belief Propagation
Expectation Propagation
Learning by local message passing
Inference algorithms
Figure 3: MCMC vs Approximate methods
Stages of MBML
3 stages of MBML
Build the model: joint probability distribution of all the relevant variables (e.g. as a graph)
Incorporate the observed data
Perform inference to learn parameters of the latent
variables
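The three stages can be sketched on a toy discrete model, small enough that inference is exact enumeration. Everything in this sketch is an illustrative assumption and not part of the webinar: the latent variable `traffic`, the observed variable `late`, and all probabilities are hypothetical.

```python
# Stage 1: build the model as a joint distribution p(traffic, late).
# All numbers are hypothetical, chosen only for illustration.
prior_traffic = {"light": 0.7, "heavy": 0.3}          # p(traffic)
p_late_given_traffic = {"light": 0.1, "heavy": 0.6}   # p(late=True | traffic)


def joint(traffic, late):
    """p(traffic, late) = p(traffic) * p(late | traffic)  (product rule)."""
    p_late = p_late_given_traffic[traffic]
    return prior_traffic[traffic] * (p_late if late else 1.0 - p_late)


# Stage 2: incorporate the observed data (the commuter arrived late).
observed_late = True

# Stage 3: perform inference on the latent variable by enumeration.
# The evidence p(late) comes from the sum rule over the latent variable.
evidence = sum(joint(t, observed_late) for t in prior_traffic)
posterior_traffic = {
    t: joint(t, observed_late) / evidence for t in prior_traffic
}

print(posterior_traffic)
```

Enumeration only works for tiny discrete models; for continuous or larger models the third stage is where approximate inference algorithms come in.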
Benefits of MBML
Potential benefits of this approach
Provides a systematic process of creating ML solutions
Allows for incorporation of prior knowledge
Allows for handling uncertainty in a principled manner
Is less prone to overfitting, since Bayesian inference averages over parameter uncertainty
Custom solutions are built for specific problems
Allows for quick building of several alternative models
Easy to compare those alternatives
It’s general purpose: No need to learn the 1000s of existing
ML algorithms
Separates model from inference/training code
What is Probabilistic Programming?
A software package that takes the model and then automatically generates inference routines (even source code!) to solve a wide variety of models
Takes programming languages and adds support for:
random variables
constraints on variables
inference
Examples of PP software packages
Infer.NET (C#, C++)
Stan (R, Python, C++)
BUGS
Church
PyMC (Python)
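To make the three ingredients concrete without relying on any of the packages above, here is a toy sketch in plain Python. It is not the API of Infer.NET, Stan, BUGS, Church or PyMC: the function names `model`, `constraint` and `infer` are hypothetical, and inference is done by simple rejection sampling, which real packages replace with far more efficient algorithms.

```python
import random


def model(rng):
    """Generative program: two fair coin flips (the random variables)."""
    first = rng.random() < 0.5
    second = rng.random() < 0.5
    return first, second


def constraint(sample):
    """Condition on an observation: at least one flip came up heads."""
    first, second = sample
    return first or second


def infer(model, constraint, query, num_samples=100_000, seed=0):
    """Rejection sampling: run the program, keep runs satisfying the constraint."""
    rng = random.Random(seed)
    accepted = [
        s for s in (model(rng) for _ in range(num_samples)) if constraint(s)
    ]
    # Fraction of accepted runs for which the query holds.
    return sum(query(s) for s in accepted) / len(accepted)


# Query: probability that both flips are heads, given at least one is heads.
p_both_heads = infer(model, constraint, query=lambda s: s[0] and s[1])
print(round(p_both_heads, 2))
```

The exact answer for this query is 1/3, so the estimate should land close to that value. The point of the sketch is the separation of concerns: the model and the constraint say nothing about how inference is carried out.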
A Bicyclist’s Daily Travel
Analysing the distribution of an individual cyclist’s daily
travel time to work
Identify the variables of interest
tt_n - travel time on the nth day
at - average travel time
tu - uncertainty
[Figure: graphical model with nodes tt_n, at and tu, and plate label N]
Specify relationships between variables
Joint distribution is given by
$$p(tt, at, tu) = \underbrace{p(at)\, p(tu)}_{\text{priors}} \times \underbrace{\prod_{n=1}^{N} p(tt_n \mid at, tu)}_{\text{likelihood}}$$
How should we define the likelihood p(tt_n | at, tu)?
the distribution’s mean is the cyclist’s average travel time
the distribution’s variance determines how much the travel
time varies from day to day (e.g. variations in traffic
conditions)
What distributions should p(at) and p(tu) have?
conjugate priors!
Likelihood given by
$$p(tt_n \mid at, tu) = \mathcal{N}(tt_n \mid at, tu)$$
We now know what distribution forms to assign to the
priors...
$$p(at) = \mathcal{N}(at \mid \mu, \sigma^2)$$
$$p(tu) = \mathrm{Cauchy}(tu \mid \mu, \sigma^2)$$
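A minimal sketch of how the joint distribution, the Normal likelihood and the two priors could be evaluated in plain Python. Several things here are assumptions, not taken from the slides: tu is treated as a standard deviation, the Cauchy prior is restricted to positive values (a half-Cauchy centred at zero), the hyperparameter values are hypothetical, and the travel times are made up. Because a Cauchy prior does not give a closed-form posterior, the sketch approximates the posterior on a grid.

```python
import math

# Hypothetical travel times in minutes (made up for illustration).
travel_times = [13.0, 17.0, 16.0, 12.0, 13.0, 12.0, 14.0, 18.0, 16.0, 16.0]

# Hypothetical hyperparameters (not from the slides).
AT_PRIOR_MEAN, AT_PRIOR_SD = 15.0, 10.0   # Normal prior on at
TU_PRIOR_SCALE = 5.0                      # half-Cauchy prior on tu


def normal_logpdf(x, mean, sd):
    """Log density of a Normal with the given mean and standard deviation."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sd)
            - 0.5 * ((x - mean) / sd) ** 2)


def half_cauchy_logpdf(x, scale):
    """Log density of a Cauchy centred at 0, restricted to x > 0 (assumption)."""
    return math.log(2.0 / (math.pi * scale)) - math.log1p((x / scale) ** 2)


def log_joint(tt, at, tu):
    """log p(tt, at, tu) = log p(at) + log p(tu) + sum_n log p(tt_n | at, tu)."""
    log_priors = (normal_logpdf(at, AT_PRIOR_MEAN, AT_PRIOR_SD)
                  + half_cauchy_logpdf(tu, TU_PRIOR_SCALE))
    log_likelihood = sum(normal_logpdf(t, at, tu) for t in tt)
    return log_priors + log_likelihood


# Grid approximation of the posterior p(at, tu | tt).
at_grid = [5.0 + 0.1 * i for i in range(201)]   # 5.0 .. 25.0 minutes
tu_grid = [0.1 + 0.1 * j for j in range(100)]   # 0.1 .. 10.0 minutes
log_post = {(at, tu): log_joint(travel_times, at, tu)
            for at in at_grid for tu in tu_grid}

# Subtract the maximum before exponentiating to avoid numerical underflow.
max_log = max(log_post.values())
weights = {k: math.exp(v - max_log) for k, v in log_post.items()}
total = sum(weights.values())

post_mean_at = sum(at * w for (at, _), w in weights.items()) / total
post_mean_tu = sum(tu * w for (_, tu), w in weights.items()) / total
print(f"posterior mean of at: {post_mean_at:.2f}, of tu: {post_mean_tu:.2f}")
```

A grid is only practical for a model with two unknowns like this one; a probabilistic programming package would generate the inference routine from the model description instead.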
The choice of the initial parameters of the prior is significant
only if you have a small number of observations
As the number of observations increases, the influence of the
initial prior on the posterior declines
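The declining influence of the prior can be checked numerically. The sketch below makes a simplifying assumption that is not in the slides: the day-to-day standard deviation is treated as known, so the Normal prior on at is conjugate and the posterior mean has a closed form. The travel times and both priors are hypothetical.

```python
def posterior_mean_at(prior_mean, prior_sd, data, noise_sd):
    """Posterior mean of at for a Normal prior and a Normal likelihood
    whose standard deviation noise_sd is assumed known."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / noise_sd ** 2
    sample_mean = sum(data) / len(data)
    return ((prior_precision * prior_mean + data_precision * sample_mean)
            / (prior_precision + data_precision))


# Hypothetical travel times in minutes, repeated to mimic a growing data set.
base = [13.0, 17.0, 16.0, 12.0, 13.0, 12.0, 14.0, 18.0, 16.0, 16.0]
NOISE_SD = 2.0   # assumed known day-to-day standard deviation

gaps = []
for n in (1, 10, 100):
    data = (base * 10)[:n]
    # Two deliberately different priors on the average travel time.
    optimistic = posterior_mean_at(10.0, 5.0, data, NOISE_SD)
    pessimistic = posterior_mean_at(30.0, 5.0, data, NOISE_SD)
    gaps.append(pessimistic - optimistic)
    print(n, round(optimistic, 2), round(pessimistic, 2))
```

The gap between the two posterior means shrinks as n grows, because the weight on the prior mean is the prior precision divided by the total precision, and the data precision increases linearly with the number of observations.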