Practical Probabilistic Programming with Figaro
Avi Pfeffer
Charles River Analytics
MLConf May 20, 2016
• Why Probabilistic Programming?
• Figaro
• Examples and Applications
• Where We’re Going
Overview
• We want to
  • Predict the future
  • Infer past causes of current observations
  • Learn from experience
• With much less effort and expertise than before
What Are We Trying To Do?
Probabilistic Reasoning Lets You Do All These Things
Probabilistic Reasoning: Predicting the Future
Probabilistic Reasoning: Inferring Factors that Caused Observations
Probabilistic Reasoning: Using the Past to Predict the Future
Probabilistic Reasoning: Learning from the Past
• You need to
  • Implement the representation
  • Implement the probabilistic inference algorithm
  • Implement the learning algorithm
  • Interact with data
  • Integrate with an application
But Probabilistic Reasoning Is Hard!
Drastically reduce the work to create probabilistic reasoning applications
Goal of Probabilistic Programming
1. An expressive programming language for representing models
2. General-purpose inference and learning algorithms that apply to models written in the language
All you have to do is represent the model in code, and you automatically get the application
How Probabilistic Programming Achieves This
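The two ingredients above can be illustrated with a minimal sketch in plain Scala. This is not Figaro and does not use its API: the `Dist` type, the `flip` helper, the `World` model, and all probabilities are assumptions made up for illustration. The model is written once as ordinary code, and one generic routine (exact enumeration) then answers both a prediction query and an inference query.

```scala
object EnumerationSketch {
  // A finite distribution: outcomes paired with their probabilities.
  // (Illustrative type, not part of Figaro.)
  final case class Dist[A](outcomes: List[(A, Double)]) {
    // Transform each outcome, keeping its probability.
    def map[B](f: A => B): Dist[B] =
      Dist(outcomes.map { case (a, p) => (f(a), p) })

    // Sequence a dependent random choice: the core of a generative model.
    def flatMap[B](f: A => Dist[B]): Dist[B] =
      Dist(for ((a, p) <- outcomes; (b, q) <- f(a).outcomes) yield (b, p * q))

    // Condition on evidence and renormalize.
    def observe(evidence: A => Boolean): Dist[A] = {
      val kept = outcomes.filter { case (a, _) => evidence(a) }
      val total = kept.map(_._2).sum
      Dist(kept.map { case (a, p) => (a, p / total) })
    }

    // Total probability of the outcomes satisfying a query.
    def probability(query: A => Boolean): Double =
      outcomes.collect { case (a, p) if query(a) => p }.sum
  }

  // A biased coin.
  def flip(p: Double): Dist[Boolean] = Dist(List(true -> p, false -> (1 - p)))

  // Hypothetical model: rain makes wet grass more likely.
  final case class World(rain: Boolean, wetGrass: Boolean)

  val model: Dist[World] =
    for {
      rain <- flip(0.2)
      wet <- flip(if (rain) 0.9 else 0.1)
    } yield World(rain, wet)

  // Prediction: how likely is wet grass?
  val pWet: Double = model.probability(_.wetGrass)

  // Inference: given wet grass, how likely is it that it rained?
  val pRainGivenWet: Double = model.observe(_.wetGrass).probability(_.rain)
}
```

Enumeration only works for small finite models; it stands in here for the general-purpose algorithms a real system would supply.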
• It’s easy to incorporate rich domain knowledge into probabilistic programs
• Probabilistic programming can work well even when you don’t have a lot of data
• Probabilistic programming models are explainable and understandable
• Probabilistic programming can predict outputs belonging to complex data types of variable size, like social networks
Probabilistic Programming Compared to Deep Learning
• Why Probabilistic Programming?
• Figaro
• Examples and Applications
• Where We’re Going
Overview
Figaro goals
A probabilistic programming system that is:
• Easy to interact with data
• Easy to integrate with applications
• General and expressive enough to capture common programming patterns
• Backed by an extensible library of inference algorithms
• Figaro provides data structures to represent probabilistic programs
• Scala programs construct the Figaro models
• Inference algorithms implemented in Scala operate on these models
Figaro as a Scala Library
• Easy interaction with data and integration with applications
• Can embed general-purpose code in probabilistic programs
• Can construct models programmatically
• Figaro inherits functional and object-oriented features of Scala
• Can use Scala functions to specify constraints
• Scala supports an extensible library of inference algorithms
Advantages of Scala Embedding
• Hard to reason about models at the source level, since arbitrary Scala code may be embedded in the model
• Syntax not as elegant as self-contained languages
• Steeper learning curve
  • You need to learn Scala and Figaro
  • But we have found that beginners can quickly learn to write models
We have found that the power and practicality of Figaro more than make up for these disadvantages
Disadvantages of Scala Embedding
• Why Probabilistic Programming?
• Figaro
• Examples and Applications
• Where We’re Going
Overview
Figaro novices were able to quickly build up an
integrated probabilistic reasoning application
Hydrological Terrain Modeling for Army Logistics
We were able to perform a sophisticated analysis far
better than our previous non-probabilistic method
Malware Lineage (DARPA Cyber Genome)
[Bar chart, scale 0 to 1: Parent Correct, Parent Precision, Parent Recall, and Parent FMeasure, comparing New Algorithm, Old Algorithm With New Features, and Phase I IV&V Result]
Tracklet Merging (DARPA PPAML Challenge Problem)
[Figure: diagram with probability labels 0.3, 0.2, 0.5 and 0.7, 0.2, 0.1, each set shown twice, and an "=" sign]
We came up with a new algorithm that
we would not have thought of without
probabilistic programming and
expressed it in one slide
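import com.cra.figaro.language.{Chain, Select}

// Each tracklet carries weighted candidates for the tracklet that follows it
// and for the tracklet that precedes it.
class Tracklet(
  toCandidates: List[(Double, Tracklet)],
  fromCandidates: List[(Double, Tracklet)]
) {
  // Uncertain successor and predecessor, selected by candidate weight.
  val next = Select(toCandidates: _*)
  val previous = Select(fromCandidates: _*)
}

// Consistency condition: the predecessor of a source's successor must be
// the source itself. `sources` is assumed to be defined elsewhere.
for (source <- sources) {
  val nextPrevious =
    Chain(source.next,
      // Explicit parameter type, since Scala may not infer it here.
      (nextTracklet: Tracklet) => nextTracklet.previous)
  nextPrevious.observe(source)
}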
Tracklet Merging in Figaro
• Why Probabilistic Programming?
• Figaro
• Examples and Applications
• Where We’re Going
Overview
• We’ve significantly reduced the effort required to build complex probabilistic reasoning applications
• But it still requires a fair amount of machine learning expertise to make these applications work
  • You need to know how to represent models
  • You need to know how to choose and configure inference algorithms
Current State of the Art
A probabilistic programming framework that domain experts with little or no machine learning knowledge can use
1. An English-like language for describing a domain
2. A method for automatically filling in the gaps in a model
3. Automated inference techniques that optimally choose and configure algorithms for a particular problem
Our Goal
1. Decompose an inference problem into many subproblems
2. Optimize the choice of an appropriate solver for each subproblem
3. Combine the subproblem solutions into a solution of the whole problem
Automated Inference Strategy
• Subproblems are represented as factor graphs
• Factored algorithms are used to solve subproblems
  • E.g., variable elimination, belief propagation, Gibbs sampling
• We intelligently choose between the available algorithms on each subproblem
Structured Factored Inference (SFI)
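To make "factored algorithm" concrete, here is a minimal sketch of one of the algorithms named above, variable elimination, over Boolean factors in plain Scala. It is not Figaro's implementation and does not show how SFI chooses between algorithms; the `Factor` type, the three-variable chain a, b, c, and its probabilities are assumptions for illustration only.

```scala
object FactorSketch {
  // A factor maps each assignment of its Boolean variables to a weight.
  final case class Factor(vars: List[String], table: Map[List[Boolean], Double]) {
    // Product of two factors, joined on their shared variables.
    def *(that: Factor): Factor = {
      val allVars = (vars ++ that.vars).distinct
      val rows = for (assignment <- assignments(allVars)) yield {
        val env = allVars.zip(assignment).toMap
        assignment -> (table(vars.map(env)) * that.table(that.vars.map(env)))
      }
      Factor(allVars, rows.toMap)
    }

    // Sum a variable (which must be in `vars`) out of the factor.
    def sumOut(v: String): Factor = {
      val i = vars.indexOf(v)
      val summed = table.toList
        .groupBy { case (assignment, _) => assignment.patch(i, Nil, 1) }
        .map { case (key, rows) => key -> rows.map(_._2).sum }
      Factor(vars.patch(i, Nil, 1), summed)
    }
  }

  // All Boolean assignments to the given variables.
  def assignments(vars: List[String]): List[List[Boolean]] =
    vars.foldRight(List(List.empty[Boolean])) { (_, acc) =>
      for (b <- List(true, false); tail <- acc) yield b :: tail
    }

  // For each variable in `order`: multiply the factors that mention it,
  // sum it out, and put the result back. Multiply whatever remains.
  def eliminate(factors: List[Factor], order: List[String]): Factor = {
    val remaining = order.foldLeft(factors) { (fs, v) =>
      val (touching, others) = fs.partition(_.vars.contains(v))
      if (touching.isEmpty) fs else touching.reduce(_ * _).sumOut(v) :: others
    }
    remaining.reduce(_ * _)
  }

  // Hypothetical chain a -> b -> c, given as P(a), P(b | a), P(c | b).
  val pA = Factor(List("a"), Map(List(true) -> 0.3, List(false) -> 0.7))
  val pBGivenA = Factor(List("a", "b"), Map(
    List(true, true) -> 0.8, List(true, false) -> 0.2,
    List(false, true) -> 0.1, List(false, false) -> 0.9))
  val pCGivenB = Factor(List("b", "c"), Map(
    List(true, true) -> 0.5, List(true, false) -> 0.5,
    List(false, true) -> 0.2, List(false, false) -> 0.8))

  // Marginal over c, obtained by eliminating a and then b.
  val pC: Factor = eliminate(List(pA, pBGivenA, pCGivenB), List("a", "b"))
}
```

Eliminating a and then b never builds a table over all three variables at once, which is the saving that makes factored algorithms attractive on subproblems.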
Compiled Graphical Model of Figaro Program
[Figure: graphical model with nodes a, b, and c, two MUX nodes, functions fb and fc, and variables x1, x2, y1, y2, each instantiated for b and c under the T and F cases]
Decompose Problem Automatically
[Figure: graphical model with nodes a, b, and c, two MUX nodes, functions fb and fc, and variables x1, x2, y1, y2 for b and c under the T and F cases, partitioned into regions labeled "Subproblems" and "Top level problem"]
Combine and Reuse Solutions
[Figure: graphical model with nodes a, b, and c, two MUX nodes, functions fb and fc, and variables x1, x2, y1, y2 for b and c under the T and F cases, with regions labeled "Subproblems" and "Top level problem" and subproblem solutions labeled pT and pF]
Optimize Each Subproblem Individually
Results on a model structure used for medical diagnosis
[Plot: L1 error versus number of diseases]
• It’s easy to write probabilistic programs that define very large or even infinite factor graphs
Challenge
You can’t construct the factor graph
• We can solve problems with infinitely many variables
• Partially expand the problem
• Quantify the effect of the unexpanded part of the program on the query
• Produces lower and upper bounds on answer to the query
• As you expand more of the problem, the bounds get tighter
Lazy Inference
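The bounding idea can be sketched on a toy model with unboundedly many outcomes. This is plain Scala, not Figaro's lazy inference algorithm. The assumed model: a sentence receives one more word with probability `pContinue`, otherwise it stops. The query is whether the sentence length satisfies some predicate. Both the model and the names are hypothetical.

```scala
object LazyBoundsSketch {
  // Hypothetical model parameter: probability of adding another word.
  val pContinue = 0.4

  // Expand the model only up to `depth` words, then bound P(query(length)).
  def bounds(depth: Int, query: Int => Boolean): (Double, Double) = {
    // Expanded part: P(length = n) = pContinue^(n - 1) * (1 - pContinue).
    val expanded = (1 to depth).map { n =>
      (n, math.pow(pContinue, n - 1) * (1 - pContinue))
    }
    // Mass known to satisfy the query gives the lower bound.
    val lower = expanded.collect { case (n, p) if query(n) => p }.sum
    // Unexpanded part: all sentences longer than `depth`. In the best case
    // all of that mass satisfies the query, which gives the upper bound.
    val unexpanded = math.pow(pContinue, depth)
    (lower, lower + unexpanded)
  }

  // Example query: is the sentence length even?
  val isEven: Int => Boolean = _ % 2 == 0
  val shallow: (Double, Double) = bounds(2, isEven)
  val deeper: (Double, Double) = bounds(6, isEven)
}
```

In this toy model the exact answer is 0.24 / 0.84, roughly 0.286. Both intervals contain it, and the deeper expansion gives the narrower interval.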
Grammar with Sentences of Unbounded Length
Grammar with Infinite Sentences
• Probabilistic reasoning helps you predict, infer, and learn
• Probabilistic programming makes this much easier!
• Figaro is a mature, practical probabilistic programming system with many applications
• We’re striving to make probabilistic programming even easier!
Conclusion
• This material is based upon work supported by the United States Air Force under Contract No. FA8750-14-C-0011.
• Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force.
Acknowledgement
More Information
• Figaro is open source
  • Contributions welcome!
  • Releases can be downloaded from www.cra.com/figaro
  • Figaro source is on GitHub at www.github.com/p2t2
  • Version 4.0 was released in March
• If you have any questions, feel free to contact me at apfeffer@cra.com
39% discount code on Manning books: ctwmlconfsea

Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/20/16