Probabilistic Programming:
Why, What, How, When
Beau Cronin
@beaucronin
40 Action-Packed Minutes
‣ Why you should care - what’s wrong with what we’ve got?
‣ What probabilistic programming is, and what programs look like
‣ How you can get started today
‣ When will all of this be ready for production use?
Why?
We use data to learn about the world
Dimension | Traditional Machine Learning | Hierarchical Bayesian Modeling
Scale | Large | Small
Tools & frameworks | Mature & Robust | Immature & Spotty
Structure & Knowledge | Discard | Keep & Leverage
Data Types | Homogeneous | Heterogeneous
Philosophical Approach | Toolkit, Theory-light | Modeling, Theory-heavy
Why?
G = {V, E}
What order were these links added in?
What messages flow over this link?
What do we know about this user?
Why?
x1 x2 lat1 long1 t1 t2 t3 t4 address1
1 1.2 2 34.0 118.2 2.3 3.4 1.9 10.4 516 61st St,
2 0.1 1 40.7 73.9 -1.5 4.5 8.9 2305 Tustin
3 10.5 0 37.9 122.3 4.7 -2.5 -3.4 1 Market St.
4 8.3 -1 -22.9 43.2 4.2 5.6 1.6 9.5
5 4.9 5 -37.8 -145.0 1600 Pennsyl
6 1.5 1 3.4 4.0 4.6 5.2 650 7th St., S
Callouts: positive numbers, categorical values, locations, time series, addresses, missing values
Why?
Diverse Data
‣ Trees & Graphs
‣ Time Series
‣ Relations
‣ Locations & Addresses
‣ Images & Movies
‣ Audio
‣ Sets & Partitions
‣ Text
Most real datasets contain compositions of these types and more, but we routinely homogenize them during preprocessing.
Why?
Business Data Is Heterogeneous and Structured
id: “abcdef”
gender: “Male”
dob: 1978-12-09
twitter_id: 9458201
Profile
2014-01-21 18:41:04, “https://devcenter.heroku.com/articles/quickstart”, …
2014-01-20 12:35:56, “https://devcenter.heroku.com/categories/java”, …
2014-01-20 09:12:52, “https://devcenter.heroku.com/articles/ssl-endpoint”, …
Page Views
Order Date | Order ID | Title | Category | ASIN/ISBN | Release Date | Condition | Seller | Per Unit Price
1/5/13 | 002-1139353-0278652 | Under Armour Men's Resistor No Show Socks, pack of 6 Socks | Apparel | B003RYQJJW | | new | The Sock Company, Inc. | $21.99
1/5/13 | 002-1139353-0278652 | Under Armour Men's Resistor No Show Socks, pack of 6 Socks | Apparel | B004UONNXI | | new | The Sock Company, Inc. | $21.99
1/8/13 | 002-2593752-8837806 | CivilWarLand in Bad Decline | Paperback | 1573225797 | 1/31/97 | new | Amazon.com LLC | $8.4
1/8/13 | 109-0985451-2187421 | Nothing to Envy: Ordinary Lives in North Korea | Paperback | 385523912 | 9/20/10 | new | Amazon.com LLC | $10.88
1/12/13 | 109-8581642-2322617 | Excession | Mass Market Paperback | 553575376 | 2/1/98 | new | Amazon.com LLC | $7.99
Transactions
[
{
text: “key to compelling VR is…”,
retweet_count: 3,
favorites_count: 5,
urls: [ ],
hashtags: [ ],
in_reply_to: 39823792801012
…
},
{
text: “@John4man really liked your piece”,
retweets: 0,
favorites: 0,
…
}
]
Social Posts
[ 657693, 7588892, 9019482, …]
Followers
blocked: False
want_retweets: True
marked_spam: False
since: 2013-09-13
Relationship
Every Domain Is Heterogeneous
‣ Health data: doctor notes, lab results, imaging, family history, prescriptions
‣ Quantified self: motion sensors, heart rate, GPS tracks, self-reporting, sleep patterns
‣ Autonomous vehicles: LIDAR, cameras, maps, audio, gyros, telemetry, GPS
Why?
Mostly, no one even tries to jointly model these different kinds of data
Why?
A probabilistic programming system is…
a language + {compiler, interpreter}
or
a {library, framework} for an existing language
that
- includes random choices as native elements
- and provides a clean separation between probabilistic modeling and inference
- and may provide automated generation of inference solutions for a given program
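The two defining properties above, random choices as native elements and a clean model/inference split, can be shown in a few lines of plain Python. This is a minimal sketch, not any real system's API: `flip` and `rejection_query` are hypothetical names. The model is an ordinary function that makes random choices, and the inference routine works unchanged for any model that returns a (query, condition) pair. Rejection sampling is used only because it is the simplest possible strategy. Real systems use far more efficient ones.

```python
import random

def flip(rng, p=0.5):
    """A random choice as an ordinary function call: True with probability p."""
    return rng.random() < p

def two_coins(rng):
    """Model: flip two fair coins. Returns (query value, condition satisfied?)."""
    a = flip(rng)
    b = flip(rng)
    return a, (a or b)  # query: was the first coin heads? condition: at least one heads

def rejection_query(model, num_samples, seed=0):
    """Generic inference, kept separate from the model: run the model many times,
    keep only runs that satisfy the condition, and return the fraction of kept
    runs whose query value is True."""
    rng = random.Random(seed)
    kept = [q for q, ok in (model(rng) for _ in range(num_samples)) if ok]
    return sum(kept) / len(kept)

estimate = rejection_query(two_coins, 100_000)
print(estimate)  # a Monte Carlo estimate; the exact answer is 2/3
```

Swapping in a different model requires no change to `rejection_query`, and a smarter inference routine could replace it without touching `two_coins`. That independence is what the definition's "clean separation" refers to.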
What?
Probabilistic Programming Systems Model the World
‣ Programs directly represent the data generation process
‣ Measurement processes can be modeled directly, including their
imperfections and the uncertainty that comes with them
‣ Philosophy
‣ DO: capture the essential aspects of real-world processes in a model
‣ DON’T: torture the data into the right form for an algorithm
What?
A Probability Model (diagram with a plate repeated ✕ N)
‣ Fixed: constant values and structural assumptions
‣ Unknown: variables that discriminate between hypotheses
‣ Observable: data and potential data
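Here is a minimal Python sketch of these three roles, written as a forward (generative) simulation. The scenario (a sensor reading an unknown temperature) and every constant in it are hypothetical. They were chosen only to show fixed assumptions, one unknown, and N observables (the ✕ N plate). The sketch also shows what the earlier "model the world" slide describes: a measurement process whose imperfections, here noise and dropped readings, are modeled directly rather than cleaned away.

```python
import random

# Fixed: constants and structural assumptions (hypothetical values).
PRIOR_MEAN, PRIOR_SD = 20.0, 5.0   # prior belief about the true temperature
SENSOR_SD = 0.5                    # measurement noise of the sensor
DROPOUT_P = 0.1                    # chance that a reading is lost
N = 8                              # number of readings: the "✕ N" plate

def generate(rng):
    """Simulate one possible world: an unknown value and N readings of it."""
    # Unknown: the variable that discriminates between hypotheses.
    true_temp = rng.gauss(PRIOR_MEAN, PRIOR_SD)
    # Observable: N noisy, sometimes-missing measurements of the unknown.
    readings = []
    for _ in range(N):
        if rng.random() < DROPOUT_P:
            readings.append(None)  # a missing value, modeled rather than imputed
        else:
            readings.append(rng.gauss(true_temp, SENSOR_SD))
    return true_temp, readings

true_temp, readings = generate(random.Random(1))
print(true_temp, readings)
```

Inference would run this process in reverse: given the observed `readings`, it would infer what `true_temp` probably was.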
What?
Obligatory Bayes’ Rule
Pr(H | D, A) ∝ Pr(D | H, A) Pr(H | A)
H = hypotheses, D = data, A = assumptions. Leaving the assumptions implicit gives the familiar form:
Pr(H | D) ∝ Pr(D | H) Pr(H)
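The ∝ hides a single normalizing constant. That constant is the probability of the data under the assumptions, obtained by summing (or, for continuous hypotheses, integrating) over every hypothesis H′ the assumptions allow:

```latex
\Pr(H \mid D, A) = \frac{\Pr(D \mid H, A)\,\Pr(H \mid A)}{\sum_{H'} \Pr(D \mid H', A)\,\Pr(H' \mid A)}
```

For all but the smallest hypothesis spaces this sum cannot be computed directly, which is one reason inference needs dedicated algorithms.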
What?
# Assumptions
fair-prior = .999

# Unknowns
fair-coin? = flip(fair-prior)
if fair-coin?:
    weight = 0.5
else:
    weight = 0.9

# Observables: ten flips, all heads
observe(repeat(flip(weight), 10),
        [H, H, H, H, H, H, H, H, H, H])

query(fair-coin?)
First example: Deciding if a coin is fair based on flips
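This model has only two hypotheses, so the query can be answered exactly: apply Bayes' rule to each branch and normalize. Below is a minimal Python sketch of that calculation. It assumes, as the pseudocode implies, that the ten flips are independent given `weight` and that `weight` is the probability of heads. The function name is illustrative.

```python
from fractions import Fraction

FAIR_PRIOR = Fraction(999, 1000)                          # fair-prior = .999
HEADS_PROB = {True: Fraction(1, 2), False: Fraction(9, 10)}  # weight if fair / unfair

def posterior_fair(num_heads, num_tails):
    """Exact Pr(fair-coin? | flips): prior x likelihood for each hypothesis, normalized."""
    joint = {}
    for fair, w in HEADS_PROB.items():
        prior = FAIR_PRIOR if fair else 1 - FAIR_PRIOR
        likelihood = w ** num_heads * (1 - w) ** num_tails  # independent flips
        joint[fair] = prior * likelihood
    return joint[True] / (joint[True] + joint[False])

p = posterior_fair(10, 0)
print(float(p))
```

Ten straight heads are about 357 times likelier under the 0.9-weighted coin than under a fair one (0.9¹⁰ / 0.5¹⁰). That shrinks the 999-to-1 prior odds in favor of a fair coin to roughly 2.8-to-1, so the posterior probability of a fair coin falls from 0.999 to about 0.74.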
Probabilistic Programming Systems Are Diverse
‣ Library vs. stand-alone language
‣ Base language: Scala, Lisp, Python
‣ Manual, semi-, or fully automated inference
‣ Modeling domain: directed/undirected graphical models, relational data, all programs
‣ Home field: cognitive science, programming languages, databases, Bayesian statistics, artificial intelligence
What?
PPSs Compared
System | Type | Language | Inference
BLOG | Stand-alone | Custom | Fully Auto
BUGS / JAGS | Stand-alone | Custom | Fully Auto
STAN | Hybrid | R, Python | Fully Auto
PyMC | Library | Python | Manual
Infer.net | Library | C# | Semi-auto
Church | Stand-alone | Lisp | Fully Auto
Venture | Stand-alone | Javascript, Lisp | Semi-auto
Figaro | Library | Scala | Semi-auto
factorie | Library | Scala | Semi-auto
What?
infer.net
‣ A C# framework (also F#)
‣ Developed at MSR
‣ Under active development, with good tutorials and many well-documented examples
How?
using System;
// Namespace of the current open-source Infer.NET release;
// the 2014-era MSR release used MicrosoftResearch.Infer.Models.
using Microsoft.ML.Probabilistic.Models;

class ClinicalTrial
{
    static void Main()
    {
        // Observations: whether each patient in each group recovered
        VariableArray<bool> controlGroup =
            Variable.Observed(new bool[] { false, false, true, false, false });
        VariableArray<bool> treatedGroup =
            Variable.Observed(new bool[] { true, false, true, true, true });
        Range i = controlGroup.Range;
        Range j = treatedGroup.Range;

        // Unknown: is the treatment effective? (50/50 prior)
        Variable<bool> isEffective = Variable.Bernoulli(0.5);

        // Assumptions & unknowns: recovery rates under each hypothesis
        Variable<double> probIfTreated, probIfControl;
        using (Variable.If(isEffective))
        {
            // Model if treatment is effective: separate rates per group
            probIfControl = Variable.Beta(1, 1);
            controlGroup[i] = Variable.Bernoulli(probIfControl).ForEach(i);
            probIfTreated = Variable.Beta(1, 1);
            treatedGroup[j] = Variable.Bernoulli(probIfTreated).ForEach(j);
        }

        using (Variable.IfNot(isEffective))
        {
            // Model if treatment is not effective: one shared rate
            Variable<double> probAll = Variable.Beta(1, 1);
            controlGroup[i] = Variable.Bernoulli(probAll).ForEach(i);
            treatedGroup[j] = Variable.Bernoulli(probAll).ForEach(j);
        }

        // Query: posterior distribution over isEffective
        InferenceEngine ie = new InferenceEngine();
        Console.WriteLine("Probability treatment has an effect = " + ie.Infer(isEffective));
    }
}
Infer.net example: Is a new treatment effective?
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Clinical%20trial%20tutorial.aspx
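This particular model happens to have a closed-form answer, because a Beta(1, 1) prior is conjugate to Bernoulli outcomes. The Python sketch below cross-checks it by exact Bayesian model comparison. It does not reproduce Infer.NET's message-passing engine, and the helper name is illustrative. For a uniform prior on the rate p, the probability of a specific sequence with k successes in n trials is ∫ p^k (1-p)^(n-k) dp = k!(n-k)!/(n+1)!.

```python
from fractions import Fraction
from math import factorial

control = [False, False, True, False, False]
treated = [True, False, True, True, True]

def beta11_marginal(outcomes):
    """Pr(outcomes) when each is Bernoulli(p) and p ~ Beta(1, 1), with p integrated out."""
    n, k = len(outcomes), sum(outcomes)
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

# isEffective = true: the two groups have separate unknown rates.
lik_effective = beta11_marginal(control) * beta11_marginal(treated)
# isEffective = false: all ten patients share one unknown rate.
lik_not = beta11_marginal(control + treated)

prior = Fraction(1, 2)  # Variable.Bernoulli(0.5)
posterior = prior * lik_effective / (prior * lik_effective + (1 - prior) * lik_not)
print(posterior, float(posterior))
```

The exact posterior is 77/102, about 0.755. Conjugacy makes the closed form possible here. The appeal of a system like Infer.NET is that the same modeling style keeps working when no closed form exists.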
PyMC
‣ Python (duh)
‣ Go watch Thomas Wiecki’s talk from PyData NY
‣ http://twiecki.github.io/blog/2013/12/12/bayesian-data-analysis-pymc3/
‣ And read Bayesian Methods for Hackers by Cam Davidson-Pilon et al.
How?
Church
‣ A Lisp
‣ Originally created to model cognitive development and human reasoning
‣ Active inference research, several implementations
‣ Connections between language and probability: functional purity ↔ independence; stochastic memoization ↔ exchangeability
‣ The hypothesis space is the set of possible program executions
‣ Read the online book “Probabilistic Models of Cognition”
How?
;stochastic memoization generator for class assignments
;sometimes return a previous symbol, sometimes create a new one
(define class-distribution (DPmem 1.0 gensym))

;associate a class with an object via memoization
(define object->class
  (mem (lambda (object) (class-distribution))))

;associate gaussian parameters (mean, standard deviation) with a class via memoization
;the standard deviation must be non-negative, so draw it from (uniform 0 8)
(define class->gaussian-parameters
  (mem (lambda (class) (list (gaussian 65 10) (uniform 0 8)))))

;generate observed values for an object
(define (observe object)
  (apply gaussian (class->gaussian-parameters (object->class object))))

;generate observations for some objects
(map observe '(tom dick harry bill fred))
modified from https://probmods.org/non-parametric-models.html
Church example: Infinite Gaussian Mixture Model
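For readers who do not read Lisp, here is a Python sketch of the same generative process. It is a sketch under stated assumptions, not Church's implementation: `mem`, `dp_mem`, and `gensym` are hypothetical stand-ins for Church's built-ins. `dp_mem` implements the Dirichlet-process memoizer as a Chinese restaurant process, and the class standard deviation is drawn from uniform(0, 8) so that it is never negative.

```python
import itertools
import random

def mem(f):
    """Memoize a (possibly random) function: the first call with given arguments
    makes the random choice, and later calls with the same arguments reuse it."""
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return wrapper

def dp_mem(alpha, base, rng):
    """Stochastic memoization via the Chinese restaurant process: each call reuses a
    previous value with probability proportional to how often it has been returned,
    or draws a fresh value from `base` with weight alpha."""
    counts = {}
    def draw():
        r = rng.random() * (sum(counts.values()) + alpha)
        for value, c in counts.items():
            if r < c:
                counts[value] += 1
                return value
            r -= c
        value = base()          # the "new table" branch
        counts[value] = 1
        return value
    return draw

rng = random.Random(0)
fresh_ids = itertools.count()
gensym = lambda: f"class-{next(fresh_ids)}"  # hypothetical stand-in for Church's gensym

class_distribution = dp_mem(1.0, gensym, rng)
object_to_class = mem(lambda obj: class_distribution())
class_to_params = mem(lambda cls: (rng.gauss(65, 10), rng.uniform(0, 8)))

def observe(obj):
    """Draw a fresh observation from the Gaussian of the object's (memoized) class."""
    mean, sd = class_to_params(object_to_class(obj))
    return rng.gauss(mean, sd)

print([round(observe(o), 1) for o in ["tom", "dick", "harry", "bill", "fred"]])
```

The number of classes is not fixed in advance. It grows as `dp_mem` occasionally opens a new one, which is what makes the mixture "infinite".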
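;each feature (column) is assigned to a "kind" by a Dirichlet process
(define kind-distribution (DPmem 1.0 gensym))

(define feature->kind
  (mem (lambda (feature) (kind-distribution))))

;each kind gets its own Dirichlet process over object classes
(define kind->class-distribution
  (mem (lambda (kind) (DPmem 1.0 gensym))))

;within a kind, each object (row) is assigned to a class
(define feature-kind/object->class
  (mem (lambda (kind object)
         (sample (kind->class-distribution kind)))))

;each class gets a coin weight for its binary features
(define class->parameters
  (mem (lambda (object-class) (first (beta 1 1)))))

;an observation is a coin flip with the weight of the object's class
;in the feature's kind
(define (observe object feature)
  (flip (class->parameters (feature-kind/object->class
                            (feature->kind feature) object))))

(observe 'eggs 'breakfast)
https://probmods.org/non-parametric-models.html
Church example: Cross-categorization (BayesDB)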
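The same generative structure can also be sketched in Python. Features are grouped into kinds by one Chinese restaurant process. Each kind has its own process that groups objects into classes. Each class carries a Beta(1, 1) coin weight, and an observation is a flip of that coin. As before, `mem`, `dp_mem`, and `gensym` are hypothetical stand-ins for Church's built-ins. This shows only the forward model, not the inference BayesDB performs over it.

```python
import itertools
import random

rng = random.Random(0)

def mem(f):
    """Memoize a random function on its arguments."""
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return wrapper

def dp_mem(alpha, base):
    """Chinese-restaurant-process memoizer: reuse popular values, or draw a new one."""
    counts = {}
    def draw():
        r = rng.random() * (sum(counts.values()) + alpha)
        for value, c in counts.items():
            if r < c:
                counts[value] += 1
                return value
            r -= c
        value = base()
        counts[value] = 1
        return value
    return draw

ids = itertools.count()
gensym = lambda: next(ids)  # hypothetical stand-in for gensym: globally fresh integer ids

kind_distribution = dp_mem(1.0, gensym)
feature_to_kind = mem(lambda feature: kind_distribution())
kind_to_class_distribution = mem(lambda kind: dp_mem(1.0, gensym))  # one CRP per kind
kind_object_to_class = mem(lambda kind, obj: kind_to_class_distribution(kind)())
class_to_parameter = mem(lambda cls: rng.betavariate(1, 1))  # coin weight per class

def observe(obj, feature):
    """Flip the coin of the object's class within the feature's kind."""
    cls = kind_object_to_class(feature_to_kind(feature), obj)
    return rng.random() < class_to_parameter(cls)

print(observe("eggs", "breakfast"))
```

Because objects are clustered separately within each kind, two objects can share a class for one group of features and differ for another. This is the "cross" in cross-categorization.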
Churj?
Jurch?
How?
So Far
‣ Why
‣ What
‣ How
‣ When
What We Still Need
1. Basic CS: Improved compilers and run-times for more efficient automatic inference
2. Tooling: Debuggers, optimizers, IDEs, visualization
3. Tribal knowledge: idioms, patterns, best practices
When?
When?
The Probabilistic Programming Revolution
Traditional Programming → Probabilistic Programming
• Application → Model: code models capture how the data was generated, using random variables to represent uncertainty
• Code Libraries → Model Libraries: libraries contain common model components (Markov chains, deep belief networks, etc.)
• Programming Language → Probabilistic Programming Language: the PPL provides probabilistic primitives & traditional PL constructs so users can express models, queries, and data
• Compiler → Inference Engine: the inference engine analyzes the probabilistic program and chooses appropriate solver(s) for the available hardware
• Hardware → Hardware: can include multi-core, GPU, cloud-based resources, GraphLab, UPSIDE/Analog Logic results, etc.
High-level programming languages facilitate building complex systems
Probabilistic programming languages facilitate building rich ML applications
Approved for Public Release; Distribution Unlimited
The Promise of Probabilistic Programming Languages
• Shorter: Reduce LOC by 100x for machine learning applications
  • Seismic Monitoring: 28K LOC in C vs. 25 LOC in BLOG
  • Microsoft MatchBox: 15K LOC in C# vs. 300 LOC in Fun
• Faster: Reduce development time by 100x
  • Seismic Monitoring: Several years vs. 1 hour
  • Microsoft TrueSkill: Six months for competent developer vs. 2 hours with Infer.Net
  • Enable quick exploration of many models
• More Informative: Develop models that are 10x more sophisticated
  • Enable surprising new applications
  • Incorporate rich domain knowledge
  • Produce more accurate answers
  • Require less data
  • Increase robustness with respect to noise
  • Increase ability to cope with contradiction
• With less expertise: Enable 100x more programmers
  • Separate the model (the program) from the solvers (the compiler), enabling domain experts without machine learning PhDs to write applications
Probabilistic Programming could empower domain experts and ML experts
Sources:
• Bayesian Data Analysis, Gelman, 2003
• Pattern Recognition and Machine Learning, Bishop, 2007
• Science, Tenenbaum et al., 2011
DISTRIBUTION STATEMENT F. Further dissemination only as directed by DARPA, (February 20, 2013) or higher DoD authority.
Optimizer: “What is happening when I run this?”
Profiler: “Where is the time and memory being used?”
Debugger: “What is the exact state of my program at each point in time?”
Visualization: “What is the hidden structure of my data, and how certain should I be?”
http://www.icg.tugraz.at/project/caleydo/
Probabilistic Programming Workflows?
Definition: Data Workflows
For example, Cascading and related projects implement the following components, based on 100% open source (cascading.org):
‣ Flow: data sources → ETL → data prep → predictive model → end uses
‣ Source taps for Cassandra, JDBC, Splunk, etc.
‣ Sink taps for Memcached, HBase, MongoDB, etc.
‣ Lingual: DW → ANSI SQL
‣ Pattern: SAS, R, etc. → PMML
‣ Business logic in Java, Clojure, Scala, etc.
adapted from Paco Nathan: Data Workflows for Machine Learning
Evolution of PPSs
When?
Bottom Line
‣ Go experiment and learn! - there are several good options
‣ But be realistic about the current state of the art
‣ And keep your ear to the ground - this area is moving fast
Parting Questions
‣ Which projects are good fits for probabilistic programming today?
‣ Exploration and prototyping vs. scaled production deployment?
‣ How long before we have the Python, Ruby, and even PHP of PPSs?
‣ Is there a unification with the log-centric view of big data processing?
‣ Can natively stochastic hardware provide compelling performance gains?
When?
Resources
‣ probabilistic-programming.org
‣ Probabilistic Programming and Bayesian Methods for Hackers
‣ Probabilistic Models of Cognition
‣ Mathematica Journal article
‣ Thomas Wiecki’s PyData talk on PyMC
People To Watch
Vikash Mansinghka (MIT)
Noah Goodman (Stanford)
David Wingate (Lyric Labs)
Avi Pfeffer (CRA)
Rob Zinkov (USC)
Andrew Gordon (MSR)
John Winn (MSR)
Dan Roy (Cambridge)
Languages and Systems
‣ PyMC
‣ infer.net
‣ STAN
‣ Figaro
‣ BLOG
‣ Church
‣ factor.ie
‣ BUGS / JAGS
@beaucronin
