Bayesian Network Modeling using Python and R

Pragyansmita Nayak

Published in: Technology

  1. BAYESIAN NETWORK MODELING USING PYTHON AND R
     PRAGYANSMITA NAYAK, PH.D.
     @SORISHAPRAGYAN
     HTTPS://GITHUB.COM/PRAGYANSMITA
     OCT 8TH, 2016
  2. ABSTRACT
     Bayesian Networks are increasingly being applied to real-world data problems. They offer the expressiveness needed to represent the uncertainty in a model's predicted results. The networks are easy to follow and make the inter-relationships among the attributes of a dataset easier to understand. As part of this talk, we will look into the existing R and Python packages that enable BN learning and prediction. The pros and cons of the available packages will be discussed, as well as new capabilities that will broaden the application of Bayesian Networks.
     PyDataDC 10/8/2016
  3. AGENDA
     BN
     • Applications of Bayesian Network
     • Bayes Law and Bayesian Network
     Python
     • BN ecosystem in Python
     R
     • BN ecosystem in R
  4. APPLICATIONS OF BAYESIAN NETWORK
     In the early morning of June 1, 2009, Air France Flight AF 447, carrying 228 passengers and crew, disappeared over a remote section of the Atlantic Ocean.
  5. VARIED DOMAINS …
     • Nate Silver (FiveThirtyEight) and Drew Linzer (Votamatic) used it to correctly predict the outcome in every state in the 2012 U.S. Presidential election
     • Embedded in Microsoft Office products
     • Discovering relations between genes, environment, and disease for a population-based study of bladder cancer in New Hampshire, USA
     • Short-term solar flare level prediction
  6. PYTHON, R AND BAYESIAN NETWORK
     • Python
       • NumPy
       • SciPy
       • BayesPy
       • Bayes Blocks
       • PyMC
       • Stan
       • OpenBUGS
       • BNFinder
       • …
     • R
       • bnlearn
       • BayesianNetwork (Shiny App for bnlearn)
       • RStan
       • R2WinBUGS (Bayesian Inference Using Gibbs Sampling)
       • rjags: JAGS (Just Another Gibbs Sampler)
       • BayesAB
       • …
  7. BAYES LAW
     • Probability theory and Statistics
     • Bayes [Theorem | Law | Rule]
       $P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$
     • Based on evidence, estimate a “degree of belief” for the possible outcomes
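As a minimal sketch of the rule on this slide, the function below computes $P(A \mid B)$ for a binary event, expanding $P(B)$ with the law of total probability. The function name and the numbers (a 1% prior, 95% and 5% likelihoods) are hypothetical and chosen only for illustration.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) using Bayes' rule for a binary event A."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / evidence


# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05
p = bayes_posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print("P(A|B) = %.3f" % p)
```

Even with strong evidence, a small prior keeps the posterior modest, which is why the choice of prior matters.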
  8. IT INVOLVES …
     • Specifying a prior probability can be tricky!
       • The subjective element
       • Also known as the “Reference Class Problem”
     [Diagram labels: Represents, Evidence, “Prior”, Used For, Belief, “Likelihood”, Potential Outcomes, Prediction, “Posterior”]
  9. Predicting the future based on past observations lends itself naturally to applications requiring predictive analytics
     • “Belief is updated with new evidence”
  10. “FREQUENTIST” PARAMETRIC APPROACH
      • Contrast “Bayesian” with the alternative “Frequentist” parametric approach, which:
        • Models the observations as a physical system whose measurements are not exact, and thus needs to account for error
        • Treats the probability of an event as its relative frequency over time (“proportion of outcome”)
  11. PRIOR, POSTERIOR AND LIKELIHOOD
      $P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} \;\Rightarrow\; P(A \mid B) = \frac{P(B \mid A)}{P(B)} \, P(A)$
      Element                 Terminology
      $P(A)$                  Prior
      $P(A \mid B)$           Posterior
      $P(B \mid A) / P(B)$    Support of B for A
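  12. PRIOR, POSTERIOR AND LIKELIHOOD
      Replacing
      • A with H, the hypothesis
      • B with O, the observed data
      $P(H \mid O) = \frac{P(O \mid H) \, P(H)}{P(O)}$
      Since the probability of the observed data is taken to be 1 (more generally, $P(O)$ is a constant that does not depend on H),
      $\Rightarrow P(H \mid O) = P(O \mid H) \, P(H)$
      $\Rightarrow \text{Posterior} = \text{Likelihood} \times \text{Prior}$
      Element          Terminology
      $P(H)$           Prior
      $P(H \mid O)$    Posterior
      $P(O \mid H)$    Likelihood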
  13. EXAMPLE OUTPUT – PHOTOMETRIC REDSHIFT ESTIMATION
      [Figure labels: most probable a posteriori (MAP); Predicted Value; Confidence of Predicted Value]
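The relationship between prior, likelihood and posterior, and the idea of reporting a predicted value together with its confidence, can be sketched for a small discrete set of hypotheses. The hypothesis labels and probabilities below are made up for illustration and are not taken from the talk's redshift data.

```python
def posterior_over_hypotheses(priors, likelihoods):
    """Multiply prior by likelihood for each hypothesis, then normalize."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalized.values())  # plays the role of P(O)
    return {h: value / evidence for h, value in unnormalized.items()}


def map_estimate(posterior):
    """Return the most probable hypothesis and its posterior probability."""
    best = max(posterior, key=posterior.get)
    return best, posterior[best]


# Hypothetical priors P(H) and likelihoods P(O|H) for three candidate bins
priors = {"low": 0.5, "mid": 0.3, "high": 0.2}
likelihoods = {"low": 0.1, "mid": 0.6, "high": 0.3}

posterior = posterior_over_hypotheses(priors, likelihoods)
predicted_value, confidence = map_estimate(posterior)
print(predicted_value, round(confidence, 3))
```

Normalizing by the summed products is what makes the posterior probabilities add up to one when $P(O)$ is not taken to be 1.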
  14. BAYESIAN “BELIEF” NETWORK (BBN)
      Pioneered by Judea Pearl (2011 ACM Turing Award)
      Graphical models to represent and approximate acyclic relationships between the different subsets of variables
      • Inter-connections represent the dependencies among the set of variables
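To make the graphical idea concrete, here is a minimal sketch of a three-node network (rain, sprinkler, wet grass) in which the joint probability factorizes along the graph and a query is answered by enumeration. The network and its conditional probability tables are a common textbook illustration, assumed here; they are not from the talk.

```python
from itertools import product

# Hypothetical conditional probability tables.
P_RAIN = {True: 0.2, False: 0.8}
# P(sprinkler | rain), outer key is the value of rain
P_SPRINKLER = {True: {True: 0.01, False: 0.99},
               False: {True: 0.4, False: 0.6}}
# P(wet = True | sprinkler, rain), keyed by (sprinkler, rain)
P_WET_TRUE = {(True, True): 0.99, (True, False): 0.9,
              (False, True): 0.8, (False, False): 0.0}


def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) = P(rain) P(sprinkler|rain) P(wet|sprinkler, rain)."""
    p_wet = P_WET_TRUE[(sprinkler, rain)]
    return P_RAIN[rain] * P_SPRINKLER[rain][sprinkler] * (p_wet if wet else 1.0 - p_wet)


def prob_rain_given_wet():
    """P(rain = True | wet = True), summing the joint over the sprinkler node."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / denominator


print("P(rain | wet grass) = %.4f" % prob_rain_given_wet())
```

Enumeration is only practical for tiny networks; the packages discussed later use more scalable exact or sampling-based inference.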
  15. NAÏVE BAYES – A SPECIAL CASE
      • The classification node (*) is the parent node of all the other nodes.
      • Assumes all the features are conditionally independent of each other, given the class.
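The independence assumption means the class posterior is proportional to the class prior times a product of per-feature likelihoods. The sketch below implements that for categorical features with add-one smoothing; the tiny dataset, the function names and the assumption that every feature is binary-valued are hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts


def predict_proba(model, row):
    """Return normalized class posteriors for one row."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior
        for i, value in enumerate(row):
            # Add-one smoothing; the "+ 2" assumes two possible values per feature.
            score *= (feature_counts[(i, label)][value] + 1) / (n + 2)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical training data: (outlook, temperature) -> class
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "hot"), ("sunny", "hot"), ("rainy", "mild")]
labels = ["Y", "Y", "N", "N", "Y", "N"]

model = train_naive_bayes(rows, labels)
proba = predict_proba(model, ("sunny", "hot"))
print({label: round(p, 3) for label, p in proba.items()})
```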
  16. NUMERIC DATA?
      • Discretization methods
        • Numeric data ⇒ Categorical data
        • Quantile-based discretization results in an equal number of points per category.
      • The number of data breaks is crucial for fitting the model well while avoiding overfitting.
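A quantile-based discretization can be sketched with the standard library alone. The cut-point rule (taking order statistics at multiples of n / n_bins), the left-closed bins and the sample values below are assumptions made for this illustration; library implementations may place the breaks and bin edges differently.

```python
from bisect import bisect_right


def quantile_breaks(values, n_bins):
    """Return n_bins - 1 interior cut points taken from the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def discretize(values, breaks):
    """Map each value to a bin index; a value equal to a break goes to the upper bin."""
    return [bisect_right(breaks, v) for v in values]


# Hypothetical magnitudes to be turned into three categories
values = [21.4, 19.2, 20.0, 22.7, 18.5, 20.3, 21.7, 19.4, 23.1]
breaks = quantile_breaks(values, 3)
categories = discretize(values, breaks)
print(breaks)
print(categories)
```

With nine distinct values and three bins, each category receives three points, which is the equal-count property the slide describes.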
  17. MODEL EVALUATION AND USE
      • Characteristics of the network
      • Combination of multiple generated networks
      • Intelligent aid in fixing missing data
      • Predictive accuracy
      • Execution time
  18. Bayesian Networks
      • Naïve Bayes
      • Selective Naïve Bayes
      • Semi-Naïve Bayes
      • 1- or k-dependence Bayesian classifiers (Tree)
      • Markov blanket-based
      • Bayesian multinets
  19. ALGORITHM CATEGORIES
      • Constraint-based
        • Estimate from the data whether conditional independence between the variables holds, via statistical or information-theoretic measures.
        • More efficient than score-based approaches, especially when the number of samples is large
      • Scoring-based
        • Identify the network that maximizes a score function indicating how well the network fits the data.
        • Greedy, local, or heuristic search techniques, such as hill-climbing or simulated annealing
      • Hybrid methods
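To illustrate what a score function looks like in the scoring-based category, the sketch below computes a BIC-style score (maximized log-likelihood minus a complexity penalty) for two candidate structures over two binary variables. The data, the variable names and the choice of BIC are assumptions for this example; a search such as hill-climbing would repeatedly apply the single edge change that most improves a score of this kind.

```python
import math
from collections import Counter


def log_likelihood(data, child, parents):
    """Maximized log-likelihood of one discrete node given its parents."""
    joint_counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    return sum(n * math.log(n / parent_counts[pa]) for (pa, _), n in joint_counts.items())


def bic_score(data, structure, cardinality=2):
    """Score a structure given as {node: tuple of parent names}."""
    n = len(data)
    score = 0.0
    for child, parents in structure.items():
        free_params = (cardinality - 1) * cardinality ** len(parents)
        score += log_likelihood(data, child, parents) - 0.5 * free_params * math.log(n)
    return score


# Hypothetical data: A and B agree in 80 of 100 observations
data = [{"A": a, "B": b}
        for a, b, count in [(0, 0, 40), (0, 1, 10), (1, 0, 10), (1, 1, 40)]
        for _ in range(count)]

independent = {"A": (), "B": ()}
a_to_b = {"A": (), "B": ("A",)}

print("independent:", round(bic_score(data, independent), 2))
print("A -> B:     ", round(bic_score(data, a_to_b), 2))
```

On this data the structure with the edge scores higher, because the gain in likelihood outweighs the penalty for the extra parameter.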
  20. AGENDA SO FAR …
      BN
      • Graphical Bayesian “Belief” Network (BBN)
      • Prior, Likelihood and Posterior
      Python
      • BN ecosystem in Python
      R
      • BN ecosystem in R
  21. PYTHON, R AND BAYESIAN NETWORK (SEEN EARLIER IN THE PRESENTATION)
      • Python
        • NumPy
        • SciPy
        • BayesPy
        • Bayes Blocks
        • PyMC
        • Stan
        • OpenBUGS
        • BNFinder
        • …
      • R
        • bnlearn
        • BayesianNetwork (Shiny App for bnlearn)
        • RStan
        • R2WinBUGS (Bayesian Inference Using Gibbs Sampling)
        • rjags: JAGS (Just Another Gibbs Sampler)
        • BayesAB
        • …
  22. PYTHON: SCIKIT-LEARN, NUMPY, MATPLOTLIB
      # conda install scikit-learn
      from sklearn.naive_bayes import GaussianNB
      import numpy as np
      x = np.array([[-3, 7], [1, 5], [1, 2], [-2, 0], [2, 3], [-4, 0],
                    [-1, 1], [1, 1], [-2, 2], [2, 7], [-4, 1], [-2, 7]])
      y = np.array(['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'Y', 'Y'])
      # Create a Gaussian classifier
      model = GaussianNB()
      # Train the model using the training set
      model.fit(x, y)
      # Predict the output for two new points
      predicted = model.predict([[1, 2], [3, 4]])
      http://scikit-learn.org/stable/index.html
  23. PYTHON: SCIKIT-LEARN
      Gaussian Mixture Model Ellipsoids
      # X (the data) and plot_results() are defined in the scikit-learn example this slide is taken from
      from sklearn import mixture
      # Fit a Gaussian mixture with EM using five components
      gmm = mixture.GaussianMixture(n_components=5, covariance_type='full').fit(X)
      plot_results(X, gmm.predict(X), gmm.means_, gmm.covariances_, 0,
                   'Gaussian Mixture')
      # Fit a Dirichlet process Gaussian mixture using five components
      dpgmm = mixture.BayesianGaussianMixture(n_components=5, covariance_type='full').fit(X)
      plot_results(X, dpgmm.predict(X), dpgmm.means_, dpgmm.covariances_, 1,
                   'Bayesian Gaussian Mixture with a Dirichlet process prior')
  24. PYTHON: BAYESPY
      • Only variational Bayesian inference for the conjugate-exponential family (variational message passing) has been implemented
      • Python 3
      • http://www.bayespy.org/
  25. PYTHON: BAYES BLOCKS (1/2)
      • C++/Python implementation
      • http://research.ics.aalto.fi/bayes/software/examples.py
      • Standardised building blocks designed to be used with variational Bayesian learning (data + latent parameters)
      • “The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including nonlinear and variance models” - http://jmlr.csail.mit.edu/papers/v8/raiko07a.html
  26. PYTHON: BAYES BLOCKS (2/2)
      # exp, randn, PyNet, PyNodeFactory, GetMean and GetVar come from the imports
      # at the top of the linked examples.py, which the slide does not show
      # Generate the data
      data = 1.0 + exp(-0.5) * randn(1000)
      # Construct the model
      net = PyNet(1000)
      f = PyNodeFactory(net)
      c0 = f.GetConstant("const+0", 0.0)
      cm5 = f.GetConstant("const-5", -5.0)
      m = f.GetGaussian("m", c0, cm5)
      v = f.GetGaussian("v", c0, cm5)
      x = f.GetGaussianV("x", m, v)
      # Learn the model
      x.Clamp(data)
      for i in range(5):
          net.UpdateAll()
          print("%i : %f" % (i, -net.Cost()))
      # Print the results
      print("mean = %f +- %f\n var = %f +- %f" % (
          GetMean(m), GetVar(m), GetMean(v), GetVar(v)))
  27. R: CRAN TASK VIEW FOR BAYESIAN
  28. R: BNLEARN HILL-CLIMBING ALGORITHM (SCORING ALGORITHM) (MODELING PHASE)
      library(bnlearn)
      # Split the available data into training (2/3rd) and test (1/3rd) subsets
      # splitTrainTest() is a user-defined helper not shown on the slide; trData below is the training subset
      retVal <- splitTrainTest(myData, 0.67)  # training = 2/3, test = 1/3
      # Build 200 networks using the Hill-Climbing ("hc") score-based algorithm
      boot.hc.q.col <- boot.strength(data = trData, R = 200, algorithm = "hc",
                                     algorithm.args = list(score = "bde"))
      # Retain the connections that appear in 85% of the generated networks
      avg.boot.hc.q.col <- averaged.network(boot.hc.q.col, threshold = 0.85)
      # Generate the joint probability distribution
      fitted.boot.hc.q.col <- bn.fit(cextend(avg.boot.hc.q.col), data = trData)
      # Review the resulting graphical network (hlz, the highlight list, is defined elsewhere)
      p <- graphviz.plot(fitted.boot.hc.q.col, highlight = hlz, layout = "fdp")
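The "retain the connections that appear in 85% of the generated networks" step can be sketched in a few lines of Python. This is an illustration of the idea only, not bnlearn's implementation: it assumes each bootstrap network is already available as a list of directed edges, uses made-up networks, treats the threshold as inclusive, and ignores the arc-direction statistics that bnlearn also tracks.

```python
from collections import Counter


def edge_strengths(networks):
    """Fraction of networks in which each directed edge appears."""
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge: count / len(networks) for edge, count in counts.items()}


def averaged_edges(networks, threshold=0.85):
    """Keep the edges whose strength reaches the threshold."""
    strengths = edge_strengths(networks)
    return sorted(edge for edge, s in strengths.items() if s >= threshold)


# Hypothetical result of 10 bootstrap runs over variables A, B and C
networks = [[("A", "B"), ("B", "C")]] * 9 + [[("A", "B"), ("A", "C")]]

print(edge_strengths(networks))
print(averaged_edges(networks))
```

Here A→B appears in every run and B→C in nine of ten, so both survive the 0.85 threshold, while A→C (one run) is dropped.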
  29. R: BNLEARN HILL-CLIMBING ALGORITHM (SCORING ALGORITHM) (INFERENCE PHASE)
      for (obs in 1:nrow(tsData)) {
        # Build the evidence expression from every column except the target, e.g.
        # (psfMag_u=='(21.4,21.7]') & (psfMag_g=='(20,20.3]') & (psfMag_r=='(19.2,19.4]') & …
        str1 <- paste("(", names(tsData)[-zIndex], "=='",
                      sapply(tsData[obs, -zIndex], as.character), "')",
                      sep = "", collapse = " & ")
        # Sample the target node given the evidence and tabulate the proportions
        pTable <- prop.table(table(cpdist(fitted.boot.hc.q.col, nodes = "z2_f",
                                          evidence = eval(parse(text = str1)))))
        # Which outcome has max probability?
        Zest[obs] <- as.numeric(names(which.max(pTable)))
        # What is the max probability value?
        Zcertainty[obs] <- max(pTable)
      }
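The last two steps of the loop (tabulate the sampled outcomes as proportions, then take the most frequent outcome and its proportion) can be mirrored in Python as a rough sketch. The sample values are hypothetical stand-ins for draws of the target node; no sampling from a network is performed here.

```python
from collections import Counter


def most_probable(samples):
    """Return the most frequent outcome and its share of the samples."""
    counts = Counter(samples)
    value, count = counts.most_common(1)[0]
    return value, count / len(samples)


# Hypothetical draws of the target node for one test observation
samples = ["0.1"] * 6 + ["0.2"] * 3 + ["0.3"]

estimate, certainty = most_probable(samples)
print(estimate, certainty)
```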
  30. R: BAYESIANNETWORK R SHINY APP
  31. STAN
      http://mc-stan.org/
  32. SHARING EXPERIENCES …
      • Review package documentation to understand its supported functionalities and thus determine its relevance to your problem statement
      • Important to be aware of a package's support for Python 2 vs. Python 3
      • “Quick experiment in R, implement in Python”: depends on the use case
      • R Shiny application for ease of experiments, particularly for tuning of parameters with visualization to guide the process
      • Look for existing code examples before re-inventing the wheel: GitHub, CRAN, or at least Google!
      • Keep experimenting … the best way to learn!
  33. Questions? Thank you for your time!
      https://en.wikipedia.org/wiki/Bayes%27_theorem
  34. REFERENCES
      • Thomas Bayes Biography, Link: http://www-history.mcs.st-and.ac.uk/Biographies/Bayes.html (accessed 10/2/2016)
      • Lawrence D. Stone, “In Search of Air France Flight 447”, OR/MS Today (INFORMS), Vol. 38, No. 4, Link: https://www.informs.org/ORMS-Today/Public-Articles/August-Volume-38-Number-4/In-Search-of-Air-France-Flight-447 (accessed 10/7/2016)
      • FiveThirtyEight, Link: http://fivethirtyeight.com/ (accessed 10/7/2016)
      • Votamatic, Link: http://votamatic.org/ (accessed 10/7/2016)
      • CRAN Task View for Bayesian, Link: https://cran.r-project.org/web/views/Bayesian.html (accessed 10/7/2016)
      • Statisticat LLC, “Bayesian Inference”, Link: https://cran.r-project.org/web/packages/LaplacesDemon/vignettes/BayesianInference.pdf (accessed 10/7/2016)
      • Python tool BNFinder, Link: https://launchpad.net/bnfinder (accessed 10/6/2016)
      • Joseph Rickert, “R and Bayesian Statistics”, R-bloggers, 2013, Link: https://www.r-bloggers.com/r-and-bayesian-statistics/ (accessed 10/6/2016)
      • Scikit-learn example, Gaussian Mixture Model Ellipsoids, Link: http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm.html (accessed 10/6/2016)
      • Stan, Link: http://mc-stan.org/ (accessed 10/6/2016)
      • PyMC, Link: http://pymc-devs.github.io/pymc/index.html (accessed 10/6/2016)
