Bayesian Models in R 10/3/14, 13:37 
Bayesian Models in R 
Vivian Zhang | SupStat Inc. 
Copyright SupStat Inc., All rights reserved 
http://docs.supstat.com/BayesianModelEN/#1 Page 1 of 53
Outline 
1. Introduction to Bayes and Bayes' Theorem 
2. Distribution estimation 
3. Conditional probability 
4. Bayesian models 
Introduction to Bayes and Bayes' Theorem
The story behind the Bayesian model

Thomas Bayes
· 18th-century English statistician
· Best known for Bayes' Theorem
· Essential contributor to the early development of probability theory
Source: http://www.bioquest.org/products/auth_images/422_bayes.gif 
The Model

1. Models using Bayes' theorem (based on conditional probability)
· Naive Bayes, Association Rules
2. Bayes Decision Theory
· Classical Bayesian model for decision theory
3. Models implementing Bayesian thinking
· Treat all parameters as random variables, especially in hierarchical models
Distribution Estimation 
Distribution Estimation

Probability Density Function
· In statistics, the probability density function (PDF) of a continuous random variable describes the relative likelihood that the variable takes a value near a given point.
· Example: plot of the PDF of the normal distribution
Distribution Estimation

Probability Density Function
· The PDF has an important place in statistics
- It contains all the information about the random variable
· Knowing the PDF, we can calculate the
- Mean
- Variance
- Median
- etc.
Distribution Estimation

Probability Density Function
Once you obtain the PDF, you know everything about the random variable. This allows you to perform:
· Bayesian Hypothesis Tests
· Bayesian Interval Estimation
· Bayesian Regression Models
· Bayesian Logistic Models
· etc.
Distribution Estimation

Probability Density Function
Example, Bayesian regression:

Y = Xβ + ϵ, ϵ ∼ N(0, σ²)

Estimation methods for the regression model:
· OLS (Ordinary Least Squares): β̂ = (X′X)⁻¹X′Y is the estimator of β
· Bayesian: β ∼ N((X′X)⁻¹X′Y, (X′X)⁻¹)
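The OLS estimator above can be checked numerically against `lm()`; a minimal sketch on simulated data (the data, seed, and coefficients below are made up for illustration, not from the course):

```r
# Sketch: compute beta_hat = (X'X)^{-1} X'Y directly and compare with lm()
set.seed(42)                          # illustrative seed
X <- cbind(1, rnorm(100))             # design matrix with an intercept column
y <- X %*% c(2, 0.5) + rnorm(100)     # hypothetical true beta = (2, 0.5)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
all.equal(as.vector(beta_hat), unname(coef(lm(y ~ X[, 2]))))  # should be TRUE
```

The direct normal-equations solution and `lm()`'s QR-based fit agree up to numerical tolerance.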
Distribution Estimation

The Bayesian Model
· Before obtaining data, one has beliefs about the value of the proportion and models those beliefs in terms of a prior distribution.
· After data have been observed, one updates one's beliefs about the proportion by computing the posterior distribution.
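This prior-to-posterior update for a proportion can be sketched with the standard Beta-Binomial conjugate pair (an illustration, not code from the slides; the prior parameters and data below are made up):

```r
# Prior Beta(a, b) over a proportion; observe k successes in n trials
a <- 2; b <- 2                 # hypothetical prior beliefs
k <- 7; n <- 10                # hypothetical data
post_a <- a + k                # posterior is Beta(a + k, b + n - k)
post_b <- b + n - k
post_mean <- post_a / (post_a + post_b)
post_mean                      # 9/14: between the prior mean 0.5 and the sample 0.7
```

The posterior mean sits between the prior belief and the observed frequency, which is exactly the "update beliefs with data" idea above.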
Distribution Estimation

The Bayesian Model
· Building a Bayesian model begins with Bayesian thinking (every value has its own distribution).
· Steps to build a Bayesian model:
- Make inferences about the prior distribution
- Calculate the parameters of the posterior distribution
- Finish the statistical task (interval estimation, statistical decision, etc.)
Inferring from the posterior distribution
· Posterior inference is the core of Bayes' Theorem, because we do not actually know the population distribution which generated our data. We use the conditional distribution to address this gap indirectly. In this section, a certain degree of mathematical sophistication is required, without which we cannot easily implement the model computationally.
· Essentials:
- Bayes' theorem
- Conditional distribution
  - For example: ϵ in regression comes from a normal distribution
- A certain prior distribution
  - No information given
Calculating the posterior distribution
· The most difficult part is calculating the posterior distribution, which requires integration.
· Markov chain Monte Carlo (MCMC):
- Gibbs sampling
- The Metropolis-Hastings (MH) method
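As a rough illustration of the MH method, here is a minimal random-walk sketch for the mean of a normal sample (the data, prior, and proposal scale below are made up for illustration, not taken from the course):

```r
# Random-walk Metropolis-Hastings for the mean theta of a normal sample
set.seed(1)
x <- rnorm(50, mean = 3, sd = 5)            # hypothetical observed data
log_post <- function(theta) {
  sum(dnorm(x, theta, 5, log = TRUE)) +     # log-likelihood, sd assumed known
    dnorm(theta, 0, 10, log = TRUE)         # hypothetical N(0, 10^2) prior
}
n_iter <- 5000
draws <- numeric(n_iter)
theta <- 0
for (i in 1:n_iter) {
  prop <- theta + rnorm(1, 0, 1)            # symmetric random-walk proposal
  # accept with probability min(1, posterior ratio); work on the log scale
  if (log(runif(1)) < log_post(prop) - log_post(theta)) theta <- prop
  draws[i] <- theta
}
mean(draws[-(1:1000)])                      # posterior mean after burn-in
```

With a prior this flat, the chain's mean after burn-in settles near the sample mean; Gibbs sampling replaces the accept/reject step with draws from full conditional distributions.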
Conditional probability 
Conditional probability

What is conditional probability?
· The probability that event A will occur given that event B has occurred. This probability is written as P(A|B).
· P(A|B) = P(AB) / P(B)
- A and B are two events
- P(AB) is the probability that both A and B occur
- P(B) is the probability that B occurs
Conditional probability

Why conditional probability?
· Example. Suppose:
- A: the event of getting a cold
- B: the event of a rainy day (p = 0.2)
- AB: the event that it rains and you get a cold (p = 0.1)
· P(A|B) = P(AB) / P(B) = 0.1 / 0.2 = 0.5
· Interpretation:
- When it rains, the probability of getting a cold is 50%
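The arithmetic above, using the slide's numbers:

```r
# P(B) = 0.2 (rainy day), P(AB) = 0.1 (rain and a cold)
p_B  <- 0.2
p_AB <- 0.1
p_A_given_B <- p_AB / p_B
p_A_given_B   # 0.5
```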
Conditional probability

Exercise
· There are two kids in a family.
- If one of the kids is a boy, the probability that the other one is also a boy is 1/3.
- If the first one is a boy, the probability that the other one is a boy is 1/2.
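The two answers can be checked by enumerating the four equally likely (first, second) combinations; a small sketch:

```r
# All four equally likely families, ordered as (first child, second child)
families <- expand.grid(first = c("B", "G"), second = c("B", "G"))

# P(both boys | at least one boy): condition keeps BB, BG, GB
al <- families[families$first == "B" | families$second == "B", ]
p1 <- sum(al$first == "B" & al$second == "B") / nrow(al)   # 1/3

# P(second is a boy | first is a boy): condition keeps BB, BG
fb <- families[families$first == "B", ]
p2 <- sum(fb$second == "B") / nrow(fb)                     # 1/2
```

The conditioning event differs: "at least one boy" leaves three equally likely families, while "the first is a boy" leaves two.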
Conditional Probability

The model relates to conditional probability
· Apriori
- Mining association rules
- The association from A to B is defined as: A ⇒ B := P(B|A) = P(AB) / P(A)
· In R, use the arules package
Conditional Probability

Apriori
· Goal: find the items with strong relationships
· First, load the data:

library(arules)
data = read.csv("data/BASKETS1n")
names(data)

[1] cardid value pmethod sex homeown income
[7] age fruitveg freshmeat dairy cannedveg cannedmeat
[13] frozenmeal beer wine softdrink fish confectionery
Conditional Probability

Apriori
basket = data[, 8:18] 
names(basket)[which(basket[1, ] == T)] 
[1] freshmeat dairy confectionery 
tbs2 = apply(basket, 1, function(x) names(basket)[which(x==T)]) 
len = sapply(tbs2, length) 
require(arules) 
trans.code = rep(1:1000, len) 
trans.items = unname(unlist(tbs2)) 
trans.code.ind = match(trans.code, unique(trans.code)) 
trans.items.ind = match(trans.items, unique(trans.items)) 
Conditional Probability

Apriori
mat = sparseMatrix(i = trans.items.ind, 
j = trans.code.ind, 
x = 1, 
dims = c(length(unique(trans.items)), 
length(unique(trans.code)))) 
mat = as(mat, 'ngCMatrix') 
#after setting the argument we get the model: 
trans.res = apriori(mat,parameter = list(confidence=0.05, 
support=0.05, 
minlen=2,maxlen=3)) 
Conditional Probability

Apriori
parameter specification: 
confidence minval smax arem aval originalSupport support minlen maxlen target ext 
0.05 0.1 1 none FALSE TRUE 0.05 2 3 rules FALSE 
algorithmic control: 
filter tree heap memopt load sort verbose 
0.1 TRUE TRUE FALSE TRUE 2 TRUE 
apriori - find association rules with the apriori algorithm 
version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt 
set item appearances ...[0 item(s)] done [0.00s]. 
set transactions ...[11 item(s), 940 transaction(s)] done [0.00s]. 
sorting and recoding items ... [11 item(s)] done [0.00s]. 
creating transaction tree ... done [0.00s]. 
checking subsets of size 1 2 3 done [0.00s]. 
writing ... [108 rule(s)] done [0.00s].
Conditional Probability

· At last, we have the items with the strongest relationships in one basket:
#let's see these rules: 
lhs.generic = unique(trans.items)[trans.res@lhs@data@i+1] 
rhs.generic = unique(trans.items)[trans.res@rhs@data@i+1] 
cbind(lhs.generic, rhs.generic)[1:10, ] 
lhs.generic rhs.generic 
[1,] dairy confectionery 
[2,] confectionery dairy 
[3,] dairy fish 
[4,] fish dairy 
[5,] dairy fruitveg 
[6,] fruitveg dairy 
[7,] dairy frozenmeal 
[8,] frozenmeal dairy 
[9,] freshmeat confectionery 
[10,] confectionery freshmeat 
Conditional Probability

The model relates to conditional probability
· Naive Bayes
- Used in recommendation systems and classification problems
- Compute the posterior probability P(C|A1, A2, …, An) for all values of C using Bayes' theorem:
  P(C|A1A2⋯An) = P(A1A2⋯An|C) × P(C) / P(A1A2⋯An)
- Choose the value of C that maximizes P(C|A1, A2, …, An)
- Equivalent to choosing the value of C that maximizes P(A1A2⋯An|C) × P(C)
Naive Bayes

library(e1071)  # naiveBayes() is provided by the e1071 package
data(iris)
m = naiveBayes(Species ~ ., data=iris)
## alternatively:
m = naiveBayes(iris[, -5], iris[, 5])
Naive Bayes
Model: 
m 
Naive Bayes Classifier for Discrete Predictors 
Call: 
naiveBayes.default(x = iris[, -5], y = iris[, 5]) 
A-priori probabilities: 
iris[, 5] 
setosa versicolor virginica 
0.33333 0.33333 0.33333 
Conditional probabilities: 
Sepal.Length 
iris[, 5] [,1] [,2] 
setosa 5.006 0.35249
Naive Bayes
Predict: 
table(predict(m, iris), iris[,5]) 
setosa versicolor virginica 
setosa 50 0 0 
versicolor 0 47 3 
virginica 0 3 47 
From conditional probability to Bayes' Theorem
· We have:
  P(B|A) = P(AB) / P(A)
· So:
  P(AB) = P(B|A)P(A)
· Substituting into the conditional probability:
  P(A|B) = P(AB) / P(B) = P(B|A)P(A) / P(B)
Bayes' Theorem

P(A|B) = P(B|A)P(A) / P(B)

· Bayes' theorem relates the conditional probability to the marginal distribution of a random variable.
· Bayes' theorem tells us how to update our beliefs after obtaining new data.
· Harold Jeffreys claimed that Bayes' theorem is to statistics as the Pythagorean theorem is to geometry.
Bayes' Theorem

Continuous situation
· The Bayes' theorem above is in discrete form
· In the real world, we often analyze continuous random variables
· Bayes' theorem can be written in continuous form as:
  π(θ|x) = f(x|θ)π(θ) / m(x)
Bayes' Theorem

Continuous form

π(θ|x) = f(x|θ)π(θ) / m(x)

· Here:
- θ is an unknown parameter
- x is the observed data
- The update goes from π(θ) to π(θ|x)
- That is, from our original knowledge of θ to the situation after we observe x
Bayes' Theorem

Continuous form

π(θ|x) = f(x|θ)π(θ) / m(x)

· Based on the properties of continuous random variables, this can be written as:
  π(θ|x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ)dθ
Bayes' Theorem

Continuous form
· Important distributions:
  π(θ|x) = f(x|θ)π(θ) / m(x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ)dθ
· π(θ): prior distribution
· π(θ|x): posterior distribution
Bayes' Theorem

Continuous form
· Other distributions:
  π(θ|x) = f(x|θ)π(θ) / m(x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ)dθ
· m(x) = ∫ f(x|θ)π(θ)dθ: marginal distribution
· f(x|θ)π(θ) = f(x, θ): joint distribution
Bayesian Models 
Bayesian Models

Bayesian thinking
data(iris) 
head(iris) 
Sepal.Length Sepal.Width Petal.Length Petal.Width Species 
1 5.1 3.5 1.4 0.2 setosa 
2 4.9 3.0 1.4 0.2 setosa 
3 4.7 3.2 1.3 0.2 setosa 
4 4.6 3.1 1.5 0.2 setosa 
5 5.0 3.6 1.4 0.2 setosa 
6 5.4 3.9 1.7 0.4 setosa 
· Data are random variables with a mean of μ 
Bayesian Models

Bayesian thinking
· The frequency perspective: The mean μ is a constant 
colMeans(iris[, 1:3]) 
Sepal.Length Sepal.Width Petal.Length 
5.8433 3.0573 3.7580 
Bayesian Models

Bayesian thinking
· The Bayesian perspective: The mean μ is a random variable 
PROB SEPAL LENGTH SEPAL WIDTH PETAL LENGTH 
90% 5.843333 3.057333 3.758000 
10% Others Others Others 
Bayesian Models
· In fact, nearly all modern Bayesian modeling uses Bayesian thinking
· Nearly all statistical models can be implemented as Bayesian-form models
· Even some non-parametric models can be transformed to Bayesian versions:
- Bayes clustering
- Bayes regression (Logit, Probit, Tobit, Quantile, LASSO...)
- Bayes neural networks
- Non-parametric Bayes
- Hierarchical models
- etc.
Bayesian Modeling Example

Question
· For a sample from a normal distribution, we want to know the mean of the sample: X1, X2, …, Xn ∼ N(θ, σ²)
· Frequentists think θ is a constant: θ̂ = mean(x)
· Bayesians think θ is a random variable with a distribution; suppose θ ∼ N(μ, τ²)
- Infer the posterior distribution
- Calculate the posterior distribution
- Estimate the mean of the sample
Bayesian Modeling Example

Inference
· Infer the posterior distribution using Bayes' Theorem in continuous form:
  π(θ|x) = f(x|θ)π(θ) / m(x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ)dθ
· Put the distributions into the theorem to calculate the posterior distribution:
- Prior distribution: θ ∼ N(μ, τ²)
- Conditional distribution: x|θ ∼ N(θ, σ²)
Bayesian Modeling Example

Inference
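The derivation originally shown on this slide did not survive extraction. The standard normal-normal conjugate result it relies on (consistent with the posterior-mean code on the following slide, for a single observation x) is:

```latex
\pi(\theta \mid x) \;\propto\; f(x \mid \theta)\,\pi(\theta)
\;\propto\; \exp\!\Big(-\frac{(x-\theta)^2}{2\sigma^2}\Big)
            \exp\!\Big(-\frac{(\theta-\mu)^2}{2\tau^2}\Big)
\;\Rightarrow\;
\theta \mid x \;\sim\; N\!\Big(\frac{\sigma^2\mu + \tau^2 x}{\sigma^2+\tau^2},\;
                              \frac{\sigma^2\tau^2}{\sigma^2+\tau^2}\Big)
```

Completing the square in θ inside the combined exponent yields the normal posterior above.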
Bayesian Modeling Example

Calculating the posterior distribution
· From the derivation, we know the posterior mean and variance of θ for a normal distribution:

postDis = function(miu=2, tau=4, n=100) {
  x = rnorm(n, 3, 5)
  a = list(0)
  a[[1]] = (var(x)*miu + tau^2*mean(x)) / (var(x) + tau^2)  # posterior mean
  a[[2]] = var(x)*tau^2 / (var(x) + tau^2)                  # posterior variance
  a
}
postDis(3, 5, 1000)
postDis(3, 5, 1000) 
[[1]] 
[1] 2.9284 
[[2]] 
[1] 12.254 
Bayesian Modeling Example

Estimating the mean
· In classical statistics, the MLE and moment estimators of μ for a normal distribution are the sample mean.
· For the Bayesian posterior distribution:
- The posterior maximum likelihood estimator can be considered the MLE of the posterior distribution
- The posterior distribution is normal, too, so the parameter of its mean is:
  (σ²μ + τ²x) / (σ² + τ²)
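Plugging hypothetical numbers into this posterior-mean formula (σ, τ, μ, and the sample mean below are made up for illustration):

```r
sigma2 <- 25; tau2 <- 16   # sigma^2 = 5^2, tau^2 = 4^2 (hypothetical)
mu <- 2; xbar <- 3         # prior mean and sample mean (hypothetical)
post_mean <- (sigma2 * mu + tau2 * xbar) / (sigma2 + tau2)
post_mean                  # 98/41, between the prior mean 2 and xbar 3
```

The result is a precision-weighted compromise: a tighter prior (smaller τ²) pulls the estimate toward μ, while a tighter likelihood pulls it toward the sample mean.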
Bayesian Modeling Example

Estimating the mean
· x ∼ N(μ, σ) = N(3, 5)
- The mean is 3
· Use different prior distributions
· Observe the error in each situation
Bayesian Modeling Example
· Prior distribution: N(3, 1)

library(ggplot2)
plot_dif = function(miu=3, tau=1) {
  i = seq(100, 10000, by=10)
  set.seed(123)
  meanCompare = function(n=100, miu=3, tau=1) {
    x = rnorm(n, 3, 5)
    (var(x)*miu + tau^2*mean(x)) / (var(x) + tau^2) - 3
  }
  aa = sapply(i, meanCompare, miu=miu, tau=tau)
  bb = sapply(i, function(i) mean(rnorm(i, 3, 5)) - 3)
  g = ggplot(data.frame(i=i, a=aa, b=bb)) +
    geom_line(aes(x=i, y=b), col="blue") +
    geom_line(aes(x=i, y=a), col="red")
  print(g)
}
Bayesian Modeling Example
· Prior distribution: N(3, 1) (Bayes estimator in red, MLE in blue) 
plot_dif(3, 1) 
Bayesian Modeling Example
· Prior distribution: N(2, 1) (Bayes estimator in red, MLE in blue) 
plot_dif(2,1) 
Bayesian Modeling Example
· Prior distribution: N(2, 4) (Bayes estimator in red, MLE in blue) 
plot_dif(2,4) 
Bayesian Modeling Example
· Prior distribution: N(2, 100) (Bayes estimator in red, MLE in blue) 
plot_dif(2,100) 
Bayesian Modeling Example
1. As we can see, if the prior distribution is very accurate, the Bayes estimator is better than the ordinary estimator.
2. If the prior distribution is not accurate enough:
· a larger prior variance is better
· for a suitable variance, more data is better
Bayesian Modeling Example

Choosing the prior distribution
· When choosing a prior distribution:
- If you are sure about the model, it can improve the accuracy of the estimator
- If you are not sure, select a larger variance to protect the estimator

Bayesian models in r

  • 1.
    Bayesian Models inR 10/3/14, 13:37 Bayesian Models in R Vivian Zhang | SupStat Inc. Copyright SupStat Inc., All rights reserved http://docs.supstat.com/BayesianModelEN/#1 Page 1 of 53
  • 2.
    Bayesian Models inR 10/3/14, 13:37 Outline 1. Introduction to Bayes and Bayes' Theorem 2. Distribution estimation 3. Conditional probability 4. Bayesian models 2/53 http://docs.supstat.com/BayesianModelEN/#1 Page 2 of 53
  • 3.
    Bayesian Models inR 10/3/14, 13:37 Introduction to Bayes and Bayes' Theorem 3/53 http://docs.supstat.com/BayesianModelEN/#1 Page 3 of 53
  • 4.
    Bayesian Models inR 10/3/14, 13:37 The*story*behind*the*Bayesian*model Thomas Bayes 18th century English statistician Most known for the Bayes Theorem Essential contributor to early development of probability theory · · · Source: http://www.bioquest.org/products/auth_images/422_bayes.gif 4/53 http://docs.supstat.com/BayesianModelEN/#1 Page 4 of 53
  • 5.
    Bayesian Models inR 10/3/14, 13:37 The*Model 1. Models using Bayes' theorem (based on conditional probablity · Naive Bayes, Association Rules 2. Bayes Decision Theory · Classical Bayesian model for Decision Theory 3. Models implementing Bayesian thinking · Treat all the parameter as random variables, especially in hierarchical models 5/53 http://docs.supstat.com/BayesianModelEN/#1 Page 5 of 53
  • 6.
    Bayesian Models inR 10/3/14, 13:37 Distribution Estimation 6/53 http://docs.supstat.com/BayesianModelEN/#1 Page 6 of 53
  • 7.
    Bayesian Models inR 10/3/14, 13:37 Distribu6on*Es6ma6on Probablity Density Function In statistics, the Probablity Density Function (PDF) of a continous random variable is an output discribing this variable, which means the probability around a certain point. Example: plot of PDF of the Normal distribution · · 7/53 http://docs.supstat.com/BayesianModelEN/#1 Page 7 of 53
  • 8.
    Bayesian Models inR 10/3/14, 13:37 Distribu6on*Es6ma6on Probablity Density Function The PDF has an important place in statistics - It contains all the information in the random variable Knowing the PDF, we can calculate the · · Mean Variance Median etc. - - - - 8/53 http://docs.supstat.com/BayesianModelEN/#1 Page 8 of 53
  • 9.
    Bayesian Models inR 10/3/14, 13:37 Distribu6on*Es6ma6on Probablity Density Function Obtain the PDF, get everything from a random variable. This allows you to perform: Bayesian Hypothesis Tests Bayesian Interval Estimation Bayesian Regression Models Bayesian Logistic Models etc. · · · · · 9/53 http://docs.supstat.com/BayesianModelEN/#1 Page 9 of 53
  • 10.
    Bayesian Models inR 10/3/14, 13:37 Distribu6on*Es6ma6on Probablity Density Function ExampleBayesian Regression: Y = Xβ + ϵ, ϵ ∼ N(0, σ2 ) Estimation methods for the regression model · · - - β ∼ N((X′ X)−1X′ Y, (X′ X)−1 ) - = ( X Y βˆ OLS (Ordinary Least Squres) X′ )−1X′ is the estimator of β 10/53 http://docs.supstat.com/BayesianModelEN/#1 Page 10 of 53
  • 11.
    Bayesian Models inR 10/3/14, 13:37 Distribu6on*Es6ma6on The Bayesian Model Before obtaining data, one has beliefs about the value of the proportion and models his or her beliefs in terms of a prior distribution. After data have been observed, one updates one’s beliefs about the proportion by computing the posterior distribution. · · 11/53 http://docs.supstat.com/BayesianModelEN/#1 Page 11 of 53
  • 12.
    Bayesian Models inR 10/3/14, 13:37 Distribu6on*Es6ma6on The Bayesian Model Building a Bayesian model begins with Bayesian Thinking (every value has its own distribution). Steps to build a Bayesian model: · · Make inferences about prior distribution Calculate the parameter of the posterior distribution Finish the statistical task (interval estimationstatistical decision, etc.) - - - 12/53 http://docs.supstat.com/BayesianModelEN/#1 Page 12 of 53
  • 13.
    Bayesian Models inR 10/3/14, 13:37 Inferring*from*the*posterior*distribu6on Posterior inference is the core of Bayes' Theorem, because we do not actually know the population distribution which generated our data. We use the conditional distribution to address this gap indirectly. In this section, a certain degree of mathematical sophistication is required without which we cannot easily implement the model computationally. · Essentials: Bayes' theorem Conditional distribution - For example: ϵ in regression is from a normal distribution Certain prior distribution · · · - No information given 13/53 http://docs.supstat.com/BayesianModelEN/#1 Page 13 of 53
  • 14.
    Bayesian Models inR 10/3/14, 13:37 Calcula6ng*the*posterior*distribu6on The most difficult part is calculating the posterior distribution, which requires integration. · Markov chain Monte Carlo (MCMC) Gibbs MH method - - 14/53 http://docs.supstat.com/BayesianModelEN/#1 Page 14 of 53
  • 15.
    Bayesian Models inR 10/3/14, 13:37 Conditional probability 15/53 http://docs.supstat.com/BayesianModelEN/#1 Page 15 of 53
  • 16.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*probability What is conditional probability? · A B P(A|B) The probablity that event will occur when event has occurred. This probability is written as . P(A|B) = P(AB) P(B) A and B are two events · P(AB) · P(B) is the probability that both A and B occur. is the probability that B occurs. · 16/53 http://docs.supstat.com/BayesianModelEN/#1 Page 16 of 53
  • 17.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*probability Why conditional probability Example · Suppose A: The event of getting a cold B: The event of a rainy day (p = 0.2) AB: The event that when it rains you get a cold (p = 0.1) - - - P(AB) P(B) 0.1 0.2 P(A|B) = = = 0.5 · Interpretation: - When it rains, the probablity of getting a cold is 50% 17/53 http://docs.supstat.com/BayesianModelEN/#1 Page 17 of 53
  • 18.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*probability Exercise · There are two kids in a family. If one of the kids is a boy, the probability that the other one is also a boy is... If the first one is a boy, the probability that the other one is a boy is... , - - - 23 12 18/53 http://docs.supstat.com/BayesianModelEN/#1 Page 18 of 53
  • 19.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*Probability The model relates to conditional probability · A priori Mining associated rules The association from A to B is defined as: - - P(AB) P(A) A = B : = P(B|A) · In R, use the arules package 19/53 http://docs.supstat.com/BayesianModelEN/#1 Page 19 of 53
  • 20.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*Probability A priori Goal: find the items with strong relationships First, load the data: · · library(arules) data = read.csv(data/BASKETS1n) names(data) [1] cardid value pmethod sex homeown income [7] age fruitveg freshmeat dairy cannedveg cannedmeat [13] frozenmeal beer wine softdrink fish confectionery 20/53 http://docs.supstat.com/BayesianModelEN/#1 Page 20 of 53
  • 21.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*Probability A priori basket = data[, 8:18] names(basket)[which(basket[1, ] == T)] [1] freshmeat dairy confectionery tbs2 = apply(basket, 1, function(x) names(basket)[which(x==T)]) len = sapply(tbs2, length) require(arules) trans.code = rep(1:1000, len) trans.items = unname(unlist(tbs2)) trans.code.ind = match(trans.code, unique(trans.code)) trans.items.ind = match(trans.items, unique(trans.items)) 21/53 http://docs.supstat.com/BayesianModelEN/#1 Page 21 of 53
  • 22.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*Probability A priori mat = sparseMatrix(i = trans.items.ind, j = trans.code.ind, x = 1, dims = c(length(unique(trans.items)), length(unique(trans.code)))) mat = as(mat, 'ngCMatrix') #after setting the argument we get the model: trans.res = apriori(mat,parameter = list(confidence=0.05, support=0.05, minlen=2,maxlen=3)) 22/53 http://docs.supstat.com/BayesianModelEN/#1 Page 22 of 53
  • 23.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*Probability A priori parameter specification: confidence minval smax arem aval originalSupport support minlen maxlen target ext 0.05 0.1 1 none FALSE TRUE 0.05 2 3 rules FALSE algorithmic control: filter tree heap memopt load sort verbose 0.1 TRUE TRUE FALSE TRUE 2 TRUE apriori - find association rules with the apriori algorithm version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt set item appearances ...[0 item(s)] done [0.00s]. set transactions ...[11 item(s), 940 transaction(s)] done [0.00s]. sorting and recoding items ... [11 item(s)] done [0.00s]. creating transaction tree ... done [0.00s]. checking subsets of size 1 2 3 done [0.00s]. writing ... [108 rule(s)] done [0.00s]. 23/53 http://docs.supstat.com/BayesianModelEN/#1 Page 23 of 53
  • 24.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*Probability · At last, we have the items with the strongest relationship in one basket #let's see these rules: lhs.generic = unique(trans.items)[trans.res@lhs@data@i+1] rhs.generic = unique(trans.items)[trans.res@rhs@data@i+1] cbind(lhs.generic, rhs.generic)[1:10, ] lhs.generic rhs.generic [1,] dairy confectionery [2,] confectionery dairy [3,] dairy fish [4,] fish dairy [5,] dairy fruitveg [6,] fruitveg dairy [7,] dairy frozenmeal [8,] frozenmeal dairy [9,] freshmeat confectionery [10,] confectionery freshmeat 24/53 http://docs.supstat.com/BayesianModelEN/#1 Page 24 of 53
  • 25.
    Bayesian Models inR 10/3/14, 13:37 Condi6onal*Probability The model relates to conditional probablity · Naive Bayes Used in recommendation systemsclassification problems Compute the posterior probability for all values of C using the Bayes theorem: - - P(C|A1, A2,…, An) P(C|A1A2 ⋯An) = - Choose the value of C that maximizes P(C|A1, A2, . . . , An) - P(A1, A2, . . . , An|C)P(C) P(A1A2 ⋯An |C) × P(C) P(A1A2 ⋯An ) Equivalent to choosing the value of C that maximizes 25/53 http://docs.supstat.com/BayesianModelEN/#1 Page 25 of 53
  • 26.
    Bayesian Models inR 10/3/14, 13:37 Naive*Bayes data(iris) m = naiveBayes(Species ~ ., data=iris) ## alternatively: m = naiveBayes(iris[, -5], iris[, 5]) 26/53 http://docs.supstat.com/BayesianModelEN/#1 Page 26 of 53
  • 27.
    Bayesian Models inR 10/3/14, 13:37 Naive*Bayes Model: m Naive Bayes Classifier for Discrete Predictors Call: naiveBayes.default(x = iris[, -5], y = iris[, 5]) A-priori probabilities: iris[, 5] setosa versicolor virginica 0.33333 0.33333 0.33333 Conditional probabilities: Sepal.Length iris[, 5] [,1] [,2] setosa 5.006 0.35249 27/53 http://docs.supstat.com/BayesianModelEN/#1 Page 27 of 53
  • 28.
    Bayesian Models inR 10/3/14, 13:37 Naive*Bayes Predict: table(predict(m, iris), iris[,5]) setosa versicolor virginica setosa 50 0 0 versicolor 0 47 3 virginica 0 3 47 28/53 http://docs.supstat.com/BayesianModelEN/#1 Page 28 of 53
  • 29.
    Bayesian Models inR 10/3/14, 13:37 From*condi6onal*probablity*to*Bayes'*Theorem We have: So: Change the Conditional Prob. · P(B|A) = P(AB) P(A) · P(AB) = P(B|A)P(A) · P(AB) P(B) P(A|B) = = P(B|A)P(A) P(B) 29/53 http://docs.supstat.com/BayesianModelEN/#1 Page 29 of 53
  • 30.
    Bayesian Models inR 10/3/14, 13:37 Bayes'*Theorem P(A|B) = P(B|A)P(A) P(B) Bayes' theorem relates the conditional probablity to the marginal distribution of a random varable. Bayes' theorm can tell us how to update our thinking after obtaining new data. Harold Jeffreys has claimed that Bayes' theorem is to Statistics as the Pythagorean theorem is to geometry. · · 30/53 http://docs.supstat.com/BayesianModelEN/#1 Page 30 of 53
  • 31.
    Bayesian Models inR 10/3/14, 13:37 Bayes'*theorem Continuous situation The Bayes' theorem mentioned above is in discrete form In the real world often we are using and analyzing continuous random variables The Bayes' theorem can be written in continuous form as: · · · π(θ|x) = f (x|θ)π(θ) m(x) 31/53 http://docs.supstat.com/BayesianModelEN/#1 Page 31 of 53
  • 32.
    Bayesian Models inR 10/3/14, 13:37 Bayes'*Theorem Continous form π(θ|x) = f (x|θ)π(θ) m(x) · Here - θ is an unknown parameter - X is the data observed - Processing is from π(θ) to π(θ|x) - From the original knowledge of θ updated to the situation after we observe X 32/53 http://docs.supstat.com/BayesianModelEN/#1 Page 32 of 53
  • 33.
    Bayesian Models inR 10/3/14, 13:37 Bayes'*Theorem Continuous form π(θ|x) = f (x|θ)π(θ) m(x) · Based on the properties of continous random variables, it can be written as: π(θ|x) = f (x|θ)π(θ) ∫ f (x|θ)π(θ)dθ 33/53 http://docs.supstat.com/BayesianModelEN/#1 Page 33 of 53
  • 34.
    Bayesian Models inR 10/3/14, 13:37 Bayes'*Theorem Continuous form Important distributions: f (x|θ)π(θ) m(x) π(θ|x) = = f (x|θ)π(θ) ∫ f (x|θ)π(θ)dθ · π(θ) - Prior distribution · π(θ|x) - Posterior distribution 34/53 http://docs.supstat.com/BayesianModelEN/#1 Page 34 of 53
  • 35.
    Bayesian Models inR 10/3/14, 13:37 Bayes'*Theorem Continuous form Other distributions: f (x|θ)π(θ) m(x) π(θ|x) = = f (x|θ)π(θ) ∫ f (x|θ)π(θ)dθ · m(x) = ∫ f (x|θ)π(θ)dθ - Marginal Distribution · f (x|θ)π(θ) = f (x, θ) - Joint distribution 35/53 http://docs.supstat.com/BayesianModelEN/#1 Page 35 of 53
Bayesian Models
Bayesian Models

Bayesian thinking

data(iris)
head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

· The data are random variables with mean μ
Bayesian Models

Bayesian thinking

· The frequentist perspective: the mean μ is a constant

colMeans(iris[, 1:3])

Sepal.Length  Sepal.Width Petal.Length
      5.8433       3.0573       3.7580
Bayesian Models

Bayesian thinking

· The Bayesian perspective: the mean μ is a random variable

PROB  SEPAL LENGTH  SEPAL WIDTH  PETAL LENGTH
90%   5.843333      3.057333     3.758000
10%   Others        Others       Others
Bayesian Models

· In fact, nearly all modern Bayesian modeling uses Bayesian thinking.
· Nearly all statistical models can be implemented in Bayesian form.
· Even some non-parametric models can be transformed into Bayesian versions:
  - Bayes clustering
  - Bayes regression: Logit, Probit, Tobit, Quantile, LASSO...
  - Bayes neural networks
  - Non-parametric Bayes
  - Hierarchical models
  - etc.
Bayesian Modeling Example

Question

· We have a sample X1, X2, ..., Xn ~ N(θ, σ²) and want to estimate its mean θ.
· Frequentists think θ̂ = mean(x).
· Bayesians think θ is a random variable with its own distribution; suppose θ ~ N(μ, τ²). Then:
  - Infer the posterior distribution
  - Calculate the posterior distribution
  - Estimate the mean of the sample
Bayesian Modeling Example

Inference

· Infer the posterior distribution using Bayes' theorem in continuous form:

π(θ|x) = f(x|θ)π(θ) / m(x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ) dθ

· Substitute the distributions into the theorem to calculate the posterior:
  - Prior distribution: θ ~ N(μ, τ²)
  - Conditional distribution: x|θ ~ N(θ, σ²)
Bayesian Modeling Example

Inference
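The derivation behind this slide (apparently shown as an image in the original) is the standard normal-normal conjugate computation; a sketch for a single observation x: multiply likelihood and prior and complete the square in θ.

```latex
\pi(\theta\mid x) \propto f(x\mid\theta)\,\pi(\theta)
  \propto \exp\!\left(-\frac{(x-\theta)^2}{2\sigma^2}\right)
          \exp\!\left(-\frac{(\theta-\mu)^2}{2\tau^2}\right)
  \propto \exp\!\left(-\frac{(\theta-\hat{\mu})^2}{2\hat{\tau}^2}\right),
\qquad
\hat{\mu} = \frac{\sigma^2\mu + \tau^2 x}{\sigma^2 + \tau^2},
\quad
\hat{\tau}^2 = \frac{\sigma^2\tau^2}{\sigma^2 + \tau^2}
```

So the posterior is again a normal distribution, θ|x ~ N(μ̂, τ̂²), with the mean and variance used in the code that follows. (With n observations, σ² is replaced by σ²/n and x by the sample mean.)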
Bayesian Modeling Example

Calculating the posterior distribution

· By the theorem, the posterior of θ is again normal, so we know its mean and variance (here var(x) is plugged in for σ²):

postDis = function(miu=2, tau=4, n=100) {
    x = rnorm(n, 3, 5)
    a = list(0)
    a[[1]] = (var(x)*miu + tau^2*mean(x)) / (var(x) + tau^2)  # posterior mean
    a[[2]] = var(x)*tau^2 / (var(x) + tau^2)                  # posterior variance
    a
}
postDis(3, 5, 1000)

[[1]]
[1] 2.9284

[[2]]
[1] 12.254
Bayesian Modeling Example

Estimating the mean

· In classical statistics, the MLE and the moment estimator of the mean μ of a normal distribution are both the sample mean.
· For the Bayesian posterior distribution:
  - MLE --- the posterior maximum likelihood estimator
  - It can be viewed as the MLE of the posterior distribution
  - The posterior is also normal, so the estimator of its mean is:

(σ²μ + τ²x) / (σ² + τ²)
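The shrinkage in this estimator is easy to verify numerically. This sketch uses the slides' plug-in version of the formula (var(x) in place of σ²); the sample size, seed, and prior N(3, 1²) are assumptions for illustration:

```r
# Posterior-mean estimator vs. the sample mean, data from N(3, 5^2)
set.seed(42)
x   <- rnorm(200, mean = 3, sd = 5)
miu <- 3   # prior mean
tau <- 1   # prior standard deviation
bayes <- (var(x) * miu + tau^2 * mean(x)) / (var(x) + tau^2)
c(mle = mean(x), bayes = bayes)
# bayes is a weighted average of the prior mean miu and the sample mean,
# so it is pulled toward miu
```

Because this prior is centered on the true mean, the weighted average lands closer to 3 than the raw sample mean does; with a badly centered prior the same pull becomes bias, which the following plots illustrate.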
Bayesian Modeling Example

Estimating the mean

· x ~ N(μ, σ) = N(3, 5)
  - The true mean is 3
· We now try different prior distributions
· and observe the estimation error in each situation
Bayesian Modeling Example

· Prior distribution: N(3, 1)

library(ggplot2)
plot_dif = function(miu=3, tau=1) {
    i = seq(100, 10000, by=10)
    set.seed(123)
    meanCompare = function(n=100, miu=3, tau=1) {
        x = rnorm(n, 3, 5)
        (var(x)*miu + tau^2*mean(x)) / (var(x) + tau^2) - 3
    }
    aa = sapply(i, meanCompare, miu=miu, tau=tau)
    bb = sapply(i, function(i) mean(rnorm(i, 3, 5)) - 3)
    g = ggplot(data.frame(i=i, a=aa, b=bb)) +
        geom_line(aes(x=i, y=b), col="blue") +   # MLE error
        geom_line(aes(x=i, y=a), col="red")      # Bayes estimator error
    print(g)
}
Bayesian Modeling Example

· Prior distribution: N(3, 1) (Bayes estimator in red, MLE in blue)

plot_dif(3, 1)
Bayesian Modeling Example

· Prior distribution: N(2, 1) (Bayes estimator in red, MLE in blue)

plot_dif(2, 1)
Bayesian Modeling Example

· Prior distribution: N(2, 4) (Bayes estimator in red, MLE in blue)

plot_dif(2, 4)
Bayesian Modeling Example

· Prior distribution: N(2, 100) (Bayes estimator in red, MLE in blue)

plot_dif(2, 100)
Bayesian Modeling Example

1. As we can see, if the prior distribution is accurate, the Bayes estimator beats the ordinary estimator.
2. If the prior distribution is not accurate enough:
  · a larger prior variance works better;
  · with a suitable variance, more data works better.
Bayesian Modeling Example

Choosing the prior distribution

· When choosing a prior distribution:
  - If you are confident in the model, an informative prior can improve the accuracy of the estimator.
  - If you are not sure, choose a larger prior variance to keep the estimator robust.