Bayesian LMMs: Practical issues
Shravan Vasishth
Department of Linguistics
University of Potsdam, Germany
October 16, 2014
Comment on HW0
Think about what happens to posterior variance as sample size n
increases, for given v and σ:
$$v^* = \frac{1}{\frac{1}{v} + \frac{1}{\sigma^2/n}}$$
[Figure: posterior variance as a function of sample size, for fixed v and σ.]
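A sketch of the computation behind the figure (the values of v and σ below are arbitrary, chosen only for illustration):
> v<-1; sigma<-2  ## arbitrary illustrative values
> n<-seq(1,1000,by=1)
> ## posterior variance shrinks towards 0 as n grows:
> vstar<-1/(1/v + n/sigma^2)
> plot(n,vstar,type="l",xlab="sample size",ylab="posterior variance")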
Topics for the final lecture
Goals for today
Today I want to discuss:
1 Markov Chain Monte Carlo (MCMC) sampling
2 Model evaluation and checking model assumptions:
  1 Using lmer.
  2 Using Stan or JAGS.
  3 The Box-Cox procedure for obtaining constant variance.
3 Some more complex examples of Bayesian LMMs.
Markov Chain Monte Carlo sampling
Monte Carlo integration
It sounds fancy, but basically this amounts to sampling from a
distribution, and computing summaries like the mean.
Formally, we calculate E(f(X)) by drawing samples {X1,...,Xn}
and then approximating:
$$E(f(X)) \approx \frac{1}{n}\sum_{i=1}^{n} f(X_i) \quad (1)$$
For example:
> x<-rnorm(1000,mean=0,sd=1)
> mean(x)
[1] 0.02445578
We can make the sample size as large as we want.
We can also compute quantities like P(X < 1.96) by sampling:
> mean(x<1.96)
[1] 0.976
MCMC sampling
1 So, we can compute summary statistics using simulation if we
know the distribution we are sampling from.
2 However, if we know the form of the distribution only up to proportionality, how do we get samples to summarize? Recall:
Posterior ∝ Lik × Prior
3 Markov Chain Monte Carlo (MCMC) methods provide that capability: they allow you to sample from distributions that you only know up to proportionality.
Markov chain sampling
We have been doing non-Markov chain sampling:
> indep.samp<-rnorm(500,mean=0,sd=1)
> head(indep.samp)
[1] -0.6296954 0.3448306 0.5497537 -0.1217278 -1.1123841
The values sampled here are statistically independent.
[Figure: trace of the 500 independent samples in indep.samp.]
Markov chain
If the current value influences the next one, we have a Markov chain. Here is a simple Markov chain: the i-th draw depends on the (i−1)-th draw:
> nsim<-500
> x<-rep(NA,nsim)
> y<-rep(NA,nsim)
> x[1]<-rnorm(1) ## initialize x
> for(i in 2:nsim){
+ ## draw i-th value based on i-1-th value:
+ y[i]<-rnorm(1,mean=x[i-1],sd=1)
+ x[i]<-y[i]
+ }
Markov chain
[Figure: trace of the Markov chain y over the 500 iterations.]
Markov Chain Monte Carlo sampling
The basic idea of MCMC is that we sample from an approximate
distribution, and then correct or adjust the values.
Two simple sampling methods
Two sampling methods
Next, I will discuss two sampling methods, to give a flavor of how we can sample from a distribution.
1 Inverse sampling
2 Rejection sampling
The central problem is: given a distribution, say
$$f(x) = \frac{1}{40}(2x+3)$$
how can we sample from it?
Recall that with the normal distribution it’s easy: use rnorm.
Inverse sampling
This method works when we know the closed form of the pdf we want to simulate from and can derive the inverse of the corresponding CDF.
Steps:
1 Sample one number u from Unif(0,1). Let $u = F(z) = \int_L^z f(x)\,dx$ (here, L is the lower bound of the pdf f).
2 Then $z = F^{-1}(u)$ is a draw from f(x).
Inverse sampling
Example: Let $f(x) = \frac{1}{40}(2x+3)$, with 0 < x < 5.
[Figure: plot of f(x) over 0 < x < 5.]
Inverse sampling
1 Let $f(x) = \frac{1}{40}(2x+3)$, with 0 < x < 5.
2 We have to draw a number from the uniform distribution and then solve for z, which amounts to finding the inverse function:
$$u = \int_0^z \frac{1}{40}(2x+3)\,dx \quad (2)$$
3 Solving the integral:
$$\int_0^z \frac{1}{40}(2x+3)\,dx = \frac{1}{40}\int_0^z (2x+3)\,dx = \frac{1}{40}\left[\frac{2x^2}{2}+3x\right]_0^z = \frac{1}{40}\left[z^2+3z\right] \quad (3)$$
Inverse sampling
So, if $u = \frac{1}{40}[z^2+3z]$, then $40u = z^2+3z$.
Completing the square, we can express z in terms of u. Notice that
$$\left(z+\frac{3}{2}\right)^2 = z^2+3z+\frac{9}{4} \quad (4)$$
Therefore
$$40u+\frac{9}{4} = \left(z+\frac{3}{2}\right)^2 \quad (5)$$
Solving for z we get:
$$z = \frac{1}{2}\left(\sqrt{160u+9}-3\right) \quad (6)$$
Note: you will never need to do this yourself; this is just intended to give you a sense of how sampling works.
Inverse sampling
> u<-runif(1000,min=0,max=1)
> z<-(1/2) * (sqrt(160*u +9)-3)
This method can’t be used if we can’t find the inverse, and it can’t
be used with multivariate distributions.
Inverse sampling
> hist(z,freq=FALSE)
> fn<-function(x){(1/40) * (2*x + 3)}
> x<-seq(0,5,by=0.01)
> lines(x,fn(x))
[Figure: histogram of z, with the density f(x) overlaid.]
Rejection sampling
If $F^{-1}(u)$ can't be computed, we sample from f(x) as follows:
1 Sample a value z from a distribution g(z) from which sampling is easy, and for which
$$m\,g(z) > f(z), \quad m \text{ a constant} \quad (7)$$
$m\,g(z)$ is called an envelope function because it envelops f(z).
2 Compute the ratio
$$R = \frac{f(z)}{m\,g(z)} \quad (8)$$
3 Sample $u \sim Unif(0,1)$.
4 If R > u, then z is treated as a draw from f(x). Otherwise return to step 1.
Rejection sampling
1 For example, consider f(x) as above: $f(x) = \frac{1}{40}(2x+3)$, with 0 < x < 5. The maximum height of f(x) is 0.325.
2 So we need an envelope function that exceeds 0.325. The
uniform density Unif (0,5) has maximum height 0.2, so if we
multiply it by 2 we have maximum height 0.4, which is greater
than 0.325.
[Figure: f(x) together with the envelope function 2 × Unif(0,5).]
Rejection sampling
In the first step, we sample a number x from a uniform distribution
Unif(0,5). This serves to locate a point on the x-axis between 0
and 5 (the domain of x).
> (x<-runif(1,0,5))
[1] 4.693517
Keep in mind that even though you can’t sample from the function
f(x), you can evaluate it for any x:
> fn(x)
[1] 0.3096758
Rejection sampling
The next step involves locating a point in the y direction once the
x coordinate is fixed. If we draw a number u from Unif(0,1):
> (u<-runif(1,0,1))
[1] 0.4401601
then $m\,g(x)\,u = 2 \times 0.2 \times u$ is a number between 0 and 2 × 0.2 = 0.4:
> 0.4*u
[1] 0.176064
> 0.4*u < fn(x)
[1] TRUE
If this number is less than f(x), the y value falls under f(x), so we accept x as a draw; otherwise we reject it.
Rejection sampling
> count<-0
> k<-1 ## index for accepted draws
> j<-1 ## index for rejected draws
> accepted<-rep(NA,1000)
> rejected<-rep(NA,1000)
> while(k<1001)
+ {
+ z<-runif(1,min=0,max=5)
+ ## ratio f(z)/(m*g(z)), where m*g(z)=2*0.2:
+ r<-((1/40)*(2*z+3))/(2*.2)
+ if(r>runif(1,min=0,max=1)) {
+ accepted[k]<-z
+ k<-k+1} else {
+ rejected[j]<-z
+ j<-j+1
+ }
+ count<-count+1
+ }
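A quick check on efficiency: the acceptance rate is the number of accepted draws over the number of proposals. Since f(x) integrates to 1 and the envelope 2 × Unif(0,5) has area 2 × 0.2 × 5 = 2, we expect roughly half of the proposals to be accepted:
> ## proportion of proposals accepted (expected: about 1/2):
> 1000/count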
Rejection sampling
[Figure: histogram of the accepted samples (example of rejection sampling).]
Some limitations of rejection sampling
1 Finding an envelope function may be difficult.
2 The acceptance rate will be low if the constant m is set too high, i.e., if the envelope function sits too far above f(x), making the algorithm inefficient.
Summary so far
1 We now know what a Markov Chain is.
2 We know that there are several methods to sample from a
distribution (we saw two).
Next, I will talk about the sampling method used in JAGS (Just Another Gibbs Sampler).
Gibbs sampling
Goal: sample from a posterior distribution of some parameter(s).
Let Θ be the vector of parameter values, say $\Theta = \langle\theta_1,\theta_2\rangle$. Let j index the j-th iteration.
Algorithm:
1 Assign starting values to $\Theta = \langle\theta_1,\theta_2\rangle$:
$\Theta^{j=0} \leftarrow S$
2 Set $j \leftarrow j+1$
3 1. Sample $\theta_1^{j} \mid \theta_2^{j-1}$ (e.g., using inversion sampling).
2. Sample $\theta_2^{j} \mid \theta_1^{j}$ (e.g., using inversion sampling).
4 Return to step 2.
Simplifying a bit: eventually, we will start sampling the parameter values from the correct distribution; that is, we will be getting samples from the posterior distribution.
Of course, I didn't explain here why this works; see the Lynch book for details.
Gibbs sampling: Example
Sample two random variables from a bivariate normal, where
X ∼ N(0,1) and Y ∼ N(0,1).
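The walk graphics are not reproduced here, but the sampler itself is easy to sketch in R. Note that the slides do not state the correlation between X and Y; the value rho = 0.8 below is an assumption, chosen only so that the Gibbs walk is visibly dependent:
> ## Gibbs sampler for a bivariate normal with standard normal
> ## marginals; rho=0.8 is an assumed correlation (illustration only):
> rho<-0.8
> nsim<-5000
> x<-rep(NA,nsim)
> y<-rep(NA,nsim)
> x[1]<- -10; y[1]<- -10 ## deliberately unrealistic starting values
> for(i in 2:nsim){
+ ## conditional X|Y=y is N(rho*y, 1-rho^2):
+ x[i]<-rnorm(1,mean=rho*y[i-1],sd=sqrt(1-rho^2))
+ ## conditional Y|X=x is N(rho*x, 1-rho^2):
+ y[i]<-rnorm(1,mean=rho*x[i],sd=sqrt(1-rho^2))
+ }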
Gibbs sampling: Example
See MC walk graphic.
Gibbs sampling: Example
We can plot the "trace plot" of each density, as well as the marginal density: see the traceplot graphic.
Gibbs sampling: Example
The trace plots show the points that were sampled. The initial values were not realistic, and it could be that the first few iterations do not really yield representative samples. We discard these; the initial period is called "burn-in" (JAGS) or "warm-up" (Stan).
The two-dimensional trace plot traces the Markov chain walk (see the Markov chain walk figure).
Gibbs sampling: Example
We can also summarize the marginal distributions. One way is to
compute the 95% credible interval, and the mean or median.
That’s what we have been doing in the last few lectures.
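For example, using the samples x from the Gibbs sketch above, and discarding (say) the first 500 iterations:
> x.post<-x[-(1:500)] ## discard the initial period
> mean(x.post)
> ## 95% credible interval from the 2.5th and 97.5th percentiles:
> quantile(x.post,probs=c(0.025,0.975))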
Key point regarding MCMC sampling
1 As long as we know the posterior distribution up to
proportionality, we can sample from the posterior:
Posterior ∝ Lik × Prior
2 There are other sampling methods, such as
Metropolis-Hastings, and Hamiltonian MC (Stan uses this).
3 As end-users, we do not need to be able to program these
ourselves (except for fun).
Recall the JAGS commands
> headnoun.mod <- jags.model(
file="gwmaximal.jag",
data = headnoun.dat,
n.chains = 4,
n.adapt = 2000, quiet = T)
1 n.chains: Number of independent chains to run (always have more than one; I usually use four).
2 n.adapt: the initial adaptation period (to be discarded).
Recall the JAGS commands
headnoun.res <- coda.samples(headnoun.mod,
var = track.variables,
n.iter = 10000,
thin = 1)
1 n.iter: Number of iterations (walks) in an MCMC chain.
2 thin: Reduce the number of MCMC samples kept from the posterior (helps in plotting less dense plots). thin=1 means keep all of them; thin=2 means keep every second one, etc.
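A stored posterior sample can also be thinned after the fact, e.g., with coda's window() function:
> library(coda)
> ## keep every second iteration of the stored sample:
> headnoun.thin<-window(headnoun.res,thin=2)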
Evaluating the model
The first thing we want to do is find out whether our model has
converged: whether the MCMC sampler is sampling from the
posterior distribution.
The function gelman.diag will give you this information, but the
traceplots are also informative:
gelman.diag()
[See lecture 3, slide 20, for usage]
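For example, applied to the headnoun.res object returned by coda.samples above (gelman.diag and traceplot are both in the coda package; gelman.diag needs at least two chains):
> library(coda)
> ## potential scale reduction factors; values near 1 suggest convergence:
> gelman.diag(headnoun.res)
> ## visual check of chain mixing:
> traceplot(headnoun.res)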
Evaluating model fit
Recall the Gibson and Wu data.
We can examine quality of fit by generating predicted reading
times, and comparing them with
1 the actual data
2 new data (much better)
We present a detailed example with the Gibson and Wu data in Sorensen and Vasishth (submitted to Journal of Mathematical Psychology):
http://www.ling.uni-potsdam.de/~vasishth/statistics/BayesLMMs.html
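As a minimal sketch of option 1 above, one can generate predicted reading times from the fitted log-normal; the mu and sigma values below are hypothetical stand-ins for posterior samples, not estimates from the actual model:
> ## hypothetical posterior values, for illustration only:
> mu<-6.0
> sigma<-0.6
> rt.pred<-rlnorm(1000,meanlog=mu,sdlog=sigma)
> ## compare predicted and observed distributions, e.g.:
> ## hist(rt.pred); hist(headnoun$rt)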
Evaluating model fit
Using the log normal to model the Gibson and Wu data
The data clearly seem to be log-normally distributed:
[Figure: densities of rt (msec), four panels: subj. relative and obj. relative under a log-normal fit, and subj. relative and obj. relative under a normal fit.]
Evaluating model fit
Using the log normal to model the Gibson and Wu data
In Stan we simply need to write:
rt ~ lognormal(mu,sigma)
[Figure: the same four density panels (log-normal vs. normal fits for subj. and obj. relatives) as on the previous slide.]
Evaluating model fit
Using new data from Vasishth et al 2013
These extreme values turn up again in the subject relative condition.
Either we could model the data as a mixture of distributions, or we could remove these extreme values as irrelevant to the research question.
[Figure: histograms of reading times in the subj. relative and obj. relative conditions.]
Model assumptions
Checking model assumptions
The best model for the Gibson and Wu dataset is:
> m0<-lmer(rrt~x+(1|subj)+(1|item),headnoun)
Checking model assumptions
> library(car)
> qqPlot(residuals(m0))
[Figure: Q-Q plot of residuals(m0) against normal quantiles.]
Checking model assumptions
What would have happened if... we had used raw RTs?
> m1<-lmer(rt~x+(1|subj)+(1|item),headnoun)
> qqPlot(residuals(m1))
[Figure: Q-Q plot of residuals(m1) against normal quantiles.]
A bad consequence of ignoring model assumptions
The published result in Gibson and Wu is driven by 8 out of 547 extreme
values > 3000ms:
Fixed effects:
Estimate Std. Error t value
(Intercept) 548.43 51.56 10.638
x -120.39 48.01 -2.508
Using the transformed (reciprocal) RTs, which normalizes the residuals, leads to no effect at all:
Estimate Std. Error t value
(Intercept) -2.67217 0.13758 -19.423
x -0.07494 0.08172 -0.917
Removing the 8/547 extreme values in raw RT:
Estimate Std. Error t value
(Intercept) 494.59 34.09 14.509
x -12.66 29.94 -0.423
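A sketch of how the trimmed model might be fit (the slides do not show this step; the 3000 ms cutoff is the one stated above):
> ## remove the extreme values above 3000 ms:
> headnoun.trim<-subset(headnoun,rt<3000)
> m2<-lmer(rt~x+(1|subj)+(1|item),headnoun.trim)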
Gibson and Wu: the published result
> head.means<-aggregate(rt~subj+type,
+ mean,data=headnoun)
>
Now, after aggregation, we can run a paired t-test:
> t_test_result<-t.test(subset(head.means,type=="subj-ext")$rt,
+ subset(head.means,type=="obj-ext")$rt,paired=T)
> t_test_result$statistic;t_test_result$p.value
t
2.63007
[1] 0.01248104
The difference was significant. (They did an ANOVA, but it's the same thing, as $t^2 = F$.)
The lesson in this example
The lesson here is to
1 first examine your data graphically
2 check whether model assumptions are satisfied
Fitting models is not just about finding statistical significance; we are also
1 defining how we think the data were generated
2 generating predictions about what future data will look like.
Non-normality of residuals can lead to loss of power
I showed an example where extreme values (non-normality) led to an over-enthusiastic p-value.
Here is another consequence:
> ## number of simulations:
> nsim<-100
> ## sample size:
> n<-100
> ## predictor:
> pred<-rep(c(0,1),each=n/2)
> ## matrices for storing results:
> store<-matrix(0,nsim,4)
> storenorm<-matrix(0,nsim,4)
> ## true effect:
> beta.1<-0.5
Non-normality of residuals can lead to loss of power
> for(i in 1:nsim){
+ errors<-rlnorm(n)
+ errorsnorm<-rnorm(n)
+ #errors<-errors-mean(errors)
+ #errorsnorm<-errorsnorm-mean(errorsnorm)
+ y<-5 + beta.1*pred + errors ## generate data:
+ ynorm<-5 + beta.1*pred + errorsnorm
+ fm<-lm(y~pred)
+ fmnorm<-lm(ynorm~pred)
+ ## store coef., SE, t-value, p-value:
+ store[i,1:4] <- summary(fm)$coef[2,1:4]
+ storenorm[i,1:4] <- summary(fmnorm)$coef[2,1:4]
+ }
Non-normality of residuals can lead to loss of power
> ## observed power for raw scores:
> mean(store[,4]<0.05)
[1] 0.27
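> ## observed power for normal errors: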
> mean(storenorm[,4]<0.05)
[1] 0.64
The cost of non-normality: loss of power
[Figure: loss of power as a function of sample size ("why violating normality is bad for null results").]
Non-normality of residuals can lead to loss of power
> library(car)
> op<-par(mfrow=c(1,2),pty="s")
> qqPlot(residuals(fm))
> qqPlot(residuals(fmnorm))
[Figure: side-by-side Q-Q plots of residuals(fm) and residuals(fmnorm) against normal quantiles.]
What transform to use?
Box-Cox method to find the right transform:
> library(MASS)
> boxcox(rt~type*subj,data=headnoun)
[Figure: Box-Cox profile log-likelihood against λ, with a 95% confidence interval.]
What transform to use?
1 A λ of -1 implies a reciprocal transform.
2 A λ of 0 implies a log transform.
We generally use the reciprocal or log transform.
See Venables and Ripley 2002 for details (or Box and Cox, 1964).
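The slides do not show how rrt was constructed. A plausible construction, assuming rrt is the negative reciprocal reading time scaled by 1000 (consistent with the intercept of about −2.67 reported earlier), would be:
> ## assumption: rrt = negative reciprocal RT, scaled by 1000:
> headnoun$rrt<- -1000/headnoun$rt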
Concluding remarks
1 When we fit frequentist models, we mistakenly believe that rejecting
a null hypothesis gives us evidence for the alternative.
Rejecting a null only gives us evidence against the null, not evidence
for the alternative.
2 We don’t check model assumptions, leading to incorrect inferences.
3 We make binary decisions: we "reject" and, even worse, "accept" the null hypothesis.
The bigger problem here is that we feel like we have to make a
binary decision.
Concluding remarks
1 All this leads to a false sense of security: we say things like
“we know that X is true.”
2 The key problem is that we should be framing the discussion
in terms of how certain we are that X is true, given data.
The Bayesian toolkit provides us with a method to talk about how
sure we are about X being true.
This does not mean that we should abandon frequentist methods;
rather, we should use the methods correctly, and we should use all
methods available to us.