Upcoming SlideShare
×

# Bayesian Core: Chapter 8

1,078 views

Published on

These are the slides for the final Chapter 8 of Bayesian Core (2007)

Published in: Education, Technology
1 Like
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total views
1,078
On SlideShare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
52
0
Likes
1
Embeds 0
No embeds

No notes for slide

### Bayesian Core: Chapter 8

1. 1. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image analysis 7 Image analysis Computer Vision and Classiﬁcation Image Segmentation 413 / 458
2. 2. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation The k-nearest-neighbor method The k-nearest-neighbor (knn) procedure has been used in data analysis and machine learning communities as a quick way to classify objects into predeﬁned groups. This approach requires a training dataset where both the class y and the vector x of characteristics (or covariates) of each observation are known. 414 / 458
3. 3. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation The k-nearest-neighbor method The training dataset (yi , xi )1≤i≤n is used by the k-nearest-neighbor procedure to predict the value of yn+1 given a new vector of covariates xn+1 in a very rudimentary manner. The predicted value of yn+1 is simply the most frequent class found amongst the k nearest neighbors of xn+1 in the set (xi )1≤i≤n . 415 / 458
4. 4. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation The k-nearest-neighbor method The classical knn procedure does not involve much calibration and requires no statistical modeling at all! There exists a Bayesian reformulation. 416 / 458
5. 5. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation vision dataset (1) vision: 1373 color pictures, described by 200 variables rather than by the whole table of 500 × 375 pixels Four classes of images: class C1 for motorcycles, class C2 for bicycles, class C3 for humans and class C4 for cars 417 / 458
6. 6. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation vision dataset (2) Typical issue in computer vision problems: build a classiﬁer to identify a picture pertaining to a speciﬁc topic without human intervention We use about half of the images (648 pictures) to construct the training dataset and we save the 689 remaining images to test the performance of our procedures. 418 / 458
7. 7. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation vision dataset (3) 419 / 458
8. 8. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation A probabilistic version of the knn methodology (1) Symmetrization of the neighborhood relation: If xi belongs to the k-nearest-neighborhood of xj and if xj does not belong to the k-nearest-neighborhood of xi , the point xj is added to the set of neighbors of xi Notation: i ∼k j The transformed set of neighbors is then called the symmetrized k-nearest-neighbor system. 420 / 458
9. 9. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation A probabilistic version of the knn methodology (2) 421 / 458
10. 10. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation A probabilistic version of the knn methodology (3)   exp β ICj (yℓ ) Nk  ℓ∼k i P(yi = Cj |y−i , X, β, k) =  , G exp β ICg (yℓ ) Nk  g=1 ℓ∼k i Nk being the average number of neighbours over all xi ’s, β > 0 and y−i = (y1 , . . . , yi−1 , yi+1 , . . . , yn ) and X = {x1 , . . . , xn } . 422 / 458
11. 11. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation A probabilistic version of the knn methodology (4) β grades the inﬂuence of the prevalent neighborhood class ; the probabilistic knn model is conditional on the covariate matrix X ; the frequencies n1 /n, . . . , nG /n of the classes within the training set are representative of the marginal probabilities p1 = P(yi = C1 ), . . . , pG = P(yi = CG ): If the marginal probabilities pg are known and diﬀerent from ng /n =⇒ reweighting the various classes according to their true frequencies 423 / 458
12. 12. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation Bayesian analysis of the knn probabilistic model From a Bayesian perspective, given a prior distribution π(β, k) with support [0, βmax ] × {1, . . . , K}, the marginal predictive distribution of yn+1 is P(yn+1 = Cj |xn+1 , y, X) = K βmax P(yn+1 = Cj |xn+1 , y, X, β, k)π(β, k|y, X) dβ , k=1 0 where π(β, k|y, X) is the posterior distribution given the training dataset (y, X). 424 / 458
13. 13. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation Unknown normalizing constant Major diﬃculty: it is impossible to compute f (y|X, β, k) Use instead the pseudo-likelihood made of the product of the full conditionals G f (y|X, β, k) = P(yi = Cg |y−i , X, β, k) g=1 yi =Cg 425 / 458
14. 14. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation Pseudo-likelihood approximation Pseudo-posterior distribution π(β, k|y, X) ∝ f (y|X, β, k)π(β, k) Pseudo-predictive distribution P(yn+1 = Cj |xn+1 , y, X) = K βmax P(yn+1 = Cj |xn+1 , y, X, β, k)π(β, k|y, X) dβ . k=1 0 426 / 458
15. 15. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation MCMC implementation (1) MCMC approximation is required Random walk Metropolis–Hastings algorithm Both β and k are updated using random walk proposals: (i) for β, logistic tranformation β (t) = βmax exp(θ(t) ) (exp(θ(t) ) + 1) , in order to be able to simulate a normal random walk on the θ’s, ˜ θ ∼ N (θ(t) , τ 2 ); 427 / 458
16. 16. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation MCMC implementation (2) (ii) for k, we use a uniform proposal on the r neighbors of k (t) , namely on {k (t) − r, . . . , k (t) − 1, k (t) + 1, . . . k (t) + r} ∩ {1, . . . , K}. Using this algorithm, we can thus derive the most likely class associated with a covariate vector xn+1 from the approximated probabilities, M 1 P yn+1 = l|xn+1 , y, x, (β (i) , k (i) ) . M i=1 428 / 458
17. 17. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Computer Vision and Classiﬁcation MCMC implementation (3) 2000 8 1500 Frequency 7 β 1000 6 500 5 0 0 5000 10000 15000 20000 5.0 5.5 6.0 6.5 7.0 7.5 Iterations 15000 40 30 10000 Frequency k 20 5000 10 0 0 5000 10000 15000 20000 4 6 8 10 12 14 16 Iterations βmax = 15, K = 83, τ 2 = 0.05 and r = 1 k sequences for the knn Metropolis–Hastings β based on 20, 000 iterations 429 / 458
18. 18. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Image Segmentation This underlying structure of the “true” pixels is denoted by x, while the observed image is denoted by y. Both objects x and y are arrays, with each entry of x taking a ﬁnite number of values and each entry of y taking real values. We are interested in the posterior distribution of x given y provided by Bayes’ theorem, π(x|y) ∝ f (y|x)π(x). The likelihood, f (y|x), describes the link between the observed image and the underlying classiﬁcation. 430 / 458
19. 19. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Menteith dataset (1) The Menteith dataset is a 100 × 100 pixel satellite image of the lake of Menteith. The lake of Menteith is located in Scotland, near Stirling, and oﬀers the peculiarity of being called “lake” rather than the traditional Scottish “loch.” 431 / 458
20. 20. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Menteith dataset (2) 100 80 60 40 20 20 40 60 80 100 Satellite image of the lake of Menteith 432 / 458
21. 21. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Random Fields If we take a lattice I of sites or pixels in an image, we denote by i ∈ I a coordinate in the lattice. The neighborhood relation is then denoted by ∼. A random ﬁeld on I is a random structure indexed by the lattice I, a collection of random variables {xi ; i ∈ I} where each xi takes values in a ﬁnite set χ. 433 / 458
22. 22. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Markov Random Fields Let xn(i) be the set of values taken by the neighbors of i. A random ﬁeld is a Markov random ﬁeld (MRF) if the conditional distribution of any pixel given the other pixels only depends on the values of the neighbors of that pixel; i.e., for i ∈ I, π(xi |x−i ) = π(xi |xn(i) ) . 434 / 458
23. 23. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Ising Models If pixels of the underlying (true) image x can only take two colors (black and white, say), x is then binary, while y is a grey-level image. We typically refer to each pixel xi as being foreground if xi = 1 (black) and background if xi = 0 (white). We have π(xi = j|x−i ) ∝ exp(βni,j ) , β > 0, where ni,j = ℓ∈n(i) Ixℓ =j is the number of neighbors of xi with color j. 435 / 458
24. 24. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Ising Models The Ising model is deﬁned via full conditionals exp(βni,1 ) π(xi = 1|x−i ) = , exp(βni,0 ) + exp(βni,1 ) and the joint distribution satisﬁes   π(x) ∝ exp β Ixj =xi  , j∼i where the summation is taken over all pairs (i, j) of neighbors. 436 / 458
25. 25. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Simulating from the Ising Models The normalizing constant of the Ising Model is intractable except for very small lattices I... Direct simulation of x is not possible! 437 / 458
26. 26. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Ising Gibbs Sampler Algorithm (Ising Gibbs Sampler) Initialization: For i ∈ I, generate independently (0) xi ∼ B(1/2) . Iteration t (t ≥ 1): 1 Generate u = (ui )i∈I , a random ordering of the elements of I. (t) (t) 2 For 1 ≤ ℓ ≤ |I|, update nuℓ ,0 and nuℓ ,1 , and generate (t) exp(βnuℓ ,1 ) x(t) ∼ B uℓ (t) (t) . exp(βnuℓ ,0 ) + exp(βnuℓ ,1 ) 438 / 458
27. 27. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Ising Gibbs Sampler (2) 20 60 100 20 60 100 20 60 100 20 60 100 20 60 100 20 20 20 20 20 40 40 40 40 40 60 60 60 60 60 80 80 80 80 80 100 100 100 100 100 20 60 100 20 60 100 20 60 100 20 60 100 20 60 100 20 20 20 20 20 40 40 40 40 40 60 60 60 60 60 80 80 80 80 80 100 100 100 100 100 Simulations from the Ising model with a four-neighbor neighborhood structure on a 100 × 100 array after 1, 000 iterations of the Gibbs sampler: β varies in steps of 0.1 from 0.3 to 1.2. 439 / 458
28. 28. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Potts Models If there are G colors and if ni,g denotes the number of neighbors of i ∈ I with color g (1 ≤ g ≤ G) (that is, ni,g = j∼i Ixj =g ) In that case, the full conditional distribution of xi is chosen to satisfy π(xi = g|x−i ) ∝ exp(βni,g ). This choice corresponds to the Potts model, whose joint density is given by   π(x) ∝ exp β Ixj =xi  . j∼i 440 / 458
29. 29. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Simulating from the Potts Models (1) Algorithm (Potts Metropolis–Hastings Sampler) Initialization: For i ∈ I, generate independently (0) xi ∼ U ({1, . . . , G}) . Iteration t (t ≥ 1): 1 Generate u = (ui )i∈I a random ordering of the elements of I. 2 For 1 ≤ ℓ ≤ |I|, generate x(t) ∼ U ({1, xuℓ − 1, xuℓ + 1, . . . , G}) , ˜ uℓ (t−1) (t−1) 441 / 458
30. 30. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Simulating from the Potts Models (2) Algorithm (continued) (t) compute the nul ,g and (t)(t) ρl = exp(βnuℓ ,˜ )/ exp(βnuℓ ,xu ) ∧ 1 , x ℓ (t) and set xuℓ equal to xuℓ with probability ρl . ˜ 442 / 458
31. 31. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Simulating from the Potts Models (3) 20 60 100 20 60 100 20 60 100 20 60 100 20 60 100 20 20 20 20 20 40 40 40 40 40 60 60 60 60 60 80 80 80 80 80 100 100 100 100 100 20 60 100 20 60 100 20 60 100 20 60 100 20 60 100 20 20 20 20 20 40 40 40 40 40 60 60 60 60 60 80 80 80 80 80 100 100 100 100 100 Simulations from the Potts model with four grey levels and a four-neighbor neighborhood structure based on 1000 iterations of the Metropolis–Hastings sampler. The parameter β varies in steps of 0.1 from 0.3 to 1.2. 443 / 458
32. 32. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Posterior Inference (1) The prior on x is a Potts model with G categories,   1 π(x|β) = exp β Ixj =xi  , Z(β) i∈I j∼i where Z(β) is the normalizing constant. Given x, we assume that the observations in y are independent normal random variables, 1 1 f (y|x, σ 2 , µ1 , . . . , µG ) = exp − 2 (yi − µxi )2 . (2πσ 2 )1/2 2σ i∈I 444 / 458
33. 33. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Posterior Inference (2) Priors β ∼ U ([0, 2]) , µ = (µ1 , . . . , µG ) ∼ U ({µ ; 0 ≤ µ1 ≤ . . . ≤ µG ≤ 255}) , π(σ 2 ) ∝ σ −2 I]0,∞[ (σ 2 ) , the last prior corresponding to a uniform prior on log σ. 445 / 458
34. 34. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Posterior Inference (3) Posterior distribution   1 π(x, β, σ 2 , µ|y) ∝ π(β, σ 2 , µ) × exp β Ixj =xi  Z(β) i∈I j∼i 1 −1 × exp (yi − µxi )2 . (2πσ 2 )1/2 2σ 2 i∈I 446 / 458
35. 35. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Full conditionals (1)    1  P(xi = g|y, β, σ 2 , µ) ∝ exp β Ixj =g − (yi − µg )2 ,  2σ 2  j∼i can be simulated directly. ng = Ixi =g and sg = Ixi =g yi i∈I i∈I the full conditional distribution of µg is a truncated normal distribution on [µg−1 , µg+1 ] with mean sg /ng and variance σ 2 /ng 447 / 458
36. 36. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Full conditionals (2) The full conditional distribution of σ 2 is an inverse gamma distribution with parameters |I|2 /2 and i∈I (yi − µxi )2 /2. The full conditional distribution of β is such that   1 π(β|y) ∝ exp β Ixj =xi  . Z(β) i∈I j∼i 448 / 458
37. 37. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Path sampling Approximation (1) Z(β) = exp {βS(x)} , x where S(x) = i∈I j∼i Ixj =xi dZ(β) exp(βS(x)) = Z(β) S(x) dβ x Z(β) = Z(β) Eβ [S(X)] , d log Z(β) = Eβ [S(X)] . dβ 449 / 458
38. 38. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Path sampling Approximation (2) Path sampling identity β1 log {Z(β1 )/Z(β0 )} = Eβ [S(x)]dβ . β0 450 / 458
39. 39. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Path sampling Approximation (3) For a given value of β, Eβ [S(X)] can be approximated from an MCMC sequence. The integral itself can be approximated by computing the value of f (β) = Eβ [S(X)] for a ﬁnite number of values of β and ˆ approximating f (β) by a piecewise-linear function f (β) for the intermediate values of β. 451 / 458
40. 40. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Path sampling Approximation (3) 40000 35000 30000 25000 20000 15000 10000 0.0 0.5 1.0 1.5 2.0 Approximation of f (β) for the Potts model on a 100 × 100 image, a four-neighbor neighborhood, and G = 6, based on 1500 MCMC iterations after burn-in. 452 / 458
41. 41. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Posterior Approximation (1) 35.0 58.5 59.0 59.5 60.0 60.5 34.0 33.0 32.0 0 500 1000 1500 0 500 1000 1500 70.0 70.5 71.0 71.5 72.0 72.5 83.0 83.5 84.0 84.5 85.0 0 500 1000 1500 0 500 1000 1500 113.5 95.0 95.5 96.0 96.5 97.0 97.5 112.5 111.5 110.5 0 500 1000 1500 0 500 1000 1500 Dataset Menteith: Sequence of µg ’s based on 2000 iterations of the hybrid Gibbs sampler (read row-wise from µ1 to µ6 ). 453 / 458
42. 42. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Posterior Approximation (2) 100 40 80 30 60 20 40 10 20 0 0 32.0 32.5 33.0 33.5 34.0 34.5 35.0 58.5 59.0 59.5 60.0 60.5 60 50 50 40 40 30 30 20 20 10 10 0 0 70.0 70.5 71.0 71.5 72.0 72.5 83.0 83.5 84.0 84.5 85.0 100 50 80 40 60 30 40 20 20 10 0 0 95.0 95.5 96.0 96.5 97.0 97.5 110.5 111.0 111.5 112.0 112.5 113.0 113.5 Histograms of the µg ’s 454 / 458
43. 43. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Posterior Approximation (3) 120 20.5 100 20.0 80 60 19.5 40 19.0 20 18.5 0 0 500 1000 1500 18.5 19.0 19.5 20.0 20.5 1.295 1.300 1.305 1.310 1.315 1.320 100 120 80 60 40 20 0 0 500 1000 1500 1.295 1.300 1.305 1.310 1.315 1.320 Raw plots and histograms of the σ 2 ’s and β’s based on 2000 iterations of the hybrid Gibbs sampler 455 / 458
44. 44. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Image segmentation (1) Based on (x(t) )1≤t≤T , an estimator of x needs to be derived from an evaluation of the consequences of wrong allocations. Two common ways corresponding to two loss functions L1 (x, x) = Ixi =ˆi , x i∈I L2 (x, x) = Ix=b , x Corresponding estimators xM P M = arg max Pπ (xi = g|y) , ˆi i∈I, 1≤g≤G xM AP = arg max π(x|y) , x 456 / 458
45. 45. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Image segmentation (2) xM P M and xM AP not available in closed form! Approximation N xM P M ˆi = max Ix(j) =g , g∈{1,...,G} i j=1 based on a simulated sequence, x(1) , . . . , x(N ) . 457 / 458
46. 46. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Image analysis Image Segmentation Image segmentation (3) 100 80 60 40 20 20 40 60 80 100 100 80 60 40 20 20 40 60 80 100 (top) Segmented image based on the MPM estimate produced after 2000 iterations of the Gibbs sampler and (bottom) the observed image 458 / 458