A Unifying Review of Linear Gaussian Models [1]
Sam Roweis, Zoubin Ghahramani
Feynman Liang
Application #: 10342444
November 11, 2014
[1] Roweis, Sam, and Zoubin Ghahramani. "A Unifying Review of Linear Gaussian Models." Neural Computation 11.2 (1999): 305-345. Print.
F. Liang Linear Gaussian Models Nov 2014 1 / 18
Motivation
Many superficially disparate models...
Figure: (a) Factor Analysis, (b) PCA, (c) Mixture of Gaussians, (d) Hidden Markov Models
F. Liang Linear Gaussian Models Nov 2014 2 / 18
Outline
Basic model
Inference and learning problems
EM algorithm
Various specializations of the basic model:
  Continuous state, A = 0:
    R diagonal: Factor Analysis
    R = εI: SPCA
    R = lim_{ε→0} εI: PCA
  Continuous state, A ≠ 0: Kalman Filter
  Discrete state, A = 0:
    Gaussian Mixture Model
    R = lim_{ε→0} εR_0: Vector Quantization (1-NN)
  Discrete state, A ≠ 0: HMM
F. Liang Linear Gaussian Models Nov 2014 3 / 18
The Basic (Generative) Model
Goal: Model P({x_t}_{t=1}^τ, {y_t}_{t=1}^τ)
Assumptions:
  Linear dynamics, additive Gaussian noise
    x_{t+1} = A x_t + w,  w ~ N(0, Q)
    y_t = C x_t + v,      v ~ N(0, R)
    wlog E[w] = E[v] = 0
  Markov property
  Time homogeneity
Figure: The Basic Model as a DBN (x_t → x_{t+1} through A plus noise w; x_t → y_t through C plus noise v)
P({x_t}_{t=1}^τ, {y_t}_{t=1}^τ) = P(x_1) ∏_{t=1}^{τ−1} P(x_{t+1} | x_t) ∏_{t=1}^{τ} P(y_t | x_t)
F. Liang Linear Gaussian Models Nov 2014 4 / 18
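To make the generative model concrete, here is a minimal sampling sketch (my own illustration, not from the slides; the dimensions and the particular A, C, Q, R values are arbitrary assumptions):

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative parameters: 2-d state, 3-d observations (assumed for this sketch only).
    A = np.array([[0.9, 0.1], [0.0, 0.8]])    # state dynamics
    C = rng.standard_normal((3, 2))           # observation matrix
    Q = 0.1 * np.eye(2)                       # state noise covariance
    R = 0.2 * np.eye(3)                       # observation noise covariance
    tau = 50                                  # sequence length

    x = np.zeros((tau, 2))
    y = np.zeros((tau, 3))
    x[0] = rng.multivariate_normal(np.zeros(2), np.eye(2))   # x_1 ~ N(mu_1, Q_1); here mu_1 = 0, Q_1 = I
    for t in range(tau):
        y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(3), R)          # y_t = C x_t + v
        if t + 1 < tau:
            x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(2), Q)  # x_{t+1} = A x_t + w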
Why Gaussians?
Gaussian family closed under affine transforms:
  x ~ N(μ_x, Σ_x), y ~ N(μ_y, Σ_y) independent, a, b, c ∈ R
  ⟹ ax + by + c ~ N(aμ_x + bμ_y + c, a²Σ_x + b²Σ_y)
Gaussian is conjugate prior for Gaussian likelihood:
  P(x) Normal, P(y|x) Normal ⟹ P(x|y) Normal
F. Liang Linear Gaussian Models Nov 2014 5 / 18
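A quick numerical check of the closure claim (a sketch with arbitrary illustrative values; x and y are drawn independently, as the variance formula requires):

    import numpy as np

    rng = np.random.default_rng(1)
    mu_x, var_x = 1.0, 2.0
    mu_y, var_y = -0.5, 0.5
    a, b, c = 3.0, -2.0, 4.0

    x = rng.normal(mu_x, np.sqrt(var_x), size=1_000_000)
    y = rng.normal(mu_y, np.sqrt(var_y), size=1_000_000)
    z = a * x + b * y + c

    print(z.mean(), a * mu_x + b * mu_y + c)       # empirical vs. predicted mean
    print(z.var(), a**2 * var_x + b**2 * var_y)    # empirical vs. predicted variance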
The Inference Problem
Given the system model and initial distribution ({A, C, Q, R, μ_1, Q_1}):
  Filtering: P(x_t | {y_i}_{i=1}^t)
  Smoothing: P(x_t | {y_i}_{i=1}^τ) where τ ≥ t
If we had the partition function
  P({y_i}_{i=1}^τ) = ∫ P({x_i}, {y_i}) d{x_i}   (integral over all {x_i}_{i=1}^τ)
then
  P({x_i} | {y_i}) = P({x_i}, {y_i}) / P({y_i}),
and marginalizing out all states except x_t gives the filtered and smoothed posteriors above.
F. Liang Linear Gaussian Models Nov 2014 6 / 18
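To illustrate the normalization idea, a brute-force sketch for a tiny discrete-state analogue of the model: enumerate every state sequence, normalize by the partition function, and marginalize. (Purely illustrative; this is exponential in τ and not how the efficient filtering/smoothing recursions work. All parameter values are assumptions.)

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(5)

    n, tau = 3, 4                                  # number of states, sequence length
    T = rng.dirichlet(np.ones(n), size=n)          # T[i, j] = P(x_{t+1} = j | x_t = i)
    pi1 = rng.dirichlet(np.ones(n))                # initial state distribution
    E = rng.dirichlet(np.ones(2), size=n)          # E[i, o] = P(y_t = o | x_t = i), binary observations
    y = [0, 1, 1, 0]                               # an observed sequence

    def joint(xs):
        """P({x_i} = xs, {y_i} = y)."""
        p = pi1[xs[0]] * E[xs[0], y[0]]
        for t in range(1, tau):
            p *= T[xs[t - 1], xs[t]] * E[xs[t], y[t]]
        return p

    Z = sum(joint(xs) for xs in product(range(n), repeat=tau))   # P({y_i}), the partition function
    smoothed = np.zeros(n)
    for xs in product(range(n), repeat=tau):
        smoothed[xs[1]] += joint(xs) / Z                         # P(x_2 = j | {y_i}_{i=1}^tau)
    print(smoothed)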
The Learning Problem
Let θ = {A, C, Q, R, μ_1, Q_1}, X = {x_i}_{i=1}^τ, Y = {y_i}_{i=1}^τ.
Given (several) observable sequences Y:
  argmax_θ L(θ) = argmax_θ log P(Y | θ)
Solved by expectation maximization.
F. Liang Linear Gaussian Models Nov 2014 7 / 18
Expectation Maximization
For any distribution Q on S_x:
  L(θ) ≥ F(Q, θ) = ∫_X Q(X) log P(X, Y | θ) dX − ∫_X Q(X) log Q(X) dX
                 = L(θ) − H(Q, P(X | Y, θ)) + H(Q)
                 = L(θ) − D_KL(Q || P(X | Y, θ))
Monotonically increasing coordinate ascent on F(Q, θ):
  E step: Q_{k+1} ← argmax_Q F(Q, θ_k) = P(X | Y, θ_k)
  M step: θ_{k+1} ← argmax_θ F(Q_{k+1}, θ)
F. Liang Linear Gaussian Models Nov 2014 8 / 18
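As a sanity check on the decomposition F(Q, θ) = L(θ) − D_KL(Q || P(X|Y, θ)), a small numerical sketch with a discrete latent variable (all values illustrative):

    import numpy as np

    rng = np.random.default_rng(2)

    p_xy = rng.random(4)                      # P(X = x, Y = y_obs | theta) for 4 latent states (any positive values work)
    L = np.log(p_xy.sum())                    # log P(Y = y_obs | theta)
    posterior = p_xy / p_xy.sum()             # P(X | Y = y_obs, theta)

    Q = rng.random(4)
    Q /= Q.sum()                              # an arbitrary distribution over X

    F = np.sum(Q * np.log(p_xy)) - np.sum(Q * np.log(Q))   # E_Q[log P(X, Y | theta)] + H(Q)
    kl = np.sum(Q * np.log(Q / posterior))                  # D_KL(Q || P(X | Y, theta))

    print(F, L - kl)                          # identical up to floating point
    assert np.isclose(F, L - kl)              # and F = L exactly when Q is the posterior (E step)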
Continuous-State Static Modeling
Assumptions:
  x is continuously supported
  A = 0
  x = w ~ N(0, Q) ⟹ y = Cx + v ~ N(0, CQC^T + R)
  wlog Q = I
Efficient Inference Using Sufficient Statistics: Gaussian is conjugate prior for Gaussian likelihood, so
  P(x | y) = N(βy, I − βC),  β = C^T(CC^T + R)^{-1}
Learning: R must be constrained to avoid degenerate solution...
F. Liang Linear Gaussian Models Nov 2014 9 / 18
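A small numpy sketch of this posterior computation (dimensions and parameter values are assumptions for illustration only):

    import numpy as np

    rng = np.random.default_rng(3)

    k, p = 2, 5                                    # latent and observed dimensions
    C = rng.standard_normal((p, k))                # loading matrix
    R = np.diag(rng.uniform(0.1, 0.5, size=p))     # observation noise covariance
    y = rng.standard_normal(p)                     # an observation

    beta = C.T @ np.linalg.inv(C @ C.T + R)        # beta = C^T (C C^T + R)^{-1}
    post_mean = beta @ y                           # E[x | y] = beta y
    post_cov = np.eye(k) - beta @ C                # Cov[x | y] = I - beta C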
Continuous-State Static Modeling: Factor Analysis
  y = Cx + v ~ N(0, CC^T + R)
Additional Assumption:
  R diagonal ⟹ observation noise v independent along the basis for y
Interpretation:
  R: independent noise variance along each observed dimension
  C: correlation structure among observations, captured by the latent factors
Properties:
  Scale invariant
  Not rotation invariant
F. Liang Linear Gaussian Models Nov 2014 10 / 18
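A minimal EM sketch for this factor analysis model (my own illustration of the standard updates, not code from the paper; data is assumed zero-mean and the iteration count is arbitrary):

    import numpy as np

    def factor_analysis_em(Y, k, n_iter=100):
        """EM for y = Cx + v with x ~ N(0, I), v ~ N(0, R), R diagonal.
        Y: (N, p) zero-mean data. Returns loadings C and diagonal noise covariance R."""
        N, p = Y.shape
        rng = np.random.default_rng(0)
        C = rng.standard_normal((p, k))
        R = np.eye(p)
        for _ in range(n_iter):
            # E step: posterior over latent factors for each data point.
            beta = C.T @ np.linalg.inv(C @ C.T + R)            # (k, p)
            Ex = Y @ beta.T                                    # row n is E[x_n | y_n]
            Exx = N * (np.eye(k) - beta @ C) + Ex.T @ Ex       # sum_n E[x_n x_n^T | y_n]
            # M step: re-estimate loadings and diagonal noise covariance.
            C = (Y.T @ Ex) @ np.linalg.inv(Exx)
            R = np.diag(np.diag(Y.T @ Y - C @ Ex.T @ Y) / N)
        return C, R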
Continuous-State Static Modeling: SPCA and PCA
  y = Cx + v ~ N(0, CC^T + R)
Additional Assumptions:
  R = εI, ε ∈ R (SPCA)
  For PCA: R = lim_{ε→0} εI
Interpretation:
  ε: global noise level
  Columns of C: principal components (optimizes three equivalent objectives)
Properties:
  Rotation invariant
  Not scale invariant
F. Liang Linear Gaussian Models Nov 2014 11 / 18
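For concreteness, a sketch that recovers the principal subspace directly from the sample covariance (the classical eigendecomposition route; its columns span the same subspace as C in the ε → 0 limit, but this is not code from the slides):

    import numpy as np

    def pca_components(Y, k):
        """Top-k principal components of zero-mean data Y with shape (N, p)."""
        cov = Y.T @ Y / Y.shape[0]               # sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][:k]    # indices of the top-k eigenvalues
        return eigvecs[:, order]                 # columns span the principal subspace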
Continuous-State Dynamic Modeling: Kalman Filters
Relax the A = 0 assumption.
Optimal Bayes filter assuming linearity and normality (conjugate prior)
F. Liang Linear Gaussian Models Nov 2014 12 / 18
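One predict/update step of the Kalman filter in the model's notation, as a hedged sketch (the recursion is standard; variable names and structure are my own):

    import numpy as np

    def kalman_step(mu, V, y, A, C, Q, R):
        """One filtering step: from P(x_{t-1} | y_1..y_{t-1}) = N(mu, V) to P(x_t | y_1..y_t)."""
        # Predict: push the previous posterior through the linear dynamics.
        mu_pred = A @ mu
        V_pred = A @ V @ A.T + Q
        # Update: condition on the new observation y_t = C x_t + v.
        S = C @ V_pred @ C.T + R                   # innovation covariance
        K = V_pred @ C.T @ np.linalg.inv(S)        # Kalman gain
        mu_new = mu_pred + K @ (y - C @ mu_pred)
        V_new = V_pred - K @ C @ V_pred
        return mu_new, V_new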
Discrete-State Modeling: Winner-Takes-All (WTA) Non-Linearity
Assume: x discretely supported, integrals ∫ replaced by sums Σ
Winner-Takes-All Non-Linearity: WTA[x] = e_i where i = argmax_j x_j
  x_{t+1} = WTA[A x_t + w],  w ~ N(μ, Q)
  y_t = C x_t + v,           v ~ N(0, R)
x ~ WTA[N(μ, Σ)] defines a probability vector π where
  π_i = P(x = e_i) = probability mass assigned by N(μ, Σ) to {z ∈ S_x : ∀ j ≠ i, (z)_i ≥ (z)_j}
F. Liang Linear Gaussian Models Nov 2014 13 / 18
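A Monte Carlo sketch of the probability vector induced by WTA[N(μ, Σ)], estimating each π_i by sampling (parameter values are arbitrary illustrations):

    import numpy as np

    rng = np.random.default_rng(6)

    mu = np.array([0.5, 0.0, -0.5])
    Sigma = np.eye(3)

    z = rng.multivariate_normal(mu, Sigma, size=200_000)      # draws from N(mu, Sigma)
    winners = np.argmax(z, axis=1)                            # WTA keeps the largest coordinate
    pi = np.bincount(winners, minlength=3) / len(winners)     # pi_i ≈ P(x = e_i)
    print(pi)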
Static Discrete-State Modeling: Mixture of Gaussians and Vector Quantization
  x = WTA[w],   w ~ N(μ, Q)
  y = Cx + v,   v ~ N(0, R)
Additional Assumption: A = 0
Mixture of Gaussians:
  P(y) = Σ_j P(x = e_j, y) = Σ_j π_j N(y; C_j, R)   (C_j = j-th column of C)
  All Gaussians have same covariance R
Inference:
  P(x = e_j | y) = P(x = e_j, y) / P(y) = π_j N(y; C_j, R) / Σ_i π_i N(y; C_i, R)
Vector Quantization: R = lim_{ε→0} εR_0
F. Liang Linear Gaussian Models Nov 2014 14 / 18
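A short sketch of the responsibility computation P(x = e_j | y) for the mixture (illustrative parameters; uses scipy's multivariate normal density):

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(4)

    p, k = 2, 3                                    # observation dimension, number of components
    C = rng.standard_normal((p, k))                # column j is the mean of component j
    R = 0.3 * np.eye(p)                            # covariance shared by all components
    pi = np.full(k, 1.0 / k)                       # pi_j = P(x = e_j)
    y = rng.standard_normal(p)                     # a query point

    lik = np.array([multivariate_normal.pdf(y, mean=C[:, j], cov=R) for j in range(k)])
    resp = pi * lik / np.sum(pi * lik)             # P(x = e_j | y)
    print(resp)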
Dynamic Discrete-State Modeling: Hidden Markov Models
  x_{t+1} = WTA[A x_t + w],  w ~ N(0, Q)
  y_t = C x_t + v,           v ~ N(0, R)
Theorem: Any Markov chain transition dynamics T can be equivalently modeled using A and Q in the above model, and vice versa.
All states have same emission covariance R
Learning: EM Algorithm (Baum-Welch)
Inference: Viterbi Algorithm for MAP estimate
  In discrete case, MAP estimate ≠ least-squares estimate
Approaches Kalman filter ...
F. Liang Linear Gaussian Models Nov 2014 15 / 18
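A compact Viterbi sketch for the MAP state path, written in log space (the inputs here are a transition matrix and per-step emission log-likelihoods, i.e. the conventional HMM parameterization rather than the A, Q form above; this is my own sketch, not code from the paper):

    import numpy as np

    def viterbi(log_pi, log_T, log_emit):
        """MAP state path. log_pi: (n,) initial log-probs; log_T[i, j] = log P(x_{t+1}=j | x_t=i);
        log_emit[t, j] = log P(y_t | x_t = e_j), shape (tau, n)."""
        tau, n = log_emit.shape
        delta = log_pi + log_emit[0]                      # best log-prob of any path ending in each state
        back = np.zeros((tau, n), dtype=int)
        for t in range(1, tau):
            scores = delta[:, None] + log_T               # scores[i, j]: best path ending at i, then i -> j
            back[t] = np.argmax(scores, axis=0)
            delta = np.max(scores, axis=0) + log_emit[t]
        path = np.zeros(tau, dtype=int)
        path[-1] = np.argmax(delta)
        for t in range(tau - 1, 0, -1):                   # backtrace
            path[t - 1] = back[t, path[t]]
        return path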
Conclusions
Linearity and normality ⟹ computationally tractable
Universal basic model generalizes idiosyncratic special cases and highlights relationships (e.g. static vs. dynamic, zero noise limit, hyperparameter selection)
Unified set of equations and algorithms for inference and learning
F. Liang Linear Gaussian Models Nov 2014 16 / 18
Critique / Future Work
Critique:
  Unified algorithms not the most efficient
  Can only model y with support R^p, x with support R^k or {1, ..., n}
Future Work:
  Increase hierarchy beyond two levels (e.g. Speech → n-gram → PCFG)
  Relax time homogeneity assumption (e.g. Extended Kalman Filter)
  Extend to other distributions:
    Try other (likelihood, conjugate prior) pairs
    Approximate inference (MH-MCMC)
F. Liang Linear Gaussian Models Nov 2014 17 / 18
References
S. Roweis, Z. Ghahramani. A Unifying Review of Linear Gaussian Models. Neural Computation, 11(2):305-345, 1999.
Image Attributions:
  http://www.robots.ox.ac.uk/~parg/projects/ica/riz/Thesis/Figs/var/MoG.jpeg
  https://github.com/echen/restricted-boltzmann-machines
  http://upload.wikimedia.org/wikipedia/commons/1/15/GaussianScatterPCA.png
  http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/img15.gif
  http://commons.wikimedia.org/wiki/File:Basic concept of Kalman