This document provides a unifying review of linear Gaussian models, covering static and dynamic models with both continuous and discrete hidden states. The basic models involve hidden states that evolve linearly over time with Gaussian noise, and observations of those states that are likewise corrupted by Gaussian noise. Common techniques such as Kalman filtering, expectation-maximization, and principal component analysis arise as special cases of inference and learning in these basic models. The review highlights relationships between the models and opportunities for future extensions to other distributions or mixture-state formulations.
3. The Basic Model
Basic model taxonomy:

                     Continuous state                    Discrete state
  A = 0 (static)     Factor analysis (R diagonal)        Mixture of Gaussians
                     SPCA (R = εI)                       VQ (R = lim_{ε→0} εI)
                     PCA (R = lim_{ε→0} εI)
  A ≠ 0 (dynamic)    Kalman filter                       HMM
x_{t+1} = A x_t + w_t,   w_t ~ N(0, Q)   (state evolution noise)
y_t     = C x_t + v_t,   v_t ~ N(0, R)   (observation noise)
The hidden state sequence x (dimension k) serves as a compact explanation of the complicated observation sequence y (dimension p), with k << p.
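The generative process above can be sketched in a few lines of NumPy (the dimensions k = 2, p = 5 and the particular A, C, Q, R values are illustrative assumptions, not values from the review):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: k-dimensional hidden state, p-dimensional observations, k << p.
k, p, T = 2, 5, 100

A = 0.9 * np.eye(k)              # state transition matrix
C = rng.standard_normal((p, k))  # observation (generative) matrix
Q = np.eye(k)                    # state evolution noise covariance
R = 0.1 * np.eye(p)              # observation noise covariance

x = np.zeros((T, k))
y = np.zeros((T, p))
x[0] = rng.multivariate_normal(np.zeros(k), Q)
y[0] = C @ x[0] + rng.multivariate_normal(np.zeros(p), R)
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(k), Q)  # x_{t+1} = A x_t + w_t
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)      # y_t = C x_t + v_t
```

Setting A = 0 recovers the static models in the table; A ≠ 0 gives the dynamic ones.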
4. The Basic Model: Inference
Probability computation (why Gaussian?)
Sums and linear transformations of Gaussian variables remain Gaussian, so every distribution needed for inference stays Gaussian and is fully described by its mean and covariance.
Filtering and smoothing (given initial parameters):
Total likelihood: P(y_1, …, y_τ)
Filtering: P(x_t | y_1, …, y_t)
Smoothing: P(x_t | y_1, …, y_τ), where τ > t
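For the continuous dynamic case, one step of the filtering recursion is the standard Kalman predict/update pair; a minimal sketch (variable names follow common convention, not the slides' notation):

```python
import numpy as np

def kalman_step(mu, V, y, A, C, Q, R):
    """One filtering step: p(x_t | y_1..y_t) from p(x_{t-1} | y_1..y_{t-1})."""
    # Predict: propagate the Gaussian belief through the linear dynamics.
    mu_pred = A @ mu
    V_pred = A @ V @ A.T + Q
    # Update: condition on the new observation y_t; everything stays Gaussian.
    S = C @ V_pred @ C.T + R                 # innovation covariance
    K = V_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    V_new = V_pred - K @ C @ V_pred
    return mu_new, V_new
```

Smoothing adds a backward recursion over the same Gaussian quantities.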
5. The Basic Model: Learning
Maximize the likelihood of the observed data
Use the solutions to filtering and smoothing
Expectation-Maximization (EM) algorithm:
Y – observed data
X – hidden data
θ – parameters of the model
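As a concrete instance of the E-step/M-step loop, here is EM for a toy two-component Gaussian mixture (the discrete-state static case from the taxonomy); the data, initialization, and iteration count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 1-D observations Y from two Gaussian clusters; hidden X = cluster labels.
y = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Parameters theta = (means, shared variance, mixing weights); crude initialization.
mu, var, pi = np.array([-1.0, 1.0]), 1.0, np.array([0.5, 0.5])

for _ in range(50):
    # E-step: use the inference solution -- posterior responsibilities p(x | y, theta).
    lik = pi * np.exp(-(y[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step: update theta by maximizing the expected complete-data log likelihood.
    Nk = resp.sum(axis=0)
    mu = (resp * y[:, None]).sum(axis=0) / Nk
    var = (resp * (y[:, None] - mu) ** 2).sum() / len(y)
    pi = Nk / len(y)
```

The same two-step structure carries over to every model in the taxonomy; only the inference solution used in the E-step changes.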
6. Continuous-state Static Modeling
Assume Q = I, without loss of generality
Inference: X is continuous, so computing posteriors requires integrating over x (rather than summing)
Learning: identifying the matrices C and R
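With Q = I the integral works out in closed form: the posterior over x given y is Gaussian, obtained by simple matrix algebra. A sketch (the C and R below are randomly generated placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
k, p = 2, 5
C = rng.standard_normal((p, k))
R = np.diag(rng.uniform(0.1, 0.5, p))   # diagonal observation noise

# Posterior p(x | y) is Gaussian with mean beta @ y and covariance I - beta @ C,
# where beta = C^T (C C^T + R)^{-1}  (using Q = I).
beta = C.T @ np.linalg.inv(C @ C.T + R)
post_cov = np.eye(k) - beta @ C

y = rng.standard_normal(p)
x_mean = beta @ y   # E[x | y]
```

Note that the posterior covariance does not depend on y, a hallmark of the linear Gaussian setup.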
7. Continuous-state Static Modeling: Factor Analysis
Conditions:
R – diagonal (this restriction makes the solution identifiable)
Q – identity matrix
C – factor loading matrix
Process:
Marginal distribution: y ~ N(0, C Cᵀ + R)
EM provides a unified approach for learning
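An EM loop for factor analysis, using only the sample covariance as the sufficient statistic, might be sketched as follows (dimensions, synthetic data, and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
k, p, N = 2, 6, 500

# Synthetic data from a true factor model (illustrative values).
C_true = rng.standard_normal((p, k))
Y = rng.standard_normal((N, k)) @ C_true.T + 0.3 * rng.standard_normal((N, p))
S = Y.T @ Y / N                      # sample covariance (data assumed zero-mean)

C = rng.standard_normal((p, k))      # factor loading matrix, random initialization
R = np.eye(p)                        # diagonal observation noise covariance

for _ in range(100):
    # E-step: posterior moments of the hidden factors, via beta = C^T (C C^T + R)^{-1}.
    beta = C.T @ np.linalg.inv(C @ C.T + R)
    Exx = np.eye(k) - beta @ C + beta @ S @ beta.T   # average E[x x^T | y]
    # M-step: re-estimate the loadings and the diagonal noise covariance.
    C = S @ beta.T @ np.linalg.inv(Exx)
    R = np.diag(np.diag(S - C @ beta @ S))
```

Shrinking R toward εI in this loop recovers SPCA and, in the limit, PCA, per the taxonomy table.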
12. Discrete-state Dynamic Modeling: Hidden Markov Models
State transition matrix T (assume Q = I without loss of generality; T can be modeled equivalently by A and Q)
Emission probabilities: all states share the same covariance
Initial distribution ⇐ mean μ₁
A ≠ 0 (dynamic model)
Inference: forward-backward algorithm
Learning: Baum-Welch algorithm
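The forward half of the forward-backward algorithm can be sketched for a toy two-state HMM with shared-covariance Gaussian emissions (all parameter values are made up for illustration; a practical implementation would rescale alpha at each step to avoid underflow on long sequences):

```python
import numpy as np

# Illustrative 2-state HMM with 1-D Gaussian emissions (shared unit variance).
T_mat = np.array([[0.9, 0.1],
                  [0.2, 0.8]])       # state transition matrix T
means = np.array([-1.0, 2.0])        # per-state emission means
init = np.array([0.5, 0.5])          # initial state distribution

def gauss(y, mu):
    """N(y; mu, 1) density, evaluated for each state's mean."""
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2 * np.pi)

def forward(ys):
    """alpha[i] = p(y_1..y_t, x_t = i); summing over i gives the likelihood."""
    alpha = init * gauss(ys[0], means)
    for y in ys[1:]:
        alpha = (alpha @ T_mat) * gauss(y, means)
    return alpha.sum()   # total likelihood p(y_1..y_T)
```

The backward pass has the same structure run in reverse; Baum-Welch combines the two to get the posteriors needed for the M-step.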
13. Network Interpretations and Regularization
[Figure: three-layer network — input layer (observations y), hidden layer (states x), output layer (reconstructed y); recognition weights map input to hidden, generative weights map hidden back to output.]
Factor Analysis:
Recognition weights are tied to the generative weights; learning minimizes a cost function.
Mixtures of Gaussians:
Linear recognition followed by a softmax nonlinearity; learning minimizes a cost function.
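To illustrate the mixture-of-Gaussians case: with a shared emission covariance, the exact posterior over components is a linear map of y followed by a softmax. A sketch with made-up 1-D parameters:

```python
import numpy as np

# Illustrative 1-D mixture: two components, shared unit variance.
mu = np.array([-2.0, 3.0])
pi_ = np.array([0.4, 0.6])

# With shared covariance, log p(j | y) is linear in y up to a shared constant,
# so a linear layer plus softmax computes the exact posterior.
W = mu                        # weights: R^{-1} mu_j (here R = 1)
b = np.log(pi_) - 0.5 * mu ** 2

def recognize(y):
    logits = W * y + b
    e = np.exp(logits - logits.max())
    return e / e.sum()        # softmax -> responsibilities p(x = j | y)
```

The quadratic term in y cancels across components, which is why the recognition network needs only a linear layer before the softmax.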
14. Summary & Reflection
Pros & Cons
Advantages:
Simplifies computation
Highlights relationships between models
Suggests natural extensions
Disadvantages:
The unified treatment is not the most efficient approach for any single model
Future Work
• Explore other distributions
• Develop mixture-state models
• Spatially adaptive observation noise for continuous-state models
Inference
• Input: observed data (measurements)
• Output: posterior distribution over the unobserved (hidden) states
Learning
• Initialize parameters
• E-step: use the inference solutions
• M-step: update the parameters
• Output: updated parameters
15. References
Roweis, S. and Ghahramani, Z. (1999). A Unifying Review of Linear Gaussian Models. Neural Computation, 11(2), pp. 305–345.
En.wikipedia.org (2017). Independent component analysis. Available at: https://en.wikipedia.org/wiki/Independent_component_analysis [Accessed 18 Jan. 2017].
En.wikipedia.org (2017). Kalman filter. Available at: https://en.wikipedia.org/wiki/Kalman_filter [Accessed 18 Jan. 2017].
En.wikipedia.org (2017). Kalman filter (figure: basic concept of Kalman filtering). Available at: https://en.wikipedia.org/wiki/Kalman_filter#/media/File:Basic_concept_of_Kalman_filtering.svg [Accessed 18 Jan. 2017].