Comprehensive Examination
Master’s in Statistics and Analytics
Md Abul Hayat
Graduate Assistant
Electrical Engineering
April 26, 2021
Contents
• An Introduction to Locally Linear Embedding
– Objective
– Idea
– Algorithm
– Results
• Explaining Variational Approximations
– Idea
– Algorithm
– Examples
• Q&A
An Introduction to Locally Linear Embedding
Lawrence K. Saul, Sam T. Roweis
Unpublished (2000)
Available at https://cs.nyu.edu/~roweis/lle/publications.html
Locally Linear Embedding (LLE)
• Unsupervised dimension reduction technique
• Eigenvector method for nonlinear dimensionality reduction
– Both PCA and MDS are eigenvector methods
– both are designed to model linear variabilities in high-dimensional data
– their optimizations do not involve local minima
• LLE maps high-dimensional data into a single global coordinate system of lower dimensionality
LLE Algorithm
• The data consist of 𝑁 real-valued vectors 𝑋𝑖 of dimension 𝐷
• We want to minimize the reconstruction error shown below
• The number of neighbors 𝐾 to look for is fixed in advance
• The data are assumed to lie on or near a smooth nonlinear manifold of dimensionality 𝑑 ≪ 𝐷
• The embedding is obtained by choosing 𝑑-dimensional coordinates 𝑌𝑖 that minimize the embedding cost shown below
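For reference, the two cost functions referred to above, written in the paper's standard notation, are

$$\varepsilon(W) = \sum_i \Big| X_i - \sum_j W_{ij} X_j \Big|^2, \qquad \Phi(Y) = \sum_i \Big| Y_i - \sum_j W_{ij} Y_j \Big|^2,$$

where each row of weights satisfies $\sum_j W_{ij} = 1$ and $W_{ij} = 0$ whenever $X_j$ is not one of the $K$ nearest neighbors of $X_i$.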
LLE Algorithm
Courtesy: https://cs.nyu.edu/~roweis/lle/algorithm.html
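The three steps of the algorithm can be sketched in a few lines of numpy. This is a minimal illustration rather than the reference implementation: it uses a brute-force neighbor search, and the regularization of the local Gram matrix is an implementation detail added for numerical stability.

```python
import numpy as np

def lle(X, K, d, reg=1e-3):
    """Minimal sketch of the three LLE steps for an (N, D) data array X."""
    N = X.shape[0]
    # Step 1: find the K nearest neighbors of each point (brute force).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nbrs = np.argsort(dists, axis=1)[:, :K]
    # Step 2: solve the constrained least squares problem for the weights.
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[nbrs[i]] - X[i]                # neighbors centered on X_i
        C = Z @ Z.T                          # local Gram matrix (K x K)
        C += reg * np.trace(C) * np.eye(K)   # regularize in case C is singular
        w = np.linalg.solve(C, np.ones(K))
        W[i, nbrs[i]] = w / w.sum()          # enforce the sum-to-one constraint
    # Step 3: embedding from the bottom eigenvectors of M = (I - W)^T (I - W).
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    _, eigvecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    return eigvecs[:, 1:d + 1]               # discard the constant eigenvector

# Usage: Y = lle(X, K=12, d=2) for an (N, D) array X.
```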
Constrained Least Squares Problem
• Step 1:
s.t.
• Notations
• Cost Function
Constrained Least Squares Problem
• Cost Function
• Assuming,
• The cost function becomes
• Optimization
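A sketch of the derivation summarized by these bullets (the standard LLE weight solution): writing the $K$ neighbors of $X_i$ as $\eta_j$ and using the sum-to-one constraint, the local cost becomes

$$\varepsilon_i(W) = \sum_{jk} W_{ij} W_{ik} C_{jk}, \qquad C_{jk} = (X_i - \eta_j) \cdot (X_i - \eta_k),$$

and a Lagrange multiplier for the constraint $\sum_j W_{ij} = 1$ gives the closed-form optimum

$$W_{ij} = \frac{\sum_k C^{-1}_{jk}}{\sum_{lm} C^{-1}_{lm}}.$$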
Eigenvector Problem
• Step 2
• Notation
– 𝑊𝑖 is the i-th column of the 𝑛 × 𝑛 weight matrix 𝑊
– 𝐼𝑖 is the i-th column of the 𝑛 × 𝑛 identity matrix 𝐼
• Using this notation
Eigenvector Problem
• This gives
• Replacing 𝑀
• The solution 𝑌 consists of the 𝑑 eigenvectors of 𝑀 corresponding to its 2nd through (𝑑 + 1)-th smallest eigenvalues
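In this notation the embedding cost becomes a quadratic form (the standard LLE result the slide refers to):

$$\Phi(Y) = \sum_{ij} M_{ij}\,(Y_i \cdot Y_j), \qquad M = (I - W)^\top (I - W),$$

minimized subject to the constraints $\sum_i Y_i = 0$ and $\tfrac{1}{N}\sum_i Y_i Y_i^\top = I$. The bottom eigenvector of 𝑀 is the constant vector with eigenvalue 0 and is discarded, which is why the 2nd through (𝑑 + 1)-th eigenvectors are kept.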
Results
Results
Explaining Variational Approximations
John T. Ormerod, Matt P. Wand
The American Statistician (2010)
Introduction
• Variational approximations facilitate approximate inference for the
parameters in complex statistical models and provide fast, deterministic
alternatives to Monte Carlo methods
• Variational approximations are limited in their approximation accuracy
– as opposed to MCMC, which can be made very accurate
• This paper does not discuss the quality of variational approximations
• Variational approximations can be useful for both likelihood-based and
Bayesian inference
• Topics
– Section 2: Density transform approach
– Section 3: Tangent transform approach
– Section 4: The same ideas in a frequentist context
Density Transform Approach
• Consider a generic Bayesian model with parameter vector 𝜃 ∈ Θ and
observed data vector 𝒚
• Posterior density function
• The denominator 𝑝(𝒚) is known as the marginal likelihood
– model evidence in the Computer Science literature
• Let 𝑞 be an arbitrary density function over Θ
Density Transform Approach
equality if and only if 𝑞(𝜽) = 𝑝(𝜽|𝒚)
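The identity behind this slide, written out in the paper's notation, is

$$\log p(\boldsymbol{y}) = \int q(\boldsymbol{\theta}) \log\left\{\frac{p(\boldsymbol{y},\boldsymbol{\theta})}{q(\boldsymbol{\theta})}\right\} d\boldsymbol{\theta} + \int q(\boldsymbol{\theta}) \log\left\{\frac{q(\boldsymbol{\theta})}{p(\boldsymbol{\theta}\mid\boldsymbol{y})}\right\} d\boldsymbol{\theta} \;\geq\; \int q(\boldsymbol{\theta}) \log\left\{\frac{p(\boldsymbol{y},\boldsymbol{\theta})}{q(\boldsymbol{\theta})}\right\} d\boldsymbol{\theta},$$

since the second term is the Kullback–Leibler divergence between 𝑞(𝜽) and 𝑝(𝜽|𝒚), which is nonnegative and zero exactly when the two densities coincide.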
Density Transform Approach
• Exponential of Evidence Lower-bound (ELBO)
The key idea of the density-transform-based variational approach is:
• Approximation of the posterior density 𝑝(𝜽|𝒚) by a 𝑞(𝜽) for which 𝑝(𝒚; 𝑞) is
more tractable than 𝑝(𝒚)
• Tractability is achieved by restricting 𝑞 to a more manageable class of
densities and then maximizing 𝑝(𝒚; 𝑞) over that class
• Maximization of 𝑝(𝒚; 𝑞) is equivalent to minimization of the Kullback–Leibler
divergence between 𝑞 and 𝑝(· |𝒚)
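Here $p(\boldsymbol{y}; q)$ is the exponentiated lower bound,

$$p(\boldsymbol{y}; q) \equiv \exp\int q(\boldsymbol{\theta}) \log\left\{\frac{p(\boldsymbol{y},\boldsymbol{\theta})}{q(\boldsymbol{\theta})}\right\} d\boldsymbol{\theta} \;\leq\; p(\boldsymbol{y}),$$

so maximizing $p(\boldsymbol{y}; q)$ over a class of densities is the same as driving $q$ as close to the posterior as that class allows, in the Kullback–Leibler sense.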
Density Transform Approach
• The most common restrictions for the 𝑞 density are:
– 𝑞(𝜽) factorizes into Π_{𝑖=1}^{𝑀} 𝑞𝑖(𝜽𝑖) for some partition {𝜽1, … , 𝜽𝑀} of 𝜽
• Product density transform
• Mean field approximation (variational Bayes)
• A nonparametric restriction
– 𝑞 is a member of a parametric family of density functions
• Parametric density transform
• Depending on the Bayesian model at hand, both restrictions can have minor or major impacts on the resulting inference
Product Density Transforms
• ELBO under product density transform
• We also define
Product Density Transforms
• ELBO under product density transform becomes
• From Result 1
• The optimal 𝑞1 is then
Product Density Transforms
• Repeating the same argument when maximizing over each 𝑞𝑖 gives the optimal densities below
• Here E−𝜃𝑖 denotes expectation with respect to the density Π𝑗≠𝑖 𝑞𝑗(𝜃𝑗)
• The key point is that the expectation is taken over every 𝑞𝑗 with 𝑗 ≠ 𝑖, not over 𝑞𝑖 itself
• An equivalent expression can be written in terms of the full conditionals (see below)
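The optimal densities referred to above take the standard mean-field form (Result 1 of the paper, up to notation):

$$q_i^*(\boldsymbol{\theta}_i) \;\propto\; \exp\big\{E_{-\boldsymbol{\theta}_i} \log p(\boldsymbol{y}, \boldsymbol{\theta})\big\} \;\propto\; \exp\big\{E_{-\boldsymbol{\theta}_i} \log p(\boldsymbol{\theta}_i \mid \text{rest})\big\}, \qquad i = 1, \dots, M,$$

where "rest" denotes the data together with all parameters other than 𝜽𝑖, so the second form is the full-conditional expression.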
Algorithm: Product Density Transforms
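A paraphrase of the iterative scheme this slide presents (the paper's coordinate-ascent algorithm): initialize $q_2, \dots, q_M$; then cycle over $i = 1, \dots, M$, setting

$$q_i(\boldsymbol{\theta}_i) \;\leftarrow\; \frac{\exp\{E_{-\boldsymbol{\theta}_i} \log p(\boldsymbol{y}, \boldsymbol{\theta})\}}{\int \exp\{E_{-\boldsymbol{\theta}_i} \log p(\boldsymbol{y}, \boldsymbol{\theta})\}\, d\boldsymbol{\theta}_i},$$

and repeat until the increase in $\log p(\boldsymbol{y}; q)$ becomes negligible; each cycle can only increase the lower bound.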
Example 1: Normal Random Sample
• A random independent sample 𝑋𝑖 from a normal distribution with parameters 𝜃 = {𝜇, 𝜎²}
• The product density transform approximation to 𝑝(𝜇, 𝜎² | 𝒙) is
• The optimal densities take the form
Example 1: Normal Random Sample
• Standard manipulations lead to
• Here 𝒙 = (𝑋1, … , 𝑋𝑛)ᵀ and 𝑋̄ = (𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛)/𝑛
Example 1: Normal Random Sample
• Optimal densities
• Also
• ELBO
Example 1: Normal Random Sample
• Algorithm and result
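The coordinate-ascent updates for this example can be sketched directly. The sketch below assumes the conjugate priors 𝜇 ∼ N(𝜇₀, 𝜎₀²) and 𝜎² ∼ Inverse-Gamma(A, B); the variable names and hyperparameter values are illustrative, and a fixed iteration count stands in for monitoring the lower bound.

```python
import numpy as np

def vb_normal_sample(x, mu0=0.0, sigma0_sq=1e8, A=0.01, B=0.01, n_iter=100):
    """Mean-field (product density) updates for a normal random sample.

    Returns the parameters of q*(mu) = N(mu_q, sigma_q_sq) and
    q*(sigma^2) = Inverse-Gamma(A_q, B_q).
    """
    n, xbar = len(x), np.mean(x)
    A_q = A + n / 2.0                 # shape of q*(sigma^2) never changes
    B_q = B + 1.0                     # any positive starting value
    for _ in range(n_iter):
        # Update q*(mu): normal with this variance and mean.
        sigma_q_sq = 1.0 / (n * A_q / B_q + 1.0 / sigma0_sq)
        mu_q = sigma_q_sq * (n * xbar * A_q / B_q + mu0 / sigma0_sq)
        # Update q*(sigma^2): inverse-gamma rate.
        B_q = B + 0.5 * (np.sum((x - mu_q) ** 2) + n * sigma_q_sq)
    return mu_q, sigma_q_sq, A_q, B_q

# Usage with hypothetical simulated data:
x = np.random.default_rng(1).normal(loc=2.0, scale=1.5, size=200)
print(vb_normal_sample(x))
```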
Example 2: Linear Mixed Model
• Bayesian Gaussian Linear Mixed Model
– 𝒀 and 𝜷 are 𝑛 × 1 and 𝑝 × 1 vectors, respectively
– Variance component model
– Conjugate priors
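For concreteness, the two-variance-component Gaussian linear mixed model these bullets describe has the general form (a hedged reconstruction in standard notation; hyperparameter names are illustrative):

$$\boldsymbol{y} \mid \boldsymbol{\beta}, \boldsymbol{u}, \sigma^2_{\varepsilon} \sim N(\boldsymbol{X\beta} + \boldsymbol{Zu},\; \sigma^2_{\varepsilon}\boldsymbol{I}), \qquad \boldsymbol{u} \mid \sigma^2_{u} \sim N(\boldsymbol{0}, \sigma^2_{u}\boldsymbol{I}),$$

with a normal prior on 𝜷 and conjugate inverse-gamma priors on the variance components $\sigma^2_{u}$ and $\sigma^2_{\varepsilon}$.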
Example 2: Linear Mixed Model
• Tractable solution arises for two component model
• Let 𝝁𝑞(𝜷,𝒖) and Σ𝑞(𝜷,𝒖) be the mean and covariance of 𝑞∗(𝜷, 𝒖)
• Set 𝑪 = [𝑿 𝒁]
• Markov blanket
Example 2: Linear Mixed Model
• Upon convergence the approximate posteriors are:
Example 2: Linear Mixed Model
• Longitudinal orthodontic measurement data (Pinheiro and Bates, 2000)
• Model
• Comparing with
• Here
Example 2: Linear Mixed Model
Example 3: Probit Regression
• Bayesian probit regression
• Likelihood
• Auxiliary variable
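The auxiliary-variable construction referred to here is the standard one for probit models (Albert and Chib, 1993): introduce latent variables

$$a_i \mid \boldsymbol{\beta} \sim N(\boldsymbol{x}_i^\top \boldsymbol{\beta},\, 1), \qquad y_i = I(a_i \ge 0),$$

so that $P(y_i = 1 \mid \boldsymbol{\beta}) = \Phi(\boldsymbol{x}_i^\top \boldsymbol{\beta})$ and the augmented model is conditionally Gaussian in 𝜷.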
Example 3: Probit Regression
• Product density
Example 4: Finite Mixture Model
• Let 𝑋1, 𝑋2, ⋯ , 𝑋𝑛 be univariate samples modeled as a mixture of 𝐾 normal density functions with parameters (𝜇𝑘, 𝜎𝑘²)
• Auxiliary variable
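The auxiliary-variable formulation here is the usual one for mixtures (sketched in standard notation): introduce indicator vectors 𝒂𝑖 with $a_{ik} = 1$ if observation 𝑋𝑖 comes from component 𝑘 and 0 otherwise, so that

$$X_i \mid a_{ik} = 1 \sim N(\mu_k, \sigma_k^2), \qquad P(a_{ik} = 1) = w_k,$$

for mixture weights $w_1, \dots, w_K$; the product density transform then factorizes 𝑞 over the indicators and the component parameters.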
Example 4: Finite Mixture Model
Parametric Density Transform
• Poisson Regression with Gaussian Transform
– Assuming 𝜷 ∼ N(𝝁𝜷, 𝚺𝜷), with the 𝑖-th row of 𝑿 being [1, 𝑥1𝑖, ⋯ , 𝑥𝑘𝑖]
• Likelihood
• Marginal likelihood
• Take the density 𝑞(𝜷) = N(𝝁𝑞(𝜷), 𝚺𝑞(𝜷))
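The Gaussian choice of 𝑞 is what makes the bound tractable here: the awkward expectations in the lower bound reduce to the normal moment-generating function (a standard identity, not specific to the paper),

$$E_q\left[\exp(\boldsymbol{x}_i^\top \boldsymbol{\beta})\right] = \exp\left(\boldsymbol{x}_i^\top \boldsymbol{\mu}_{q(\boldsymbol{\beta})} + \tfrac{1}{2}\,\boldsymbol{x}_i^\top \boldsymbol{\Sigma}_{q(\boldsymbol{\beta})}\,\boldsymbol{x}_i\right),$$

so $\log p(\boldsymbol{y}; q)$ has a closed form that can be maximized numerically over $\boldsymbol{\mu}_{q(\boldsymbol{\beta})}$ and $\boldsymbol{\Sigma}_{q(\boldsymbol{\beta})}$.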
Tangent Transform Approach
• Work with ‘tangent-type’ representations of concave and convex functions
– The value of 𝜉 can then be chosen to make the approximation as accurate as possible.
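A concrete instance of a tangent-type representation (the textbook example for a concave function): since log is concave, every tangent line lies above it, so

$$\log x \;\le\; \frac{x}{\xi} + \log \xi - 1 \quad \text{for all } \xi > 0, \qquad \text{with equality at } x = \xi,$$

and hence $\log x = \min_{\xi > 0}\{x/\xi + \log\xi - 1\}$, with 𝜉 acting as the variational parameter.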
Bayesian Logistic Regression
• Model
• Likelihood
• Assuming 𝜷 ∼ N(𝝁𝜷, 𝚺𝜷), the posterior of 𝜷 is
– Here
Bayesian Logistic Regression
• Here
• Similarly
• Lower bound on 𝑝(𝒚, 𝜷)
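The quadratic lower bound used here is the Jaakkola–Jordan bound on the logistic log-likelihood, written below in a standard form (the paper's notation may differ slightly): for every 𝜉 > 0,

$$\log \sigma(x) \;\ge\; \log \sigma(\xi) + \frac{x - \xi}{2} - \lambda(\xi)\,(x^2 - \xi^2), \qquad \lambda(\xi) = \frac{\tanh(\xi/2)}{4\xi}, \quad \sigma(x) = \frac{1}{1 + e^{-x}}.$$

Applying it to each observation with one 𝜉𝑖 per data point makes the bound on $p(\boldsymbol{y}, \boldsymbol{\beta})$ Gaussian in 𝜷, which is what allows closed-form updates.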
Bayesian Logistic Regression
• Maximizing the following lower bound over 𝝃 with an EM-type algorithm gives the solution
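The resulting updates can be sketched as follows. This is a minimal illustration under the assumptions above (prior 𝜷 ∼ N(𝝁𝜷, 𝚺𝜷), one 𝜉𝑖 per observation, standard Jaakkola–Jordan form); variable names are illustrative and a fixed iteration count replaces a formal convergence check.

```python
import numpy as np

def vb_logistic(X, y, mu_beta, Sigma_beta, n_iter=50):
    """Tangent-transform (Jaakkola-Jordan) updates for Bayesian logistic regression."""
    n, p = X.shape
    Sigma_beta_inv = np.linalg.inv(Sigma_beta)
    xi = np.ones(n)                                  # variational parameters
    for _ in range(n_iter):
        lam = np.tanh(xi / 2.0) / (4.0 * xi)         # lambda(xi)
        # Gaussian approximation q(beta) = N(mu_q, Sigma_q).
        Sigma_q = np.linalg.inv(Sigma_beta_inv + 2.0 * X.T @ (lam[:, None] * X))
        mu_q = Sigma_q @ (Sigma_beta_inv @ mu_beta + X.T @ (y - 0.5))
        # Update xi_i^2 = x_i^T (Sigma_q + mu_q mu_q^T) x_i.
        S = Sigma_q + np.outer(mu_q, mu_q)
        xi = np.sqrt(np.einsum("ij,jk,ik->i", X, S, X))
    return mu_q, Sigma_q

# Usage with hypothetical simulated data:
rng = np.random.default_rng(0)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([-0.5, 1.0, -1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
print(vb_logistic(X, y, np.zeros(p), 100.0 * np.eye(p))[0])
```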
Questions?
Thanks for listening :D
