Linear Discriminant Analysis under f-divergence Measures
Anmol Dwivedi, Sihui Wang, Ali Tajer
Department of Electrical, Computer, and Systems Engineering
Rensselaer Polytechnic Institute
ISIT 2021
Linear Discriminant Analysis
[Figure: two scatter plots of the same two-class data projected onto different directions; one choice of direction separates the classes far better.]
- Choice of the direction for projection so as to maximize the separation of the data, as in the figure¹

¹ Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer
Binary Classification: Discriminant Analysis
- Population sample $X \in \mathbb{R}^n$
- Objective: observe $X \implies$ classify between $P_A$ and $Q_A$, where $A \in \mathbb{R}^{r \times n}$, subject to error constraints
Motivation
- Motivation: inference in high dimensions requires forming high-dimensional statistics
- Example: consider the likelihood ratio test $\frac{dP}{dQ}(X)$ for classification
- Challenges:
  - The optimal test is computationally complex for large data dimension $n$
  - This renders a statistical-to-computational performance gap between information-theoretically viable tests (unbounded complexity) and tests with bounded computational power (bounded complexity)
Statistical Distinguishability under f-divergence Measures
- Classical objective function for LDA:
  $\arg\max_{a} \; \frac{a^\top S_B\, a}{a^\top S_W\, a}$
  where $S_B$ and $S_W$ are the between-class and within-class scatter matrices
- Optimizes a heuristic objective function
- Proposed objective for LDA:
  $\arg\max_{A} \; D_f(Q_A \,\|\, P_A)$
  where $P_A$ and $Q_A$ are the probability measures after dimensionality reduction
- Optimizes an information measure as the objective
Information measures represent the true performance limits in a wide range of inference problems
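
As a point of reference, the classical heuristic objective has a closed-form maximizer in the two-class case. A minimal sketch (the two-class setup and function name are illustrative, not from the talk):

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Classical two-class Fisher LDA: the maximizer of the Rayleigh
    quotient (a^T S_B a)/(a^T S_W a) is proportional to S_W^{-1}(mu1 - mu0)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter: sum of the per-class sample covariances
    S_W = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    a = np.linalg.solve(S_W, mu1 - mu0)
    return a / np.linalg.norm(a)
```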
Data Model
- Consider $n$-dimensional zero-mean Gaussian models
  $P : X \sim \mathcal{N}(0, \Sigma_P) \quad \text{vs.} \quad Q : X \sim \mathcal{N}(0, \Sigma_Q)$
- Design $A \in \mathbb{R}^{r \times n}$ to maximally distinguish the $r$-dimensional models
  $P_A : Y \sim \mathcal{N}(0, A \Sigma_P A^\top) \quad \text{vs.} \quad Q_A : Y \sim \mathcal{N}(0, A \Sigma_Q A^\top)$
- WLOG, design $\bar{A}$ to maximally distinguish the models
  $P_{\bar{A}} : Y \sim \mathcal{N}(0, \bar{A} \bar{A}^\top) \quad \text{vs.} \quad Q_{\bar{A}} : Y \sim \mathcal{N}(0, \bar{A} \Sigma \bar{A}^\top)$
  where $\Sigma \triangleq \Sigma_P^{-1/2} \Sigma_Q \Sigma_P^{-1/2}$
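
The whitening step can be carried out numerically via an eigendecomposition of $\Sigma_P$. A sketch assuming $\Sigma_P$ is strictly positive definite (the helper name is mine):

```python
import numpy as np

def whiten(Sigma_P, Sigma_Q):
    """Return Sigma = Sigma_P^{-1/2} Sigma_Q Sigma_P^{-1/2}, reducing the
    pair (Sigma_P, Sigma_Q) to (I, Sigma); here Abar = A Sigma_P^{1/2}."""
    w, V = np.linalg.eigh(Sigma_P)              # Sigma_P = V diag(w) V^T
    P_inv_half = V @ np.diag(w ** -0.5) @ V.T   # symmetric inverse square root
    return P_inv_half @ Sigma_Q @ P_inv_half
```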
Problem Statement
- $f$-divergence between the $r$-dimensional data models $P_{\bar{A}}$ and $Q_{\bar{A}}$:
  $D_f(\bar{A}) \triangleq \mathbb{E}_{P_{\bar{A}}}\!\left[ f\!\left( \frac{dQ_{\bar{A}}}{dP_{\bar{A}}} \right) \right]$
- Design $\bar{A}$ such that
  $\mathcal{P} : \max_{\bar{A} \in \mathbb{R}^{r \times n}} D_f(\bar{A})$
  under the following four choices of $f$-divergence measure:
  - Kullback-Leibler divergence ($D_{\mathrm{KL}}$)
  - Squared Hellinger distance ($H^2$)
  - Chi-squared divergence ($\chi^2$)
  - Total variation distance ($d_{\mathrm{TV}}$)
Design Space for A
- Motivation: the large design space for $\bar{A}$ is a challenge

Theorem
Corresponding to any matrix $\bar{A}$ there exists a semi-orthogonal matrix $A$ such that $D_f(\bar{A}) = D_f(A)$.

- WLOG, problem $\mathcal{P}$ is equivalent to the constrained problem $\mathcal{Q}$:
  $\mathcal{Q} : \max_{A \in \mathbb{R}^{r \times n}} D_f(A) \quad \text{s.t.} \quad A A^\top = I_r$
- Interpretation: semi-orthogonality constraints limit the design space for $A$; a construction realizing the theorem is sketched below
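
An $f$-divergence is invariant under invertible transformations of $Y$, so $\bar{A}$ can be replaced by any matrix with the same row space; orthonormalizing the rows gives a feasible point of $\mathcal{Q}$. A sketch, with QR as my choice of orthonormalization and assuming $\bar{A}$ has full row rank $r$:

```python
import numpy as np

def semi_orthogonalize(Abar):
    """Return a semi-orthogonal A (A A^T = I_r) with the same row space as
    Abar; Y' = A X is an invertible transform of Y = Abar X, so D_f is equal."""
    Q, _ = np.linalg.qr(Abar.T)  # columns of Q: orthonormal basis of Abar's rows
    A = Q.T
    assert np.allclose(A @ A.T, np.eye(A.shape[0]))
    return A
```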
f-divergences of Interest
- Kullback-Leibler (KL) divergence, for $f(t) = t \log t$:
  $D_{\mathrm{KL}}(A) \triangleq \mathbb{E}_{Q_A}\!\left[ \log \frac{dQ_A}{dP_A} \right]$
- $\chi^2$-divergence, for $f(t) = (t - 1)^2$:
  $\chi^2(A) \triangleq \int_{\mathcal{Y}} \frac{(dQ_A - dP_A)^2}{dP_A}$
- Squared Hellinger distance, for $f(t) = (1 - \sqrt{t})^2$:
  $H^2(A) \triangleq \int_{\mathcal{Y}} \left( \sqrt{dQ_A} - \sqrt{dP_A} \right)^2$
- Total variation distance, for $f(t) = \frac{1}{2} |t - 1|$:
  $d_{\mathrm{TV}}(A) \triangleq \frac{1}{2} \int |dQ_A - dP_A|$
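
These definitions can be checked numerically against the closed forms in the theorems below. A plain Monte Carlo sketch for the zero-mean Gaussian pair (the estimator and all names are mine, not from the talk):

```python
import numpy as np

F = {  # the four generators f listed above
    "KL":        lambda t: t * np.log(t),
    "chi2":      lambda t: (t - 1.0) ** 2,
    "hellinger": lambda t: (1.0 - np.sqrt(t)) ** 2,
    "tv":        lambda t: 0.5 * np.abs(t - 1.0),
}

def df_monte_carlo(name, A, Sigma, num=200_000, seed=0):
    """Estimate D_f(A) = E_{P_A}[f(dQ_A/dP_A)] for P_A = N(0, A A^T) and
    Q_A = N(0, A Sigma A^T) by sampling from P_A."""
    rng = np.random.default_rng(seed)
    CP, CQ = A @ A.T, A @ Sigma @ A.T
    Y = rng.multivariate_normal(np.zeros(len(CP)), CP, size=num)
    iP, iQ = np.linalg.inv(CP), np.linalg.inv(CQ)
    # log (dQ_A/dP_A)(y) for two zero-mean Gaussians
    log_ratio = 0.5 * np.einsum("ni,ij,nj->n", Y, iP - iQ, Y) \
        + 0.5 * (np.linalg.slogdet(CP)[1] - np.linalg.slogdet(CQ)[1])
    return F[name](np.exp(log_ratio)).mean()
```

Note that for $f(t) = t \log t$ the expectation under $P_A$ equals $\mathbb{E}_{Q_A}[\log \frac{dQ_A}{dP_A}]$, matching the KL form above.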
Bounds on Eigenspace
- Motivation: the eigenspace of $A \Sigma A^\top$ characterizes the optimal solution for all $f$-measures

Theorem (Poincaré Separation Theorem)
Let the eigenvalues of $\Sigma \in \mathbb{R}^{n \times n}$, denoted $\{\lambda_i : i \in [n]\}$, satisfy $\lambda_1 \ge \cdots \ge \lambda_n$. Then the eigenvalues of $A \Sigma A^\top \in \mathbb{R}^{r \times r}$, denoted $\{\gamma_i : i \in [r]\}$ with $\gamma_1 \ge \cdots \ge \gamma_r$, satisfy
  $\lambda_{n-(r-i)} \le \gamma_i \le \lambda_i \quad \text{for all } i \in [r]$

- Interpretation: each $\gamma_i$ interlaces the spectrum of $\Sigma$. Example with $n = 5$, $r = 2$: $\gamma_1$ can lie anywhere in $[\lambda_4, \lambda_1]$ and $\gamma_2$ anywhere in $[\lambda_5, \lambda_2]$. A numerical check follows.
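
A quick verification of the interlacing for a semi-orthogonal $A$ (function name mine; a random $A$ can be produced with `semi_orthogonalize` from the earlier sketch):

```python
import numpy as np

def check_poincare(Sigma, A, tol=1e-9):
    """Check lambda_{n-(r-i)} <= gamma_i <= lambda_i (1-indexed, eigenvalues
    sorted in decreasing order) for a semi-orthogonal A."""
    n, r = Sigma.shape[0], A.shape[0]
    lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]            # lambda_1 >= ... >= lambda_n
    gam = np.sort(np.linalg.eigvalsh(A @ Sigma @ A.T))[::-1]  # gamma_1 >= ... >= gamma_r
    for i in range(1, r + 1):
        assert lam[n - (r - i) - 1] - tol <= gam[i - 1] <= lam[i - 1] + tol
    return gam
```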
Kullback-Leibler divergence: Motivation
[Figure: a sample path of the observations before and after a change point, and the detection delay incurred by the stopping time.]
- Inference Problem: quickest change-point detection (minimax setting)
  $\min_{\tau} \; \sup_{\kappa \ge 1} \; \mathbb{E}_{\kappa}\!\left[ \tau - \kappa \mid \tau \ge \kappa \right] \quad \text{subject to} \quad \mathrm{FAR}(\tau) \le \alpha$
- Figure of Merit: average detection delay (ADD) of the asymptotically optimal test statistic
  $\mathrm{ADD} \sim \frac{c}{D_{\mathrm{KL}}(Q \,\|\, P)}$
Kullback-Leibler divergence: Results
Theorem
Define the permutation $\pi^*_{\mathrm{KL}} : [n] \to [n]$ as a solution to $\pi^*_{\mathrm{KL}} \triangleq \arg\max_{\pi} D_{\mathrm{KL}}(\lambda_{\pi(i)})$. To maximize
  $D_{\mathrm{KL}}(A) = \sum_{i=1}^{r} \frac{1}{2} \left( \gamma_i - \log \gamma_i - 1 \right)$
1. The eigenvalues of $A \Sigma A^\top$ are given by $\gamma_i = \lambda_{\pi^*_{\mathrm{KL}}(i)}$.
2. Row $i$ of matrix $A$ is the eigenvector of $\Sigma$ associated with the eigenvalue $\gamma_i = \lambda_{\pi^*_{\mathrm{KL}}(i)}$.
[Figure: the per-eigenvalue KL term $\frac{1}{2}(\gamma - \log\gamma - 1)$ as a function of $\gamma \in (0, 5]$; it vanishes at $\gamma = 1$ and increases in both directions.]
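
The theorem reduces the design to sorting eigenvalues by a scalar score and keeping the top $r$. A sketch of that recipe (function names mine; `design_A` is reused for the other divergences below):

```python
import numpy as np

def kl_score(lam):
    """Per-eigenvalue contribution to D_KL(A): (gamma - log(gamma) - 1) / 2."""
    return 0.5 * (lam - np.log(lam) - 1.0)

def design_A(Sigma, r, score):
    """Rows of the optimal semi-orthogonal A are the eigenvectors of Sigma
    whose eigenvalues have the r largest per-mode scores."""
    lam, V = np.linalg.eigh(Sigma)          # eigenpairs of Sigma
    idx = np.argsort(score(lam))[::-1][:r]  # indices of the r best eigenvalues
    return V[:, idx].T                      # rows = selected eigenvectors

# usage: A = design_A(Sigma, r, kl_score)
```

Since $D_{\mathrm{KL}}(A)$ is a sum of independent per-mode terms, maximizing over permutations amounts exactly to this top-$r$ selection.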
Kullback-Leibler divergence: Observations
- Observation 1: if $\lambda_{\min} \triangleq \lambda_n \ge 1$, then $\gamma_i = \lambda_i$ for all $i \in [r]$:
  the rows of $A$ are the eigenvectors of $\Sigma$ associated with the $r$ largest eigenvalues $\{\lambda_i : i \in [r]\}$
  (example with $n = 5$, $r = 2$: $\gamma_1 = \lambda_1$, $\gamma_2 = \lambda_2$)
- Observation 2: if $\lambda_{\max} \triangleq \lambda_1 \le 1$, then $\gamma_i = \lambda_{n-r+i}$ for all $i \in [r]$:
  the rows of $A$ are the eigenvectors of $\Sigma$ associated with the $r$ smallest eigenvalues
  (example with $n = 5$, $r = 2$: $\gamma_1 = \lambda_4$, $\gamma_2 = \lambda_5$)
Chi-squared divergence: Motivation
- Latent variable $\theta \in \Theta$
- Estimator $\hat{\theta} = T(X_1, \ldots, X_s)$ with $X_i \sim P_\theta$
- Inference Problem: parameter estimation
- Figure of Merit: variance of an estimator under quadratic loss (the Hammersley-Chapman-Robbins bound)
  $\mathrm{Var}_\theta(\hat{\theta}) \ge \sup_{\theta' \ne \theta} \frac{\left( \mathbb{E}_{\theta'}[\hat{\theta}] - \mathbb{E}_{\theta}[\hat{\theta}] \right)^2}{\chi^2(Q \,\|\, P)}$
  where $\theta' \in \Theta$, $P = P_\theta$, $Q = P_{\theta'}$
Chi-squared divergence: Results
Theorem
Define the permutation $\pi^*_{\chi^2} : [n] \to [n]$ as a solution to $\pi^*_{\chi^2} \triangleq \arg\max_{\pi} \chi^2(\lambda_{\pi(i)})$. To maximize
  $\chi^2(A) = \prod_{i=1}^{r} \frac{1}{\sqrt{\gamma_i (2 - \gamma_i)}} \; - \; 1$
1. The eigenvalues of $A \Sigma A^\top$ are given by $\gamma_i = \lambda_{\pi^*_{\chi^2}(i)}$.
2. Row $i$ of matrix $A$ is the eigenvector of $\Sigma$ associated with the eigenvalue $\gamma_i = \lambda_{\pi^*_{\chi^2}(i)}$.
[Figure: the per-eigenvalue $\chi^2$ factor $1/\sqrt{\gamma(2-\gamma)}$ as a function of $\gamma \in (0, 2)$; it diverges as $\gamma$ approaches 0 or 2.]
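
Because $\chi^2(A) + 1$ factors over the modes and each factor $1/\sqrt{\gamma(2-\gamma)}$ is at least 1, the same top-$r$ selection applies with a log-factor score. A sketch reusing `design_A` from the KL slide (assuming all eigenvalues lie in $(0, 2)$, where the Gaussian $\chi^2$ divergence is finite):

```python
import numpy as np

def chi2_score(lam):
    """Log of the per-mode factor 1/sqrt(gamma*(2-gamma)) of chi^2(A) + 1;
    assumes all eigenvalues lie in (0, 2) so the divergence is finite."""
    return -0.5 * np.log(lam * (2.0 - lam))

# usage: A = design_A(Sigma, r, chi2_score)
```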
Total Variation Distance: Motivation
[Figure: overlapping likelihoods under $H_0$ and $H_1$, with the Type-I and Type-II error regions shaded.]
- Inference Problem: hypothesis testing
  $H_0 : X \sim P \quad \text{vs.} \quad H_1 : X \sim Q$
- Figure of Merit: probability of error; for decision rules $d : \mathcal{X} \to \{H_0, H_1\}$,
  $\inf_{d} \; \Big[ \underbrace{P_A(d = H_1)}_{\text{Type-I error}} + \underbrace{Q_A(d = H_0)}_{\text{Type-II error}} \Big] = 1 - d_{\mathrm{TV}}(P_A, Q_A)$
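
This identity is easy to sanity-check on a toy discrete pair, where the optimal rule decides $H_1$ exactly where $Q$ outweighs $P$ (the numbers below are arbitrary):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])  # P
q = np.array([0.2, 0.3, 0.5])  # Q
d_tv = 0.5 * np.abs(p - q).sum()
# Type-I + Type-II error of the likelihood-ratio rule (decide H1 where q > p)
err = p[q > p].sum() + q[q <= p].sum()
assert np.isclose(err, 1.0 - d_tv)
```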
Total Variation Distance: Results
- No closed-form expression for $d_{\mathrm{TV}}$ exists for Gaussian models
- Instead, maximize matching bounds on $d_{\mathrm{TV}}$

Theorem
Define the permutation $\pi^*_{d_{\mathrm{TV}}} : [n] \to [n]$ as a solution to $\pi^*_{d_{\mathrm{TV}}} \triangleq \arg\max_{\pi} d_{\mathrm{TV}}(\lambda_{\pi(i)})$. To maximize the matching bounds on $d_{\mathrm{TV}}$,
  $\frac{1}{100} \;\le\; \frac{d_{\mathrm{TV}}(A)}{\min\left\{ 1, \sqrt{\sum_{i=1}^{r} \left( \frac{1}{\gamma_i} - 1 \right)^2} \right\}} \;\le\; \frac{3}{2}$
1. The eigenvalues of $A \Sigma A^\top$ are given by $\gamma_i = \lambda_{\pi^*_{d_{\mathrm{TV}}}(i)}$.
2. Row $i$ of matrix $A$ is the eigenvector of $\Sigma$ associated with the eigenvalue $\gamma_i = \lambda_{\pi^*_{d_{\mathrm{TV}}}(i)}$.
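
The bound is monotone in the sum inside the square root, so the same selection recipe applies. A sketch, again reusing `design_A` from the KL slide:

```python
def tv_score(lam):
    """Per-mode term (1/gamma - 1)^2 of the matching d_TV bounds; the bounds
    grow with the sum of these terms, so pick the r largest."""
    return (1.0 / lam - 1.0) ** 2

# usage: A = design_A(Sigma, r, tv_score)
```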
Squared Hellinger Distance: Motivation
- Inference Problem: hypothesis testing
  $H_0 : X \sim P \quad \text{vs.} \quad H_1 : X \sim Q$
- Figure of Merit: bound on the probability of error ($P_e$) for equal priors
  $P_e \le \frac{1}{2} \cdot \left( 2 - H^2(P, Q) \right)$
Squared Hellinger Distance: Results
Theorem
Define the permutation $\pi^*_{H^2} : [n] \to [n]$ as a solution to $\pi^*_{H^2} \triangleq \arg\max_{\pi} H^2(\lambda_{\pi(i)})$. To maximize
  $H^2(A) = 2 - 2 \prod_{i=1}^{r} \sqrt[4]{\frac{4 \gamma_i}{(\gamma_i + 1)^2}}$
1. The eigenvalues of $A \Sigma A^\top$ are given by $\gamma_i = \lambda_{\pi^*_{H^2}(i)}$.
2. Row $i$ of matrix $A$ is the eigenvector of $\Sigma$ associated with the eigenvalue $\gamma_i = \lambda_{\pi^*_{H^2}(i)}$.
[Figure: the per-eigenvalue $H^2$ term as a function of $\gamma \in (0, 5]$; it vanishes at $\gamma = 1$.]
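
$H^2(A)$ is maximized by minimizing the product of per-mode factors, each of which is at most 1 (by AM-GM, $(\gamma+1)^2 \ge 4\gamma$), so the score is the negative log-factor. A sketch reusing `design_A`:

```python
import numpy as np

def hellinger_score(lam):
    """Negative log of the per-mode factor (4*gamma/(gamma+1)^2)^(1/4);
    the factor peaks at gamma = 1, where the mode is least informative."""
    return -0.25 * np.log(4.0 * lam / (lam + 1.0) ** 2)

# usage: A = design_A(Sigma, r, hellinger_score)
```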
Numerical Evaluations (Quickest change-point detection)
- Max LDA: rows of $A$ are the eigenvectors associated with the largest eigenvalues of $\Sigma$
- $D_{\mathrm{KL}}$ LDA: rows of $A$ are the eigenvectors associated with the eigenvalues of $\Sigma$ that maximize $D_{\mathrm{KL}}(A)$
- Average detection delay (ADD) vs. reduced dimension $r$, for data dimension $n$ and fixed FAR
[Figure: three panels of ADD vs. reduced dimension $r$, comparing Max LDA and $D_{\mathrm{KL}}$ LDA at fixed FAR.]
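
The gap between the two designs appears whenever the spectrum straddles 1: eigenvalues far below 1 can carry more KL information than moderately large ones. A toy illustration with an assumed spectrum:

```python
import numpy as np

lam = np.array([4.0, 2.5, 1.2, 0.9, 0.05])  # assumed spectrum of Sigma, n = 5
kl = 0.5 * (lam - np.log(lam) - 1.0)        # per-mode KL scores

r = 2
max_lda = lam[np.argsort(lam)[::-1][:r]]    # Max LDA keeps [4.0, 2.5]
kl_lda = lam[np.argsort(kl)[::-1][:r]]      # D_KL LDA keeps [0.05, 4.0]
print(max_lda, kl_lda)
```

Here the $D_{\mathrm{KL}}$ design attains a total divergence of about 1.83 versus 1.10 for Max LDA; since $\mathrm{ADD} \sim c / D_{\mathrm{KL}}$, this translates directly into a smaller detection delay.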
Conclusions
- Linear dimensionality reduction for statistical inference problems
- Optimal designs of linear transformations that optimize $f$-divergence measures for Gaussian models
- The row space of the linear map is associated with the eigenspace of the covariance matrix
- In certain regimes, the design of the linear map is independent of the inference problem
