Candid Covariance-Free Incremental
Principal Component Analysis
CCIPCA(1) – Basic Problem
▧ $u(1), u(2), \cdots$ (possibly infinite): the sample vectors are acquired sequentially.
▧ $u(k)$: the $k$-th sample, a $d$-dimensional vector. $d$ can be as large as 5,000 and beyond.
▧ $u(k)$ has a zero mean.
▧ $A = E\{u(n)u^T(n)\}$ is the $d \times d$ covariance matrix.
▧ By definition, an eigenvector $x$ of matrix $A$ satisfies
$$\lambda x = Ax. \quad \cdots (1)$$
Writing $v = \lambda x$, we have $\lambda = \|v\|$ and $x = \frac{v}{\|v\|}$.
CCIPCA(2) – Incremental Estimation
▧ By replacing the unknown $A$ and $x$ with their estimates, using $x(i)$ at each time step $i$, we obtain the $n$-th step estimate $v(n)$ of $v$,
$$v(n) = \frac{1}{n}\sum_{i=1}^{n} u(i)\,u^T(i)\,x(i). \quad \cdots (2)$$
▧ Considering $x = \frac{v}{\|v\|}$, we may choose $x(i) = \frac{v(i-1)}{\|v(i-1)\|}$, so that
$$v(n) = \frac{1}{n}\sum_{i=1}^{n} u(i)\,u^T(i)\,\frac{v(i-1)}{\|v(i-1)\|}. \quad \cdots (3)$$
▧ We set $v(0) = u(1)$. For incremental estimation, (3) is written in a recursive form,
$$v(n) = \frac{n-1}{n}\,v(n-1) + \frac{1}{n}\,u(n)\,u^T(n)\,\frac{v(n-1)}{\|v(n-1)\|}. \quad \cdots (4)$$
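To make recursion (4) concrete, the following is a minimal Python/NumPy sketch of one update step for the first eigenvector; the function name and the explicit handling of the first sample are my own choices, not from the paper.

```python
import numpy as np

def update_v1(v, u, n):
    """One step of recursion (4): compute v(n) from v(n-1) and the new sample u(n).

    v : current estimate v(n-1); ignored when n == 1.
    u : the n-th zero-mean sample u(n).
    n : 1-based step index.
    """
    if n == 1:
        return u.copy()                              # v(1) = u(1), following the v(0) = u(1) convention
    x = v / np.linalg.norm(v)                        # x = v(n-1) / ||v(n-1)||
    return (n - 1) / n * v + (1.0 / n) * u * (u @ x)  # (4)
```

Applied repeatedly over a stream of zero-mean samples, the estimate's direction approaches the first eigenvector and its length the first eigenvalue, since $v = \lambda x$ with $\lambda = \|v\|$.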
CCIPCA (3) – Amnesic Parameter
▧ We consider a further improvement to procedure (4). In (4), the "samples"
$$w(i) = u(i)\,u^T(i)\,\frac{v(i-1)}{\|v(i-1)\|}$$
are weighted equally.
▧ Then, since $w(i)$ is generated by $v(i-1)$, and $v(i-1)$ is far away from its real value at an early estimation stage, $w(i)$ is a "sample" with large "noise" when $i$ is small.
The first term of (4) expands as
$$\frac{n-1}{n}\,v(n-1) = \frac{1}{n}\,u(n-1)\,u^T(n-1)\,\frac{v(n-2)}{\|v(n-2)\|} + \cdots + \frac{1}{n}\,u(1)\,u^T(1)\,\frac{v(0)}{\|v(0)\|}.$$
▧ A way to implement this idea is to use an amnesic average by changing (4) into
$$v(n) = \frac{n-1-l}{n}\,v(n-1) + \frac{l+1}{n}\,u(n)\,u^T(n)\,\frac{v(n-1)}{\|v(n-1)\|}, \quad \cdots (10)$$
▧ where the positive parameter $l$ is called the amnesic parameter. Typically, $l$ ranges from 2 to 4.
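A corresponding sketch of the amnesic update (10), again with NumPy and names of my own choosing; the default for l sits in the 2–4 range mentioned above.

```python
import numpy as np

def update_v1_amnesic(v, u, n, l=2.0):
    """One step of the amnesic average (10); l is the amnesic parameter."""
    if n == 1:
        return u.copy()                              # initialize with the first sample
    x = v / np.linalg.norm(v)
    return (n - 1 - l) / n * v + (1 + l) / n * u * (u @ x)  # (10)
```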
CCIPCA (4) – Intuitive Explanation
▧ An intuitive explanation of (4) is as follows.
- Consider a two-dimensional data set that follows a Gaussian probability distribution function.
[Figure: the new estimate $v_1(n)$ is the vector sum of $\frac{n-1}{n}v_1(n-1)$ and $\frac{1}{n}u(n)u^T(n)\frac{v(n-1)}{\|v(n-1)\|}$; with the amnesic parameter, the two weights become $\frac{n-1-l}{n}$ and $\frac{1+l}{n}$.]
- $u^T(n)\,\frac{v(n-1)}{\|v(n-1)\|}$ : a scalar.
- $\frac{1}{n}\,u(n)\,u^T(n)\,\frac{v(n-1)}{\|v(n-1)\|}$ : a scaled vector of $u(n)$.
CCIPCA (5) - Higher-Order Eigenvectors
▧ We know eigenvectors are orthogonal to each other. So, it helps to generate "observations" only in a complementary space for the computation of the higher-order eigenvectors.
[Figure: $v_2$ lies in the complementary space of $v_1$.]
▧ For example, to compute the second-order eigenvector, we first subtract from the data its projection on the estimated first-order eigenvector $v_1(n)$,
$$u_2(n) = u_1(n) - u_1^T(n)\,\frac{v_1(n)}{\|v_1(n)\|}\,\frac{v_1(n)}{\|v_1(n)\|},$$
where $u_1(n) = u(n)$. $u_2(n)$, which lies in the complementary space of $v_1(n)$, serves as the input data to the iteration step. A sketch of this deflation step is shown below.
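A one-function sketch of the projection-subtraction (deflation) step, assuming NumPy; the name is mine.

```python
import numpy as np

def deflate(u_i, v_i):
    """Remove from u_i its projection onto the estimated eigenvector v_i,
    so the result lies in the complementary space of v_i."""
    x = v_i / np.linalg.norm(v_i)
    return u_i - (u_i @ x) * x
```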
CCIPCA (6) - Algorithm
Example: $n = 1, 2, 3$, $k = 2$.
▧ Algorithm Summary (a Python sketch of the general procedure follows the worked example below)
>> First Step
$n = 1$,
1. $u_1(1) = u(1)$.
2. $i = 1$,
(a) $v_1(1) = u_1(1)$.
=> output : $v_1(1)$.
>> Second Step
$n = 2$,
1. $u_1(2) = u(2)$.
2. $i = 1$,
(b) $v_1(2) = \frac{1-l}{2}\,v_1(1) + \frac{1+l}{2}\,u_1(2)\,u_1^T(2)\,\frac{v_1(1)}{\|v_1(1)\|}$,
    $u_2(2) = u_1(2) - u_1^T(2)\,\frac{v_1(2)}{\|v_1(2)\|}\,\frac{v_1(2)}{\|v_1(2)\|}$.
3. $i = 2$,
(a) $v_2(2) = u_2(2)$.
=> output : $v_1(2)$, $v_2(2)$.
>> Third Step
$n = 3$,
1. $u_1(3) = u(3)$.
2. $i = 1$,
(b) $v_1(3) = \frac{2-l}{3}\,v_1(2) + \frac{1+l}{3}\,u_1(3)\,u_1^T(3)\,\frac{v_1(2)}{\|v_1(2)\|}$,
    $u_2(3) = u_1(3) - u_1^T(3)\,\frac{v_1(3)}{\|v_1(3)\|}\,\frac{v_1(3)}{\|v_1(3)\|}$.
3. $i = 2$,
(b) $v_2(3) = \frac{2-l}{3}\,v_2(2) + \frac{1+l}{3}\,u_2(3)\,u_2^T(3)\,\frac{v_2(2)}{\|v_2(2)\|}$,
    $u_3(3) = u_2(3) - u_2^T(3)\,\frac{v_2(3)}{\|v_2(3)\|}\,\frac{v_2(3)}{\|v_2(3)\|}$.
=> output : $v_1(3)$, $v_2(3)$.
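Putting the update, the amnesic weighting, and the deflation together, here is a hedged Python sketch of the whole procedure for the first k eigenvectors, following the pattern of the worked example above: initialize a new component at step $i = n$, apply the amnesic update (10) to earlier components, and deflate the residual before moving on. All names are my own; this is a sketch, not the authors' reference implementation.

```python
import numpy as np

def ccipca(samples, k, l=2.0):
    """Estimate the first k eigenvectors from an iterable of zero-mean sample vectors.

    Returns a list V where V[i] is the (unnormalized) estimate of the (i+1)-th
    eigenvector; its norm serves as the corresponding eigenvalue estimate.
    """
    V = []
    for n, u in enumerate(samples, start=1):
        u = np.asarray(u, dtype=float).copy()          # u_1(n) = u(n)
        for i in range(min(n, k)):
            if i == n - 1:
                V.append(u.copy())                      # initialize v_{i+1}(n) = u_{i+1}(n)
            else:
                x = V[i] / np.linalg.norm(V[i])
                V[i] = (n - 1 - l) / n * V[i] + (1 + l) / n * u * (u @ x)  # amnesic update (10)
                x = V[i] / np.linalg.norm(V[i])
                u = u - (u @ x) * x                     # deflation: u_{i+2}(n)
    return V
```

Unit eigenvectors are `V[i] / np.linalg.norm(V[i])`; as noted in CCIPCA (11), the norm itself is the eigenvalue estimate.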
CCIPCA (7)
▧ The algorithm illustrated graphically.
[Figure: First and Second Steps. $v_1(1) = u(1)$; $v_1(2)$ is the vector sum of $\frac{1-l}{2}v_1(1)$ and $\frac{1+l}{2}u_1(2)u_1^T(2)\frac{v_1(1)}{\|v_1(1)\|}$; $v_2(2) = u_2(2) = u_1(2) - u_1^T(2)\frac{v_1(2)}{\|v_1(2)\|}\frac{v_1(2)}{\|v_1(2)\|}$. The accompanying text repeats the First and Second Steps of CCIPCA (6).]
CCIPCA (8)
▧ The algorithm illustrated graphically.
[Figure: Third Step. $v_1(3)$ is the vector sum of $\frac{2-l}{3}v_1(2)$ and $\frac{1+l}{3}u_1(3)u_1^T(3)\frac{v_1(2)}{\|v_1(2)\|}$; $u_2(3)$ is obtained by subtracting the projection $u_1^T(3)\frac{v_1(3)}{\|v_1(3)\|}\frac{v_1(3)}{\|v_1(3)\|}$ from $u_1(3)$, and $v_2(3)$ is then updated from $v_2(2)$ and $u_2(3)$. The accompanying text repeats the Third Step of CCIPCA (6).]
CCIPCA (9) - Experimental Setup
▧ We define the sample-to-dimension ratio as $\frac{n}{d}$, where $n$ is the number of samples and $d$ is the dimension of the sample space.
▧ First presented here are our results on the FERET face data set [18]. The data set has 982 images. The size of each image is 88 x 64 pixels, or 5,632 dimensions.
▧ The sample-to-dimension ratio is $\frac{982}{5632} \approx 0.17$.
▧ We computed the eigenvectors using a batch PCA with the QR method and used them as our ground truth. The program for batch PCA was adapted from the C Recipes [9].
▧ Since the real mean of the image data is unknown, we incrementally estimated the sample mean $m(n)$ by
$$m(n) = \frac{n-1}{n}\,m(n-1) + \frac{1}{n}\,x(n),$$
where $x(n)$ is the $n$-th sample image.
▧ The data entering the IPCA algorithms are the scatter vectors,
$$u(n) = x(n) - m(n), \quad n = 1, 2, \cdots$$
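A small sketch, assuming NumPy, of how the incrementally estimated mean and the scatter vectors could be produced from an image stream; the generator name and the flattening step are my own.

```python
import numpy as np

def scatter_vectors(images):
    """Yield u(n) = x(n) - m(n), with m(n) updated incrementally as above."""
    m = None
    for n, img in enumerate(images, start=1):
        x = np.asarray(img, dtype=float).ravel()   # e.g., an 88 x 64 image -> 5,632-dimensional vector
        m = x.copy() if n == 1 else (n - 1) / n * m + (1.0 / n) * x
        yield x - m
```

These scatter vectors are what an incremental PCA routine such as the ccipca sketch above would consume.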
CCIPCA (10) Experimental Results (1) – Comparison of IPCA algorithms
▧ The correlation between the estimated unit eigenvector $v$ and the one computed by the batch method $v'$, also normalized, is represented by their inner product $v \cdot v'$.
[Figure: correlation plots for each algorithm.]
- SGA: does not converge.
- GHA: shows a stronger tendency to converge than SGA, but with lower accuracy.
- CCIPCA: although the higher-order eigenvectors converge more slowly and less accurately, all of them converge, and quickly.
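As a minimal illustration of this correlation measure (function and argument names assumed, not from the paper):

```python
import numpy as np

def eigenvector_correlation(v_est, v_batch):
    """Inner product of the two normalized eigenvectors; 1 means perfect agreement."""
    return float((v_est / np.linalg.norm(v_est)) @ (v_batch / np.linalg.norm(v_batch)))
```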
CCIPCA (11) Experimental Results (2) – The ratio of eigenvalues
▧ To examine the convergence of eigenvalues, we use the ratio $\frac{\|v_i\|}{\lambda_i}$.
* $\|v_i\|$: the length of the estimated eigenvector, which serves as the eigenvalue estimate at step $i$.
* $\lambda_i$: the estimate computed by the batch method.
[Figure: the ratio $\|v_i\|/\lambda_i$ over time.]
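A one-liner for the eigenvalue ratio used here (again a sketch with assumed names):

```python
import numpy as np

def eigenvalue_ratio(v_est, lam_batch):
    """||v_i|| / lambda_i: the incremental eigenvalue estimate over the batch value."""
    return float(np.linalg.norm(v_est) / lam_batch)
```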
CCIPCA (12) Experimental Results (3) – The effect of the amnesic parameter
▧ We demonstrate the effect of the amnesic parameter $l$ in (10),
$$v(n) = \frac{n-1-l}{n}\,v(n-1) + \frac{l+1}{n}\,u(n)\,u^T(n)\,\frac{v(n-1)}{\|v(n-1)\|}. \quad \cdots (10)$$
[Figure: the larger $l$ is, the faster the convergence.]
CCIPCA (13) Experiment (4) – Performance of the algorithms
▧ Next, we show the performance of the algorithms on a much longer data stream.
▧ Since the statistics of a real-world image stream may not necessarily be stationary, the changing mean and variance make convergence evaluation difficult.
▧ To avoid this effect, we simulate a statistically stable long data stream by feeding the images in the FERET data set repeatedly into the algorithms.
▧ For example, with 100 stable samples whose mean and variance do not change much, feeding all 100 samples through the algorithm to obtain 10 eigenvectors counts as epoch 1; feeding the same 100 samples through again counts as epoch 2, and this is repeated for a total of 20 epochs. Because a failure to converge can also be caused by unstable data, convergence performance is evaluated on stable data to rule out that cause.
CCIPCA (14) Experimental Results (4) – Performance of the algorithms
[Figure: convergence over repeated epochs.]
- SGA: does not converge.
- GHA: converges slowly, and for the higher-order eigenvectors it is hard to say that it converges.
- CCIPCA: all eigenvectors can be seen to converge.
CCIPCA (15) Experimental Results (4) – Performance of the algorithms
▧ The average execution time of SGA, GHA, and CCIPCA in each estimation step is shown in Table 1.
▧ GHA and CCIPCA run significantly faster than SGA. CCIPCA has a further computational advantage over GHA because of a saving in normalization.
CCIPCA (16) Conclusion
▧ This paper focuses on the problem of computing the eigenvectors and eigenvalues of a high-dimensional data stream that arrives incrementally.
▧ CCIPCA converges quickly and has low computational complexity (compared with SGA and GHA).
CCIPCA (5) - Higher-Order Eigenvectors – Supplementary Material
▧ One way to compute the other higher-order eigenvectors is to follow what SGA (stochastic gradient ascent) does:
start with a set of orthonormalized vectors, update them using the suggested iteration step, and recover the orthogonality using GSO.
SGA computes (Oja and Karhunen [9, 10]),
$$v_i(n) = v_i(n-1) + \gamma_i\,u(n)\,u^T(n)\,v_i(n-1), \quad \cdots (6)$$
$$v_i(n) = \text{orthonormalize } v_i(n) \text{ w.r.t. } v_j(n),\ j = 1, 2, \cdots, i-1, \quad \cdots (7)$$
where $v_i(n)$ is the estimate of the $i$-th dominant eigenvector of the sample covariance matrix.
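For comparison, a hedged Python sketch of one SGA step per (6)-(7); for brevity a single learning rate gamma stands in for the per-component rates γ_i, and a plain Gram-Schmidt pass stands in for GSO.

```python
import numpy as np

def sga_step(V, u, gamma):
    """One SGA step: gradient update (6) for every component, then (7),
    orthonormalization of each v_i against the lower-order v_j."""
    V = [v + gamma * u * (u @ v) for v in V]           # (6)
    for i in range(len(V)):
        v = V[i]
        for j in range(i):
            v = v - (v @ V[j]) * V[j]                  # remove components along v_j, j < i
        V[i] = v / np.linalg.norm(v)                   # renormalize
    return V
```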