Presentation given on June 8, 2016 at Imperial College London at the conclusion of my undergraduate research project in the field of computer vision. The project, entitled "Robust Joint and Individual Component Analysis", was supervised by Stefanos Zafeiriou and Yannis Panagakis of Imperial College's Computing Department. It led to a publication at CVPR 2017, available at http://openaccess.thecvf.com/content_cvpr_2017/html/Sagonas_Robust_Joint_and_CVPR_2017_paper.html
3. 3/28
Contents
Joint and Individual Variation Explained (JIVE)
Robust JIVE
Robust Nuclear Norm Regularised JIVE
Toy Example
Alina Leidinger (ICL) RJICA June 8, 2016 3 / 28
4. 4/28
Joint and Individual Variation Explained (JIVE) [5]
Different views of the same data, e.g. measurements of different quantities on the same set of objects
Original application: cancer research, using data from The Cancer Genome Atlas
Potential fields of application: clinical studies (patient groups), finance (markets), e-commerce (websites), environmental sciences (locations), facial recognition (expressions), ...
[5] Eric F. Lock et al. “Joint and individual variation explained (JIVE) for integrated analysis of multiple data types”. In: Annals of Applied Statistics 7.1 (2013), pp. 523–542.
6. 6/28
JIVE: Mathematical Framework
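$$X_i = J_i + A_i + E_i, \qquad i = 1, \dots, k$$
or
$$\underbrace{\begin{bmatrix} X_1 \\ \vdots \\ X_k \end{bmatrix}}_{X} = \underbrace{\begin{bmatrix} J_1 \\ \vdots \\ J_k \end{bmatrix}}_{J} + \underbrace{\begin{bmatrix} A_1 \\ \vdots \\ A_k \end{bmatrix}}_{A} + \underbrace{\begin{bmatrix} E_1 \\ \vdots \\ E_k \end{bmatrix}}_{E}$$
Require uncorrelatedness between joint and individual components:
$$J A_i^T = 0$$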
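The model can be illustrated with synthetic data. The following is a minimal NumPy sketch, not the construction used in the project: the function name `make_toy_views`, the dimensions, the ranks and the noise level are all hypothetical, and the individual row spaces are made mutually orthogonal purely for simplicity, which the model itself does not require.

```python
import numpy as np

def make_toy_views(dims=(20, 30), n=50, r=2, ranks=(3, 3), noise=0.01, seed=0):
    """Build views X_i = J_i + A_i + E_i with J A_i^T = 0 (all sizes hypothetical)."""
    rng = np.random.default_rng(seed)
    # Orthonormal columns in R^n: the first r span the joint row space,
    # the remaining ones are split between the individual components.
    Q, _ = np.linalg.qr(rng.standard_normal((n, r + sum(ranks))))
    V_joint, start = Q[:, :r], r
    X_blocks, J_blocks, A_blocks = [], [], []
    for p_i, r_i in zip(dims, ranks):
        V_i = Q[:, start:start + r_i]
        start += r_i
        J_i = rng.standard_normal((p_i, r)) @ V_joint.T   # joint structure
        A_i = rng.standard_normal((p_i, r_i)) @ V_i.T     # individual structure
        E_i = noise * rng.standard_normal((p_i, n))       # small dense noise
        X_blocks.append(J_i + A_i + E_i)
        J_blocks.append(J_i)
        A_blocks.append(A_i)
    return X_blocks, J_blocks, A_blocks

X_blocks, J_blocks, A_blocks = make_toy_views()
X = np.vstack(X_blocks)   # stacked data matrix X
J = np.vstack(J_blocks)   # stacked joint component J
```

Because the joint and individual row spaces are built from disjoint columns of one orthonormal basis, `J @ A_i.T` vanishes up to rounding for every view.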
7. 7/28
JIVE: Optimisation Problem
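$$\arg\min_{J, A_i} \left\| X - J - [A_1^T, \dots, A_k^T]^T \right\|_F^2 \quad \text{s.t. } J A_i^T = 0,\ \operatorname{rank}(J) = r,\ \operatorname{rank}(A_i) = r_i$$
$$J^{(t+1)} = \arg\min_{J} \left\| J - \left( X - [{A_1^{(t)}}^T, \dots, {A_k^{(t)}}^T]^T \right) \right\|_F^2 \quad \text{s.t. } \operatorname{rank}(J) = r$$
$$A_i^{(t+1)} = \arg\min_{A_i} \left\| A_i - \left( X_i - J_i^{(t+1)} \right) \right\|_F^2 \quad \text{s.t. } \operatorname{rank}(A_i) = r_i,\ J^{(t+1)} A_i^T = 0$$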
8. 8/28
JIVE: Alternating Optimisation
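$$J^{(t+1)} = \arg\min_{J} \left\| J - \left( X - [{A_1^{(t)}}^T, \dots, {A_k^{(t)}}^T]^T \right) \right\|_F^2 \quad \text{s.t. } \operatorname{rank}(J) = r$$
→ rank-$r$ SVD of $X - [{A_1^{(t)}}^T, \dots, {A_k^{(t)}}^T]^T$
$$A_i^{(t+1)} = \arg\min_{A_i} \left\| A_i - \left( X_i - J_i^{(t+1)} \right) \right\|_F^2 \quad \text{s.t. } \operatorname{rank}(A_i) = r_i,\ J^{(t+1)} A_i^T = 0$$
→ (rank-$r_i$ SVD of $X_i - J_i^{(t+1)}$) × $(I - VV^T)$
Singular Value Decomposition: $J^{(t+1)} = U \Sigma V^T$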
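Both updates rest on two building blocks: a truncated SVD, which gives the best rank-constrained approximation in Frobenius norm (Eckart-Young), and right-multiplication by $(I - VV^T)$, which removes everything lying in the row space of $J$. A minimal NumPy sketch of the two follows; the helper names `truncated_svd` and `project_out_rowspace` are hypothetical and chosen here for illustration only.

```python
import numpy as np

def truncated_svd(M, rank):
    """Factors of the best rank-`rank` approximation of M (Frobenius norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]

def project_out_rowspace(M, Vt):
    """Compute M (I - V V^T), where the rows of Vt are the orthonormal columns of V."""
    return M - (M @ Vt.T) @ Vt

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 12))
U, s, Vt = truncated_svd(M, 3)
J = (U * s) @ Vt                                    # rank-3 approximation of M
B = project_out_rowspace(rng.standard_normal((5, 12)), Vt)
print(np.abs(J @ B.T).max())                        # zero up to rounding
```

After the projection any matrix `B` satisfies $J B^T = 0$, which is exactly the constraint imposed on each $A_i$.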
9. 9/28
JIVE: Algorithm [6]
Algorithm 1 JIVE
1: $X^{Joint} = X = [X_1^T \dots X_k^T]^T$
2: while not converged do
3:   Estimate $J$ by a rank-$r$ SVD of $X^{Joint}$, i.e. $J = U_r \Lambda_r V_r^T$
4:   for $i = 1, \dots, k$ do
5:     Set $X_i^{Individual} = X_i - J_i$
6:     Estimate $A_i$ by a rank-$r_i$ SVD of $X_i^{Individual} (I - VV^T)$
7:     Set $X_i^{Joint} = X_i - A_i$
8:   $X^{Joint} = [{X_1^{Joint}}^T \dots {X_k^{Joint}}^T]^T$
[6] Lock et al. “Supplement to ‘Joint and individual variation explained (JIVE) for integrated analysis of multiple data types’”. In: The Annals of Applied Statistics 7.1 (2013).
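Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative reading of the pseudocode rather than the reference implementation: the function names, the iteration cap and the stopping rule (relative change in the stacked $A$) are assumptions, and step 6 is interpreted as projecting first and truncating second.

```python
import numpy as np

def low_rank(M, rank):
    """Rank-`rank` truncated SVD of M; returns the approximation and V^T."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank], Vt[:rank]

def jive(blocks, r, ranks, n_iter=200, tol=1e-8):
    """Alternate between the joint and individual estimates (Algorithm 1)."""
    X = np.vstack(blocks).astype(float)
    splits = np.cumsum([b.shape[0] for b in blocks])[:-1]
    X_joint, A = X.copy(), np.zeros_like(X)
    for _ in range(n_iter):
        J, Vt = low_rank(X_joint, r)                   # step 3
        A_new = []
        for X_i, J_i, r_i in zip(blocks, np.split(J, splits), ranks):
            resid = X_i - J_i                          # step 5
            resid = resid - (resid @ Vt.T) @ Vt        # multiply by (I - V V^T)
            A_new.append(low_rank(resid, r_i)[0])      # step 6
        A_new = np.vstack(A_new)
        X_joint = X - A_new                            # steps 7 and 8
        # Hypothetical stopping rule: A has stopped changing.
        done = np.linalg.norm(A_new - A) <= tol * max(1.0, np.linalg.norm(X))
        A = A_new
        if done:
            break
    return J, np.split(A, splits)

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((6, 40)), rng.standard_normal((8, 40))]
J, A_blocks = jive(blocks, r=2, ranks=[2, 3])
```

The returned pair always satisfies the rank and orthogonality constraints, because each $A_i$ is computed from the projector belonging to the final $J$.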
10. 10/28
JIVE: Criticism
sensitive to outliers, missing data points
sensitive to gross non-Gaussian noise
combinatorial pre-processing step: inefficient, unsuitable for analysis of images
11. 11/28
Robust JIVE
$$\arg\min_{J, A_i} \|E\|_0 \quad \text{s.t. } X = J + A + E,\ J A_i^T = 0$$
$$\rightarrow\ \arg\min_{J, A_i} \|E\|_1 \quad \text{s.t. } X = J + A + E,\ J A_i^T = 0$$
non-Gaussian noise
sparsity
$\ell_0$-norm → $\ell_1$-norm [7]
[7] Robert Tibshirani. “Regression Shrinkage and Selection via the Lasso”. In: Journal of the Royal Statistical Society. Series B (Methodological) 58.1 (1996), pp. 267–288.
12. 12/28
Robust JIVE: Augmented Lagrangian Method [8]
$$\arg\min_{X} f(X) \quad \text{s.t. } g(X) = 0$$
$$\begin{aligned} X^{(t)} &= \arg\min_{X} f(X) + \langle Z^{(t)}, g(X) \rangle + \frac{\mu^{(t)}}{2} \| g(X) \|_F^2 \\ &= \arg\min_{X} f(X) + \frac{\mu^{(t)}}{2} \left\| g(X) + Z^{(t)}/\mu^{(t)} \right\|_F^2 \end{aligned}$$
$$\mu^{(t+1)} = \rho \mu^{(t)}$$
$$Z^{(t+1)} = Z^{(t)} + \mu^{(t)} g(X^{(t)})$$
[8] Dimitri P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. New York; London: Academic Press, 1982.
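The step from the first to the second form of the $X$-update is a completion of the square. A short derivation, writing $g$ for $g(X)$ and dropping the iteration index for readability:

```latex
\begin{aligned}
\frac{\mu}{2} \left\| g + \frac{Z}{\mu} \right\|_F^2
  &= \frac{\mu}{2} \| g \|_F^2 + \langle Z, g \rangle + \frac{1}{2\mu} \| Z \|_F^2 \\
\Longrightarrow \quad
\langle Z, g \rangle + \frac{\mu}{2} \| g \|_F^2
  &= \frac{\mu}{2} \left\| g + \frac{Z}{\mu} \right\|_F^2 - \frac{1}{2\mu} \| Z \|_F^2
\end{aligned}
```

The last term does not depend on $X$, so it can be dropped without changing the optimiser.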
13. 13/28
Robust JIVE: Alternating Direction Method [9]
$$\arg\min_{J, A_i, E} \|E\|_1 + \frac{\mu}{2} \left\| X - J - A - E + Z/\mu \right\|_F^2 \quad \text{s.t. } J A_i^T = 0,\ \operatorname{rank}(J) = r,\ \operatorname{rank}(A_i) = r_i$$
$$E^* = \arg\min_{E} \|E\|_1 + \frac{\mu}{2} \left\| X - J - A - E + Z/\mu \right\|_F^2$$
$$J^* = \arg\min_{J} \frac{\mu}{2} \left\| X - J - A - E + Z/\mu \right\|_F^2 \quad \text{s.t. } \operatorname{rank}(J) = r$$
$$A_i^* = \arg\min_{A_i} \frac{\mu}{2} \left\| X_i - J_i - A_i - E_i + Z_i/\mu \right\|_F^2 \quad \text{s.t. } \operatorname{rank}(A_i) = r_i,\ J A_i^T = 0$$
[9] Daniel Gabay and Bertrand Mercier. “A dual algorithm for the solution of nonlinear variational problems via finite element approximation”. In: Computers and Mathematics with Applications 2.1 (1976), pp. 17–40.
14. 14/28
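Robust JIVE: ADM Steps [10]
$$\begin{aligned} E^{(t+1)} &= \arg\min_{E} \|E\|_1 + \frac{\mu^{(t)}}{2} \left\| X - J^{(t)} - A^{(t)} - E + Z^{(t)}/\mu^{(t)} \right\|_F^2 \\ &= S_{1/\mu^{(t)}} \left( X - J^{(t)} - A^{(t)} + Z^{(t)}/\mu^{(t)} \right) \end{aligned}$$
$$S_a(x) = \begin{cases} x - a & x > a \\ x + a & x < -a \\ 0 & \text{otherwise} \end{cases}$$
[10] Emmanuel J. Candès et al. “Robust principal component analysis?” In: Journal of the ACM (JACM) 58.3 (2011), pp. 1–37.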
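The shrinkage operator $S_a$ acts elementwise and has a compact closed form, $\operatorname{sign}(x) \max(|x| - a, 0)$. A minimal NumPy sketch, where the function name `soft_threshold` is a hypothetical choice:

```python
import numpy as np

def soft_threshold(x, a):
    """Elementwise shrinkage S_a(x) = sign(x) * max(|x| - a, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
shrunk = soft_threshold(x, 1.0)   # entries with |x| <= 1 are set to zero
```

Entries with magnitude at most $a$ are set to zero and the rest are pulled towards zero by $a$, which is what makes the estimate of $E$ sparse.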
15. 15/28
Robust JIVE: Algorithm
Algorithm 2 Robust JIVE
1: Output: $J$, $A_i$, $E$
2: $X^{Joint} = X = [X_1^T \dots X_k^T]^T$
3: Initialise $\mu^{(0)} = 0.01$, $Z^{(0)} = 0$, $E^{(0)} = 0$, $J^{(0)} = 0$, $A_i^{(0)} = 0$, $\rho = 1.1$
4: while not converged do
5:   Set $E^{(t+1)} = S_{1/\mu^{(t)}}(X - J^{(t)} - A^{(t)} + Z^{(t)}/\mu^{(t)})$
6:   Set $J^{(t+1)}$ equal to a rank-$r$ SVD of $X - A^{(t)} - E^{(t+1)} + Z^{(t)}/\mu^{(t)}$
7:   for $i = 1, \dots, k$ do
8:     Set $A_i^{(t+1)}$ equal to a rank-$r_i$ SVD of $(X_i - J_i^{(t+1)} - E_i^{(t+1)} + Z_i^{(t)}/\mu^{(t)})(I - VV^T)$, where $J^{(t+1)} = U \Sigma V^T$
9:   Update $Z^{(t+1)} = Z^{(t)} + \mu^{(t)}(X - J^{(t+1)} - A^{(t+1)} - E^{(t+1)})$
10:  Update $\mu^{(t+1)} = \rho \mu^{(t)}$
11:  $t = t + 1$
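A minimal NumPy sketch of Algorithm 2 follows, using the initial values stated in step 3. The function names, the iteration cap and the stopping rule (relative size of the constraint residual $X - J - A - E$) are assumptions made for illustration; this is not the implementation used in the project.

```python
import numpy as np

def soft_threshold(M, a):
    """Elementwise shrinkage operator S_a."""
    return np.sign(M) * np.maximum(np.abs(M) - a, 0.0)

def low_rank(M, rank):
    """Rank-`rank` truncated SVD of M; returns the approximation and V^T."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank], Vt[:rank]

def robust_jive(blocks, r, ranks, mu=0.01, rho=1.1, n_iter=500, tol=1e-7):
    X = np.vstack(blocks).astype(float)
    splits = np.cumsum([b.shape[0] for b in blocks])[:-1]
    J, A, E, Z = (np.zeros_like(X) for _ in range(4))        # step 3
    for _ in range(n_iter):
        E = soft_threshold(X - J - A + Z / mu, 1.0 / mu)     # step 5
        J, Vt = low_rank(X - A - E + Z / mu, r)              # step 6
        T = X - J - E + Z / mu
        T = T - (T @ Vt.T) @ Vt                              # multiply by (I - V V^T)
        A = np.vstack([low_rank(T_i, r_i)[0]                 # step 8
                       for T_i, r_i in zip(np.split(T, splits), ranks)])
        resid = X - J - A - E
        Z = Z + mu * resid                                   # step 9
        mu = rho * mu                                        # step 10
        # Hypothetical stopping rule: the constraint X = J + A + E nearly holds.
        if np.linalg.norm(resid) <= tol * max(1.0, np.linalg.norm(X)):
            break
    return J, np.split(A, splits), E

rng = np.random.default_rng(0)
L = rng.standard_normal((14, 3)) @ rng.standard_normal((3, 40))   # low-rank signal
S = np.zeros_like(L)
S[rng.random(L.shape) < 0.05] = 10.0                              # sparse gross errors
J, A_blocks, E = robust_jive(np.split(L + S, [6]), r=1, ranks=[1, 1])
```

The synthetic input (a low-rank matrix plus a few large outliers) is made up for the example; the ranks passed in are likewise arbitrary.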
16. 16/28
Robust Nuclear Norm Regularised JIVE
$$\min_{J, A_i, E} \alpha \operatorname{rank}(J) + \sum_i \alpha_i \operatorname{rank}(A_i) + \lambda \|E\|_1 \quad \text{s.t. } X = J + A + E,\ J A_i^T = 0$$
rank function → nuclear norm, $\|\cdot\|_*$ [11]
[11] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. “Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization”. In: SIAM Review 52.3 (2010), pp. 471–501.
17. 17/28
Robust Nuclear Norm Regularised JIVE
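$$\min_{J, A_i, E} \alpha \|J\|_* + \sum_i \alpha_i \|A_i\|_* + \lambda \|E\|_1 \quad \text{s.t. } X = J + A + E,\ J A_i^T = 0$$
or
$$\min_{J, A_i, R_i, E} \alpha \|J\|_* + \sum_i \alpha_i \|R_i\|_* + \lambda \|E\|_1 \quad \text{s.t. } X = J + A + E,\ A_i = R_i,\ J A_i^T = 0$$
$$= \min_{J, A_i, R_i, E} \alpha \|J\|_* + \sum_i \alpha_i \|R_i\|_* + \lambda \|E\|_1 + \frac{\mu}{2} \left\| X - J - A - E + \frac{Z}{\mu} \right\|_F^2 + \sum_i \frac{\gamma_i}{2} \left\| R_i - A_i + \frac{Y_i}{\gamma_i} \right\|_F^2 \quad \text{s.t. } J A_i^T = 0$$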
18. 18/28
Robust Nuclear Norm Regularised JIVE: ADM Steps [12]
$$\begin{aligned} E^{(t+1)} &= \arg\min_{E} \lambda \|E\|_1 + \frac{\mu^{(t)}}{2} \left\| E - \left( X - J^{(t+1)} - A^{(t+1)} + \frac{Z^{(t)}}{\mu^{(t)}} \right) \right\|_F^2 \\ &= S_{\lambda/\mu^{(t)}} \left( X - J^{(t+1)} - A^{(t+1)} + Z^{(t)}/\mu^{(t)} \right) \end{aligned}$$
[12] Emmanuel J. Candès et al. “Robust principal component analysis?” In: Journal of the ACM (JACM) 58.3 (2011), pp. 1–37.
19. 19/28
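Robust Nuclear Norm Regularised JIVE [13]
$$\begin{aligned} J^{(t+1)} &= \arg\min_{J} \alpha \|J\|_* + \frac{\mu^{(t)}}{2} \left\| J - \left( X - A^{(t)} - E^{(t)} + \frac{Z^{(t)}}{\mu^{(t)}} \right) \right\|_F^2 \\ &= D_{\alpha/\mu^{(t)}} \left( X - A^{(t)} - E^{(t)} + \frac{Z^{(t)}}{\mu^{(t)}} \right) \end{aligned}$$
$$\begin{aligned} R_i^{(t+1)} &= \arg\min_{R_i} \alpha_i \|R_i\|_* + \frac{\gamma_i^{(t)}}{2} \left\| R_i - A_i^{(t+1)} + \frac{Y_i^{(t)}}{\gamma_i^{(t)}} \right\|_F^2 \\ &= D_{\alpha_i/\gamma_i^{(t)}} \left( A_i^{(t+1)} - \frac{Y_i^{(t)}}{\gamma_i^{(t)}} \right) \end{aligned}$$
$$D_\lambda(X) = U \operatorname{diag}(\max\{\sigma_i - \lambda, 0\}) V^T \qquad (X = U \Sigma V^T)$$
[13] Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. “A Singular Value Thresholding Algorithm for Matrix Completion”. In: SIAM Journal on Optimization 20.4 (2010), pp. 1956–1982.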
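The singular value thresholding operator $D_\lambda$ applies the same shrinkage to the singular values instead of the matrix entries. A minimal NumPy sketch, with the hypothetical function name `svt`:

```python
import numpy as np

def svt(M, lam):
    """Singular value thresholding: D_lam(M) = U diag(max(sigma_i - lam, 0)) V^T."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 9))
lam = 1.5
M_shrunk = svt(M, lam)   # singular values below lam are removed entirely
```

Singular values below $\lambda$ vanish, so the result typically has lower rank than the input. This is how the nuclear norm penalty encourages low-rank $J$ and $R_i$ without fixing the ranks in advance.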
20. 20/28
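Robust Nuclear Norm Regularised JIVE: Linearising Cost Function [14, 15]
$$\arg\min_{A_i} \underbrace{\frac{\mu}{2} \left\| X_i - J_i - A_i - E_i + \frac{Z_i}{\mu} \right\|_F^2 + \frac{\gamma_i}{2} \left\| R_i - A_i + \frac{Y_i}{\gamma_i} \right\|_F^2}_{f(A_i)} \quad \text{s.t. } J A_i^T = 0$$
$$\begin{aligned} f(A_i) &\approx f(A_i^{(t)}) + \langle \nabla f(A_i^{(t)}), A_i - A_i^{(t)} \rangle + \frac{1}{2\tau} \left\| A_i - A_i^{(t)} \right\|_F^2 \\ &= f(A_i^{(t)}) + \frac{1}{2\tau} \left\| A_i - A_i^{(t)} + \tau \nabla f(A_i^{(t)}) \right\|_F^2 - \frac{\tau}{2} \left\| \nabla f(A_i^{(t)}) \right\|_F^2 \end{aligned}$$
The last term does not depend on $A_i$ and can be ignored in the minimisation.
[14] Junfeng Yang and Xiaoming Yuan. “Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization”. In: Mathematics of Computation 82.281 (2013), pp. 301–329.
[15] Zhouchen Lin, Risheng Liu, and Zhixun Su. “Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation”. arXiv:1109.0367 (2011).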
21. 21/28
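Robust Nuclear Norm Regularised JIVE: Linearising Cost Function
$$\begin{aligned} A_i^{(t+1)} &= \arg\min_{A_i} \frac{1}{2\tau} \left\| A_i - A_i^{(t)} + \tau \nabla f(A_i^{(t)}) \right\|_F^2 \quad \text{s.t. } J A_i^T = 0 \\ &= \left( A_i^{(t)} - \tau \nabla f(A_i^{(t)}) \right) (I - VV^T), \qquad J^{(t+1)} = U \Sigma V^T \end{aligned}$$
$$\nabla f(A_i^{(t)}) = \mu^{(t)} \left( A_i^{(t)} - X_i + J_i^{(t+1)} + E_i^{(t)} - Z_i^{(t)}/\mu^{(t)} \right) + \gamma_i^{(t)} \left( A_i^{(t)} - R_i^{(t)} - Y_i^{(t)}/\gamma_i^{(t)} \right)$$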
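The linearised update is a gradient step on $f$ followed by the projection $(I - VV^T)$. The NumPy sketch below is illustrative only: the function names are hypothetical, the gradient is obtained by differentiating $f(A_i)$ as defined on the previous slide (so it carries the $E_i$ term), and the step size $\tau = 1/(\mu + \gamma_i)$ in the usage lines is an assumption, not a value taken from the slides.

```python
import numpy as np

def f(A_i, X_i, J_i, E_i, R_i, Z_i, Y_i, mu, gamma_i):
    """Smooth part of the augmented Lagrangian as a function of A_i."""
    return (0.5 * mu * np.linalg.norm(X_i - J_i - A_i - E_i + Z_i / mu) ** 2
            + 0.5 * gamma_i * np.linalg.norm(R_i - A_i + Y_i / gamma_i) ** 2)

def grad_f(A_i, X_i, J_i, E_i, R_i, Z_i, Y_i, mu, gamma_i):
    """Gradient of f with respect to A_i."""
    return (mu * (A_i - X_i + J_i + E_i - Z_i / mu)
            + gamma_i * (A_i - R_i - Y_i / gamma_i))

def linearised_A_step(A_i, grad, Vt, tau):
    """Gradient step on f, then right-multiplication by (I - V V^T)."""
    B = A_i - tau * grad
    return B - (B @ Vt.T) @ Vt

rng = np.random.default_rng(0)
p, n, r = 5, 12, 2
A_i, X_i, E_i, R_i, Z_i, Y_i = (rng.standard_normal((p, n)) for _ in range(6))
U, s, Vt = np.linalg.svd(rng.standard_normal((p, n)), full_matrices=False)
Vt = Vt[:r]                                  # row space of a rank-r joint block
J_i = (U[:, :r] * s[:r]) @ Vt
mu, gamma_i = 0.5, 0.8                       # arbitrary example values
args = (X_i, J_i, E_i, R_i, Z_i, Y_i, mu, gamma_i)
g = grad_f(A_i, *args)
A_next = linearised_A_step(A_i, g, Vt, tau=1.0 / (mu + gamma_i))
```

In this example `J_i @ A_next.T` is zero up to rounding, so the constraint $J A_i^T = 0$ holds after every step without an explicit rank constraint on $A_i$.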
23. 23/28
Toy Example
Figure: Superposed Images [17]
[17] Joint and Individual Variation Explained (JIVE) for the Integrated Analysis of Multiple Datatypes. http://www.tc.umn.edu/~elock/Talks/JIVEtalk. Accessed: 01/06/2016.