7. Let $A$ be an $n \times n$ symmetric matrix.

Problem 1.2:
$$\text{minimize}\ \ f(x) = \frac{x^T A x}{x^T x}, \qquad \text{subject to}\ \ x \in \mathbb{R}^n \setminus \{0\}.$$

$x$ is a critical point of $f$ $\iff$ $Ax = \dfrac{x^T A x}{\|x\|^2}\,x$, i.e., $x$ is an eigenvector of $A$.

If $x$ is a solution, then $\eta x$ is also a solution for every scalar $\eta \neq 0$.
→ Restrict to $\|x\| = 1$, i.e., to the unit sphere.
8. Problem 1.2 is equivalent to the following norm-constrained problem in $\mathbb{R}^n$.

Problem 1.3:
$$\text{minimize}\ \ f(x) = x^T A x, \qquad \text{subject to}\ \ x \in \mathbb{R}^n,\ x^T x = 1.$$

The feasible set is the $(n-1)$-dimensional unit sphere $S^{n-1}$, so the problem can be viewed as an unconstrained problem on $S^{n-1}$.

Problem 1.4:
$$\text{minimize}\ \ f(x) = x^T A x, \qquad \text{subject to}\ \ x \in S^{n-1}.$$
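A quick numerical sanity check of Problems 1.2–1.4 (an added illustration, not from the slides): the minimizer of $x^TAx$ over the unit sphere is a unit eigenvector of $A$ for its smallest eigenvalue, and $f$ attains that eigenvalue there. The test matrix and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                   # a symmetric test matrix

w, V = np.linalg.eigh(A)            # eigenvalues in ascending order
x_star = V[:, 0]                    # unit eigenvector for the smallest eigenvalue

f = lambda x: x @ A @ x             # objective of Problems 1.3/1.4 on x^T x = 1
print(np.isclose(f(x_star), w[0]))  # f(x*) equals the smallest eigenvalue
```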
9. Definition 1.1 (manifold)
A set $M$ is a ($C^\infty$) manifold if $M$ is covered by open sets $U_i$, each with a chart $\varphi_i : U_i \to \varphi_i(U_i) \subset \mathbb{R}^n$, such that
$$\bigcup_i U_i = M,$$
and whenever $U_i \cap U_j \neq \emptyset$, the transition map
$$\varphi_i \circ \varphi_j^{-1}\big|_{\varphi_j(U_i \cap U_j)} : \varphi_j(U_i \cap U_j) \to \varphi_i(U_i \cap U_j)$$
is of class $C^\infty$.

Thus $M$ locally looks like $\mathbb{R}^n$. A manifold $M$ need not be given as a subset of a Euclidean space such as $\mathbb{R}^3$; the definition is intrinsic to $M$.
10. Examples of manifolds ($p \le n$):

- Unit sphere: $S^{n-1} = \{x \in \mathbb{R}^n \mid x^T x = 1\} \subset \mathbb{R}^n$ (dimension $n-1$).
- Orthogonal group: $O(n) = \{X \in \mathbb{R}^{n \times n} \mid X^T X = I_n\} \subset \mathbb{R}^{n \times n}$.
- Stiefel manifold: $\mathrm{St}(p, n) = \{Y \in \mathbb{R}^{n \times p} \mid Y^T Y = I_p\} \subset \mathbb{R}^{n \times p}$.
- Real projective space: $\mathbb{R}P^{n-1} = \{l : l \text{ is a line through the origin of } \mathbb{R}^n\}$ (dimension $n-1$).
- Grassmann manifold: $\mathrm{Grass}(p, n) = \{W : W \text{ is a } p\text{-dimensional subspace of } \mathbb{R}^n\}$.
11. Line search in $\mathbb{R}^n$ vs. on a manifold $M$:
In $\mathbb{R}^n$, given a search direction $\eta_k$, the next iterate is
$$x_{k+1} := x_k + t_k \eta_k.$$
On $M$, the search direction $\eta_k$ is a tangent vector at $x_k$, and $x_k + t_k\eta_k$ in general does not stay on $M$.
→ Search along a curve $\gamma$ on $M$ with $\gamma(0) = x_k$, $\dot\gamma(0) = \eta_k$, and take $x_{k+1}$ on $\gamma$.
A retraction $R : TM \to M$, with $R_x := R|_{T_xM}$, provides such curves:
$$x_{k+1} := R_{x_k}(t_k\eta_k), \qquad R_{x_k} : T_{x_k}M \to M.$$
12. Algorithm 1.2 (general framework on a manifold $M$ with a retraction $R$)
Choose an initial point $x_0 \in M$.
for $k = 0, 1, 2, \ldots$ do
  Compute a search direction $\eta_k \in T_{x_k}M$ and a step size $t_k > 0$.
  Compute $x_{k+1} := R_{x_k}(t_k\eta_k)$.
end for

The choices of $\eta_k$ and $t_k$ determine the algorithm.
14. Search directions on $M$:
Steepest descent: $\eta_k := -\operatorname{grad} f(x_k)$, where $\operatorname{grad}$ denotes the Riemannian gradient on $M$ (the counterpart of $\nabla f$).
Conjugate gradient (naive attempt):
$$\eta_0 := -\operatorname{grad} f(x_0), \qquad (?)\ \ \eta_{k+1} := -\operatorname{grad} f(x_{k+1}) + \beta_{k+1}\eta_k, \quad k \ge 0.$$
Problem: $\operatorname{grad} f(x_{k+1}) \in T_{x_{k+1}}M$ while $\eta_k \in T_{x_k}M$, so the sum is not defined.
16. Tangent space $T_xM$ at $x \in M$:
A tangent vector at $x \in M$ can be described in two equivalent ways: as the velocity $\dot\gamma(0)$ of a curve $\gamma$ on $M$ through $x$, or as a derivation acting on $f : M \to \mathbb{R}$ by
$$\dot\gamma(0)f = \frac{d}{dt} f(\gamma(t))\Big|_{t=0}.$$
When $M$ is embedded in a Euclidean space, $\dot\gamma(0)$ can be identified with $\frac{d}{dt}\gamma(t)\big|_{t=0}$.
Example: for $S^{n-1} := \{x \in \mathbb{R}^n \mid x^T x = 1\}$,
$$T_xS^{n-1} = \{\xi \in \mathbb{R}^n \mid \xi^T x = 0\}.$$
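A tiny numerical illustration (added, assuming the embedded description above): projecting an arbitrary $v \in \mathbb{R}^n$ by $I_n - xx^T$ yields a vector orthogonal to $x$, hence an element of $T_xS^{n-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
x /= np.linalg.norm(x)            # x on S^{n-1}
v = rng.standard_normal(4)
xi = v - (x @ v) * x              # (I - x x^T) v
print(abs(xi @ x))                # ~0, so xi^T x = 0 and xi is tangent at x
```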
17. Riemannian metric $g$:
A Riemannian metric $g$ assigns to each $x \in M$ an inner product $g_x$ on $T_xM$, varying smoothly with $x$.
Example: $S^{n-1} \subset \mathbb{R}^n$ inherits from the standard inner product $\langle a, b\rangle = a^T b$, $a, b \in \mathbb{R}^n$, the metric
$$g_x(\xi, \eta) = \xi^T \eta, \qquad \xi, \eta \in T_xS^{n-1}.$$
With $g$, each $T_xM$ becomes an inner product space; $g_x(\xi, \eta)$ is also written $\langle \xi, \eta\rangle_x$.
18. Riemannian gradient $\operatorname{grad} f(x)$:
For $f : M \to \mathbb{R}$ and $x \in M$, $\operatorname{grad} f(x)$ is the unique element of $T_xM$ satisfying
$$D f(x)[\xi] = g_x(\operatorname{grad} f(x), \xi), \qquad \xi \in T_xM.$$
Example: on $S^{n-1}$, let $f(x) = x^T A x$ with $A$ symmetric, and extend $f$ to $\bar f$ on $\mathbb{R}^n$ by
$$\bar f(x) = x^T A x, \qquad x \in \mathbb{R}^n.$$
The Euclidean gradient of $\bar f$ is $\nabla \bar f(x) = 2Ax$. For $\xi \in T_xS^{n-1}$,
$$Df(x)[\xi] = 2x^T A\xi = 2x^T A(I_n - xx^T)\xi = g_x\big(2(I_n - xx^T)Ax,\ \xi\big),$$
so
$$\operatorname{grad} f(x) = 2\big(I_n - xx^T\big)Ax.$$
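The defining identity $Df(x)[\xi] = g_x(\operatorname{grad} f(x), \xi)$ can be verified numerically. The sketch below (an added illustration) differentiates $f$ along the curve $t \mapsto (x + t\xi)/\|x + t\xi\|$, which passes through $x$ with velocity $\xi$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
v = rng.standard_normal(n)
xi = v - (x @ v) * x                      # a tangent vector at x

grad = 2 * (A @ x - (x @ A @ x) * x)      # 2 (I - x x^T) A x

f = lambda y: y @ A @ y
c = lambda t: (x + t * xi) / np.linalg.norm(x + t * xi)  # curve with velocity xi
t = 1e-6
dfxi = (f(c(t)) - f(c(-t))) / (2 * t)     # central difference for D f(x)[xi]
print(np.isclose(dfxi, grad @ xi, rtol=1e-5))
```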
19. Retraction $R : TM \to M$:
Definition 2.1 [Absil et al., 2008]
A smooth map $R : TM \to M$ is a retraction if, writing $R_x := R|_{T_xM}$ for the restriction of $R$ to $T_xM$:
- $R_x(0_x) = x$ for all $x \in M$, where $0_x$ is the zero vector of $T_xM$;
- $DR_x(0_x)[\xi] = \xi$ for all $x \in M$, $\xi \in T_xM$.
For $x \in M$ and $\xi \in T_xM$, the curve $\gamma(t) = R_x(t\xi)$ satisfies $\gamma(0) = R_x(0) = x$ (so $\gamma$ passes through $x$) and $\dot\gamma(0) = DR_x(0)[\xi] = \xi$ (so $\gamma$ has initial velocity $\xi$).
20. Example: on $S^{n-1}$,
$$R_x(\xi) = \frac{x + \xi}{\|x + \xi\|}, \qquad x \in S^{n-1},\ \xi \in T_xS^{n-1}$$
defines a retraction $R$.
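Both retraction axioms of Definition 2.1 can be checked for this map; a short finite-difference sketch (an added illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
v = rng.standard_normal(5)
xi = v - (x @ v) * x                            # xi in T_x S^{n-1}

R = lambda x, xi: (x + xi) / np.linalg.norm(x + xi)

print(np.allclose(R(x, 0 * xi), x))             # R_x(0_x) = x
t = 1e-7
dR = (R(x, t * xi) - R(x, -t * xi)) / (2 * t)   # D R_x(0)[xi]
print(np.allclose(dR, xi, atol=1e-6))           # D R_x(0)[xi] = xi
```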
22. Conjugate gradient method in $\mathbb{R}^n$:
Algorithm 3.1 (CG in $\mathbb{R}^n$)
1: Choose an initial point $x_0 \in \mathbb{R}^n$.
2: $\eta_0 := -\nabla f(x_0)$.
3: while $\nabla f(x_k) \neq 0$ do
4:   Compute a step size $\alpha_k$ and set $x_{k+1} := x_k + \alpha_k\eta_k$.
5:   Compute $\beta_{k+1}$ and set
     $$\eta_{k+1} := -\nabla f(x_{k+1}) + \beta_{k+1}\eta_k. \tag{1}$$
6:   $k := k + 1$.
7: end while

On $M$, the "+" in (1) is not defined: $\operatorname{grad} f(x_{k+1}) \in T_{x_{k+1}}M$ while $\eta_k \in T_{x_k}M$. → vector transport
23. Vector transport
A vector transport $T$ on $M$ is a smooth map $T : TM \oplus TM \to TM$, $(\eta_x, \xi_x) \mapsto T_{\eta_x}(\xi_x)$, satisfying the following for all $x \in M$ [Absil et al., 2008]:
1. There exists a retraction $R$ such that $\pi(T_{\eta_x}(\xi_x)) = R(\eta_x)$, where $\pi(T_{\eta_x}(\xi_x))$ denotes the foot point of $T_{\eta_x}(\xi_x)$.
2. $T_{0_x}(\xi_x) = \xi_x$ for all $\xi_x \in T_xM$.
3. $T_{\eta_x}(a\xi_x + b\zeta_x) = aT_{\eta_x}(\xi_x) + bT_{\eta_x}(\zeta_x)$, $a, b \in \mathbb{R}$.
A vector transport thus carries $\xi_x \in T_xM$ linearly to a tangent vector at $R(\eta_x)$.
25. Vector transport in CG
Algorithm 3.1 (CG on a manifold $M$)
1: Choose an initial point $x_0 \in M$.
2: $\eta_0 := -\operatorname{grad} f(x_0)$.
3: while $\operatorname{grad} f(x_k) \neq 0$ do
4:   Compute a step size $\alpha_k$ and set $x_{k+1} := R_{x_k}(\alpha_k\eta_k)$.
5:   Compute $\beta_{k+1}$ and set $\eta_{k+1} := -\operatorname{grad} f(x_{k+1}) + \beta_{k+1}T_{\alpha_k\eta_k}(\eta_k)$.
6:   $k := k + 1$.
7: end while

How should $\alpha_k$ and $\beta_k$ be chosen?
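A minimal Python sketch of this algorithm on $S^{n-1}$ for $f(x) = x^TAx$, assuming the sphere retraction above, the differentiated-retraction transport $T^R_\eta(\xi) = DR_x(\eta)[\xi]$, and the Fletcher–Reeves $\beta$ of a later slide. The Armijo backtracking rule for $\alpha_k$ is a simplification for illustration; the slides choose $\alpha_k$ via (strong) Wolfe conditions.

```python
import numpy as np

def sphere_cg(A, x0, tol=1e-8, max_iter=500):
    R = lambda x, xi: (x + xi) / np.linalg.norm(x + xi)
    def TR(x, eta, xi):                      # T^R_eta(xi) = D R_x(eta)[xi]
        w = x + eta
        nw = np.linalg.norm(w)
        return (xi - (w @ xi) / nw**2 * w) / nw
    f = lambda y: y @ A @ y
    grad = lambda y: 2 * (A @ y - (y @ A @ y) * y)   # 2 (I - x x^T) A x

    x, g = x0, grad(x0)
    eta = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha, slope = 1.0, g @ eta          # Armijo backtracking line search
        while f(R(x, alpha * eta)) > f(x) + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        x_new = R(x, alpha * eta)
        g_new = grad(x_new)
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves beta
        eta = -g_new + beta * TR(x, alpha * eta, eta)
        x, g = x_new, g_new
    return x

A = np.diag(np.arange(1.0, 21.0))            # the test matrix of a later slide
rng = np.random.default_rng(4)
x0 = rng.standard_normal(20)
x0 /= np.linalg.norm(x0)
x = sphere_cg(A, x0)
print(x @ A @ x)                             # should approach 1, the smallest eigenvalue
```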
30. Choices of $\beta_k$ (with $g_k := \nabla f(x_k)$, $y_k := g_{k+1} - g_k$):

Fletcher–Reeves: in $\mathbb{R}^n$,
$$\beta^{FR}_{k+1} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}.$$
→ On $M$:
$$\beta_{k+1} = \frac{\langle \operatorname{grad} f(x_{k+1}), \operatorname{grad} f(x_{k+1})\rangle_{x_{k+1}}}{\langle \operatorname{grad} f(x_k), \operatorname{grad} f(x_k)\rangle_{x_k}}.$$

Dai–Yuan: in $\mathbb{R}^n$,
$$\beta^{DY}_{k+1} = \frac{\|g_{k+1}\|^2}{\eta_k^T y_k}.$$
→ On $M$:
$$(?)\quad \beta_{k+1} := \frac{\langle \operatorname{grad} f(x_{k+1}), \operatorname{grad} f(x_{k+1})\rangle_{x_{k+1}}}{\langle \eta_k, y_k\rangle_{x_k}}$$
with $y_k = \operatorname{grad} f(x_{k+1}) - T_{\alpha_k\eta_k}(\operatorname{grad} f(x_k))$? (Here $y_k$ lives in $T_{x_{k+1}}M$ while $\eta_k \in T_{x_k}M$, so even this pairing needs care.)
31. Fletcher–Reeves method: scaled vector transport
In $\mathbb{R}^n$ the previous direction $\eta_{k-1}$ is reused as is, so its norm is preserved. On $M$, a vector transport $T$ need not satisfy
$$\|T_{\alpha_{k-1}\eta_{k-1}}(\eta_{k-1})\|_{x_k} \le \|\eta_{k-1}\|_{x_{k-1}},$$
which the convergence analysis requires; in particular the differentiated retraction $T^R_\eta(\xi) = DR_x(\eta)[\xi]$ may violate it.
To enforce this bound, [Sato & Iwai, 2015] introduced the scaled vector transport $T^0$ associated with $T^R$:
$$T^0_\eta(\xi) = \frac{\|\xi\|_x}{\|T^R_\eta(\xi)\|_{R_x(\eta)}}\, T^R_\eta(\xi), \qquad \xi, \eta \in T_xM.$$
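A sketch of $T^0$ on $S^{n-1}$ with the induced metric (so $\|\cdot\|_x$ is the Euclidean norm), assuming the differentiated retraction $T^R$ of $R_x(\xi) = (x+\xi)/\|x+\xi\|$; by construction the transported vector keeps the norm of $\xi$:

```python
import numpy as np

def TR(x, eta, xi):                  # T^R_eta(xi) = D R_x(eta)[xi] on S^{n-1}
    w = x + eta
    nw = np.linalg.norm(w)
    return (xi - (w @ xi) / nw**2 * w) / nw

def T0(x, eta, xi):                  # scaled vector transport [Sato & Iwai, 2015]
    t = TR(x, eta, xi)
    return (np.linalg.norm(xi) / np.linalg.norm(t)) * t

rng = np.random.default_rng(5)
x = rng.standard_normal(4)
x /= np.linalg.norm(x)
v, w = rng.standard_normal(4), rng.standard_normal(4)
xi, eta = v - (x @ v) * x, w - (x @ w) * x     # tangent vectors at x
print(np.isclose(np.linalg.norm(T0(x, eta, xi)), np.linalg.norm(xi)))
```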
32. Fletcher–Reeves method with scaled vector transport
Algorithm 3.2 (Riemannian Fletcher–Reeves CG)
1: Choose an initial point $x_0 \in M$.
2: $\eta_0 := -\operatorname{grad} f(x_0)$.
3: while $\operatorname{grad} f(x_k) \neq 0$ do
4:   Compute a step size $\alpha_k$ and set $x_{k+1} := R_{x_k}(\alpha_k\eta_k)$.
5:   Set
     $$\beta_{k+1} := \frac{\langle \operatorname{grad} f(x_{k+1}), \operatorname{grad} f(x_{k+1})\rangle_{x_{k+1}}}{\langle \operatorname{grad} f(x_k), \operatorname{grad} f(x_k)\rangle_{x_k}}, \qquad \eta_{k+1} := -\operatorname{grad} f(x_{k+1}) + \beta_{k+1}T^{(k)}_{\alpha_k\eta_k}(\eta_k).$$
6:   $k := k + 1$.
7: end while

Here
$$T^{(k)}_{\alpha_k\eta_k}(\eta_k) := \begin{cases} T^R_{\alpha_k\eta_k}(\eta_k), & \text{if } \|T^R_{\alpha_k\eta_k}(\eta_k)\|_{x_{k+1}} \le \|\eta_k\|_{x_k}, \\ T^0_{\alpha_k\eta_k}(\eta_k), & \text{otherwise.} \end{cases}$$
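The switch $T^{(k)}$ in step 5, as a self-contained sketch on $S^{n-1}$ with the induced metric: keep $T^R$ when it does not increase the norm, otherwise rescale it to $T^0$.

```python
import numpy as np

def Tk(x, eta, xi):
    w = x + eta
    nw = np.linalg.norm(w)
    t = (xi - (w @ xi) / nw**2 * w) / nw       # candidate T^R_eta(xi)
    if np.linalg.norm(t) <= np.linalg.norm(xi):
        return t                               # norm condition holds: use T^R
    return (np.linalg.norm(xi) / np.linalg.norm(t)) * t   # otherwise use T^0
```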
33. Global convergence of the Fletcher–Reeves method
Theorem 3.1 (Sato & Iwai, 2015)
Suppose $f$ is $C^1$ and there exists $L > 0$ such that
$$|D(f \circ R_x)(t\eta)[\eta] - D(f \circ R_x)(0)[\eta]| \le Lt, \qquad \eta \in T_xM \text{ with } \|\eta\|_x = 1,\ x \in M,\ t \ge 0.$$
Then the sequence $\{x_k\}$ generated by Algorithm 3.2 satisfies
$$\liminf_{k\to\infty} \|\operatorname{grad} f(x_k)\|_{x_k} = 0.$$
34. Comparison with [Ring & Wirth, 2012]
[Ring & Wirth, 2012] proved global convergence of the Riemannian Fletcher–Reeves method under the assumption that, for all $k$,
$$\|T^R_{\alpha_{k-1}\eta_{k-1}}(\eta_{k-1})\|_{x_k} \le \|\eta_{k-1}\|_{x_{k-1}} \tag{11}$$
holds for the vector transport $T^R$.
[Sato & Iwai, 2015] removed assumption (11): whenever (11) fails, the vector transport is replaced by the scaled vector transport.
35. Numerical experiment (an example where (11) can fail)
Set $n = 20$, $A = \operatorname{diag}(1, \ldots, 20)$, and $S^{n-1} := \{x \in \mathbb{R}^n \mid x^T x = 1\}$.
Problem 3.1:
$$\text{minimize}\ \ f(x) = x^T A x, \qquad \text{subject to}\ \ x \in S^{n-1},$$
where $S^{n-1}$ is endowed with the (non-standard) Riemannian metric
$$g_x(\xi_x, \eta_x) := \xi_x^T G_x \eta_x, \qquad \xi_x, \eta_x \in T_xS^{n-1},$$
with $G_x := \operatorname{diag}\big(10^4 (x^{(1)})^2 + 1,\ 1,\ 1,\ \ldots,\ 1\big)$, where $x^{(1)}$ denotes the first component of $x$.
36. Numerical experiment: geometry for Problem 3.1
Gradient:
$$\operatorname{grad} f(x) = 2\left(I_n - \frac{G_x^{-1}xx^T}{x^TG_x^{-1}x}\right)G_x^{-1}Ax.$$
Retraction:
$$R_x(\xi) = \frac{x + \xi}{\sqrt{(x + \xi)^T(x + \xi)}}, \qquad \xi \in T_xS^{n-1},\ x \in S^{n-1}.$$
Vector transport:
$$T^R_\eta(\xi) = \frac{1}{\sqrt{(x + \eta)^T(x + \eta)}}\left(I_n - \frac{(x + \eta)(x + \eta)^T}{(x + \eta)^T(x + \eta)}\right)\xi, \qquad \eta, \xi \in T_xS^{n-1},\ x \in S^{n-1}.$$
The optimal solution $x^*$ satisfies $f(x^*) = 1$.
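A sketch (added) of this geometry in NumPy: the metric matrix $G_x$, the gradient formula above, a tangency check $x^T\operatorname{grad} f(x) = 0$, and a finite-difference check of $Df(x)[\xi] = \xi^TG_x\operatorname{grad} f(x)$.

```python
import numpy as np

n = 20
A = np.diag(np.arange(1.0, n + 1))

def G(x):                            # G_x = diag(1e4 (x^(1))^2 + 1, 1, ..., 1)
    d = np.ones(n)
    d[0] = 1e4 * x[0]**2 + 1
    return np.diag(d)

def grad_f(x):                       # 2 (I - G^{-1} x x^T / (x^T G^{-1} x)) G^{-1} A x
    Gi = np.linalg.inv(G(x))
    v, w = Gi @ (A @ x), Gi @ x
    return 2 * (v - (x @ v) / (x @ w) * w)

rng = np.random.default_rng(6)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
g = grad_f(x)
print(abs(x @ g))                    # ~0: grad f(x) is tangent at x

xi = rng.standard_normal(n)
xi -= (x @ xi) * x                   # a test tangent vector
f = lambda y: y @ A @ y
c = lambda t: (x + t * xi) / np.linalg.norm(x + t * xi)
t = 1e-6
dfxi = (f(c(t)) - f(c(-t))) / (2 * t)
print(np.isclose(dfxi, xi @ G(x) @ g, rtol=1e-4))  # D f(x)[xi] = g_x(grad f(x), xi)
```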
57. Application: singular value decomposition [Sato & Iwai, 2013]
Given $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, an integer $p \le n$, and $N = \operatorname{diag}(\mu_1, \ldots, \mu_p)$ with $\mu_1 > \cdots > \mu_p > 0$:
Problem 4.1:
$$\text{minimize}\ \ -\operatorname{tr}(U^T A V N), \qquad \text{subject to}\ \ (U, V) \in \mathrm{St}(p, m) \times \mathrm{St}(p, n).$$
At an optimal solution $(U^*, V^*)$, the columns of $U^*$ and $V^*$ are left and right singular vectors of $A$ associated with the $p$ largest singular values. This is an optimization problem on the product of two Stiefel manifolds.
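A numerical check (added; it uses NumPy's SVD rather than the Riemannian method of [Sato & Iwai, 2013]): evaluating the objective at the top-$p$ singular vector matrices gives $-\sum_i \mu_i\sigma_i$, consistent with the statement above.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, p = 8, 5, 3
A = rng.standard_normal((m, n))
N = np.diag([3.0, 2.0, 1.0])            # mu_1 > mu_2 > mu_3 > 0

U_full, s, Vt = np.linalg.svd(A)
U, V = U_full[:, :p], Vt[:p].T          # a point of St(p, m) x St(p, n)
obj = -np.trace(U.T @ A @ V @ N)
print(np.isclose(obj, -np.sum(np.diag(N) * s[:p])))
```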
58. Application: canonical correlation analysis [Yger et al., 2012]
Given two zero-mean data matrices $X \in \mathbb{R}^{T \times m}$, $Y \in \mathbb{R}^{T \times n}$, set $C_X = X^TX$, $C_Y = Y^TY$, $C_{XY} = X^TY$.
For $u \in \mathbb{R}^m$, $v \in \mathbb{R}^n$, put $f = Xu$, $g = Yv$. The correlation coefficient $\rho$ between $f$ and $g$ is
$$\rho = \frac{\operatorname{Cov}(f, g)}{\sqrt{\operatorname{Var}(f)\operatorname{Var}(g)}} = \frac{u^T C_{XY} v}{\sqrt{u^TC_Xu}\,\sqrt{v^TC_Yv}}.$$
Maximizing $\rho$ is equivalent to:
Problem 4.2:
$$\text{maximize}\ \ u^T C_{XY} v, \qquad \text{subject to}\ \ u^T C_X u = v^T C_Y v = 1.$$
59. Application: canonical correlation analysis [Yger et al., 2012]
Replacing the vectors $u, v$ by matrices $U, V$ gives:
Problem 4.3:
$$\text{maximize}\ \ \operatorname{tr}(U^T C_{XY} V), \qquad \text{subject to}\ \ (U, V) \in \mathrm{St}_{C_X}(p, m) \times \mathrm{St}_{C_Y}(p, n).$$
Here, for an $n \times n$ symmetric positive-definite matrix $G$, the generalized Stiefel manifold $\mathrm{St}_G(p, n)$ is
$$\mathrm{St}_G(p, n) = \{Y \in \mathbb{R}^{n \times p} \mid Y^T G Y = I_p\}.$$
This is again a problem on a product of two (generalized Stiefel) manifolds.
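A small sketch (added; an illustration only, not the algorithm of [Yger et al., 2012]) of producing a feasible point of $\mathrm{St}_G(p, n)$: any full-rank $W$ can be $G$-orthonormalized via a Cholesky factor of $W^TGW$.

```python
import numpy as np

rng = np.random.default_rng(8)
T, n, p = 50, 6, 2
Y = rng.standard_normal((T, n))
G = Y.T @ Y                              # C_Y = Y^T Y, positive definite here

W = rng.standard_normal((n, p))          # full rank with probability 1
L = np.linalg.cholesky(W.T @ G @ W)
V = W @ np.linalg.inv(L.T)               # G-orthonormalized: V^T G V = I_p
print(np.allclose(V.T @ G @ V, np.eye(p)))   # V is a point of St_G(p, n)
```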
60. Application: model reduction [Sato & Sato, 2015]
Consider the linear system
$$\dot x = Ax + Bu, \qquad y = Cx,$$
with input $u \in \mathbb{R}^p$, output $y \in \mathbb{R}^q$, and state $x \in \mathbb{R}^n$. A reduced model of state dimension $m < n$ is
$$\dot x_m = A_m x_m + B_m u, \qquad y_m = C_m x_m,$$
with $A_m = U^T A U$, $B_m = U^T B$, $C_m = CU$, where $U \in \mathbb{R}^{n \times m}$ satisfies $U^T U = I_m$.
61. Application: model reduction [Sato & Sato, 2015]
Problem 4.4:
$$\text{minimize}\ \ J(U), \qquad \text{subject to}\ \ U \in \mathrm{St}(m, n).$$
The objective $J$ is the squared $H^2$ norm of the error system $G_e$:
$$J(U) := \|G_e\|^2 = \operatorname{tr}(C_eE_cC_e^T) = \operatorname{tr}(B_e^T E_oB_e),$$
where
$$A_e = \begin{pmatrix} A & 0 \\ 0 & U^TAU \end{pmatrix}, \qquad B_e = \begin{pmatrix} B \\ U^TB \end{pmatrix}, \qquad C_e = \begin{pmatrix} C & -CU \end{pmatrix},$$
and the Gramians $E_c$, $E_o$ solve the Lyapunov equations
$$A_eE_c + E_cA_e^T + B_eB_e^T = 0, \qquad A_e^T E_o + E_oA_e + C_e^T C_e = 0.$$
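The objective $J(U)$ can be evaluated directly with SciPy's Lyapunov solver; a sketch (added), using a symmetric negative-definite $A$ so that both the full and the reduced systems are stable, which is needed for $E_c$, $E_o$ to exist:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(9)
n, m, p, q = 6, 2, 1, 1
M = rng.standard_normal((n, n))
A = -(M @ M.T + np.eye(n))               # stable; so is U^T A U for any U^T U = I
B = rng.standard_normal((n, p))
C = rng.standard_normal((q, n))
U = np.linalg.qr(rng.standard_normal((n, m)))[0]   # U in St(m, n)

Ae = np.block([[A, np.zeros((n, m))], [np.zeros((m, n)), U.T @ A @ U]])
Be = np.vstack([B, U.T @ B])
Ce = np.hstack([C, -C @ U])

Ec = solve_continuous_lyapunov(Ae, -Be @ Be.T)     # Ae Ec + Ec Ae^T + Be Be^T = 0
Eo = solve_continuous_lyapunov(Ae.T, -Ce.T @ Ce)   # Ae^T Eo + Eo Ae + Ce^T Ce = 0
print(np.trace(Ce @ Ec @ Ce.T), np.trace(Be.T @ Eo @ Be))  # both equal J(U)
```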
63. Application: tensor completion [Kasai & Mishra, 2015]
A tensor $X \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ of multilinear rank $r = (r_1, r_2, r_3)$ admits a Tucker decomposition
$$X = G \times_1 U_1 \times_2 U_2 \times_3 U_3, \qquad G \in \mathbb{R}^{r_1 \times r_2 \times r_3},\ U_d \in \mathrm{St}(r_d, n_d),\ d = 1, 2, 3.$$
→ $M := \mathrm{St}(r_1, n_1) \times \mathrm{St}(r_2, n_2) \times \mathrm{St}(r_3, n_3) \times \mathbb{R}^{r_1 \times r_2 \times r_3}$.
For any $O_d \in O(r_d)$, $d = 1, 2, 3$, the change
$$(U_1, U_2, U_3, G) \to (U_1O_1, U_2O_2, U_3O_3, G \times_1 O_1^T \times_2 O_2^T \times_3 O_3^T)$$
leaves $X$ unchanged → work on the quotient manifold $M/(O(r_1) \times O(r_2) \times O(r_3))$.
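A sketch (added) of the $O(r_d)$-invariance: rotating the factors and counter-rotating the core leaves the Tucker product unchanged; `mode_product` is a small hypothetical helper, not from the paper.

```python
import numpy as np

def mode_product(G, U, d):       # d-mode product G x_d U
    return np.moveaxis(np.tensordot(U, G, axes=(1, d)), 0, d)

def tucker(G, Us):               # G x_1 U_1 x_2 U_2 x_3 U_3
    X = G
    for d in range(3):
        X = mode_product(X, Us[d], d)
    return X

rng = np.random.default_rng(10)
n, r = (5, 6, 7), (2, 3, 4)
G = rng.standard_normal(r)
Us = [np.linalg.qr(rng.standard_normal((n[d], r[d])))[0] for d in range(3)]
Os = [np.linalg.qr(rng.standard_normal((r[d], r[d])))[0] for d in range(3)]

X1 = tucker(G, Us)
G2 = tucker(G, [O.T for O in Os])                    # core counter-rotated
X2 = tucker(G2, [Us[d] @ Os[d] for d in range(3)])   # factors rotated
print(np.allclose(X1, X2))                           # the tensor X is unchanged
```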
64. Application: inverse eigenvalue problem [Yao et al., 2016]
DSIEP (Doubly Stochastic Inverse Eigenvalue Problem):
Given a self-conjugate set $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ of complex numbers, find an $n \times n$ doubly stochastic matrix $C$ whose eigenvalues are $\lambda_1, \lambda_2, \ldots, \lambda_n$.
65. Application: inverse eigenvalue problem [Yao et al., 2016]
Oblique manifold: $\mathrm{OB} := \{Z \in \mathbb{R}^{n \times n} \mid \operatorname{diag}(ZZ^T) = I_n\}$. Set $\Lambda := \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, and let $\mathcal{U}$ be the set of structured matrices used in [Yao et al., 2016] to realize the prescribed (self-conjugate) spectrum.
For $Z \in \mathrm{OB}$, each row of $Z \odot Z$ sums to $1$; the column-sum condition is $(Z \odot Z)^T 1_n - 1_n = 0$.
$Z \odot Z$ has eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ if and only if
$$Z \odot Z = Q(\Lambda + U)Q^T, \qquad Q \in O(n),\ U \in \mathcal{U}.$$
66. Application: inverse eigenvalue problem [Yao et al., 2016]
Define the residuals
$$H_1(Z, Q, U) := Z \odot Z - Q(\Lambda + U)Q^T, \qquad H_2(Z) := (Z \odot Z)^T 1_n - 1_n,$$
$$H(Z, Q, U) := (H_1(Z, Q, U), H_2(Z)).$$
Problem 4.6:
$$\text{minimize}\ \ h(Z, Q, U) := \frac{1}{2}\|H(Z, Q, U)\|_F^2, \qquad \text{subject to}\ \ (Z, Q, U) \in \mathrm{OB} \times O(n) \times \mathcal{U}.$$
This is an optimization problem on the product manifold $\mathrm{OB} \times O(n) \times \mathcal{U}$.
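A sketch (added) of the residual $h$ for a real spectrum, taking $U = 0$ as an illustrative simplification of the set $\mathcal{U}$ (which in [Yao et al., 2016] also handles complex-conjugate eigenvalue pairs):

```python
import numpy as np

def h(Z, Q, Lam):                        # 0.5 * ||H(Z, Q, 0)||_F^2
    S = Z * Z                            # Z ⊙ Z; rows sum to 1 when Z in OB
    one = np.ones(len(S))
    H1 = S - Q @ Lam @ Q.T
    H2 = S.T @ one - one                 # column-sum residual
    return 0.5 * (np.linalg.norm(H1)**2 + np.linalg.norm(H2)**2)

n = 4
Lam = np.diag([1.0, 0.2, -0.2, -0.5])    # a self-conjugate (real) spectrum
rng = np.random.default_rng(11)
Z = rng.standard_normal((n, n))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)    # Z in OB: unit-norm rows
Q = np.linalg.qr(rng.standard_normal((n, n)))[0] # Q in O(n)
print(h(Z, Q, Lam))                      # generically > 0; h = 0 solves the DSIEP
```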
70. References I
[1] Absil, P.A., Mahony, R., Sepulchre, R.: Optimization
Algorithms on Matrix Manifolds. Princeton University Press,
Princeton, NJ (2008)
[2] Dai, Y.H., Yuan, Y.: A nonlinear conjugate gradient method
with a strong global convergence property. SIAM Journal
on Optimization 10(1), 177–182 (1999)
[3] Edelman, A., Arias, T.A., Smith, S.T.: The geometry of
algorithms with orthogonality constraints. SIAM Journal on
Matrix Analysis and Applications 20(2), 303–353 (1998)
[4] Fletcher, R., Reeves, C.M.: Function minimization by
conjugate gradients. The Computer Journal 7(2), 149–154
(1964)
71. References II
[5] Kasai, H., Mishra, B.: Riemannian preconditioning for
tensor completion. arXiv preprint arXiv:1506.02159v1
(2015)
[6] Narushima, Y., Yabe, H., Ford, J.A.: A three-term conjugate
gradient method with sufficient descent property for
unconstrained optimization. SIAM Journal on Optimization
21(1), 212–230 (2011)
[7] Ring, W., Wirth, B.: Optimization methods on Riemannian
manifolds and their application to shape space. SIAM
Journal on Optimization 22(2), 596–627 (2012)
[8] Sato, H.: A Dai–Yuan-type Riemannian conjugate gradient
method with the weak Wolfe conditions. Computational
Optimization and Applications (2015)
72. References III
[9] Sato, H., Iwai, T.: A Riemannian optimization approach to
the matrix singular value decomposition. SIAM Journal on
Optimization 23(1), 188–212 (2013)
[10] Sato, H., Iwai, T.: A new, globally convergent Riemannian
conjugate gradient method. Optimization 64(4), 1011–1031
(2015)
[11] Sato, H., Sato, K.: Riemannian trust-region methods for H2
optimal model reduction. In: Proceedings of the 54th IEEE
Conference on Decision and Control, pp. 4648–4655
(2015)
[12] Tan, M., Tsang, I.W., Wang, L., Vandereycken, B., Pan,
S.J.: Riemannian pursuit for big matrix recovery. In:
Proceedings of the 31st International Conference on
Machine Learning, pp. 1539–1547 (2014)
73. References IV
[13] Yao, T.T., Bai, Z.J., Zhao, Z., Ching, W.K.: A Riemannian
Fletcher–Reeves conjugate gradient method for doubly
stochastic inverse eigenvalue problems. SIAM Journal on
Matrix Analysis and Applications 37(1), 215–234 (2016)
[14] Yger, F., Berar, M., Gasso, G., Rakotomamonjy, A.:
Adaptive canonical correlation analysis based on matrix
manifolds. In: Proceedings of the 29th International
Conference on Machine Learning (ICML-12), pp.
1071–1078 (2012)