Lecture note 4: Coordinate descent
1. Coordinate descent optimization in Recommendation system
Xudong Sun,sun@aisbi.de
DSOR-AISBI
Xudong Sun,sun@aisbi.de (DSOR-AISBI)Coordinate descent optimization in Recommendation system 1 / 14
2. Outline
1 Introduction
2 Case Study
3 References
3. Introduction
Recapping the mathematical model behind recommendation systems
SVD with regularization
Maximum a posteriori
\min_{x^*, y^*} \sum_{u,i} (r_{ui} - x_u^T y_i)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)
4. Introduction
Background
Drawbacks of the Gradient Descent algorithm
An appropriate learning rate is hard to choose
Convergence is slow: the iterates only approach the optimum asymptotically
5. Introduction
Basic idea of Coordinate descent
Optimize over each coordinate (dimension) sequentially to decrease the objective, stopping when no single-coordinate move improves it: f(x^* + d \cdot e_i) \geq f(x^*)
Iterate until the result converges
Question: will it work?
6. Introduction
Preliminaries
What is a convex set? What properties does a convex set have?
Do you know how to compute derivatives in matrix algebra?
Is CD equivalent to SGD? Does coordinate-wise optimality imply a global minimum, i.e. [f(x^* + d \cdot e_i) \geq f(x^*) \;\forall i, d] \equiv [f(x^*) = \min_x f(x)]?
7. Introduction
How it works exactly
Suppose in the k-th iteration we optimize each coordinate in turn, using the result of the (k-1)-th iteration:
x_1^{(k)} := \arg\min_{x_1} f(x_1, x_2^{(k-1)}, x_3^{(k-1)}, \dots)
x_2^{(k)} := \arg\min_{x_2} f(x_1^{(k)}, x_2, x_3^{(k-1)}, \dots)
\dots
x_n^{(k)} := \arg\min_{x_n} f(x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, \dots, x_n)
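The sweep above can be sketched for a strictly convex quadratic f(x) = \frac{1}{2} x^T A x - b^T x, where each one-dimensional argmin has a closed form (for this objective the sweep reduces to Gauss-Seidel; the function name is illustrative):

```python
import numpy as np

def coordinate_descent_quadratic(A, b, n_sweeps=100):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive definite A
    by exact minimization along one coordinate at a time.
    Setting df/dx_i = 0 gives x_i = (b_i - sum_{j != i} A[i, j] x_j) / A[i, i]."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(n_sweeps):
        for i in range(n):
            # subtract the i-th term back out of the full inner product
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # SPD, so f is strictly convex
b = np.array([1.0, 2.0])
x = coordinate_descent_quadratic(A, b)
# the minimizer of f satisfies A x = b
```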
8. Case Study
Algorithms
Linear regression
Lasso
SVM (SMO and DCD, with Python implementation)
basic matrix factorization
WRMF
Factorization machine
9. Case Study
Preliminaries: matrix differentiation
Convention for the derivative of a vector with respect to a vector: keep the orientation of the denominator vector!
\frac{\partial y}{\partial x} =
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\quad
\frac{\partial y^T}{\partial x} =
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}
\frac{\partial Ax}{\partial x} = A^T, \quad [Ax]_i = \sum_j a_{i,j} x_j, \quad \left[ \frac{\partial Ax}{\partial x} \right]_{k,i} = \frac{\partial [Ax]_i}{\partial x_k} = a_{i,k}
\frac{\partial x^T A x}{\partial x} = Ax + A^T x, \quad x^T A x = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j, \quad \frac{\partial x^T A x}{\partial x_k} = \sum_j a_{k,j} x_j + \sum_i a_{i,k} x_i
A must be square but not necessarily symmetric
\frac{\partial x^T x}{\partial x} = \frac{\partial x^T I x}{\partial x} = 2x
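The identity \frac{\partial x^T A x}{\partial x} = Ax + A^T x is easy to spot-check numerically with central finite differences (a quick sanity check, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # square, deliberately not symmetric
x = rng.standard_normal(4)

analytic = A @ x + A.T @ x  # claimed gradient of x^T A x

# central finite differences along each coordinate direction e_k
eps = 1e-6
numeric = np.zeros(4)
for k in range(4):
    e = np.zeros(4)
    e[k] = eps
    numeric[k] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)
```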
10. Case Study
Linear regression
x^* = \arg\min_x f(x), \quad f(x) = \frac{1}{2} \|y - Ax\|^2
0 = \frac{\partial f(x)}{\partial x_i} = -A^T[i,:](y - Ax) = A^T[i,:](A[:,i] x_i + A[:,-i] x_{-i} - y)
Here A[:,-i] means all columns of A except column i, and x_{-i} the corresponding entries of x, so
x_i^* = \frac{A^T[i,:](y - A[:,-i] x_{-i})}{A^T[i,:] A[:,i]}
11. Case Study
Algorithm for CD in Linear regression
Question: why can't we directly solve for x_i inside the bracket? (Recall your abstract algebra, data structures, or algorithms course.)
Input: Design matrix A
Input: target variable for each training sample y
Output: coefficient for each linear variable
while cycle ≤ MaxCycle do
    for each coordinate i: x_i^* = \frac{A^T[i,:](y - A[:,-i] x_{-i})}{A^T[i,:] A[:,i]}
end
Algorithm 1: CD for linear regression
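A minimal NumPy sketch of Algorithm 1, assuming a fixed number of cycles; it maintains the residual r = y - Ax so that y - A[:,-i] x_{-i} need not be rebuilt at every update, and can be checked against np.linalg.lstsq:

```python
import numpy as np

def cd_linear_regression(A, y, max_cycle=200):
    """Coordinate descent for min_x 0.5 * ||y - A x||^2.
    Each update is the closed-form 1-D minimizer
        x_i = A[:, i]^T (y - A[:, -i] x_{-i}) / (A[:, i]^T A[:, i])."""
    n = A.shape[1]
    x = np.zeros(n)
    r = y.copy()                      # residual y - A x
    col_sq = (A ** 2).sum(axis=0)     # A[:, i]^T A[:, i] for each i
    for _ in range(max_cycle):
        for i in range(n):
            # y - A[:, -i] x_{-i} equals r + A[:, i] * x_i
            x_new = A[:, i] @ (r + A[:, i] * x[i]) / col_sq[i]
            r += A[:, i] * (x[i] - x_new)
            x[i] = x_new
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
x_cd = cd_linear_regression(A, y)
```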
12. Case Study
Simple coordinate descent for Collaborative filtering
Model: user latent feature U_i = [U_{i1}, U_{i2}, \dots, U_{in}], movie latent feature V_j = [V_{j1}, V_{j2}, \dots, V_{jn}]
Objective function: \|R - U^T V\|^2 + \lambda (\|U\|^2 + \|V\|^2)
Splitting out user i:
\|R[i,:] - U[:,i]^T V\|^2 + \|R[-i,:] - U[:,-i]^T V\|^2 + \lambda (\|U\|^2 + \|V\|^2)
\nabla_{U_i} = -2V (R[i,:] - U[:,i]^T V)^T + 2\lambda U[:,i] = 0
U[:,i] = (V V^T + \lambda I)^{-1} V R[i,:]^T
Splitting out item j:
\|R[:,j] - U^T V[:,j]\|^2 + \|R[:,-j] - U^T V[:,-j]\|^2 + \lambda (\|U\|^2 + \|V\|^2)
\nabla_{V_j} = -2U (R[:,j] - U^T V[:,j]) + 2\lambda V[:,j] = 0
U R[:,j] = (U U^T + \lambda I) V[:,j]
V[:,j] = (U U^T + \lambda I)^{-1} U R[:,j]
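The two closed-form updates above can be alternated directly. A minimal sketch (function name and test matrix are illustrative), solving for all user columns and all item columns at once rather than one column at a time:

```python
import numpy as np

def als_factorize(R, f=2, lam=0.1, n_iter=50, seed=0):
    """Alternate the closed-form updates
        U[:, i] = (V V^T + lam I)^{-1} V R[i, :]^T
        V[:, j] = (U U^T + lam I)^{-1} U R[:, j]
    U is f x n_users, V is f x n_items, so R is approximated by U^T V."""
    n_users, n_items = R.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((f, n_users))
    V = rng.standard_normal((f, n_items))
    I = lam * np.eye(f)
    for _ in range(n_iter):
        U = np.linalg.solve(V @ V.T + I, V @ R.T)  # all user columns at once
        V = np.linalg.solve(U @ U.T + I, U @ R)    # all item columns at once
    return U, V

# recover a rank-2 matrix (up to the small regularization bias)
rng = np.random.default_rng(2)
R = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
U, V = als_factorize(R, f=2, lam=1e-3)
```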
13. Case Study
WRMF coordinate descent algorithm: ALS
Objective function:
\min_{x^*, y^*} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)
Alternating Least Squares solution (one iteration):
x_u = (Y^T C^u Y + \lambda I_{f \times f})^{-1} Y^T C^u p(u)
y_i = (X^T C^i X + \lambda I_{f \times f})^{-1} X^T C^i p(i)
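A direct, loop-based sketch of these updates, assuming P holds the binary preferences p_{ui} and C the confidences c_{ui} (names are illustrative; practical implementations exploit the structure of C^u to avoid forming the diagonal matrix explicitly):

```python
import numpy as np

def wrmf_als(P, C, f=2, lam=0.1, n_iter=30, seed=0):
    """Alternate the WRMF updates
        x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p(u)
        y_i = (X^T C^i X + lam I)^{-1} X^T C^i p(i)
    Rows of X are user factors x_u, rows of Y are item factors y_i."""
    n_users, n_items = P.shape
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_users, f)) * 0.1
    Y = rng.standard_normal((n_items, f)) * 0.1
    I = lam * np.eye(f)
    for _ in range(n_iter):
        for u in range(n_users):
            Cu = np.diag(C[u])                  # confidence weights for user u
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_items):
            Ci = np.diag(C[:, i])               # confidence weights for item i
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

P = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
C = 1.0 + 10.0 * P        # higher confidence on observed interactions
X, Y = wrmf_als(P, C, f=2, lam=0.05)
scores = X @ Y.T          # predicted preferences, x_u^T y_i
```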