Lecture note 4: Coordinate descent
1. Coordinate descent optimization in Recommendation system
Xudong Sun,sun@aisbi.de
DSOR-AISBI
Xudong Sun,sun@aisbi.de (DSOR-AISBI)Coordinate descent optimization in Recommendation system 1 / 14
2. Outline
1 Introduction
2 Case Study
3 References
3. Introduction
Recapping the mathematical model behind recommendation systems
SVD with regularization
Maximum a posteriori
\min_{x^*, y^*} \sum_{u,i} (r_{ui} - x_u^T y_i)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)
4. Introduction
Background
Drawbacks of the Gradient Descent algorithm
An appropriate learning rate is hard to choose
Convergence is slow: the iterates only approach the optimum asymptotically
5. Introduction
Basic idea of Coordinate descent
Optimize over each coordinate (dimension) sequentially to decrease the objective, stopping when no single-coordinate move improves it: f(x^* + d \cdot e_i) \geq f(x^*)
Iterate until the result converges
Question: will it work?
6. Introduction
Preliminaries
What is a convex set? What properties does a convex set have?
Do you know how to compute derivatives in matrix algebra?
Is CD equivalent to SGD? Does coordinate-wise optimality imply a global minimum, i.e. [f(x^* + d \cdot e_i) \geq f(x^*) \;\forall i, d] \equiv [f(x^*) = \min_x f(x)]?
7. Introduction
How it works exactly
Suppose in the k-th iteration we optimize each coordinate in turn, using the result of the (k-1)-th iteration:
x_1^{(k)} := \arg\min_{x_1} f(x_1, x_2^{(k-1)}, x_3^{(k-1)}, \dots)
x_2^{(k)} := \arg\min_{x_2} f(x_1^{(k)}, x_2, x_3^{(k-1)}, \dots)
\dots
x_n^{(k)} := \arg\min_{x_n} f(x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, \dots, x_n)
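The sweep above can be sketched for a strictly convex quadratic f(x) = \frac{1}{2} x^T A x - b^T x, where each one-dimensional argmin has a closed form (for this objective the sweep reduces to Gauss-Seidel; the function name is illustrative):

```python
import numpy as np

def coordinate_descent_quadratic(A, b, n_sweeps=100):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive definite A
    by exact minimization along one coordinate at a time.
    Setting df/dx_i = 0 gives x_i = (b_i - sum_{j != i} A[i, j] x_j) / A[i, i]."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(n_sweeps):
        for i in range(n):
            # subtract the i-th term back out of the full inner product
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # SPD, so f is strictly convex
b = np.array([1.0, 2.0])
x = coordinate_descent_quadratic(A, b)
# the minimizer of f satisfies A x = b
```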
8. Case Study
Algorithms
Linear regression
Lasso
SVM (SMO and DCD, with Python implementation)
basic matrix factorization
WRMF
Factorization machine
9. Case Study
Preliminaries: matrix differentiation
Convention for the derivative of a vector with respect to a vector: keep the orientation of the denominator vector!
\frac{\partial y}{\partial x} =
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\quad
\frac{\partial y^T}{\partial x} =
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}
\frac{\partial Ax}{\partial x} = A^T, \quad [Ax]_i = \sum_j a_{i,j} x_j, \quad \left[ \frac{\partial Ax}{\partial x} \right]_{k,i} = \frac{\partial [Ax]_i}{\partial x_k} = a_{i,k}
\frac{\partial x^T A x}{\partial x} = Ax + A^T x, \quad x^T A x = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j, \quad \frac{\partial x^T A x}{\partial x_k} = \sum_j a_{k,j} x_j + \sum_i a_{i,k} x_i
A must be square but not necessarily symmetric
\frac{\partial x^T x}{\partial x} = \frac{\partial x^T I x}{\partial x} = 2x
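The identity \frac{\partial x^T A x}{\partial x} = Ax + A^T x is easy to spot-check numerically with central finite differences (a quick sanity check, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # square, deliberately not symmetric
x = rng.standard_normal(4)

analytic = A @ x + A.T @ x  # claimed gradient of x^T A x

# central finite differences along each coordinate direction e_k
eps = 1e-6
numeric = np.zeros(4)
for k in range(4):
    e = np.zeros(4)
    e[k] = eps
    numeric[k] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)
```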
10. Case Study
Linear regression
x^* = \arg\min_x f(x), \quad f(x) = \frac{1}{2} \|y - Ax\|^2
0 = \frac{\partial f(x)}{\partial x_i} = -A^T[i,:](y - Ax) = A^T[i,:](A[:,i] x_i + A[:,-i] x_{-i} - y)
Here A[:,-i] means all columns of A except column i, and x_{-i} the corresponding entries of x, so
x_i^* = \frac{A^T[i,:](y - A[:,-i] x_{-i})}{A^T[i,:] A[:,i]}
11. Case Study
Algorithm for CD in Linear regression
Question: why can't we directly solve for x_i inside the bracket? (Recall your abstract algebra, data structures, or algorithms course.)
Input: Design matrix A
Input: target variable for each training sample y
Output: coefficient for each linear variable
while cycle ≤ MaxCycle do
    for each coordinate i: x_i^* = \frac{A^T[i,:](y - A[:,-i] x_{-i})}{A^T[i,:] A[:,i]}
end
Algorithm 1: CD for linear regression
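A minimal NumPy sketch of Algorithm 1, assuming a fixed number of cycles; it maintains the residual r = y - Ax so that y - A[:,-i] x_{-i} need not be rebuilt at every update, and can be checked against np.linalg.lstsq:

```python
import numpy as np

def cd_linear_regression(A, y, max_cycle=200):
    """Coordinate descent for min_x 0.5 * ||y - A x||^2.
    Each update is the closed-form 1-D minimizer
        x_i = A[:, i]^T (y - A[:, -i] x_{-i}) / (A[:, i]^T A[:, i])."""
    n = A.shape[1]
    x = np.zeros(n)
    r = y.copy()                      # residual y - A x
    col_sq = (A ** 2).sum(axis=0)     # A[:, i]^T A[:, i] for each i
    for _ in range(max_cycle):
        for i in range(n):
            # y - A[:, -i] x_{-i} equals r + A[:, i] * x_i
            x_new = A[:, i] @ (r + A[:, i] * x[i]) / col_sq[i]
            r += A[:, i] * (x[i] - x_new)
            x[i] = x_new
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
x_cd = cd_linear_regression(A, y)
```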
12. Case Study
Simple coordinate descent for Collaborative filtering
Model: user latent feature U_i = [U_{i1}, U_{i2}, \dots, U_{in}], movie latent feature V_j = [V_{j1}, V_{j2}, \dots, V_{jn}]
Objective function: \|R - U^T V\|^2 + \lambda (\|U\|^2 + \|V\|^2)
Splitting out user i:
\|R[i,:] - U[:,i]^T V\|^2 + \|R[-i,:] - U[:,-i]^T V\|^2 + \lambda (\|U\|^2 + \|V\|^2)
\nabla_{U_i} = -2V (R[i,:] - U[:,i]^T V)^T + 2\lambda U[:,i] = 0
U[:,i] = (V V^T + \lambda I)^{-1} V R[i,:]^T
Splitting out item j:
\|R[:,j] - U^T V[:,j]\|^2 + \|R[:,-j] - U^T V[:,-j]\|^2 + \lambda (\|U\|^2 + \|V\|^2)
\nabla_{V_j} = -2U (R[:,j] - U^T V[:,j]) + 2\lambda V[:,j] = 0
U R[:,j] = (U U^T + \lambda I) V[:,j]
V[:,j] = (U U^T + \lambda I)^{-1} U R[:,j]
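The two closed-form updates above can be alternated directly. A minimal sketch (function name and test matrix are illustrative), solving for all user columns and all item columns at once rather than one column at a time:

```python
import numpy as np

def als_factorize(R, f=2, lam=0.1, n_iter=50, seed=0):
    """Alternate the closed-form updates
        U[:, i] = (V V^T + lam I)^{-1} V R[i, :]^T
        V[:, j] = (U U^T + lam I)^{-1} U R[:, j]
    U is f x n_users, V is f x n_items, so R is approximated by U^T V."""
    n_users, n_items = R.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((f, n_users))
    V = rng.standard_normal((f, n_items))
    I = lam * np.eye(f)
    for _ in range(n_iter):
        U = np.linalg.solve(V @ V.T + I, V @ R.T)  # all user columns at once
        V = np.linalg.solve(U @ U.T + I, U @ R)    # all item columns at once
    return U, V

# recover a rank-2 matrix (up to the small regularization bias)
rng = np.random.default_rng(2)
R = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
U, V = als_factorize(R, f=2, lam=1e-3)
```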
13. Case Study
WRMF coordinate descent algorithm: ALS
Objective function:
\min_{x^*, y^*} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)
Alternating Least Squares solution (one iteration):
x_u = (Y^T C^u Y + \lambda I_{f \times f})^{-1} Y^T C^u p(u)
y_i = (X^T C^i X + \lambda I_{f \times f})^{-1} X^T C^i p(i)
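A direct, loop-based sketch of these updates, assuming P holds the binary preferences p_{ui} and C the confidences c_{ui} (names are illustrative; practical implementations exploit the structure of C^u to avoid forming the diagonal matrix explicitly):

```python
import numpy as np

def wrmf_als(P, C, f=2, lam=0.1, n_iter=30, seed=0):
    """Alternate the WRMF updates
        x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p(u)
        y_i = (X^T C^i X + lam I)^{-1} X^T C^i p(i)
    Rows of X are user factors x_u, rows of Y are item factors y_i."""
    n_users, n_items = P.shape
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_users, f)) * 0.1
    Y = rng.standard_normal((n_items, f)) * 0.1
    I = lam * np.eye(f)
    for _ in range(n_iter):
        for u in range(n_users):
            Cu = np.diag(C[u])                  # confidence weights for user u
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_items):
            Ci = np.diag(C[:, i])               # confidence weights for item i
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

P = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
C = 1.0 + 10.0 * P        # higher confidence on observed interactions
X, Y = wrmf_als(P, C, f=2, lam=0.05)
scores = X @ Y.T          # predicted preferences, x_u^T y_i
```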