Coordinate descent method 
2013.11.21 
Sanghyuk Chun 
Much of the content is from: 
Large Scale Optimization, Lecture 5, by Caramanis & Sanghavi at UT Austin 
Optimization, Lecture 25, by Geoff Gordon and Ryan Tibshirani at CMU 
Convex Optimization, Lecture 20, by Suvrit Sra at UC Berkeley
Contents 
•Overview 
•Convergence Analysis 
•Examples 
Overview of Coordinate descent method 
•Idea 
•Recall: the unconstrained minimization problem 
•From Lecture 1, the formulation of an unconstrained optimization problem is 
•min f(x) 
•where f : R^n → R is convex and smooth 
•In this problem, the necessary and sufficient condition for an optimal solution x_0 is 
•∇f(x) = 0 at x = x_0 
•∇f(x) = (∂f/∂x_1) e_1 + ⋯ + (∂f/∂x_n) e_n = 0 
•Thus, at the optimum, ∂f/∂x_1 = ⋯ = ∂f/∂x_n = 0 
•What if we minimize along each basis direction separately?
Overview of Coordinate descent method 
•Description 
•Let e_1, e_2, …, e_n be the standard basis for the domain of f 
•Given x^(k), the i-th coordinate of x^(k+1) is 
•x_i^(k+1) ← argmin_{y ∈ R} f(x_1^(k+1), …, x_{i−1}^(k+1), y, x_{i+1}^(k), …, x_n^(k)) 
•x_i^(k+1) overwrites the value of x_i^(k) (in an actual implementation) 
•Algorithm 
•Initialize with a guess x = (x_1, x_2, …, x_n)^T 
•repeat 
 for j = 1, 2, …, n do 
  x_j ← argmin_{x_j} f(x) 
 end for 
until convergence
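The loop above can be sketched as follows. This is a minimal illustration, not code from the slides: it assumes a quadratic f(x) = ½ xᵀAx − bᵀx with A symmetric positive definite, for which each coordinate subproblem has a closed-form minimizer.

```python
import numpy as np

def coordinate_descent(A, b, x0, n_cycles=100):
    """Cyclic coordinate descent for f(x) = 0.5 x^T A x - b^T x.

    Setting df/dx_j = 0 with the other coordinates fixed gives
        A[j, j] * x_j = b[j] - sum_{i != j} A[j, i] * x_i
    """
    x = x0.astype(float)
    n = len(x)
    for _ in range(n_cycles):
        for j in range(n):  # one full cycle over coordinates
            r = b[j] - A[j] @ x + A[j, j] * x[j]  # excludes x_j's own term
            x[j] = r / A[j, j]                    # exact argmin over x_j
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = coordinate_descent(A, b, np.zeros(2))
# x approaches the exact minimizer, i.e. the solution of A x = b
```

For this quadratic, the inner update is exactly one Gauss–Seidel sweep of the linear system Ax = b, so convergence is guaranteed whenever A is symmetric positive definite.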
Overview of Coordinate descent method 
•Start with some initial guess x^(0), and repeat for k = 1, 2, 3, … 
•x_1^(k) ∈ argmin_{x_1} f(x_1, x_2^(k−1), x_3^(k−1), …, x_n^(k−1)) 
•x_2^(k) ∈ argmin_{x_2} f(x_1^(k), x_2, x_3^(k−1), …, x_n^(k−1)) 
•x_3^(k) ∈ argmin_{x_3} f(x_1^(k), x_2^(k), x_3, …, x_n^(k−1)) 
… 
•x_n^(k) ∈ argmin_{x_n} f(x_1^(k), x_2^(k), x_3^(k), …, x_n) 
•In each update, the iterate moves along a single coordinate basis direction 
•c.f. Gradient Descent Method 
•In each iteration (step), the iterate moves along the negative gradient direction −∇f = −((∂f/∂x_1) e_1 + ⋯ + (∂f/∂x_n) e_n)
Properties of Coordinate Descent 
•Note: 
•The order of cycling through coordinates is arbitrary; any permutation of {1, 2, …, n} can be used 
•Cyclic order: 1, 2, …, n, 1, …, repeat 
•Almost cyclic: each coordinate 1 ≤ i ≤ n is picked at least once every B successive iterations (B ≥ n) 
•Double sweep: 1, 2, …, n, n−1, …, 2, 1, repeat 
•Cyclic with permutation: a random order each cycle 
•Random sampling: pick a random index at each iteration 
•Individual coordinates can everywhere be replaced with blocks of coordinates (Block Coordinate Descent Method) 
•The “one-at-a-time” update scheme is critical; the “all-at-once” scheme does not necessarily converge
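The last point can be seen on a toy function of my own choosing (not from the lecture notes): for the convex f(x_1, x_2) = (x_1 + x_2)^2, the exact coordinate minimizer with the other variable fixed is x_1 = −x_2 (and symmetrically x_2 = −x_1). Updating one coordinate at a time converges after a single cycle, while updating both simultaneously from stale values just swaps signs and oscillates forever:

```python
def coord_min(other):
    # argmin over one variable of (x1 + x2)^2 with the other variable fixed
    return -other

# One-at-a-time (Gauss-Seidel style): reaches the minimum value 0
x1, x2 = 1.0, 1.0
for _ in range(5):
    x1 = coord_min(x2)  # uses the fresh x2
    x2 = coord_min(x1)  # uses the fresh x1
one_at_a_time_value = (x1 + x2) ** 2  # 0.0

# All-at-once (Jacobi style): bounces between (1, 1) and (-1, -1)
y1, y2 = 1.0, 1.0
for _ in range(5):
    y1, y2 = coord_min(y2), coord_min(y1)  # both use stale values
all_at_once_value = (y1 + y2) ** 2  # stays at 4.0
```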
Properties of Coordinate Descent 
•Advantages 
•A parallel algorithm is possible 
•No step-size tuning 
•Each iteration is usually cheap (a single-variable optimization) 
•No extra storage vectors needed 
•No other pesky parameters (usually) that must be tuned 
•Works well for large-scale problems 
•Very useful when the actual gradient of f is not known 
•Easy to implement 
•Disadvantages 
•Tricky if the single-variable optimization is hard 
•Convergence theory can be complicated 
•Can be slower near the optimum than more sophisticated methods 
•The non-smooth case is trickier
Convergence of Coordinate descent 
•Recall: x_i^(k+1) ← argmin_{y ∈ R} f(x_1^(k+1), …, x_{i−1}^(k+1), y, x_{i+1}^(k), …, x_n^(k)) 
•Thus, one begins with an initial guess x^(0) for a local minimum of f and iteratively obtains a sequence x^(0), x^(1), x^(2), … 
•By doing an exact line search in each iteration, we automatically have 
•f(x^(0)) ≥ f(x^(1)) ≥ f(x^(2)) ≥ ⋯ 
•It can be shown that this sequence has convergence properties similar to those of steepest descent 
•No improvement after one cycle of line searches along the coordinate directions implies that a stationary point has been reached
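The monotone decrease is easy to check numerically. The sketch below uses an assumed quadratic f(x) = ½ xᵀAx − bᵀx (my own setup, not from the slides) and records the objective after every single-coordinate exact line search:

```python
import numpy as np

def f(x, A, b):
    return 0.5 * x @ A @ x - b @ x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.array([5.0, -5.0])

values = [f(x, A, b)]
for _ in range(10):            # 10 full cycles over the coordinates
    for j in range(len(x)):    # exact minimization along e_j
        x[j] = (b[j] - A[j] @ x + A[j, j] * x[j]) / A[j, j]
        values.append(f(x, A, b))

# each single-coordinate line search can only decrease (or keep) f
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
```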
Convergence Analysis 
•For continuously differentiable cost functions, coordinate descent can be shown to generate sequences whose limit points are stationary points 
•Lemma 5.4 
•Proof 
•See the Caramanis lecture notes 
•Idea: show that lim_{j→∞} (x^(k_j + 1) − x^(k_j)) = 0 using lim_{j→∞} (z_i^(k_j) − x^(k_j)) = 0, where z_i^(k) = (x_1^(k+1), …, x_i^(k+1), x_{i+1}^(k), …, x_n^(k))
Convergence Analysis 
•Question 
•Given a convex, differentiable f : R^n → R, if we are at a point x such that f(x) is minimized along each coordinate axis, have we found a global minimizer? 
•i.e., does f(x + d·e_i) ≥ f(x) for all d, i imply f(x) = min_z f(z)? 
•Here, e_i = (0, …, 1, …, 0) ∈ R^n is the i-th standard basis vector 
•Answer 
•Yes 
•Proof 
•∇f(x) = (∂f/∂x_1) e_1 + ⋯ + (∂f/∂x_n) e_n = 0, so by convexity f(z) ≥ f(x) + ∇f(x)^T (z − x) = f(x) for all z
Convergence Analysis 
•Question 
•Same question, but f is non-differentiable? 
•Answer 
•No 
•Proof: counterexample (a point can be minimal along every coordinate axis without being a global minimizer) 
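One concrete counterexample (my own choice; the lecture slide shows a contour plot instead): f(x_1, x_2) = |x_1 + x_2| + 0.1·|x_1 − x_2| is convex, and the point (1, −1) cannot be improved by moving along either coordinate axis, with f(1, −1) = 0.2, yet the global minimum is f(0, 0) = 0.

```python
import numpy as np

def f(x1, x2):
    # convex, but non-differentiable on the lines x1 + x2 = 0 and x1 - x2 = 0
    return abs(x1 + x2) + 0.1 * abs(x1 - x2)

# At (1, -1): scan each axis on a grid; no single-coordinate move improves f
ts = np.linspace(-3.0, 3.0, 2001)
best_along_x1 = min(f(t, -1.0) for t in ts)
best_along_x2 = min(f(1.0, t) for t in ts)
assert best_along_x1 >= f(1.0, -1.0)
assert best_along_x2 >= f(1.0, -1.0)

# ...yet (1, -1) is not a global minimizer:
print(f(1.0, -1.0))  # 0.2
print(f(0.0, 0.0))   # 0.0
```

Intuitively, from (1, −1) any axis-aligned step increases |x_1 + x_2| faster than it decreases 0.1·|x_1 − x_2|, so coordinate descent is stuck even though a diagonal move toward the origin would help.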
Convergence Analysis 
•Question 
•Same again, but now f(x) = g(x) + Σ_{i=1}^n h_i(x_i) 
•where g is convex and differentiable and each h_i is convex 
•Here, the non-smooth part is called separable 
•Answer 
•Yes 
•Proof: for any 푦 
•f(y) − f(x) ≥ ∇g(x)^T (y − x) + Σ_{i=1}^n [h_i(y_i) − h_i(x_i)] = Σ_{i=1}^n [∇_i g(x)(y_i − x_i) + h_i(y_i) − h_i(x_i)] ≥ 0 
•Each bracketed term is ≥ 0, because x_i minimizes the i-th coordinate subproblem
Example 
•Example MATLAB code 
•Reuses source code from http://www.mathworks.com/matlabcentral/fileexchange/35535-simplified-gradient-descent-optimization
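The MATLAB source itself is not reproduced here. As a stand-in, here is a coordinate descent sketch for the lasso, min_x ½‖y − Ax‖² + λ‖x‖₁, a standard application of the separable result above: g is the smooth least-squares term and each h_i(x_i) = λ|x_i|, so each coordinate update is a closed-form soft-thresholding step. All names are my own.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of t * |.|
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(A, y, lam, n_cycles=200):
    """Cyclic coordinate descent for 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    n = A.shape[1]
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)           # ||a_j||^2 for each column
    for _ in range(n_cycles):
        for j in range(n):
            r = y - A @ x + A[:, j] * x[j]  # residual with x_j's term removed
            x[j] = soft_threshold(A[:, j] @ r, lam) / col_sq[j]
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -3.0, 1.5]
y = A @ x_true
x_hat = lasso_cd(A, y, lam=0.1)   # recovers x_true up to small shrinkage bias
```

The per-coordinate minimizer follows from setting the subgradient to zero: x_j = S(a_jᵀ r, λ) / ‖a_j‖², where S is the soft-thresholding operator; this is the same update used by standard lasso solvers.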
END OF DOCUMENT

More Related Content

What's hot

07 approximate inference in bn
07 approximate inference in bn07 approximate inference in bn
Naive bayes
Naive bayesNaive bayes
Naive bayes
Ashraf Uddin
 
Gram-Schmidt and QR Decomposition (Factorization) of Matrices
Gram-Schmidt and QR Decomposition (Factorization) of MatricesGram-Schmidt and QR Decomposition (Factorization) of Matrices
Gram-Schmidt and QR Decomposition (Factorization) of Matrices
Isaac Yowetu
 
Probabilistic Reasoning
Probabilistic ReasoningProbabilistic Reasoning
Probabilistic Reasoning
Junya Tanaka
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Marina Santini
 
1. Linear Algebra for Machine Learning: Linear Systems
1. Linear Algebra for Machine Learning: Linear Systems1. Linear Algebra for Machine Learning: Linear Systems
1. Linear Algebra for Machine Learning: Linear Systems
Ceni Babaoglu, PhD
 
2 discrete markov chain
2 discrete markov chain2 discrete markov chain
2 discrete markov chain
Windie Chan
 
Romberg's Integration
Romberg's IntegrationRomberg's Integration
Romberg's Integration
VARUN KUMAR
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
CloudxLab
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
Krish_ver2
 
Lecture 9 Markov decision process
Lecture 9 Markov decision processLecture 9 Markov decision process
Lecture 9 Markov decision process
VARUN KUMAR
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using Weka
Macha Pujitha
 
Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Statistical Pattern recognition(1)
Statistical Pattern recognition(1)
Syed Atif Naseem
 
Eigenvalue eigenvector slides
Eigenvalue eigenvector slidesEigenvalue eigenvector slides
Eigenvalue eigenvector slides
AmanSaeed11
 
2.8 accuracy and ensemble methods
2.8 accuracy and ensemble methods2.8 accuracy and ensemble methods
2.8 accuracy and ensemble methods
Krish_ver2
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
Sanghyuk Chun
 
Matrices and System of Linear Equations ppt
Matrices and System of Linear Equations pptMatrices and System of Linear Equations ppt
Matrices and System of Linear Equations ppt
Drazzer_Dhruv
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Edureka!
 
Linear models for data science
Linear models for data scienceLinear models for data science
Linear models for data science
Brad Klingenberg
 
Valutazione della Qualità dei Servizi - The Space Cinema
Valutazione della Qualità dei Servizi - The Space CinemaValutazione della Qualità dei Servizi - The Space Cinema
Valutazione della Qualità dei Servizi - The Space Cinemamacdario
 

What's hot (20)

07 approximate inference in bn
07 approximate inference in bn07 approximate inference in bn
07 approximate inference in bn
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Gram-Schmidt and QR Decomposition (Factorization) of Matrices
Gram-Schmidt and QR Decomposition (Factorization) of MatricesGram-Schmidt and QR Decomposition (Factorization) of Matrices
Gram-Schmidt and QR Decomposition (Factorization) of Matrices
 
Probabilistic Reasoning
Probabilistic ReasoningProbabilistic Reasoning
Probabilistic Reasoning
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
1. Linear Algebra for Machine Learning: Linear Systems
1. Linear Algebra for Machine Learning: Linear Systems1. Linear Algebra for Machine Learning: Linear Systems
1. Linear Algebra for Machine Learning: Linear Systems
 
2 discrete markov chain
2 discrete markov chain2 discrete markov chain
2 discrete markov chain
 
Romberg's Integration
Romberg's IntegrationRomberg's Integration
Romberg's Integration
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
Lecture 9 Markov decision process
Lecture 9 Markov decision processLecture 9 Markov decision process
Lecture 9 Markov decision process
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using Weka
 
Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Statistical Pattern recognition(1)
Statistical Pattern recognition(1)
 
Eigenvalue eigenvector slides
Eigenvalue eigenvector slidesEigenvalue eigenvector slides
Eigenvalue eigenvector slides
 
2.8 accuracy and ensemble methods
2.8 accuracy and ensemble methods2.8 accuracy and ensemble methods
2.8 accuracy and ensemble methods
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Matrices and System of Linear Equations ppt
Matrices and System of Linear Equations pptMatrices and System of Linear Equations ppt
Matrices and System of Linear Equations ppt
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
 
Linear models for data science
Linear models for data scienceLinear models for data science
Linear models for data science
 
Valutazione della Qualità dei Servizi - The Space Cinema
Valutazione della Qualità dei Servizi - The Space CinemaValutazione della Qualità dei Servizi - The Space Cinema
Valutazione della Qualità dei Servizi - The Space Cinema
 

Viewers also liked

Introduction to E-book
Introduction to E-bookIntroduction to E-book
Introduction to E-book
Sanghyuk Chun
 
Markov Chain Basic
Markov Chain BasicMarkov Chain Basic
Markov Chain Basic
Sanghyuk Chun
 
Proms' portpolio ppt(~2014.2)
Proms' portpolio ppt(~2014.2)Proms' portpolio ppt(~2014.2)
Proms' portpolio ppt(~2014.2)
함 건수
 
알서포트 Rsupport
알서포트 Rsupport알서포트 Rsupport
알서포트 Rsupport
tistrue
 
Presentation skill up 프레젠테이션 기획과 표현
Presentation skill up 프레젠테이션 기획과 표현Presentation skill up 프레젠테이션 기획과 표현
Presentation skill up 프레젠테이션 기획과 표현
권수 김
 
Lesser tat presentation_111004
Lesser tat presentation_111004Lesser tat presentation_111004
Lesser tat presentation_111004
WONSEOK YI
 
Kwon portpolio
Kwon portpolioKwon portpolio
Kwon portpolio
Taek Hun Kwon
 
영화와 함께하는 ICT 기술-창원대학교 과학영재교육원
영화와 함께하는 ICT 기술-창원대학교 과학영재교육원영화와 함께하는 ICT 기술-창원대학교 과학영재교육원
영화와 함께하는 ICT 기술-창원대학교 과학영재교육원
Changwon National University
 
Portpolio
PortpolioPortpolio
Portpoliochakm
 
소프트웨어 테스팅
소프트웨어 테스팅소프트웨어 테스팅
소프트웨어 테스팅영기 김
 
이민의 포트폴리오
이민의 포트폴리오이민의 포트폴리오
이민의 포트폴리오Min Lee
 
K-means and GMM
K-means and GMMK-means and GMM
K-means and GMM
Sanghyuk Chun
 
[2015-11월 정기 세미나] Open stack tokyo_summit_후기
[2015-11월 정기 세미나] Open stack tokyo_summit_후기[2015-11월 정기 세미나] Open stack tokyo_summit_후기
[2015-11월 정기 세미나] Open stack tokyo_summit_후기
OpenStack Korea Community
 

Viewers also liked (14)

Introduction to E-book
Introduction to E-bookIntroduction to E-book
Introduction to E-book
 
Markov Chain Basic
Markov Chain BasicMarkov Chain Basic
Markov Chain Basic
 
Proms' portpolio ppt(~2014.2)
Proms' portpolio ppt(~2014.2)Proms' portpolio ppt(~2014.2)
Proms' portpolio ppt(~2014.2)
 
1213 j wise sns
1213 j wise sns1213 j wise sns
1213 j wise sns
 
알서포트 Rsupport
알서포트 Rsupport알서포트 Rsupport
알서포트 Rsupport
 
Presentation skill up 프레젠테이션 기획과 표현
Presentation skill up 프레젠테이션 기획과 표현Presentation skill up 프레젠테이션 기획과 표현
Presentation skill up 프레젠테이션 기획과 표현
 
Lesser tat presentation_111004
Lesser tat presentation_111004Lesser tat presentation_111004
Lesser tat presentation_111004
 
Kwon portpolio
Kwon portpolioKwon portpolio
Kwon portpolio
 
영화와 함께하는 ICT 기술-창원대학교 과학영재교육원
영화와 함께하는 ICT 기술-창원대학교 과학영재교육원영화와 함께하는 ICT 기술-창원대학교 과학영재교육원
영화와 함께하는 ICT 기술-창원대학교 과학영재교육원
 
Portpolio
PortpolioPortpolio
Portpolio
 
소프트웨어 테스팅
소프트웨어 테스팅소프트웨어 테스팅
소프트웨어 테스팅
 
이민의 포트폴리오
이민의 포트폴리오이민의 포트폴리오
이민의 포트폴리오
 
K-means and GMM
K-means and GMMK-means and GMM
K-means and GMM
 
[2015-11월 정기 세미나] Open stack tokyo_summit_후기
[2015-11월 정기 세미나] Open stack tokyo_summit_후기[2015-11월 정기 세미나] Open stack tokyo_summit_후기
[2015-11월 정기 세미나] Open stack tokyo_summit_후기
 

Similar to Coordinate Descent method

13Kernel_Machines.pptx
13Kernel_Machines.pptx13Kernel_Machines.pptx
13Kernel_Machines.pptx
KarasuLee
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdf
sagayalavanya2
 
super vector machines algorithms using deep
super vector machines algorithms using deepsuper vector machines algorithms using deep
super vector machines algorithms using deep
KNaveenKumarECE
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
ChenYiHuang5
 
Lecture_3_Gradient_Descent.pptx
Lecture_3_Gradient_Descent.pptxLecture_3_Gradient_Descent.pptx
Lecture_3_Gradient_Descent.pptx
gnans Kgnanshek
 
Support Vector Machines is the the the the the the the the the
Support Vector Machines is the the the the the the the the theSupport Vector Machines is the the the the the the the the the
Support Vector Machines is the the the the the the the the the
sanjaibalajeessn
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
Sungchul Kim
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Maninda Edirisooriya
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
Jia-Bin Huang
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
Sungchul Kim
 
Optimum engineering design - Day 5. Clasical optimization methods
Optimum engineering design - Day 5. Clasical optimization methodsOptimum engineering design - Day 5. Clasical optimization methods
Optimum engineering design - Day 5. Clasical optimization methods
SantiagoGarridoBulln
 
Optim_methods.pdf
Optim_methods.pdfOptim_methods.pdf
Optim_methods.pdf
SantiagoGarridoBulln
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
Fares Al-Qunaieer
 
cos323_s06_lecture03_optimization.ppt
cos323_s06_lecture03_optimization.pptcos323_s06_lecture03_optimization.ppt
cos323_s06_lecture03_optimization.ppt
devesh604174
 
Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!
ChenYiHuang5
 
Undecidable Problems and Approximation Algorithms
Undecidable Problems and Approximation AlgorithmsUndecidable Problems and Approximation Algorithms
Undecidable Problems and Approximation Algorithms
Muthu Vinayagam
 
ICANN19: Model-Agnostic Explanations for Decisions using Minimal Pattern
ICANN19: Model-Agnostic Explanations for Decisions using Minimal PatternICANN19: Model-Agnostic Explanations for Decisions using Minimal Pattern
ICANN19: Model-Agnostic Explanations for Decisions using Minimal Pattern
Kohei Asano
 
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural NetworksPaper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
ChenYiHuang5
 
Sudoku
SudokuSudoku
Sudoku
Yara Ali
 
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
sleepy_yoshi
 

Similar to Coordinate Descent method (20)

13Kernel_Machines.pptx
13Kernel_Machines.pptx13Kernel_Machines.pptx
13Kernel_Machines.pptx
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdf
 
super vector machines algorithms using deep
super vector machines algorithms using deepsuper vector machines algorithms using deep
super vector machines algorithms using deep
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Lecture_3_Gradient_Descent.pptx
Lecture_3_Gradient_Descent.pptxLecture_3_Gradient_Descent.pptx
Lecture_3_Gradient_Descent.pptx
 
Support Vector Machines is the the the the the the the the the
Support Vector Machines is the the the the the the the the theSupport Vector Machines is the the the the the the the the the
Support Vector Machines is the the the the the the the the the
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
Optimum engineering design - Day 5. Clasical optimization methods
Optimum engineering design - Day 5. Clasical optimization methodsOptimum engineering design - Day 5. Clasical optimization methods
Optimum engineering design - Day 5. Clasical optimization methods
 
Optim_methods.pdf
Optim_methods.pdfOptim_methods.pdf
Optim_methods.pdf
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
 
cos323_s06_lecture03_optimization.ppt
cos323_s06_lecture03_optimization.pptcos323_s06_lecture03_optimization.ppt
cos323_s06_lecture03_optimization.ppt
 
Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!
 
Undecidable Problems and Approximation Algorithms
Undecidable Problems and Approximation AlgorithmsUndecidable Problems and Approximation Algorithms
Undecidable Problems and Approximation Algorithms
 
ICANN19: Model-Agnostic Explanations for Decisions using Minimal Pattern
ICANN19: Model-Agnostic Explanations for Decisions using Minimal PatternICANN19: Model-Agnostic Explanations for Decisions using Minimal Pattern
ICANN19: Model-Agnostic Explanations for Decisions using Minimal Pattern
 
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural NetworksPaper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
 
Sudoku
SudokuSudoku
Sudoku
 
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
 

Recently uploaded

Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 

Recently uploaded (20)

Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 

Coordinate Descent method

  • 1. Coordinate descent method 2013.11.21 SanghyukChun Many contents are from Large Scale Optimization Lecture 5 by Caramanis& Sanghaviin Texas Austin Optimization Lecture 25 by Geoff Gordon and Ryan Tibshiraniin CMU Convex Optimization Lecture 20 by SuvritSrain UC Berkeley 1
  • 2. Contents •Overview •Convergence Analysis •Examples 2
  • 3. Overview of Coordinate descent method •Idea •Recall: unconstrained minimization problem •From Lecture 1, the formation of an unconstrained optimization problem is as follows •min푓푥 •Where 푓:푅푛→푅is convex and smooth •In this problem, the necessary and sufficient condition for optimal solution x0 is •훻푓푥=0푎푡푥=푥0 •훻푓푥= 휕푓 휕푥1 풆ퟏ+⋯+ 휕푓 휕푥푛 풆풏=0 •Thus, in this situation, 휕푓 휕푥1=⋯= 휕푓 휕푥푛 =0 •What if minimize for each basis respectively? 3
  • 4. Overview of Coordinate descent method
    • Description
    • Let e1, e2, …, en be a basis for the domain of f
    • Given x^(k), the i-th coordinate of x^(k+1) is
      x_i^(k+1) ← argmin_{y ∈ R} f(x_1^(k+1), …, x_{i−1}^(k+1), y, x_{i+1}^(k), …, x_n^(k))
    • x_i^(k+1) overwrites the value in x_i^(k) (in an actual implementation)
    • Algorithm
      Initialize with a guess x = (x1, x2, …, xn)^T
      repeat
        for all j in 1, 2, …, n do
          x_j ← argmin_{x_j} f(x)
        end for
      until convergence
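The algorithm on this slide can be sketched for a convex quadratic f(x) = ½xᵀAx − bᵀx, where each one-dimensional argmin has a closed form. This is my own minimal illustration (function name and problem instance are not from the slides):

```python
import numpy as np

def coordinate_descent_quadratic(A, b, x0, n_cycles=100):
    """Cyclic coordinate descent for f(x) = 0.5 x^T A x - b^T x, A symmetric PD.

    Minimizing f over coordinate j with the other coordinates fixed gives
    the closed-form update x_j = (b_j - sum_{i != j} A_{ji} x_i) / A_{jj},
    which overwrites x_j in place, as described on the slide.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_cycles):           # repeat ... until convergence
        for j in range(len(x)):         # for all j in 1, 2, ..., n
            r = b[j] - A[j] @ x + A[j, j] * x[j]  # exclude the x_j term itself
            x[j] = r / A[j, j]          # exact 1-D argmin along coordinate j
    return x
```

For a positive definite A this is exactly the Gauss–Seidel iteration for Ax = b, which is one way to see why the method converges on this problem class.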
  • 5. Overview of Coordinate descent method
    • Start with some initial guess x^(0), and repeat for k = 1, 2, 3, …
      x_1^(k) ∈ argmin_{x_1} f(x_1, x_2^(k−1), x_3^(k−1), …, x_n^(k−1))
      x_2^(k) ∈ argmin_{x_2} f(x_1^(k), x_2, x_3^(k−1), …, x_n^(k−1))
      x_3^(k) ∈ argmin_{x_3} f(x_1^(k), x_2^(k), x_3, …, x_n^(k−1))
      …
      x_n^(k) ∈ argmin_{x_n} f(x_1^(k), x_2^(k), x_3^(k), …, x_n)
    • Each iteration moves along a single coordinate basis direction
    • c.f. Gradient Descent Method: each iteration (step) moves along the full gradient direction ∇f = (∂f/∂x1) e1 + ⋯ + (∂f/∂xn) en
  • 6. Properties of Coordinate Descent
    • Note: the order of cycling through the coordinates is arbitrary; any permutation of {1, 2, …, n} can be used
    • Cyclic order: 1, 2, …, n, 1, …, repeat
    • Almost cyclic: each coordinate 1 ≤ i ≤ n is picked at least once in every B successive iterations (B > n)
    • Double sweep: 1, 2, …, n, n−1, …, 2, 1, repeat
    • Cyclic with permutation: a fresh random order each cycle
    • Random sampling: pick a random index at each iteration
    • Individual coordinates can everywhere be replaced with blocks of coordinates (Block Coordinate Descent Method)
    • The "one-at-a-time" update scheme is critical; an "all-at-once" scheme does not necessarily converge
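The orderings listed above can be sketched as index generators (0-based; the helper names are my own illustration, not from the slides):

```python
import itertools
import random

def cyclic(n):
    """Cyclic order: 0, 1, ..., n-1, 0, 1, ..., repeating forever."""
    return itertools.cycle(range(n))

def double_sweep(n):
    """Forward sweep then backward sweep: 0, ..., n-1, n-1, ..., 0, repeat."""
    pattern = list(range(n)) + list(range(n - 1, -1, -1))
    return itertools.cycle(pattern)

def cyclic_with_permutation(n, seed=0):
    """A fresh uniformly random order for each full cycle."""
    rng = random.Random(seed)
    while True:
        order = list(range(n))
        rng.shuffle(order)
        yield from order

def random_sampling(n, seed=0):
    """An independent uniformly random index at every single iteration."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(n)
```

Any of these generators can drive the inner loop of the coordinate descent algorithm, replacing the fixed `for j in 1, …, n` sweep.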
  • 7. Properties of Coordinate Descent
    • Advantages
    • A parallel algorithm is possible
    • No step size tuning
    • Each iteration is usually cheap (single-variable optimization)
    • No extra storage vectors needed
    • No other pesky parameters (usually) that must be tuned
    • Works well for large-scale problems
    • Very useful when the actual gradient of f is not known
    • Easy to implement
    • Disadvantages
    • Tricky if the single-variable optimization is hard
    • Convergence theory can be complicated
    • Can be slower near the optimum than more sophisticated methods
    • The non-smooth case is trickier
  • 8. Convergence of Coordinate descent
    • Recall: x_i^(k+1) ← argmin_{y ∈ R} f(x_1^(k+1), …, x_{i−1}^(k+1), y, x_{i+1}^(k), …, x_n^(k))
    • Thus, one begins with an initial x^(0) for a local minimum of F, and iteratively gets a sequence x^(0), x^(1), x^(2), …
    • By doing a line search in each iteration, we automatically have
      F(x^(0)) ≥ F(x^(1)) ≥ F(x^(2)) ≥ ⋯
    • It can be shown that this sequence has similar convergence properties as steepest descent
    • No improvement after one cycle of line searches along the coordinate directions implies a stationary point has been reached
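The monotone-decrease property F(x^(0)) ≥ F(x^(1)) ≥ ⋯ can be checked numerically on a small coupled convex quadratic (a sketch; the test function and its per-coordinate closed-form minimizers are my own illustration):

```python
import numpy as np

def f(x):
    """A smooth convex quadratic with coupled coordinates."""
    return x[0] ** 2 + x[1] ** 2 + x[0] * x[1] - x[1]

x = np.array([3.0, -4.0])
values = [f(x)]
for _ in range(20):                # full cycles of exact coordinate line search
    x[0] = -x[1] / 2.0             # argmin over x1: set 2*x1 + x2 = 0
    values.append(f(x))
    x[1] = (1.0 - x[0]) / 2.0      # argmin over x2: set 2*x2 + x1 - 1 = 0
    values.append(f(x))

# Each exact 1-D line search can only decrease (or keep) the objective value.
assert all(a >= b - 1e-12 for a, b in zip(values, values[1:]))
```

The iterates also converge to the stationary point (−1/3, 2/3), consistent with the slide's claim that a full cycle with no improvement means a stationary point has been reached.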
  • 9. Convergence Analysis
    • For continuously differentiable cost functions, coordinate descent can be shown to generate sequences whose limit points are stationary
    • Lemma 5.4
    • Proof: in the Caramanis lecture note
    • Idea: show that lim_{j→∞} (x^(k_j + 1) − x^(k_j)) = 0 using lim_{j→∞} (z_i^(k_j) − x^(k_j)) = 0, where z_i^(k) = (x_1^(k+1), …, x_i^(k+1), x_{i+1}^(k), …, x_n^(k))
  • 10. Convergence Analysis
    • Question
    • Given a convex, differentiable f: R^n → R, if we are at a point x such that f(x) is minimized along each coordinate axis, have we found a global minimizer?
    • i.e., does f(x + d·e_i) ≥ f(x) for all d, i imply f(x) = min_z f(z)?
    • Here, e_i = (0, …, 1, …, 0) ∈ R^n is the i-th standard basis vector
    • Answer: Yes
    • Proof: ∇f(x) = (∂f/∂x1) e1 + ⋯ + (∂f/∂xn) en = 0
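The one-line proof on this slide can be spelled out: coordinate-wise optimality kills each partial derivative, and convexity upgrades a zero gradient to global optimality.

```latex
% Coordinate-wise optimality means each 1-D function t -> f(x + t e_i)
% is minimized at t = 0, so every partial derivative vanishes:
\frac{\partial f}{\partial x_i}(x) = 0 \quad (i = 1, \dots, n)
\;\Longrightarrow\; \nabla f(x) = 0.
% By convexity, the first-order condition then gives, for every z,
f(z) \;\ge\; f(x) + \nabla f(x)^\top (z - x) \;=\; f(x),
% so x is a global minimizer.
```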
  • 11. Convergence Analysis
    • Question: the same question, but f is non-differentiable?
    • Answer: No
    • Proof: counterexample
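A concrete instance of such a counterexample (this particular f is my own illustration, not necessarily the one drawn on the original slide): f(x1, x2) = |x1 − x2| + 0.1·(x1 + x2) is convex and minimized coordinate-wise at the origin, yet it is unbounded below along the diagonal, so coordinate descent stalls at a non-optimal point.

```python
def f(x1, x2):
    """Convex but non-differentiable: |x1 - x2| + 0.1 * (x1 + x2)."""
    return abs(x1 - x2) + 0.1 * (x1 + x2)

# The origin minimizes f along each coordinate axis separately:
# f(t, 0) = |t| + 0.1*t and f(0, t) = |t| + 0.1*t are both minimized at t = 0.
ts = [i / 10.0 for i in range(-50, 51)]
assert all(f(t, 0.0) >= f(0.0, 0.0) for t in ts)
assert all(f(0.0, t) >= f(0.0, 0.0) for t in ts)

# ...yet f(t, t) = 0.2*t -> -inf as t -> -inf, so (0, 0) is not a global
# minimum; coordinate descent cannot escape it, since escaping requires
# moving both coordinates simultaneously.
assert f(-100.0, -100.0) < f(0.0, 0.0)
```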
  • 12. Convergence Analysis
    • Question: the same again, but now f(x) = g(x) + Σ_{i=1}^n h_i(x_i), where g is convex and differentiable and each h_i is convex?
    • Here, the non-smooth part is called separable
    • Answer: Yes
    • Proof: for any y,
      f(y) − f(x) ≥ ∇g(x)^T (y − x) + Σ_{i=1}^n [h_i(y_i) − h_i(x_i)]
                 = Σ_{i=1}^n [∇_i g(x)(y_i − x_i) + h_i(y_i) − h_i(x_i)] ≥ 0,
      since each bracketed term is ≥ 0 by coordinate-wise optimality
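This separable case is exactly why coordinate descent works for the lasso, with g(β) = ½‖y − Xβ‖² and h_i(β_i) = λ|β_i|: each coordinate update has a closed-form soft-threshold solution. A minimal sketch (function names are my own, not from the slides):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_cycles=200):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam * sum_i |b_i|."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)       # ||X_j||^2 for each column j
    for _ in range(n_cycles):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual excluding b_j
            # Exact minimizer over b_j: soft-threshold the correlation.
            b[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return b
```

When X is orthonormal this reduces to soft-thresholding y directly, which gives an easy sanity check of the update.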
  • 13. Example
    • Example MATLAB code
    • Reuses source code from http://www.mathworks.com/matlabcentral/fileexchange/35535-simplified-gradient-descent-optimization
  • 14. END OF DOCUMENT