SlideShare a Scribd company logo
Alternating Minimizations Converge to
Second-order Optimal Solutions
Qiuwei Li1
1 Colorado School of Mines
2 Johns Hopkins University
Joint work with Zhihui Zhu2 and Gongguo Tang1
1
Why is Alternating Minimization so popular?
⋆
v1( j)
u2(i)
v2( j)
u3(i)
v3( j)
min
X,Y
∥XY⊤
− M⋆
∥2
Ω
Many optimization problems have variables with natural partitions
Nonnegative MF Matrix sensing/completion
Tensor decomposition
Dictionary learning
Games
Blind deconvolution ……
EM algorithm
minimize
x,y
f(x, y)
2
Why is Alternating Minimization so popular?
yk = argminy f(xk−1, y)
xk = argminx f(x, yk)
Our Approach
Provide the 2nd-order convergence to partially solve the issue of “no
global optimality guarantee”.
✤Simple to implement : No
stepsize tuning
✤Good empirical performance
Advantages Disadvantages
❖No global optimality guarantee
for general problems
❖Only exists 1st-order
convergence
2
Why is Alternating Minimization so popular?
yk = argminy f(xk−1, y)
xk = argminx f(x, yk)
Theorem 1
Assume f is strongly bi-convex with a full-rank cross Hessian at all
strict saddles. Then AltMin almost surely converges to a 2nd-order
stationary point from random initialization.
Disadvantages
❖No global optimality guarantee
for general problems
❖Only exists 1st-order
convergence
✤Simple to implement : No
stepsize tuning
✤Good empirical performance
Advantages
3
Why second-order convergence is enough?
All local minima are globally optimal
No spurious local minima
[1] Jain et al. Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot
[2] Bhojanapalli et al. Global Optimality of Local Search for Low Rank Matrix Recovery
[3] Ge et al. Matrix Completion Has No Spurious Local Minimum
[4] Sun et al. Complete Dictionary Recovery over The Sphere
[5] Zhang et al. On the Global Geometry of Sphere-Constrained Sparse Blind Deconvolution
[6] Ge et al. Online Stochastic Gradient for Tensor Decomposition
All saddles are strict
Negative curvature
2nd-order optimal solution = globally optimal solution
Matrix factorization [1] Matrix sensing [2]
Tensor decomposition [6]
Dictionary learning [4]
Matrix completion [3]
Blind deconvolution [5]
3
Why second-order convergence is enough?
1st-order convergence + avoid strict saddles = 2nd-order convergence
It suffices to show alternating minimization avoids strict saddles!
All local minima are globally optimal
No spurious local minima All saddles are strict
Negative curvature
2nd-order optimal solution = globally optimal solution
Matrix factorization [1] Matrix sensing [2]
Tensor decomposition [6]
Dictionary learning [4]
Matrix completion [3]
Blind deconvolution [5]
4
How to show avoiding strict saddles?
A Key Result
Lee et al [1,2] use Stable Manifold Theorem [3] to
show that iterations defined by a global diffeom
avoids unstable fixed points.
An Improved Version (Zero-Property Theorem [4] + Max-Rank Theorem [5])
This work relaxes the global diffeom condition to show that a local
diffeom (at all unstable fixed points) can avoid unstable fixed points.
[1] Lee et al. Gradient Descent Converges to Minimizers.
[2] Lee et al. First-order Methods Almost Always Avoid Saddle Points
[3] Shub. Global Stability of Dynamical Systems
[4] Ponomarev et al. Submersions and Preimages of Sets of Measure Zero
[5] Bamber and Van. How Many Parameters Can A Model Have and still Be Testable
General Recipe
(1)Construct algorithm mapping g and show it is a local diffeom (i.e.,
Show Dg is nonsingular);
(2)Show all strict saddles of f are unstable fixed points of g;
A Proof Sketch
{
yk = ϕ(xk−1) = argminy f(xk−1, y)
xk = ψ(yk) = argminx f(x, yk)
⟹ xk = g(xk−1) ≐ ψ(ϕ(xk−1))
Dg(x⋆
) ∼ (∇2
x f(x⋆
, y⋆
)−1
2 ∇2
xy f(x⋆
, y⋆
)∇2
y f(x⋆
, y⋆
)−1
2
) (∇2
x f(x⋆
, y⋆
)−1
2 ∇2
xy f(x⋆
, y⋆
)∇2
y f(x⋆
, y⋆
)−1
2
)
⊤
LL⊤
∇2
f(x⋆
, y⋆
) =
[
∇2
x f(x⋆
, y⋆
)1/2
∇2
y f(x⋆
, y⋆
)1/2
] [
In L
L⊤
Im]
Φ
[
∇2
x f(x⋆
, y⋆
)1/2
∇2
y f(x⋆
, y⋆
)1/2
]
Construct the mapping
Compute the Jacobian (use Implicit function theorem and chain rule)
Show all strict saddles are “unstable” (Connect Dg with “Schur
complement” of the Hessian)
Finally, by using a Schur complement theorem:
∇2
f(x⋆
, y⋆
) ⪰̸ 0 ⟺ Φ ⪰̸ 0 ⟺ Φ/I ≐ I − LL⊤
⪰̸ 0 ⟺ ∥L∥ > 1 ⟺ ρ(Dg(x⋆
)) > 1. □
5
Proximal Alternating Minimization
xk = argminx f(x, yk−1) +
λ
2
∥x − xk−1∥2
2
yk = argminy f(xk, y) +
λ
2
∥y − yk−1∥2
2
Experiments on
Key Assumption (Lipschitz bi-smoothness)
max{∥∇2
x f(x, y)∥,∥∇2
y f(x, y)∥} ≤ L, ∀x, y
6
minimize
x,y
f(x, y)
Proximal Alternating Minimization
Theorem 2
Assume f is L-Lipschitz bi-smooth and . Then Proximal AltMin
almost surely converges to a 2nd-order stationary point from
random initialization.
λ > L
110

More Related Content

Similar to Slides_ICML v6.pdf

Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
Fabian Pedregosa
 
exponen dan logaritma
exponen dan logaritmaexponen dan logaritma
exponen dan logaritma
Hanifa Zulfitri
 
Multi objective optimization and Benchmark functions result
Multi objective optimization and Benchmark functions resultMulti objective optimization and Benchmark functions result
Multi objective optimization and Benchmark functions result
Piyush Agarwal
 
A short study on minima distribution
A short study on minima distributionA short study on minima distribution
A short study on minima distribution
Loc Nguyen
 
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
The Statistical and Applied Mathematical Sciences Institute
 
Direct method for soliton solution
Direct method for soliton solutionDirect method for soliton solution
Direct method for soliton solution
MOHANRAJ PHYSICS
 
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
The Statistical and Applied Mathematical Sciences Institute
 
Software Development for Space-group Analysis: Magnetic Space Group and Irred...
Software Development for Space-group Analysis: Magnetic Space Group and Irred...Software Development for Space-group Analysis: Magnetic Space Group and Irred...
Software Development for Space-group Analysis: Magnetic Space Group and Irred...
Kohei Shinohara
 
Indefinite Integral
Indefinite IntegralIndefinite Integral
Indefinite Integral
JelaiAujero
 
Gradient_Descent_Unconstrained.pdf
Gradient_Descent_Unconstrained.pdfGradient_Descent_Unconstrained.pdf
Gradient_Descent_Unconstrained.pdf
MTrang34
 
Dual Gravitons in AdS4/CFT3 and the Holographic Cotton Tensor
Dual Gravitons in AdS4/CFT3 and the Holographic Cotton TensorDual Gravitons in AdS4/CFT3 and the Holographic Cotton Tensor
Dual Gravitons in AdS4/CFT3 and the Holographic Cotton Tensor
Sebastian De Haro
 
735
735735
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
The Statistical and Applied Mathematical Sciences Institute
 
FUNCTIONS L.1.pdf
FUNCTIONS L.1.pdfFUNCTIONS L.1.pdf
FUNCTIONS L.1.pdf
Marjorie Malveda
 
Fractional programming (A tool for optimization)
Fractional programming (A tool for optimization)Fractional programming (A tool for optimization)
Fractional programming (A tool for optimization)
VARUN KUMAR
 
Recursive Compressed Sensing
Recursive Compressed SensingRecursive Compressed Sensing
Recursive Compressed Sensing
Pantelis Sopasakis
 
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
Thoma Itoh
 
poster2
poster2poster2
poster2
Ryan Grove
 
Holographic Cotton Tensor
Holographic Cotton TensorHolographic Cotton Tensor
Holographic Cotton Tensor
Sebastian De Haro
 
Secure Domination in graphs
Secure Domination in graphsSecure Domination in graphs
Secure Domination in graphs
Mahesh Gadhwal
 

Similar to Slides_ICML v6.pdf (20)

Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
 
exponen dan logaritma
exponen dan logaritmaexponen dan logaritma
exponen dan logaritma
 
Multi objective optimization and Benchmark functions result
Multi objective optimization and Benchmark functions resultMulti objective optimization and Benchmark functions result
Multi objective optimization and Benchmark functions result
 
A short study on minima distribution
A short study on minima distributionA short study on minima distribution
A short study on minima distribution
 
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
 
Direct method for soliton solution
Direct method for soliton solutionDirect method for soliton solution
Direct method for soliton solution
 
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
QMC: Transition Workshop - Applying Quasi-Monte Carlo Methods to a Stochastic...
 
Software Development for Space-group Analysis: Magnetic Space Group and Irred...
Software Development for Space-group Analysis: Magnetic Space Group and Irred...Software Development for Space-group Analysis: Magnetic Space Group and Irred...
Software Development for Space-group Analysis: Magnetic Space Group and Irred...
 
Indefinite Integral
Indefinite IntegralIndefinite Integral
Indefinite Integral
 
Gradient_Descent_Unconstrained.pdf
Gradient_Descent_Unconstrained.pdfGradient_Descent_Unconstrained.pdf
Gradient_Descent_Unconstrained.pdf
 
Dual Gravitons in AdS4/CFT3 and the Holographic Cotton Tensor
Dual Gravitons in AdS4/CFT3 and the Holographic Cotton TensorDual Gravitons in AdS4/CFT3 and the Holographic Cotton Tensor
Dual Gravitons in AdS4/CFT3 and the Holographic Cotton Tensor
 
735
735735
735
 
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
 
FUNCTIONS L.1.pdf
FUNCTIONS L.1.pdfFUNCTIONS L.1.pdf
FUNCTIONS L.1.pdf
 
Fractional programming (A tool for optimization)
Fractional programming (A tool for optimization)Fractional programming (A tool for optimization)
Fractional programming (A tool for optimization)
 
Recursive Compressed Sensing
Recursive Compressed SensingRecursive Compressed Sensing
Recursive Compressed Sensing
 
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
 
poster2
poster2poster2
poster2
 
Holographic Cotton Tensor
Holographic Cotton TensorHolographic Cotton Tensor
Holographic Cotton Tensor
 
Secure Domination in graphs
Secure Domination in graphsSecure Domination in graphs
Secure Domination in graphs
 

Recently uploaded

Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
Dwarkadas J Sanghvi College of Engineering
 
Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...
cannyengineerings
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
upoux
 
Zener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and ApplicationsZener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and Applications
Shiny Christobel
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
ijaia
 
TIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptxTIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptx
CVCSOfficial
 
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
sydezfe
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
um7474492
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
ijseajournal
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
Prakhyath Rai
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
mahaffeycheryld
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
AlvianRamadhani5
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
Indrajeet sahu
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 

Recently uploaded (20)

Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
 
Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
 
Zener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and ApplicationsZener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and Applications
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
TIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptxTIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptx
 
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 

Slides_ICML v6.pdf

  • 1. Alternating Minimizations Converge to Second-order Optimal Solutions Qiuwei Li1 1 Colorado School of Mines 2 Johns Hopkins University Joint work with Zhihui Zhu2 and Gongguo Tang1
  • 2. 1 Why is Alternating Minimization so popular? ⋆ v1( j) u2(i) v2( j) u3(i) v3( j) min X,Y ∥XY⊤ − M⋆ ∥2 Ω Many optimization problems have variables with natural partitions Nonnegative MF Matrix sensing/completion Tensor decomposition Dictionary learning Games Blind deconvolution …… EM algorithm minimize x,y f(x, y)
  • 3. 2 Why is Alternating Minimization so popular? yk = argminy f(xk−1, y) xk = argminx f(x, yk) Our Approach Provide the 2nd-order convergence to partially solve the issue of “no global optimality guarantee”. ✤Simple to implement : No stepsize tuning ✤Good empirical performance Advantages Disadvantages ❖No global optimality guarantee for general problems ❖Only exists 1st-order convergence
  • 4. 2 Why is Alternating Minimization so popular? yk = argminy f(xk−1, y) xk = argminx f(x, yk) Theorem 1 Assume f is strongly bi-convex with a full-rank cross Hessian at all strict saddles. Then AltMin almost surely converges to a 2nd-order stationary point from random initialization. Disadvantages ❖No global optimality guarantee for general problems ❖Only exists 1st-order convergence ✤Simple to implement : No stepsize tuning ✤Good empirical performance Advantages
  • 5. 3 Why second-order convergence is enough? All local minima are globally optimal No spurious local minima [1] Jain et al. Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot [2] Bhojanapalli et al. Global Optimality of Local Search for Low Rank Matrix Recovery [3] Ge et al. Matrix Completion Has No Spurious Local Minimum [4] Sun et al. Complete Dictionary Recovery over The Sphere [5] Zhang et al. On the Global Geometry of Sphere-Constrained Sparse Blind Deconvolution [6] Ge et al. Online Stochastic Gradient for Tensor Decomposition All saddles are strict Negative curvature 2nd-order optimal solution = globally optimal solution Matrix factorization [1] Matrix sensing [2] Tensor decomposition [6] Dictionary learning [4] Matrix completion [3] Blind deconvolution [5]
  • 6. 3 Why second-order convergence is enough? 1st-order convergence + avoid strict saddles = 2nd-order convergence It suffices to show alternating minimization avoids strict saddles! All local minima are globally optimal No spurious local minima All saddles are strict Negative curvature 2nd-order optimal solution = globally optimal solution Matrix factorization [1] Matrix sensing [2] Tensor decomposition [6] Dictionary learning [4] Matrix completion [3] Blind deconvolution [5]
  • 7. 4 How to show avoiding strict saddles? A Key Result Lee et al [1,2] use Stable Manifold Theorem [3] to show that iterations defined by a global diffeom avoids unstable fixed points. An Improved Version (Zero-Property Theorem [4] + Max-Rank Theorem [5]) This work relaxes the global diffeom condition to show that a local diffeom (at all unstable fixed points) can avoid unstable fixed points. [1] Lee et al. Gradient Descent Converges to Minimizers. [2] Lee et al. First-order Methods Almost Always Avoid Saddle Points [3] Shub. Global Stability of Dynamical Systems [4] Ponomarev et al. Submersions and Preimages of Sets of Measure Zero [5] Bamber and Van. How Many Parameters Can A Model Have and still Be Testable General Recipe (1)Construct algorithm mapping g and show it is a local diffeom (i.e., Show Dg is nonsingular); (2)Show all strict saddles of f are unstable fixed points of g;
  • 8. A Proof Sketch { yk = ϕ(xk−1) = argminy f(xk−1, y) xk = ψ(yk) = argminx f(x, yk) ⟹ xk = g(xk−1) ≐ ψ(ϕ(xk−1)) Dg(x⋆ ) ∼ (∇2 x f(x⋆ , y⋆ )−1 2 ∇2 xy f(x⋆ , y⋆ )∇2 y f(x⋆ , y⋆ )−1 2 ) (∇2 x f(x⋆ , y⋆ )−1 2 ∇2 xy f(x⋆ , y⋆ )∇2 y f(x⋆ , y⋆ )−1 2 ) ⊤ LL⊤ ∇2 f(x⋆ , y⋆ ) = [ ∇2 x f(x⋆ , y⋆ )1/2 ∇2 y f(x⋆ , y⋆ )1/2 ] [ In L L⊤ Im] Φ [ ∇2 x f(x⋆ , y⋆ )1/2 ∇2 y f(x⋆ , y⋆ )1/2 ] Construct the mapping Compute the Jacobian (use Implicit function theorem and chain rule) Show all strict saddles are “unstable” (Connect Dg with “Schur complement” of the Hessian) Finally, by using a Schur complement theorem: ∇2 f(x⋆ , y⋆ ) ⪰̸ 0 ⟺ Φ ⪰̸ 0 ⟺ Φ/I ≐ I − LL⊤ ⪰̸ 0 ⟺ ∥L∥ > 1 ⟺ ρ(Dg(x⋆ )) > 1. □ 5
  • 9. Proximal Alternating Minimization xk = argminx f(x, yk−1) + λ 2 ∥x − xk−1∥2 2 yk = argminy f(xk, y) + λ 2 ∥y − yk−1∥2 2 Experiments on Key Assumption (Lipschitz bi-smoothness) max{∥∇2 x f(x, y)∥,∥∇2 y f(x, y)∥} ≤ L, ∀x, y 6 minimize x,y f(x, y) Proximal Alternating Minimization Theorem 2 Assume f is L-Lipschitz bi-smooth and . Then Proximal AltMin almost surely converges to a 2nd-order stationary point from random initialization. λ > L
  • 10. 110