Alternating Minimizations Converge to Second-order Optimal Solutions
Qiuwei Li¹
¹ Colorado School of Mines
² Johns Hopkins University
Joint work with Zhihui Zhu² and Gongguo Tang¹
Why is Alternating Minimization so popular?
$$\min_{X,Y} \; \|XY^\top - M^\star\|_\Omega^2$$
Many optimization problems have variables with natural partitions:
Nonnegative MF
Matrix sensing/completion
Tensor decomposition
Dictionary learning
Games
Blind deconvolution
EM algorithm
……
General form:
$$\underset{x,y}{\text{minimize}} \; f(x, y)$$
Alternating minimization (AltMin) updates:
$$y_k = \arg\min_y f(x_{k-1}, y)$$
$$x_k = \arg\min_x f(x, y_k)$$
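A minimal sketch of the two updates above, assuming a toy objective that is not taken from the slides: the scalar regularized factorization f(x, y) = (xy − 1)² + (μ/2)(x² + y²) with μ = 0.5. The names `MU`, `argmin_y`, `argmin_x`, and `altmin` are hypothetical. Each partial minimizer has a closed form obtained by setting the corresponding partial derivative to zero, so no stepsize appears anywhere.

```python
# Toy objective (an assumption for illustration, not from the slides):
#   f(x, y) = (x*y - 1)^2 + (MU/2) * (x^2 + y^2)
# It is strongly convex in x for fixed y and in y for fixed x when MU > 0.
MU = 0.5


def f(x, y):
    return (x * y - 1.0) ** 2 + 0.5 * MU * (x * x + y * y)


def argmin_y(x):
    # Solve d f / d y = 2x(xy - 1) + MU*y = 0 for y.
    return 2.0 * x / (2.0 * x * x + MU)


def argmin_x(y):
    # Solve d f / d x = 2y(xy - 1) + MU*x = 0 for x.
    return 2.0 * y / (2.0 * y * y + MU)


def altmin(x0, iters=200):
    """Run y_k = argmin_y f(x_{k-1}, y), then x_k = argmin_x f(x, y_k)."""
    x = x0
    y = argmin_y(x)
    for _ in range(iters):
        y = argmin_y(x)
        x = argmin_x(y)
    return x, y


if __name__ == "__main__":
    x, y = altmin(0.3)
    print(x, y, f(x, y))
```

From a nonzero start the iterates of this toy problem approach x = y = ±√0.75, its two minimizers. Started exactly at the critical point (0, 0) they stay there, which is the kind of measure-zero exception that "almost surely" allows for.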
Advantages
✤Simple to implement: no stepsize tuning
✤Good empirical performance
Disadvantages
❖No global optimality guarantee for general problems
❖Only 1st-order convergence guarantees exist
Our Approach
Provide 2nd-order convergence to partially resolve the issue of “no global optimality guarantee”.
Theorem 1
Assume f is strongly bi-convex with a full-rank cross Hessian at all
strict saddles. Then AltMin almost surely converges to a 2nd-order
stationary point from random initialization.
Why is second-order convergence enough?
All local minima are globally optimal (no spurious local minima)
All saddles are strict (negative curvature)
Together: 2nd-order optimal solution = globally optimal solution
Problems with this geometry:
Matrix factorization [1]
Matrix sensing [2]
Matrix completion [3]
Dictionary learning [4]
Blind deconvolution [5]
Tensor decomposition [6]
[1] Jain et al. Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot
[2] Bhojanapalli et al. Global Optimality of Local Search for Low Rank Matrix Recovery
[3] Ge et al. Matrix Completion Has No Spurious Local Minimum
[4] Sun et al. Complete Dictionary Recovery over The Sphere
[5] Zhang et al. On the Global Geometry of Sphere-Constrained Sparse Blind Deconvolution
[6] Ge et al. Online Stochastic Gradient for Tensor Decomposition
1st-order convergence + avoid strict saddles = 2nd-order convergence
It suffices to show alternating minimization avoids strict saddles!
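One way to make the distinction concrete, assuming a toy objective that is not from the slides, f(x, y) = (xy − 1)² + (μ/2)(x² + y²) with μ = 0.5: a first-order stationary point is a strict saddle when the Hessian has a negative eigenvalue, and a 2nd-order stationary point when the Hessian is positive semidefinite. The helper names (`gradient`, `hessian`, `min_eig`, `classify`) are hypothetical, and the closed-form eigenvalue applies only to 2×2 symmetric matrices.

```python
import math

MU = 0.5  # regularization weight of the toy objective (an assumption)


def gradient(x, y):
    # Gradient of f(x, y) = (x*y - 1)^2 + (MU/2) * (x^2 + y^2).
    return (2.0 * y * (x * y - 1.0) + MU * x,
            2.0 * x * (x * y - 1.0) + MU * y)


def hessian(x, y):
    # Entries (f_xx, f_xy, f_yy) of the symmetric 2x2 Hessian.
    return (2.0 * y * y + MU, 4.0 * x * y - 2.0, 2.0 * x * x + MU)


def min_eig(a, b, c):
    # Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]].
    return 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)


def classify(x, y, tol=1e-9):
    gx, gy = gradient(x, y)
    if math.hypot(gx, gy) > tol:
        return "not stationary"
    a, b, c = hessian(x, y)
    if min_eig(a, b, c) < -tol:
        return "strict saddle"          # negative curvature direction exists
    return "2nd-order stationary"       # gradient zero, Hessian PSD


if __name__ == "__main__":
    r = math.sqrt(0.75)
    for point in [(0.0, 0.0), (r, r), (1.0, 1.0)]:
        print(point, classify(*point))
```

In this toy problem the origin is stationary but has negative curvature, so a method that only guarantees 1st-order convergence could in principle stop there; ruling that out is exactly the "avoid strict saddles" step.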
How to show that strict saddles are avoided?
A Key Result
Lee et al. [1,2] use the Stable Manifold Theorem [3] to show that iterations defined by a global diffeomorphism avoid unstable fixed points.
An Improved Version (Zero-Property Theorem [4] + Max-Rank Theorem [5])
This work relaxes the global diffeomorphism condition: a mapping that is a local diffeomorphism at all unstable fixed points still avoids unstable fixed points.
General Recipe
(1) Construct the algorithm mapping g and show it is a local diffeomorphism (i.e., show Dg is nonsingular);
(2) Show all strict saddles of f are unstable fixed points of g.
[1] Lee et al. Gradient Descent Converges to Minimizers
[2] Lee et al. First-order Methods Almost Always Avoid Saddle Points
[3] Shub. Global Stability of Dynamical Systems
[4] Ponomarev et al. Submersions and Preimages of Sets of Measure Zero
[5] Bamber and Van. How Many Parameters Can A Model Have and still Be Testable
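A Proof Sketch
Construct the mapping
$$y_k = \phi(x_{k-1}) = \arg\min_y f(x_{k-1}, y), \qquad x_k = \psi(y_k) = \arg\min_x f(x, y_k) \;\Longrightarrow\; x_k = g(x_{k-1}) \doteq \psi(\phi(x_{k-1}))$$
Compute the Jacobian (use the implicit function theorem and the chain rule)
$$Dg(x^\star) \sim \left(\nabla_x^2 f(x^\star, y^\star)^{-\frac{1}{2}} \, \nabla_{xy}^2 f(x^\star, y^\star) \, \nabla_y^2 f(x^\star, y^\star)^{-\frac{1}{2}}\right) \left(\nabla_x^2 f(x^\star, y^\star)^{-\frac{1}{2}} \, \nabla_{xy}^2 f(x^\star, y^\star) \, \nabla_y^2 f(x^\star, y^\star)^{-\frac{1}{2}}\right)^\top = LL^\top$$
where $L \doteq \nabla_x^2 f(x^\star, y^\star)^{-\frac{1}{2}} \, \nabla_{xy}^2 f(x^\star, y^\star) \, \nabla_y^2 f(x^\star, y^\star)^{-\frac{1}{2}}$.
Show all strict saddles are “unstable” (connect Dg with the “Schur complement” of the Hessian)
$$\nabla^2 f(x^\star, y^\star) = \begin{bmatrix} \nabla_x^2 f(x^\star, y^\star)^{1/2} & \\ & \nabla_y^2 f(x^\star, y^\star)^{1/2} \end{bmatrix} \underbrace{\begin{bmatrix} I_n & L \\ L^\top & I_m \end{bmatrix}}_{\Phi} \begin{bmatrix} \nabla_x^2 f(x^\star, y^\star)^{1/2} & \\ & \nabla_y^2 f(x^\star, y^\star)^{1/2} \end{bmatrix}$$
Finally, by using a Schur complement theorem:
$$\nabla^2 f(x^\star, y^\star) \not\succeq 0 \iff \Phi \not\succeq 0 \iff \Phi/I \doteq I - LL^\top \not\succeq 0 \iff \|L\| > 1 \iff \rho(Dg(x^\star)) > 1. \qquad \square$$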
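The chain of equivalences can be checked numerically in the scalar case, where L is a number and LL⊤ = L². The sketch below assumes a toy objective that is not from the slides, f(x, y) = (xy − 1)² + (μ/2)(x² + y²) with μ = 0.5, and hypothetical helper names; it compares a finite-difference estimate of Dg with L² at a strict saddle and at a minimizer.

```python
import math

MU = 0.5  # regularization weight of the toy objective (an assumption)


def phi(x):
    # phi(x) = argmin_y f(x, y) for f(x, y) = (x*y - 1)^2 + (MU/2)(x^2 + y^2)
    return 2.0 * x / (2.0 * x * x + MU)


def psi(y):
    # psi(y) = argmin_x f(x, y)
    return 2.0 * y / (2.0 * y * y + MU)


def g(x):
    # One full AltMin sweep: x_k = psi(phi(x_{k-1})).
    return psi(phi(x))


def jacobian_fd(x, h=1e-6):
    # Central finite-difference estimate of Dg(x) (scalar case).
    return (g(x + h) - g(x - h)) / (2.0 * h)


def scaled_cross(x, y):
    # L = f_xx^{-1/2} * f_xy * f_yy^{-1/2} evaluated at (x, y).
    f_xx = 2.0 * y * y + MU
    f_yy = 2.0 * x * x + MU
    f_xy = 4.0 * x * y - 2.0
    return f_xy / math.sqrt(f_xx * f_yy)


def hessian_is_psd(x, y):
    # With f_xx, f_yy > 0, the 2x2 Hessian is PSD iff its determinant is >= 0.
    f_xx = 2.0 * y * y + MU
    f_yy = 2.0 * x * x + MU
    f_xy = 4.0 * x * y - 2.0
    return f_xx * f_yy - f_xy * f_xy >= 0.0


if __name__ == "__main__":
    r = math.sqrt(0.75)
    for x, y in [(0.0, 0.0), (r, r)]:
        L = scaled_cross(x, y)
        print((x, y), "PSD:", hessian_is_psd(x, y),
              "L^2:", L * L, "Dg (finite diff):", jacobian_fd(x))
```

At the origin the Hessian is indefinite, |L| = 4, and the sweep expands perturbations by a factor of about 16, so the saddle is an unstable fixed point of g. At the minimizer |L| = 0.5 and the sweep contracts by about 0.25.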
Proximal Alternating Minimization
$$\underset{x,y}{\text{minimize}} \; f(x, y)$$
$$x_k = \arg\min_x \; f(x, y_{k-1}) + \frac{\lambda}{2}\|x - x_{k-1}\|_2^2$$
$$y_k = \arg\min_y \; f(x_k, y) + \frac{\lambda}{2}\|y - y_{k-1}\|_2^2$$
[Figure: experiments]
Key Assumption (Lipschitz bi-smoothness)
$$\max\{\|\nabla_x^2 f(x, y)\|, \|\nabla_y^2 f(x, y)\|\} \le L, \quad \forall x, y$$
Theorem 2
Assume f is L-Lipschitz bi-smooth and $\lambda > L$. Then Proximal AltMin almost surely converges to a 2nd-order stationary point from random initialization.
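A minimal sketch of the proximal updates, assuming a toy objective that is not from the slides: f(x, y) = sin(x)·sin(y), whose partial second derivatives are bounded by L = 1, together with λ = 2 > L. The names `LAMBDA`, `prox_step`, and `prox_altmin` are hypothetical. Because f is not bi-convex here, each subproblem is solved by a simple fixed-point iteration on its stationarity condition; that inner solver is an illustrative choice, valid because λ > L makes the subproblem strongly convex and the iteration a contraction.

```python
import math

LAMBDA = 2.0  # proximal weight; chosen larger than L = 1 for this toy f


def f(x, y):
    # Toy objective (an assumption): |f_xx| = |f_yy| = |sin(x) sin(y)| <= 1.
    return math.sin(x) * math.sin(y)


def prox_step(v_prev, other, inner_iters=80):
    """Solve argmin_v sin(v)*sin(other) + (LAMBDA/2)*(v - v_prev)^2.

    Stationarity: cos(v)*sin(other) + LAMBDA*(v - v_prev) = 0, rewritten as
    v = v_prev - cos(v)*sin(other)/LAMBDA and iterated to a fixed point.
    The map has slope at most 1/LAMBDA = 0.5, so it contracts.
    """
    v = v_prev
    for _ in range(inner_iters):
        v = v_prev - math.cos(v) * math.sin(other) / LAMBDA
    return v


def prox_altmin(x0, y0, iters=500):
    x, y = x0, y0
    for _ in range(iters):
        x = prox_step(x, y)   # x_k uses y_{k-1} and x_{k-1}
        y = prox_step(y, x)   # y_k uses x_k and y_{k-1}; f is symmetric
    return x, y


if __name__ == "__main__":
    x, y = prox_altmin(0.3, -0.2)
    print(x, y, f(x, y))
```

Starting from (0.3, −0.2), the iterates of this toy problem move monotonically to (π/2, −π/2), where f = −1. Started exactly at the saddle (0, 0) they do not move, consistent with the guarantee holding almost surely rather than for every initialization.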
Slides_ICML v6.pdf
