Extending Preconditioned GMRES to Nonlinear Optimization

        Hans De Sterck
        Department of Applied Mathematics
        University of Waterloo, Canada

        SIAM Conference on Applied Linear Algebra
        Valencia, Spain, June 2012
this talk is about
        “accelerating convergence of iterative
            nonlinear optimization methods”

the approach will be to
        “extend concepts of preconditioned GMRES for
            linear systems to nonlinear optimization”
        (note: fully nonlinear preconditioning)
hi Evelyne (born May 30, 2012)
1. background: convergence acceleration for linear systems

•  start from a simple example: finite difference discretization
   of the Poisson equation on the unit square with homogeneous
   Dirichlet boundary conditions

•  linear system: A u = b
convergence acceleration for linear systems

•  simple iterative method: Gauss-Seidel (GS)

•  stationary iterative method: u_{i+1} = u_i + M^{-1} (b - A u_i)
   (for GS, M is the lower-triangular part of A)
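
a minimal sketch of this setup (illustrative Python, not the talk's Matlab code; the 5-point Laplacian and right-hand side are assumptions for the demo):

```python
import numpy as np

def poisson_2d(n):
    """5-point finite-difference Laplacian on the n-by-n interior grid of
    the unit square with homogeneous Dirichlet boundary conditions."""
    T = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    S = np.eye(n, k=1) + np.eye(n, k=-1)
    return np.kron(np.eye(n), T) - np.kron(S, np.eye(n))

def gauss_seidel(A, b, u, sweeps=1):
    """Forward Gauss-Seidel: the stationary iteration with M = lower
    triangle of A, applied componentwise in place."""
    for _ in range(sweeps):
        for i in range(len(b)):
            u[i] += (b[i] - A[i, :] @ u) / A[i, i]
    return u

n = 16
A, b = poisson_2d(n), np.ones(n * n)
u = gauss_seidel(A, b, np.zeros(n * n), sweeps=100)
print("residual after 100 GS sweeps:", np.linalg.norm(b - A @ u))
```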
convergence acceleration for linear systems

•  GS converges slowly
•  number of iterations
   grows as grid is refined
•  can we accelerate GS?
   yes!
    –  GMRES acceleration
       (generalized minimal
       residual method)
    –  multigrid acceleration
GMRES as an acceleration mechanism (Washio and Oosterlee, ETNA, 1997)

GMRES for linear systems:

•  stationary iterative method u_{i+1} = u_i + M^{-1} (b - A u_i)
•  generates residuals r_i = b - A u_i recursively:
   r_{i+1} = (I - A M^{-1}) r_i
•  define the Krylov space spanned by these residuals:
   K_{i+1} = span{r_0, (I - A M^{-1}) r_0, ..., (I - A M^{-1})^i r_0}
GMRES as an acceleration mechanism (Washio and Oosterlee, ETNA, 1997)

GMRES for linear systems:

•  stationary iterative process
   generates preconditioned residuals that build
   Krylov space

•  GMRES: take optimal linear combination of
   residuals in Krylov space to minimize the
   residual
GMRES as an acceleration mechanism (Washio and Oosterlee, ETNA, 1997)

•  seek optimal approximation
   u_bar_{i+1} = u_{i+1} + Σ_{j=0}^{i} β_j (u_{i+1} - u_j)
   that minimizes ||b - A u_bar_{i+1}||_2
•  this is mathematically “essentially equivalent” to
   right-preconditioned GMRES
•  the stationary iterative method is the preconditioner
•  GMRES accelerates the stationary iterative method
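
a minimal sketch of this recombination step, reusing `A`, `b`, and `gauss_seidel` from the sketch above (an illustrative version of GMRES-as-accelerator; actual GMRES builds an orthonormal Krylov basis with Arnoldi):

```python
# store a window of Gauss-Seidel iterates u_0, ..., u_w
w = 10
U = [np.zeros(n * n)]
for _ in range(w):
    U.append(gauss_seidel(A, b, U[-1].copy()))

# seek u_bar = u_w + sum_j beta_j (u_w - u_j) minimizing ||b - A u_bar||_2
u_w = U[-1]
D = np.column_stack([u_w - u for u in U[:-1]])   # difference directions
r_w = b - A @ u_w
beta, *_ = np.linalg.lstsq(A @ D, r_w, rcond=None)
u_bar = u_w + D @ beta
print("GS residual:         ", np.linalg.norm(r_w))
print("accelerated residual:", np.linalg.norm(b - A @ u_bar))
```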
alternative: multigrid as an acceleration mechanism




•  multigrid accelerates the smoother (Gauss-Seidel)

•  GMRES (or: CG, ...) and multigrid are ways to accelerate
   Gauss-Seidel
2. nonlinear optimization – tensor decomposition

•  consider simple iterative optimization methods
   for a smooth nonlinear optimization problem min_u f(u)

•  can we accelerate the convergence of simple
   iterative optimization methods?
application: canonical tensor decomposition

•  tensor = element of tensor product of real vector
   spaces (N-dimensional array)
•  N=3:



           (from “Tensor Decompositions and Applications”, Kolda and Bader, SIAM Rev., 2009 [1])


•  canonical decomposition: (approximately) decompose the
   tensor into a sum of R rank-one terms
canonical tensor decomposition

•  minimize f(a_1(1), ..., a_R(3)) =
   (1/2) || T - Σ_{r=1}^{R} a_r(1) ∘ a_r(2) ∘ a_r(3) ||_F²

(the problem is non-convex, with multiple (local) minima; the solution may not
exist (ill-posed), ...; but it is smooth, and we assume there is a local minimum)
                      (de Silva and Lim, SIMAX, 2009)
link with singular value decomposition

•  SVD of a matrix A: A = Σ_r σ_r u_r v_r^T (a sum of rank-one terms)

•  canonical decomposition of a tensor: T ≈ Σ_r a_r(1) ∘ a_r(2) ∘ a_r(3)
   (the analogous sum of rank-one terms)
a difference with the SVD

truncated SVD is the best rank-R approximation:
A_R = Σ_{r=1}^{R} σ_r u_r v_r^T minimizes ||A - B||_F over all
matrices B of rank at most R

BUT the best rank-R tensor cannot be obtained by
truncation: different optimization problems for different R!
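
a small numpy illustration of the matrix side of this contrast (Eckart-Young; the random matrix is just a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 40))
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# truncated SVD: best rank-R approximation in the Frobenius norm,
# and the best rank-(R-1) approximation is a further truncation --
# unlike tensors, where each R poses a separate optimization problem
for R in (5, 4):
    M_R = (U[:, :R] * s[:R]) @ Vt[:R, :]
    print(f"rank-{R} error:", np.linalg.norm(M - M_R))
```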
tensor approximation applications

(1) “Discussion Tracking in Enron Email Using PARAFAC” by
   Bader, Berry and Browne (2008) (sparse, nonnegative)
tensor approximation applications

(2) chemometrics: analyze
   spectrofluorometer data
   (dense) (Bro et al.,
     http://www.models.life.ku.dk/nwaydata1)
•    5 x 201 x 61 tensor: 5 samples (with
     different mixtures of three amino
     acids), 61 excitation wavelengths,
     201 emission wavelengths
•    goal: recover emission spectra of
     the three amino acids (to determine
     what was in each sample, and in
     which concentration)
•    also: psychometrics, ...

                                               (from [1])
‘workhorse’ algorithm: alternating least squares (ALS)

 (1) freeze all a_r(2), a_r(3); compute the optimal a_r(1) via a
     least-squares solve (linear, overdetermined)
 (2) freeze a_r(1), a_r(3); compute a_r(2)
 (3) freeze a_r(1), a_r(2); compute a_r(3)
 •  repeat

                                            (from [1])
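
a minimal ALS sketch for a third-order tensor (illustrative Python with `einsum` rather than the Tensor Toolbox; the normal-equations solves assume the Gram matrices are well conditioned):

```python
import numpy as np

def als_cp(T, R, iters=100, seed=0):
    """ALS for T ~ sum_r a_r(1) o a_r(2) o a_r(3); the vectors a_r(n)
    are the columns of the factor matrices A1, A2, A3."""
    rng = np.random.default_rng(seed)
    A1, A2, A3 = (rng.standard_normal((m, R)) for m in T.shape)
    for _ in range(iters):
        # each step is a linear least-squares solve with the other factors frozen
        G = (A2.T @ A2) * (A3.T @ A3)      # Gram matrix of the frozen factors
        A1 = np.linalg.solve(G, np.einsum('ijk,jr,kr->ir', T, A2, A3).T).T
        G = (A1.T @ A1) * (A3.T @ A3)
        A2 = np.linalg.solve(G, np.einsum('ijk,ir,kr->jr', T, A1, A3).T).T
        G = (A1.T @ A1) * (A2.T @ A2)
        A3 = np.linalg.solve(G, np.einsum('ijk,ir,jr->kr', T, A1, A2).T).T
    return A1, A2, A3

# demo: fit a random rank-3 tensor exactly
rng = np.random.default_rng(1)
B = [rng.standard_normal((m, 3)) for m in (6, 7, 8)]
T = np.einsum('ir,jr,kr->ijk', *B)
A1, A2, A3 = als_cp(T, R=3)
print("fit error:", np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A1, A2, A3)))
```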
alternating least squares (ALS)




•    “simple iterative optimization method”
•    ALS is block nonlinear Gauss-Seidel
•    ALS is monotone
•    ALS is sometimes fast, but can also be extremely
     slow (depending on problem and initial condition)
     (convergence: Uschmajew’s talk)
alternating least squares (ALS)

(convergence plots: fast case vs. slow case)

     (we used Matlab with Tensor Toolbox (Bader and Kolda) and
     Poblano Toolbox (Dunlavy et al.) for all computations)
alternating least squares (ALS)




•  for linear systems A u = b: accelerate a simple iterative method with
               − GMRES
               − multigrid
•  the simple iterative method is called the ‘preconditioner’
•  let’s try to accelerate ALS for the tensor optimization problem
•  for nonlinear optimization problems, general approaches to accelerate
   simple iterative methods (like ALS) appear uncommon (e.g., not in
   Nocedal and Wright) (“nonlinear preconditioning” does not appear to
   be a common notion in optimization)
•  issues: nonlinear, optimization context
3. existing methods: convergence acceleration for nonlinear systems

•  nonlinear system: g(u) = 0

•  equivalent: fixed-point equation u = q(u), iterated as
   u_{i+1} = q(u_i) (‘simple iterative method’)
convergence acceleration for nonlinear systems
•  existing convergence acceleration methods for nonlinear
   systems g(u) = 0 (“iterate recombination”):
   –  Anderson acceleration (1965) (popular in electronic structure calculation)
   –  Pulay mixing – direct inversion in the iterative subspace (DIIS) (1980)
   –  Washio and Oosterlee (1997): Krylov acceleration for nonlinear multigrid
   also related to:                                                   (in PETSc)
   –  GMRES (Saad and Schultz, 1986), flexible GMRES (Saad, 1993)
   recent papers on Anderson mixing/DIIS for nonlinear systems:
   –  Fang and Saad (2009)
   –  Walker and Ni (2011)
   –  Rohwedder and Schneider (2011)
   plus many other papers with similar ideas and developments
•  we will use this type of approach as a building block of our
   N-GMRES nonlinearly preconditioned optimization algorithm
4. nonlinear GMRES optimization method (N-GMRES)

•  step I: apply one iteration of the preconditioner (here: ALS)
•  step III: line search (Moré-Thuente line search,
   satisfies Wolfe conditions)
step II: N-GMRES acceleration:



•  windowed implementation
•  several possibilities to solve the
   small least-squares problems:
    –  normal equations (with
       reuse of scalar products;
       cost 2nw) (Washio and
       Oosterlee, 1997)
    –  QR with factor updating
       (Walker and Ni, 2011)
    –  SVD and rank-revealing QR
       (Fang and Saad, 2009)
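
a minimal sketch of one step-II acceleration under this windowed formulation; `grad` and the window contents are placeholders, and a plain `lstsq` stands in for the normal-equation and QR variants listed above:

```python
import numpy as np

def ngmres_accelerate(u_bar, window, grad):
    """Step II: given the preconditioned iterate u_bar and a window of
    previous iterates, seek u_a = u_bar + sum_j beta_j (u_bar - u_j)
    whose linearized gradient norm
    ||g(u_bar) + sum_j beta_j (g(u_bar) - g(u_j))|| is minimal."""
    g_bar = grad(u_bar)
    D = np.column_stack([u_bar - u for u in window])          # iterate differences
    E = np.column_stack([g_bar - grad(u) for u in window])    # gradient differences
    beta, *_ = np.linalg.lstsq(E, -g_bar, rcond=None)         # small LS problem
    return u_bar + D @ beta
```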
numerical results for ALS-preconditioned N-GMRES applied to tensor problem
•  dense test problem (from Tomasi and Bro; Acar et al.):
    –  random rank-R tensor modified to obtain specific column
       collinearity, with added noise
numerical results: dense test problem
dense test problem: optimal window size
dense test problem: comparison




                (N-CG from Acar, Dunlavy and Kolda, 2011)
numerical results: sparse test problem

•  sparse test problem:
    –  d-dimensional finite difference Laplacian (2d-way tensor)
comparing N-GMRES to GMRES (Washio and Oosterlee, ETNA, 1997)

•  GMRES: minimize the residual norm ||b - A u_bar||_2
•  seek optimal approximation
   u_bar_{i+1} = u_{i+1} + Σ_{j=0}^{i} β_j (u_{i+1} - u_j)

                          same as for N-GMRES
N-GMRES: what’s in a name...

•  CG for A u = b (A symmetric positive definite)
   ⇒ N-CG for min_u f(u)
conjugate gradient (CG) for A u = b

                  (Nocedal and Wright, 2006)
nonlinear conjugate gradient (N-CG)




                   (Nocedal and Wright, 2006)
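
for reference, a standard N-CG iteration in the form given by Nocedal and Wright (Fletcher-Reeves choice of β shown; other choices such as Polak-Ribière exist):

```latex
\begin{aligned}
p_0 &= -\nabla f(u_0),\\
u_{k+1} &= u_k + \alpha_k p_k, \qquad \alpha_k \text{ from a line search},\\
p_{k+1} &= -\nabla f(u_{k+1}) + \beta_{k+1}\, p_k, \qquad
\beta_{k+1}^{\mathrm{FR}} = \frac{\|\nabla f(u_{k+1})\|^2}{\|\nabla f(u_k)\|^2}.
\end{aligned}
```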
N-GMRES
N-GMRES: what’s in a name...

•  CG for A u = b
   ⇒ N-CG for min_u f(u)

•  GMRES for A u = b (general A)
   ⇒ N-GMRES for min_u f(u)

        (symmetry...)
5. N-GMRES as a general optimization method

general methods for nonlinear optimization (smooth, unconstrained)
  (“Numerical Optimization”, Nocedal and Wright, 2006)

1.    steepest descent with line search
2.    Newton with line search
3.    nonlinear conjugate gradient (N-CG) with line search
4.    trust-region methods
5.    quasi-Newton methods (includes Broyden–Fletcher–Goldfarb–
      Shanno (BFGS) and limited memory version L-BFGS)

6.  preconditioned N-GMRES as a general optimization method?
general N-GMRES optimization method

•    first question: what would be a general preconditioner?




•    idea: general N-GMRES preconditioner
        = update in direction of steepest descent
        (or: use N-GMRES to accelerate steepest descent)
steepest-descent preconditioning




•  option A: steepest descent with line search
•  option B: steepest descent with predefined small step
•  claim: steepest descent is the ‘natural’ preconditioner for
   N-GMRES (note also: link with fixed-point equation)
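
a minimal sketch of the resulting loop, assuming option B (predefined small step) and reusing the hypothetical `ngmres_accelerate` from the step-II sketch; a crude backtracking search stands in for the Moré-Thuente line search:

```python
def ngmres_sd(f, grad, u0, beta_sd=1e-4, w=10, iters=200):
    """N-GMRES with steepest-descent preconditioning (option B)."""
    u, window = u0, []
    for _ in range(iters):
        window = (window + [u])[-w:]             # keep a window of iterates
        u_bar = u - beta_sd * grad(u)            # step I: preconditioner update
        u_a = ngmres_accelerate(u_bar, window, grad)   # step II: acceleration
        # step III: line search from u_bar toward u_a (the actual method
        # uses a Wolfe-satisfying More-Thuente search)
        p, alpha = u_a - u_bar, 1.0
        while f(u_bar + alpha * p) > f(u_bar) and alpha > 1e-8:
            alpha *= 0.5
        u = u_bar + alpha * p
    return u
```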
steepest-descent preconditioning
•  claim: steepest descent is the ‘natural’ preconditioner for
   N-GMRES
•  example: consider the simple quadratic optimization problem
   min_u f(u) = (1/2) u^T A u - b^T u (A symmetric positive definite)

•  we know ∇f(u) = A u - b, so the steepest-descent update
   u_{i+1} = u_i - β ∇f(u_i) becomes u_{i+1} = u_i + β (b - A u_i)

•  this gives the same residuals as u_{i+1} = u_i + M^{-1} (b - A u_i)
   with M^{-1} = β I: the steepest-descent N-GMRES preconditioner
   corresponds to the identity preconditioner (up to scaling) for linear GMRES
                    (and: a small step is sufficient)
numerical results: steepest-descent preconditioning

•  steepest descent by itself is slow
•  N-GMRES with steepest-descent preconditioning is
   competitive with N-CG and L-BFGS
•  option A slower than option B (small step)
convergence of steepest-descent preconditioned N-GMRES optimization

•  assume line searches give solutions that satisfy the
   Wolfe conditions:
     f(u_k + α_k p_k) ≤ f(u_k) + c_1 α_k ∇f(u_k)^T p_k   (sufficient decrease)
     ∇f(u_k + α_k p_k)^T p_k ≥ c_2 ∇f(u_k)^T p_k          (curvature)
   with 0 < c_1 < c_2 < 1
                           (Nocedal and Wright, 2006)
convergence of steepest-descent preconditioned N-GMRES optimization
sketch of (simple!) proof

•  use Zoutendijk’s theorem: Σ_k cos²θ_k ||∇f_k||² < ∞,
   with cos θ_k = -∇f_k^T p_k / (||∇f_k|| ||p_k||), and thus
   cos²θ_k ||∇f_k||² → 0

•  all u_i are followed by a steepest-descent step, for which
   cos θ_k = 1, so ||∇f(u_i)|| → 0

•  global convergence to a stationary point for general f(u)
                                (note also: Absil and Gallivan, 2009)
general N-GMRES optimization method

general methods for nonlinear optimization (smooth, unconstrained)
  (“Numerical Optimization”, Nocedal and Wright, 2006)

1.    steepest descent with line search
2.    Newton with line search
3.    nonlinear conjugate gradient (N-CG) with line search
4.    trust-region methods
5.    quasi-Newton methods (includes Broyden–Fletcher–Goldfarb–
      Shanno (BFGS) and limited memory version L-BFGS)

6.  preconditioned N-GMRES as a general optimization method
a few more notes on related methods
•  acceleration step in N-GMRES is similar to Anderson
   acceleration and DIIS, but not exactly the same
•  “mathematical equivalence” with GMRES in the linear
   case is discussed by Washio and Oosterlee (1997), Walker and Ni
  (2011), Rohwedder and Schneider (2011), and others
•  equivalence of Anderson/DIIS with certain multisecant
   update methods (Broyden) is discussed by Fang and Saad
  (2009), Rohwedder and Schneider (2011), and others
•  ‘nonlinear preconditioning’ for N-CG, L-BFGS,
   multisecant methods is not commonly considered
•  ‘nonlinear preconditioning’ view is natural for the
   N-GMRES optimization framework
a few more notes on related methods
•  as mentioned before, there are quite a few other papers that present
   related ideas:
    –  Eirola and Nevanlinna, Accelerating with rank-one updates, 1989
    –  Brown and Saad, Hybrid Krylov methods for nonlinear systems
       of equations, 1990
    –  Vuik and van der Vorst, A comparison of some GMRES-like
       methods, 1992
    –  Saad (1993): flexible GMRES
    –  Fokkema, Sleijpen and van der Vorst, Accelerated Inexact
       Newton Schemes for Large Systems of Nonlinear Equations,
       1998
    –  etc.
6. conclusions
•  N-GMRES optimization method:
   –  extends concept of preconditioned GMRES to
      nonlinear optimization (nonlinear preconditioning)
   –  uses iterate recombination (like Anderson
      acceleration, DIIS) as an important building block
conclusions
•  N-GMRES optimization method:
   accelerates simple iterative optimization method
   (the nonlinear preconditioner) (ALS)
•  just like GMRES accelerates a stationary iterative
   method (the preconditioner) for linear systems A u = b
conclusions
•  N-GMRES (with ALS preconditioning) gives strongly
   improved solver for canonical tensor decomposition
•  note: we can also do nonlinear adaptive multigrid
   acceleration for the tensor nonlinear optimization problem!
  see MS 37. Optimization methods for tensor decomposition, Wednesday
  12:15–12:40 “An algebraic multigrid optimization method for low-rank canonical tensor
  decomposition”, Killian Miller (ALS smoother)
conclusions
•  N-GMRES optimization method:
   –  a general, convergent method (with steepest-descent
      preconditioning)
   –  appears competitive with N-CG, L-BFGS
   –  ‘nonlinear preconditioning’ viewpoint is key
open questions: convergence speed of N-GMRES

•  GMRES (linear case): convergence rate can be
   analyzed by considering optimal polynomials etc.
•  convergence speed of N-GMRES for
   optimization: many open questions
  (cf. Rohwedder and Schneider (2011) for DIIS, superlinear convergence?,
  multisecant)



•  we should try more applications... (to ALS for
   other tensor decompositions (see e.g.
   Grasedyck’s talk), and to other problems)
conclusions

•  real power of N-GMRES: N-GMRES optimization
   framework can employ sophisticated nonlinear
   preconditioners (use ALS in tensor case)
•  power lies in good preconditioners (like case of
   GMRES for linear systems)
the power of N-GMRES optimization (tensor problem)

•  steepest descent is not good enough as a preconditioner for
   N-GMRES (for the tensor problem)
the power of N-GMRES optimization (tensor problem)

•  ALS is a much better preconditioner!
•  thank you                 •  questions?

–  Hans De Sterck, ‘A Nonlinear GMRES Optimization Algorithm for Canonical
   Tensor Decomposition’, SIAM J. Sci. Comput., 2012
–  Hans De Sterck, ‘Steepest Descent Preconditioning for Nonlinear GMRES
   Optimization’, Numer. Linear Algebra Appl., 2012
–  Hans De Sterck and Killian Miller, ‘An Adaptive Algebraic Multigrid Algorithm
   for Low-Rank Canonical Tensor Decomposition’, submitted to SIAM J. Sci.
   Comput., 2011
