An oral presentation I gave for my PhD qualifier examination

Published in: Education, Technology

  1. 1. Qualifier Exam in HPC, February 10th, 2010
  2. 2. Quasi-Newton methods (Alexandru Cioaca)
  3. 3. Quasi-Newton methods (nonlinear systems) <ul><li>Nonlinear systems: </li></ul><ul><li>$F(x) = 0$, $F : \mathbb{R}^n \to \mathbb{R}^n$ </li></ul><ul><li>$F(x) = [\, f_i(x_1, \ldots, x_n) \,]^T$, $i = 1, \ldots, n$ </li></ul><ul><li>Such systems appear in the simulation of processes (physical, chemical, etc.) </li></ul><ul><li>Iterative algorithm to solve nonlinear systems </li></ul><ul><li>Newton’s method $\neq$ nonlinear least squares </li></ul>
  4. 4. Quasi-Newton methods (nonlinear systems) <ul><li>Standard assumptions </li></ul><ul><li>$F$ is continuously differentiable in an open convex set $D$ </li></ul><ul><li>$F'$ is Lipschitz continuous on $D$ </li></ul><ul><li>There is $x^*$ in $D$ such that $F(x^*) = 0$ and $F'(x^*)$ is nonsingular </li></ul><ul><li>Newton’s method: </li></ul><ul><li>Starting from $x_0$ (initial iterate) </li></ul><ul><li>$x_{k+1} = x_k - F'(x_k)^{-1} F(x_k)$, $\{x_k\} \to x^*$ </li></ul><ul><li>Until termination criterion is satisfied </li></ul>
  5. 5. Quasi-Newton methods (nonlinear systems) <ul><li>Linear model around $x_n$: </li></ul><ul><li>$M_n(x) = F(x_n) + F'(x_n)(x - x_n)$ </li></ul><ul><li>$M_n(x) = 0 \;\Rightarrow\; x_{n+1} = x_n - F'(x_n)^{-1} F(x_n)$ </li></ul><ul><li>Iterates are computed as: </li></ul><ul><li>$F'(x_n)\, s_n = F(x_n)$ </li></ul><ul><li>$x_{n+1} = x_n - s_n$ </li></ul>
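The two-step iteration on this slide (solve $F'(x_n) s_n = F(x_n)$, then set $x_{n+1} = x_n - s_n$) can be sketched for a small system. This is a minimal illustration, not code from the talk: the function name `newton_2d`, the test system, and the use of Cramer's rule for the 2x2 solve are all assumptions made for the example.

```python
def newton_2d(F, J, x, tol=1e-12, max_iter=50):
    """Newton's method for a 2x2 system (hypothetical helper).

    F(x) returns (f1, f2); J(x) returns the 2x2 Jacobian as nested tuples.
    """
    for _ in range(max_iter):
        f1, f2 = F(x)
        if max(abs(f1), abs(f2)) < tol:
            break
        (a, b), (c, d) = J(x)
        det = a * d - b * c
        # Solve J s = F by Cramer's rule (fine for 2x2; use LU for larger n).
        s1 = (f1 * d - b * f2) / det
        s2 = (a * f2 - c * f1) / det
        x = (x[0] - s1, x[1] - s2)
    return x


# Illustrative system (an assumption): the circle x^2 + y^2 = 4 and the line x = y.
F = lambda x: (x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1])
J = lambda x: ((2.0 * x[0], 2.0 * x[1]), (1.0, -1.0))

root = newton_2d(F, J, (1.0, 0.5))
print(root)  # both components approach sqrt(2)
```

Note that the linear system is solved for the step rather than forming $F'(x_n)^{-1}$ explicitly, which is the point of the second pair of bullets.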
  6. 6. Quasi-Newton methods (nonlinear systems) <ul><li>Evaluate $F'(x_n)$ </li></ul><ul><li>Symbolically </li></ul><ul><li>Numerically with finite differences </li></ul><ul><li>Automatic differentiation </li></ul><ul><li>Solve the linear system $F'(x_n)\, s_n = F(x_n)$ </li></ul><ul><li>Direct solve: LU, Cholesky </li></ul><ul><li>Iterative methods: GMRES, CG </li></ul>
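Of the three ways to evaluate the Jacobian, the finite-difference option is the easiest to sketch. The example below uses forward differences, one column per perturbed coordinate; the helper name `fd_jacobian`, the step size `h`, and the test function are assumptions chosen for illustration.

```python
def fd_jacobian(F, x, h=1e-7):
    """Approximate the Jacobian of F at x by forward differences (hypothetical helper)."""
    n = len(x)
    f0 = F(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += h                      # perturb one coordinate at a time
        fj = F(xp)
        for i in range(n):
            jac[i][j] = (fj[i] - f0[i]) / h
    return jac


# Illustrative function (an assumption); its exact Jacobian at (1, 0.5) is [[2, 1], [1, -1]].
F = lambda x: [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]
approx = fd_jacobian(F, [1.0, 0.5])
print(approx)
```

Each Jacobian costs $n$ extra evaluations of $F$, which is one reason the later slides look for ways to avoid recomputing it.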
  7. 7. Quasi-Newton methods (nonlinear systems) <ul><li>Computation: </li></ul><ul><li>$F(x_k)$: $n$ scalar functions </li></ul><ul><li>$F'(x_k)$: $n^2$ scalar functions </li></ul><ul><li>LU: about $2n^3/3$ operations </li></ul><ul><li>Cholesky: about $n^3/3$ operations </li></ul><ul><li>Krylov methods: cost depends on the condition number </li></ul>
  8. 8. Quasi-Newton methods (nonlinear systems) <ul><li>LU and Cholesky are useful when we want to reuse the factorization (quasi-implicit) </li></ul><ul><li>Difficult to parallelize and balance the workload </li></ul><ul><li>Cholesky is faster and more stable but needs a symmetric positive definite (SPD) matrix (!) </li></ul><ul><li>For large $n$ (e.g. $n \sim 10^6$), factorization is impractical </li></ul><ul><li>Krylov methods contain elements that are easily parallelizable (updates, inner products, matrix-vector products) </li></ul><ul><li>CG is faster and more stable but needs SPD </li></ul>
  9. 9. Quasi-Newton methods (nonlinear systems) <ul><li>Advantages: </li></ul><ul><li>Under standard assumptions, Newton’s method converges locally and quadratically </li></ul><ul><li>There exists a domain of attraction S which contains the solution </li></ul><ul><li>Once the iterates enter S, they stay in S and eventually converge to x* </li></ul><ul><li>The algorithm is memoryless (self-corrective) </li></ul>
  10. 10. Quasi-Newton methods (nonlinear systems) <ul><li>Disadvantages: </li></ul><ul><li>Convergence depends on the choice of $x_0$ </li></ul><ul><li>$F'(x)$ has to be evaluated for each $x_k$ </li></ul><ul><li>Computation can be expensive: $F(x_k)$, $F'(x_k)$, $s_k$ </li></ul>
  11. 11. Quasi-Newton methods (nonlinear systems) <ul><li>Implicit schemes for ODEs </li></ul><ul><li>$y' = f(t, y)$ </li></ul><ul><li>Forward Euler: $y_{n+1} = y_n + h f(t_n, y_n)$ (explicit) </li></ul><ul><li>Backward Euler: $y_{n+1} = y_n + h f(t_{n+1}, y_{n+1})$ (implicit) </li></ul><ul><li>Implicit schemes need the solution of a nonlinear system at every step </li></ul><ul><li>(also Crank-Nicolson, implicit Runge-Kutta, linear multistep formulas) </li></ul>
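To see where the nonlinear solve enters, backward Euler can be written as the root-finding problem $G(z) = z - y_n - h f(t_{n+1}, z) = 0$ for the unknown $z = y_{n+1}$. The sketch below applies scalar Newton to $G$ at every step; the test problem $y' = -y^3$, the step size, and the function names are assumptions made for illustration.

```python
def backward_euler(f, dfdy, t0, y0, h, steps, tol=1e-12):
    """Backward Euler for a scalar ODE, with Newton's method for each implicit step."""
    t, y = t0, y0
    for _ in range(steps):
        t_next = t + h
        z = y                                   # initial guess: previous value
        for _ in range(20):
            g = z - y - h * f(t_next, z)        # residual G(z)
            if abs(g) < tol:
                break
            z -= g / (1.0 - h * dfdy(t_next, z))  # Newton update with G'(z)
        t, y = t_next, z
    return y


# Illustrative problem (an assumption): y' = -y^3, y(0) = 1, exact y(t) = 1/sqrt(1 + 2t).
f = lambda t, y: -y ** 3
dfdy = lambda t, y: -3.0 * y ** 2

y_end = backward_euler(f, dfdy, 0.0, 1.0, 0.01, 100)
print(y_end, 3 ** -0.5)  # first-order accurate approximation of y(1)
```

For a system of ODEs the scalar division becomes a linear solve with the matrix $I - h\, \partial f / \partial y$, which is exactly the Newton setting of the earlier slides.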
  12. 12. Quasi-Newton methods (nonlinear systems) <ul><li>How to circumvent evaluating $F'(x_k)$? </li></ul><ul><li>Broyden’s method </li></ul><ul><li>$B_{k+1} = B_k + (y_k - B_k s_k)\, s_k^T / \langle s_k, s_k \rangle$ </li></ul><ul><li>$x_{k+1} = x_k - B_k^{-1} F(x_k)$ </li></ul><ul><li>Inverse update (Sherman-Morrison formula) </li></ul><ul><li>$H_{k+1} = H_k + (s_k - H_k y_k)\, s_k^T H_k / \langle s_k, H_k y_k \rangle$ </li></ul><ul><li>$x_{k+1} = x_k - H_k F(x_k)$ </li></ul><ul><li>($s_k = x_{k+1} - x_k$, $y_k = F(x_{k+1}) - F(x_k)$) </li></ul>
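The inverse update can be sketched directly, since it needs only matrix-vector products and one rank-one correction per iteration. The code below is an illustration under stated assumptions: the helper names, the test system, and the choice of the initial $H_0$ (the exact inverse Jacobian at the starting point) are not from the talk.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def broyden_inverse(F, x, H, tol=1e-10, max_iter=100):
    """Broyden's method with the Sherman-Morrison inverse update (hypothetical helper)."""
    n = len(x)
    fx = F(x)
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            break
        step = matvec(H, fx)
        x_new = [xi - si for xi, si in zip(x, step)]      # x_{k+1} = x_k - H_k F(x_k)
        f_new = F(x_new)
        s = [a - b for a, b in zip(x_new, x)]             # s_k
        y = [a - b for a, b in zip(f_new, fx)]            # y_k
        Hy = matvec(H, y)
        sTH = [sum(s[i] * H[i][j] for i in range(n)) for j in range(n)]  # row vector s^T H
        denom = sum(si * hyi for si, hyi in zip(s, Hy))   # <s_k, H_k y_k>
        u = [(si - hyi) / denom for si, hyi in zip(s, Hy)]
        H = [[H[i][j] + u[i] * sTH[j] for j in range(n)] for i in range(n)]
        x, fx = x_new, f_new
    return x


# Illustrative system (an assumption): circle x^2 + y^2 = 4 and line x = y.
F = lambda x: [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]
x0 = [1.0, 0.5]
# Assumed initial H_0: exact inverse of the Jacobian [[2, 1], [1, -1]] at x0.
H0 = [[1.0 / 3.0, 1.0 / 3.0], [1.0 / 3.0, -2.0 / 3.0]]

sol = broyden_inverse(F, x0, H0)
print(sol)  # both components approach sqrt(2)
```

After the initial $H_0$, no Jacobian is evaluated and no linear system is solved, at the cost of carrying $H_k$ from one iteration to the next.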
  13. 13. Quasi-Newton methods (nonlinear systems) <ul><li>Advantages: </li></ul><ul><li>No need to compute $F'(x_k)$ </li></ul><ul><li>For inverse update: no linear system to solve </li></ul><ul><li>Disadvantages: </li></ul><ul><li>Convergence is only superlinear (not quadratic) </li></ul><ul><li>No longer memoryless </li></ul>
  14. 14. Quasi-Newton methods (unconstrained optimization) <ul><li>Problem: </li></ul><ul><li>Find the global minimizer of a cost function </li></ul><ul><li>$f : \mathbb{R}^n \to \mathbb{R}$, $x^* = \arg\min f$ </li></ul><ul><li>If $f$ is differentiable, the problem can be attacked by looking for zeros of the gradient, $\nabla f(x) = 0$ </li></ul>
  15. 15. Quasi-Newton methods (unconstrained optimization) <ul><li>Descent methods </li></ul><ul><li>$x_{k+1} = x_k - \lambda_k P_k \nabla f(x_k)$ </li></ul><ul><li>$P_k = I_n$: steepest descent </li></ul><ul><li>$P_k = \nabla^2 f(x_k)^{-1}$: Newton’s method </li></ul><ul><li>$P_k = B_k^{-1}$: quasi-Newton </li></ul><ul><li>Angle between $P_k \nabla f(x_k)$ and $\nabla f(x_k)$ less than 90° </li></ul><ul><li>$B_k$ has to mimic the behavior of the Hessian </li></ul>
  16. 16. Quasi-Newton methods (unconstrained optimization) <ul><li>Global convergence </li></ul><ul><li>Line search </li></ul><ul><li>Step length: backtracking, interpolation </li></ul><ul><li>Sufficient decrease: Wolfe conditions </li></ul><ul><li>Trust regions </li></ul>
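A backtracking line search with a sufficient-decrease test (the first Wolfe condition, often called the Armijo condition) can be sketched as follows. The parameter names `rho` and `c1`, their default values, and the quadratic test function are assumptions chosen for illustration.

```python
def backtracking(f, grad, x, p, alpha=1.0, rho=0.5, c1=1e-4):
    """Shrink alpha until f(x + alpha p) <= f(x) + c1 * alpha * grad(x)^T p."""
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad(x), p))  # must be negative for a descent direction
    while f([xi + alpha * di for xi, di in zip(x, p)]) > fx + c1 * alpha * slope:
        alpha *= rho
    return alpha


# Illustrative cost function (an assumption): an ill-conditioned quadratic.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 20.0 * x[1]]

x = [1.0, 1.0]
p = [-g for g in grad(x)]          # steepest-descent direction
alpha = backtracking(f, grad, x, p)
x_new = [xi + alpha * di for xi, di in zip(x, p)]
print(alpha, f(x), f(x_new))
```

The full step `alpha = 1` overshoots badly on this function, so the search halves the step several times before the decrease test passes.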
  17. 17. Quasi-Newton methods (unconstrained optimization) <ul><li>For Quasi-Newton, $B_k$ has to resemble $\nabla^2 f(x_k)$ </li></ul><ul><li>Single-Rank: </li></ul><ul><li>Symmetry: </li></ul><ul><li>Positive def.: </li></ul><ul><li>Inverse update: </li></ul>
  18. 18. Quasi-Newton methods (unconstrained optimization) <ul><li>Computation </li></ul><ul><li>Matrix updates, inner products </li></ul><ul><li>DFP, PSB: 3 matrix-vector products </li></ul><ul><li>BFGS: 2 matrix-matrix products </li></ul><ul><li>Storage </li></ul><ul><li>Limited memory versions (L-BFGS) </li></ul><ul><li>Store $\{s_k, y_k\}$ for the last $m$ iterations and recompute $H$ </li></ul>
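In practice the limited-memory variant usually does not rebuild $H$ as a matrix; it applies $H$ to a vector using the stored pairs. The sketch below uses the standard two-loop recursion for this. The function name, the initial scaling `gamma`, and the test data are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_apply(g, s_list, y_list):
    """Return H*g via the two-loop recursion; pairs are stored oldest first."""
    q = list(g)
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):   # newest to oldest
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        alphas.append((a, rho))
        q = [qi - a * yi for qi, yi in zip(q, y)]
    # Assumed initial matrix H_0 = gamma * I, scaled with the newest pair.
    gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1]) if s_list else 1.0
    r = [gamma * qi for qi in q]
    for (s, y), (a, rho) in zip(zip(s_list, y_list), reversed(alphas)):  # oldest to newest
        b = rho * dot(y, r)
        r = [ri + si * (a - b) for ri, si in zip(r, s)]
    return r


# Illustrative data (an assumption): pairs generated by a quadratic with Hessian diag(2, 4).
s_list = [[1.0, 0.0], [0.0, 1.0]]
y_list = [[2.0, 0.0], [0.0, 4.0]]
direction = lbfgs_apply([2.0, 4.0], s_list, y_list)
print(direction)  # here H acts like the inverse Hessian diag(0.5, 0.25)
```

Storage is $2mn$ numbers instead of $n^2$, and each application costs $O(mn)$ operations.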
  19. 19. Further improvements <ul><li>Preconditioning the linear system </li></ul><ul><li>For faster convergence one may solve $K B_k p_k = K F(x_k)$ </li></ul><ul><li>If $B_k$ is SPD (and sparse) we can use sparse approximate inverses to generate the preconditioner </li></ul><ul><li>This preconditioner can be refined on a subspace of $B_k$ using an algebraic multigrid technique </li></ul><ul><li>We need to solve the eigenvalue problem </li></ul>
  20. 20. Further improvements <ul><li>Model reduction </li></ul><ul><li>Sometimes the dimension of the system is very large </li></ul><ul><li>Smaller model that captures the essence of the original </li></ul><ul><li>An approximation of the model variability can be retrieved from an ensemble of forward simulations </li></ul><ul><li>The covariance matrix gives the subspace </li></ul><ul><li>We need to solve the eigenvalue problem </li></ul>
  21. 21. QR/QL algorithms for symmetric matrices <ul><li>Solves the eigenvalue problem </li></ul><ul><li>Iterative algorithm </li></ul><ul><li>Uses QR/QL factorization at each step </li></ul><ul><li>($A = QR$, $Q$ unitary, $R$ upper triangular) </li></ul><ul><li>for $k = 1, 2, \ldots$ </li></ul><ul><li>$A_k = Q_k R_k$ </li></ul><ul><li>$A_{k+1} = R_k Q_k$ </li></ul><ul><li>end </li></ul><ul><li>Diagonal of $A_k$ converges to eigenvalues of $A$ </li></ul>
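The loop on this slide can be run as written on a small symmetric matrix. The sketch below uses a Gram-Schmidt QR factorization for brevity, which is an assumption of the example (production codes use Householder or Givens transformations); the helper names and the test matrix are also illustrative.

```python
def qr_gram_schmidt(A):
    """QR factorization by modified Gram-Schmidt (illustration only, small dense matrices)."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = list(v)
        for i, q in enumerate(q_cols):
            R[i][j] = sum(a * b for a, b in zip(q, w))
            w = [wi - R[i][j] * qi for wi, qi in zip(w, q)]
        R[j][j] = sum(x * x for x in w) ** 0.5
        q_cols.append([x / R[j][j] for x in w])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def qr_eigenvalues(A, iters=200):
    """Unshifted QR iteration: A_k = Q_k R_k, A_{k+1} = R_k Q_k."""
    for _ in range(iters):
        Q, R = qr_gram_schmidt(A)
        A = matmul(R, Q)
    return sorted(A[i][i] for i in range(len(A)))


# Illustrative symmetric tridiagonal matrix (an assumption); eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
eigs = qr_eigenvalues(A)
print(eigs)
```

Each step is a similarity transformation ($A_{k+1} = Q_k^T A_k Q_k$), so the eigenvalues are preserved while the off-diagonal entries decay.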
  22. 22. QR/QL algorithms for symmetric matrices <ul><li>The matrix $A$ is reduced to upper Hessenberg form before starting the iterations </li></ul><ul><li>Householder reflections ($U = I - 2 v v^T / (v^T v)$) </li></ul><ul><li>Reduction is made column-wise </li></ul><ul><li>If $A$ is symmetric, it is reduced to tridiagonal form </li></ul>
  23. 23. QR/QL algorithms for symmetric matrices <ul><li>Convergence to a triangular form can be slow </li></ul><ul><li>Origin shifts are used to accelerate it </li></ul><ul><li>for $k = 1, 2, \ldots$ </li></ul><ul><li>$A_k - z_k I = Q_k R_k$ </li></ul><ul><li>$A_{k+1} = R_k Q_k + z_k I$ </li></ul><ul><li>end </li></ul><ul><li>Wilkinson shift </li></ul><ul><li>QR makes heavy use of matrix-matrix products </li></ul>
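The Wilkinson shift is commonly defined as the eigenvalue of the trailing 2x2 block of $A_k$ that lies closest to the last diagonal entry. The sketch below computes it with the usual cancellation-free formula; the function name and the sample block are assumptions made for illustration.

```python
import math


def wilkinson_shift(a, b, c):
    """Shift z_k from the trailing 2x2 block [[a, b], [b, c]] of a symmetric matrix.

    Returns the eigenvalue of the block that is closest to c.
    """
    if b == 0.0:
        return c
    delta = (a - c) / 2.0
    sign = 1.0 if delta >= 0.0 else -1.0
    return c - sign * b * b / (abs(delta) + math.hypot(delta, b))


# Illustrative block (an assumption): eigenvalues (5 +/- sqrt(13)) / 2.
mu = wilkinson_shift(4.0, 1.0, 1.0)
print(mu)  # the eigenvalue nearer to c = 1
```

Using this value as $z_k$ in the shifted iteration typically makes the last off-diagonal entry of a symmetric tridiagonal matrix converge much faster than in the unshifted iteration.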
  24. 24. Alternatives to quasi-Newton <ul><li>Inexact Newton methods </li></ul><ul><li>Inner iteration – determine a search direction by solving the linear system with a certain tolerance </li></ul><ul><li>Only Hessian-vector products are necessary </li></ul><ul><li>Outer iteration – line search on the search direction </li></ul><ul><li>Nonlinear CG </li></ul><ul><li>Residual replaced by gradient of cost function </li></ul><ul><li>Line search </li></ul><ul><li>Different flavors </li></ul>
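The remark that only Hessian-vector products are necessary matters because such a product can be approximated from two gradient evaluations, without ever forming the Hessian. The sketch below uses a forward difference of the gradient; the function name, the step `eps`, and the test function are assumptions made for illustration.

```python
def hessvec_fd(grad, x, v, eps=1e-6):
    """Approximate (Hessian of f at x) times v as (grad(x + eps*v) - grad(x)) / eps."""
    g0 = grad(x)
    g1 = grad([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]


# Illustrative cost function (an assumption): f(x) = x0^4 + x0*x1 + x1^2,
# with gradient (4*x0^3 + x1, x0 + 2*x1) and Hessian [[12*x0^2, 1], [1, 2]].
grad = lambda x: [4.0 * x[0] ** 3 + x[1], x[0] + 2.0 * x[1]]

hv = hessvec_fd(grad, [1.0, 2.0], [1.0, -1.0])
print(hv)  # close to the exact product [11, -1]
```

A Krylov solver such as CG can use this product as its only access to the Hessian in the inner iteration.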
  25. 25. Alternatives to quasi-Newton <ul><li>Direct search </li></ul><ul><li>Does not involve derivatives of the cost function </li></ul><ul><li>Uses a structure called a simplex to search for decrease in $f$ </li></ul><ul><li>Stops when further progress cannot be achieved </li></ul><ul><li>Can get stuck in a local minimum </li></ul>
  26. 26. More alternatives <ul><li>Monte Carlo </li></ul><ul><li>Computational method relying on random sampling </li></ul><ul><li>Can be used for optimization (MDO) and for inverse problems by using random walks </li></ul><ul><li>In the case where we have multiple correlated variables, the correlation matrix is SPD, so we can use Cholesky to factorize it </li></ul>
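The Cholesky remark can be made concrete: if $C = L L^T$ and $z$ has independent standard normal entries, then $L z$ has correlation matrix $C$. The sketch below is an illustration under stated assumptions; the helper names, the 2x2 correlation matrix, the sample size, and the fixed seed are all chosen for the example.

```python
import random


def cholesky(A):
    """Lower-triangular L with A = L L^T, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (A[i][i] - s) ** 0.5
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def correlated_sample(L, rng):
    """Draw one sample x = L z, with z a vector of independent standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in L]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


# Illustrative correlation matrix (an assumption).
C = [[1.0, 0.8], [0.8, 1.0]]
L = cholesky(C)

rng = random.Random(0)                      # fixed seed so the run is repeatable
samples = [correlated_sample(L, rng) for _ in range(20000)]
emp_corr = sum(x * y for x, y in samples) / len(samples)
print(L, emp_corr)  # empirical value should be near 0.8
```

The factorization is done once, and each sample then costs one triangular matrix-vector product.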
  27. 27. Conclusions <ul><li>Newton’s method is a powerful method with many applications (solving nonlinear systems, finding minima of cost functions). It can be used together with many other numerical algorithms (factorizations, linear solvers) </li></ul><ul><li>The optimization and parallelization of matrix-vector products, matrix-matrix products, decompositions and other numerical methods can have a significant impact on overall performance </li></ul>
  28. 28. <ul><li>Thank you for your time! </li></ul>