# Qualifier

An oral presentation I gave for my PhD qualifier examination

Published in: Education, Technology

### Qualifier

1. Qualifier Exam in HPC, February 10th, 2010
2. Quasi-Newton methods, Alexandru Cioaca
3. Quasi-Newton methods (nonlinear systems)
    - Nonlinear systems:
        - $F(x) = 0$, $F : \mathbb{R}^n \to \mathbb{R}^n$
        - $F(x) = [\, f_i(x_1, \dots, x_n) \,]^T$
    - Such systems appear in the simulation of processes (physical, chemical, etc.)
    - They are solved with iterative algorithms
    - Newton's method ≠ nonlinear least-squares
4. Quasi-Newton methods (nonlinear systems)
    - Standard assumptions:
        - $F$ is continuously differentiable in an open convex set $D$
        - $F'$ is Lipschitz continuous on $D$
        - There is $x^* \in D$ such that $F(x^*) = 0$ and $F'(x^*)$ is nonsingular
    - Newton's method:
        - Start from $x_0$ (initial iterate)
        - $x_{k+1} = x_k - F'(x_k)^{-1} F(x_k)$, with $\{x_k\} \to x^*$
        - Iterate until the termination criterion is satisfied
5. Quasi-Newton methods (nonlinear systems)
    - Linear model around $x_n$:
        - $M_n(x) = F(x_n) + F'(x_n)(x - x_n)$
        - $M_n(x) = 0 \Rightarrow x_{n+1} = x_n - F'(x_n)^{-1} F(x_n)$
    - Iterates are computed as:
        - $F'(x_n)\, s_n = F(x_n)$
        - $x_{n+1} = x_n - s_n$
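The iteration on slide 5 can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `newton_system`, the tolerance, and the test problem (unit circle intersected with the line $x_1 = x_2$) are all assumptions made for the example.

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    """Newton's method for F(x) = 0, where J(x) returns the Jacobian F'(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:       # termination criterion (assumed)
            break
        s = np.linalg.solve(J(x), Fx)      # solve F'(x_n) s_n = F(x_n)
        x = x - s                          # x_{n+1} = x_n - s_n
    return x

# Hypothetical test problem: unit circle intersected with the line x1 = x2.
def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])

def J(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

root = newton_system(F, J, x0=[1.0, 0.5])
print(root)
```

The linear system is solved rather than forming $F'(x_n)^{-1}$ explicitly, which matches the two-step form on the slide.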
6. Quasi-Newton methods (nonlinear systems)
    - Evaluate $F'(x_n)$:
        - Symbolically
        - Numerically with finite differences
        - Automatic differentiation
    - Solve the linear system $F'(x_n)\, s_n = F(x_n)$:
        - Direct solve: LU, Cholesky
        - Iterative methods: GMRES, CG
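As an illustration of the finite-difference option on slide 6, a forward-difference Jacobian could look like the sketch below. The helper name `fd_jacobian`, the step-size rule, and the test function are assumptions, not taken from the slides.

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Approximate F'(x) column by column with forward differences."""
    x = np.asarray(x, dtype=float)
    Fx = F(x)
    jac = np.empty((Fx.size, x.size))
    for j in range(x.size):
        step = h * max(1.0, abs(x[j]))     # scale the step with |x_j|
        xp = x.copy()
        xp[j] += step
        jac[:, j] = (F(xp) - Fx) / step    # one extra F evaluation per column
    return jac

# Hypothetical test function with a known Jacobian.
def F_example(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])

J_approx = fd_jacobian(F_example, [1.0, 0.5])
print(J_approx)
```

Each Jacobian costs $n$ additional evaluations of $F$, which is one reason the later slides look for ways to avoid it.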
7. Quasi-Newton methods (nonlinear systems)
    - Computation:
        - $F(x_k)$: $n$ scalar functions
        - $F'(x_k)$: $n^2$ scalar functions
        - LU: $2n^3/3$ flops
        - Cholesky: $n^3/3$ flops
        - Krylov methods: cost depends on the condition number
8. Quasi-Newton methods (nonlinear systems)
    - LU and Cholesky are useful when we want to reuse the factorization (quasi-implicit)
    - Difficult to parallelize and balance the workload
    - Cholesky is faster and more stable but needs an SPD matrix (!)
    - For large $n$ ($n \sim 10^6$), factorization is impractical
    - Krylov methods consist of easily parallelizable operations (updates, inner products, matrix-vector products)
    - CG is faster and more stable but needs an SPD matrix
9. Quasi-Newton methods (nonlinear systems)
    - Advantages:
        - Under the standard assumptions, Newton's method converges locally and quadratically
        - There exists a domain of attraction $S$ which contains the solution
        - Once the iterates enter $S$, they stay in $S$ and eventually converge to $x^*$
        - The algorithm is memoryless (self-corrective)
10. Quasi-Newton methods (nonlinear systems)
    - Disadvantages:
        - Convergence depends on the choice of $x_0$
        - $F'(x)$ has to be evaluated at each $x_k$
        - Computation can be expensive: $F(x_k)$, $F'(x_k)$, $s_k$
11. Quasi-Newton methods (nonlinear systems)
    - Implicit schemes for ODEs: $y' = f(t, y)$
        - Forward Euler: $y_{n+1} = y_n + h f(t_n, y_n)$ (explicit)
        - Backward Euler: $y_{n+1} = y_n + h f(t_{n+1}, y_{n+1})$ (implicit)
    - Implicit schemes require the solution of a nonlinear system (also CN, RK, LMF)
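To make the link between backward Euler and Newton's method concrete, here is a minimal scalar sketch. Each step solves $G(z) = z - y_n - h f(t_{n+1}, z) = 0$ with Newton's iteration. The function name, the tolerances, and the test equation $y' = -y^3$ are assumptions chosen for the example.

```python
def backward_euler(f, dfdy, t0, y0, h, n_steps, tol=1e-12, max_newton=20):
    """Backward Euler for a scalar ODE y' = f(t, y); dfdy is the partial df/dy."""
    t, y = t0, y0
    for _ in range(n_steps):
        t_next = t + h
        z = y                                   # initial guess: previous value
        for _ in range(max_newton):
            G = z - y - h * f(t_next, z)        # residual of the implicit equation
            if abs(G) < tol:
                break
            z -= G / (1.0 - h * dfdy(t_next, z))  # Newton step, G'(z) = 1 - h df/dy
        t, y = t_next, z
    return y

# Hypothetical test problem: y' = -y^3, y(0) = 1, exact solution 1/sqrt(1 + 2t).
y_end = backward_euler(lambda t, y: -y ** 3,
                       lambda t, y: -3.0 * y ** 2,
                       t0=0.0, y0=1.0, h=1e-3, n_steps=1000)
print(y_end, 3 ** -0.5)
```

For a system of ODEs the scalar division becomes a linear solve with the matrix $I - h\, \partial f / \partial y$, which is where the factorization and Krylov choices from the earlier slides come in.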
12. Quasi-Newton methods (nonlinear systems)
    - How to circumvent evaluating $F'(x_k)$?
    - Broyden's method:
        - $B_{k+1} = B_k + \dfrac{(y_k - B_k s_k)\, s_k^T}{\langle s_k, s_k \rangle}$
        - $x_{k+1} = x_k - B_k^{-1} F(x_k)$
    - Inverse update (Sherman-Morrison formula):
        - $H_{k+1} = H_k + \dfrac{(s_k - H_k y_k)\, s_k^T H_k}{\langle s_k, H_k y_k \rangle}$
        - $x_{k+1} = x_k - H_k F(x_k)$
    - Here $s_k = x_{k+1} - x_k$ and $y_k = F(x_{k+1}) - F(x_k)$
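The inverse update on slide 12 translates almost directly into code. The sketch below is illustrative only: the function name `broyden_inverse`, the stopping rule, and the choice of $H_0$ as the inverse of one Jacobian evaluated at $x_0$ are assumptions, and no safeguard is included for a vanishing denominator $\langle s_k, H_k y_k \rangle$.

```python
import numpy as np

def broyden_inverse(F, x0, H0, tol=1e-10, max_iter=100):
    """Broyden's method with the Sherman-Morrison inverse update."""
    x = np.asarray(x0, dtype=float)
    H = np.array(H0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = -H @ Fx                          # s_k = x_{k+1} - x_k = -H_k F(x_k)
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx                       # y_k = F(x_{k+1}) - F(x_k)
        Hy = H @ y
        # H_{k+1} = H_k + (s_k - H_k y_k) s_k^T H_k / <s_k, H_k y_k>
        H = H + np.outer(s - Hy, s @ H) / (s @ Hy)
        x, Fx = x_new, F_new
    return x

# Hypothetical test problem: unit circle intersected with the line x1 = x2.
def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])

x0 = np.array([1.0, 0.5])
J0 = np.array([[2.0 * x0[0], 2.0 * x0[1]], [1.0, -1.0]])  # one Jacobian at x0 (assumption)
x_star = broyden_inverse(F, x0, np.linalg.inv(J0))
print(x_star)
```

After the initial $H_0$, each iteration needs only one evaluation of $F$ and a few matrix-vector and outer products, with no linear solve.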
13. Quasi-Newton methods (nonlinear systems)
    - Advantages:
        - No need to compute $F'(x_k)$
        - For the inverse update, no linear system to solve
    - Disadvantages:
        - Convergence is only superlinear (slower than Newton's quadratic rate)
        - No longer memoryless
14. Quasi-Newton methods (unconstrained optimization)
    - Problem:
        - Find the global minimizer of a cost function
        - $f : \mathbb{R}^n \to \mathbb{R}$, $x^* = \arg\min f$
    - If $f$ is differentiable, the problem can be attacked by looking for zeros of the gradient
15. Quasi-Newton methods (unconstrained optimization)
    - Descent methods: $x_{k+1} = x_k - \lambda_k P_k \nabla f(x_k)$
        - $P_k = I_n$: steepest descent
        - $P_k = \nabla^2 f(x_k)^{-1}$: Newton's method
        - $P_k = B_k^{-1}$: quasi-Newton
    - The angle between $P_k \nabla f(x_k)$ and $\nabla f(x_k)$ must be less than 90°
    - $B_k$ has to mimic the behavior of the Hessian
16. Quasi-Newton methods (unconstrained optimization)
    - Global convergence:
        - Line search
            - Step length: backtracking, interpolation
            - Sufficient decrease: Wolfe conditions
        - Trust regions
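A backtracking line search can be sketched as follows. This version checks only the sufficient-decrease (Armijo) condition, which is the first of the Wolfe conditions, and pairs it with steepest descent ($P_k = I_n$). The constants `rho` and `c1`, the function names, and the quadratic test function are assumptions for illustration.

```python
import numpy as np

def backtracking(f, x, g, p, alpha0=1.0, rho=0.5, c1=1e-4):
    """Shrink the step until f decreases sufficiently along direction p."""
    alpha = alpha0
    fx = f(x)
    slope = g @ p                            # negative for a descent direction
    while f(x + alpha * p) > fx + c1 * alpha * slope and alpha > 1e-12:
        alpha *= rho
    return alpha

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = -g                               # P_k = I_n: steepest descent
        x = x + backtracking(f, x, g, p) * p
    return x

# Hypothetical cost function with minimizer (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)])

x_min = steepest_descent(f, grad, [0.0, 0.0])
print(x_min)
```

Replacing `p = -g` with a Newton or quasi-Newton direction leaves the line search unchanged, which is why the same globalization strategy serves all three choices of $P_k$.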
17. Quasi-Newton methods (unconstrained optimization)
    - For quasi-Newton, $B_k$ has to resemble $\nabla^2 f(x_k)$
    - Single-rank:
    - Symmetry:
    - Positive definiteness:
    - Inverse update:
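The formulas on slide 17 did not survive in this transcript. As an assumption, and not as a recovery of the original slide, the four labels are consistent with the standard textbook updates below (Broyden rank-one, PSB, BFGS, and the inverse BFGS form), written with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. Which update the author actually showed under each label is not known.

$$
\begin{aligned}
\text{Rank-one (Broyden):}\quad & B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k} \\
\text{Symmetric (PSB):}\quad & B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T + s_k (y_k - B_k s_k)^T}{s_k^T s_k} - \frac{(y_k - B_k s_k)^T s_k}{(s_k^T s_k)^2}\, s_k s_k^T \\
\text{Positive definite (BFGS):}\quad & B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}, \qquad y_k^T s_k > 0 \\
\text{Inverse update (BFGS):}\quad & H_{k+1} = (I - \rho_k s_k y_k^T)\, H_k\, (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T, \qquad \rho_k = \frac{1}{y_k^T s_k}
\end{aligned}
$$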
18. Quasi-Newton methods (unconstrained optimization)
    - Computation:
        - Matrix updates, inner products
        - DFP, PSB: 3 matrix-vector products
        - BFGS: 2 matrix-matrix products
    - Storage:
        - Limited-memory versions (L-BFGS)
        - Store $\{s_k, y_k\}$ for the last $m$ iterations and recompute $H$
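The limited-memory idea on slide 18 is usually implemented with the two-loop recursion, which applies $H$ to a vector without ever forming the matrix. The sketch below assumes the stored pairs are ordered oldest to newest and uses the common scaling $H_0 = \gamma I$; the function name and the sample data are hypothetical.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Return H @ grad using only the stored pairs {s_k, y_k} (oldest first)."""
    q = np.array(grad, dtype=float)
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # Initial inverse Hessian H_0 = gamma * I (a common choice, assumed here).
    if s_list:
        gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest pair to newest pair.
    for s, y, rho, a in zip(s_list, y_list, rhos, reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return r

# Hypothetical data: pairs generated by a quadratic with Hessian A, so y = A s.
A = np.diag([1.0, 2.0, 3.0])
S = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, -1.0])]
Y = [A @ s for s in S]
g = np.array([1.0, -2.0, 0.5])
direction = -lbfgs_direction(g, S, Y)      # quasi-Newton search direction
print(direction)
```

Storage is $2mn$ numbers instead of $n^2$, and each application costs on the order of $4mn$ operations.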
19. Further improvements
    - Preconditioning the linear system:
        - For faster convergence one may solve $K B_k p_k = K F(x_k)$
        - If $B_k$ is SPD (and sparse), sparse approximate inverses can be used to generate the preconditioner
        - The preconditioner can be refined on a subspace of $B_k$ using an algebraic multigrid technique
        - We need to solve the eigenvalue problem
20. Further improvements
    - Model reduction:
        - Sometimes the dimension of the system is very large
        - A smaller model captures the essence of the original
        - An approximation of the model variability can be retrieved from an ensemble of forward simulations
        - The covariance matrix gives the subspace
        - We need to solve the eigenvalue problem
21. QR/QL algorithms for symmetric matrices
    - Solves the eigenvalue problem
    - Iterative algorithm
    - Uses a QR/QL factorization at each step ($A = QR$, $Q$ unitary, $R$ upper triangular)
    - Iteration, for $k = 1, 2, \dots$:
        - $A_k = Q_k R_k$
        - $A_{k+1} = R_k Q_k$
    - The diagonal of $A_k$ converges to the eigenvalues of $A$
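The unshifted iteration on slide 21 is short enough to write out directly. This sketch uses a fixed iteration count instead of a convergence test, and the symmetric test matrix is an arbitrary choice; both are assumptions for illustration.

```python
import numpy as np

def qr_algorithm(A, n_iter=500):
    """Unshifted QR iteration; returns the diagonal of the final iterate."""
    Ak = np.array(A, dtype=float)
    for _ in range(n_iter):
        Q, R = np.linalg.qr(Ak)   # A_k = Q_k R_k
        Ak = R @ Q                # A_{k+1} = R_k Q_k, similar to A_k
    return np.diag(Ak)

# Hypothetical symmetric test matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
eigs = qr_algorithm(A)
print(np.sort(eigs))
```

Since $A_{k+1} = Q_k^T A_k Q_k$, every iterate has the same eigenvalues as $A$; only the off-diagonal mass shrinks.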
22. QR/QL algorithms for symmetric matrices
    - The matrix $A$ is reduced to upper Hessenberg form before starting the iterations
    - Householder reflections ($U = I - v v^T$, with $v^T v = 2$)
    - The reduction is done column by column
    - If $A$ is symmetric, it is reduced to tridiagonal form
23. QR/QL algorithms for symmetric matrices
    - Convergence to triangular form can be slow
    - Origin shifts are used to accelerate it; for $k = 1, 2, \dots$:
        - $A_k - z_k I = Q_k R_k$
        - $A_{k+1} = R_k Q_k + z_k I$
    - Wilkinson shift
    - QR makes heavy use of matrix-matrix products
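A shifted variant with the Wilkinson shift might look like the sketch below. It assumes the input is already symmetric tridiagonal (as produced by the Householder reduction), deflates one eigenvalue at a time from the bottom corner, and uses a dense QR factorization for clarity rather than the rotation-based implicit scheme used in practice. The function names and the tolerance are assumptions.

```python
import numpy as np

def wilkinson_shift(a, b, c):
    """Eigenvalue of [[a, b], [b, c]] that is closest to c."""
    d = (a - c) / 2.0
    if d == 0.0 and b == 0.0:
        return c
    sign = 1.0 if d >= 0.0 else -1.0
    return c - sign * b * b / (abs(d) + np.hypot(d, b))

def shifted_qr_eigvals(T, tol=1e-12, max_iter=500):
    """Eigenvalues of a symmetric tridiagonal matrix T by shifted QR with deflation."""
    Ak = np.array(T, dtype=float)
    eigs = []
    for _ in range(max_iter):
        if Ak.shape[0] == 1:
            eigs.append(Ak[0, 0])
            return np.sort(eigs)
        # Deflate when the last subdiagonal entry is negligible.
        if abs(Ak[-1, -2]) <= tol * (abs(Ak[-1, -1]) + abs(Ak[-2, -2])):
            eigs.append(Ak[-1, -1])
            Ak = Ak[:-1, :-1]
            continue
        z = wilkinson_shift(Ak[-2, -2], Ak[-1, -2], Ak[-1, -1])
        I = np.eye(Ak.shape[0])
        Q, R = np.linalg.qr(Ak - z * I)   # A_k - z_k I = Q_k R_k
        Ak = R @ Q + z * I                # A_{k+1} = R_k Q_k + z_k I
    raise RuntimeError("shifted QR iteration did not converge")

# Hypothetical symmetric tridiagonal test matrix.
T = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(shifted_qr_eigvals(T))
```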
24. Alternatives to quasi-Newton
    - Inexact Newton methods
        - Inner iteration: determine a search direction by solving the linear system to a given tolerance
        - Only Hessian-vector products are necessary
        - Outer iteration: line search along the search direction
    - Nonlinear CG
        - Residual replaced by the gradient of the cost function
        - Line search
        - Different flavors
25. Alternatives to quasi-Newton
    - Direct search
        - Does not involve derivatives of the cost function
        - Uses a structure called a simplex to search for a decrease in $f$
        - Stops when no further progress can be achieved
        - Can get stuck in a local minimum
26. More alternatives
    - Monte Carlo
        - Computational method relying on random sampling
        - Can be used for optimization (MDO) and inverse problems by using random walks
        - When there are multiple correlated variables, the correlation matrix is SPD, so Cholesky can be used to factorize it
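The Cholesky remark on slide 26 refers to a common way of drawing correlated samples: factor the correlation matrix as $C = L L^T$ and multiply independent standard normal draws by $L$. The matrix, the seed, and the sample size below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SPD correlation matrix for three variables.
C = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

L = np.linalg.cholesky(C)                 # C = L @ L.T, L lower triangular
z = rng.standard_normal((3, 100_000))     # independent N(0, 1) samples
x = L @ z                                 # columns of x have covariance C

sample_corr = np.corrcoef(x)
print(np.round(sample_corr, 2))
```

The sample correlation approaches $C$ as the number of draws grows, at the usual Monte Carlo rate of $1/\sqrt{N}$.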
27. Conclusions
    - Newton's method is a powerful method with many applications (solving nonlinear systems, finding minima of cost functions), and it can be combined with many other numerical algorithms (factorizations, linear solvers)
    - Optimizing and parallelizing matrix-vector products, matrix-matrix products, decompositions and other numerical kernels can have a significant impact on overall performance
28. Thank you for your time!