Numerical Methods and Optimization
A Consumer Guide

Éric Walter
Éric Walter
Laboratoire des Signaux et Systèmes
CNRS-SUPÉLEC-Université Paris-Sud
Gif-sur-Yvette
France
ISBN 978-3-319-07670-6 ISBN 978-3-319-07671-3 (eBook)
DOI 10.1007/978-3-319-07671-3
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2014940746
© Springer International Publishing Switzerland 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief
excerpts in connection with reviews or scholarly analysis or material supplied specifically for the
purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the
work. Duplication of this publication or parts thereof is permitted only under the provisions of
the Copyright Law of the Publisher’s location, in its current version, and permission for use must
always be obtained from Springer. Permissions for use may be obtained through RightsLink at the
Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
To my grandchildren
Contents

1 From Calculus to Computation
  1.1 Why Not Use Naive Mathematical Methods?
    1.1.1 Too Many Operations
    1.1.2 Too Sensitive to Numerical Errors
    1.1.3 Unavailable
  1.2 What to Do, Then?
  1.3 How Is This Book Organized?
  References

2 Notation and Norms
  2.1 Introduction
  2.2 Scalars, Vectors, and Matrices
  2.3 Derivatives
  2.4 Little o and Big O
  2.5 Norms
    2.5.1 Vector Norms
    2.5.2 Matrix Norms
    2.5.3 Convergence Speeds
  Reference

3 Solving Systems of Linear Equations
  3.1 Introduction
  3.2 Examples
  3.3 Condition Number(s)
  3.4 Approaches Best Avoided
  3.5 Questions About A
  3.6 Direct Methods
    3.6.1 Backward or Forward Substitution
    3.6.2 Gaussian Elimination
    3.6.3 LU Factorization
    3.6.4 Iterative Improvement
    3.6.5 QR Factorization
    3.6.6 Singular Value Decomposition
  3.7 Iterative Methods
    3.7.1 Classical Iterative Methods
    3.7.2 Krylov Subspace Iteration
  3.8 Taking Advantage of the Structure of A
    3.8.1 A Is Symmetric Positive Definite
    3.8.2 A Is Toeplitz
    3.8.3 A Is Vandermonde
    3.8.4 A Is Sparse
  3.9 Complexity Issues
    3.9.1 Counting Flops
    3.9.2 Getting the Job Done Quickly
  3.10 MATLAB Examples
    3.10.1 A Is Dense
    3.10.2 A Is Dense and Symmetric Positive Definite
    3.10.3 A Is Sparse
    3.10.4 A Is Sparse and Symmetric Positive Definite
  3.11 In Summary
  References

4 Solving Other Problems in Linear Algebra
  4.1 Inverting Matrices
  4.2 Computing Determinants
  4.3 Computing Eigenvalues and Eigenvectors
    4.3.1 Approach Best Avoided
    4.3.2 Examples of Applications
    4.3.3 Power Iteration
    4.3.4 Inverse Power Iteration
    4.3.5 Shifted Inverse Power Iteration
    4.3.6 QR Iteration
    4.3.7 Shifted QR Iteration
  4.4 MATLAB Examples
    4.4.1 Inverting a Matrix
    4.4.2 Evaluating a Determinant
    4.4.3 Computing Eigenvalues
    4.4.4 Computing Eigenvalues and Eigenvectors
  4.5 In Summary
  References

5 Interpolating and Extrapolating
  5.1 Introduction
  5.2 Examples
  5.3 Univariate Case
    5.3.1 Polynomial Interpolation
    5.3.2 Interpolation by Cubic Splines
    5.3.3 Rational Interpolation
    5.3.4 Richardson's Extrapolation
  5.4 Multivariate Case
    5.4.1 Polynomial Interpolation
    5.4.2 Spline Interpolation
    5.4.3 Kriging
  5.5 MATLAB Examples
  5.6 In Summary
  References

6 Integrating and Differentiating Functions
  6.1 Examples
  6.2 Integrating Univariate Functions
    6.2.1 Newton–Cotes Methods
    6.2.2 Romberg's Method
    6.2.3 Gaussian Quadrature
    6.2.4 Integration via the Solution of an ODE
  6.3 Integrating Multivariate Functions
    6.3.1 Nested One-Dimensional Integrations
    6.3.2 Monte Carlo Integration
  6.4 Differentiating Univariate Functions
    6.4.1 First-Order Derivatives
    6.4.2 Second-Order Derivatives
    6.4.3 Richardson's Extrapolation
  6.5 Differentiating Multivariate Functions
  6.6 Automatic Differentiation
    6.6.1 Drawbacks of Finite-Difference Evaluation
    6.6.2 Basic Idea of Automatic Differentiation
    6.6.3 Backward Evaluation
    6.6.4 Forward Evaluation
    6.6.5 Extension to the Computation of Hessians
  6.7 MATLAB Examples
    6.7.1 Integration
    6.7.2 Differentiation
  6.8 In Summary
  References

7 Solving Systems of Nonlinear Equations
  7.1 What Are the Differences with the Linear Case?
  7.2 Examples
  7.3 One Equation in One Unknown
    7.3.1 Bisection Method
    7.3.2 Fixed-Point Iteration
    7.3.3 Secant Method
    7.3.4 Newton's Method
  7.4 Multivariate Systems
    7.4.1 Fixed-Point Iteration
    7.4.2 Newton's Method
    7.4.3 Quasi-Newton Methods
  7.5 Where to Start From?
  7.6 When to Stop?
  7.7 MATLAB Examples
    7.7.1 One Equation in One Unknown
    7.7.2 Multivariate Systems
  7.8 In Summary
  References

8 Introduction to Optimization
  8.1 A Word of Caution
  8.2 Examples
  8.3 Taxonomy
  8.4 How About a Free Lunch?
    8.4.1 There Is No Such Thing
    8.4.2 You May Still Get a Pretty Inexpensive Meal
  8.5 In Summary
  References

9 Optimizing Without Constraint
  9.1 Theoretical Optimality Conditions
  9.2 Linear Least Squares
    9.2.1 Quadratic Cost in the Error
    9.2.2 Quadratic Cost in the Decision Variables
    9.2.3 Linear Least Squares via QR Factorization
    9.2.4 Linear Least Squares via Singular Value Decomposition
    9.2.5 What to Do if FᵀF Is Not Invertible?
    9.2.6 Regularizing Ill-Conditioned Problems
  9.3 Iterative Methods
    9.3.1 Separable Least Squares
    9.3.2 Line Search
    9.3.3 Combining Line Searches
    9.3.4 Methods Based on a Taylor Expansion of the Cost
    9.3.5 A Method That Can Deal with Nondifferentiable Costs
  9.4 Additional Topics
    9.4.1 Robust Optimization
    9.4.2 Global Optimization
    9.4.3 Optimization on a Budget
    9.4.4 Multi-Objective Optimization
  9.5 MATLAB Examples
    9.5.1 Least Squares on a Multivariate Polynomial Model
    9.5.2 Nonlinear Estimation
  9.6 In Summary
  References

10 Optimizing Under Constraints
  10.1 Introduction
    10.1.1 Topographical Analogy
    10.1.2 Motivations
    10.1.3 Desirable Properties of the Feasible Set
    10.1.4 Getting Rid of Constraints
  10.2 Theoretical Optimality Conditions
    10.2.1 Equality Constraints
    10.2.2 Inequality Constraints
    10.2.3 General Case: The KKT Conditions
  10.3 Solving the KKT Equations with Newton's Method
  10.4 Using Penalty or Barrier Functions
    10.4.1 Penalty Functions
    10.4.2 Barrier Functions
    10.4.3 Augmented Lagrangians
  10.5 Sequential Quadratic Programming
  10.6 Linear Programming
    10.6.1 Standard Form
    10.6.2 Principle of Dantzig's Simplex Method
    10.6.3 The Interior-Point Revolution
  10.7 Convex Optimization
    10.7.1 Convex Feasible Sets
    10.7.2 Convex Cost Functions
    10.7.3 Theoretical Optimality Conditions
    10.7.4 Lagrangian Formulation
    10.7.5 Interior-Point Methods
    10.7.6 Back to Linear Programming
  10.8 Constrained Optimization on a Budget
  10.9 MATLAB Examples
    10.9.1 Linear Programming
    10.9.2 Nonlinear Programming
  10.10 In Summary
  References

11 Combinatorial Optimization
  11.1 Introduction
  11.2 Simulated Annealing
  11.3 MATLAB Example
  References

12 Solving Ordinary Differential Equations
  12.1 Introduction
  12.2 Initial-Value Problems
    12.2.1 Linear Time-Invariant Case
    12.2.2 General Case
    12.2.3 Scaling
    12.2.4 Choosing Step-Size
    12.2.5 Stiff ODEs
    12.2.6 Differential Algebraic Equations
  12.3 Boundary-Value Problems
    12.3.1 A Tiny Battlefield Example
    12.3.2 Shooting Methods
    12.3.3 Finite-Difference Method
    12.3.4 Projection Methods
  12.4 MATLAB Examples
    12.4.1 Absolute Stability Regions for Dahlquist's Test
    12.4.2 Influence of Stiffness
    12.4.3 Simulation for Parameter Estimation
    12.4.4 Boundary Value Problem
  12.5 In Summary
  References

13 Solving Partial Differential Equations
  13.1 Introduction
  13.2 Classification
    13.2.1 Linear and Nonlinear PDEs
    13.2.2 Order of a PDE
    13.2.3 Types of Boundary Conditions
    13.2.4 Classification of Second-Order Linear PDEs
  13.3 Finite-Difference Method
    13.3.1 Discretization of the PDE
    13.3.2 Explicit and Implicit Methods
    13.3.3 Illustration: The Crank–Nicolson Scheme
    13.3.4 Main Drawback of the Finite-Difference Method
  13.4 A Few Words About the Finite-Element Method
    13.4.1 FEM Building Blocks
    13.4.2 Finite-Element Approximation of the Solution
    13.4.3 Taking the PDE into Account
  13.5 MATLAB Example
  13.6 In Summary
  References

14 Assessing Numerical Errors
  14.1 Introduction
  14.2 Types of Numerical Algorithms
    14.2.1 Verifiable Algorithms
    14.2.2 Exact Finite Algorithms
    14.2.3 Exact Iterative Algorithms
    14.2.4 Approximate Algorithms
  14.3 Rounding
    14.3.1 Real and Floating-Point Numbers
    14.3.2 IEEE Standard 754
    14.3.3 Rounding Errors
    14.3.4 Rounding Modes
    14.3.5 Rounding-Error Bounds
  14.4 Cumulative Effect of Rounding Errors
    14.4.1 Normalized Binary Representations
    14.4.2 Addition (and Subtraction)
    14.4.3 Multiplication (and Division)
    14.4.4 In Summary
    14.4.5 Loss of Precision Due to n Arithmetic Operations
    14.4.6 Special Case of the Scalar Product
  14.5 Classes of Methods for Assessing Numerical Errors
    14.5.1 Prior Mathematical Analysis
    14.5.2 Computer Analysis
  14.6 CESTAC/CADNA
    14.6.1 Method
    14.6.2 Validity Conditions
  14.7 MATLAB Examples
    14.7.1 Switching the Direction of Rounding
    14.7.2 Computing with Intervals
    14.7.3 Using CESTAC/CADNA
  14.8 In Summary
  References

15 WEB Resources to Go Further
  15.1 Search Engines
  15.2 Encyclopedias
  15.3 Repositories
  15.4 Software
    15.4.1 High-Level Interpreted Languages
    15.4.2 Libraries for Compiled Languages
    15.4.3 Other Resources for Scientific Computing
  15.5 OpenCourseWare
  References

16 Problems
  16.1 Ranking Web Pages
  16.2 Designing a Cooking Recipe
  16.3 Landing on the Moon
  16.4 Characterizing Toxic Emissions by Paints
  16.5 Maximizing the Income of a Scraggy Smuggler
  16.6 Modeling the Growth of Trees
    16.6.1 Bypassing ODE Integration
    16.6.2 Using ODE Integration
  16.7 Detecting Defects in Hardwood Logs
  16.8 Modeling Black-Box Nonlinear Systems
    16.8.1 Modeling a Static System by Combining Basis Functions
    16.8.2 LOLIMOT for Static Systems
    16.8.3 LOLIMOT for Dynamical Systems
  16.9 Designing a Predictive Controller with l2 and l1 Norms
    16.9.1 Estimating the Model Parameters
    16.9.2 Computing the Input Sequence
    16.9.3 From an l2 Norm to an l1 Norm
  16.10 Discovering and Using Recursive Least Squares
    16.10.1 Batch Linear Least Squares
    16.10.2 Recursive Linear Least Squares
    16.10.3 Process Control
  16.11 Building a Lotka–Volterra Model
  16.12 Modeling Signals by Prony's Method
  16.13 Maximizing Performance
    16.13.1 Modeling Performance
    16.13.2 Tuning the Design Factors
  16.14 Modeling AIDS Infection
    16.14.1 Model Analysis and Simulation
    16.14.2 Parameter Estimation
  16.15 Looking for Causes
  16.16 Maximizing Chemical Production
  16.17 Discovering the Response-Surface Methodology
  16.18 Estimating Microparameters via Macroparameters
  16.19 Solving Cauchy Problems for Linear ODEs
    16.19.1 Using Generic Methods
    16.19.2 Computing Matrix Exponentials
  16.20 Estimating Parameters Under Constraints
  16.21 Estimating Parameters with lp Norms
  16.22 Dealing with an Ambiguous Compartmental Model
  16.23 Inertial Navigation
  16.24 Modeling a District Heating Network
    16.24.1 Schematic of the Network
    16.24.2 Economic Model
    16.24.3 Pump Model
    16.24.4 Computing Flows and Pressures
    16.24.5 Energy Propagation in the Pipes
    16.24.6 Modeling the Heat Exchangers
    16.24.7 Managing the Network
  16.25 Optimizing Drug Administration
  16.26 Shooting at a Tank
  16.27 Sparse Estimation Based on POCS
  References

Index
Chapter 1
From Calculus to Computation
High-school education has led us to view problem solving in physics and chemistry
as the process of elaborating explicit closed-form solutions in terms of unknown
parameters, and then using these solutions in numerical applications for specific
numerical values of these parameters. As a result, we were only able to consider a
very limited set of problems that were simple enough for us to find such closed-form
solutions.
Unfortunately, most real-life problems in pure and applied sciences are not
amenable to such an explicit mathematical solution. One must then often move from
formal calculus to numerical computation. This is particularly obvious in engineer-
ing, where computer-aided design based on numerical simulations is the rule.
This book is about numerical computation, and says next to nothing about formal
computation as made possible by computer algebra, although they usefully comple-
ment one another. Using floating-point approximations of real numbers means that
approximate operations are carried out on approximate numbers. To protect oneself
against potential numerical disasters, one should then select methods that keep final
errors as small as possible. It turns out that many of the methods learnt in high school
or college to solve elementary mathematical problems are ill suited to floating-point
computation and should be replaced.
Shifting paradigm from calculus to computation, we will attempt to
• discover how to escape the dictatorship of those particular cases that are simple enough to receive a closed-form solution, and thus gain the ability to solve complex, real-life problems,
• understand the principles behind recognized methods used in state-of-the-art
numerical software,
• stress the advantages and limitations of these methods, thus gaining the ability to
choose what pre-existing bricks to assemble for solving a given problem.
Presentation is at an introductory level, nowhere near the level of detail required
for implementing methods efficiently. Our main aim is to help the reader become
a better consumer of numerical methods, with some ability to choose among those
available for a given task, some understanding of what they can and cannot do, and
some power to perform a critical appraisal of the validity of their results.
By the way, the desire to write down every line of the code one plans to use should
be resisted. So much time and effort have been spent polishing code that implements
standard numerical methods that the probability one might do better seems remote
at best. Coding should be limited to what cannot be avoided or can be expected to
improve on the state of the art in easily available software (a tall order). One will
thus save time to think about the big picture:
• what is the actual problem that I want to solve? (As Richard Hamming puts it [1]:
Computing is, or at least should be, intimately bound up with both the source of
the problem and the use that is going to be made of the answers—it is not a step
to be taken in isolation.)
• how can I put this problem in mathematical form without betraying its meaning?
• how should I split the resulting mathematical problem into well-defined and numerically achievable subtasks?
• what are the advantages and limitations of the numerical methods readily available
for these subtasks?
• should I choose among these methods or find an alternative route?
• what is the most efficient use of my resources (time, computers, libraries of rou-
tines, etc.)?
• how can I check the quality of my results?
• what measures should I take, if it turns out that my choices have failed to yield a
satisfactory solution to the initial problem?
A deservedly popular series of books on numerical algorithms [2] includes Numer-
ical Recipes in their titles. Carrying on with this culinary metaphor, one should get
a much more sophisticated dinner by choosing and assembling proper dishes from
the menu of easily available scientific routines than by making up the equivalent
of a turkey sandwich with mayo in one’s numerical kitchen. To take another anal-
ogy, electrical engineers tend to avoid building systems from elementary transis-
tors, capacitors, resistors and inductors when they can take advantage of carefully
designed, readily available integrated circuits.
Deciding not to code algorithms for which professional-grade routines are avail-
able does not mean we have to treat them as magical black boxes, so the basic
principles behind the main methods for solving a given class of problems will be
explained.
The level of mathematical proficiency required to read what follows is a basic
understanding of linear algebra as taught in introductory college courses. It is hoped
that those who hate mathematics will find here reasons to reconsider their position
in view of how useful it turns out to be for the solution of real-life problems, and that
those who love it will forgive me for daring simplifications and discover fascinating,
practical aspects of mathematics in action.
The main ingredients will be classical Cuisine Bourgeoise, with a few words about
recipes best avoided, and a dash of Nouvelle Cuisine.
1.1 Why Not Use Naive Mathematical Methods?
There are at least three reasons why naive methods learnt in high school or college
may not be suitable.
1.1.1 Too Many Operations
Consider a (not-so-common) problem for which an algorithm is available that would
give an exact solution in a finite number of steps if all of the operations required
were carried out exactly. A first reason why such an exact finite algorithm may not
be suitable is when it requires an unnecessarily large number of operations.
Example 1.1 Evaluating determinants
Evaluating the determinant of a dense $(n \times n)$ matrix $\mathbf{A}$ by cofactor expansion requires more than $n!$ floating-point operations (or flops), whereas methods based on a factorization of $\mathbf{A}$ do so in about $n^3$ flops. For $n = 100$, for instance, $n!$ is slightly less than $10^{158}$, whereas the number of atoms in the observable universe is estimated to be less than $10^{81}$, and $n^3 = 10^6$.
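These orders of magnitude are easy to check; here is a minimal MATLAB sketch (computing log10(n!) as a sum of logarithms, since n! itself overflows in floating point):

    % Flop counts of Example 1.1 for n = 100:
    % cofactor expansion needs more than n! flops,
    % factorization-based methods need about n^3 flops.
    n = 100;
    log10_factorial = sum(log10(1:n));  % log10(n!), close to 158
    fprintf('n! is about 10^%.2f, while n^3 = %d\n', log10_factorial, n^3)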
1.1.2 Too Sensitive to Numerical Errors
Because they were developed without taking the effect of rounding into account,
classical methods for solving numerical problems may yield totally erroneous results
in a context of floating-point computation.
Example 1.2 Evaluating the roots of a second-order polynomial equation
The solutions $x_1$ and $x_2$ of the equation
$$ax^2 + bx + c = 0 \qquad (1.1)$$
are to be evaluated, with $a$, $b$, and $c$ known floating-point numbers such that $x_1$ and $x_2$ are real numbers. We have learnt in high school that
$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}. \qquad (1.2)$$
This is an example of a verifiable algorithm, as it suffices to check that the value of
the polynomial at x1 or x2 is zero to ensure that x1 or x2 is a solution.
This algorithm is suitable as long as it does not involve computing the difference between two floating-point numbers that are close to one another, as would happen if $|4ac|$ were too small compared to $b^2$. Such a difference may be numerically disastrous, and should be avoided. To this end, one may use the following algorithm, which is also verifiable and takes benefit from the fact that $x_1 x_2 = c/a$:
$$q = \frac{-b - \operatorname{sign}(b)\sqrt{b^2 - 4ac}}{2}, \qquad (1.3)$$
$$x_1 = \frac{q}{a}, \quad x_2 = \frac{c}{q}. \qquad (1.4)$$
Although these two algorithms are mathematically equivalent, the second one is much more robust to errors induced by floating-point operations than the first (see Sect. 14.7 for a numerical comparison). This does not, however, solve the problem that appears when $x_1$ and $x_2$ tend toward one another, as $b^2 - 4ac$ then tends to zero.
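A minimal MATLAB sketch (the coefficient values are arbitrarily chosen so that $|4ac|$ is tiny compared to $b^2$) makes the difference visible:

    a = 1; b = 1e8; c = 1;   % roots near -1e-8 and -1e8
    % Naive algorithm (1.2): cancellation in the root of smallest magnitude
    r_naive = (-b + sqrt(b^2 - 4*a*c))/(2*a);
    % Robust algorithm (1.3)-(1.4): the same root, without cancellation
    q = (-b - sign(b)*sqrt(b^2 - 4*a*c))/2;
    r_robust = c/q;
    % Verify by evaluating the polynomial at both results
    p = @(x) a*x.^2 + b*x + c;
    [p(r_naive), p(r_robust)]   % the naive result is far less accurate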
We will encounter many similar situations, where naive algorithms need to be
replaced by more robust or less costly variants.
1.1.3 Unavailable
Quite frequently, there is no mathematical method for finding the exact solution of
the problem of interest. This will be the case, for instance, for most simulation or
optimization problems, as well as for most systems of nonlinear equations.
1.2 What to Do, Then?
Mathematics should not be abandoned along the way, as it plays a central role in
deriving efficient numerical algorithms. Finding amazingly accurate approximate
solutions often becomes possible when the specificity of computing with floating-
point numbers is taken into account.
1.3 How Is This Book Organized?
Simple problems are addressed first, before moving on to more ambitious ones,
building on what has already been presented. The order of presentation is as follows:
• notation and basic notions,
• algorithms for linear algebra (solving systems of linear equations, inverting matri-
ces, computing eigenvalues, eigenvectors, and determinants),
• interpolating and extrapolating,
• integrating and differentiating,
• solving systems of nonlinear equations,
• optimizing when there is no constraint,
• optimizing under constraints,
• solving ordinary differential equations,
• solving partial differential equations,
• assessing the precision of numerical results.
This classification is not tight. It may be a good idea to transform a given problem
into another one. Here are a few examples:
• to find the roots of a polynomial equation, one may look for the eigenvalues of a
matrix, as in Example 4.3,
• to evaluate a definite integral, one may solve an ordinary differential equation, as
in Sect.6.2.4,
• to solve a system of equations, one may minimize a norm of the deviation between
the left- and right-hand sides, as in Example 9.8,
• to solve an unconstrained optimization problem, one may introduce new variables
and impose constraints, as in Example 10.7.
Most of the numerical methods selected for presentation are important ingredients
in professional-grade numerical code. Exceptions are
• methods based on ideas that easily come to mind but are actually so bad that they need to be denounced, as in Example 1.1,
• prototype methods that may help one understand more sophisticated approaches,
as when one-dimensional problems are considered before the multivariate case,
• promising methods mostly available at present from academic research institu-
tions, such as methods for guaranteed optimization and simulation.
MATLAB is used to demonstrate, through simple yet not necessarily trivial exam-
ples typeset in typewriter, how easily classical methods can be put to work. It
would be hazardous, however, to draw conclusions on the merits of these methods on
the sole basis of these particular examples. The reader is invited to consult the MAT-
LAB documentation for more details about the functions available and their optional
arguments. Additional information, including illuminating examples, can be found
in [3], with ancillary material available on the WEB, and [4]. Although MATLAB is
the only programming language used in this book, it is not appropriate for solving all
numerical problems in all contexts. A number of potentially interesting alternatives
will be mentioned in Chap.15.
This book concludes with a chapter about WEB resources that can be used to
go further and a collection of problems. Most of these problems build on material
pertaining to several chapters and could easily be translated into computer-lab work.
This book was typeset with TeXmacs before exportation to LaTeX. Many thanks to Joris van der Hoeven and his coworkers for this awesome and truly WYSIWYG piece of software, freely downloadable at http://www.texmacs.org/.
References
1. Hamming, R.: Numerical Methods for Scientists and Engineers. Dover, New York (1986)
2. Press, W., Flannery, B., Teukolsky, S., Vetterling, W.: Numerical Recipes. Cambridge University
Press, Cambridge (1986)
3. Moler C.: Numerical Computing with MATLAB, revised, reprinted edn. SIAM, Philadelphia
(2008)
4. Ascher, U., Greif, C.: A First Course in Numerical Methods. SIAM, Philadelphia (2011)
Chapter 2
Notation and Norms
2.1 Introduction
This chapter recalls the usual convention for distinguishing scalars, vectors, and
matrices. Vetter’s notation for matrix derivatives is then explained, as well as the
meaning of the expressions little o and big O employed for comparing the local
or asymptotic behaviors of functions. The most important vector and matrix norms
are finally described. Norms find a first application in the definition of types of
convergence speeds for iterative algorithms.
2.2 Scalars, Vectors, and Matrices
Unless stated otherwise, scalar variables are real valued, as are the entries of vectors
and matrices.
Italics are for scalar variables ($v$ or $V$), bold lower-case letters for column vectors ($\mathbf{v}$), and bold upper-case letters for matrices ($\mathbf{M}$). Transposition, the transformation of columns into rows in a vector or matrix, is denoted by the superscript $\mathrm{T}$. It applies to what is to its left, so $\mathbf{v}^{\mathrm{T}}$ is a row vector and, in $\mathbf{A}^{\mathrm{T}}\mathbf{B}$, $\mathbf{A}$ is transposed, not $\mathbf{B}$.
The identity matrix is $\mathbf{I}$, with $\mathbf{I}_n$ the $(n \times n)$ identity matrix. The $i$th column vector of $\mathbf{I}$ is the canonical vector $\mathbf{e}_i$.
The entry at the intersection of the $i$th row and $j$th column of $\mathbf{M}$ is $m_{i,j}$. The product of matrices
$$\mathbf{C} = \mathbf{A}\mathbf{B} \qquad (2.1)$$
thus implies that
$$c_{i,j} = \sum_k a_{i,k} b_{k,j}, \qquad (2.2)$$
and the number of columns in $\mathbf{A}$ must be equal to the number of rows in $\mathbf{B}$. Recall that the product of matrices (or vectors) is not commutative, in general. Thus, for instance, when $\mathbf{v}$ and $\mathbf{w}$ are column vectors with the same dimension, $\mathbf{v}^{\mathrm{T}}\mathbf{w}$ is a scalar whereas $\mathbf{w}\mathbf{v}^{\mathrm{T}}$ is a (rank-one) square matrix.
Useful relations are
$$(\mathbf{A}\mathbf{B})^{\mathrm{T}} = \mathbf{B}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}, \qquad (2.3)$$
and, provided that $\mathbf{A}$ and $\mathbf{B}$ are invertible,
$$(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}. \qquad (2.4)$$
If $\mathbf{M}$ is square and symmetric, then all of its eigenvalues are real. $\mathbf{M} \succ 0$ then means that each of these eigenvalues is strictly positive ($\mathbf{M}$ is positive definite), while $\mathbf{M} \succeq 0$ allows some of them to be zero ($\mathbf{M}$ is non-negative definite).
2.3 Derivatives
Provided that $f(\cdot)$ is a sufficiently differentiable function from $\mathbb{R}$ to $\mathbb{R}$,
$$\dot{f}(x) = \frac{\mathrm{d}f}{\mathrm{d}x}(x), \qquad (2.5)$$
$$\ddot{f}(x) = \frac{\mathrm{d}^2 f}{\mathrm{d}x^2}(x), \qquad (2.6)$$
$$f^{(k)}(x) = \frac{\mathrm{d}^k f}{\mathrm{d}x^k}(x). \qquad (2.7)$$
Vetter’s notation [1] will be used for derivatives of matrices with respect to matri-
ces. (A word of caution is in order: there are other, incompatible notations, and one
should be cautious about mixing formulas from different sources.)
If $\mathbf{A}$ is $(n_A \times m_A)$ and $\mathbf{B}$ is $(n_B \times m_B)$, then
$$\mathbf{M} = \frac{\partial \mathbf{A}}{\partial \mathbf{B}} \qquad (2.8)$$
is an $(n_A n_B \times m_A m_B)$ matrix, such that the $(n_A \times m_A)$ submatrix in position $(i, j)$ is
$$\mathbf{M}_{i,j} = \frac{\partial \mathbf{A}}{\partial b_{i,j}}. \qquad (2.9)$$
Remark 2.1 A and B in (2.8) may be row or column vectors.
Example 2.1 If $\mathbf{v}$ is a generic column vector of $\mathbb{R}^n$, then
$$\frac{\partial \mathbf{v}}{\partial \mathbf{v}^{\mathrm{T}}} = \frac{\partial \mathbf{v}^{\mathrm{T}}}{\partial \mathbf{v}} = \mathbf{I}_n. \qquad (2.10)$$
Example 2.2 If $J(\cdot)$ is a differentiable function from $\mathbb{R}^n$ to $\mathbb{R}$, and $\mathbf{x}$ a vector of $\mathbb{R}^n$, then
$$\frac{\partial J}{\partial \mathbf{x}}(\mathbf{x}) = \begin{bmatrix} \frac{\partial J}{\partial x_1} \\ \frac{\partial J}{\partial x_2} \\ \vdots \\ \frac{\partial J}{\partial x_n} \end{bmatrix}(\mathbf{x}) \qquad (2.11)$$
is a column vector, called the gradient of $J(\cdot)$ at $\mathbf{x}$.
Example 2.3 If $J(\cdot)$ is a twice differentiable function from $\mathbb{R}^n$ to $\mathbb{R}$, and $\mathbf{x}$ a vector of $\mathbb{R}^n$, then
$$\frac{\partial^2 J}{\partial \mathbf{x}\,\partial \mathbf{x}^{\mathrm{T}}}(\mathbf{x}) = \begin{bmatrix} \frac{\partial^2 J}{\partial x_1^2} & \frac{\partial^2 J}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 J}{\partial x_1 \partial x_n} \\ \frac{\partial^2 J}{\partial x_2 \partial x_1} & \frac{\partial^2 J}{\partial x_2^2} & & \vdots \\ \vdots & & \ddots & \vdots \\ \frac{\partial^2 J}{\partial x_n \partial x_1} & \cdots & \cdots & \frac{\partial^2 J}{\partial x_n^2} \end{bmatrix}(\mathbf{x}) \qquad (2.12)$$
is an $(n \times n)$ matrix, called the Hessian of $J(\cdot)$ at $\mathbf{x}$. Schwarz's theorem ensures that
$$\frac{\partial^2 J}{\partial x_i \partial x_j}(\mathbf{x}) = \frac{\partial^2 J}{\partial x_j \partial x_i}(\mathbf{x}), \qquad (2.13)$$
provided that both are continuous at $\mathbf{x}$ and $\mathbf{x}$ belongs to an open set in which both are defined. Hessians are thus symmetric, except in pathological cases not considered here.
Example 2.4 If $\mathbf{f}(\cdot)$ is a differentiable function from $\mathbb{R}^n$ to $\mathbb{R}^p$, and $\mathbf{x}$ a vector of $\mathbb{R}^n$, then
$$\mathbf{J}(\mathbf{x}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}^{\mathrm{T}}}(\mathbf{x}) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & & \vdots \\ \vdots & & \ddots & \vdots \\ \frac{\partial f_p}{\partial x_1} & \cdots & \cdots & \frac{\partial f_p}{\partial x_n} \end{bmatrix} \qquad (2.14)$$
is the $(p \times n)$ Jacobian matrix of $\mathbf{f}(\cdot)$ at $\mathbf{x}$. When $p = n$, the Jacobian matrix is square and its determinant is the Jacobian.
Remark 2.2 The last three examples show that the Hessian of J(·) at x is the Jacobian
matrix of its gradient function evaluated at x.
Remark 2.3 Gradients and Hessians are frequently used in the context of optimiza-
tion, and Jacobian matrices when solving systems of nonlinear equations.
Remark 2.4 The Nabla operator $\nabla$, a vector of partial derivatives with respect to all the variables of the function on which it operates,
$$\nabla = \left( \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n} \right)^{\mathrm{T}}, \qquad (2.15)$$
is often used to make notation more concise, especially for partial differential equations. Applying $\nabla$ to a scalar function $J$ and evaluating the result at $\mathbf{x}$, one gets the gradient vector
$$\nabla J(\mathbf{x}) = \frac{\partial J}{\partial \mathbf{x}}(\mathbf{x}). \qquad (2.16)$$
If the scalar function is replaced by a vector function $\mathbf{f}$, one gets the Jacobian matrix
$$\nabla \mathbf{f}(\mathbf{x}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}^{\mathrm{T}}}(\mathbf{x}), \qquad (2.17)$$
where $\nabla \mathbf{f}$ is interpreted as $\left(\nabla \mathbf{f}^{\mathrm{T}}\right)^{\mathrm{T}}$.
By applying $\nabla$ twice to a scalar function $J$ and evaluating the result at $\mathbf{x}$, one gets the Hessian matrix
$$\nabla^2 J(\mathbf{x}) = \frac{\partial^2 J}{\partial \mathbf{x}\,\partial \mathbf{x}^{\mathrm{T}}}(\mathbf{x}). \qquad (2.18)$$
($\nabla^2$ is sometimes taken to mean the Laplacian operator $\Delta$, such that
$$\Delta f(\mathbf{x}) = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}(\mathbf{x}) \qquad (2.19)$$
is a scalar. The context and dimensional considerations should make what is meant clear.)
Example 2.5 If $\mathbf{v}$, $\mathbf{M}$, and $\mathbf{Q}$ do not depend on $\mathbf{x}$ and $\mathbf{Q}$ is symmetric, then
$$\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{v}^{\mathrm{T}}\mathbf{x}\right) = \mathbf{v}, \qquad (2.20)$$
$$\frac{\partial}{\partial \mathbf{x}^{\mathrm{T}}}\left(\mathbf{M}\mathbf{x}\right) = \mathbf{M}, \qquad (2.21)$$
$$\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^{\mathrm{T}}\mathbf{M}\mathbf{x}\right) = \left(\mathbf{M} + \mathbf{M}^{\mathrm{T}}\right)\mathbf{x} \qquad (2.22)$$
and
$$\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^{\mathrm{T}}\mathbf{Q}\mathbf{x}\right) = 2\mathbf{Q}\mathbf{x}. \qquad (2.23)$$
These formulas will be used quite frequently.
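These identities are easy to check numerically. A minimal MATLAB sketch (with an arbitrary symmetric Q) verifies (2.23) against central finite differences:

    n = 3;
    Q = randn(n); Q = (Q + Q')/2;   % arbitrary symmetric matrix
    x = randn(n, 1);
    J = @(x) x'*Q*x;                % quadratic function of x
    h = 1e-6; g = zeros(n, 1);
    for i = 1:n
        e = zeros(n, 1); e(i) = 1;  % canonical vector e_i
        g(i) = (J(x + h*e) - J(x - h*e))/(2*h);  % central difference
    end
    norm(g - 2*Q*x)   % close to zero (exact up to rounding for a quadratic)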
2.4 Little o and Big O
The function $f(x)$ is $o(g(x))$ as $x$ tends to $x_0$ if
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = 0, \qquad (2.24)$$
so $f(x)$ gets negligible compared to $g(x)$ for $x$ sufficiently close to $x_0$. In what follows, $x_0$ is always taken equal to zero, so this need not be specified, and we just write $f(x) = o(g(x))$.
The function $f(x)$ is $O(g(x))$ as $x$ tends to infinity if there exist real numbers $x_0$ and $M$ such that
$$x > x_0 \Rightarrow |f(x)| \leq M|g(x)|. \qquad (2.25)$$
The function $f(x)$ is $O(g(x))$ as $x$ tends to zero if there exist real numbers $\delta$ and $M$ such that
$$|x| < \delta \Rightarrow |f(x)| \leq M|g(x)|. \qquad (2.26)$$
The notation O(x) or O(n) will be used in two contexts:
• when dealing with Taylor expansions, x is a real number tending to zero,
• when analyzing algorithmic complexity, n is a positive integer tending to infinity.
Example 2.6 The function
$$f(x) = \sum_{i=2}^{m} a_i x^i,$$
with $m \geq 2$, is such that
$$\lim_{x \to 0} \frac{f(x)}{x} = \lim_{x \to 0} \sum_{i=2}^{m} a_i x^{i-1} = 0,$$
so $f(x) = o(x)$ when $x$ tends to zero. Now, if $|x| < 1$, then
$$\frac{|f(x)|}{x^2} < \sum_{i=2}^{m} |a_i|,$$
so $f(x) = O(x^2)$ when $x$ tends to zero. If, on the other hand, $x$ is taken equal to the (large) positive integer $n$, then
$$f(n) = \sum_{i=2}^{m} a_i n^i \leq \sum_{i=2}^{m} |a_i n^i| \leq \left( \sum_{i=2}^{m} |a_i| \right) \cdot n^m,$$
so $f(n) = O(n^m)$ when $n$ tends to infinity.
2.5 Norms
A function $f(\cdot)$ from a vector space $\mathbb{V}$ to $\mathbb{R}$ is a norm if it satisfies the following three properties:
1. $f(\mathbf{v}) \geq 0$ for all $\mathbf{v} \in \mathbb{V}$ (positivity),
2. $f(\alpha\mathbf{v}) = |\alpha| \cdot f(\mathbf{v})$ for all $\alpha \in \mathbb{R}$ and $\mathbf{v} \in \mathbb{V}$ (positive scalability),
3. $f(\mathbf{v}_1 \pm \mathbf{v}_2) \leq f(\mathbf{v}_1) + f(\mathbf{v}_2)$ for all $\mathbf{v}_1 \in \mathbb{V}$ and $\mathbf{v}_2 \in \mathbb{V}$ (triangle inequality).
These properties imply that $f(\mathbf{v}) = 0 \Rightarrow \mathbf{v} = \mathbf{0}$ (non-degeneracy). Another useful relation is
$$|f(\mathbf{v}_1) - f(\mathbf{v}_2)| \leq f(\mathbf{v}_1 \pm \mathbf{v}_2). \qquad (2.27)$$
Norms are used to quantify distances between vectors. They play an essential role,
for instance, in the characterization of the intrinsic difficulty of numerical problems
via the notion of condition number (see Sect.3.3) or in the definition of cost functions
for optimization.
2.5.1 Vector Norms
The most commonly used norms in $\mathbb{R}^n$ are the $l_p$ norms
$$\|\mathbf{v}\|_p = \left( \sum_{i=1}^{n} |v_i|^p \right)^{\frac{1}{p}}, \qquad (2.28)$$
with $p \geq 1$. They include
• the Euclidean norm (or $l_2$ norm)
$$\|\mathbf{v}\|_2 = \sqrt{\sum_{i=1}^{n} v_i^2} = \sqrt{\mathbf{v}^{\mathrm{T}}\mathbf{v}}, \qquad (2.29)$$
• the taxicab norm (or Manhattan norm, or grid norm, or $l_1$ norm)
$$\|\mathbf{v}\|_1 = \sum_{i=1}^{n} |v_i|, \qquad (2.30)$$
• the maximum norm (or $l_\infty$ norm, or Chebyshev norm, or uniform norm)
$$\|\mathbf{v}\|_\infty = \max_{1 \leq i \leq n} |v_i|. \qquad (2.31)$$
They are such that
$$\|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1 \leq n\|\mathbf{v}\|_\infty, \qquad (2.32)$$
and
$$\mathbf{v}^{\mathrm{T}}\mathbf{w} \leq \|\mathbf{v}\|_2 \cdot \|\mathbf{w}\|_2. \qquad (2.33)$$
The latter result is known as the Cauchy-Schwarz inequality.
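These relations are easily checked with MATLAB's norm function (a quick sketch with an arbitrary vector):

    v = randn(5, 1); w = randn(5, 1);
    % (2.32): ||v||_2 <= ||v||_1 <= n*||v||_inf
    [norm(v, 2), norm(v, 1), numel(v)*norm(v, Inf)]
    % (2.33), the Cauchy-Schwarz inequality
    v'*w <= norm(v, 2)*norm(w, 2)   % evaluates to logical 1 (true)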
Remark 2.5 If the entries of $\mathbf{v}$ were complex, norms would be defined differently. The Euclidean norm, for instance, would become
$$\|\mathbf{v}\|_2 = \sqrt{\mathbf{v}^{\mathrm{H}}\mathbf{v}}, \qquad (2.34)$$
where $\mathbf{v}^{\mathrm{H}}$ is the transconjugate of $\mathbf{v}$, i.e., the row vector obtained by transposing the column vector $\mathbf{v}$ and replacing each of its entries by its complex conjugate.
Example 2.7 For the complex vector
$$\mathbf{v} = \begin{bmatrix} a \\ ai \end{bmatrix},$$
where $a$ is some nonzero real number and $i$ is the imaginary unit (such that $i^2 = -1$), $\mathbf{v}^{\mathrm{T}}\mathbf{v} = 0$. This proves that $\sqrt{\mathbf{v}^{\mathrm{T}}\mathbf{v}}$ is not a norm. The value of the Euclidean norm of $\mathbf{v}$ is $\sqrt{\mathbf{v}^{\mathrm{H}}\mathbf{v}} = \sqrt{2}\,|a|$.
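This can be checked directly in MATLAB, where .' transposes without conjugation while ' computes the transconjugate:

    a = 3;
    v = [a; a*1i];   % the complex vector of Example 2.7
    v.'*v            % plain transpose: 0, so sqrt(v.'*v) is not a norm
    sqrt(v'*v)       % transconjugate: sqrt(2)*|a| = 4.2426...
    norm(v)          % MATLAB's Euclidean norm gives the same value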
Remark 2.6 The so-called l0 norm of a vector is the number of its nonzero entries.
Used in the context of sparse estimation, where one is looking for an estimated
parameter vector with as few nonzero entries as possible, it is not a norm, as it does
not satisfy the property of positive scalability.
2.5.2 Matrix Norms
Each vector norm induces a matrix norm, defined as
$$\|\mathbf{M}\| = \max_{\|\mathbf{v}\|=1} \|\mathbf{M}\mathbf{v}\|, \qquad (2.35)$$
so
$$\|\mathbf{M}\mathbf{v}\| \leq \|\mathbf{M}\| \cdot \|\mathbf{v}\| \qquad (2.36)$$
for any M and v for which the product Mv makes sense. This matrix norm is sub-
ordinate to the vector norm inducing it. The matrix and vector norms are then said
to be compatible, an important property for the study of products of matrices and
vectors.
• The matrix norm induced by the vector norm $l_2$ is the spectral norm, or 2-norm,
$$\|\mathbf{M}\|_2 = \sqrt{\rho(\mathbf{M}^{\mathrm{T}}\mathbf{M})}, \qquad (2.37)$$
where $\rho(\cdot)$ is the function that computes the spectral radius of its argument, i.e., the modulus of the eigenvalue(s) with the largest modulus. Since all the eigenvalues of $\mathbf{M}^{\mathrm{T}}\mathbf{M}$ are real and non-negative, $\rho(\mathbf{M}^{\mathrm{T}}\mathbf{M})$ is the largest of these eigenvalues. Its square root is the largest singular value of $\mathbf{M}$, denoted by $\sigma_{\max}(\mathbf{M})$. So
$$\|\mathbf{M}\|_2 = \sigma_{\max}(\mathbf{M}). \qquad (2.38)$$
• The matrix norm induced by the vector norm $l_1$ is the 1-norm
$$\|\mathbf{M}\|_1 = \max_j \sum_i |m_{i,j}|, \qquad (2.39)$$
which amounts to summing the absolute values of the entries of each column in turn and keeping the largest result.
• The matrix norm induced by the vector norm $l_\infty$ is the infinity norm
$$\|\mathbf{M}\|_\infty = \max_i \sum_j |m_{i,j}|, \qquad (2.40)$$
which amounts to summing the absolute values of the entries of each row in turn and keeping the largest result. Thus
$$\|\mathbf{M}\|_1 = \|\mathbf{M}^{\mathrm{T}}\|_\infty. \qquad (2.41)$$
Since each subordinate matrix norm is compatible with its inducing vector norm,
$$\|\mathbf{v}\|_1 \text{ is compatible with } \|\mathbf{M}\|_1, \qquad (2.42)$$
$$\|\mathbf{v}\|_2 \text{ is compatible with } \|\mathbf{M}\|_2, \qquad (2.43)$$
$$\|\mathbf{v}\|_\infty \text{ is compatible with } \|\mathbf{M}\|_\infty. \qquad (2.44)$$
The Frobenius norm
$$\|\mathbf{M}\|_{\mathrm{F}} = \sqrt{\sum_{i,j} m_{i,j}^2} = \sqrt{\operatorname{trace}\,\mathbf{M}^{\mathrm{T}}\mathbf{M}} \qquad (2.45)$$
deserves a special mention, as it is not induced by any vector norm, yet
$$\|\mathbf{v}\|_2 \text{ is compatible with } \|\mathbf{M}\|_{\mathrm{F}}. \qquad (2.46)$$
Remark 2.7 To evaluate a vector or matrix norm with MATLAB (or any other interpreted language based on matrices), it is much more efficient to use the corresponding dedicated function than to access the entries of the vector or matrix individually to implement the norm definition. Thus, norm(X,p) returns the p-norm of X, which may be a vector or a matrix, while norm(M,'fro') returns the Frobenius norm of the matrix M.
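As a quick cross-check of (2.38) through (2.41) and (2.45) on an arbitrary matrix:

    M = [1 -2; 3 4];
    [norm(M, 1),   max(sum(abs(M), 1))]   % (2.39): largest column sum
    [norm(M, Inf), max(sum(abs(M), 2))]   % (2.40): largest row sum
    [norm(M, 2),   max(svd(M))]           % (2.38): largest singular value
    norm(M, 'fro')                        % Frobenius norm (2.45)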
2.5.3 Convergence Speeds
Norms can be used to study how quickly an iterative method would converge to the
solution xν if computation were exact. Define the error at iteration k as
ek
= xk
− xν
, (2.47)
where xk is the estimate of xν at iteration k. The asymptotic convergence speed is
linear if
lim sup
k→∞
ek+1
ek
= α < 1, (2.48)
with α the rate of convergence.
It is superlinear if
lim sup
k→∞
ek+1
ek
= 0, (2.49)
and quadratic if
lim sup
k→∞
ek+1
ek 2
= α < ∞. (2.50)
A method with quadratic convergence thus also has superlinear and linear convergence. It is customary, however, to qualify a method with the best convergence it achieves. Quadratic convergence is better than superlinear convergence, which is better than linear convergence.
Remember that these convergence speeds are asymptotic, valid when the error has become small enough, and that they do not take the effect of rounding into account. They are meaningless if the initial vector $\mathbf{x}^0$ was too badly chosen for the method to converge to $\mathbf{x}^\star$. When the method does converge to $\mathbf{x}^\star$, they may not describe accurately its initial behavior and will no longer be true when rounding errors become predominant. They are nevertheless an interesting indication of what can be expected at best.
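A minimal MATLAB sketch illustrates these definitions on Newton's method (presented in Chap. 7) for computing $\sqrt{2}$; the ratio of (2.50) settles near a constant until rounding errors take over:

    % Newton iteration for x^2 = 2; quadratic convergence to x* = sqrt(2)
    x = 1; xstar = sqrt(2);
    for k = 1:5
        e_old = abs(x - xstar);
        x = (x + 2/x)/2;     % Newton update for f(x) = x^2 - 2
        e_new = abs(x - xstar);
        fprintf('k = %d, error = %.1e, ratio of (2.50) = %.3f\n', ...
                k, e_new, e_new/e_old^2)
    end
    % the ratio stays near a constant until the error reaches rounding level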
Reference
1. Vetter, W.: Derivative operations on matrices. IEEE Trans. Autom. Control 15, 241–244 (1970)
Chapter 3
Solving Systems of Linear Equations
3.1 Introduction
Linear equations are first-order polynomial equations in their unknowns. A system
of linear equations can thus be written as
Ax = b, (3.1)
where the matrix A and the vector b are known and where x is a vector of unknowns.
We assume in this chapter that
• all the entries of A, b and x are real numbers,
• there are n scalar equations in n scalar unknowns (A is a square (n × n) matrix
and dim x = dim b = n),
• these equations uniquely define x (A is invertible).
When $\mathbf{A}$ is invertible, the solution of (3.1) for $\mathbf{x}$ is unique, and given mathematically in closed form as $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$. We are not interested here in this closed-form solution,
and wish instead to compute x numerically from numerically known A and b. This
problem plays a central role in so many algorithms that it deserves a chapter of
its own. Systems of linear equations with more equations than unknowns will be
considered in Sect.9.2.
Remark 3.1 When A is square but singular (i.e., not invertible), its columns no longer
form a basis of Rn, so the vector Ax cannot take all directions in Rn. The direction of
b will thus determine whether (3.1) admits infinitely many solutions for x or none.
When b can be expressed as a linear combination of columns of A, the equations
are linearly dependent and there is a continuum of solutions. The system x1 +x2 = 1
and 2x1 + 2x2 = 2 corresponds to this situation.
When b cannot be expressed as a linear combination of columns of A, the equations
are incompatible and there is no solution. The system x1 + x2 = 1 and x1 + x2 = 2
corresponds to this situation.
Great books covering the topics of this chapter and Chap. 4 (as well as topics relevant to many other chapters) are [1–3].
3.2 Examples
Example 3.1 Determination of a static equilibrium
The conditions for a linear dynamical system to be in static equilibrium translate
into a system of linear equations. Consider, for instance, a series of three vertical
springs si (i = 1, 2, 3), with the first of them attached to the ceiling and the last
to an object with mass m. The mass of each spring is neglected, and the stiffness
coefficient of the ith spring is denoted by ki . We want to compute the elongation xi
of the bottom end of spring i (i = 1, 2, 3) resulting from the action of the mass of
the object when the system has reached static equilibrium. The sum of all the forces
acting at any given point is then zero. Provided that m is small enough for Hooke’s
law of elasticity to apply, the following linear equations thus hold true
mg = k3(x3 − x2), (3.2)
k3(x2 − x3) = k2(x1 − x2), (3.3)
k2(x2 − x1) = k1x1, (3.4)
where g is the acceleration due to gravity. This system of linear equations can be
written as
\begin{bmatrix} k_1+k_2 & -k_2 & 0 \\ -k_2 & k_2+k_3 & -k_3 \\ 0 & -k_3 & k_3 \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ mg \end{bmatrix}. (3.5)
The matrix in the left-hand side of (3.5) is tridiagonal, as only its main descending
diagonal and the descending diagonals immediately over and below it are nonzero.
This would still be true if there were many more springs in series, in which case the
matrix would also be sparse, i.e., with a majority of zero entries. Note that changing
the mass of the object would only modify the right-hand side of (3.5), so one might
be interested in solving a number of systems that share the same matrix A.
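To make Example 3.1 concrete, here is a minimal MATLAB sketch; the numerical values of the stiffnesses, mass and gravity are assumptions chosen purely for illustration:

k = [100; 100; 100]; m = 1; g = 9.81;   % assumed stiffnesses (N/m), mass (kg), gravity (m/s^2)
A = [k(1)+k(2), -k(2),      0;
     -k(2),     k(2)+k(3), -k(3);
     0,         -k(3),      k(3)];      % tridiagonal matrix of (3.5)
b = [0; 0; m*g];
x = A\b                                 % elongations x1, x2, x3

Changing m only changes b, so the same A (and any factorization of it) can be reused.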
Example 3.2 Polynomial interpolation
Assume that the value yi of some quantity of interest has been measured at time
ti (i = 1, 2, 3). Interpolating these data with the polynomial
P(t, x) = a_0 + a_1 t + a_2 t^2, (3.6)
where x = (a0, a1, a2)T, boils down to solving (3.1) with
A = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \end{bmatrix} and b = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}. (3.7)
For more on interpolation, see Chap.5.
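A similar sketch for Example 3.2, with made-up measurement times and values (any three distinct ti would do):

t = [1; 2; 3]; y = [2.0; 3.5; 6.1];  % assumed data
A = [ones(3,1), t, t.^2];            % matrix of (3.7)
x = A\y;                             % coefficients a0, a1, a2
P = @(s) x(1) + x(2)*s + x(3)*s.^2;  % interpolating polynomial (3.6)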
3.3 Condition Number(s)
The notion of condition number plays a central role in assessing the intrinsic difficulty
of solving a given numerical problem independently of the algorithm to be employed
[4, 5]. It can thus be used to detect problems about which one should be particularly
careful. We limit ourselves here to the problem of solving (3.1) for x. In general, A and
b are imperfectly known, for at least two reasons. First, the mere fact of converting
real numbers to their floating-point representation or of performing floating-point
computations almost always entails approximations. Moreover, the entries of A and
b often result from imprecise measurements. It is thus important to quantify the effect
that perturbations on A and b may have on the solution x.
Substitute A + δA for A and b + δb for b, and define x̂ as the solution of the perturbed system
(A + δA)x̂ = b + δb. (3.8)
The difference between the solutions of the perturbed system (3.8) and original system (3.1) is
δx = x̂ − x. (3.9)
It satisfies
δx = A^{−1}[δb − (δA)x̂]. (3.10)
Provided that compatible norms are used, this implies that
||δx|| ≤ ||A^{−1}|| · (||δb|| + ||δA|| · ||x̂||). (3.11)
Divide both sides of (3.11) by ||x̂||, and multiply the right-hand side of the result by ||A||/||A|| to get
||δx||/||x̂|| ≤ ||A^{−1}|| · ||A|| · (||δb||/(||A|| · ||x̂||) + ||δA||/||A||). (3.12)
The multiplicative coefficient ||A^{−1}|| · ||A|| appearing in the right-hand side of (3.12) is the condition number of A
cond A = ||A^{−1}|| · ||A||. (3.13)
It quantifies the consequences of an error on A or b on the error on x. We wish it to be as small as possible, so that the solution is as insensitive as possible to the errors δA and δb.
Remark 3.2 When the errors on b are negligible, (3.12) becomes
||δx||/||x̂|| ≤ (cond A) · (||δA||/||A||). (3.14)
Remark 3.3 When the errors on A are negligible,
δx = A^{−1}δb, (3.15)
so
||δx|| ≤ ||A^{−1}|| · ||δb||. (3.16)
Now (3.1) implies that
||b|| ≤ ||A|| · ||x||, (3.17)
and (3.16) and (3.17) imply that
||δx|| · ||b|| ≤ ||A^{−1}|| · ||A|| · ||δb|| · ||x||, (3.18)
so
||δx||/||x|| ≤ (cond A) · (||δb||/||b||). (3.19)
Since
1 = ||I|| = ||A^{−1} · A|| ≤ ||A^{−1}|| · ||A||, (3.20)
the condition number of A satisfies
cond A ≥ 1. (3.21)
Its value depends on the norm used. For the spectral norm,
||A||_2 = σ_max(A), (3.22)
where σ_max(A) is the largest singular value of A. Since
||A^{−1}||_2 = σ_max(A^{−1}) = 1/σ_min(A), (3.23)
with σ_min(A) the smallest singular value of A, the condition number of A for the spectral norm is the ratio of its largest singular value to its smallest
cond A = σ_max(A)/σ_min(A). (3.24)
The larger the condition number of A is, the more ill-conditioned solving (3.1)
becomes.
It is useful to compare cond A with the inverse of the precision of the floating-point
representation. For a double-precision representation according to IEEE Standard
754 (typical of MATLAB computations), this precision is about 10^{−16}.
Solving (3.1) for x when cond A is not small compared to 10^{16} requires special
care.
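A quick check of cond A before trusting a computed solution may thus be worthwhile, as in this sketch with MATLAB's built-in Hilbert matrix, a classical ill-conditioned example:

A = hilb(12);   % 12x12 Hilbert matrix, notoriously ill-conditioned
c = cond(A)     % spectral-norm condition number, of the order of 1e16 here
% c is not small compared to 1e16, so no digit of a solution of Ax = b
% computed in double precision can be trusted.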
Remark 3.4 Although this is probably the worst method for computing singular
values, the singular values of A are the square roots of the eigenvalues of ATA.
(When A is symmetric, its singular values are thus equal to the absolute values of its
eigenvalues.)
Remark 3.5 A is singular if and only if its determinant is zero, so one might have
thought of using the value of det A as an index of conditioning, with a small deter-
minant indicative of a nearly singular system. However, it is very difficult to check
that a floating-point number differs significantly from zero (think of what happens to
the determinant of A if A and b are multiplied by a large or small positive number,
which has no effect on the difficulty of the problem). The condition number is a much
more meaningful index of conditioning, as it is invariant to a multiplication of A by
a nonzero scalar of any magnitude (a consequence of the positive scalability of the
norm). Compare det(10^{−1}I_n) = 10^{−n} with cond(10^{−1}I_n) = 1.
Remark 3.6 The numerical value of cond A depends on the norm being used, but an
ill-conditioned problem for one norm should also be ill-conditioned for the others,
so the choice of a given norm is just a matter of convenience.
Remark 3.7 Although evaluating the condition number of a matrix for the spectral
norm just takes one call to the MATLAB function cond(·), this may actually require
more computation than solving (3.1). Evaluating the condition number of the same
matrix for the 1-norm (by a call to the function cond(·,1)), is less costly than for
the spectral norm, and algorithms are available to get cheaper estimates of its order
of magnitude [2, 6, 7], which is what we are actually interested in, after all.
Remark 3.8 The concept of condition number extends to rectangular matrices, and
the condition number for the spectral norm is then still given by (3.24). It can also
be extended to nonlinear problems, see Sect.14.5.2.1.
3.4 Approaches Best Avoided
For solving a system of linear equations numerically, matrix inversion should
almost always be avoided, as it requires useless computations.
Unless A has some specific structure that makes inversion particularly simple, one
should thus think twice before inverting A to take advantage of the closed-form
solution
x = A^{−1}b. (3.25)
Cramer’s rule for solving systems of linear equations, which requires the computation of ratios of determinants, is the worst possible approach. Determinants are
notoriously difficult to compute accurately and computing these determinants is
unnecessarily costly, even if much more economical methods than cofactor expan-
sion are available.
3.5 Questions About A
A often has specific properties that may be taken advantage of and that may lead to
selecting a specific method rather than systematically using some general-purpose
workhorse. It is thus important to address the following questions:
• Are A and b real (as assumed here)?
• Is A square and invertible (as assumed here)?
• Is A symmetric, i.e., such that AT = A?
• Is A symmetric positive definite (denoted by A ≻ 0)? This means that A is symmetric and such that
∀v ≠ 0, v^T Av > 0, (3.26)
which implies that all of its eigenvalues are real and strictly positive.
• If A is large, is it sparse, i.e., such that most of its entries are zeros?
• Is A diagonally dominant, i.e., such that the absolute value of each of its diagonal
entries is strictly larger than the sum of the absolute values of all the other entries
in the same row?
• Is A tridiagonal, i.e., such that only its main descending diagonal and the diagonals
immediately over and below are nonzero?
A = \begin{bmatrix}
b_1 & c_1 & 0 & \cdots & \cdots & 0 \\
a_2 & b_2 & c_2 & 0 & & \vdots \\
0 & a_3 & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & 0 \\
\vdots & & \ddots & \ddots & b_{n-1} & c_{n-1} \\
0 & \cdots & \cdots & 0 & a_n & b_n
\end{bmatrix} (3.27)
• Is A Toeplitz, i.e., such that all the entries on the same descending diagonal take the same
value?
A = \begin{bmatrix}
h_0 & h_{-1} & h_{-2} & \cdots & h_{-n+1} \\
h_1 & h_0 & h_{-1} & & h_{-n+2} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & \ddots & h_{-1} \\
h_{n-1} & h_{n-2} & \cdots & h_1 & h_0
\end{bmatrix} (3.28)
• Is A well-conditioned? (See Sect.3.3.)
3.6 Direct Methods
Direct methods attempt to solve (3.1) for x in a finite number of steps. They require
a predictable amount of resources and can be made quite robust, but scale poorly
on very large problems. This is in contrast with iterative methods, considered in
Sect.3.7, which aim at generating a sequence of improving approximations of the
solution. Some iterative methods can deal with millions of unknowns, as encountered
for instance when solving partial differential equations.
Remark 3.9 The distinction between direct and iterative methods is not as clear-cut
as it may seem; results obtained by direct methods may be improved by iterative
methods (as in Sect.3.6.4), and the most sophisticated iterative methods (presented
in Sect.3.7.2) would find the exact solution in a finite number of steps if computation
were carried out exactly.
3.6.1 Backward or Forward Substitution
Backward or forward substitution applies when A is triangular. This is less of a special
case than it may seem, as several of the methods presented below and applicable to
generic linear systems involve solving triangular systems.
Backward substitution applies to the upper triangular system
Ux = b, (3.29)
where
U = \begin{bmatrix}
u_{1,1} & u_{1,2} & \cdots & u_{1,n} \\
0 & u_{2,2} & \cdots & u_{2,n} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & u_{n,n}
\end{bmatrix}. (3.30)
When U is invertible, all its diagonal entries are nonzero and (3.29) can be solved
for one unknown at a time, starting with the last
xn = bn/un,n, (3.31)
then moving up to get
xn−1 = (bn−1 − un−1,nxn)/un−1,n−1, (3.32)
and so forth, with finally
x1 = (b1 − u1,2x2 − u1,3x3 − · · · − u1,nxn)/u1,1. (3.33)
Forward substitution, on the other hand, applies to the lower triangular system
Lx = b, (3.34)
where
L = \begin{bmatrix}
l_{1,1} & 0 & \cdots & 0 \\
l_{2,1} & l_{2,2} & \ddots & \vdots \\
\vdots & & \ddots & 0 \\
l_{n,1} & l_{n,2} & \cdots & l_{n,n}
\end{bmatrix}. (3.35)
It also solves (3.34) for one unknown at a time, but starts with x1 then moves down
to get x2 and so forth until xn is obtained.
Solving (3.29) by backward substitution can be carried out in MATLAB via the
instruction x=linsolve(U,b,optsUT), provided that optsUT.UT=true,
which specifies that U is an upper triangular matrix. Similarly, solving (3.34) by
forward substitution can be carried out via x=linsolve(L,b,optsLT), pro-
vided that optsLT.LT=true, which specifies that L is a lower triangular matrix.
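For illustration, backward substitution as in (3.31)–(3.33) may also be coded by hand, as in this didactic sketch (to be saved as backsub.m; linsolve should be preferred in practice):

function x = backsub(U, b)
% Solves Ux = b by backward substitution, for U upper triangular and invertible
n = length(b);
x = zeros(n, 1);
x(n) = b(n)/U(n,n);                              % (3.31)
for i = n-1:-1:1
    x(i) = (b(i) - U(i,i+1:n)*x(i+1:n))/U(i,i);  % (3.32), down to (3.33)
end
end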
3.6.2 Gaussian Elimination
Gaussian elimination [8] transforms the original system (3.1) into an upper triangular
system
Ux = v, (3.36)
by replacing each row of Ax and b by a suitable linear combination of such rows.
This triangular system is then solved by backward substitution, one unknown at a
time. All of this is carried out by the single MATLAB instruction x=A\b. This
attractive one-liner actually hides the fact that A has been factored, and the resulting
factorization is thus not available for later use (for instance, to solve (3.1) with the
same A but another b).
When (3.1) must be solved for several right-hand sides bi (i = 1, . . . , m) all
known in advance, the system
A [x^1 · · · x^m] = [b^1 · · · b^m] (3.37)
is similarly transformed by row combinations into
U [x^1 · · · x^m] = [v^1 · · · v^m]. (3.38)
The solutions are then obtained by solving the triangular systems
Ux^i = v^i, i = 1, . . . , m. (3.39)
This classical approach for solving (3.1) has no advantage over LU factorization
presented next. As it works simultaneously on A and b, Gaussian elimination for a
right-hand side b not previously known cannot take advantage of past computations
carried out with other right-hand sides, even if A remains the same.
3.6.3 LU Factorization
LU factorization, a matrix reformulation of Gaussian elimination, is the basic work-
horse to be used when A has no particular structure to be taken advantage of. Consider
first its simplest version.
3.6.3.1 LU Factorization Without Pivoting
A is factored as
A = LU, (3.40)
where L is lower triangular and U upper triangular. (It is also known as
LR factorization, with L standing for left triangular and R for right triangular.)
When possible, this factorization is not unique, since L and U contain (n^2 + n)
unknown entries whereas A has only n^2 entries, which provide as many scalar rela-
tions between L and U. It is therefore necessary to add n constraints to ensure
uniqueness, so we set all the diagonal entries of L equal to one. Equation (3.40) then
translates into
A = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
l_{2,1} & 1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
l_{n,1} & \cdots & l_{n,n-1} & 1
\end{bmatrix} \cdot \begin{bmatrix}
u_{1,1} & u_{1,2} & \cdots & u_{1,n} \\
0 & u_{2,2} & \cdots & u_{2,n} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & u_{n,n}
\end{bmatrix}. (3.41)
When (3.41) admits a solution for its unknowns l_{i,j} and u_{i,j}, this solution can be
obtained very simply by considering the equations in the proper order. Each unknown
is then expressed as a function of entries of A and already computed entries of L and
U. For the sake of notational simplicity, and because our purpose is not coding LU
factorization, we only illustrate this with a very small example.
Example 3.3 LU factorization without pivoting
For the system
\begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ l_{2,1} & 1 \end{bmatrix} \cdot \begin{bmatrix} u_{1,1} & u_{1,2} \\ 0 & u_{2,2} \end{bmatrix}, (3.42)
we get
u_{1,1} = a_{1,1}, u_{1,2} = a_{1,2}, l_{2,1}u_{1,1} = a_{2,1} and l_{2,1}u_{1,2} + u_{2,2} = a_{2,2}. (3.43)
So, provided that a_{1,1} ≠ 0,
l_{2,1} = a_{2,1}/u_{1,1} = a_{2,1}/a_{1,1} and u_{2,2} = a_{2,2} − l_{2,1}u_{1,2} = a_{2,2} − (a_{2,1}/a_{1,1})a_{1,2}. (3.44)
Terms that appear in denominators, such as a1,1 in Example 3.3, are called pivots.
LU factorization without pivoting fails whenever a pivot turns out to be zero.
After LU factorization, the system to be solved is
LUx = b. (3.45)
Its solution for x is obtained in two steps.
First,
Ly = b (3.46)
is solved for y. Since L is lower triangular, this is by forward substitution, each
equation providing the solution for a new unknown. As the diagonal entries of L are
equal to one, this is particularly simple.
Second,
Ux = y (3.47)
is solved for x. Since U is upper triangular, this is by backward substitution, each
equation again providing the solution for a new unknown.
Example 3.4 Failure of LU factorization without pivoting
For
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},
the pivot a1,1 is equal to zero, so the algorithm fails unless pivoting is carried out, as
presented next. Note that it suffices here to permute the rows of A (as well as those
of b) for the problem to disappear.
Remark 3.10 When no pivot is zero but the magnitude of some of them is too small,
pivoting plays a crucial role for improving the quality of LU factorization.
3.6.3.2 Pivoting
Pivoting is a short name for reordering the equations (and possibly the variables) so
as to avoid zero pivots. When only the equations are reordered, one speaks of partial
pivoting, whereas total pivoting, not considered here, also involves reordering the
variables. (Total pivoting is seldom used, as it rarely provides better results than
partial pivoting while being more expensive.)
Reordering the equations amounts to permuting the same rows in A and in b,
which can be carried out by left-multiplying A and b by a suitable permutation matrix.
The permutation matrix P that exchanges the ith and jth rows of A is obtained by
exchanging the ith and jth rows of the identity matrix. Thus, for instance,

\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} b_3 \\ b_1 \\ b_2 \end{bmatrix}. (3.48)
Since det I = 1 and any exchange of two rows changes the sign of the determinant,
we have
det P = ±1. (3.49)
P is an orthonormal matrix (also called unitary matrix), i.e., it is such that
P^T P = I. (3.50)
The inverse of P is thus particularly easy to compute, as
P^{−1} = P^T. (3.51)
Finally, the product of permutation matrices is a permutation matrix.
3.6.3.3 LU Factorization with Partial Pivoting
When computing the ith column of L, the rows i to n of A are reordered so as
to ensure that the entry with the largest absolute value in the ith column gets on
the diagonal (if it is not already there). This guarantees that all the entries of L are
bounded by one in absolute value. The resulting algorithm is described in [2].
Let P be the permutation matrix summarizing the requested row exchanges on A
and b. The system to be solved becomes
PAx = Pb, (3.52)
and LU factorization is carried out on PA, so
LUx = Pb. (3.53)
Solution for x is again in two steps. First,
Ly = Pb (3.54)
is solved for y, and then
Ux = y (3.55)
is solved for x. Of course the (sparse) permutation matrix P need not be stored as an
(n × n) matrix; it suffices to keep track of the corresponding row exchanges.
Remark 3.11 Algorithms solving systems of linear equations via LU factorization
with partial or total pivoting are readily and freely available on the WEB with a
detailed documentation (in LAPACK, for instance, see Chap.15). The same remark
applies to most of the methods presented in this book. In MATLAB, LU factorization
with partial pivoting is achieved by the instruction [L,U,P]=lu(A).
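The following sketch (with arbitrary example data) shows how the resulting factors may be reused to solve (3.1) for several right-hand sides:

A = rand(4); b1 = rand(4,1); b2 = rand(4,1);  % arbitrary example data
[L, U, P] = lu(A);      % factor A once
x1 = U\(L\(P*b1));      % one forward and one backward substitution
x2 = U\(L\(P*b2));      % a second right-hand side at negligible extra cost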
Remark 3.12 Although the pivoting strategy of LU factorization is not based on
keeping the condition number of the problem unchanged, the increase in this condi-
tion number is mitigated, which makes LU with partial pivoting applicable even to
some very ill-conditioned problems. See Sect.3.10.1 for an illustration.
LU factorization is a first example of the decomposition approach to matrix com-
putation [9], where a matrix is expressed as a product of factors. Other examples
are QR factorization (Sects.3.6.5 and 9.2.3), SVD (Sects.3.6.6 and 9.2.4), Cholesky
factorization (Sect.3.8.1), and Schur and spectral decompositions, both carried out
by the QR algorithm (Sect.4.3.6). By concentrating efforts on the development of
efficient, robust algorithms for a few important factorizations, numerical analysts
have made it possible to produce highly effective packages for matrix computation,
with surprisingly diverse applications. Huge savings can be achieved when a number
of problems share the same matrix, which then only needs to be factored once. Once
LU factorization has been carried out on a given matrix A, for instance, all the systems
(3.1) that differ only by their vector b are easily solved with the same factorization,
even if the values of b to be considered were not known when A was factored. This
is a definite advantage over Gaussian elimination where the factorization of A is
hidden in the solution of (3.1) for some pre-specified b.
3.6.4 Iterative Improvement
Let x̂ be the numerical result obtained when solving (3.1) via LU factorization.
The residual Ax̂ − b should be small, but this does not guarantee that x̂ is a good
approximation of the mathematical solution x = A^{−1}b. One may try to improve x̂
by looking for the correction vector δx such that
A(x̂ + δx) = b, (3.56)
or equivalently that
Aδx = b − Ax̂. (3.57)
Remark 3.13 A is the same in (3.57) as in (3.1), so its LU factorization is already
available.
Once δx has been obtained by solving (3.57), x̂ is replaced by x̂ + δx, and the
procedure may be iterated until convergence, with a stopping criterion on ||δx||. It is
advisable to compute the residual b − Ax̂ with extended precision, as it corresponds
to the difference between hopefully similar floating-point quantities.
Spectacular improvements may be obtained for such a limited effort.
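One improvement step may be sketched as follows (the test problem is an assumption; MATLAB works in double precision throughout, so the extended-precision residual mentioned above is not reproduced here):

A = hilb(8); xtrue = ones(8,1); b = A*xtrue;  % assumed test problem
[L, U, P] = lu(A);
x = U\(L\(P*b));        % initial solution via LU factorization
r = b - A*x;            % residual (ideally computed in extended precision)
dx = U\(L\(P*r));       % correction (3.57), reusing the same factorization
x = x + dx;             % improved solution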
Remark 3.14 Iterative improvement is not limited to the solution of linear systems
of equations via LU factorization.
3.6.5 QR Factorization
Any (n × n) invertible matrix A can be factored as
A = QR, (3.58)
where Q is an (n × n) orthonormal matrix, such that QTQ = In, and R is an (n × n)
invertible upper triangular matrix (which tradition persists in calling R instead of
U...). This QR factorization is unique if one imposes that the diagonal entries of R
are positive, which is not mandatory. It can be carried out in a finite number of steps.
In MATLAB, this is achieved by the instruction [Q,R]=qr(A).
Multiply (3.1) on the left by QT while taking (3.58) into account, to get
Rx = Q^T b, (3.59)
which is easy to solve for x, as R is triangular.
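In MATLAB, the resulting procedure may be sketched as follows (example data arbitrary):

A = rand(5); b = rand(5,1);  % arbitrary example data
[Q, R] = qr(A);
x = R\(Q'*b);                % backward substitution on (3.59)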
For the spectral norm, the condition number of R is the same as that of A, since
A^T A = (QR)^T QR = R^T Q^T QR = R^T R. (3.60)
QR factorization therefore does not worsen conditioning. This is an advantage
over LU factorization, which comes at the cost of more computation.
Remark 3.15 Contrary to LU factorization, QR factorization also applies to rectan-
gular matrices, and will prove extremely useful in the solution of linear least-squares
problems, see Sect.9.2.3.
At least in principle, Gram–Schmidt orthogonalization could be used to carry out
QR factorization, but it suffers from numerical instability when the columns of A are
close to being linearly dependent. This is why the more robust approach presented
in the next section is usually preferred, although a modified Gram-Schmidt method
could also be employed [10].
3.6.5.1 Householder Transformation
The basic tool for QR factorization is the Householder transformation, described by
the eponymous matrix
H(v) = I − 2 vv^T/(v^T v), (3.61)
where v is a vector to be chosen. The vector H(v)x is the mirror image of x with respect to the hyperplane passing through the origin O and orthogonal to v (Fig. 3.1).
The matrix H(v) is symmetric and orthonormal. Thus
H(v) = H^T(v) and H^T(v)H(v) = I, (3.62)
which implies that
H^{−1}(v) = H(v). (3.63)
[Fig. 3.1 Householder transformation: x is mapped onto H(v)x = x − 2(vv^T/(v^T v))x, its mirror image with respect to the hyperplane through the origin O and orthogonal to v]
Moreover, since v is an eigenvector of H(v) associated with the eigenvalue −1 and
all the other eigenvectors of H(v) are associated with the eigenvalue 1,
det H(v) = −1. (3.64)
This property will be useful when computing determinants in Sect.4.2.
Assume that v is chosen as
v = x ± ||x||_2 e^1, (3.65)
where e^1 is the vector corresponding to the first column of the identity matrix, and
where the ± sign indicates that either sign may be chosen. The following
proposition makes it possible to use H(v) to transform x into a vector with all of its
entries equal to zero except for the first one.
Proposition 3.1 If
H(+) = H(x + ||x||_2 e^1) (3.66)
and
H(−) = H(x − ||x||_2 e^1), (3.67)
then
H(+)x = −||x||_2 e^1 (3.68)
and
H(−)x = +||x||_2 e^1. (3.69)
Proof If v = x ± ||x||_2 e^1 then
v^T v = x^T x + ||x||_2^2 (e^1)^T e^1 ± 2||x||_2 x_1 = 2(||x||_2^2 ± ||x||_2 x_1) = 2v^T x. (3.70)
So
H(v)x = x − 2v (v^T x / v^T v) = x − v = ∓||x||_2 e^1. (3.71)
Among H(+) and H(−), one should choose
H_best = H(x + sign(x_1)||x||_2 e^1), (3.72)
to protect oneself against the risk of having to compute the difference of floating-point numbers that are close to one another. In practice, the matrix H(v) is not formed. One computes instead the scalar
δ = 2 v^T x / v^T v, (3.73)
and the vector
H(v)x = x − δv. (3.74)
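A sketch of (3.72)–(3.74) on an arbitrary example vector, without ever forming H(v):

x = [3; 1; 2];                     % arbitrary example vector
v = x;
v(1) = v(1) + sign(x(1))*norm(x);  % v as in (3.72)
delta = 2*(v'*x)/(v'*v);           % scalar of (3.73)
Hx = x - delta*v                   % H(v)x as in (3.74): -sign(x1)*||x||_2*e1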
3.6.5.2 Combining Householder Transformations
A is triangularized by submitting it to a series of Householder transformations, as
follows.
Start with A0 = A.
Compute A1 = H1A0, where H1 is a Householder matrix that transforms the first
column of A0 into the first column of A1, all the entries of which are zeros except
for the first one. Based on Proposition 3.1, take
H_1 = H(a^1 + sign(a^1_1)||a^1||_2 e^1), (3.75)
where a1 is the first column of A0.
Iterate to get
Ak+1 = Hk+1Ak, k = 1, . . . , n − 2. (3.76)
Hk+1 is in charge of shaping the (k +1)-st column of Ak while leaving the k columns
to its left unchanged. Let ak+1 be the vector consisting of the last (n − k) entries
of the (k + 1)-st column of Ak. The Householder transformation must modify only
ak+1, so
H_{k+1} = \begin{bmatrix} I_k & 0 \\ 0 & H(a^{k+1} + \mathrm{sign}(a^{k+1}_1)\,\|a^{k+1}\|_2\, e^1) \end{bmatrix}. (3.77)
In the next equation, for instance, the top and bottom entries of a^3 are indicated by the symbol × (the matrix shown is the one to which H_3 is about to be applied):
A_2 = \begin{bmatrix}
\bullet & \bullet & \bullet & \cdots & \bullet \\
0 & \bullet & \bullet & \cdots & \bullet \\
\vdots & 0 & \times & & \vdots \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\vdots & \vdots & \vdots & & \bullet \\
0 & 0 & \times & \cdots & \bullet
\end{bmatrix}. (3.78)
In (3.77), e1 has the same dimension as ak+1 and all its entries are again zero, except
for the first one, which is equal to one.
At each iteration, the matrix H(+) or H(−) that leads to the more stable numerical
computation is selected, see (3.72). Finally
R = Hn−1Hn−2 · · · H1A, (3.79)
or equivalently
A = (H_{n−1}H_{n−2} · · · H_1)^{−1} R = H_1^{−1} H_2^{−1} · · · H_{n−1}^{−1} R = QR. (3.80)
Take (3.63) into account to get
Q = H1H2 · · · Hn−1. (3.81)
Instead of using Householder transformations, one may implement QR factoriza-
tion via Givens rotations [2], which are also robust, orthonormal transformations,
but this makes computation more complex without improving performance.
3.6.6 Singular Value Decomposition
Singular value decomposition (SVD) [11] has turned out to be one of the most fruitful ideas in the theory of matrices [12]. Although it is mainly used on rectangular matrices (see Sect. 9.2.4, where the procedure is explained in more detail), it can also be applied to any square matrix A, which it transforms into a product of three square matrices
A = UΣV^T. (3.82)
U and V are orthonormal, i.e.,
U^T U = V^T V = I, (3.83)
which makes their inversion particularly easy, as
U^{−1} = U^T and V^{−1} = V^T. (3.84)
Σ is a diagonal matrix, with diagonal entries equal to the singular values of A, so
cond A for the spectral norm is trivial to evaluate from the SVD. In this chapter, A is
assumed to be invertible, which implies that no singular value is zero and Σ is invertible. In MATLAB, the SVD of A is achieved by the instruction [U,S,V]=svd(A).
Equation (3.1) translates into
UΣV^T x = b, (3.85)
so
x = VΣ^{−1}U^T b, (3.86)
with Σ^{−1} trivial to evaluate as Σ is diagonal. As SVD is significantly more complex
than QR factorization, one may prefer the latter.
When cond A is too large, solving (3.1) becomes impossible using floating-point
numbers, even via QR factorization. A better approximate solution may then be
obtained by replacing (3.86) by
x̂ = VΣ̂^{−1}U^T b, (3.87)
where Σ̂^{−1} is a diagonal matrix such that
Σ̂^{−1}_{i,i} = 1/σ_{i,i} if σ_{i,i} > δ, and 0 otherwise, (3.88)
with δ a positive threshold to be chosen by the user. This amounts to replacing
any singular value of A that is smaller than δ by zero, thus pretending that (3.1)
has infinitely many solutions, and then picking up the solution with the smallest
Euclidean norm. See Sect. 9.2.6 for more details on this regularization approach in
the context of least squares. This approach should be used with a lot of caution here,
however, as the quality of the approximate solution x̂ provided by (3.87) depends
heavily on the value taken by b. Assume, for instance, that A is symmetric positive
definite, and that b is an eigenvector of A associated with some very small eigenvalue
λ_b, such that ||b||_2 = 1. The mathematical solution of (3.1)
x = (1/λ_b) b (3.89)
then has a very large Euclidean norm, and should thus be completely different from
x̂, as the eigenvalue λ_b is also a (very small) singular value of A and 1/λ_b will be
replaced by zero in the computation of x̂. Examples of ill-posed problems for which
regularization via SVD gives interesting results are in [13].
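The regularized solution (3.87)–(3.88) may be sketched as follows; the test problem and the threshold are assumptions chosen for illustration:

A = hilb(10); b = ones(10,1);        % assumed ill-conditioned test problem
delta = 1e-8;                        % threshold of (3.88), to be chosen by the user
[U, S, V] = svd(A);
s = diag(S);
sinv = zeros(size(s));
sinv(s > delta) = 1./s(s > delta);   % invert only singular values above delta
xhat = V*diag(sinv)*(U'*b);          % regularized solution (3.87)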
3.7 Iterative Methods
In very large-scale problems such as those involved in the solution of partial dif-
ferential equations, A is typically sparse, which should be taken advantage of. The
direct methods in Sect.3.6 become difficult to use, because sparsity is usually lost
during the factorization of A. One may then use sparse direct solvers (not presented
here), which permute equations and unknowns in an attempt to minimize fill-in in
the factors. This is a complex optimization problem in itself, so iterative methods are
an attractive alternative [2, 14].
3.7.1 Classical Iterative Methods
These methods are slow and now seldom used, but simple to understand. They serve
as an introduction to the more modern Krylov subspace iteration of Sect.3.7.2.
3.7.1.1 Principle
To solve (3.1) for x, decompose A into a sum of two matrices
A = A_1 + A_2, (3.90)
with A_1 (easily) invertible, so as to ensure
x = −A_1^{−1}A_2 x + A_1^{−1}b. (3.91)
Define M = −A_1^{−1}A_2 and v = A_1^{−1}b to get
x = Mx + v. (3.92)
The idea is to choose the decomposition (3.90) in such a way that the recursion
x^{k+1} = Mx^k + v (3.93)
converges to the solution of (3.1) when k tends to infinity. This will be the case if
and only if all the eigenvalues of M are strictly inside the unit circle.
The methods considered below differ in how A is decomposed. We assume that
all diagonal entries of A are nonzero, and write
A = D + L + U, (3.94)
where D is a diagonal invertible matrix with the same diagonal entries as A, L is
a lower triangular matrix with zero main descending diagonal, and U is an upper
triangular matrix also with zero main descending diagonal.
3.7.1.2 Jacobi Iteration
In the Jacobi iteration, A1 = D and A2 = L + U, so
M = −D^{−1}(L + U) and v = D^{−1}b. (3.95)
The scalar interpretation of this method is as follows. The jth row of (3.1) is
Σ_{i=1}^{n} a_{j,i} x_i = b_j. (3.96)
Since a_{j,j} ≠ 0 by hypothesis, it can be rewritten as
x_j = (b_j − Σ_{i≠j} a_{j,i} x_i)/a_{j,j}, (3.97)
which expresses x_j as a function of the other unknowns. A Jacobi iteration computes
x_j^{k+1} = (b_j − Σ_{i≠j} a_{j,i} x_i^k)/a_{j,j}, j = 1, . . . , n. (3.98)
A sufficient condition for convergence to the solution x⋆ of (3.1) (whatever the initial
vector x^0) is that A be diagonally dominant. This condition is not necessary, and
convergence may take place under less restrictive conditions.
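A minimal sketch of the Jacobi iteration (3.98) in matrix form, on an assumed diagonally dominant example (tolerance and iteration cap arbitrary):

A = [4 -1 0; -1 4 -1; 0 -1 4]; b = [1; 2; 3];  % diagonally dominant example
D = diag(diag(A)); x = zeros(3,1);             % start from x0 = 0
for k = 1:100
    x = D\(b - (A - D)*x);                     % Jacobi update (3.98)
    if norm(b - A*x) < 1e-10, break, end
end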
3.7.1.3 Gauss–Seidel Iteration
In the Gauss–Seidel iteration, A1 = D + L and A2 = U, so
M = −(D + L)^{−1}U and v = (D + L)^{−1}b. (3.99)
The scalar interpretation becomes
x_j^{k+1} = (b_j − Σ_{i=1}^{j−1} a_{j,i} x_i^{k+1} − Σ_{i=j+1}^{n} a_{j,i} x_i^k)/a_{j,j}, j = 1, . . . , n. (3.100)
Note the presence of x_i^{k+1} on the right-hand side of (3.100). The components of x^{k+1}
that have already been evaluated are thus used in the computation of those that have
not. This speeds up convergence and makes it possible to save memory space.
Remark 3.16 The behavior of the Gauss–Seidel method depends on how the vari-
ables are ordered in x, contrary to what happens with the Jacobi method.
As with the Jacobi method, a sufficient condition for convergence to the solution
x⋆ of (3.1) (whatever the initial vector x^0) is that A be diagonally dominant. This
condition is again not necessary, and convergence may take place under less restrictive
conditions.
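The corresponding Gauss–Seidel sweep (3.100) may be sketched on the same kind of assumed example; note that x(j) is overwritten as soon as it is recomputed:

A = [4 -1 0; -1 4 -1; 0 -1 4]; b = [1; 2; 3]; x = zeros(3,1);
n = length(b);
for k = 1:100
    for j = 1:n   % sweep (3.100), reusing the updated entries at once
        x(j) = (b(j) - A(j,1:j-1)*x(1:j-1) - A(j,j+1:n)*x(j+1:n))/A(j,j);
    end
    if norm(b - A*x) < 1e-10, break, end
end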
3.7.1.4 Successive Over-Relaxation
The successive over-relaxation method (SOR) was developed in the context of solving
partial differential equations [15]. It rewrites (3.1) as
(D + ωL)x = ωb − [ωU + (ω − 1)D]x, (3.101)
where ω ≠ 0 is the relaxation factor, and iterates solving
(D + ωL)x^{k+1} = ωb − [ωU + (ω − 1)D]x^k (3.102)
for x^{k+1}. As D + ωL is lower triangular, this is done by forward substitution, and is
equivalent to writing
x_j^{k+1} = (1 − ω)x_j^k + ω (b_j − Σ_{i=1}^{j−1} a_{j,i} x_i^{k+1} − Σ_{i=j+1}^{n} a_{j,i} x_i^k)/a_{j,j}, j = 1, . . . , n. (3.103)
As a result,
x^{k+1} = (1 − ω)x^k + ωx_{GS}^{k+1}, (3.104)
where x_{GS}^{k+1} is the approximation of the solution x⋆ suggested by the Gauss–Seidel
iteration.
A necessary condition for convergence is ω ∈ (0, 2). For ω = 1, the Gauss–
Seidel method is recovered. When ω < 1 the method is under-relaxed, whereas it is
over-relaxed if ω > 1. The optimal value of ω depends on A, but over-relaxation is
usually preferred, where the displacements suggested by the Gauss–Seidel method
are increased. The convergence of the Gauss–Seidel method may thus be accelerated
by extrapolating on iteration results. Methods are available to adapt ω based on past
behavior. They have largely lost their interest with the advent of Krylov subspace
iteration, however.
3.7.2 Krylov Subspace Iteration
Krylov subspace iteration [16, 17] has superseded classical iterative approaches,
which may turn out to be very slow or even fail to converge. It was dubbed in [18]
one of the ten algorithms with the greatest influence on the development and practice
of science and engineering in the twentieth century.
3.7.2.1 From Jacobi to Krylov
Jacobi iteration has
x^{k+1} = −D^{−1}(L + U)x^k + D^{−1}b. (3.105)
Equation (3.94) implies that L + U = A − D, so
x^{k+1} = (I − D^{−1}A)x^k + D^{−1}b. (3.106)
Since the true solution x⋆ = A^{−1}b is unknown, the error
δx^k = x^k − x⋆ (3.107)
cannot be computed, and the residual
r^k = b − Ax^k = −A(x^k − x⋆) = −Aδx^k (3.108)
is used instead to characterize the quality of the approximate solution obtained so
far. Normalize the system of equations to be solved to ensure that D = I. Then
x^{k+1} = (I − A)x^k + b = x^k + r^k. (3.109)
Subtract x⋆ from both sides of (3.109), and left-multiply the result by −A to get
r^{k+1} = r^k − Ar^k. (3.110)
The recursion (3.110) implies that
r^k ∈ span{r^0, Ar^0, . . . , A^k r^0}, (3.111)
and (3.109) then implies that
x^k − x^0 = Σ_{i=0}^{k−1} r^i. (3.112)
Therefore,
x^k ∈ x^0 + span{r^0, Ar^0, . . . , A^{k−1}r^0}, (3.113)
where span{r^0, Ar^0, . . . , A^{k−1}r^0} is the kth Krylov subspace generated by A from
r^0, denoted by K_k(A, r^0).
Remark 3.17 The definition of Krylov subspaces implies that
K_{k−1}(A, r^0) ⊂ K_k(A, r^0), (3.114)
and that each iteration increases the dimension of the search space at most by one.
Assume, for instance, that x^0 = 0, which implies that r^0 = b, and that b is an
eigenvector of A such that
Ab = λb. (3.115)
Then
∀k ≥ 1, span{r^0, Ar^0, . . . , A^{k−1}r^0} = span{b}. (3.116)
This is appropriate, as the solution is x = λ^{−1}b.
Remark 3.18 Let P_n(λ) be the characteristic polynomial of A,
P_n(λ) = det(A − λI_n). (3.117)
The Cayley–Hamilton theorem states that P_n(A) is the zero (n × n) matrix. In other
words, A^n is a linear combination of A^{n−1}, A^{n−2}, . . . , I_n, so
∀k ≥ n, K_k(A, r^0) = K_n(A, r^0), (3.118)
and the dimension of the space in which search takes place does not increase after
the first n iterations.
A crucial point, not proved here, is that there exists ν ≤ n such that
x⋆ ∈ x^0 + K_ν(A, r^0). (3.119)
In principle, one may thus hope to get the solution in no more than n = dim x
iterations in Krylov subspaces, whereas for Jacobi, Gauss–Seidel or SOR iterations
no such bound is available. In practice, with floating-point computations, one may
still get better results by iterating until the solution is deemed satisfactory.
3.7.2.2 A Is Symmetric Positive Definite
When A ≻ 0, conjugate-gradient methods [19–21] are the iterative approach of
choice to this day. The approximate solution is sought for by minimizing
J(x) = (1/2) x^T Ax − b^T x. (3.120)
Using theoretical optimality conditions presented in Sect. 9.1, it is easy to show that
the unique minimizer of this cost function is indeed x̂ = A^{−1}b. Starting from x^k,
the approximation of x⋆ at iteration k, x^{k+1} is computed by line search along some
direction d^k as
x^{k+1}(α_k) = x^k + α_k d^k. (3.121)
It is again easy to show that J(x^{k+1}(α_k)) is minimum if
α_k = (d^k)^T(b − Ax^k) / (d^k)^T A d^k. (3.122)
The search direction d^k is taken so as to ensure that
(d^i)^T A d^k = 0, i = 0, . . . , k − 1, (3.123)
which means that it is conjugate with respect to A (or A-orthogonal) with all the
previous search directions. With exact computation, this would ensure convergence
to x̂ in at most n iterations. Because of the effect of rounding errors, it may be useful
to allow more than n iterations, although n may be so large that n iterations is actually
more than can be achieved. (One often gets a useful approximation of the solution
in less than n iterations.)
After n iterations,
x^n = x^0 + Σ_{i=0}^{n−1} α_i d^i, (3.124)
so
x^n ∈ x^0 + span{d^0, . . . , d^{n−1}}. (3.125)
A Krylov-space solver is obtained if the search directions are such that
span{d^0, . . . , d^i} = K_{i+1}(A, r^0), i = 0, 1, . . . (3.126)
This can be achieved with an amazingly simple algorithm [19, 21], summarized in
Table 3.1. See also Sect. 9.3.4.6 and Example 9.8.
Table 3.1 Krylov-space solver
  r^0 := b − Ax^0,
  d^0 := r^0,
  δ_0 := ||r^0||_2^2,
  k := 0.
  While ||r^k||_2 > tol, compute
    δ′_k := (d^k)^T Ad^k,
    α_k := δ_k/δ′_k,
    x^{k+1} := x^k + α_k d^k,
    r^{k+1} := r^k − α_k Ad^k,
    δ_{k+1} := ||r^{k+1}||_2^2,
    β_k := δ_{k+1}/δ_k,
    d^{k+1} := r^{k+1} + β_k d^k,
    k := k + 1.

Remark 3.19 The notation := in Table 3.1 means that the variable on the left-hand
side is assigned the value resulting from the evaluation of the expression on the
right-hand side. It should not be confused with the equal sign, and one may write
k := k + 1 whereas k = k + 1 would make no sense. In MATLAB and a number of
other programming languages, however, the sign = is used instead of :=.
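A direct MATLAB transcription of Table 3.1 may be sketched as follows; the test data are assumptions, and in practice one would rather call the built-in function pcg, which implements a preconditioned version of this iteration:

A = [4 -1 0; -1 4 -1; 0 -1 4]; b = [1; 2; 3];  % assumed symmetric positive definite A
x = zeros(3,1); tol = 1e-10;
r = b - A*x; d = r; delta = r'*r;
while sqrt(delta) > tol            % i.e., ||r||_2 > tol
    Ad = A*d;
    alpha = delta/(d'*Ad);         % step size of (3.122)
    x = x + alpha*d;
    r = r - alpha*Ad;
    deltaplus = r'*r;
    d = r + (deltaplus/delta)*d;   % beta_k = delta_{k+1}/delta_k
    delta = deltaplus;
end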
3.7.2.3 A Is Not Symmetric Positive Definite
This is a much more complicated and costly situation. Specific methods, not detailed
here, have been developed for symmetric matrices that are not positive definite [22],
as well as for nonsymmetric matrices [23, 24].
3.7.2.4 Preconditioning
The convergence speed of Krylov iteration strongly depends on the condition number
of A. Spectacular acceleration may be achieved by replacing (3.1) by
MAx = Mb, (3.127)
where M is a suitably chosen preconditioning matrix, and a considerable amount of
research has been devoted to this topic [25, 26]. As a result, modern preconditioned
Krylov methods converge much faster and for a much wider class of matrices than
the classical iterative methods of Sect.3.7.1.
One possible approach for choosing M is to look for a sparse approximation of
the inverse of A by solving
M̂ = arg min_{M ∈ S} ||I_n − AM||_F, (3.128)
where || · ||_F is the Frobenius norm and S is a set of sparse matrices to be specified.
Since
||I_n − AM||_F^2 = Σ_{j=1}^{n} ||e^j − Am^j||_2^2, (3.129)
where ej is the jth column of In and mj the jth column of M, computing M can be
split into solving n independent least-squares problems (one per column), subject to
sparsity constraints. The nonzero entries of mj are then obtained by solving a small
unconstrained linear least-squares problem (see Sect.9.2). The computation of the
columns of M̂ is thus easily parallelized. The main difficulty is a proper choice for S,
which may be carried out by adaptive strategies [27]. One may start with M diagonal,
or with the same sparsity pattern as A.
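As a sketch of the simplest such choice, a diagonal (Jacobi) preconditioner may be handed to MATLAB's pcg; the test matrix, from a discretized Poisson equation, is an assumption:

A = gallery('poisson', 30);                     % sparse SPD test matrix
b = ones(size(A,1), 1);
M = spdiags(diag(A), 0, size(A,1), size(A,1));  % diagonal (Jacobi) preconditioner
x = pcg(A, b, 1e-8, 500, M);                    % preconditioned conjugate gradients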
Remark 3.20 Preconditioning may also be used with direct methods.
3.8 Taking Advantage of the Structure of A
This section describes important special cases where the structure of A suggests
dedicated algorithms, as in Sect.3.7.2.2.
3.8.1 A Is Symmetric Positive Definite
When A is real, symmetric and positive definite, i.e.,
v^T Av > 0 ∀v ≠ 0, (3.130)
its LU factorization is particularly easy as there is a unique lower triangular matrix
L such that
A = LL^T, (3.131)
with lk,k > 0 for all k (lk,k is no longer taken equal to 1). Thus U = LT, and we
could just as well write
A = U^T U. (3.132)
This factorization, known as Cholesky factorization [28], is readily obtained by iden-
tifying the two sides of (3.131). No pivoting is ever necessary, because the entries of
L must satisfy
Σ_{i=1}^{k} l_{i,k}^2 = a_{k,k}, k = 1, . . . , n, (3.133)
and are therefore bounded. As Cholesky factorization fails if A is not positive definite,
it can also be used to test symmetric matrices for positive definiteness, which is
preferable to computing the eigenvalues of A. In MATLAB, one may use U=chol(A)
or L=chol(A,'lower').
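Solving (3.1) from the Cholesky factor then takes one forward and one backward substitution, as in this sketch with assumed data:

A = [4 1 0; 1 4 1; 0 1 4]; b = [1; 2; 3];  % assumed symmetric positive definite A
R = chol(A);    % A = R'*R, with R upper triangular
x = R\(R'\b);   % forward substitution with R', backward substitution with R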
When A is also large and sparse, see Sect.3.7.2.2.
3.8.2 A Is Toeplitz
When all the entries in any given descending diagonal of A have the same value, i.e.,
A = \begin{bmatrix}
h_0 & h_{-1} & h_{-2} & \cdots & h_{-n+1} \\
h_1 & h_0 & h_{-1} & & h_{-n+2} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
h_{n-2} & & \ddots & h_0 & h_{-1} \\
h_{n-1} & h_{n-2} & \cdots & h_1 & h_0
\end{bmatrix}, (3.134)
as in deconvolution problems, A is Toeplitz. The Levinson–Durbin algorithm (not
presented here) can then be used to get solutions that are recursive on the dimension
m of the solution vector xm, with xm expressed as a function of xm−1.
3.8.3 A Is Vandermonde
When
A = \begin{bmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^n \\
1 & t_2 & t_2^2 & \cdots & t_2^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_{n+1} & t_{n+1}^2 & \cdots & t_{n+1}^n
\end{bmatrix}, (3.135)
it is said to be Vandermonde. Such matrices, encountered for instance in polynomial
interpolation, are ill-conditioned for large n, which calls for numerically robust meth-
ods or a reformulation of the problem to avoid Vandermonde matrices altogether.
3.8.4 A Is Sparse
A is sparse when most of its entries are zeros. This is particularly frequent when a
partial differential equation is discretized, as each node is influenced only by its close
neighbors. Instead of storing the entire matrix A, one may then use more economical sparse storage schemes, in which only the nonzero entries and their locations are kept.
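In MATLAB, for instance, sparse storage and sparse solves may be sketched as follows (the test matrix is an assumption):

A = gallery('poisson', 50);   % sparse 2500x2500 matrix, stored in compressed form
b = ones(size(A,1), 1);
nnz(A)                        % number of nonzero entries actually stored
x = A\b;                      % the backslash operator exploits the sparsity pattern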
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)
Ric walter (auth.) numerical methods and optimization  a consumer guide-springer international publishing (2014)

Ric walter (auth.) numerical methods and optimization a consumer guide-springer international publishing (2014)

Chapter 1
From Calculus to Computation

High-school education has led us to view problem solving in physics and chemistry as the process of elaborating explicit closed-form solutions in terms of unknown parameters, and then using these solutions in numerical applications for specific numerical values of these parameters. As a result, we were only able to consider a very limited set of problems that were simple enough for us to find such closed-form solutions.

Unfortunately, most real-life problems in pure and applied sciences are not amenable to such an explicit mathematical solution. One must then often move from formal calculus to numerical computation. This is particularly obvious in engineering, where computer-aided design based on numerical simulations is the rule. This book is about numerical computation, and says next to nothing about formal computation as made possible by computer algebra, although they usefully complement one another.

Using floating-point approximations of real numbers means that approximate operations are carried out on approximate numbers. To protect oneself against potential numerical disasters, one should then select methods that keep final errors as small as possible. It turns out that many of the methods learnt in high school or college to solve elementary mathematical problems are ill suited to floating-point computation and should be replaced.

Shifting paradigm from calculus to computation, we will attempt to
• discover how to escape the dictatorship of those particular cases that are simple enough to receive a closed-form solution, and thus gain the ability to solve complex, real-life problems,
• understand the principles behind recognized methods used in state-of-the-art numerical software,
• stress the advantages and limitations of these methods, thus gaining the ability to choose what pre-existing bricks to assemble for solving a given problem.

Presentation is at an introductory level, nowhere near the level of detail required for implementing methods efficiently. Our main aim is to help the reader become a better consumer of numerical methods, with some ability to choose among those available for a given task, some understanding of what they can and cannot do, and some power to perform a critical appraisal of the validity of their results.

By the way, the desire to write down every line of the code one plans to use should be resisted. So much time and effort have been spent polishing code that implements standard numerical methods that the probability one might do better seems remote at best. Coding should be limited to what cannot be avoided or can be expected to improve on the state of the art in easily available software (a tall order). One will thus save time to think about the big picture:
• what is the actual problem that I want to solve? (As Richard Hamming puts it [1]: Computing is, or at least should be, intimately bound up with both the source of the problem and the use that is going to be made of the answers - it is not a step to be taken in isolation.)
• how can I put this problem in mathematical form without betraying its meaning?
• how should I split the resulting mathematical problem into well-defined and numerically achievable subtasks?
• what are the advantages and limitations of the numerical methods readily available for these subtasks?
• should I choose among these methods or find an alternative route?
• what is the most efficient use of my resources (time, computers, libraries of routines, etc.)?
• how can I check the quality of my results?
• what measures should I take, if it turns out that my choices have failed to yield a satisfactory solution to the initial problem?

A deservedly popular series of books on numerical algorithms [2] includes Numerical Recipes in their titles. Carrying on with this culinary metaphor, one should get a much more sophisticated dinner by choosing and assembling proper dishes from the menu of easily available scientific routines than by making up the equivalent of a turkey sandwich with mayo in one's numerical kitchen. To take another analogy, electrical engineers tend to avoid building systems from elementary transistors, capacitors, resistors and inductors when they can take advantage of carefully designed, readily available integrated circuits.

Deciding not to code algorithms for which professional-grade routines are available does not mean we have to treat them as magical black boxes, so the basic principles behind the main methods for solving a given class of problems will be explained.

The level of mathematical proficiency required to read what follows is a basic understanding of linear algebra as taught in introductory college courses. It is hoped that those who hate mathematics will find here reasons to reconsider their position in view of how useful it turns out to be for the solution of real-life problems, and that those who love it will forgive me for daring simplifications and discover fascinating, practical aspects of mathematics in action. The main ingredients will be classical Cuisine Bourgeoise, with a few words about recipes best avoided, and a dash of Nouvelle Cuisine.
1.1 Why Not Use Naive Mathematical Methods?

There are at least three reasons why naive methods learnt in high school or college may not be suitable.

1.1.1 Too Many Operations

Consider a (not-so-common) problem for which an algorithm is available that would give an exact solution in a finite number of steps if all of the operations required were carried out exactly. A first reason why such an exact finite algorithm may not be suitable is when it requires an unnecessarily large number of operations.

Example 1.1 Evaluating determinants
Evaluating the determinant of a dense $(n \times n)$ matrix $A$ by cofactor expansion requires more than $n!$ floating-point operations (or flops), whereas methods based on a factorization of $A$ do so in about $n^3$ flops. For $n = 100$, for instance, $n!$ is slightly less than $10^{158}$, when the number of atoms in the observable universe is estimated to be less than $10^{81}$, and $n^3 = 10^6$.
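To make the cost gap of Example 1.1 concrete, here is a minimal MATLAB sketch (not from the book; the function name detcofactor and the test size are mine) of determinant evaluation by cofactor expansion along the first row:

  % detcofactor.m -- determinant by cofactor expansion along the first row.
  % For illustration only: its cost grows like n!, so it is unusable
  % beyond very small n.
  function d = detcofactor(A)
  n = size(A,1);
  if n == 1
      d = A(1,1);
      return
  end
  d = 0;
  for j = 1:n
      minor = A(2:n, [1:j-1, j+1:n]);  % delete row 1 and column j
      d = d + (-1)^(1+j) * A(1,j) * detcofactor(minor);
  end
  end

A quick experiment such as A = randn(10); tic, detcofactor(A), toc, tic, det(A), toc should already show a difference of several orders of magnitude, det being based on a factorization of A.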
1.1.2 Too Sensitive to Numerical Errors

Because they were developed without taking the effect of rounding into account, classical methods for solving numerical problems may yield totally erroneous results in a context of floating-point computation.

Example 1.2 Evaluating the roots of a second-order polynomial equation
The solutions $x_1$ and $x_2$ of the equation
$$a x^2 + b x + c = 0 \qquad (1.1)$$
are to be evaluated, with $a$, $b$, and $c$ known floating-point numbers such that $x_1$ and $x_2$ are real numbers. We have learnt in high school that
$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}. \qquad (1.2)$$
This is an example of a verifiable algorithm, as it suffices to check that the value of the polynomial at $x_1$ or $x_2$ is zero to ensure that $x_1$ or $x_2$ is a solution.

This algorithm is suitable as long as it does not involve computing the difference between two floating-point numbers that are close to one another, as would happen if $|4ac|$ were too small compared to $b^2$. Such a difference may be numerically disastrous, and should be avoided. To this end, one may use the following algorithm, which is also verifiable and takes benefit from the fact that $x_1 x_2 = c/a$:
$$q = \frac{-b - \operatorname{sign}(b)\sqrt{b^2 - 4ac}}{2}, \qquad (1.3)$$
$$x_1 = \frac{q}{a}, \quad x_2 = \frac{c}{q}. \qquad (1.4)$$
Although these two algorithms are mathematically equivalent, the second one is much more robust to errors induced by floating-point operations than the first (see Sect. 14.7 for a numerical comparison). This does not, however, solve the problem that appears when $x_1$ and $x_2$ tend toward one another, as $b^2 - 4ac$ then tends to zero. We will encounter many similar situations, where naive algorithms need to be replaced by more robust or less costly variants.
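The two algorithms of Example 1.2 are easily compared in MATLAB. The following script is a tentative illustration (the numerical values are mine); it takes b large and |4ac| small, so that the naive formula (1.2) subtracts two nearly equal numbers, while (1.3)-(1.4) avoid this cancellation. Note that sign(0) is 0 in MATLAB, so the robust variant as written assumes b is nonzero.

  % Roots of a*x^2 + b*x + c = 0 when |4*a*c| << b^2
  a = 1; b = 1e8; c = 1;
  sq = sqrt(b^2 - 4*a*c);
  r_naive = [(-b + sq)/(2*a), (-b - sq)/(2*a)];  % formula (1.2)
  q = (-b - sign(b)*sq)/2;                       % formula (1.3)
  r_robust = [q/a, c/q];                         % formula (1.4)
  % Both algorithms are verifiable: evaluate the polynomial at the roots
  polyval([a b c], r_naive)   % large residual at the root hit by cancellation
  polyval([a b c], r_robust)  % residuals close to zero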
1.1.3 Unavailable

Quite frequently, there is no mathematical method for finding the exact solution of the problem of interest. This will be the case, for instance, for most simulation or optimization problems, as well as for most systems of nonlinear equations.

1.2 What to Do, Then?

Mathematics should not be abandoned along the way, as it plays a central role in deriving efficient numerical algorithms. Finding amazingly accurate approximate solutions often becomes possible when the specificity of computing with floating-point numbers is taken into account.

1.3 How Is This Book Organized?

Simple problems are addressed first, before moving on to more ambitious ones, building on what has already been presented. The order of presentation is as follows:
• notation and basic notions,
• algorithms for linear algebra (solving systems of linear equations, inverting matrices, computing eigenvalues, eigenvectors, and determinants),
• interpolating and extrapolating,
• integrating and differentiating,
• solving systems of nonlinear equations,
• optimizing when there is no constraint,
• optimizing under constraints,
• solving ordinary differential equations,
• solving partial differential equations,
• assessing the precision of numerical results.

This classification is not tight. It may be a good idea to transform a given problem into another one. Here are a few examples:
• to find the roots of a polynomial equation, one may look for the eigenvalues of a matrix, as in Example 4.3,
• to evaluate a definite integral, one may solve an ordinary differential equation, as in Sect. 6.2.4,
• to solve a system of equations, one may minimize a norm of the deviation between the left- and right-hand sides, as in Example 9.8,
• to solve an unconstrained optimization problem, one may introduce new variables and impose constraints, as in Example 10.7.

Most of the numerical methods selected for presentation are important ingredients in professional-grade numerical code. Exceptions are
• methods based on ideas that easily come to mind but are actually so bad that they need to be denounced, as in Example 1.1,
• prototype methods that may help one understand more sophisticated approaches, as when one-dimensional problems are considered before the multivariate case,
• promising methods mostly available at present from academic research institutions, such as methods for guaranteed optimization and simulation.

MATLAB is used to demonstrate, through simple yet not necessarily trivial examples typeset in typewriter, how easily classical methods can be put to work. It would be hazardous, however, to draw conclusions on the merits of these methods on the sole basis of these particular examples. The reader is invited to consult the MATLAB documentation for more details about the functions available and their optional arguments. Additional information, including illuminating examples, can be found in [3], with ancillary material available on the WEB, and [4]. Although MATLAB is the only programming language used in this book, it is not appropriate for solving all numerical problems in all contexts. A number of potentially interesting alternatives will be mentioned in Chap. 15.

This book concludes with a chapter about WEB resources that can be used to go further and a collection of problems. Most of these problems build on material pertaining to several chapters and could easily be translated into computer-lab work.

This book was typeset with TeXmacs before exportation to LaTeX. Many thanks to Joris van der Hoeven and his coworkers for this awesome and truly WYSIWYG piece of software, freely downloadable at http://www.texmacs.org/.

References
1. Hamming, R.: Numerical Methods for Scientists and Engineers. Dover, New York (1986)
2. Press, W., Flannery, B., Teukolsky, S., Vetterling, W.: Numerical Recipes. Cambridge University Press, Cambridge (1986)
3. Moler, C.: Numerical Computing with MATLAB, revised, reprinted edn. SIAM, Philadelphia (2008)
4. Ascher, U., Greif, C.: A First Course in Numerical Methods. SIAM, Philadelphia (2011)
Chapter 2
Notation and Norms

2.1 Introduction

This chapter recalls the usual convention for distinguishing scalars, vectors, and matrices. Vetter's notation for matrix derivatives is then explained, as well as the meaning of the expressions little o and big O employed for comparing the local or asymptotic behaviors of functions. The most important vector and matrix norms are finally described. Norms find a first application in the definition of types of convergence speeds for iterative algorithms.

2.2 Scalars, Vectors, and Matrices

Unless stated otherwise, scalar variables are real valued, as are the entries of vectors and matrices. Italics are for scalar variables ($v$ or $V$), bold lower-case letters for column vectors ($\mathbf{v}$), and bold upper-case letters for matrices ($\mathbf{M}$). Transposition, the transformation of columns into rows in a vector or matrix, is denoted by the superscript T. It applies to what is to its left, so $v^\mathsf{T}$ is a row vector and, in $A^\mathsf{T}B$, $A$ is transposed, not $B$.

The identity matrix is $I$, with $I_n$ the $(n \times n)$ identity matrix. The $i$th column vector of $I$ is the canonical vector $e_i$. The entry at the intersection of the $i$th row and $j$th column of $M$ is $m_{i,j}$. The product of matrices
$$C = AB \qquad (2.1)$$
thus implies that
$$c_{i,j} = \sum_k a_{i,k} b_{k,j}, \qquad (2.2)$$
and the number of columns in $A$ must be equal to the number of rows in $B$. Recall that the product of matrices (or vectors) is not commutative, in general. Thus, for instance, when $v$ and $w$ are column vectors with the same dimension, $v^\mathsf{T}w$ is a scalar whereas $wv^\mathsf{T}$ is a (rank-one) square matrix. Useful relations are
$$(AB)^\mathsf{T} = B^\mathsf{T} A^\mathsf{T}, \qquad (2.3)$$
and, provided that $A$ and $B$ are invertible,
$$(AB)^{-1} = B^{-1} A^{-1}. \qquad (2.4)$$
If $M$ is square and symmetric, then all of its eigenvalues are real. $M \succ 0$ then means that each of these eigenvalues is strictly positive ($M$ is positive definite), while $M \succeq 0$ allows some of them to be zero ($M$ is non-negative definite).

2.3 Derivatives

Provided that $f(\cdot)$ is a sufficiently differentiable function from $\mathbb{R}$ to $\mathbb{R}$,
$$\dot{f}(x) = \frac{df}{dx}(x), \qquad (2.5)$$
$$\ddot{f}(x) = \frac{d^2 f}{dx^2}(x), \qquad (2.6)$$
$$f^{(k)}(x) = \frac{d^k f}{dx^k}(x). \qquad (2.7)$$

Vetter's notation [1] will be used for derivatives of matrices with respect to matrices. (A word of caution is in order: there are other, incompatible notations, and one should be cautious about mixing formulas from different sources.) If $A$ is $(n_A \times m_A)$ and $B$ is $(n_B \times m_B)$, then
$$M = \frac{\partial A}{\partial B} \qquad (2.8)$$
is an $(n_A n_B \times m_A m_B)$ matrix, such that the $(n_A \times m_A)$ submatrix in position $(i,j)$ is
$$M_{i,j} = \frac{\partial A}{\partial b_{i,j}}. \qquad (2.9)$$

Remark 2.1 $A$ and $B$ in (2.8) may be row or column vectors.
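As a quick sanity check of (2.3) and (2.4), one may run a few lines of MATLAB on random matrices (a sketch of mine; the sizes are arbitrary). The computed differences should be zero up to rounding.

  A = randn(3); B = randn(3);
  norm(A*B - B*A)                 % products do not commute in general
  norm((A*B)' - B'*A')            % (2.3): zero up to rounding
  norm(inv(A*B) - inv(B)*inv(A))  % (2.4): small, but affected by rounding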
Example 2.1 If $v$ is a generic column vector of $\mathbb{R}^n$, then
$$\frac{\partial v}{\partial v^\mathsf{T}} = \frac{\partial v^\mathsf{T}}{\partial v} = I_n. \qquad (2.10)$$

Example 2.2 If $J(\cdot)$ is a differentiable function from $\mathbb{R}^n$ to $\mathbb{R}$, and $x$ a vector of $\mathbb{R}^n$, then
$$\frac{\partial J}{\partial x}(x) = \left[ \frac{\partial J}{\partial x_1}, \frac{\partial J}{\partial x_2}, \ldots, \frac{\partial J}{\partial x_n} \right]^\mathsf{T}(x) \qquad (2.11)$$
is a column vector, called the gradient of $J(\cdot)$ at $x$.

Example 2.3 If $J(\cdot)$ is a twice differentiable function from $\mathbb{R}^n$ to $\mathbb{R}$, and $x$ a vector of $\mathbb{R}^n$, then
$$\frac{\partial^2 J}{\partial x \, \partial x^\mathsf{T}}(x) = \begin{bmatrix} \frac{\partial^2 J}{\partial x_1^2} & \frac{\partial^2 J}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 J}{\partial x_1 \partial x_n} \\ \frac{\partial^2 J}{\partial x_2 \partial x_1} & \frac{\partial^2 J}{\partial x_2^2} & & \vdots \\ \vdots & & \ddots & \vdots \\ \frac{\partial^2 J}{\partial x_n \partial x_1} & \cdots & \cdots & \frac{\partial^2 J}{\partial x_n^2} \end{bmatrix}(x) \qquad (2.12)$$
is an $(n \times n)$ matrix, called the Hessian of $J(\cdot)$ at $x$. Schwarz's theorem ensures that
$$\frac{\partial^2 J}{\partial x_i \, \partial x_j}(x) = \frac{\partial^2 J}{\partial x_j \, \partial x_i}(x), \qquad (2.13)$$
provided that both are continuous at $x$ and $x$ belongs to an open set in which both are defined. Hessians are thus symmetric, except in pathological cases not considered here.

Example 2.4 If $f(\cdot)$ is a differentiable function from $\mathbb{R}^n$ to $\mathbb{R}^p$, and $x$ a vector of $\mathbb{R}^n$, then
$$J(x) = \frac{\partial f}{\partial x^\mathsf{T}}(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & & \vdots \\ \vdots & & \ddots & \vdots \\ \frac{\partial f_p}{\partial x_1} & \cdots & \cdots & \frac{\partial f_p}{\partial x_n} \end{bmatrix} \qquad (2.14)$$
is the $(p \times n)$ Jacobian matrix of $f(\cdot)$ at $x$. When $p = n$, the Jacobian matrix is square and its determinant is the Jacobian.

Remark 2.2 The last three examples show that the Hessian of $J(\cdot)$ at $x$ is the Jacobian matrix of its gradient function evaluated at $x$.

Remark 2.3 Gradients and Hessians are frequently used in the context of optimization, and Jacobian matrices when solving systems of nonlinear equations.

Remark 2.4 The nabla operator $\nabla$, a vector of partial derivatives with respect to all the variables of the function on which it operates,
$$\nabla = \left( \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n} \right)^\mathsf{T}, \qquad (2.15)$$
is often used to make notation more concise, especially for partial differential equations. Applying $\nabla$ to a scalar function $J$ and evaluating the result at $x$, one gets the gradient vector
$$\nabla J(x) = \frac{\partial J}{\partial x}(x). \qquad (2.16)$$
If the scalar function is replaced by a vector function $f$, one gets the Jacobian matrix
$$\nabla f(x) = \frac{\partial f}{\partial x^\mathsf{T}}(x), \qquad (2.17)$$
where $\nabla f$ is interpreted as $(\nabla f^\mathsf{T})^\mathsf{T}$. By applying $\nabla$ twice to a scalar function $J$ and evaluating the result at $x$, one gets the Hessian matrix
$$\nabla^2 J(x) = \frac{\partial^2 J}{\partial x \, \partial x^\mathsf{T}}(x). \qquad (2.18)$$
($\nabla^2$ is sometimes taken to mean the Laplacian operator $\Delta$, such that
$$\Delta f(x) = \sum_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}(x) \qquad (2.19)$$
is a scalar. The context and dimensional considerations should make what is meant clear.)
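To connect these definitions with computation, here is a small MATLAB sketch (mine, with an arbitrary test function) that checks an analytic gradient (2.11) against central finite differences; differentiation by finite differences is treated in Chap. 6, so this is only meant to make the notation concrete.

  % Check the gradient of J(x) = x1^2*x2 + sin(x2) at x = (1, 2)'
  J = @(x) x(1)^2*x(2) + sin(x(2));
  g = @(x) [2*x(1)*x(2); x(1)^2 + cos(x(2))];  % analytic gradient (2.11)
  x = [1; 2]; h = 1e-6; n = numel(x);
  g_fd = zeros(n,1);
  for i = 1:n
      e = zeros(n,1); e(i) = 1;                % canonical vector e_i
      g_fd(i) = (J(x + h*e) - J(x - h*e))/(2*h);
  end
  [g(x), g_fd]                                 % the two columns should agree closely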
Example 2.5 If $v$, $M$, and $Q$ do not depend on $x$ and $Q$ is symmetric, then
$$\frac{\partial}{\partial x}(v^\mathsf{T} x) = v, \qquad (2.20)$$
$$\frac{\partial}{\partial x^\mathsf{T}}(M x) = M, \qquad (2.21)$$
$$\frac{\partial}{\partial x}(x^\mathsf{T} M x) = (M + M^\mathsf{T}) x \qquad (2.22)$$
and
$$\frac{\partial}{\partial x}(x^\mathsf{T} Q x) = 2 Q x. \qquad (2.23)$$
These formulas will be used quite frequently.

2.4 Little o and Big O

The function $f(x)$ is $o(g(x))$ as $x$ tends to $x_0$ if
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = 0, \qquad (2.24)$$
so $f(x)$ gets negligible compared to $g(x)$ for $x$ sufficiently close to $x_0$. In what follows, $x_0$ is always taken equal to zero, so this need not be specified, and we just write $f(x) = o(g(x))$.

The function $f(x)$ is $O(g(x))$ as $x$ tends to infinity if there exist real numbers $x_0$ and $M$ such that
$$x > x_0 \;\Rightarrow\; |f(x)| \leq M |g(x)|. \qquad (2.25)$$
The function $f(x)$ is $O(g(x))$ as $x$ tends to zero if there exist real numbers $\delta$ and $M$ such that
$$|x| < \delta \;\Rightarrow\; |f(x)| \leq M |g(x)|. \qquad (2.26)$$
The notation $O(x)$ or $O(n)$ will be used in two contexts:
• when dealing with Taylor expansions, $x$ is a real number tending to zero,
• when analyzing algorithmic complexity, $n$ is a positive integer tending to infinity.

Example 2.6 The function $f(x) = \sum_{i=2}^m a_i x^i$, with $m \geq 2$, is such that
$$\lim_{x \to 0} \frac{f(x)}{x} = \lim_{x \to 0} \sum_{i=2}^m a_i x^{i-1} = 0,$$
so $f(x) = o(x)$ when $x$ tends to zero. Now, if $|x| < 1$, then
$$\frac{|f(x)|}{x^2} < \sum_{i=2}^m |a_i|,$$
so $f(x) = O(x^2)$ when $x$ tends to zero. If, on the other hand, $x$ is taken equal to the (large) positive integer $n$, then
$$f(n) = \sum_{i=2}^m a_i n^i \leq \sum_{i=2}^m |a_i n^i| \leq \left( \sum_{i=2}^m |a_i| \right) \cdot n^m,$$
so $f(n) = O(n^m)$ when $n$ tends to infinity.

2.5 Norms

A function $f(\cdot)$ from a vector space $V$ to $\mathbb{R}$ is a norm if it satisfies the following three properties:
1. $f(v) \geq 0$ for all $v \in V$ (positivity),
2. $f(\alpha v) = |\alpha| \cdot f(v)$ for all $\alpha \in \mathbb{R}$ and $v \in V$ (positive scalability),
3. $f(v_1 \pm v_2) \leq f(v_1) + f(v_2)$ for all $v_1 \in V$ and $v_2 \in V$ (triangle inequality).

These properties imply that $f(v) = 0 \Rightarrow v = 0$ (non-degeneracy). Another useful relation is
$$|f(v_1) - f(v_2)| \leq f(v_1 \pm v_2). \qquad (2.27)$$
Norms are used to quantify distances between vectors. They play an essential role, for instance, in the characterization of the intrinsic difficulty of numerical problems via the notion of condition number (see Sect. 3.3) or in the definition of cost functions for optimization.
2.5.1 Vector Norms

The most commonly used norms in $\mathbb{R}^n$ are the $l_p$ norms
$$\|v\|_p = \left( \sum_{i=1}^n |v_i|^p \right)^{\frac{1}{p}}, \qquad (2.28)$$
with $p \geq 1$. They include
• the Euclidean norm (or $l_2$ norm)
$$\|v\|_2 = \sqrt{\sum_{i=1}^n v_i^2} = \sqrt{v^\mathsf{T} v}, \qquad (2.29)$$
• the taxicab norm (or Manhattan norm, or grid norm, or $l_1$ norm)
$$\|v\|_1 = \sum_{i=1}^n |v_i|, \qquad (2.30)$$
• the maximum norm (or $l_\infty$ norm, or Chebyshev norm, or uniform norm)
$$\|v\|_\infty = \max_{1 \leq i \leq n} |v_i|. \qquad (2.31)$$
They are such that
$$\|v\|_2 \leq \|v\|_1 \leq n \|v\|_\infty, \qquad (2.32)$$
and
$$|v^\mathsf{T} w| \leq \|v\|_2 \cdot \|w\|_2. \qquad (2.33)$$
The latter result is known as the Cauchy–Schwarz inequality.

Remark 2.5 If the entries of $v$ were complex, norms would be defined differently. The Euclidean norm, for instance, would become
$$\|v\|_2 = \sqrt{v^\mathsf{H} v}, \qquad (2.34)$$
where $v^\mathsf{H}$ is the transconjugate of $v$, i.e., the row vector obtained by transposing the column vector $v$ and replacing each of its entries by its complex conjugate.

Example 2.7 For the complex vector $v = (a, ai)^\mathsf{T}$, where $a$ is some nonzero real number and $i$ is the imaginary unit (such that $i^2 = -1$), $v^\mathsf{T} v = 0$. This proves that $\sqrt{v^\mathsf{T} v}$ is not a norm. The value of the Euclidean norm of $v$ is $\sqrt{v^\mathsf{H} v} = \sqrt{2}\,|a|$.

Remark 2.6 The so-called $l_0$ norm of a vector is the number of its nonzero entries. Used in the context of sparse estimation, where one is looking for an estimated parameter vector with as few nonzero entries as possible, it is not a norm, as it does not satisfy the property of positive scalability.
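These definitions and inequalities can be checked directly with MATLAB's norm function (a quick sketch of mine, on random test vectors):

  v = randn(5,1); w = randn(5,1); n = numel(v);
  [norm(v,2), norm(v,1), norm(v,inf)]   % (2.29), (2.30), (2.31)
  norm(v,2) <= norm(v,1)                % first inequality in (2.32)
  norm(v,1) <= n*norm(v,inf)            % second inequality in (2.32)
  abs(v'*w) <= norm(v,2)*norm(w,2)      % Cauchy-Schwarz (2.33)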
2.5.2 Matrix Norms

Each vector norm induces a matrix norm, defined as
$$\|M\| = \max_{\|v\| = 1} \|Mv\|, \qquad (2.35)$$
so
$$\|Mv\| \leq \|M\| \cdot \|v\| \qquad (2.36)$$
for any $M$ and $v$ for which the product $Mv$ makes sense. This matrix norm is subordinate to the vector norm inducing it. The matrix and vector norms are then said to be compatible, an important property for the study of products of matrices and vectors.

• The matrix norm induced by the vector norm $l_2$ is the spectral norm, or 2-norm,
$$\|M\|_2 = \sqrt{\rho(M^\mathsf{T} M)}, \qquad (2.37)$$
where $\rho(\cdot)$ is the function that computes the spectral radius of its argument, i.e., the modulus of the eigenvalue(s) with the largest modulus. Since all the eigenvalues of $M^\mathsf{T} M$ are real and non-negative, $\rho(M^\mathsf{T} M)$ is the largest of these eigenvalues. Its square root is the largest singular value of $M$, denoted by $\sigma_{\max}(M)$. So
$$\|M\|_2 = \sigma_{\max}(M). \qquad (2.38)$$
• The matrix norm induced by the vector norm $l_1$ is the 1-norm
$$\|M\|_1 = \max_j \sum_i |m_{i,j}|, \qquad (2.39)$$
which amounts to summing the absolute values of the entries of each column in turn and keeping the largest result.
• The matrix norm induced by the vector norm $l_\infty$ is the infinity norm
$$\|M\|_\infty = \max_i \sum_j |m_{i,j}|, \qquad (2.40)$$
which amounts to summing the absolute values of the entries of each row in turn and keeping the largest result. Thus
$$\|M\|_1 = \|M^\mathsf{T}\|_\infty. \qquad (2.41)$$

Since each subordinate matrix norm is compatible with its inducing vector norm,
$$\|v\|_1 \text{ is compatible with } \|M\|_1, \qquad (2.42)$$
$$\|v\|_2 \text{ is compatible with } \|M\|_2, \qquad (2.43)$$
$$\|v\|_\infty \text{ is compatible with } \|M\|_\infty. \qquad (2.44)$$
The Frobenius norm
$$\|M\|_\mathrm{F} = \sqrt{\sum_{i,j} m_{i,j}^2} = \sqrt{\operatorname{trace}(M^\mathsf{T} M)} \qquad (2.45)$$
deserves a special mention, as it is not induced by any vector norm, yet
$$\|v\|_2 \text{ is compatible with } \|M\|_\mathrm{F}. \qquad (2.46)$$

Remark 2.7 To evaluate a vector or matrix norm with MATLAB (or any other interpreted language based on matrices), it is much more efficient to use the corresponding dedicated function than to access the entries of the vector or matrix individually to implement the norm definition. Thus, norm(X,p) returns the p-norm of X, which may be a vector or a matrix, while norm(M,'fro') returns the Frobenius norm of the matrix M.
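The induced norms and the Frobenius norm are also available through norm, and can be cross-checked against their definitions (a random test matrix is used below; the explicit formulas are for illustration only, the dedicated function being preferable in practice, as stated in Remark 2.7):

  M = randn(4);
  [norm(M,1),     max(sum(abs(M),1))]   % 1-norm, (2.39)
  [norm(M,inf),   max(sum(abs(M),2))]   % infinity norm, (2.40)
  [norm(M,2),     max(svd(M))]          % spectral norm, (2.38)
  [norm(M,'fro'), sqrt(trace(M'*M))]    % Frobenius norm, (2.45)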
2.5.3 Convergence Speeds

Norms can be used to study how quickly an iterative method would converge to the solution $x^\star$ if computation were exact. Define the error at iteration $k$ as
$$e_k = x_k - x^\star, \qquad (2.47)$$
where $x_k$ is the estimate of $x^\star$ at iteration $k$. The asymptotic convergence speed is linear if
$$\limsup_{k \to \infty} \frac{\|e_{k+1}\|}{\|e_k\|} = \alpha < 1, \qquad (2.48)$$
with $\alpha$ the rate of convergence. It is superlinear if
$$\limsup_{k \to \infty} \frac{\|e_{k+1}\|}{\|e_k\|} = 0, \qquad (2.49)$$
and quadratic if
$$\limsup_{k \to \infty} \frac{\|e_{k+1}\|}{\|e_k\|^2} = \alpha < \infty. \qquad (2.50)$$

A method with quadratic convergence thus also has superlinear and linear convergence. It is customary, however, to qualify a method with the best convergence it achieves. Quadratic convergence is better than superlinear convergence, which is better than linear convergence.

Remember that these convergence speeds are asymptotic, valid when the error has become small enough, and that they do not take the effect of rounding into account. They are meaningless if the initial vector $x_0$ was too badly chosen for the method to converge to $x^\star$. When the method does converge to $x^\star$, they may not describe accurately its initial behavior and will no longer be true when rounding errors become predominant. They are nevertheless an interesting indication of what can be expected at best.
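As a tentative illustration (the example is mine), the following script estimates the convergence speed of Newton's iteration $x_{k+1} = (x_k + 2/x_k)/2$ for computing $\sqrt{2}$, which converges quadratically. The ratios in (2.50) should stabilize near a constant, until rounding errors take over, as warned above.

  xstar = sqrt(2); x = 1; e = zeros(1,6);
  for k = 1:6
      e(k) = abs(x - xstar);   % error (2.47) at iteration k
      x = (x + 2/x)/2;         % Newton's iteration for x^2 = 2
  end
  e(2:end)./e(1:end-1)         % tends to 0: superlinear, (2.49)
  e(2:end)./e(1:end-1).^2      % roughly constant: quadratic, (2.50);
                               % the last ratio is no longer meaningful
                               % once the error reaches the rounding level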
Reference
1. Vetter, W.: Derivative operations on matrices. IEEE Trans. Autom. Control 15, 241–244 (1970)

Chapter 3
Solving Systems of Linear Equations

3.1 Introduction

Linear equations are first-order polynomial equations in their unknowns. A system of linear equations can thus be written as
$$A x = b, \qquad (3.1)$$
where the matrix $A$ and the vector $b$ are known and where $x$ is a vector of unknowns. We assume in this chapter that
• all the entries of $A$, $b$, and $x$ are real numbers,
• there are $n$ scalar equations in $n$ scalar unknowns ($A$ is a square $(n \times n)$ matrix and $\dim x = \dim b = n$),
• these equations uniquely define $x$ ($A$ is invertible).

When $A$ is invertible, the solution of (3.1) for $x$ is unique, and given mathematically in closed form as $x = A^{-1} b$. We are not interested here in this closed-form solution, and wish instead to compute $x$ numerically from numerically known $A$ and $b$. This problem plays a central role in so many algorithms that it deserves a chapter of its own. Systems of linear equations with more equations than unknowns will be considered in Sect. 9.2.

Remark 3.1 When $A$ is square but singular (i.e., not invertible), its columns no longer form a basis of $\mathbb{R}^n$, so the vector $Ax$ cannot take all directions in $\mathbb{R}^n$. The direction of $b$ will thus determine whether (3.1) admits infinitely many solutions for $x$ or none.

When $b$ can be expressed as a linear combination of columns of $A$, the equations are linearly dependent and there is a continuum of solutions. The system $x_1 + x_2 = 1$ and $2x_1 + 2x_2 = 2$ corresponds to this situation.

When $b$ cannot be expressed as a linear combination of columns of $A$, the equations are incompatible and there is no solution. The system $x_1 + x_2 = 1$ and $x_1 + x_2 = 2$ corresponds to this situation.
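The two situations of Remark 3.1 can be diagnosed in MATLAB by comparing the rank of A with that of the augmented matrix [A b]; the sketch below (mine) just mirrors the two tiny systems of the remark:

  A = [1 1; 2 2]; b = [1; 2];   % linearly dependent equations
  rank(A) == rank([A b])        % true: a continuum of solutions
  A = [1 1; 1 1]; b = [1; 2];   % incompatible equations
  rank(A) == rank([A b])        % false: no solution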
Great books covering the topics of this chapter and Chap. 4 (as well as topics relevant to many other chapters) are [1–3].

3.2 Examples

Example 3.1 Determination of a static equilibrium
The conditions for a linear dynamical system to be in static equilibrium translate into a system of linear equations. Consider, for instance, a series of three vertical springs $s_i$ ($i = 1, 2, 3$), with the first of them attached to the ceiling and the last to an object with mass $m$. The mass of each spring is neglected, and the stiffness coefficient of the $i$th spring is denoted by $k_i$. We want to compute the elongation $x_i$ of the bottom end of spring $i$ ($i = 1, 2, 3$) resulting from the action of the mass of the object when the system has reached static equilibrium. The sum of all the forces acting at any given point is then zero. Provided that $m$ is small enough for Hooke's law of elasticity to apply, the following linear equations thus hold true:
$$mg = k_3 (x_3 - x_2), \qquad (3.2)$$
$$k_3 (x_2 - x_3) = k_2 (x_1 - x_2), \qquad (3.3)$$
$$k_2 (x_2 - x_1) = k_1 x_1, \qquad (3.4)$$
where $g$ is the acceleration due to gravity. This system of linear equations can be written as
$$\begin{bmatrix} k_1 + k_2 & -k_2 & 0 \\ -k_2 & k_2 + k_3 & -k_3 \\ 0 & -k_3 & k_3 \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ mg \end{bmatrix}. \qquad (3.5)$$
The matrix on the left-hand side of (3.5) is tridiagonal, as only its main descending diagonal and the descending diagonals immediately over and below it are nonzero. This would still be true if there were many more springs in series, in which case the matrix would also be sparse, i.e., with a majority of zero entries. Note that changing the mass of the object would only modify the right-hand side of (3.5), so one might be interested in solving a number of systems that share the same matrix $A$.
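A possible MATLAB transcription of Example 3.1 follows (the numerical values of the stiffness coefficients and of the mass are mine). The system (3.5) is solved with the backslash operator rather than by forming an inverse, for reasons detailed later in this chapter; since only the right-hand side depends on m, several masses can be handled at once by stacking right-hand sides as columns.

  k1 = 100; k2 = 200; k3 = 150;  % stiffness coefficients (N/m)
  m = 2; g = 9.81;               % mass (kg), gravity (m/s^2)
  A = [k1+k2, -k2,     0;
       -k2,   k2+k3, -k3;
        0,    -k3,    k3];       % matrix of (3.5)
  b = [0; 0; m*g];
  x = A\b                        % elongations x1, x2, x3
  X = A\[b, 2*b]                 % same A, two masses m and 2m at once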
Example 3.2 Polynomial interpolation

Assume that the value yi of some quantity of interest has been measured at time ti (i = 1, 2, 3). Interpolating these data with the polynomial

    P(t, x) = a0 + a1 t + a2 t^2,    (3.6)

where x = (a0, a1, a2)^T, boils down to solving (3.1) with

    A = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \end{bmatrix}  and  b = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}.    (3.7)

For more on interpolation, see Chap. 5.
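For instance (with made-up data points), the coefficients of the interpolating polynomial are obtained as:

    t = [0; 1; 2]; y = [1; 3; 7];     % made-up measurement times and values
    A = [ones(3,1), t, t.^2];         % matrix of (3.7)
    x = A\y;                          % x = (a0, a1, a2)^T, here (1, 1, 1)^T
    P = @(tau) x(1) + x(2)*tau + x(3)*tau.^2;
    max(abs(P(t) - y))                % interpolation error, close to zero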
3.3 Condition Number(s)

The notion of condition number plays a central role in assessing the intrinsic difficulty of solving a given numerical problem independently of the algorithm to be employed [4, 5]. It can thus be used to detect problems about which one should be particularly careful. We limit ourselves here to the problem of solving (3.1) for x.

In general, A and b are imperfectly known, for at least two reasons. First, the mere fact of converting real numbers to their floating-point representation or of performing floating-point computations almost always entails approximations. Moreover, the entries of A and b often result from imprecise measurements. It is thus important to quantify the effect that perturbations on A and b may have on the solution x.

Substitute A + δA for A and b + δb for b, and define x̂ as the solution of the perturbed system

    (A + δA) x̂ = b + δb.    (3.8)

The difference between the solutions of the perturbed system (3.8) and original system (3.1) is

    δx = x̂ − x.    (3.9)

It satisfies

    δx = A^{-1} [δb − (δA) x̂].    (3.10)

Provided that compatible norms are used, this implies that

    ||δx|| ≤ ||A^{-1}|| · (||δb|| + ||δA|| · ||x̂||).    (3.11)

Divide both sides of (3.11) by ||x̂||, and multiply the right-hand side of the result by ||A||/||A|| to get

    ||δx||/||x̂|| ≤ ||A^{-1}|| · ||A|| · ( ||δb||/(||A|| · ||x̂||) + ||δA||/||A|| ).    (3.12)

The multiplicative coefficient ||A^{-1}|| · ||A|| appearing in the right-hand side of (3.12) is the condition number of A

    cond A = ||A^{-1}|| · ||A||.    (3.13)

It quantifies the consequences of an error on A or b on the error on x. We wish it to be as small as possible, so that the solution be as insensitive as possible to the errors δA and δb.

Remark 3.2 When the errors on b are negligible, (3.12) becomes

    ||δx||/||x̂|| ≤ (cond A) · ( ||δA||/||A|| ).    (3.14)

Remark 3.3 When the errors on A are negligible,

    δx = A^{-1} δb,    (3.15)

so

    ||δx|| ≤ ||A^{-1}|| · ||δb||.    (3.16)

Now (3.1) implies that

    ||b|| ≤ ||A|| · ||x||,    (3.17)

and (3.16) and (3.17) imply that

    ||δx|| · ||b|| ≤ ||A^{-1}|| · ||A|| · ||δb|| · ||x||,    (3.18)

so

    ||δx||/||x|| ≤ (cond A) · ( ||δb||/||b|| ).    (3.19)

Since

    1 = ||I|| = ||A^{-1} · A|| ≤ ||A^{-1}|| · ||A||,    (3.20)

the condition number of A satisfies

    cond A ≥ 1.    (3.21)

Its value depends on the norm used. For the spectral norm,

    ||A||_2 = σ_max(A),    (3.22)

where σ_max(A) is the largest singular value of A. Since

    ||A^{-1}||_2 = σ_max(A^{-1}) = 1/σ_min(A),    (3.23)
with σ_min(A) the smallest singular value of A, the condition number of A for the spectral norm is the ratio of its largest singular value to its smallest

    cond A = σ_max(A)/σ_min(A).    (3.24)

The larger the condition number of A is, the more ill-conditioned solving (3.1) becomes. It is useful to compare cond A with the inverse of the precision of the floating-point representation. For a double-precision representation according to IEEE Standard 754 (typical of MATLAB computations), this precision is about 10^{-16}. Solving (3.1) for x when cond A is not small compared to 10^{16} requires special care.

Remark 3.4 Although this is probably the worst method for computing singular values, the singular values of A are the square roots of the eigenvalues of A^T A. (When A is symmetric, its singular values are thus equal to the absolute values of its eigenvalues.)

Remark 3.5 A is singular if and only if its determinant is zero, so one might have thought of using the value of det A as an index of conditioning, with a small determinant indicative of a nearly singular system. However, it is very difficult to check that a floating-point number differs significantly from zero (think of what happens to the determinant of A if A and b are multiplied by a large or small positive number, which has no effect on the difficulty of the problem). The condition number is a much more meaningful index of conditioning, as it is invariant to a multiplication of A by a nonzero scalar of any magnitude (a consequence of the positive scalability of the norm). Compare det(10^{-1} I_n) = 10^{-n} with cond(10^{-1} I_n) = 1.

Remark 3.6 The numerical value of cond A depends on the norm being used, but an ill-conditioned problem for one norm should also be ill-conditioned for the others, so the choice of a given norm is just a matter of convenience.

Remark 3.7 Although evaluating the condition number of a matrix for the spectral norm just takes one call to the MATLAB function cond(·), this may actually require more computation than solving (3.1). Evaluating the condition number of the same matrix for the 1-norm (by a call to the function cond(·,1)) is less costly than for the spectral norm, and algorithms are available to get cheaper estimates of its order of magnitude [2, 6, 7], which is what we are actually interested in, after all.

Remark 3.8 The concept of condition number extends to rectangular matrices, and the condition number for the spectral norm is then still given by (3.24). It can also be extended to nonlinear problems, see Sect. 14.5.2.1.
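These orders of magnitude are easily illustrated in MATLAB on Hilbert matrices, a classical family of ill-conditioned matrices (exact figures will vary with platform and version):

    for n = [4, 8, 12]
        A = hilb(n);                  % Hilbert matrix, notoriously ill-conditioned
        xtrue = ones(n, 1);
        b = A*xtrue;                  % right-hand side consistent with xtrue
        x = A\b;
        fprintf('n = %2d, cond = %8.1e, relative error = %8.1e\n', ...
                n, cond(A), norm(x - xtrue)/norm(xtrue))
    end

As a rule of thumb, one may expect to lose on the order of log10(cond A) significant decimal digits in the solution.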
3.4 Approaches Best Avoided

For solving a system of linear equations numerically, matrix inversion should almost always be avoided, as it requires useless computations. Unless A has some specific structure that makes inversion particularly simple, one should thus think twice before inverting A to take advantage of the closed-form solution

    x = A^{-1} b.    (3.25)

Cramer's rule for solving systems of linear equations, which requires the computation of ratios of determinants, is the worst possible approach. Determinants are notoriously difficult to compute accurately, and computing these determinants is unnecessarily costly, even if much more economical methods than cofactor expansion are available.

3.5 Questions About A

A often has specific properties that may be taken advantage of and that may lead to selecting a specific method rather than systematically using some general-purpose workhorse. It is thus important to address the following questions:

• Are A and b real (as assumed here)?
• Is A square and invertible (as assumed here)?
• Is A symmetric, i.e., such that A^T = A?
• Is A symmetric positive definite (denoted by A ≻ 0)? This means that A is symmetric and such that

    ∀v ≠ 0,  v^T A v > 0,    (3.26)

which implies that all of its eigenvalues are real and strictly positive.
• If A is large, is it sparse, i.e., such that most of its entries are zeros?
• Is A diagonally dominant, i.e., such that the absolute value of each of its diagonal entries is strictly larger than the sum of the absolute values of all the other entries in the same row?
• Is A tridiagonal, i.e., such that only its main descending diagonal and the diagonals immediately over and below are nonzero?
    A = \begin{bmatrix} b_1 & c_1 & 0 & \cdots & \cdots & 0 \\ a_2 & b_2 & c_2 & 0 & & \vdots \\ 0 & a_3 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & b_{n-1} & c_{n-1} \\ 0 & \cdots & \cdots & 0 & a_n & b_n \end{bmatrix}    (3.27)

• Is A Toeplitz, i.e., such that all the entries on the same descending diagonal take the same value?

    A = \begin{bmatrix} h_0 & h_{-1} & h_{-2} & \cdots & h_{-n+1} \\ h_1 & h_0 & h_{-1} & & h_{-n+2} \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ & & \ddots & \ddots & h_{-1} \\ h_{n-1} & h_{n-2} & \cdots & h_1 & h_0 \end{bmatrix}    (3.28)

• Is A well-conditioned? (See Sect. 3.3.)

3.6 Direct Methods

Direct methods attempt to solve (3.1) for x in a finite number of steps. They require a predictable amount of resources and can be made quite robust, but scale poorly on very large problems. This is in contrast with iterative methods, considered in Sect. 3.7, which aim at generating a sequence of improving approximations of the solution. Some iterative methods can deal with millions of unknowns, as encountered for instance when solving partial differential equations.

Remark 3.9 The distinction between direct and iterative methods is not as clear-cut as it may seem; results obtained by direct methods may be improved by iterative methods (as in Sect. 3.6.4), and the most sophisticated iterative methods (presented in Sect. 3.7.2) would find the exact solution in a finite number of steps if computation were carried out exactly.

3.6.1 Backward or Forward Substitution

Backward or forward substitution applies when A is triangular. This is less of a special case than it may seem, as several of the methods presented below and applicable to generic linear systems involve solving triangular systems.
Backward substitution applies to the upper triangular system

    Ux = b,    (3.29)

where

    U = \begin{bmatrix} u_{1,1} & u_{1,2} & \cdots & u_{1,n} \\ 0 & u_{2,2} & & u_{2,n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & u_{n,n} \end{bmatrix}.    (3.30)

When U is invertible, all its diagonal entries are nonzero and (3.29) can be solved for one unknown at a time, starting with the last

    x_n = b_n/u_{n,n},    (3.31)

then moving up to get

    x_{n-1} = (b_{n-1} − u_{n-1,n} x_n)/u_{n-1,n-1},    (3.32)

and so forth, with finally

    x_1 = (b_1 − u_{1,2} x_2 − u_{1,3} x_3 − · · · − u_{1,n} x_n)/u_{1,1}.    (3.33)

Forward substitution, on the other hand, applies to the lower triangular system

    Lx = b,    (3.34)

where

    L = \begin{bmatrix} l_{1,1} & 0 & \cdots & 0 \\ l_{2,1} & l_{2,2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ l_{n,1} & l_{n,2} & \cdots & l_{n,n} \end{bmatrix}.    (3.35)

It also solves (3.34) for one unknown at a time, but starts with x_1 then moves down to get x_2 and so forth until x_n is obtained.

Solving (3.29) by backward substitution can be carried out in MATLAB via the instruction x=linsolve(U,b,optsUT), provided that optsUT.UT=true, which specifies that U is an upper triangular matrix. Similarly, solving (3.34) by forward substitution can be carried out via x=linsolve(L,b,optsLT), provided that optsLT.LT=true, which specifies that L is a lower triangular matrix.
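For illustration only (linsolve should be preferred in practice), backward substitution can be implemented in a few lines; forward substitution is symmetric, with the loop running downward:

    function x = backsub(U, b)
    % Solve Ux = b by backward substitution, for U upper triangular and invertible
    n = length(b);
    x = zeros(n, 1);
    x(n) = b(n)/U(n,n);                              % (3.31)
    for i = n-1:-1:1
        x(i) = (b(i) - U(i,i+1:n)*x(i+1:n))/U(i,i);  % (3.32)-(3.33)
    end
    end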
3.6.2 Gaussian Elimination

Gaussian elimination [8] transforms the original system (3.1) into an upper triangular system

    Ux = v,    (3.36)

by replacing each row of Ax and b by a suitable linear combination of such rows. This triangular system is then solved by backward substitution, one unknown at a time. All of this is carried out by the single MATLAB instruction x=A\b. This attractive one-liner actually hides the fact that A has been factored, and the resulting factorization is thus not available for later use (for instance, to solve (3.1) with the same A but another b).

When (3.1) must be solved for several right-hand sides b^i (i = 1, . . . , m) all known in advance, the system

    A [x^1 · · · x^m] = [b^1 · · · b^m]    (3.37)

is similarly transformed by row combinations into

    U [x^1 · · · x^m] = [v^1 · · · v^m].    (3.38)

The solutions are then obtained by solving the triangular systems

    U x^i = v^i,  i = 1, . . . , m.    (3.39)

This classical approach for solving (3.1) has no advantage over the LU factorization presented next. As it works simultaneously on A and b, Gaussian elimination for a right-hand side b not previously known cannot take advantage of past computations carried out with other right-hand sides, even if A remains the same.
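In MATLAB, the backslash operator directly accepts a matrix of right-hand sides, so (3.37)–(3.39) boil down to one instruction (b1, b2, b3 being hypothetical right-hand side vectors known in advance):

    B = [b1, b2, b3];   % right-hand sides collected column by column
    X = A\B;            % X(:,i) solves Ax = B(:,i); A is factored only once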
3.6.3 LU Factorization

LU factorization, a matrix reformulation of Gaussian elimination, is the basic workhorse to be used when A has no particular structure to be taken advantage of. Consider first its simplest version.

3.6.3.1 LU Factorization Without Pivoting

A is factored as

    A = LU,    (3.40)

where L is lower triangular and U upper triangular. (It is also known as LR factorization, with L standing for left triangular and R for right triangular.)

When possible, this factorization is not unique, since L and U contain (n^2 + n) unknown entries whereas A has only n^2 entries, which provide as many scalar relations between L and U. It is therefore necessary to add n constraints to ensure uniqueness, so we set all the diagonal entries of L equal to one. Equation (3.40) then translates into

    A = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ l_{2,1} & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ l_{n,1} & \cdots & l_{n,n-1} & 1 \end{bmatrix} \cdot \begin{bmatrix} u_{1,1} & u_{1,2} & \cdots & u_{1,n} \\ 0 & u_{2,2} & & u_{2,n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & u_{n,n} \end{bmatrix}.    (3.41)

When (3.41) admits a solution for its unknowns l_{i,j} and u_{i,j}, this solution can be obtained very simply by considering the equations in the proper order. Each unknown is then expressed as a function of entries of A and already computed entries of L and U. For the sake of notational simplicity, and because our purpose is not coding LU factorization, we only illustrate this with a very small example.

Example 3.3 LU factorization without pivoting

For the system

    \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ l_{2,1} & 1 \end{bmatrix} \cdot \begin{bmatrix} u_{1,1} & u_{1,2} \\ 0 & u_{2,2} \end{bmatrix},    (3.42)

we get

    u_{1,1} = a_{1,1},  u_{1,2} = a_{1,2},  l_{2,1} u_{1,1} = a_{2,1}  and  l_{2,1} u_{1,2} + u_{2,2} = a_{2,2}.    (3.43)

So, provided that a_{1,1} ≠ 0,

    l_{2,1} = a_{2,1}/u_{1,1} = a_{2,1}/a_{1,1}  and  u_{2,2} = a_{2,2} − l_{2,1} u_{1,2} = a_{2,2} − (a_{2,1}/a_{1,1}) a_{1,2}.    (3.44)

Terms that appear in denominators, such as a_{1,1} in Example 3.3, are called pivots. LU factorization without pivoting fails whenever a pivot turns out to be zero.

After LU factorization, the system to be solved is

    LUx = b.    (3.45)

Its solution for x is obtained in two steps.
First,

    Ly = b    (3.46)

is solved for y. Since L is lower triangular, this is by forward substitution, each equation providing the solution for a new unknown. As the diagonal entries of L are equal to one, this is particularly simple. Second,

    Ux = y    (3.47)

is solved for x. Since U is upper triangular, this is by backward substitution, each equation again providing the solution for a new unknown.

Example 3.4 Failure of LU factorization without pivoting

For

    A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},

the pivot a_{1,1} is equal to zero, so the algorithm fails unless pivoting is carried out, as presented next. Note that it suffices here to permute the rows of A (as well as those of b) for the problem to disappear.

Remark 3.10 When no pivot is zero but the magnitude of some of them is too small, pivoting plays a crucial role for improving the quality of LU factorization.
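Generalizing Example 3.3 to (n × n) matrices gives the following naive MATLAB sketch (for illustration only: no pivoting is performed, so it fails or loses accuracy whenever a small pivot is met):

    function [L, U] = lu_nopivot(A)
    % LU factorization without pivoting (illustration only)
    n = size(A, 1);
    L = eye(n); U = zeros(n);
    for k = 1:n
        U(k,k:n) = A(k,k:n) - L(k,1:k-1)*U(1:k-1,k:n);                 % row k of U
        L(k+1:n,k) = (A(k+1:n,k) - L(k+1:n,1:k-1)*U(1:k-1,k))/U(k,k);  % column k of L
    end
    end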
3.6.3.2 Pivoting

Pivoting is a short name for reordering the equations (and possibly the variables) so as to avoid zero pivots. When only the equations are reordered, one speaks of partial pivoting, whereas total pivoting, not considered here, also involves reordering the variables. (Total pivoting is seldom used, as it rarely provides better results than partial pivoting while being more expensive.)

Reordering the equations amounts to permuting the same rows in A and in b, which can be carried out by left-multiplying A and b by a suitable permutation matrix. The permutation matrix P that exchanges the ith and jth rows of A is obtained by exchanging the ith and jth rows of the identity matrix. Thus, for instance,

    \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} b_3 \\ b_1 \\ b_2 \end{bmatrix}.    (3.48)

Since det I = 1 and any exchange of two rows changes the sign of the determinant, we have

    det P = ±1.    (3.49)

P is an orthonormal matrix (also called unitary matrix), i.e., it is such that

    P^T P = I.    (3.50)

The inverse of P is thus particularly easy to compute, as

    P^{-1} = P^T.    (3.51)

Finally, the product of permutation matrices is a permutation matrix.

3.6.3.3 LU Factorization with Partial Pivoting

When computing the ith column of L, the rows i to n of A are reordered so as to ensure that the entry with the largest absolute value in the ith column gets on the diagonal (if it is not already there). This guarantees that all the entries of L are bounded by one in absolute value. The resulting algorithm is described in [2]. Let P be the permutation matrix summarizing the requested row exchanges on A and b. The system to be solved becomes

    PAx = Pb,    (3.52)

and LU factorization is carried out on PA, so

    LUx = Pb.    (3.53)

Solution for x is again in two steps. First,

    Ly = Pb    (3.54)

is solved for y, and then

    Ux = y    (3.55)

is solved for x. Of course the (sparse) permutation matrix P need not be stored as an (n × n) matrix; it suffices to keep track of the corresponding row exchanges.

Remark 3.11 Algorithms solving systems of linear equations via LU factorization with partial or total pivoting are readily and freely available on the Web with a detailed documentation (in LAPACK, for instance, see Chap. 15). The same remark applies to most of the methods presented in this book. In MATLAB, LU factorization with partial pivoting is achieved by the instruction [L,U,P]=lu(A).

Remark 3.12 Although the pivoting strategy of LU factorization is not based on keeping the condition number of the problem unchanged, the increase in this condition number is mitigated, which makes LU with partial pivoting applicable even to some very ill-conditioned problems. See Sect. 3.10.1 for an illustration.
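The following fragment illustrates factoring once and reusing the factors for right-hand sides b1 and b2 (hypothetical vectors) that may become available at different times:

    [L, U, P] = lu(A);       % factor PA = LU once, with partial pivoting
    x1 = U\(L\(P*b1));       % two triangular solves per right-hand side
    x2 = U\(L\(P*b2));       % same factors reused, no refactoring needed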
LU factorization is a first example of the decomposition approach to matrix computation [9], where a matrix is expressed as a product of factors. Other examples are QR factorization (Sects. 3.6.5 and 9.2.3), SVD (Sects. 3.6.6 and 9.2.4), Cholesky factorization (Sect. 3.8.1), and Schur and spectral decompositions, both carried out by the QR algorithm (Sect. 4.3.6). By concentrating efforts on the development of efficient, robust algorithms for a few important factorizations, numerical analysts have made it possible to produce highly effective packages for matrix computation, with surprisingly diverse applications.

Huge savings can be achieved when a number of problems share the same matrix, which then only needs to be factored once. Once LU factorization has been carried out on a given matrix A, for instance, all the systems (3.1) that differ only by their vector b are easily solved with the same factorization, even if the values of b to be considered were not known when A was factored. This is a definite advantage over Gaussian elimination, where the factorization of A is hidden in the solution of (3.1) for some pre-specified b.

3.6.4 Iterative Improvement

Let x̂ be the numerical result obtained when solving (3.1) via LU factorization. The residual Ax̂ − b should be small, but this does not guarantee that x̂ is a good approximation of the mathematical solution x = A^{-1}b. One may try to improve x̂ by looking for the correction vector δx such that

    A(x̂ + δx) = b,    (3.56)

or equivalently that

    A δx = b − A x̂.    (3.57)

Remark 3.13 A is the same in (3.57) as in (3.1), so its LU factorization is already available.

Once δx has been obtained by solving (3.57), x̂ is replaced by x̂ + δx, and the procedure may be iterated until convergence, with a stopping criterion on ||δx||. It is advisable to compute the residual b − Ax̂ with extended precision, as it corresponds to the difference between hopefully similar floating-point quantities. Spectacular improvements may be obtained for such a limited effort.

Remark 3.14 Iterative improvement is not limited to the solution of linear systems of equations via LU factorization.
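With the factors L, U and P of the previous section still available, one pass of iterative improvement may be sketched as follows (xhat denotes the current approximate solution):

    r = b - A*xhat;          % residual, ideally evaluated in extended precision
    dx = U\(L\(P*r));        % solve A*dx = r, reusing the available factors
    xhat = xhat + dx;        % corrected solution; iterate while norm(dx) is large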
3.6.5 QR Factorization

Any (n × n) invertible matrix A can be factored as

    A = QR,    (3.58)

where Q is an (n × n) orthonormal matrix, such that Q^T Q = I_n, and R is an (n × n) invertible upper triangular matrix (which tradition persists in calling R instead of U...). This QR factorization is unique if one imposes that the diagonal entries of R are positive, which is not mandatory. It can be carried out in a finite number of steps. In MATLAB, this is achieved by the instruction [Q,R]=qr(A).

Multiply (3.1) on the left by Q^T while taking (3.58) into account, to get

    Rx = Q^T b,    (3.59)

which is easy to solve for x, as R is triangular. For the spectral norm, the condition number of R is the same as that of A, since

    A^T A = (QR)^T QR = R^T Q^T QR = R^T R.    (3.60)

QR factorization therefore does not worsen conditioning. This is an advantage over LU factorization, which comes at the cost of more computation.

Remark 3.15 Contrary to LU factorization, QR factorization also applies to rectangular matrices, and will prove extremely useful in the solution of linear least-squares problems, see Sect. 9.2.3.

At least in principle, Gram–Schmidt orthogonalization could be used to carry out QR factorization, but it suffers from numerical instability when the columns of A are close to being linearly dependent. This is why the more robust approach presented in the next section is usually preferred, although a modified Gram–Schmidt method could also be employed [10].

3.6.5.1 Householder Transformation

The basic tool for QR factorization is the Householder transformation, described by the eponymous matrix

    H(v) = I − 2 (v v^T)/(v^T v),    (3.61)

where v is a vector to be chosen. The vector H(v)x is the reflection of x with respect to the hyperplane passing through the origin O and orthogonal to v (Fig. 3.1). The matrix H(v) is symmetric and orthonormal. Thus

    H(v) = H^T(v)  and  H^T(v) H(v) = I,    (3.62)

which implies that

    H^{-1}(v) = H(v).    (3.63)
Fig. 3.1 Householder transformation

Moreover, since v is an eigenvector of H(v) associated with the eigenvalue −1 and all the other eigenvectors of H(v) are associated with the eigenvalue 1,

    det H(v) = −1.    (3.64)

This property will be useful when computing determinants in Sect. 4.2.

Assume that v is chosen as

    v = x ± ||x||_2 e^1,    (3.65)

where e^1 is the vector corresponding to the first column of the identity matrix, and where the ± sign indicates liberty to choose a plus or minus operator. The following proposition makes it possible to use H(v) to transform x into a vector with all of its entries equal to zero except for the first one.

Proposition 3.1 If

    H(+) = H(x + ||x||_2 e^1)    (3.66)

and

    H(−) = H(x − ||x||_2 e^1),    (3.67)

then

    H(+) x = −||x||_2 e^1    (3.68)

and

    H(−) x = +||x||_2 e^1.    (3.69)
Proof If v = x ± ||x||_2 e^1 then

    v^T v = x^T x + ||x||_2^2 (e^1)^T e^1 ± 2 ||x||_2 x_1 = 2(||x||_2^2 ± ||x||_2 x_1) = 2 v^T x.    (3.70)

So

    H(v) x = x − 2v (v^T x)/(v^T v) = x − v = ∓||x||_2 e^1.    (3.71)

Among H(+) and H(−), one should choose

    H_best = H(x + sign(x_1) ||x||_2 e^1),    (3.72)

to protect oneself against the risk of having to compute the difference of floating-point numbers that are close to one another. In practice, the matrix H(v) is not formed. One computes instead the scalar

    δ = 2 (v^T x)/(v^T v),    (3.73)

and the vector

    H(v) x = x − δv.    (3.74)

3.6.5.2 Combining Householder Transformations

A is triangularized by submitting it to a series of Householder transformations, as follows. Start with A_0 = A. Compute A_1 = H_1 A_0, where H_1 is a Householder matrix that transforms the first column of A_0 into the first column of A_1, all the entries of which are zeros except for the first one. Based on Proposition 3.1, take

    H_1 = H(a^1 + sign(a_1^1) ||a^1||_2 e^1),    (3.75)

where a^1 is the first column of A_0. Iterate to get

    A_{k+1} = H_{k+1} A_k,  k = 1, . . . , n − 2.    (3.76)

H_{k+1} is in charge of shaping the (k + 1)-st column of A_k while leaving the k columns to its left unchanged. Let a^{k+1} be the vector consisting of the last (n − k) entries of the (k + 1)-st column of A_k. The Householder transformation must modify only a^{k+1}, so
    H_{k+1} = \begin{bmatrix} I_k & 0 \\ 0 & H(a^{k+1} + sign(a_1^{k+1}) ||a^{k+1}||_2 e^1) \end{bmatrix}.    (3.77)

In the next equation, for instance, the top and bottom entries of a^3 are indicated by the symbol ×:

    A_3 = \begin{bmatrix} \bullet & \bullet & \bullet & \cdots & \bullet \\ 0 & \bullet & \bullet & \cdots & \bullet \\ \vdots & 0 & \times & & \vdots \\ \vdots & \vdots & \vdots & \ddots & \bullet \\ 0 & 0 & \times & \bullet & \bullet \end{bmatrix}.    (3.78)

In (3.77), e^1 has the same dimension as a^{k+1} and all its entries are again zero, except for the first one, which is equal to one. At each iteration, the matrix H(+) or H(−) that leads to the more stable numerical computation is selected, see (3.72). Finally

    R = H_{n-1} H_{n-2} · · · H_1 A,    (3.79)

or equivalently

    A = (H_{n-1} H_{n-2} · · · H_1)^{-1} R = H_1^{-1} H_2^{-1} · · · H_{n-1}^{-1} R = QR.    (3.80)

Take (3.63) into account to get

    Q = H_1 H_2 · · · H_{n-1}.    (3.81)

Instead of using Householder transformations, one may implement QR factorization via Givens rotations [2], which are also robust, orthonormal transformations, but this makes computation more complex without improving performance.
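For illustration, here is a compact and unoptimized MATLAB sketch of this procedure (in practice, qr should be used, and H would not be formed explicitly, see (3.73)–(3.74)):

    function [Q, R] = qr_householder(A)
    % QR factorization via Householder transformations (illustration only)
    n = size(A, 1);
    Q = eye(n); R = A;
    for k = 1:n-1
        a = R(k:n, k);                         % entries to be shaped
        s = sign(a(1)); if s == 0, s = 1; end  % sign choice of (3.72)
        v = a; v(1) = v(1) + s*norm(a);
        H = eye(n-k+1) - 2*(v*v')/(v'*v);      % Householder matrix (3.61)
        R(k:n, k:n) = H*R(k:n, k:n);           % zero column k below the diagonal
        Q(:, k:n) = Q(:, k:n)*H;               % accumulate Q as in (3.81)
    end
    end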
3.6.6 Singular Value Decomposition

Singular value decomposition (SVD) [11] has turned out to be one of the most fruitful ideas in the theory of matrices [12]. Although it is mainly used on rectangular matrices (see Sect. 9.2.4, where the procedure is explained in more detail), it can also be applied to any square matrix A, which it transforms into a product of three square matrices

    A = U Σ V^T.    (3.82)

U and V are orthonormal, i.e.,

    U^T U = V^T V = I,    (3.83)

which makes their inversion particularly easy, as

    U^{-1} = U^T  and  V^{-1} = V^T.    (3.84)

Σ is a diagonal matrix, with diagonal entries equal to the singular values of A, so cond A for the spectral norm is trivial to evaluate from the SVD. In this chapter, A is assumed to be invertible, which implies that no singular value is zero and Σ is invertible. In MATLAB, the SVD of A is achieved by the instruction [U,S,V]=svd(A).

Equation (3.1) translates into

    U Σ V^T x = b,    (3.85)

so

    x = V Σ^{-1} U^T b,    (3.86)

with Σ^{-1} trivial to evaluate as Σ is diagonal. As SVD is significantly more complex than QR factorization, one may prefer the latter.

When cond A is too large, solving (3.1) becomes impossible using floating-point numbers, even via QR factorization. A better approximate solution may then be obtained by replacing (3.86) by

    x̂ = V Σ̂^{-1} U^T b,    (3.87)

where Σ̂^{-1} is a diagonal matrix such that

    Σ̂^{-1}_{i,i} = 1/σ_{i,i} if σ_{i,i} > δ, and 0 otherwise,    (3.88)

with δ a positive threshold to be chosen by the user. This amounts to replacing any singular value of A that is smaller than δ by zero, thus pretending that (3.1) has infinitely many solutions, and then picking up the solution with the smallest Euclidean norm. See Sect. 9.2.6 for more details on this regularization approach in the context of least squares.

This approach should be used with a lot of caution here, however, as the quality of the approximate solution x̂ provided by (3.87) depends heavily on the value taken by b. Assume, for instance, that A is symmetric positive definite, and that b is an eigenvector of A associated with some very small eigenvalue λ_b, such that ||b||_2 = 1. The mathematical solution of (3.1)

    x = (1/λ_b) b    (3.89)

then has a very large Euclidean norm, and should thus be completely different from x̂, as the eigenvalue λ_b is also a (very small) singular value of A and 1/λ_b will be replaced by zero in the computation of x̂. Examples of ill-posed problems for which regularization via SVD gives interesting results are in [13].
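A MATLAB sketch of this regularized solution, with the threshold delta to be chosen by the user:

    [U, S, V] = svd(A);
    s = diag(S);                         % singular values of A
    sinv = zeros(size(s));
    sinv(s > delta) = 1./s(s > delta);   % invert only singular values above delta
    xhat = V*diag(sinv)*(U'*b);          % regularized solution (3.87)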
3.7 Iterative Methods

In very large-scale problems such as those involved in the solution of partial differential equations, A is typically sparse, which should be taken advantage of. The direct methods in Sect. 3.6 become difficult to use, because sparsity is usually lost during the factorization of A. One may then use sparse direct solvers (not presented here), which permute equations and unknowns in an attempt to minimize fill-in in the factors. This is a complex optimization problem in itself, so iterative methods are an attractive alternative [2, 14].

3.7.1 Classical Iterative Methods

These methods are slow and now seldom used, but simple to understand. They serve as an introduction to the more modern Krylov subspace iteration of Sect. 3.7.2.

3.7.1.1 Principle

To solve (3.1) for x, decompose A into a sum of two matrices

    A = A_1 + A_2,    (3.90)

with A_1 (easily) invertible, so as to ensure

    x = −A_1^{-1} A_2 x + A_1^{-1} b.    (3.91)

Define M = −A_1^{-1} A_2 and v = A_1^{-1} b to get

    x = Mx + v.    (3.92)

The idea is to choose the decomposition (3.90) in such a way that the recursion

    x^{k+1} = M x^k + v    (3.93)

converges to the solution of (3.1) when k tends to infinity. This will be the case if and only if all the eigenvalues of M are strictly inside the unit circle.
The methods considered below differ in how A is decomposed. We assume that all diagonal entries of A are nonzero, and write

    A = D + L + U,    (3.94)

where D is a diagonal invertible matrix with the same diagonal entries as A, L is a lower triangular matrix with zero main descending diagonal, and U is an upper triangular matrix also with zero main descending diagonal.

3.7.1.2 Jacobi Iteration

In the Jacobi iteration, A_1 = D and A_2 = L + U, so

    M = −D^{-1}(L + U)  and  v = D^{-1} b.    (3.95)

The scalar interpretation of this method is as follows. The jth row of (3.1) is

    Σ_{i=1}^{n} a_{j,i} x_i = b_j.    (3.96)

Since a_{j,j} ≠ 0 by hypothesis, it can be rewritten as

    x_j = (b_j − Σ_{i≠j} a_{j,i} x_i)/a_{j,j},    (3.97)

which expresses x_j as a function of the other unknowns. A Jacobi iteration computes

    x_j^{k+1} = (b_j − Σ_{i≠j} a_{j,i} x_i^k)/a_{j,j},  j = 1, . . . , n.    (3.98)

A sufficient condition for convergence to the solution x* of (3.1) (whatever the initial vector x^0) is that A be diagonally dominant. This condition is not necessary, and convergence may take place under less restrictive conditions.
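A vectorized MATLAB sketch of (3.98), where tol and maxit are to be chosen by the user:

    d = diag(A);                      % diagonal entries, assumed nonzero
    x = zeros(size(b));               % initial vector x^0
    for k = 1:maxit
        x = (b - A*x + d.*x)./d;      % (3.98) in vector form
        if norm(b - A*x) < tol, break, end
    end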
3.7.1.3 Gauss–Seidel Iteration

In the Gauss–Seidel iteration, A_1 = D + L and A_2 = U, so

    M = −(D + L)^{-1} U  and  v = (D + L)^{-1} b.    (3.99)

The scalar interpretation becomes

    x_j^{k+1} = (b_j − Σ_{i=1}^{j-1} a_{j,i} x_i^{k+1} − Σ_{i=j+1}^{n} a_{j,i} x_i^k)/a_{j,j},  j = 1, . . . , n.    (3.100)

Note the presence of x_i^{k+1} on the right-hand side of (3.100). The components of x^{k+1} that have already been evaluated are thus used in the computation of those that have not. This speeds up convergence and makes it possible to save memory space.

Remark 3.16 The behavior of the Gauss–Seidel method depends on how the variables are ordered in x, contrary to what happens with the Jacobi method.

As with the Jacobi method, a sufficient condition for convergence to the solution x* of (3.1) (whatever the initial vector x^0) is that A be diagonally dominant. This condition is again not necessary, and convergence may take place under less restrictive conditions.

3.7.1.4 Successive Over-Relaxation

The successive over-relaxation method (SOR) was developed in the context of solving partial differential equations [15]. It rewrites (3.1) as

    (D + ωL) x = ωb − [ωU + (ω − 1)D] x,    (3.101)

where ω ≠ 0 is the relaxation factor, and iterates solving

    (D + ωL) x^{k+1} = ωb − [ωU + (ω − 1)D] x^k    (3.102)

for x^{k+1}. As D + ωL is lower triangular, this is done by forward substitution, and equivalent to writing

    x_j^{k+1} = (1 − ω) x_j^k + ω (b_j − Σ_{i=1}^{j-1} a_{j,i} x_i^{k+1} − Σ_{i=j+1}^{n} a_{j,i} x_i^k)/a_{j,j},  j = 1, . . . , n.    (3.103)

As a result,

    x^{k+1} = (1 − ω) x^k + ω x_{GS}^{k+1},    (3.104)

where x_{GS}^{k+1} is the approximation of the solution x* suggested by the Gauss–Seidel iteration. A necessary condition for convergence is ω ∈ (0, 2). For ω = 1, the Gauss–Seidel method is recovered. When ω < 1 the method is under-relaxed, whereas it is over-relaxed if ω > 1. The optimal value of ω depends on A, but over-relaxation is usually preferred, where the displacements suggested by the Gauss–Seidel method are increased. The convergence of the Gauss–Seidel method may thus be accelerated by extrapolating on iteration results. Methods are available to adapt ω based on past behavior. They have largely lost their interest with the advent of Krylov subspace iteration, however.
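A scalar-loop MATLAB sketch of (3.103); taking omega equal to 1 recovers the Gauss–Seidel iteration (tol, maxit and omega are to be chosen by the user):

    n = length(b); x = zeros(n, 1);
    for k = 1:maxit
        for j = 1:n
            xGS = (b(j) - A(j,1:j-1)*x(1:j-1) - A(j,j+1:n)*x(j+1:n))/A(j,j);
            x(j) = (1 - omega)*x(j) + omega*xGS;   % (3.103), updated in place
        end
        if norm(b - A*x) < tol, break, end
    end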
3.7.2 Krylov Subspace Iteration

Krylov subspace iteration [16, 17] has superseded classical iterative approaches, which may turn out to be very slow or even fail to converge. It was dubbed in [18] one of the ten algorithms with the greatest influence on the development and practice of science and engineering in the twentieth century.

3.7.2.1 From Jacobi to Krylov

Jacobi iteration has

    x^{k+1} = −D^{-1}(L + U) x^k + D^{-1} b.    (3.105)

Equation (3.94) implies that L + U = A − D, so

    x^{k+1} = (I − D^{-1} A) x^k + D^{-1} b.    (3.106)

Since the true solution x* = A^{-1}b is unknown, the error

    δx^k = x^k − x*    (3.107)

cannot be computed, and the residual

    r^k = b − A x^k = −A(x^k − x*) = −A δx^k    (3.108)

is used instead to characterize the quality of the approximate solution obtained so far. Normalize the system of equations to be solved to ensure that D = I. Then

    x^{k+1} = (I − A) x^k + b = x^k + r^k.    (3.109)

Subtract x* from both sides of (3.109), and left multiply the result by −A to get

    r^{k+1} = r^k − A r^k.    (3.110)

The recursion (3.110) implies that

    r^k ∈ span{r^0, A r^0, . . . , A^k r^0},    (3.111)
and (3.109) then implies that

    x^k − x^0 = Σ_{i=0}^{k-1} r^i.    (3.112)

Therefore,

    x^k ∈ x^0 + span{r^0, A r^0, . . . , A^{k-1} r^0},    (3.113)

where span{r^0, A r^0, . . . , A^{k-1} r^0} is the kth Krylov subspace generated by A from r^0, denoted by K_k(A, r^0).

Remark 3.17 The definition of Krylov subspaces implies that

    K_{k-1}(A, r^0) ⊂ K_k(A, r^0),    (3.114)

and that each iteration increases the dimension of the search space at most by one. Assume, for instance, that x^0 = 0, which implies that r^0 = b, and that b is an eigenvector of A such that

    Ab = λb.    (3.115)

Then

    ∀k ≥ 1,  span{r^0, A r^0, . . . , A^{k-1} r^0} = span{b}.    (3.116)

This is appropriate, as the solution is x = λ^{-1} b.

Remark 3.18 Let P_n(λ) be the characteristic polynomial of A,

    P_n(λ) = det(A − λI_n).    (3.117)

The Cayley–Hamilton theorem states that P_n(A) is the zero (n × n) matrix. In other words, A^n is a linear combination of A^{n-1}, A^{n-2}, . . . , I_n, so

    ∀k ≥ n,  K_k(A, r^0) = K_n(A, r^0),    (3.118)

and the dimension of the space in which search takes place does not increase after the first n iterations.

A crucial point, not proved here, is that there exists ν ≤ n such that

    x* ∈ x^0 + K_ν(A, r^0).    (3.119)

In principle, one may thus hope to get the solution in no more than n = dim x iterations in Krylov subspaces, whereas for Jacobi, Gauss–Seidel or SOR iterations no such bound is available. In practice, with floating-point computations, one may still get better results by iterating until the solution is deemed satisfactory.
3.7.2.2 A Is Symmetric Positive Definite

When A ≻ 0, conjugate-gradient methods [19–21] are the iterative approach of choice to this day. The approximate solution is sought for by minimizing

    J(x) = (1/2) x^T A x − b^T x.    (3.120)

Using theoretical optimality conditions presented in Sect. 9.1, it is easy to show that the unique minimizer of this cost function is indeed x̂ = A^{-1} b. Starting from x^k, the approximation of x* at iteration k, x^{k+1} is computed by line search along some direction d^k as

    x^{k+1}(α_k) = x^k + α_k d^k.    (3.121)

It is again easy to show that J(x^{k+1}(α_k)) is minimum if

    α_k = (d^k)^T (b − A x^k) / ((d^k)^T A d^k).    (3.122)

The search direction d^k is taken so as to ensure that

    (d^i)^T A d^k = 0,  i = 0, . . . , k − 1,    (3.123)

which means that it is conjugate with respect to A (or A-orthogonal) with all the previous search directions. With exact computation, this would ensure convergence to x̂ in at most n iterations. Because of the effect of rounding errors, it may be useful to allow more than n iterations, although n may be so large that n iterations is actually more than can be achieved. (One often gets a useful approximation of the solution in less than n iterations.)

After n iterations,

    x^n = x^0 + Σ_{i=0}^{n-1} α_i d^i,    (3.124)

so

    x^n ∈ x^0 + span{d^0, . . . , d^{n-1}}.    (3.125)

A Krylov-space solver is obtained if the search directions are such that

    span{d^0, . . . , d^i} = K_{i+1}(A, r^0),  i = 0, 1, . . .    (3.126)

This can be achieved with an amazingly simple algorithm [19, 21], summarized in Table 3.1. See also Sect. 9.3.4.6 and Example 9.8.
Table 3.1 Krylov-space solver

    r^0 := b − A x^0,  d^0 := r^0,  δ_0 := ||r^0||_2^2,  k := 0.
    While ||r^k||_2 > tol, compute
        δ'_k := (d^k)^T A d^k,
        α_k := δ_k / δ'_k,
        x^{k+1} := x^k + α_k d^k,
        r^{k+1} := r^k − α_k A d^k,
        δ_{k+1} := ||r^{k+1}||_2^2,
        β_k := δ_{k+1} / δ_k,
        d^{k+1} := r^{k+1} + β_k d^k,
        k := k + 1.

Remark 3.19 The notation := in Table 3.1 means that the variable on the left-hand side is assigned the value resulting from the evaluation of the expression on the right-hand side. It should not be confused with the equal sign, and one may write k := k + 1 whereas k = k + 1 would make no sense. In MATLAB and a number of other programming languages, however, the sign = is used instead of :=.
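A direct MATLAB transcription of Table 3.1 might read as follows (tol is to be chosen by the user; MATLAB's pcg function provides a robust, preconditioned implementation of the same idea):

    x = zeros(size(b));               % x^0 = 0
    r = b - A*x; d = r;
    delta = r'*r;                     % delta_0 = ||r^0||_2^2
    while sqrt(delta) > tol
        q = A*d;                      % matrix-vector product, computed once
        alpha = delta/(d'*q);
        x = x + alpha*d;
        r = r - alpha*q;
        deltaNew = r'*r;
        d = r + (deltaNew/delta)*d;   % new A-conjugate search direction
        delta = deltaNew;
    end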
3.7.2.3 A Is Not Symmetric Positive Definite

This is a much more complicated and costly situation. Specific methods, not detailed here, have been developed for symmetric matrices that are not positive definite [22], as well as for nonsymmetric matrices [23, 24].

3.7.2.4 Preconditioning

The convergence speed of Krylov iteration strongly depends on the condition number of A. Spectacular acceleration may be achieved by replacing (3.1) by

    MAx = Mb,    (3.127)

where M is a suitably chosen preconditioning matrix, and a considerable amount of research has been devoted to this topic [25, 26]. As a result, modern preconditioned Krylov methods converge much faster and for a much wider class of matrices than the classical iterative methods of Sect. 3.7.1.

One possible approach for choosing M is to look for a sparse approximation of the inverse of A by solving

    M̂ = arg min_{M∈S} ||I_n − AM||_F,    (3.128)

where ||·||_F is the Frobenius norm and S is a set of sparse matrices to be specified. Since

    ||I_n − AM||_F^2 = Σ_{j=1}^{n} ||e^j − A m^j||_2^2,    (3.129)

where e^j is the jth column of I_n and m^j the jth column of M, computing M can be split into solving n independent least-squares problems (one per column), subject to sparsity constraints. The nonzero entries of m^j are then obtained by solving a small unconstrained linear least-squares problem (see Sect. 9.2). The computation of the columns of M̂ is thus easily parallelized. The main difficulty is a proper choice for S, which may be carried out by adaptive strategies [27]. One may start with M diagonal, or with the same sparsity pattern as A.

Remark 3.20 Preconditioning may also be used with direct methods.

3.8 Taking Advantage of the Structure of A

This section describes important special cases where the structure of A suggests dedicated algorithms, as in Sect. 3.7.2.2.

3.8.1 A Is Symmetric Positive Definite

When A is real, symmetric and positive definite, i.e.,

    v^T A v > 0  ∀v ≠ 0,    (3.130)

its LU factorization is particularly easy, as there is a unique lower triangular matrix L such that

    A = L L^T,    (3.131)

with l_{k,k} > 0 for all k (l_{k,k} is no longer taken equal to 1). Thus U = L^T, and we could just as well write

    A = U^T U.    (3.132)

This factorization, known as Cholesky factorization [28], is readily obtained by identifying the two sides of (3.131). No pivoting is ever necessary, because the entries of L must satisfy

    Σ_{i=1}^{k} l_{k,i}^2 = a_{k,k},  k = 1, . . . , n,    (3.133)

and are therefore bounded. As Cholesky factorization fails if A is not positive definite, it can also be used to test symmetric matrices for positive definiteness, which is preferable to computing the eigenvalues of A. In MATLAB, one may use U=chol(A) or L=chol(A,'lower'). When A is also large and sparse, see Sect. 3.7.2.2.
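In MATLAB, the optional second output argument of chol makes the positive definiteness test explicit without raising an error, and the resulting factor can serve for two triangular solves:

    [R, p] = chol(A);      % p == 0 if and only if A is numerically positive definite
    if p == 0
        x = R\(R'\b);      % solve Ax = b using A = R'*R, as in (3.132)
    end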
3.8.2 A Is Toeplitz

When all the entries in any given descending diagonal of A have the same value, i.e.,

    A = \begin{bmatrix} h_0 & h_{-1} & h_{-2} & \cdots & h_{-n+1} \\ h_1 & h_0 & h_{-1} & & h_{-n+2} \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ h_{n-2} & & \ddots & h_0 & h_{-1} \\ h_{n-1} & h_{n-2} & \cdots & h_1 & h_0 \end{bmatrix},    (3.134)

as in deconvolution problems, A is Toeplitz. The Levinson–Durbin algorithm (not presented here) can then be used to get solutions that are recursive on the dimension m of the solution vector x^m, with x^m expressed as a function of x^{m-1}.

3.8.3 A Is Vandermonde

When

    A = \begin{bmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^n \\ 1 & t_2 & t_2^2 & \cdots & t_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_{n+1} & t_{n+1}^2 & \cdots & t_{n+1}^n \end{bmatrix},    (3.135)

it is said to be Vandermonde. Such matrices, encountered for instance in polynomial interpolation, are ill-conditioned for large n, which calls for numerically robust methods or a reformulation of the problem to avoid Vandermonde matrices altogether.

3.8.4 A Is Sparse

A is sparse when most of its entries are zeros. This is particularly frequent when a partial differential equation is discretized, as each node is influenced only by its close neighbors. Instead of storing the entire matrix A, one may then use more economical