1. Normal equations for linear regression?
No! Please no!
Hamed Zakerzadeh
2. Linear regression
Find the best parameters $\beta$ describing the linear relation between the $n$ variables $x_1, \dots, x_n$ (the columns of $X \in \mathbb{R}^{m \times n}$) and $y$, given $m$ data points; that is, minimize the residual $\varepsilon$ in
$$ y = X\beta + \varepsilon $$
How do we solve $\min_\beta \|X\beta - y\|$?
Using the normal equations:
apply the first-order optimality condition
$$ (X^\top X)\,\beta = X^\top y $$
and solve this dense symmetric $n \times n$ system using a Cholesky factorization ($X^\top X$ is positive definite whenever $X$ has full column rank).
Using a QR decomposition:
factor $X = QR$, the product of an orthogonal $Q \in \mathbb{R}^{m \times m}$ and an upper-triangular $R \in \mathbb{R}^{m \times n}$; since multiplying by $Q^\top$ preserves the 2-norm,
$$ \min_\beta \|X\beta - y\| = \min_\beta \|R\beta - Q^\top y\| $$
and the first $n$ rows give a simple $n \times n$ upper-triangular system, solved by backward substitution.
If $m \gg n$, the NE method is roughly 2x faster (flops: $O(n^3 + mn^2)$ vs $O(2mn^2)$ for QR); both methods are sketched below.
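A minimal sketch of the two approaches, assuming NumPy/SciPy; the synthetic data, shapes, and random seed here are illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, qr, solve_triangular

rng = np.random.default_rng(0)
m, n = 1000, 5                        # m >> n: the regime where NE is ~2x cheaper
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Normal equations: solve the n x n SPD system (X^T X) beta = X^T y via Cholesky.
c, low = cho_factor(X.T @ X)
beta_ne = cho_solve((c, low), X.T @ y)

# QR: X = QR with Q orthogonal; ||X beta - y|| = ||R beta - Q^T y||,
# so solve the n x n upper-triangular system R beta = Q^T y.
Q, R = qr(X, mode='economic')         # economic: Q is m x n, R is n x n
beta_qr = solve_triangular(R, Q.T @ y)

print(np.allclose(beta_ne, beta_qr))  # both agree on a well-conditioned X
```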
3. Achilles heel: numerical stability
Information may be lost when forming $X^\top X$:
$$ X = \begin{pmatrix} 1 & 1 \\ \varepsilon & 0 \end{pmatrix} \implies X^\top X = \begin{pmatrix} 1+\varepsilon^2 & 1 \\ 1 & 1 \end{pmatrix} $$
If $\varepsilon^2$ falls below the unit roundoff, $1+\varepsilon^2$ rounds to exactly $1$, and the computed $X^\top X$ is singular even though $X$ has full rank.
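A quick numerical check of this effect, assuming NumPy; the value of $\varepsilon$ is chosen so that $\varepsilon^2$ falls below double-precision unit roundoff (about 1.1e-16):

```python
import numpy as np

eps = 1e-9                       # eps**2 = 1e-18 < unit roundoff (~1.1e-16)
X = np.array([[1.0, 1.0],
              [eps, 0.0]])

G = X.T @ X                      # computed Gram matrix
print(G[0, 0] == 1.0)            # True: 1 + eps**2 rounds to exactly 1
print(np.linalg.matrix_rank(G))  # 1: the computed X^T X is singular
print(np.linalg.matrix_rank(X))  # 2: X itself is still full rank
```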
Denoting the condition number by $\kappa$: the forward error bound for the NE method is proportional to $\kappa(X)^2$, whereas for the original least-squares problem it is only proportional to $\kappa(X)$.
The QR method is always backward stable, while the NE method is guaranteed to be backward stable only if $X$ is well-conditioned.
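The squaring of the condition number is easy to observe directly, as in this sketch assuming NumPy; the ill-conditioned matrix built here is a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(1)
# Build an X with condition number ~1e6 from explicit SVD factors.
m, n = 100, 5
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -6, n)        # singular values from 1 down to 1e-6
X = U @ np.diag(s) @ V.T

print(np.linalg.cond(X))         # ~1e6
print(np.linalg.cond(X.T @ X))   # ~1e12: kappa(X)**2
```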
The last word
The NE method is simple, handy for teaching machine learning, and sometimes useful in practice.
But be aware of its disadvantages!
“Although numerical analysts almost invariably solve the full rank LS problem by QR factorization, statisticians frequently use the normal equations (though perhaps less frequently than they used to, thanks to the influence of numerical analysts).”