1. Normal equations for linear regression?
No! Please no!
Hamed Zakerzadeh
2. Linear regression
Find the best parameters $\beta$ describing the linear relation between the $n$ variables $x_1, \dots, x_n$ (the columns of $X \in \mathbb{R}^{m \times n}$) and $y$, given $m$ data points; that is, minimize the residual $\varepsilon$ in
$$ y = X\beta + \varepsilon $$
How do we solve $\min_\beta \|X\beta - y\|$?
Using the normal equations:
apply the first-order optimality condition
$$ (X^\top X)\,\beta = X^\top y $$
and solve this dense symmetric $n \times n$ system using a Cholesky factorization ($X^\top X$ is positive definite whenever $X$ has full column rank).
Using a QR decomposition:
factor $X = QR$, the product of an orthogonal $Q \in \mathbb{R}^{m \times m}$ and an upper-triangular $R \in \mathbb{R}^{m \times n}$; since multiplying by $Q^\top$ preserves the 2-norm,
$$ \min_\beta \|X\beta - y\| = \min_\beta \|R\beta - Q^\top y\| $$
and the first $n$ rows give a simple $n \times n$ upper-triangular system, solved by backward substitution.
If $m \gg n$, the NE method is roughly 2x faster (flops: $O(n^3 + mn^2)$ vs $O(2mn^2)$ for QR); both methods are sketched below.
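A minimal sketch of the two approaches, assuming NumPy/SciPy; the synthetic data, shapes, and random seed here are illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, qr, solve_triangular

rng = np.random.default_rng(0)
m, n = 1000, 5                        # m >> n: the regime where NE is ~2x cheaper
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Normal equations: solve the n x n SPD system (X^T X) beta = X^T y via Cholesky.
c, low = cho_factor(X.T @ X)
beta_ne = cho_solve((c, low), X.T @ y)

# QR: X = QR with Q orthogonal; ||X beta - y|| = ||R beta - Q^T y||,
# so solve the n x n upper-triangular system R beta = Q^T y.
Q, R = qr(X, mode='economic')         # economic: Q is m x n, R is n x n
beta_qr = solve_triangular(R, Q.T @ y)

print(np.allclose(beta_ne, beta_qr))  # both agree on a well-conditioned X
```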
3. Achilles heel: numerical stability
Information may be lost when forming $X^\top X$:
$$ X = \begin{pmatrix} 1 & 1 \\ \varepsilon & 0 \end{pmatrix} \implies X^\top X = \begin{pmatrix} 1+\varepsilon^2 & 1 \\ 1 & 1 \end{pmatrix} $$
If $\varepsilon^2$ falls below the unit roundoff, $1+\varepsilon^2$ rounds to exactly $1$, and the computed $X^\top X$ is singular even though $X$ has full rank.
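A quick numerical check of this effect, assuming NumPy; the value of $\varepsilon$ is chosen so that $\varepsilon^2$ falls below double-precision unit roundoff (about 1.1e-16):

```python
import numpy as np

eps = 1e-9                       # eps**2 = 1e-18 < unit roundoff (~1.1e-16)
X = np.array([[1.0, 1.0],
              [eps, 0.0]])

G = X.T @ X                      # computed Gram matrix
print(G[0, 0] == 1.0)            # True: 1 + eps**2 rounds to exactly 1
print(np.linalg.matrix_rank(G))  # 1: the computed X^T X is singular
print(np.linalg.matrix_rank(X))  # 2: X itself is still full rank
```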
Denoting the condition number by $\kappa$: the forward error bound for the NE method is proportional to $\kappa(X)^2$, whereas for the original least-squares problem it is only proportional to $\kappa(X)$.
The QR method is always backward stable, while the NE method is guaranteed to be backward stable only if $X$ is well-conditioned.
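The squaring of the condition number is easy to observe directly, as in this sketch assuming NumPy; the ill-conditioned matrix built here is a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(1)
# Build an X with condition number ~1e6 from explicit SVD factors.
m, n = 100, 5
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -6, n)        # singular values from 1 down to 1e-6
X = U @ np.diag(s) @ V.T

print(np.linalg.cond(X))         # ~1e6
print(np.linalg.cond(X.T @ X))   # ~1e12: kappa(X)**2
```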
The last word
The NE method is simple, handy for teaching machine learning, and sometimes useful in practice.
But be aware of its disadvantages!
“Although numerical analysts almost invariably solve the full rank LS problem by QR factorization, statisticians frequently use the normal equations (though perhaps less frequently than they used to, thanks to the influence of numerical analysts).”