Slides econ-lm
1. Arthur Charpentier, Econometrics & Machine Learning - 2017
Econometrics & Machine Learning
A. Charpentier, E. Flachaire & A. Ly
https://arxiv.org/abs/1708.06992
@freakonometrics freakonometrics freakonometrics.hypotheses.org 1
2. Arthur Charpentier, Econometrics & Machine Learning - 2017
Probabilistic Foundations of Econometrics
see Haavelmo (1944)
see Duo (1997)
3. Arthur Charpentier, Econometrics & Machine Learning - 2017
Probabilistic Foundations of Econometrics
There is a probability space $(\Omega,\mathcal{F},\mathbb{P})$ such that the data $\{y_i,\boldsymbol{x}_i\}$ can be seen as realizations of random variables $\{Y_i,\boldsymbol{X}_i\}$.
Consider a conditional distribution for $Y\mid X$, e.g.
$$(Y\mid X=\boldsymbol{x})\overset{\mathcal{L}}{\sim}\mathcal{N}\big(\mu(\boldsymbol{x}),\sigma^{2}\big)\quad\text{where }\mu(\boldsymbol{x})=\beta_0+\boldsymbol{x}^{\top}\beta,\ \beta\in\mathbb{R}^{p},$$
for the linear model, with some extension when the distribution is in the exponential family (GLM).
Then use maximum likelihood for inference, to derive confidence intervals (quantification of uncertainty), etc.
Importance of unbiased estimators (the Cramér-Rao lower bound for the variance).
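As a rough illustration of this workflow (not from the slides; simulated data and the statsmodels library are assumed), the Gaussian linear model is fitted by maximum likelihood, which coincides with OLS here, and confidence intervals quantify the uncertainty on $\beta$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))                          # covariates x_i
beta = np.array([1.0, -2.0, 0.5])                    # "true" beta (simulation only)
y = 0.3 + X @ beta + rng.normal(scale=1.0, size=n)   # y_i | x_i ~ N(beta0 + x'beta, sigma^2)

# Maximum likelihood = OLS for the Gaussian linear model
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.params)                   # estimates of (beta0, beta)
print(model.conf_int(alpha=0.05))     # 95% confidence intervals
```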
4. Arthur Charpentier, Econometrics & Machine Learning - 2017
Loss Functions
Gneiting (2009), in a statistical context: a statistic $T$ is said to be elicitable if there exists a loss $\ell:\mathbb{R}\times\mathbb{R}\to\mathbb{R}_{+}$ such that
$$T(Y)=\underset{x\in\mathbb{R}}{\text{argmin}}\int_{\mathbb{R}}\ell(x,y)\,dF(y)=\underset{x\in\mathbb{R}}{\text{argmin}}\ \mathbb{E}\big[\ell(x,Y)\big]\quad\text{where }Y\overset{\mathcal{L}}{\sim}F$$
(e.g. the mean with the quadratic loss $\ell_{2}$, or the median with the absolute loss $\ell_{1}$).
In machine learning, we want to solve
$$m^{\star}(\cdot)=\underset{m\in\mathcal{M}}{\text{argmin}}\ \sum_{i=1}^{n}\ell\big(y_i,m(\boldsymbol{x}_i)\big)$$
using optimization techniques, in a not-too-complex space $\mathcal{M}$.
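A quick numerical sanity check of elicitability (a sketch with simulated data, assuming numpy and scipy are available): minimizing the empirical quadratic risk over a constant recovers the sample mean, while the absolute-loss risk recovers the median.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=1000)   # a skewed sample, so mean != median

# argmin_x E[(x - Y)^2] -> mean ;  argmin_x E[|x - Y|] -> median
x_l2 = minimize_scalar(lambda x: np.mean((x - y) ** 2)).x
x_l1 = minimize_scalar(lambda x: np.mean(np.abs(x - y))).x

print(x_l2, y.mean())        # both close to the sample mean
print(x_l1, np.median(y))    # both close to the sample median
```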
5. Arthur Charpentier, Econometrics & Machine Learning - 2017
Overfit?
Underfit: true model $y_i=\beta_0+\boldsymbol{x}_{1,i}^{\top}\beta_1+\boldsymbol{x}_{2,i}^{\top}\beta_2+\varepsilon_i$ vs. fitted model $y_i=b_0+\boldsymbol{x}_{1,i}^{\top}b_1+\eta_i$.
$$b_1=(\boldsymbol{X}_1^{\top}\boldsymbol{X}_1)^{-1}\boldsymbol{X}_1^{\top}\boldsymbol{y}=\beta_1+\underbrace{(\boldsymbol{X}_1^{\top}\boldsymbol{X}_1)^{-1}\boldsymbol{X}_1^{\top}\boldsymbol{X}_2\beta_2}_{\beta_{12}}+\underbrace{(\boldsymbol{X}_1^{\top}\boldsymbol{X}_1)^{-1}\boldsymbol{X}_1^{\top}\boldsymbol{\varepsilon}}_{\nu}$$
i.e. $\mathbb{E}[b_1]=\beta_1+\beta_{12}\neq\beta_1$, unless $\boldsymbol{X}_1\perp\boldsymbol{X}_2$ (see the Frisch-Waugh theorem).
Overfit: true model $y_i=\beta_0+\boldsymbol{x}_{1,i}^{\top}\beta_1+\varepsilon_i$ vs. fitted model $y_i=b_0+\boldsymbol{x}_{1,i}^{\top}b_1+\boldsymbol{x}_{2,i}^{\top}b_2+\eta_i$.
In that case $\mathbb{E}[b_1]=\beta_1$, but $b_1$ is no longer efficient.
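Both situations can be checked by simulation (a sketch with hypothetical data-generating processes, numpy only): omitting a correlated $X_2$ biases $b_1$, while adding an irrelevant $X_2$ keeps $b_1$ unbiased but inflates its variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_sim = 200, 2000
b_omit, b_extra, b_right = [], [], []

for _ in range(n_sim):
    x1 = rng.normal(size=n)
    x2 = 0.7 * x1 + rng.normal(size=n)        # correlated regressors
    ones = np.ones(n)

    # Underfit: data generated with x1 and x2, but x2 is omitted -> bias on b1
    y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)
    b_omit.append(np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)[0][1])

    # Overfit: data generated with x1 only, but x2 is also fitted -> unbiased, less efficient
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)
    b_extra.append(np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)[0][1])
    b_right.append(np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)[0][1])

print("omitted x2 : mean(b1) =", np.mean(b_omit))                      # biased away from 2
print("extra x2   : mean(b1) =", np.mean(b_extra), " sd =", np.std(b_extra))  # unbiased, larger sd
print("correct fit: sd(b1)   =", np.std(b_right))
```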
6. Arthur Charpentier, Econometrics & Machine Learning - 2017
Occam’s Razor and Parsimony
Importance of a penalty, in order to avoid too complex a model (overfitting).
Consider some linear predictor, $\widehat{\boldsymbol{y}}=\boldsymbol{H}\boldsymbol{y}$ where $\boldsymbol{H}=\boldsymbol{X}(\boldsymbol{X}^{\top}\boldsymbol{X})^{-1}\boldsymbol{X}^{\top}$, or more generally $\widehat{\boldsymbol{y}}=\boldsymbol{S}\boldsymbol{y}$ for some smoothing matrix $\boldsymbol{S}$.
In econometrics, consider some ex-post penalty for model selection,
$$\text{AIC}=-2\log\mathcal{L}+2p=\text{deviance}+2p$$
where $p$ is the dimension of the model (i.e. $\text{trace}(\boldsymbol{S})$ in a more general setting).
In machine learning, the penalty is added directly in the objective function, see Ridge or LASSO regression:
$$(\widehat{\beta}_{0,\lambda},\widehat{\beta}_{\lambda})=\underset{(\beta_0,\beta)}{\text{argmin}}\ \sum_{i=1}^{n}\ell\big(y_i,\beta_0+\boldsymbol{x}_i^{\top}\beta\big)+\lambda\,\|\beta\|$$
(the $\ell_2$ norm for Ridge, the $\ell_1$ norm for LASSO).
Goodhart’s law: “when a measure becomes a target, it ceases to be a good measure”.
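A sketch of this penalized objective with scikit-learn (illustrative data; the tuning parameter is called alpha there and plays the role of $\lambda$): the $\ell_2$ penalty (Ridge) shrinks coefficients, while the $\ell_1$ penalty (LASSO) sets many of them exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [2.0, -1.0, 0.5]   # sparse "true" coefficients
y = X @ beta + rng.normal(size=n)

# Penalty added directly to the objective; alpha controls model complexity
ridge = Ridge(alpha=1.0).fit(X, y)    # l2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)    # l1 penalty: many coefficients exactly zero

print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))       # mostly zeros outside the first three
```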
7. Arthur Charpentier, Econometrics & Machine Learning - 2017
Boosting, or learning from previous errors
Construct a sequence of models
$$m^{(k)}(\boldsymbol{x})=m^{(k-1)}(\boldsymbol{x})+\alpha\cdot f(\boldsymbol{x})$$
where
$$f=\underset{f\in\mathcal{W}}{\text{argmin}}\ \sum_{i=1}^{n}\ell\big(y_i-m^{(k-1)}(\boldsymbol{x}_i),f(\boldsymbol{x}_i)\big)$$
for some set of weak learners $\mathcal{W}$.
Problem: where to stop to avoid overfit...
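A minimal $\ell_2$-boosting sketch (shallow regression trees as the weak learners, a hypothetical shrinkage $\alpha$, simulated data): each step fits $f$ to the residuals $y_i-m^{(k-1)}(\boldsymbol{x}_i)$, and the error on a held-out sample hints at where to stop.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, size=300)).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=300)
x_tr, y_tr, x_te, y_te = x[::2], y[::2], x[1::2], y[1::2]

alpha = 0.1
pred_tr, pred_te = np.zeros(len(y_tr)), np.zeros(len(y_te))
test_error = []
for k in range(200):
    # weak learner fitted to the residuals of the current model m^(k-1)
    f = DecisionTreeRegressor(max_depth=2).fit(x_tr, y_tr - pred_tr)
    pred_tr += alpha * f.predict(x_tr)    # m^(k) = m^(k-1) + alpha * f
    pred_te += alpha * f.predict(x_te)
    test_error.append(np.mean((y_te - pred_te) ** 2))

print("best number of boosting steps on held-out data:", int(np.argmin(test_error)))
```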
[Figure: scatter plot (x-axis roughly 0 to 10, y-axis roughly −2 to 1); no caption or axis labels recoverable.]