Arthur Charpentier, SIDE Summer School, July 2019
# 9 Updates & Missing Values
Arthur Charpentier (Université du Québec à Montréal)
Machine Learning & Econometrics
SIDE Summer School - July 2019
@freakonometrics freakonometrics freakonometrics.hypotheses.org 1
Machine Learning, Practical Issues
Two important practical issues:
• what if we cannot access the entire dataset?
• what if there is an update (new observation or new variable)?
Consider the case where datasets are located on various
servers, and cannot be downloaded (e.g. hospitals), but
one can run functions and obtain outputs.
see Wolfson et al. (2010, Data Shield)
or http://www.datashield.ac.uk/
Consider a regression model y = Xβ + ε
Use the QR decomposition of $X$, $X = QR$, where $Q$ is an orthogonal matrix, $Q^TQ = I$. Then
$$\widehat{\beta} = [X^TX]^{-1}X^Ty = R^{-1}Q^Ty$$
Consider $m$ blocks (map part)
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} \quad\text{and}\quad X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix} = \begin{pmatrix} Q_1^{(1)}R_1^{(1)} \\ Q_2^{(1)}R_2^{(1)} \\ \vdots \\ Q_m^{(1)}R_m^{(1)} \end{pmatrix}$$
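Consider the QR decomposition of $R^{(1)}$ (step 1 of reduce part)
$$R^{(1)} = \begin{pmatrix} R_1^{(1)} \\ R_2^{(1)} \\ \vdots \\ R_m^{(1)} \end{pmatrix} = Q^{(2)}R^{(2)} \quad\text{where}\quad Q^{(2)} = \begin{pmatrix} Q_1^{(2)} \\ Q_2^{(2)} \\ \vdots \\ Q_m^{(2)} \end{pmatrix}$$
Define (step 2 of reduce part), with $Q_j^{(1)}$ of size $n_j \times p$ and $Q_j^{(2)}$ of size $p \times p$,
$$Q_j^{(3)} = Q_j^{(1)}Q_j^{(2)} \quad\text{and}\quad V_j = {Q_j^{(3)}}^T y_j$$
and finally set (step 3 of reduce part)
$$\widehat{\beta} = [R^{(2)}]^{-1}\sum_{j=1}^m V_j$$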
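The map and reduce steps can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the language, the function name `blockwise_ols` and the simulated data are not part of the slides, and each block is assumed to have at least as many rows as there are covariates, so that the reduced QR gives a p × p triangular factor.

```python
import numpy as np

def blockwise_ols(X_blocks, y_blocks):
    """OLS coefficients from row blocks, using only per-block QR outputs."""
    p = X_blocks[0].shape[1]
    # Map: reduced QR on each block, X_j = Q1_j R1_j (needs n_j >= p rows)
    qr1 = [np.linalg.qr(Xj) for Xj in X_blocks]
    # Reduce, step 1: QR of the stacked R1_j, an (m*p) x p matrix
    Q2, R2 = np.linalg.qr(np.vstack([R1j for _, R1j in qr1]))
    # Reduce, step 2: V_j = (Q1_j Q2_j)^T y_j, Q2_j being the j-th p x p block of Q2
    V = np.zeros(p)
    for j, ((Q1j, _), yj) in enumerate(zip(qr1, y_blocks)):
        Q2j = Q2[j * p:(j + 1) * p, :]
        V += (Q1j @ Q2j).T @ yj
    # Reduce, step 3: solve R2 beta = sum_j V_j
    return np.linalg.solve(R2, V)

# Simulated data (hypothetical), split into 4 row blocks
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=300)
beta_blocks = blockwise_ols(np.array_split(X, 4), np.array_split(y, 4))
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]   # single-server reference fit
```

Only the small matrices $R_j^{(1)}$ and the vectors $V_j$ have to leave each server in this scheme; the rows of $X_j$ and $y_j$ stay where they are.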
Online Learning
Let $T_n = \{(y_1, x_1), \cdots, (y_n, x_n)\}$ denote the training dataset, with $y \in \mathcal{Y}$.
Learning
A learning algorithm is a map $A : T_n \to \mathcal{Y}$
Online Learning
A pure online learning algorithm is a sequence of recursive algorithms
(i) $m_0$ is the initialization
(ii) for $k = 1, 2, \cdots$, $m_k = A(m_{k-1}, (y_k, x_k))$
Recall that the risk is $R(m) = E[\ell(Y, m(X))]$
As in gradient boosting, consider some approximation $G$ of the gradient of $R(m)$,
$$m_k = m_{k-1} + \gamma_k G(m_{k-1}, (y_k, x_k))$$
• Update with a new observation, see Riddell (1975, Recursive Estimation Algorithms for Economic Research)
Let $X_{1:n}$ denote the matrix of covariates, with $n$ observations (rows), and $x_{n+1}$ denote a new one. Recall that
$$\widehat{\beta}_n = [X_{1:n}^TX_{1:n}]^{-1}X_{1:n}^Ty_{1:n} = C_n^{-1}X_{1:n}^Ty_{1:n}$$
Since $C_{n+1} = X_{1:n+1}^TX_{1:n+1} = C_n + x_{n+1}x_{n+1}^T$, then
$$\widehat{\beta}_{n+1} = \widehat{\beta}_n + C_{n+1}^{-1}x_{n+1}[y_{n+1} - x_{n+1}^T\widehat{\beta}_n]$$
This updating formula is also called a differential correction, since it is proportional to the prediction error.
Note that the residual sum of squares can also be updated, with
$$S_{n+1} = S_n + \frac{1}{d}[y_{n+1} - x_{n+1}^T\widehat{\beta}_n]^2 \quad\text{where}\quad d = 1 + x_{n+1}^TC_n^{-1}x_{n+1}$$
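The coefficient update follows from the normal equations in a few lines. The derivation below is a sketch, using only the definition of $C_n$ and the identity $C_n = C_{n+1} - x_{n+1}x_{n+1}^T$:

```latex
\begin{aligned}
\widehat{\beta}_{n+1}
  &= C_{n+1}^{-1}\big[X_{1:n}^T y_{1:n} + x_{n+1}y_{n+1}\big]
   = C_{n+1}^{-1}\big[C_n\widehat{\beta}_n + x_{n+1}y_{n+1}\big] \\
  &= C_{n+1}^{-1}\big[(C_{n+1} - x_{n+1}x_{n+1}^T)\widehat{\beta}_n + x_{n+1}y_{n+1}\big] \\
  &= \widehat{\beta}_n + C_{n+1}^{-1}x_{n+1}\big[y_{n+1} - x_{n+1}^T\widehat{\beta}_n\big]
\end{aligned}
```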
Online Learning for OLS
$$\widehat{\beta}_{n+1} = \widehat{\beta}_n + C_{n+1}^{-1}x_{n+1}[y_{n+1} - x_{n+1}^T\widehat{\beta}_n]$$
is a recursive formula, but used naively it requires storing all the data (and inverting a matrix at each step).
Good news, $[A + BCD]^{-1} = A^{-1} - A^{-1}B[DA^{-1}B + C^{-1}]^{-1}DA^{-1}$, so
$$C_{n+1}^{-1} = C_n^{-1} - \frac{C_n^{-1}x_{n+1}x_{n+1}^TC_n^{-1}}{1 + x_{n+1}^TC_n^{-1}x_{n+1}}$$
We have an algorithm of the form: for $k = 1, 2, \cdots$, $m_k = A(m_{k-1}, (y_k, C_k, x_k))$ for some matrix $C_k$
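A minimal sketch of the resulting streaming algorithm, in Python with NumPy. The function name `rls_update`, the simulated data and the choice of initializing on a first batch of 10 rows are assumptions for illustration; the residual sum of squares is updated with the denominator $1 + x_{n+1}^TC_n^{-1}x_{n+1}$, which is the standard recursive least squares result.

```python
import numpy as np

def rls_update(beta, C_inv, S, x_new, y_new):
    """One recursive least squares step: returns updated (beta, C_inv, S)."""
    Cx = C_inv @ x_new                           # C_n^{-1} x_{n+1}
    d = 1.0 + x_new @ Cx                         # 1 + x' C_n^{-1} x
    err = y_new - x_new @ beta                   # prediction error
    C_inv_new = C_inv - np.outer(Cx, Cx) / d     # rank-one inverse update
    beta_new = beta + C_inv_new @ x_new * err    # differential correction
    S_new = S + err ** 2 / d                     # residual sum of squares
    return beta_new, C_inv_new, S_new

# Simulated data (hypothetical)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=200)

n0 = 10                                          # initial batch, fitted the usual way
C_inv = np.linalg.inv(X[:n0].T @ X[:n0])
beta = C_inv @ X[:n0].T @ y[:n0]
S = np.sum((y[:n0] - X[:n0] @ beta) ** 2)
for x_new, y_new in zip(X[n0:], y[n0:]):         # one observation at a time
    beta, C_inv, S = rls_update(beta, C_inv, S, x_new, y_new)

beta_full = np.linalg.lstsq(X, y, rcond=None)[0] # batch fit, for comparison
S_full = np.sum((y - X @ beta_full) ** 2)
```

Each step only needs the current coefficients, the current p × p inverse and the new observation, so past rows can be discarded once processed.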
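$$\widehat{\beta}_{n+1} = \widehat{\beta}_n + C_{n+1}^{-1}x_{n+1}[y_{n+1} - x_{n+1}^T\widehat{\beta}_n]$$
is also a gradient-type algorithm, since
$$\nabla_\beta\,(y_{n+1} - x_{n+1}^T\beta)^2 = -2x_{n+1}[y_{n+1} - x_{n+1}^T\beta]$$
so the correction is a step along the negative gradient of the squared prediction error.
One might consider using a scalar $\gamma_{n+1} \in \mathbb{R}$ instead of $C_{n+1}^{-1}$ ($p \times p$ matrix)
Polyak-Ruppert averaging suggests using $\gamma_n = n^{-\alpha}$ where $\alpha \in (1/2, 1)$ to ensure convergence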
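A sketch of this scalar-step variant in Python with NumPy, with the iterates averaged along the way. The function name `sgd_ols`, the value α = 0.7, the bounded covariates and the simulated data are all assumptions made for the illustration; the factor 2 of the gradient is absorbed in the step size.

```python
import numpy as np

def sgd_ols(X, y, alpha=0.7):
    """Stochastic gradient for least squares, step n^(-alpha), with iterate averaging."""
    p = X.shape[1]
    beta = np.zeros(p)                           # current iterate
    beta_bar = np.zeros(p)                       # running average of the iterates
    for n, (x_n, y_n) in enumerate(zip(X, y), start=1):
        gamma = n ** (-alpha)                    # scalar step replacing C_n^{-1}
        beta = beta + gamma * x_n * (y_n - x_n @ beta)
        beta_bar += (beta - beta_bar) / n        # Polyak-Ruppert average
    return beta, beta_bar

# Simulated data (hypothetical); covariates bounded in [-1, 1] keep early steps stable
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(20000, 2))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=20000)
beta_last, beta_avg = sgd_ols(X, y)
```

No matrix is stored or inverted here: the memory cost is two vectors of length p, at the price of a slower, step-size dependent convergence.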
Update Formulas
• Update with a new variable
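Let $X_{1:k}$ denote the matrix of covariates, with $k$ explanatory variables (columns), and $x_{k+1}$ denote a new one. Recall that
$$\widehat{\beta}_k = [X_{1:k}^TX_{1:k}]^{-1}X_{1:k}^Ty$$
Then $\widehat{\beta}_{k+1} = (\widetilde{\beta}_k, \widetilde{\beta}_{k+1})^T$ where
$$\widetilde{\beta}_k = \widehat{\beta}_k - \frac{[X_{1:k}^TX_{1:k}]^{-1}X_{1:k}^Tx_{k+1}\,x_{k+1}^TP_k^\perp y}{x_{k+1}^TP_k^\perp x_{k+1}}$$
with $P_k^\perp = I - X_{1:k}(X_{1:k}^TX_{1:k})^{-1}X_{1:k}^T$, while
$$\widetilde{\beta}_{k+1} = \frac{x_{k+1}^TP_k^\perp y}{x_{k+1}^TP_k^\perp x_{k+1}}$$
If $x_{k+1}$ is orthogonal to previous variables, $X_{1:k}^Tx_{k+1} = 0$, then $\widetilde{\beta}_k = \widehat{\beta}_k$.
Observe that $P_k^\perp y = \widehat{\varepsilon}_k$, the residuals of the regression on the first $k$ variables.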
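The new-variable update can be checked numerically. The sketch below is in Python with NumPy; the function name `add_variable` and the simulated data are assumptions, and the projections $P_k^\perp y$ and $P_k^\perp x_{k+1}$ are computed as residuals rather than by forming the n × n matrix $P_k^\perp$.

```python
import numpy as np

def add_variable(X, y, beta_k, x_new):
    """Coefficients of y on [X, x_new], computed from the fit of y on X alone."""
    XtX_inv = np.linalg.inv(X.T @ X)
    proj_coef = XtX_inv @ (X.T @ x_new)          # regression of x_new on X
    resid_y = y - X @ beta_k                     # P_perp y, residuals of the old fit
    resid_x = x_new - X @ proj_coef              # P_perp x_new
    b_new = (resid_x @ resid_y) / (resid_x @ resid_x)
    b_old = beta_k - proj_coef * b_new           # corrected old coefficients
    return np.append(b_old, b_new)

# Simulated data (hypothetical), with x_new correlated with the first column of X
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
x_new = 0.5 * X[:, 0] + rng.normal(size=100)
y = X @ np.array([1.0, 2.0]) - 1.5 * x_new + rng.normal(size=100)

beta_k = np.linalg.lstsq(X, y, rcond=None)[0]
beta_updated = add_variable(X, y, beta_k, x_new)
beta_refit = np.linalg.lstsq(np.column_stack([X, x_new]), y, rcond=None)[0]
```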
Missing Values
“There are two kinds of model in the world : those who can extrapolate from incomplete data...”
From Tropical Atmosphere Ocean (TAO) dataset, see VIM::tao
With lm function, rows with missing values (in y or x) are deleted
To deal with them, one should understand the mechanism leading to missing values
Expectation - Maximization, see Dempster et al. (1977, Maximum Likelihood from Incomplete
Data via the EM Algorithm)
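Consider a mixture model $dF(y) = p_1dF_{\theta_1}(y) + p_2dF_{\theta_2}(y)$, i.e. there is $\Theta \in \{1, 2\}$ (with $p_j = P[\Theta = j]$) such that
$$y_i = \begin{cases} y_{1,i} \text{ with } Y_1 \sim F_{\theta_1}, & \text{if } \Theta = 1 \\ y_{2,i} \text{ with } Y_2 \sim F_{\theta_2}, & \text{if } \Theta = 2 \end{cases}$$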
see mixtools::normalmixEM for Gaussian mixtures
Observable and Non-Observable Heterogeneity
Mixture distribution (with two classes):
• if $\theta = A$, $Y \sim N(\mu_A, \sigma_A^2)$
• if $\theta = B$, $Y \sim N(\mu_B, \sigma_B^2)$
$f(y) = p_Af_A(y) + p_Bf_B(y)$
5 parameters to estimate,
no interpretation of the mixture parameter $\theta$
[Figure: density of height (in cm), from 150 to 200]
One categorical variable (e.g. gender)
• if gender=M, $Y \sim N(\mu_M, \sigma_M^2)$
• if gender=F, $Y \sim N(\mu_F, \sigma_F^2)$
$f(y) = p_Mf_M(y) + p_Ff_F(y)$
4 parameters to estimate ($p_M$ and $p_F$ are known),
clear interpretation of the mixture parameter
[Figure: density of height (in cm), from 150 to 200]
Expectation - Maximization
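EM for Mixtures
(i) start with initial values $\theta_{1,0}$, $\theta_{2,0}$ and $p_{j,0}$
(ii) for $k = 1, 2, \cdots$
E step:
$$\gamma_{k,j,i} = \frac{p_{j,k-1}\,dF_{\theta_{j,k-1}}(y_i)}{p_{1,k-1}\,dF_{\theta_{1,k-1}}(y_i) + p_{2,k-1}\,dF_{\theta_{2,k-1}}(y_i)}$$
M step: use ML techniques with weights $\gamma_{k,j,i}$
M step with a Gaussian mixture,
$$\mu_{j,k} = \frac{\sum_i \gamma_{k,j,i}\,y_i}{\sum_i \gamma_{k,j,i}} \quad\text{and}\quad \sigma^2_{j,k} = \frac{\sum_i \gamma_{k,j,i}[y_i - \mu_{j,k}]^2}{\sum_i \gamma_{k,j,i}}$$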
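The E and M steps for a two-component Gaussian mixture can be sketched as follows, in Python with NumPy. The function names, starting values and simulated heights are assumptions for illustration; the weight update $p_{j,k} = \frac{1}{n}\sum_i \gamma_{k,j,i}$ is the usual one and is not spelled out on the slide. In R, mixtools::normalmixEM covers the same model.

```python
import numpy as np

def normal_pdf(y, mu, sigma2):
    """Gaussian density with mean mu and variance sigma2."""
    return np.exp(-0.5 * (y - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def em_two_gaussians(y, mu, sigma2, p, n_iter=200):
    """EM for a two-component Gaussian mixture; mu, sigma2, p are starting values."""
    mu, sigma2, p = (np.array(v, dtype=float) for v in (mu, sigma2, p))
    for _ in range(n_iter):
        # E step: gamma[j, i] = posterior probability that y_i comes from component j
        dens = p[:, None] * normal_pdf(y[None, :], mu[:, None], sigma2[:, None])
        gamma = dens / dens.sum(axis=0)
        # M step: weighted maximum likelihood estimates
        w = gamma.sum(axis=1)
        mu = (gamma @ y) / w
        sigma2 = (gamma * (y[None, :] - mu[:, None]) ** 2).sum(axis=1) / w
        p = w / y.size                           # assumed (standard) weight update
    return mu, sigma2, p

# Simulated heights in cm (hypothetical): two well separated groups
rng = np.random.default_rng(4)
heights = np.concatenate([rng.normal(160.0, 5.0, 1000), rng.normal(185.0, 5.0, 1000)])
mu_hat, sigma2_hat, p_hat = em_two_gaussians(
    heights, mu=[150.0, 200.0], sigma2=[100.0, 100.0], p=[0.5, 0.5])
```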
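Expectation - Maximization
E step (expectation): compute $Q(\theta, \theta^k) = E[\log f(Y|\theta) \mid y_{obs}, \theta^k]$
M step (maximization): $\theta^{k+1} = \underset{\theta}{\text{argmax}}\ Q(\theta, \theta^k)$
Stochastic EM (for Mixtures)
(i) start with initial values $\theta_{1,0}$, $\theta_{2,0}$ and $p_{j,0}$
(ii) for $k = 1, 2, \cdots$
E step:
$$\gamma_{k,j,i} = \frac{p_{j,k-1}\,dF_{\theta_{j,k-1}}(y_i)}{p_{1,k-1}\,dF_{\theta_{1,k-1}}(y_i) + p_{2,k-1}\,dF_{\theta_{2,k-1}}(y_i)}$$
S step: generate $\xi_{k,i}$ in $\{1, 2\}$ with probabilities $\gamma_{k,1,i}$ and $\gamma_{k,2,i}$
M step: compute the ML estimate $\theta_{k,j}$ on the sample $\{y_i : \xi_{k,i} = j\}$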
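A sketch of the stochastic version for a two-component Gaussian mixture, in Python with NumPy. The function name `stochastic_em`, the seed handling, the rule that skips the update of a class with fewer than two simulated members, and the simulated data are assumptions for illustration.

```python
import numpy as np

def stochastic_em(y, mu, sigma2, p, n_iter=100, seed=0):
    """Stochastic EM for a two-component Gaussian mixture."""
    rng = np.random.default_rng(seed)
    mu, sigma2, p = (np.array(v, dtype=float) for v in (mu, sigma2, p))
    for _ in range(n_iter):
        # E step: posterior probability of the first component for each y_i
        dens = (p[:, None] * np.exp(-0.5 * (y - mu[:, None]) ** 2 / sigma2[:, None])
                / np.sqrt(2.0 * np.pi * sigma2[:, None]))
        gamma_first = dens[0] / dens.sum(axis=0)
        # S step: draw a class label (0 or 1) for each observation
        xi = np.where(rng.uniform(size=y.size) < gamma_first, 0, 1)
        # M step: plain ML estimates within each simulated class
        for j in (0, 1):
            y_j = y[xi == j]
            if y_j.size < 2:                     # assumed safeguard: keep old values
                continue
            mu[j], sigma2[j], p[j] = y_j.mean(), y_j.var(), y_j.size / y.size
    return mu, sigma2, p

# Simulated data (hypothetical): two well separated groups
rng = np.random.default_rng(5)
sample = np.concatenate([rng.normal(160.0, 5.0, 1000), rng.normal(185.0, 5.0, 1000)])
mu_sem, sigma2_sem, p_sem = stochastic_em(
    sample, mu=[150.0, 200.0], sigma2=[100.0, 100.0], p=[0.5, 0.5])
```

Unlike plain EM, the output is random: successive iterates fluctuate around the estimate instead of converging to a point, so in practice one often averages the last iterations.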
Missing Values : Single Imputation
Classical idea: Principal Component Analysis (PCA)
Approximate the $n \times p$ matrix $X$ with a lower rank matrix,
$$\widehat{X}_s = \underset{Y,\ \text{rank}(Y) \leq s}{\text{argmin}}\ \|X - Y\|_2^2 = U_s\Lambda_s^{1/2}V_s^T$$
(using Singular Value Decomposition)
One can consider PCA with missing values, based on weighted least squares
$$\widehat{X}_s = \underset{Y,\ \text{rank}(Y) \leq s}{\text{argmin}}\ \|W \odot (X - Y)\|_2^2$$
where $\odot$ is the elementwise product and $W$ is the $n \times p$ matrix with 1's, and $W_{i,j} = 0$ if $x_{i,j}$ is missing, see Gabriel & Zamir (1979, Lower rank approximation of matrices by least squares with any choice of weights) or Kiers (1997, Weighted least squares fitting using ordinary least squares algorithms)
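Iterative PCA
(i) if $x_{i,j}$ is missing, $W_{i,j} = 0$, and
$$x^1_{i,j} = W_{i,j} \cdot x^0_{i,j} + (1 - W_{i,j}) \cdot 0$$
(ii) for $k = 1, 2, \cdots$
• $\widehat{X}_s = \underset{Y,\ \text{rank}(Y) \leq s}{\text{argmin}}\ \|W \odot (X - Y)\|_2^2$
• $x^{k+1}_{i,j} = W_{i,j} \cdot x^k_{i,j} + (1 - W_{i,j}) \cdot \widehat{x}_{i,j}$
[Figure: scatter plot, both axes from −0.5 to 1.5]
Connections with the fixed effects model, $x_{i,j} = \sum_{k=1}^s f_{i,k}u_{j,k} + \varepsilon_{i,j}$ with $\varepsilon_{i,j} \sim N(0, \sigma^2)$,
and the random effects model, $x_i = \Gamma z_i + \varepsilon_i$ with $\varepsilon_i \sim N(0, \sigma^2I)$ and $z_i \sim N(0, I)$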
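A minimal sketch of the iteration in Python with NumPy. It assumes the common form of iterative PCA, where the rank-s fit at each pass is an ordinary truncated SVD of the current completed matrix; the function name `iterative_pca_impute`, the stopping rule and the toy rank-one matrix are also assumptions. In R, missMDA::imputePCA is the reference implementation (with centering and regularization that this sketch omits).

```python
import numpy as np

def iterative_pca_impute(X, s, n_iter=500, tol=1e-10):
    """Fill NaN entries of X by alternating rank-s SVD fits and imputation."""
    W = ~np.isnan(X)                             # True where x_ij is observed
    Xk = np.where(W, X, 0.0)                     # step (i): missing cells start at 0
    for _ in range(n_iter):
        U, d, Vt = np.linalg.svd(Xk, full_matrices=False)
        X_hat = (U[:, :s] * d[:s]) @ Vt[:s]      # rank-s fit of the completed matrix
        X_next = np.where(W, X, X_hat)           # keep observed, refresh missing cells
        converged = np.max(np.abs(X_next - Xk)) < tol
        Xk = X_next
        if converged:
            break
    return Xk

# Toy example (hypothetical): exact rank-one matrix with five missing cells
u, v = np.linspace(1.0, 2.0, 20), np.linspace(1.0, 2.0, 5)
X_true = np.outer(u, v)
X_obs = X_true.copy()
for i, j in [(0, 0), (5, 1), (10, 2), (15, 3), (19, 4)]:
    X_obs[i, j] = np.nan
X_imp = iterative_pca_impute(X_obs, s=1)
```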
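The iterative PCA is simply using EM on the fixed effects model,
$$x_{i,j} = \sum_{k=1}^s f_{i,k}u_{j,k} + \varepsilon_{i,j} \text{ with } \varepsilon_{i,j} \sim N(0, \sigma^2)$$
i.e. $X = FU^T + \varepsilon$, with $X$ of size $n \times p$, $F$ of size $n \times s$ and $U$ of size $p \times s$.
Log-likelihood is here
$$\log L(F, U, \sigma^2) = -\frac{np}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\|X - FU^T\|^2$$
E step: compute $E[X_{i,j} \mid X, F_k, U_k, \sigma_k^2]$ (imputation)
M step: maximize the log-likelihood
$$U_{k+1} = X_k^TF_k(F_k^TF_k)^{-1} \quad\text{and}\quad F_{k+1} = X_kU_k(U_k^TU_k)^{-1}$$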
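One can use regularized iterative PCA. So far, we used (SVD) $\widehat{X}_s = U_s\Lambda_s^{1/2}V_s^T$, i.e.
$$\widehat{X}_{i,j} = \sum_{k=1}^s \sqrt{\lambda_k}\,U_{i,k}V_{j,k}$$
Following Efron & Morris (1972, Limiting the Risk of Bayes and Empirical Bayes Estimators) consider a shrinkage version
$$\widehat{X}_{i,j} = \sum_{k=1}^s \frac{\lambda_k - \sigma^2}{\lambda_k}\sqrt{\lambda_k}\,U_{i,k}V_{j,k} = \sum_{k=1}^s \left(\sqrt{\lambda_k} - \frac{\sigma^2}{\sqrt{\lambda_k}}\right)U_{i,k}V_{j,k}$$
where
$$\sigma^2 = \frac{n[\lambda_{s+1} + \cdots + \lambda_p]}{np - p - ns - ps + s^2 + s}$$
See package missMDA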
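One can use soft-thresholding PCA. Following Hastie & Mazumder (2015, Matrix Completion and Low-Rank SVD)
$$\widehat{X}_{i,j} = \sum_{k=1}^s \left(\sqrt{\lambda_k} - \lambda\right)_+U_{i,k}V_{j,k}$$
solution of
$$\widehat{X}_s = \underset{Y,\ \text{rank}(Y) \leq s}{\text{argmin}}\ \|W \odot (X - Y)\|_2^2 + \lambda\|Y\|_\star$$
where the penalty $\|Y\|_\star$ is the nuclear norm (sum of the singular values).
The tuning parameter $\lambda$ is difficult to select.
See package softImpute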
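A sketch of the soft-thresholding iteration in Python with NumPy. It is not the softImpute package's algorithm (which uses alternating least squares), only a plain SVD version; the function names, the fixed number of iterations and the toy matrix are assumptions, and the threshold `lam` acts directly on the singular values of the completed matrix.

```python
import numpy as np

def soft_threshold_svd(Z, lam):
    """Shrink every singular value of Z by lam, truncating at zero."""
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(d - lam, 0.0)) @ Vt

def soft_impute(X, lam, n_iter=200):
    """Low-rank estimate of X (NaN = missing) by repeated soft-thresholded SVD."""
    W = ~np.isnan(X)                             # True where x_ij is observed
    Z = np.zeros_like(X)                         # current low-rank estimate
    for _ in range(n_iter):
        # observed cells come from X, missing cells from the current estimate
        Z = soft_threshold_svd(np.where(W, X, Z), lam)
    return Z

# Toy example (hypothetical): 6 x 4 matrix with two missing cells
rng = np.random.default_rng(6)
X_full = rng.normal(size=(6, 4))
X_miss = X_full.copy()
X_miss[0, 1] = np.nan
X_miss[4, 3] = np.nan
Z_hat = soft_impute(X_miss, lam=1.0)
X_completed = np.where(np.isnan(X_miss), Z_hat, X_miss)   # impute missing cells only
```

The estimate `Z_hat` shrinks the observed cells as well, so the completed matrix keeps the observed values and takes only the missing cells from `Z_hat`.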
One can also use k-nearest neighbors
with missMDA::imputePCA(y,ncp=1) and VIM::kNN(y,k=5)
Missing Values : Multiple Imputation
It aims to allow for the uncertainty about the missing data by creating several different plausible imputed data sets (via Sterne et al., 2009, Multiple imputation for missing data)
Reference: Rubin (2007, Multiple imputation for nonresponse in surveys)
The idea is to generate N possible values for each missing value, see Honaker, King & Blackwell (2010, Amelia) and library Amelia, using bootstrap samples, or van Buuren (2018, Multivariate Imputation by Chained Equations) with mice, using bootstrap and regression
“The idea of imputation is both seductive and dangerous. It is seductive because it can lull the user into the pleasurable state of believing that the data are complete after all, and it is dangerous because it lumps together situations where the problem is sufficiently minor that it can be legitimately handled in this way and situations where standard estimators applied to the real and imputed data have substantial biases.” Dempster & Rubin (1983, Incomplete Data in Sample Surveys)
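Missing Values : Gaussian process regression (and kriging)
Extrapolation or interpolation?

Extrapolation      Interpolation
x   y              x   y
1   y1             1   y1
2   y2             2   ?
3   ?              3   y3

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \sim N\left(0, \begin{pmatrix} \sigma_{1,1} & \sigma_{1,2} & \sigma_{1,3} \\ \sigma_{2,1} & \sigma_{2,2} & \sigma_{2,3} \\ \sigma_{3,1} & \sigma_{3,2} & \sigma_{3,3} \end{pmatrix}\right)$$
Writing $y$ for the observed values and $y_\star$ for the missing ones,
$$\begin{pmatrix} y \\ y_\star \end{pmatrix} \sim N\left(0, \begin{pmatrix} \Sigma & \Sigma_\star \\ \Sigma_\star^T & \Sigma_{\star\star} \end{pmatrix}\right)$$
$(y_\star | y) \sim N(\mu_\star, \Sigma_{\star|y})$ where
$$\mu_\star = \Sigma_\star^T\Sigma^{-1}y$$
$$\Sigma_{\star|y} = \Sigma_{\star\star} - \Sigma_\star^T\Sigma^{-1}\Sigma_\star$$
see Roberts et al. (2012, Gaussian Processes for Time Series) or Rasmussen & Williams (2006, Gaussian Processes for Machine Learning)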
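The conditional mean and covariance can be computed directly from a partition of the covariance matrix. The sketch below is in Python with NumPy; the function name `gaussian_conditional`, the covariance $\sigma_{i,j} = \rho^{|i-j|}$ with $\rho = 0.8$ and the observed values are assumptions chosen for the illustration.

```python
import numpy as np

def gaussian_conditional(Sigma, y_obs, obs, mis):
    """Mean and covariance of y[mis] given y[obs] = y_obs, for y ~ N(0, Sigma)."""
    S_oo = Sigma[np.ix_(obs, obs)]               # covariance of observed values
    S_mo = Sigma[np.ix_(mis, obs)]               # cross-covariance missing/observed
    S_mm = Sigma[np.ix_(mis, mis)]               # covariance of missing values
    A = np.linalg.solve(S_oo, S_mo.T).T          # S_mo S_oo^{-1}, without explicit inverse
    return A @ y_obs, S_mm - A @ S_mo.T

# Hypothetical covariance sigma_ij = rho^|i-j| on x = 1, 2, 3
rho = 0.8
idx = np.arange(3)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

# Extrapolation: y1 and y2 observed, y3 missing
mu_ex, var_ex = gaussian_conditional(Sigma, np.array([1.0, 2.0]), [0, 1], [2])
# Interpolation: y1 and y3 observed, y2 missing
mu_in, var_in = gaussian_conditional(Sigma, np.array([1.0, 2.0]), [0, 2], [1])
```

With this covariance the interpolated value has a smaller conditional variance than the extrapolated one, since it is informed by neighbors on both sides.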
Amil Baba Dawood bangali
 
655264371-checkpoint-science-past-papers-april-2023.pdf
655264371-checkpoint-science-past-papers-april-2023.pdf655264371-checkpoint-science-past-papers-april-2023.pdf
655264371-checkpoint-science-past-papers-april-2023.pdf
morearsh02
 
Falcon Invoice Discounting: Optimizing Returns with Minimal Risk
Falcon Invoice Discounting: Optimizing Returns with Minimal RiskFalcon Invoice Discounting: Optimizing Returns with Minimal Risk
Falcon Invoice Discounting: Optimizing Returns with Minimal Risk
Falcon Invoice Discounting
 

Recently uploaded (20)

how to sell pi coins at high rate quickly.
how to sell pi coins at high rate quickly.how to sell pi coins at high rate quickly.
how to sell pi coins at high rate quickly.
 
what is the future of Pi Network currency.
what is the future of Pi Network currency.what is the future of Pi Network currency.
what is the future of Pi Network currency.
 
how can I sell pi coins after successfully completing KYC
how can I sell pi coins after successfully completing KYChow can I sell pi coins after successfully completing KYC
how can I sell pi coins after successfully completing KYC
 
The European Unemployment Puzzle: implications from population aging
The European Unemployment Puzzle: implications from population agingThe European Unemployment Puzzle: implications from population aging
The European Unemployment Puzzle: implications from population aging
 
What website can I sell pi coins securely.
What website can I sell pi coins securely.What website can I sell pi coins securely.
What website can I sell pi coins securely.
 
Scope Of Macroeconomics introduction and basic theories
Scope Of Macroeconomics introduction and basic theoriesScope Of Macroeconomics introduction and basic theories
Scope Of Macroeconomics introduction and basic theories
 
how to sell pi coins on Binance exchange
how to sell pi coins on Binance exchangehow to sell pi coins on Binance exchange
how to sell pi coins on Binance exchange
 
Poonawalla Fincorp and IndusInd Bank Introduce New Co-Branded Credit Card
Poonawalla Fincorp and IndusInd Bank Introduce New Co-Branded Credit CardPoonawalla Fincorp and IndusInd Bank Introduce New Co-Branded Credit Card
Poonawalla Fincorp and IndusInd Bank Introduce New Co-Branded Credit Card
 
Turin Startup Ecosystem 2024 - Ricerca sulle Startup e il Sistema dell'Innov...
Turin Startup Ecosystem 2024  - Ricerca sulle Startup e il Sistema dell'Innov...Turin Startup Ecosystem 2024  - Ricerca sulle Startup e il Sistema dell'Innov...
Turin Startup Ecosystem 2024 - Ricerca sulle Startup e il Sistema dell'Innov...
 
how to sell pi coins in South Korea profitably.
how to sell pi coins in South Korea profitably.how to sell pi coins in South Korea profitably.
how to sell pi coins in South Korea profitably.
 
Intro_Economics_ GPresentation Week 4.pptx
Intro_Economics_ GPresentation Week 4.pptxIntro_Economics_ GPresentation Week 4.pptx
Intro_Economics_ GPresentation Week 4.pptx
 
how to swap pi coins to foreign currency withdrawable.
how to swap pi coins to foreign currency withdrawable.how to swap pi coins to foreign currency withdrawable.
how to swap pi coins to foreign currency withdrawable.
 
Proposer Builder Separation Problem in Ethereum
Proposer Builder Separation Problem in EthereumProposer Builder Separation Problem in Ethereum
Proposer Builder Separation Problem in Ethereum
 
Economics and Economic reasoning Chap. 1
Economics and Economic reasoning Chap. 1Economics and Economic reasoning Chap. 1
Economics and Economic reasoning Chap. 1
 
Summary of financial results for 1Q2024
Summary of financial  results for 1Q2024Summary of financial  results for 1Q2024
Summary of financial results for 1Q2024
 
USDA Loans in California: A Comprehensive Overview.pptx
USDA Loans in California: A Comprehensive Overview.pptxUSDA Loans in California: A Comprehensive Overview.pptx
USDA Loans in California: A Comprehensive Overview.pptx
 
US Economic Outlook - Being Decided - M Capital Group August 2021.pdf
US Economic Outlook - Being Decided - M Capital Group August 2021.pdfUS Economic Outlook - Being Decided - M Capital Group August 2021.pdf
US Economic Outlook - Being Decided - M Capital Group August 2021.pdf
 
NO1 Uk Divorce problem uk all amil baba in karachi,lahore,pakistan talaq ka m...
NO1 Uk Divorce problem uk all amil baba in karachi,lahore,pakistan talaq ka m...NO1 Uk Divorce problem uk all amil baba in karachi,lahore,pakistan talaq ka m...
NO1 Uk Divorce problem uk all amil baba in karachi,lahore,pakistan talaq ka m...
 
655264371-checkpoint-science-past-papers-april-2023.pdf
655264371-checkpoint-science-past-papers-april-2023.pdf655264371-checkpoint-science-past-papers-april-2023.pdf
655264371-checkpoint-science-past-papers-april-2023.pdf
 
Falcon Invoice Discounting: Optimizing Returns with Minimal Risk
Falcon Invoice Discounting: Optimizing Returns with Minimal RiskFalcon Invoice Discounting: Optimizing Returns with Minimal Risk
Falcon Invoice Discounting: Optimizing Returns with Minimal Risk
 

Side 2019 #9

  • 1. Arthur Charpentier, SIDE Summer School, July 2019
    # 9 Updates & Missing Values
    Arthur Charpentier (Université du Québec à Montréal)
    Machine Learning & Econometrics
    SIDE Summer School - July 2019
    @freakonometrics freakonometrics freakonometrics.hypotheses.org
  • 2. Machine Learning, Practical Issues
    Two important practical issues:
    • what if we cannot access the entire dataset?
    • what if there is an update (a new observation or a new variable)?
    Consider the case where datasets are located on various servers and cannot be downloaded (e.g. hospitals), but one can run functions and obtain outputs; see Wolfson et al. (2010, DataSHIELD) or http://www.datashield.ac.uk/
    Consider a regression model $y = X\beta + \varepsilon$.
  • 3. Machine Learning, Practical Issues
    Use the QR decomposition of $X$, $X = QR$, where $Q$ is an orthogonal matrix, $Q^\top Q = I$. Then
    $$\hat\beta = [X^\top X]^{-1} X^\top y = R^{-1} Q^\top y$$
    Consider $m$ blocks - map part
    $$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} \quad\text{and}\quad X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix} = \begin{pmatrix} Q^{(1)}_1 R^{(1)}_1 \\ Q^{(1)}_2 R^{(1)}_2 \\ \vdots \\ Q^{(1)}_m R^{(1)}_m \end{pmatrix}$$
  • 4. Machine Learning, Practical Issues
    Consider the QR decomposition of $R^{(1)}$ - step 1 of reduce part
    $$R^{(1)} = \begin{pmatrix} R^{(1)}_1 \\ R^{(1)}_2 \\ \vdots \\ R^{(1)}_m \end{pmatrix} = Q^{(2)} R^{(2)} \quad\text{where}\quad Q^{(2)} = \begin{pmatrix} Q^{(2)}_1 \\ Q^{(2)}_2 \\ \vdots \\ Q^{(2)}_m \end{pmatrix}$$
    Define - step 2 of reduce part
    $$Q^{(3)}_j = Q^{(1)}_j Q^{(2)}_j \quad\text{and}\quad V_j = {Q^{(3)}_j}^\top y_j$$
    (the product is taken in this order so that $X_j = Q^{(1)}_j R^{(1)}_j = Q^{(1)}_j Q^{(2)}_j R^{(2)}$), and finally set - step 3 of reduce part
    $$\hat\beta = [R^{(2)}]^{-1} \sum_{j=1}^m V_j$$
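The map and reduce steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `mapreduce_ols`, the simulated data and the split into three blocks are all hypothetical, and every block is assumed to have at least as many rows as columns so that each $R^{(1)}_j$ is $p \times p$.

```python
import numpy as np

def mapreduce_ols(X_blocks, y_blocks):
    """OLS from row blocks, using a blockwise QR (hypothetical helper)."""
    p = X_blocks[0].shape[1]
    # Map part: reduced QR of each block, X_j = Q1_j R1_j (R1_j is p x p)
    qr1 = [np.linalg.qr(Xj) for Xj in X_blocks]
    # Reduce step 1: QR of the stacked R1_j factors
    R1 = np.vstack([R1j for _, R1j in qr1])
    Q2, R2 = np.linalg.qr(R1)
    # Reduce step 2: Q3_j = Q1_j Q2_j and V_j = Q3_j' y_j, summed over blocks
    V = np.zeros(p)
    for j, ((Q1j, _), yj) in enumerate(zip(qr1, y_blocks)):
        Q2j = Q2[j * p:(j + 1) * p, :]
        V += (Q1j @ Q2j).T @ yj
    # Reduce step 3: solve R2 beta = sum_j V_j
    return np.linalg.solve(R2, V)

# Simulated data (assumption), split into m = 3 blocks of rows
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=300)
beta_mr = mapreduce_ols(np.array_split(X, 3), np.array_split(y, 3))
beta_ref = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_mr, beta_ref)
```

In a genuinely distributed setting each server would return only $R^{(1)}_j$ first and then $V_j$, so a second round trip would be needed; here everything runs in one process for readability.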
  • 5. Online Learning
    Let $\mathcal{T}_n = \{(y_1, x_1), \cdots, (y_n, x_n)\}$ denote the training dataset, with $y \in \mathcal{Y}$.
    Learning: a learning algorithm is a map $\mathcal{A} : \mathcal{T}_n \to \mathcal{Y}$
    Online Learning: a pure online learning algorithm is a sequence of recursive algorithms
    (i) $m_0$ is the initialization
    (ii) for $k = 1, 2, \cdots$, $m_k = \mathcal{A}(m_{k-1}, (y_k, x_k))$
    Recall that the risk is $\mathcal{R}(m) = \mathbb{E}[\ell(Y, m(X))]$.
    As in gradient boosting, consider some approximation $G$ of the gradient of $\mathcal{R}(m)$,
    $$m_k = m_{k-1} + \gamma_k G(m_{k-1}, (y_k, x_k))$$
  • 6. Update Formulas
    • Update with a new observation, as in Riddell (1975, Recursive Estimation Algorithms for Economic Research)
    Let $X_{1:n}$ denote the matrix of covariates, with $n$ observations (rows), and $x_{n+1}$ denote a new one. Recall that
    $$\hat\beta_n = [X_{1:n}^\top X_{1:n}]^{-1} X_{1:n}^\top y_{1:n} = C_n^{-1} X_{1:n}^\top y_{1:n}$$
    Since $C_{n+1} = X_{1:n+1}^\top X_{1:n+1} = C_n + x_{n+1} x_{n+1}^\top$, then
    $$\hat\beta_{n+1} = \hat\beta_n + C_{n+1}^{-1} x_{n+1} [y_{n+1} - x_{n+1}^\top \hat\beta_n]$$
    This updating formula is also called a differential correction, since it is proportional to the prediction error.
    Note that the residual sum of squares can also be updated, with
    $$S_{n+1} = S_n + \frac{1}{d} [y_{n+1} - x_{n+1}^\top \hat\beta_n]^2 \quad\text{where}\quad d = 1 + x_{n+1}^\top C_n^{-1} x_{n+1}$$
  • 7. Online Learning
    Online Learning for OLS
    $$\hat\beta_{n+1} = \hat\beta_n + C_{n+1}^{-1} x_{n+1} [y_{n+1} - x_{n+1}^\top \hat\beta_n]$$
    is a recursive formula, but as written it requires storing all the data (and inverting a matrix at each step).
    Good news,
    $$[A + BCD]^{-1} = A^{-1} - A^{-1} B [D A^{-1} B + C^{-1}]^{-1} D A^{-1}$$
    so
    $$C_{n+1}^{-1} = C_n^{-1} - \frac{C_n^{-1} x_{n+1} x_{n+1}^\top C_n^{-1}}{1 + x_{n+1}^\top C_n^{-1} x_{n+1}}$$
    We have an algorithm of the form: for $k = 1, 2, \cdots$, $m_k = \mathcal{A}(m_{k-1}, (y_k, C_k, x_k))$ for some matrix $C_k$.
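As a minimal sketch of this recursion, the function below updates $\hat\beta$ and $C^{-1}$ from one new observation, without storing past data and without any matrix inversion. The name `rls_update`, the simulated data and the batch initialisation on the first ten rows are assumptions made for the illustration.

```python
import numpy as np

def rls_update(beta, C_inv, x_new, y_new):
    """One recursive least squares step (hypothetical helper)."""
    Cx = C_inv @ x_new                      # C_n^{-1} x_{n+1}
    # Rank-one update of the inverse; C_inv is symmetric, so x' C_inv = Cx'
    C_inv_new = C_inv - np.outer(Cx, Cx) / (1.0 + x_new @ Cx)
    # Differential correction, proportional to the prediction error
    beta_new = beta + C_inv_new @ x_new * (y_new - x_new @ beta)
    return beta_new, C_inv_new

# Simulated data (assumption)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Initialise with a small batch, then process observations one at a time
n0 = 10
C_inv = np.linalg.inv(X[:n0].T @ X[:n0])
beta = C_inv @ X[:n0].T @ y[:n0]
for i in range(n0, 200):
    beta, C_inv = rls_update(beta, C_inv, X[i], y[i])

beta_batch = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta, beta_batch)
```

Only the current $\hat\beta$ ($p$ numbers) and $C^{-1}$ ($p \times p$) are kept between steps, which is what makes the algorithm online.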
  • 8. Online Learning
    Online Learning for OLS
    $$\hat\beta_{n+1} = \hat\beta_n + C_{n+1}^{-1} x_{n+1} [y_{n+1} - x_{n+1}^\top \hat\beta_n]$$
    is also a gradient-type algorithm, since
    $$\nabla_\beta \, (y_{n+1} - x_{n+1}^\top \beta)^2 = -2 x_{n+1} [y_{n+1} - x_{n+1}^\top \beta]$$
    so the correction moves against the gradient of the squared prediction error.
    One might consider using $\gamma_{n+1} \in \mathbb{R}$ instead of $C_{n+1}^{-1}$ ($p \times p$ matrix).
    Polyak-Ruppert Averaging suggests using $\gamma_n = n^{-\alpha}$ where $\alpha \in (1/2, 1)$ to ensure convergence.
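A scalar step size can be sketched as follows, assuming simulated data and a hypothetical scaling constant `gamma0` in front of $n^{-\alpha}$ (the slide only gives the rate). The function returns both the last iterate and the running average of the iterates, the latter being the Polyak-Ruppert estimate.

```python
import numpy as np

def averaged_sgd(X, y, alpha=0.75, gamma0=0.2):
    """Stochastic gradient for least squares with step gamma0 * n^(-alpha)."""
    n, p = X.shape
    beta = np.zeros(p)
    beta_bar = np.zeros(p)
    for t in range(n):
        gamma = gamma0 * (t + 1) ** (-alpha)
        # Scalar step replaces the p x p matrix C^{-1}
        beta = beta + gamma * X[t] * (y[t] - X[t] @ beta)
        # Running mean of the iterates (Polyak-Ruppert average)
        beta_bar += (beta - beta_bar) / (t + 1)
    return beta, beta_bar

# Simulated data (assumption)
rng = np.random.default_rng(2)
beta_true = np.array([1.0, -2.0])
X = rng.normal(size=(20000, 2))
y = X @ beta_true + 0.1 * rng.normal(size=20000)

beta_last, beta_bar = averaged_sgd(X, y)
print(beta_last, beta_bar)
```

Each step costs $O(p)$ instead of the $O(p^2)$ of the matrix recursion, at the price of a tuning constant.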
  • 9. Update Formulas
    • Update with a new variable
    Let $X_{1:k}$ denote the matrix of covariates, with $k$ explanatory variables (columns), and $x_{k+1}$ denote a new one. Recall that
    $$\hat\beta_k = [X_{1:k}^\top X_{1:k}]^{-1} X_{1:k}^\top y$$
    Then $\hat\beta_{k+1} = (\tilde\beta_k^\top, \tilde\beta_{k+1})^\top$ where
    $$\tilde\beta_k = \hat\beta_k - [X_{1:k}^\top X_{1:k}]^{-1} X_{1:k}^\top x_{k+1} \, \frac{x_{k+1}^\top P_k^\perp y}{x_{k+1}^\top P_k^\perp x_{k+1}}$$
    with $P_k^\perp = I - X_{1:k} (X_{1:k}^\top X_{1:k})^{-1} X_{1:k}^\top$, while
    $$\tilde\beta_{k+1} = \frac{x_{k+1}^\top P_k^\perp y}{x_{k+1}^\top P_k^\perp x_{k+1}}$$
    If $x_{k+1}$ is orthogonal to the previous variables, $X_{1:k}^\top x_{k+1} = 0$, then $\tilde\beta_k = \hat\beta_k$.
    Observe that $P_k^\perp y = \hat\varepsilon_k$.
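The update with a new variable can be checked numerically. The sketch below assumes simulated data and a hypothetical helper `add_variable`; it never forms $P_k^\perp$ explicitly, and instead uses the residuals of $y$ and of $x_{k+1}$ on $X_{1:k}$.

```python
import numpy as np

def add_variable(X, y, beta_k, x_new):
    """Coefficients after adding column x_new, from the k-variable fit."""
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)      # (X'X)^{-1} X'
    resid_y = y - X @ beta_k                        # P_perp y, the residuals
    resid_x = x_new - X @ (XtX_inv_Xt @ x_new)      # P_perp x_new
    b_new = (x_new @ resid_y) / (x_new @ resid_x)   # coefficient of x_new
    b_old = beta_k - (XtX_inv_Xt @ x_new) * b_new   # corrected old coefficients
    return np.append(b_old, b_new)

# Simulated data (assumption); x_new is correlated with the first column
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
x_new = 0.5 * X[:, 0] + rng.normal(size=100)
y = X @ np.array([1.0, -1.0]) + 2.0 * x_new + 0.1 * rng.normal(size=100)

beta_k = np.linalg.lstsq(X, y, rcond=None)[0]
beta_updated = add_variable(X, y, beta_k, x_new)
beta_full = np.linalg.lstsq(np.column_stack([X, x_new]), y, rcond=None)[0]
print(beta_updated, beta_full)
```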
  • 10. Missing Values
    “There are two kinds of model in the world : those who can extrapolate from incomplete data...”
    From Tropical Atmosphere Ocean (TAO) dataset, see VIM::tao
  • 11. Missing Values
    With the lm function, rows with missing values (in y or x) are deleted.
    To deal with them, one should understand the mechanism leading to missing values.
    Expectation - Maximization, see Dempster et al. (1977, Maximum Likelihood from Incomplete Data via the EM Algorithm)
    Consider a mixture model $dF(y) = p_1 \, dF_{\theta_1}(y) + p_2 \, dF_{\theta_2}(y)$, i.e. there is $\Theta \in \{1, 2\}$ (with $p_j = \mathbb{P}[\Theta = j]$) such that
    $$y_i = \begin{cases} y_{1,i} \text{ with } Y_1 \sim F_{\theta_1}, & \text{if } \Theta = 1 \\ y_{2,i} \text{ with } Y_2 \sim F_{\theta_2}, & \text{if } \Theta = 2 \end{cases}$$
    see mixtools::normalmixEM for Gaussian mixtures
  • 12. Observable and Non-Observable Heterogeneity
    Mixture distribution (with two classes):
    • if $\theta = A$, $Y \sim \mathcal{N}(\mu_A, \sigma_A^2)$
    • if $\theta = B$, $Y \sim \mathcal{N}(\mu_B, \sigma_B^2)$
    $$f(y) = p_A f_A(y) + p_B f_B(y)$$
    5 parameters to estimate, no interpretation of the mixture parameter $\theta$
    [Figure: density of height (in cm), from 150 to 200]
  • 13. Observable and Non-Observable Heterogeneity
    One categorical variable (e.g. gender):
    • if gender = M, $Y \sim \mathcal{N}(\mu_M, \sigma_M^2)$
    • if gender = F, $Y \sim \mathcal{N}(\mu_F, \sigma_F^2)$
    $$f(y) = p_M f_M(y) + p_F f_F(y)$$
    4 parameters to estimate ($p_M$ and $p_F$ are known), clear interpretation of the mixture parameter
    [Figure: density of height (in cm), from 150 to 200]
  • 14. Expectation - Maximization
    EM for Mixtures
    (i) start with initial values $\hat\theta_{1,0}$ and $\hat\theta_{2,0}$, $\hat p_{j,0}$
    (ii) for $k = 1, 2, \cdots$
    E step:
    $$\gamma_{k,j,i} = \frac{\hat p_{j,k-1} \, dF_{\hat\theta_{j,k-1}}(y_i)}{\hat p_{1,k-1} \, dF_{\hat\theta_{1,k-1}}(y_i) + \hat p_{2,k-1} \, dF_{\hat\theta_{2,k-1}}(y_i)}$$
    M step: use ML techniques with weights $\gamma_{k,j,i}$
    M step with a Gaussian mixture,
    $$\hat\mu_{j,k} = \frac{\sum_i \gamma_{k,j,i} \, y_i}{\sum_i \gamma_{k,j,i}} \quad\text{and}\quad \hat\sigma^2_{j,k} = \frac{\sum_i \gamma_{k,j,i} [y_i - \hat\mu_{j,k}]^2}{\sum_i \gamma_{k,j,i}}$$
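The E and M steps for a two-component Gaussian mixture can be sketched as below. The data (two simulated height groups), the starting values and the function names are assumptions for the illustration; the update of the mixing weights $p_j$ as the mean responsibility is the usual ML step, which the slide leaves implicit.

```python
import numpy as np

def normal_pdf(y, mu, sigma2):
    return np.exp(-0.5 * (y - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def em_two_gaussians(y, mu, sigma2, p, n_iter=200):
    """EM for a mixture of two univariate Gaussians (hypothetical helper)."""
    mu, sigma2, p = (np.array(v, dtype=float) for v in (mu, sigma2, p))
    for _ in range(n_iter):
        # E step: responsibilities gamma[j, i] of component j for y_i
        dens = p[:, None] * normal_pdf(y[None, :], mu[:, None], sigma2[:, None])
        gamma = dens / dens.sum(axis=0)
        # M step: weighted means, weighted variances, mixing weights
        w = gamma.sum(axis=1)
        mu = (gamma @ y) / w
        sigma2 = (gamma * (y[None, :] - mu[:, None]) ** 2).sum(axis=1) / w
        p = w / y.size
    return mu, sigma2, p

# Simulated heights in cm (assumption): two groups of 500
rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(160, 5, 500), rng.normal(180, 5, 500)])

mu_hat, sigma2_hat, p_hat = em_two_gaussians(
    y, mu=[150.0, 190.0], sigma2=[100.0, 100.0], p=[0.5, 0.5])
print(mu_hat, sigma2_hat, p_hat)
```

In R, `mixtools::normalmixEM`, mentioned earlier in the deck, fits the same kind of model.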
  • 15. Expectation - Maximization
    Expectation - Maximization
    E step (expectation): compute $Q(\theta, \theta^k) = \mathbb{E}[\log f(Y | \theta) \,|\, y_{obs}, \theta^k]$
    M step (maximization): $\theta^{k+1} = \underset{\theta}{\text{argmax}} \; Q(\theta, \theta^k)$
    Stochastic EM (for Mixtures)
    (i) start with initial values $\hat\theta_{1,0}$ and $\hat\theta_{2,0}$, $\hat p_{j,0}$
    (ii) for $k = 1, 2, \cdots$
    E step:
    $$\gamma_{k,j,i} = \frac{\hat p_{j,k-1} \, dF_{\hat\theta_{j,k-1}}(y_i)}{\hat p_{1,k-1} \, dF_{\hat\theta_{1,k-1}}(y_i) + \hat p_{2,k-1} \, dF_{\hat\theta_{2,k-1}}(y_i)}$$
    S step: generate $\xi_{k,i}$ in $\{1, 2\}$ with probabilities $\gamma_{k,1,i}$ and $\gamma_{k,2,i}$
    M step: compute the ML estimate $\hat\theta_{k,j}$ on the sample $\{y_i : \xi_{k,i} = j\}$
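The stochastic variant replaces the weighted M step by a random hard assignment followed by plain ML estimates on each subsample. A sketch under the same assumptions as before (simulated heights, hypothetical names; components are indexed 0 and 1 rather than 1 and 2):

```python
import numpy as np

def normal_pdf(y, mu, sigma2):
    return np.exp(-0.5 * (y - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def stochastic_em(y, mu, sigma2, p, n_iter=100, seed=0):
    """Stochastic EM for a mixture of two univariate Gaussians."""
    rng = np.random.default_rng(seed)
    mu, sigma2, p = (np.array(v, dtype=float) for v in (mu, sigma2, p))
    for _ in range(n_iter):
        # E step: responsibilities, as in standard EM
        dens = p[:, None] * normal_pdf(y[None, :], mu[:, None], sigma2[:, None])
        gamma = dens / dens.sum(axis=0)
        # S step: draw a label for each observation (1 with probability gamma[1])
        xi = (rng.random(y.size) < gamma[1]).astype(int)
        # M step: unweighted ML estimates within each drawn class
        for j in range(2):
            yj = y[xi == j]
            if yj.size < 2:
                continue  # keep previous values if a class is (almost) empty
            mu[j], sigma2[j], p[j] = yj.mean(), yj.var(), yj.size / y.size
    return mu, sigma2, p

# Simulated heights in cm (assumption)
rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(160, 5, 500), rng.normal(180, 5, 500)])

mu_sem, sigma2_sem, p_sem = stochastic_em(
    y, mu=[150.0, 190.0], sigma2=[100.0, 100.0], p=[0.5, 0.5])
print(mu_sem, sigma2_sem, p_sem)
```

Unlike EM, the sequence of estimates does not converge to a point but keeps fluctuating around it; averaging the last iterates is a common way to get a single estimate.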
  • 16. Missing Values : Single Imputation
    Classical idea: Principal Component Analysis (PCA)
    Approximate the $n \times p$ matrix $X$ with a lower rank matrix,
    $$\hat X_s = \underset{Y, \, \text{rank}(Y) \le s}{\text{argmin}} \; \|X - Y\|_2^2 = U_s \Lambda_s^{1/2} V_s^\top$$
    (using Singular Value Decomposition)
    One can consider PCA with missing values, based on weighted least squares
    $$\hat X_s = \underset{Y, \, \text{rank}(Y) \le s}{\text{argmin}} \; \|W \odot (X - Y)\|_2^2$$
    where $W$ is the $n \times p$ matrix with 1's, and $W_{i,j} = 0$ if $x_{i,j}$ is missing, see Gabriel & Zamir (1979, Lower rank approximation of matrices by least squares with any choice of weights) or Kiers (1997, Weighted least squares fitting using ordinary least squares algorithms)
  • 17. Missing Values : Single Imputation
    Iterative PCA
    (i) if $x_{i,j}$ is missing, $W_{i,j} = 0$, $x^1_{i,j} = W_{i,j} \cdot x^0_{i,j} + (1 - W_{i,j}) \cdot 0$
    (ii) for $k = 1, 2, \cdots$
    • $\hat X_s = \underset{Y, \, \text{rank}(Y) \le s}{\text{argmin}} \; \|W \odot (X - Y)\|_2^2$
    • $x^{k+1}_{i,j} = W_{i,j} \cdot x^k_{i,j} + (1 - W_{i,j}) \cdot \hat x_{i,j}$
    [Figure: scatter plot, both axes from −0.5 to 1.5]
    Connections with the fixed effects model, $x_{i,j} = \sum_{k=1}^s f_{i,k} u_{j,k} + \varepsilon_{i,j}$ with $\varepsilon_{i,j} \sim \mathcal{N}(0, \sigma^2)$, and the random effects model, $x_i = \Gamma z_i + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, \sigma^2 I)$ and $z_i \sim \mathcal{N}(0, I)$
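The iteration can be sketched with a truncated SVD of the completed matrix at each step. This is a bare-bones illustration, assuming missing entries are coded as NaN, initialised at 0 as in step (i), and with no centering or scaling (which a package such as missMDA would handle); the function name and the small rank-one example are hypothetical.

```python
import numpy as np

def iterative_pca_impute(X, rank, n_iter=500, tol=1e-12):
    """Fill NaN entries of X by alternating rank-s SVD and imputation."""
    W = ~np.isnan(X)                      # True where x_ij is observed
    Xk = np.where(W, X, 0.0)              # step (i): missing entries set to 0
    for _ in range(n_iter):
        # Rank-s approximation of the current completed matrix
        U, d, Vt = np.linalg.svd(Xk, full_matrices=False)
        X_hat = (U[:, :rank] * d[:rank]) @ Vt[:rank]
        # Keep observed entries, replace missing ones by the fitted values
        X_next = np.where(W, X, X_hat)
        converged = np.max(np.abs(X_next - Xk)) < tol
        Xk = X_next
        if converged:
            break
    return Xk

# Exactly rank-one matrix with two entries removed (assumption)
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X_obs = truth.copy()
X_obs[0, 0] = np.nan
X_obs[1, 1] = np.nan

X_imp = iterative_pca_impute(X_obs, rank=1)
print(X_imp)
```

On noisy data the rank $s$ has to be chosen, for instance by cross-validation, and the fitted values tend to overfit the observed cells, which motivates the regularized versions.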
  • 22. Missing Values : Single Imputation
    The iterative PCA is simply using EM on the fixed effects model,
    $$x_{i,j} = \sum_{k=1}^s f_{i,k} u_{j,k} + \varepsilon_{i,j} \quad\text{with } \varepsilon_{i,j} \sim \mathcal{N}(0, \sigma^2)$$
    i.e. $X = F U^\top + \varepsilon$, with $X$ of size $n \times p$, $F$ of size $n \times s$ and $U$ of size $p \times s$.
    The log-likelihood is here
    $$\log \mathcal{L}(F, U, \sigma^2) = -\frac{np}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \|X - F U^\top\|^2$$
    E step: compute $\mathbb{E}[X_{i,j} \,|\, X, F_k, U_k, \sigma_k^2]$ (imputation)
    M step: maximize the log-likelihood
    $$U_{k+1} = X_k^\top F_k (F_k^\top F_k)^{-1} \quad\text{and}\quad F_{k+1} = X_k U_k (U_k^\top U_k)^{-1}$$
  • 23. Missing Values : Single Imputation
    One can use regularized iterative PCA. So far, we used (SVD) $\hat X_s = U_s \Lambda_s^{1/2} V_s^\top$,
    $$\hat X_{i,j} = \sum_{k=1}^s \sqrt{\lambda_k} \, U_{i,k} V_{j,k}$$
    Following Efron & Morris (1972, Limiting the Risk of Bayes and Empirical Bayes Estimators) consider a shrinkage version
    $$\hat X_{i,j} = \sum_{k=1}^s \left(\frac{\lambda_k - \hat\sigma^2}{\lambda_k}\right) \sqrt{\lambda_k} \, U_{i,k} V_{j,k} = \sum_{k=1}^s \left(\sqrt{\lambda_k} - \frac{\hat\sigma^2}{\sqrt{\lambda_k}}\right) U_{i,k} V_{j,k}$$
    where
    $$\hat\sigma^2 = \frac{n [\lambda_{s+1} + \cdots + \lambda_p]}{np - p - ns - ps + s^2 + s}$$
    See package missMDA
  • 24. Missing Values : Single Imputation
    One can use soft-thresholding PCA. Following Hastie & Mazumder (2015, Matrix Completion and Low-Rank SVD)
    $$\hat X_{i,j} = \sum_{k=1}^s \left(\sqrt{\lambda_k} - \lambda\right)_+ U_{i,k} V_{j,k}$$
    solution of
    $$\hat X_s = \underset{Y, \, \text{rank}(Y) \le s}{\text{argmin}} \; \|W \odot (X - Y)\|_2^2 + \lambda \|Y\|_\star$$
    where the penalty is based on the nuclear norm (sum of the singular values).
    Complicated to select $\lambda$...
    See package softImpute
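The soft-thresholding step, and a simple iteration built on it, can be sketched as follows. This is not the softImpute package's implementation: the function names are hypothetical, missing entries are assumed to be coded as NaN, no rank constraint is imposed, and the threshold is applied directly to the singular values (the exact link between the threshold and $\lambda$ depends on how the objective is scaled).

```python
import numpy as np

def soft_threshold_svd(X, lam):
    """Shrink every singular value of X by lam, truncating at zero."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(d - lam, 0.0)) @ Vt

def soft_impute(X, lam, n_iter=100):
    """Alternate between filling missing cells and soft-thresholding."""
    W = ~np.isnan(X)                       # True where x_ij is observed
    Z = np.zeros_like(X)                   # current low-rank estimate
    for _ in range(n_iter):
        filled = np.where(W, X, Z)         # observed values, else current fit
        Z = soft_threshold_svd(filled, lam)
    return np.where(W, X, Z)               # completed matrix

# Small check of the shrinkage operator on a diagonal matrix
shrunk = soft_threshold_svd(np.diag([3.0, 1.0]), 1.5)
print(shrunk)

# Rank-one example with one missing entry (assumption)
X_obs = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X_obs[0, 0] = np.nan
X_soft = soft_impute(X_obs, lam=0.5)
print(X_soft)
```

Larger values of the threshold pull the imputed cells towards zero, which is one way to see why selecting $\lambda$ matters.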
  • 25. Missing Values : Single Imputation
    One can also use k-nearest neighbors, with missMDA::imputePCA(y,ncp=1) and VIM::kNN(y,k=5)
  • 26. Missing Values : Multiple Imputation
    It aims to allow for the uncertainty about the missing data by creating several different plausible imputed data sets (via Sterne et al. (2009, Multiple imputation for missing data)).
    Reference: Rubin (2007, Multiple imputation for nonresponse in surveys)
    The idea is to generate N possible values for each missing value, see Honaker, King & Blackwell (2010, Amelia) and library Amelia using bootstrap samples, or van Buuren (2018, Multivariate Imputation by Chained Equations) with mice using bootstrap and regression.
    “The idea of imputation is both seductive and dangerous. It is seductive because it can lull the user into the pleasurable state of believing that the data are complete after all, and it is dangerous because it lumps together situations where the problem is sufficiently minor that it can be legitimately handled in this way and situations where standard estimators applied to the real and imputed data have substantial biases” Dempster & Rubin (1983, Incomplete Data in Sample Surveys)
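  • 27. Missing Values : Gaussian process regression (and kriging)
    Extrapolation or interpolation ?

    Extrapolation        Interpolation
    x | y                x | y
    1 | y1               1 | y1
    2 | y2               2 | ?
    3 | ?                3 | y3

    $$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} \sigma_{1,1} & \sigma_{1,2} & \sigma_{1,3} \\ \sigma_{2,1} & \sigma_{2,2} & \sigma_{2,3} \\ \sigma_{3,1} & \sigma_{3,2} & \sigma_{3,3} \end{pmatrix}\right)$$
    With $y$ the observed values and $y_\star$ the missing ones,
    $$\begin{pmatrix} y \\ y_\star \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} \Sigma & \Sigma_\star^\top \\ \Sigma_\star & \Sigma_{\star\star} \end{pmatrix}\right)$$
    $$(y_\star | y) \sim \mathcal{N}(\mu_y, \Sigma_y) \quad\text{where}\quad \mu_y = \Sigma_\star \Sigma^{-1} y, \quad \Sigma_y = \Sigma_{\star\star} - \Sigma_\star \Sigma^{-1} \Sigma_\star^\top$$
    see Roberts et al. (2012, Gaussian Processes for Time Series) or Rasmussen & Williams (2006, Gaussian Processes for Machine Learning)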
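The conditional distribution above translates directly into code. The sketch below assumes a zero-mean Gaussian vector; the squared-exponential covariance on $x = 1, 2, 3$ and the observed values are illustrative assumptions, used to compare the interpolation and extrapolation cases of the slide.

```python
import numpy as np

def gaussian_conditional(Sigma, obs_idx, mis_idx, y_obs):
    """Mean and covariance of the missing block given the observed one."""
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]    # Sigma      (observed block)
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]    # Sigma_star (cross block)
    S_mm = Sigma[np.ix_(mis_idx, mis_idx)]    # Sigma_starstar
    A = np.linalg.solve(S_oo, S_mo.T).T       # Sigma_star Sigma^{-1}
    return A @ y_obs, S_mm - A @ S_mo.T

# Squared-exponential covariance on x = 1, 2, 3 (assumption)
x = np.array([1.0, 2.0, 3.0])
Sigma = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# Extrapolation: y1, y2 observed, y3 missing
mu_ext, var_ext = gaussian_conditional(Sigma, [0, 1], [2], np.array([0.3, 0.8]))
# Interpolation: y1, y3 observed, y2 missing
mu_int, var_int = gaussian_conditional(Sigma, [0, 2], [1], np.array([0.3, 0.8]))
print(mu_ext, var_ext, mu_int, var_int)

# Two-dimensional sanity check with correlation 0.5
mu2, var2 = gaussian_conditional(
    np.array([[1.0, 0.5], [0.5, 1.0]]), [0], [1], np.array([2.0]))
print(mu2, var2)
```

With this covariance the conditional variance is smaller when interpolating than when extrapolating, since the missing point then has an observed neighbour on each side.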