Arthur Charpentier, SIDE Summer School, July 2019
# 5 Classification & Boosting
Arthur Charpentier (Université du Québec à Montréal)
Machine Learning & Econometrics
SIDE Summer School - July 2019
@freakonometrics freakonometrics freakonometrics.hypotheses.org 1
Arthur Charpentier, SIDE Summer School, July 2019
Starting Point: Classification Tree
library(rpart)
cart = rpart(PRONO~., data=myocarde)
library(rpart.plot)
prp(cart, type=2, extra=1)
A (binary) split is based on one specific variable, say x_j, and a cutoff, say s. Then, there are two options:
• either x_{i,j} ≤ s, then observation i goes to the left, in I_L
• or x_{i,j} > s, then observation i goes to the right, in I_R
Thus, I = I_L ∪ I_R.
@freakonometrics freakonometrics freakonometrics.hypotheses.org 2
Arthur Charpentier, SIDE Summer School, July 2019
Classification : Classification Trees
Gini for node I is defined as
G(I) = −Σ_{y∈{0,1}} p_y (1 − p_y)
where p_y is the proportion of individuals in the leaf of type y, i.e.
G(I) = −Σ_{y∈{0,1}} (n_{y,I}/n_I) (1 − n_{y,I}/n_I)
gini = function(y, classe){
  T = table(y, classe)                           # 2 x K contingency table
  nx = apply(T, 2, sum)                          # size of each node
  n = sum(T)
  pxy = T / matrix(rep(nx, each=2), nrow=2)      # within-node proportions
  omega = matrix(rep(nx, each=2), nrow=2) / n    # node weights n_x / n
  g = -sum(omega * pxy * (1 - pxy))
  return(g)}
@freakonometrics freakonometrics freakonometrics.hypotheses.org 3
Arthur Charpentier, SIDE Summer School, July 2019
Classification : Classification Trees
-2*mean(myocarde$PRONO)*(1-mean(myocarde$PRONO))
[1] -0.4832375
gini(y=myocarde$PRONO, classe=myocarde$PRONO<Inf)
[1] -0.4832375
gini(y=myocarde$PRONO, classe=myocarde[,1]<=100)
[1] -0.4640415
@freakonometrics freakonometrics freakonometrics.hypotheses.org 4
Arthur Charpentier, SIDE Summer School, July 2019
Classification : Classification Trees
If we split, the index becomes
G(I_L, I_R) = −Σ_{x∈{L,R}} (n_{I_x}/n_I) Σ_{y∈{0,1}} (n_{y,I_x}/n_{I_x}) (1 − n_{y,I_x}/n_{I_x})
The entropic measure is
E(I) = −Σ_{y∈{0,1}} (n_{y,I}/n_I) log(n_{y,I}/n_I)
entropy = function(y, classe){
  T = table(y, classe)
  nx = apply(T, 2, sum)
  pxy = T / matrix(rep(nx, each=2), nrow=2)
  omega = matrix(rep(nx, each=2), nrow=2) / sum(T)
  g = sum(omega * pxy * log(pxy))
  return(g)}
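As with the Gini index above, this function can be checked on the root node and on a candidate split (hypothetical calls; note that a pure leaf would require the usual 0 · log 0 = 0 convention, which this sketch does not handle):

entropy(y=myocarde$PRONO, classe=myocarde$PRONO<Inf)   # root node
entropy(y=myocarde$PRONO, classe=myocarde[,1]<=100)    # candidate split on the first variable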
@freakonometrics freakonometrics freakonometrics.hypotheses.org 5
Arthur Charpentier, SIDE Summer School, July 2019
mat_gini = mat_v = matrix(NA, 7, 101)
for(v in 1:7){
  variable = myocarde[,v]
  v_seuil = seq(quantile(myocarde[,v], 6/length(myocarde[,v])),
                quantile(myocarde[,v], 1-6/length(myocarde[,v])), length=101)
  mat_v[v,] = v_seuil
  for(i in 1:101){
    CLASSE = variable <= v_seuil[i]
    mat_gini[v,i] = gini(y=myocarde$PRONO, classe=CLASSE)}}
-(gini(y=myocarde$PRONO, classe=(myocarde[,3]<19)) -
  gini(y=myocarde$PRONO, classe=(myocarde[,3]<Inf))) /
  gini(y=myocarde$PRONO, classe=(myocarde[,3]<Inf))
[1] 0.5862131
@freakonometrics freakonometrics freakonometrics.hypotheses.org 6
Arthur Charpentier, SIDE Summer School, July 2019
idx = which(myocarde$INSYS < 19)
mat_gini = mat_v = matrix(NA, 7, 101)
for(v in 1:7){
  variable = myocarde[idx,v]
  v_seuil = seq(quantile(myocarde[idx,v], 7/length(myocarde[idx,v])),
                quantile(myocarde[idx,v], 1-7/length(myocarde[idx,v])), length=101)
  mat_v[v,] = v_seuil
  for(i in 1:101){
    CLASSE = variable <= v_seuil[i]
    mat_gini[v,i] = gini(y=myocarde$PRONO[idx], classe=CLASSE)}}
par(mfrow=c(3,2))
for(v in 2:7){
  plot(mat_v[v,], mat_gini[v,])
}
@freakonometrics freakonometrics freakonometrics.hypotheses.org 7
Arthur Charpentier, SIDE Summer School, July 2019
idx = which(myocarde$INSYS >= 19)
mat_gini = mat_v = matrix(NA, 7, 101)
for(v in 1:7){
  variable = myocarde[idx,v]
  v_seuil = seq(quantile(myocarde[idx,v], 6/length(myocarde[idx,v])),
                quantile(myocarde[idx,v], 1-6/length(myocarde[idx,v])), length=101)
  mat_v[v,] = v_seuil
  for(i in 1:101){
    CLASSE = variable <= v_seuil[i]
    mat_gini[v,i] = gini(y=myocarde$PRONO[idx], classe=CLASSE)}}
par(mfrow=c(3,2))
for(v in 2:7){
  plot(mat_v[v,], mat_gini[v,])
}
@freakonometrics freakonometrics freakonometrics.hypotheses.org 8
Arthur Charpentier, SIDE Summer School, July 2019
Boosting & Adaboost
Classification problem, y_i ∈ {•, •}; consider a model at stage k − 1:
if m_{k−1}(x_i) ≠ y_i, increase the weight given to observation i.
Boosting: weak learner
A weak model is a model only slightly better than a pure random one (heads/tails).
[Figure: two scatter plots of a two-class sample on [0,1] × [0,1], with a weak classifier cutting at 0.5]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 9
Arthur Charpentier, SIDE Summer School, July 2019
Boosting & Adaboost
Adaboost Algorithm
1. Set weights ω_i = 1/n, i = 1, · · · , n
2. For k = 1, · · ·
(i) fit a model on (y_i, x_i) with weights ω_i, get h_k(x)
(ii) compute the error rate ε_k = Σ_{i=1}^n ω̃_i 1_{y_i ≠ h_k(x_i)}, with ω̃_i the normalized weights
(iii) compute α_k = log[(1 − ε_k)/ε_k]
(iv) reevaluate the weights ω_i = ω_i · exp[α_k 1_{y_i ≠ h_k(x_i)}]
3. The final model is h_κ(x) = Σ_{k=1}^{κ} α_k h_k(x)
The error rate should not be too large (ε_k ≤ 50%) to ensure that α_k > 0
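A minimal R sketch of this algorithm, using depth-one rpart trees (stumps) as weak learners on the myocarde data; the ±1 recoding of PRONO, the number of iterations and the clipping of ε_k are assumptions made for illustration, not the author's implementation:

library(rpart)
y = 2*(myocarde$PRONO == "SURVIE") - 1   # assumed -1/+1 coding of the response
X = myocarde[, 1:7]
n = nrow(X)
w = rep(1/n, n)                          # 1. initial weights
score = rep(0, n)
for(k in 1:100){
  tk  = rpart(factor(y) ~ ., data = X, weights = w, maxdepth = 1)  # (i) weak learner
  hk  = 2*(predict(tk, newdata = X)[, "1"] > .5) - 1
  eps = sum(w * (hk != y)) / sum(w)                                # (ii) weighted error
  eps = min(max(eps, 1e-10), 1 - 1e-10)                            # avoid log(0)
  ak  = log((1 - eps)/eps)                                         # (iii)
  w   = w * exp(ak * (hk != y)); w = w / sum(w)                    # (iv) reweight
  score = score + ak * hk
}
mean(sign(score) == y)   # in-sample accuracy of the boosted classifier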
@freakonometrics freakonometrics freakonometrics.hypotheses.org 11
Arthur Charpentier, SIDE Summer School, July 2019
Boosting & Adaboost
The general problem in machine learning is to find
m⋆(·) = argmin_{m∈M} E[ℓ(Y, m(X))]
Use the loss ℓ(y, m(x)) = 1_{y ≠ m(x)}.
The empirical version is
m_n(·) = argmin_{m∈M} (1/n) Σ_{i=1}^n ℓ(y_i, m(x_i)) = argmin_{m∈M} (1/n) Σ_{i=1}^n 1_{y_i ≠ m(x_i)}
Complicated problem: use a convex version of the loss function,
ℓ(y, m(x)) = exp[−y · m(x)]
From Hastie et al. (2009), with the adaboost algorithm,
h_κ(·) = h_{κ−1}(·) + α_κ H_κ(·) = h_{κ−1}(·) + 2β⋆ H⋆(·)
where
(β⋆, H⋆(·)) = argmin_{(β,H)∈(R,M)} Σ_{i=1}^n exp[−y_i · (h_{κ−1}(x_i) + βH(x_i))]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 12
Arthur Charpentier, SIDE Summer School, July 2019
Boosting & Adaboost
From Freund & Schapire (1999), the empirical error of h_κ(·) satisfies
(1/n) Σ_{i=1}^n 1_{y_i ≠ h_κ(x_i)} ≤ exp[−2 Σ_{k=1}^{κ} (ε_k − 0.5)²]
(when weak learners are better than random classification, the empirical error tends
to 0 exponentially fast)
[Figures: a simulated two-class sample, and training/validation error of the boosted classifier as a function of the number of iterations (0 to 8000)]
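As a quick numerical illustration of the bound (with a hypothetical constant weak-learner error ε_k = 0.45 at every step):

epsilon = 0.45
kappa = c(10, 100, 1000)
round(exp(-2 * kappa * (epsilon - 0.5)^2), 4)   # upper bound on the empirical error
[1] 0.9512 0.6065 0.0067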
@freakonometrics freakonometrics freakonometrics.hypotheses.org 13
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting
Newton-Raphson to minimize a strictly convex function g : R → R.
At the minimum, g′(x⋆) = 0, so consider the first-order approximation
g′(x + h) ≈ g′(x) + h · g″(x)
Consider the sequence x_k = x_{k−1} − α g′(x_{k−1}), where α = [g″(x_{k−1})]^{−1}.
One can consider a functional version of that technique: ∀i = 1, · · · , n,
g_k(x_i) = g_{k−1}(x_i) − α ∂ℓ(y_i, g(x_i))/∂g(x_i) |_{g(x_i)=g_{k−1}(x_i)}
This provides a sequence of functions g_k at the points x_i.
To get values at any point x, regress the ε_i's on the x_i's, where
ε_i = − ∂ℓ(y_i, g)/∂g |_{g=g_{k−1}(x_i)}
If α = 1 and ℓ(y, g) = exp[−yg], we have (almost) adaboost
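For intuition, a one-dimensional sketch of this iteration on a hypothetical strictly convex function (not from the original slides), here g(x) = exp(x) + x²:

g1 = function(x) exp(x) + 2*x   # g'(x)
g2 = function(x) exp(x) + 2     # g''(x)
x = 1
for(k in 1:10) x = x - g1(x)/g2(x)   # x_k = x_{k-1} - [g''(x_{k-1})]^{-1} g'(x_{k-1})
x   # converges to the minimizer of g, about -0.35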
@freakonometrics freakonometrics freakonometrics.hypotheses.org 21
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting
Gradient Boosting Algorithm
1. Start with a constant model, h_0(x) = argmin_{c∈R} (1/n) Σ_{i=1}^n ℓ(y_i, c), and a regularization parameter α ∈ (0, 1)
2. For k = 1, · · ·
(i) compute the pseudo-residuals ε_i = − ∂ℓ(y_i, g)/∂g |_{g=h_{k−1}(x_i)}
(ii) fit the (weak) model on the sample (ε_i, x_i) and let H_k denote that model
(iii) update the model h_k(·) = h_{k−1}(·) + αH_k(·)
3. The final model is h_κ(x)
The choice of α is (somewhat) of secondary importance: use α ∼ 10%
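A minimal sketch of this algorithm in R, with squared loss (so the pseudo-residuals ε_i are ordinary residuals) and small rpart trees as weak learners; the 0/1 recoding of PRONO, α = 0.1 and 200 iterations are assumptions for illustration, not the author's code:

library(rpart)
y01 = (myocarde$PRONO == "SURVIE") * 1      # assumed 0/1 coding of the response
df  = data.frame(myocarde[, 1:7], eps = y01)
alpha = 0.1                                 # regularization parameter
h = rep(mean(y01), nrow(df))                # 1. constant model
for(k in 1:200){
  df$eps = y01 - h                                  # (i) pseudo-residuals
  Hk     = rpart(eps ~ ., data = df, maxdepth = 2)  # (ii) weak learner H_k
  h      = h + alpha * predict(Hk, newdata = df)    # (iii) update
}
mean((h > .5) == y01)    # in-sample accuracy of the boosted model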
@freakonometrics freakonometrics freakonometrics.hypotheses.org 22
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting
The LogitBoost model is obtained when y ∈ {0, 1} and the loss function is
ℓ(y, m) = log[1 + exp(−2(2y − 1)m)]
Boosting (learning from the mistakes)
Sequential Learning
m_k(·) = m_{k−1}(·) + α · argmin_{h∈H} { Σ_{i=1}^n ℓ( y_i − m_{k−1}(x_i), h(x_i) ) }
where the residual y_i − m_{k−1}(x_i) plays the role of ε_i.
Hence, learning is sequential, as opposed to bagging...
@freakonometrics freakonometrics freakonometrics.hypotheses.org 23
Arthur Charpentier, SIDE Summer School, July 2019
Bagging
Bagging Algorithm
1. For k = 1, · · · , κ
(i) draw a bootstrap sample from the (y_i, x_i)'s
(ii) estimate a model m_k on that sample
2. The final model is m⋆(·) = (1/κ) Σ_{k=1}^{κ} m_k(·)
To illustrate, suppose that m is some parametric model m_θ,
with m_k = m_{θ_k} obtained on some sample S_k = {(y_i, x_i), i ∈ I_k}.
Let σ²(x) = Var[m_θ(x)] and ρ(x) = Corr[m_{θ_1}(x), m_{θ_2}(x)], obtained on two
random bootstrap samples. Then
Var[m⋆(x)] = ρ(x)σ²(x) + [(1 − ρ(x))/κ] σ²(x)
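A minimal bagging sketch in R with classification trees, averaging the predicted probabilities over κ bootstrap samples (the use of rpart, κ = 100 and the "SURVIE" level are assumptions for illustration):

library(rpart)
set.seed(1)
kappa = 100
prob = matrix(NA, nrow(myocarde), kappa)
for(k in 1:kappa){
  Ik = sample(1:nrow(myocarde), replace = TRUE)            # (i) bootstrap sample S_k
  mk = rpart(PRONO ~ ., data = myocarde[Ik, ])             # (ii) model m_k on that sample
  prob[, k] = predict(mk, newdata = myocarde)[, "SURVIE"]
}
p_bag = rowMeans(prob)   # final model: average of the kappa individual models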
@freakonometrics freakonometrics freakonometrics.hypotheses.org 24
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
We have used ℓ(y, m(x)) = exp[−y · m(x)] instead of 1_{y ≠ m(x)}.
The misclassification error is (upper) bounded by the exponential loss:
(1/n) Σ_{i=1}^n 1_{y_i·m(x_i) ≤ 0} ≤ (1/n) Σ_{i=1}^n exp[−y_i · m(x_i)]
Here m(x) is a linear combination of weak classifiers, m(x) = Σ_{j=1}^{κ} α_j h_j(x).
Let M = [M_{i,j}] where M_{i,j} = y_i · h_j(x_i) ∈ {−1, +1}, i.e. M_{i,j} = 1 whenever the
(weak) classifier j correctly classifies individual i. Then
y_i · m(x_i) = Σ_{j=1}^{κ} α_j y_i h_j(x_i) = (Mα)_i
thus, R(α) = (1/n) Σ_{i=1}^n exp[−y_i · m(x_i)] = (1/n) Σ_{i=1}^n exp[−(Mα)_i]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 25
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
One can use coordinate descent, in the direction j⋆ in which the directional derivative
is the steepest,
j⋆ ∈ argmax_j { − ∂R(α + a e_j)/∂a |_{a=0} }
where this derivative can be written
− ∂/∂a [ (1/n) Σ_{i=1}^n exp(−(Mα)_i − a(Me_j)_i) ] |_{a=0} = (1/n) Σ_{i=1}^n M_{ij} exp[−(Mα)_i]
Then
j⋆ ∈ argmax_j (d⊤M)_j, where d_i = exp[−(Mα)_i] / Σ_i exp[−(Mα)_i]
@freakonometrics freakonometrics freakonometrics.hypotheses.org 26
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
Then do a line search to see how far we should go. The derivative is null when
− ∂R(α + a e_{j⋆})/∂a = 0, i.e. a⋆ = (1/2) log[d_+/d_−] = (1/2) log[(1 − d_−)/d_−]
where d_− = Σ_{i: M_{i,j⋆} = −1} d_i and d_+ = Σ_{i: M_{i,j⋆} = +1} d_i.
Coordinate Descent Algorithm
1. d_i = 1/n for i = 1, · · · , n and α = 0
2. For k = 1, · · ·
(i) find the optimal direction j⋆ ∈ argmax_j (d⊤M)_j
(ii) compute d_− = Σ_{i: M_{i,j⋆} = −1} d_i and a_k = (1/2) log[(1 − d_−)/d_−]
(iii) set α = α + a_k e_{j⋆} and d_i = exp[−(Mα)_i] / Σ_i exp[−(Mα)_i]
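A small sketch of this coordinate descent in R, on a hypothetical matrix M (random ±1 entries, not a real set of weak classifiers), just to show the mechanics:

set.seed(1)
M = matrix(sample(c(-1, 1), 200*10, replace = TRUE, prob = c(.4, .6)), 200, 10)
n = nrow(M)
alpha = rep(0, ncol(M))
d = rep(1/n, n)                        # 1. initial weights
for(k in 1:50){
  j  = which.max(t(M) %*% d)           # (i) optimal direction j*
  dm = sum(d[M[, j] == -1])            # (ii) d- and step size a_k
  ak = 0.5 * log((1 - dm)/dm)
  alpha[j] = alpha[j] + ak             # (iii) update alpha, then the weights d
  w = exp(-as.vector(M %*% alpha))
  d = w / sum(w)
}
mean(exp(-as.vector(M %*% alpha)))     # exponential risk R(alpha)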
@freakonometrics freakonometrics freakonometrics.hypotheses.org 27
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
This is very close to Adaboost: α_j is the sum of the a_k's for the steps where direction j was chosen,
α_j = Σ_{k=1}^{κ} a_k 1_{j⋆(k)=j}
Thus
m⋆(x) = Σ_{j=1}^{κ} α_j h_j(x) = Σ_{k=1}^{κ} a_k h_{j⋆(k)}(x)
With Adaboost, we go in the same direction, with the same intensity: Adaboost
is equivalent to minimizing the exponential loss by coordinate descent.
Thus, we seek m⋆(·) = argmin E_{(Y,X)∼F}[ exp(−Y · m(X)) ]
which is minimized at
m⋆(x) = (1/2) log( P[Y = +1|X = x] / P[Y = −1|X = x] )
(very close to the logistic regression)
@freakonometrics freakonometrics freakonometrics.hypotheses.org 28
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
Several packages can be used with R, such as adabag::boosting
library(adabag)
library(caret)
indexes = createDataPartition(myocarde$PRONO, p=.70, list=FALSE)
train = myocarde[indexes, ]
test  = myocarde[-indexes, ]
model = boosting(PRONO~., data=train, boos=TRUE, mfinal=50)
pred  = predict(model, test)
print(pred$confusion)
                Observed Class
Predicted Class  DECES SURVIE
         DECES       5      0
         SURVIE      3     12
or use cross-validation
cvmodel = boosting.cv(PRONO~., data=myocarde, boos=TRUE, mfinal=10, v=5)
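The returned object can then be inspected in the same way: with adabag, cvmodel$confusion and cvmodel$error should contain the cross-validated confusion matrix and misclassification error.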
@freakonometrics freakonometrics freakonometrics.hypotheses.org 29
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
or xgboost::xgboost
library(xgboost)
library(caret)
train_x = data.matrix(train[,-8])
train_y = train[,8]
test_x = data.matrix(test[,-8])
test_y = test[,8]
xgb_train = xgb.DMatrix(data=train_x, label=train_y)
xgb_test  = xgb.DMatrix(data=test_x, label=test_y)
xgbc = xgboost(data=xgb_train, max.depth=3, nrounds=50)
pred = predict(xgbc, xgb_test)
pred_y = as.factor((levels(test_y))[round(pred)])
(cm = caret::confusionMatrix(test_y, pred_y))
          Reference
Prediction DECES SURVIE
    DECES      6      2
    SURVIE     0     12
@freakonometrics freakonometrics freakonometrics.hypotheses.org 30
Arthur Charpentier, SIDE Summer School, July 2019
Gradient Boosting & Computational Issues
or gbm::gbm
library(gbm)
library(caret)
mod_gbm = gbm(PRONO=="SURVIE" ~ .,
              data = train,
              distribution = "bernoulli",
              cv.folds = 7,
              shrinkage = .01,
              n.minobsinnode = 10,
              n.trees = 200)
pred = predict.gbm(object = mod_gbm,
                   newdata = test,
                   n.trees = 200,
                   type = "response")
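Since the model is fitted on PRONO=="SURVIE" with type = "response", pred contains estimated probabilities of SURVIE; a hypothetical last step to recover class labels and a confusion table could be:

pred_class = ifelse(pred > .5, "SURVIE", "DECES")   # 0.5 cutoff, an assumption
table(test$PRONO, pred_class)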
@freakonometrics freakonometrics freakonometrics.hypotheses.org 31
