Arthur Charpentier, SIDE Summer School, July 2019
# 5 Classification & Boosting
Arthur Charpentier (Université du Québec à Montréal)
Machine Learning & Econometrics
SIDE Summer School - July 2019
Starting Point: Classification Tree
library(rpart)
cart = rpart(PRONO ~ ., data = myocarde)
library(rpart.plot)
prp(cart, type = 2, extra = 1)
A (binary) split is based on one specific variable, say x_j, and a cutoff, say s. Then, there are two options:
• either x_{i,j} ≤ s, then observation i goes on the left, in I_L
• or x_{i,j} > s, then observation i goes on the right, in I_R
Thus, I = I_L ∪ I_R.
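As a minimal sketch of what one split does (assuming the myocarde data frame used throughout these slides is loaded; the variable INSYS and the cutoff s = 19 are borrowed from later slides, purely for illustration):

s   <- 19                                     # hypothetical cutoff on INSYS
I_L <- which(myocarde$INSYS <= s)             # observations sent to the left node
I_R <- which(myocarde$INSYS >  s)             # observations sent to the right node
length(I_L) + length(I_R) == nrow(myocarde)   # the two nodes partition I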
Classification : Classification Trees
The Gini index for node I is defined as
$$G(I) = -\sum_{y\in\{0,1\}} p_y(1-p_y)$$
where p_y is the proportion of individuals of type y in the leaf, i.e.
$$G(I) = -\sum_{y\in\{0,1\}} \frac{n_{y,I}}{n_I}\left(1-\frac{n_{y,I}}{n_I}\right)$$
gini = function(y, classe){
  T = table(y, classe)                               # contingency table: class y by split
  nx = apply(T, 2, sum)                              # size of each child node
  n = sum(T)
  pxy = T / matrix(rep(nx, each = 2), nrow = 2)      # within-node class proportions
  omega = matrix(rep(nx, each = 2), nrow = 2) / n    # node weights n_{I_x} / n_I
  g = -sum(omega * pxy * (1 - pxy))
  return(g)}
Classification : Classification Trees
-2 * mean(myocarde$PRONO) * (1 - mean(myocarde$PRONO))
[1] -0.4832375
gini(y = myocarde$PRONO, classe = myocarde$PRONO < Inf)
[1] -0.4832375
gini(y = myocarde$PRONO, classe = myocarde[, 1] <= 100)
[1] -0.4640415
Classification : Classification Trees
If we split, define the index
$$G(I_L,I_R) = -\sum_{x\in\{L,R\}} \frac{n_{I_x}}{n_I}\sum_{y\in\{0,1\}} \frac{n_{y,I_x}}{n_{I_x}}\left(1-\frac{n_{y,I_x}}{n_{I_x}}\right)$$
The entropic measure is
$$E(I) = -\sum_{y\in\{0,1\}} \frac{n_{y,I}}{n_I}\log\left(\frac{n_{y,I}}{n_I}\right)$$
entropy = function(y, classe){
  T = table(y, classe)
  nx = apply(T, 2, sum)
  pxy = T / matrix(rep(nx, each = 2), nrow = 2)
  omega = matrix(rep(nx, each = 2), nrow = 2) / sum(T)
  g = sum(omega * pxy * log(pxy))
  return(g)}
mat_gini = mat_v = matrix(NA, 7, 101)
for(v in 1:7){
  variable = myocarde[, v]
  v_seuil  = seq(quantile(myocarde[, v], 6/length(myocarde[, v])),
                 quantile(myocarde[, v], 1 - 6/length(myocarde[, v])), length = 101)
  mat_v[v, ] = v_seuil
  for(i in 1:101){
    CLASSE = variable <= v_seuil[i]
    mat_gini[v, i] = gini(y = myocarde$PRONO, classe = CLASSE)}}
-(gini(y = myocarde$PRONO, classe = (myocarde[, 3] < 19)) -
  gini(y = myocarde$PRONO, classe = (myocarde[, 3] < Inf))) /
  gini(y = myocarde$PRONO, classe = (myocarde[, 3] < Inf))
[1] 0.5862131
idx = which(myocarde$INSYS < 19)
mat_gini = mat_v = matrix(NA, 7, 101)
for(v in 1:7){
  variable = myocarde[idx, v]
  v_seuil  = seq(quantile(myocarde[idx, v], 7/length(myocarde[idx, v])),
                 quantile(myocarde[idx, v], 1 - 7/length(myocarde[idx, v])), length = 101)
  mat_v[v, ] = v_seuil
  for(i in 1:101){
    CLASSE = variable <= v_seuil[i]
    mat_gini[v, i] = gini(y = myocarde$PRONO[idx], classe = CLASSE)}}
par(mfrow = c(3, 2))
for(v in 2:7){
  plot(mat_v[v, ], mat_gini[v, ])
}
idx = which(myocarde$INSYS >= 19)
mat_gini = mat_v = matrix(NA, 7, 101)
for(v in 1:7){
  variable = myocarde[idx, v]
  v_seuil  = seq(quantile(myocarde[idx, v], 6/length(myocarde[idx, v])),
                 quantile(myocarde[idx, v], 1 - 6/length(myocarde[idx, v])), length = 101)
  mat_v[v, ] = v_seuil
  for(i in 1:101){
    CLASSE = variable <= v_seuil[i]
    mat_gini[v, i] = gini(y = myocarde$PRONO[idx], classe = CLASSE)}}
par(mfrow = c(3, 2))
for(v in 2:7){
  plot(mat_v[v, ], mat_gini[v, ])
}
Boosting & Adaboost
Classification problem, y_i ∈ {•, •} (two classes): consider the model at stage k − 1; if m_{k−1}(x_i) ≠ y_i, increase the weight given to observation i.
Boosting: weak learner
A weak model is a model only slightly better than a pure random one (heads/tails).
[Figure: scatter plots of the two classes of observations on [0, 1] × [0, 1], with the weak classifier's cutoff at 0.5]
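A possible sketch of such a weak learner (an illustrative stump on the myocarde data, not the code behind the figure): fit a depth-one tree; any classifier whose accuracy is above 1/2 can serve as a weak learner.

library(rpart)
stump <- rpart(PRONO ~ ., data = myocarde, method = "class", maxdepth = 1)  # one single split
pred  <- predict(stump, myocarde, type = "class")
mean(pred == myocarde$PRONO)   # anything above 0.5 is enough for boosting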
Boosting & Adaboost
Adaboost Algorithm
1. Set weights ω_i = 1/n, i = 1, ⋯, n
2. For k = 1, ⋯
(i) fit a model on (y_i, x_i) with weights ω_i, get h_k(x)
(ii) compute the error rate $\varepsilon_k = \sum_{i=1}^{n} \tilde{\omega}_i \mathbf{1}_{y_i \neq h_k(x_i)}$
(iii) compute $\alpha_k = \log\left(\dfrac{1-\varepsilon_k}{\varepsilon_k}\right)$
(iv) reevaluate the weights $\omega_i = \omega_i \cdot e^{\alpha_k \mathbf{1}_{y_i \neq h_k(x_i)}}$
3. The final model is $h_\kappa(x) = \sum_{k=1}^{\kappa} \alpha_k h_k(x)$
The error rate should not exceed 50% (ε_k ≤ 50%), to ensure that α_k > 0.
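A minimal R sketch of this algorithm, using depth-one rpart trees (stumps) as weak learners on the myocarde data; the {−1, +1} encoding of PRONO, the number of rounds K and the absence of any safeguard against a zero error rate are illustrative assumptions, not part of the slides.

library(rpart)
y <- ifelse(myocarde$PRONO == "SURVIE", +1, -1)    # assumed {-1,+1} encoding
X <- myocarde[, names(myocarde) != "PRONO"]
n <- length(y)
K <- 100                                           # number of boosting rounds (illustrative)
w <- rep(1/n, n)                                   # 1. uniform weights
alpha    <- numeric(K)
learners <- vector("list", K)
base <- data.frame(y = factor(y), X)
for (k in 1:K) {
  fit <- rpart(y ~ ., data = base, weights = w,
               method = "class", maxdepth = 1)     # (i) weighted weak learner (a stump)
  h   <- ifelse(predict(fit, base, type = "class") == "1", +1, -1)
  eps <- sum(w * (h != y)) / sum(w)                # (ii) weighted error rate
  alpha[k] <- log((1 - eps) / eps)                 # (iii)
  w <- w * exp(alpha[k] * (h != y))                # (iv) upweight the misclassified points
  w <- w / sum(w)
  learners[[k]] <- fit
}
score <- rowSums(sapply(1:K, function(k)
  alpha[k] * ifelse(predict(learners[[k]], base, type = "class") == "1", +1, -1)))
mean(sign(score) == y)                             # in-sample accuracy of the weighted vote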
Boosting & Adaboost
The general problem in machine learning is to find
$$m^\star(\cdot) = \underset{m\in\mathcal{M}}{\text{argmin}}\; \mathbb{E}\big[\ell(Y, m(X))\big]$$
Use the loss $\ell(y, m(x)) = \mathbf{1}_{y \neq m(x)}$.
The empirical version is
$$m_n(\cdot) = \underset{m\in\mathcal{M}}{\text{argmin}}\; \frac{1}{n}\sum_{i=1}^{n} \ell(y_i, m(x_i)) = \underset{m\in\mathcal{M}}{\text{argmin}}\; \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{y_i \neq m(x_i)}$$
Complicated problem: use a convex version of the loss function, ℓ(y, m(x)) = exp[−y · m(x)].
From Hastie et al. (2009), with the adaboost algorithm,
$$h_\kappa(\cdot) = h_{\kappa-1}(\cdot) + \alpha_\kappa h_\kappa(\cdot) = h_{\kappa-1}(\cdot) + 2\beta^\star H^\star(\cdot)$$
where
$$(\beta^\star, H^\star(\cdot)) = \underset{(\beta,H)\in(\mathbb{R},\mathcal{M})}{\text{argmin}}\; \sum_{i=1}^{n} \exp\big[-y_i \cdot (h_{\kappa-1}(x_i) + \beta H(x_i))\big]$$
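As a quick (purely illustrative) numerical check that the convex surrogate dominates the 0-1 loss pointwise, since exp(−y·m) ≥ 1 whenever y·m ≤ 0:

m <- seq(-3, 3, by = .1)            # candidate scores m(x)
y <- +1                             # a positive example
loss01  <- as.numeric(y * m <= 0)   # 0-1 loss
lossexp <- exp(-y * m)              # exponential surrogate
all(lossexp >= loss01)              # TRUE: the surrogate is an upper bound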
Boosting & Adaboost
From Freund & Schapire (1999), the empirical error of h_κ(·) satisfies
$$\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{y_i \neq h_\kappa(x_i)} \;\leq\; \exp\left[-2\sum_{k=1}^{\kappa}(\varepsilon_k - 0.5)^2\right]$$
(when the weak learners are better than random classification, the empirical error tends to 0 exponentially fast)
[Figure: left panels, the simulated data and the boosted classifier on [−3, 3]; right panel, training-sample and validation-sample error rates against the number of iterations (0 to 8000)]
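A small numeric illustration of the bound, with made-up error rates ε_k (the value 0.45 is an arbitrary assumption), just to show the exponential decay:

eps   <- rep(0.45, 200)                      # hypothetical weak learners, 45% error each round
bound <- exp(-2 * cumsum((eps - 0.5)^2))     # right-hand side after kappa = 1, ..., 200 rounds
round(bound[c(1, 50, 100, 200)], 3)          # decays exponentially towards 0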
Gradient Boosting
Newton-Raphson to minimize a strictly convex function g : ℝ → ℝ.
At the minimum, g′(x⋆) = 0, so consider the first-order approximation
$$g'(x + h) \approx g'(x) + h \cdot g''(x)$$
Consider the sequence $x_k = x_{k-1} - \alpha\, g'(x_{k-1})$ where $\alpha = [g''(x_{k-1})]^{-1}$.
One can consider a functional version of that technique: for all i = 1, ⋯, n,
$$g_k(x_i) = g_{k-1}(x_i) - \alpha \left.\frac{\partial \ell(y_i, g(x_i))}{\partial g(x_i)}\right|_{g(x_i)=g_{k-1}(x_i)}$$
This provides a sequence of functions g_k at the points x_i. To get values at any point x, regress the pseudo-residuals
$$\varepsilon_i = - \left.\frac{\partial \ell(y_i, g)}{\partial g}\right|_{g=g_{k-1}(x_i)}$$
on the x_i's.
If α = 1 and ℓ(y, g) = exp[−y·g], we have (almost) adaboost.
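A one-dimensional Newton-Raphson sketch (the function g below is an arbitrary strictly convex choice, for illustration only):

g   <- function(x) exp(x) + x^2           # strictly convex
gp  <- function(x) exp(x) + 2 * x         # g'
gpp <- function(x) exp(x) + 2             # g''
x <- 2
for (k in 1:6) x <- x - gp(x) / gpp(x)    # x_k = x_{k-1} - [g''(x_{k-1})]^{-1} g'(x_{k-1})
c(x, gp(x))                               # g'(x) is (numerically) zero at the minimum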
Gradient Boosting
Gradient Boosting Algorithm
1. Start with a constant model, $h_0(x) = \underset{c\in\mathbb{R}}{\text{argmin}}\; \frac{1}{n}\sum_{i=1}^{n} \ell(y_i, c)$, and a regularization parameter α ∈ (0, 1)
2. For k = 1, ⋯
(i) compute $\varepsilon_i = -\left.\dfrac{\partial \ell(y_i, g)}{\partial g}\right|_{g=h_{k-1}(x_i)}$
(ii) fit the (weak) model on the sample (ε_i, x_i) and let H_k denote that model
(iii) update the model h_k(·) = h_{k−1}(·) + α H_k(·)
3. The final model is h_κ(x)
The choice of α is (somewhat) not crucial: use α ≈ 10%
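A minimal R sketch of the algorithm for a regression problem with squared loss ℓ(y, m) = (y − m)²/2, so that the ε_i are the usual residuals; the toy data, the tree depth and the number of iterations are illustrative assumptions.

library(rpart)
set.seed(1)
n <- 200
x <- runif(n)
y <- sin(4 * x) + rnorm(n, sd = .3)        # toy data (assumption)
alpha <- 0.1                               # regularization parameter
K     <- 100
h <- rep(mean(y), n)                       # 1. constant model (argmin for squared loss)
weak <- vector("list", K)
for (k in 1:K) {
  eps <- y - h                                                     # (i) pseudo-residuals
  Hk  <- rpart(eps ~ x, data = data.frame(eps, x), maxdepth = 2)   # (ii) weak learner
  weak[[k]] <- Hk
  h <- h + alpha * predict(Hk, data.frame(x))                      # (iii) update
}
mean((y - h)^2)                            # in-sample MSE of the final model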
Gradient Boosting
The logitboost model is obtained when y ∈ {0, 1} and the loss function is
$$\ell(y, m) = \log[1 + \exp(-2(2y-1)m)]$$
Boosting (learning from the mistakes)
Sequential Learning:
$$m_k(\cdot) = m_{k-1}(\cdot) + \alpha \cdot \underset{h\in\mathcal{H}}{\text{argmin}}\left\{\sum_{i=1}^{n} \ell\Big(\underbrace{y_i - m_{k-1}(x_i)}_{\varepsilon_i},\, h(x_i)\Big)\right\}$$
Hence, learning is sequential, as opposed to bagging...
Bagging
Bagging Algorithm
1. For k = 1, ⋯
(i) draw a bootstrap sample from the (y_i, x_i)'s
(ii) estimate a model m_k on that sample
2. The final model is $m^\star(\cdot) = \dfrac{1}{\kappa}\sum_{k=1}^{\kappa} m_k(\cdot)$
To illustrate, suppose that m is some parametric model m_θ, with m_k = m_{θ_k} obtained on some sample S_k = {(y_i, x_i), i ∈ I_k}.
Let $\sigma^2(x) = \text{Var}[m_{\widehat{\theta}}(x)]$ and $\rho(x) = \text{Corr}[m_{\widehat{\theta}_1}(x), m_{\widehat{\theta}_2}(x)]$, obtained on two random bootstrap samples. Then
$$\text{Var}[m^\star(x)] = \rho(x)\sigma^2(x) + \frac{1-\rho(x)}{\kappa}\,\sigma^2(x)$$
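A short R sketch of bagging with rpart trees on the myocarde data (the number of bootstrap samples and the probability averaging are illustrative choices):

library(rpart)
set.seed(1)
kappa <- 50                                # number of bootstrap samples (illustrative)
trees <- vector("list", kappa)
for (k in 1:kappa) {
  Ik <- sample(1:nrow(myocarde), replace = TRUE)                             # (i) bootstrap
  trees[[k]] <- rpart(PRONO ~ ., data = myocarde[Ik, ], method = "class")    # (ii) model m_k
}
probs <- sapply(trees, function(tr) predict(tr, myocarde, type = "prob")[, "SURVIE"])
pred  <- ifelse(rowMeans(probs) > .5, "SURVIE", "DECES")   # 2. average the kappa models
mean(pred == myocarde$PRONO)               # in-sample accuracy of the bagged model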
Gradient Boosting & Computational Issues
We have used ℓ(y, m(x)) = exp[−y · m(x)] instead of $\mathbf{1}_{y \neq m(x)}$.
The misclassification error is (upper) bounded by the exponential loss:
$$\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{y_i \cdot m(x_i) \leq 0} \;\leq\; \frac{1}{n}\sum_{i=1}^{n} \exp[-y_i \cdot m(x_i)]$$
Here m(x) is a linear combination of weak classifiers, $m(x) = \sum_{j=1}^{\kappa} \alpha_j h_j(x)$.
Let M = [M_{i,j}] where $M_{i,j} = y_i \cdot h_j(x_i) \in \{-1, +1\}$, i.e. M_{i,j} = 1 whenever the (weak) classifier j correctly classifies individual i. Then
$$y_i \cdot m(x_i) = \sum_{j=1}^{\kappa} \alpha_j y_i h_j(x_i) = (M\alpha)_i$$
and thus
$$R(\alpha) = \frac{1}{n}\sum_{i=1}^{n} \exp[-y_i \cdot m(x_i)] = \frac{1}{n}\sum_{i=1}^{n} \exp[-(M\alpha)_i]$$
Gradient Boosting & Computational Issues
One can use coordinate descent, in the direction j in which the directional derivative is the steepest,
$$j^\star \in \underset{j}{\text{argmin}}\left\{ -\left.\frac{\partial R(\alpha + a\,e_j)}{\partial a}\right|_{a=0} \right\}$$
where the objective can be written
$$-\frac{\partial}{\partial a}\left[\frac{1}{n}\sum_{i=1}^{n} \exp\big(-(M\alpha)_i - a(Me_j)_i\big)\right]_{a=0} = \frac{1}{n}\sum_{i=1}^{n} M_{ij}\exp[-(M\alpha)_i]$$
Then
$$j^\star \in \underset{j}{\text{argmin}}\;(d^\top M)_j \quad\text{where}\quad d_i = \frac{\exp[-(M\alpha)_i]}{\sum_i \exp[-(M\alpha)_i]}$$
Gradient Boosting & Computational Issues
Then do a line search to see how far we should go. The derivative is null if
$$-\frac{\partial R(\alpha + a\,e_j)}{\partial a} = 0, \quad\text{i.e.}\quad a = \frac{1}{2}\log\left(\frac{d_+}{d_-}\right) = \frac{1}{2}\log\left(\frac{1-d_-}{d_-}\right)$$
where $d_- = \sum_{i:\,M_{i,j}=-1} d_i$ and $d_+ = \sum_{i:\,M_{i,j}=+1} d_i$.
Coordinate Descent Algorithm
1. d_i = 1/n for i = 1, ⋯, n and α = 0
2. For k = 1, ⋯
(i) find the optimal direction $j^\star \in \underset{j}{\text{argmin}}\;(d^\top M)_j$
(ii) compute $d_- = \sum_{i:\,M_{i,j^\star}=-1} d_i$ and $a_k = \frac{1}{2}\log\left(\frac{1-d_-}{d_-}\right)$
(iii) set $\alpha = \alpha + a_k e_{j^\star}$ and $d_i = \dfrac{\exp[-(M\alpha)_i]}{\sum_i \exp[-(M\alpha)_i]}$
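A compact R sketch of this coordinate descent on the myocarde data; the dictionary of weak classifiers (one median-threshold stump per variable), the {−1, +1} encoding and the numerical guard on d_- are illustrative assumptions, and the direction rule simply follows the argmin statement above.

y <- ifelse(myocarde$PRONO == "SURVIE", +1, -1)          # assumed {-1,+1} encoding
X <- myocarde[, names(myocarde) != "PRONO"]
n <- length(y)
H <- sapply(X, function(xv) ifelse(xv <= median(xv), +1, -1))   # hypothetical weak classifiers
M <- y * H                                               # M[i, j] = y_i * h_j(x_i)
alpha <- rep(0, ncol(M))
d <- rep(1/n, n)                                         # 1. uniform d_i
for (k in 1:20) {                                        # 2. (20 rounds, illustrative)
  j  <- which.min(t(d) %*% M)                            # (i) direction, argmin (d'M)_j
  dm <- sum(d[M[, j] == -1])                             # (ii) d_-
  dm <- min(max(dm, 1e-12), 1 - 1e-12)                   # numerical guard (not in the slides)
  a  <- 0.5 * log((1 - dm) / dm)
  alpha[j] <- alpha[j] + a                               # (iii) update alpha, then d
  d <- as.vector(exp(-M %*% alpha))
  d <- d / sum(d)
}
round(alpha, 3)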
Gradient Boosting & Computational Issues
This is very close to Adaboost: α_j is the sum of the a_k's for which direction j was chosen,
$$\alpha_j = \sum_{k=1}^{\kappa} a_k \mathbf{1}_{j^\star(k)=j}$$
Thus
$$m^\star(x) = \sum_{j=1}^{\kappa} \alpha_j h_j(x) = \sum_{k=1}^{\kappa} a_k h_{j^\star(k)}(x)$$
With Adaboost, we go in the same direction, with the same intensity: Adaboost is equivalent to minimizing the exponential loss by coordinate descent.
Thus, we seek $m^\star(\cdot) = \text{argmin}\; \mathbb{E}_{(Y,X)\sim F}\big[\exp(-Y \cdot m(X))\big]$, which is minimized at
$$m^\star(x) = \frac{1}{2}\log\left(\frac{\mathbb{P}[Y=+1\mid X=x]}{\mathbb{P}[Y=-1\mid X=x]}\right)$$
(very close to the logistic regression)
Gradient Boosting & Computational Issues
Several R packages can be used, such as adabag::boosting
library(adabag)
library(caret)
indexes = createDataPartition(myocarde$PRONO, p = .70, list = FALSE)
train = myocarde[indexes, ]
test  = myocarde[-indexes, ]
model = boosting(PRONO ~ ., data = train, boos = TRUE, mfinal = 50)
pred  = predict(model, test)
print(pred$confusion)
                Observed Class
Predicted Class DECES SURVIE
         DECES      5      0
         SURVIE     3     12
or use cross-validation
cvmodel = boosting.cv(PRONO ~ ., data = myocarde, boos = TRUE, mfinal = 10, v = 5)
Gradient Boosting & Computational Issues
or xgboost::xgboost
library(xgboost)
library(caret)
train_x = data.matrix(train[, -8])
train_y = train[, 8]
test_x  = data.matrix(test[, -8])
test_y  = test[, 8]
xgb_train = xgb.DMatrix(data = train_x, label = train_y)
xgb_test  = xgb.DMatrix(data = test_x, label = test_y)
xgbc = xgboost(data = xgb_train, max.depth = 3, nrounds = 50)
pred = predict(xgbc, xgb_test)
pred_y = as.factor((levels(test_y))[round(pred)])
(cm = caret::confusionMatrix(test_y, pred_y))   # confusionMatrix() is exported by caret (e1071 only needs to be installed)
           Reference
Prediction DECES SURVIE
    DECES      6      2
    SURVIE     0     12
Gradient Boosting & Computational Issues
or gbm::gbm
library(gbm)
library(caret)
mod_gbm = gbm(PRONO == "SURVIE" ~ .,
              data = train,
              distribution = "bernoulli",
              cv.folds = 7,
              shrinkage = .01,
              n.minobsinnode = 10,
              n.trees = 200)
pred = predict.gbm(object = mod_gbm,
                   newdata = test,
                   n.trees = 200,
                   type = "response")
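Since gbm returns probabilities here, one would typically threshold them to obtain classes before comparing with the adabag and xgboost outputs; a possible (illustrative) follow-up:

pred_class <- ifelse(pred > .5, "SURVIE", "DECES")
table(Predicted = pred_class, Observed = test$PRONO)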