Poster

• Display advertising is graphical advertising that appears next to content on
web pages, in instant messaging (IM) applications, in email, etc. According
to recent reports by Forbes.com, 90% of ad agencies and marketers believe that
display ads are a great way to build a company's brand.
• An important part of optimizing the profit from display advertising is predicting
the ad click-through rate (CTR), i.e., the probability that a visitor to a web page
clicks on a given ad.
• The publicly available dataset, provided by CriteoLabs, contains data on
millions of ad impressions: characteristics of each impression and a record of
whether the ad was clicked on. A substantial number of records have
some degree of missing data.
• We built different types of linear models using variable selection procedures
seeking to maximize the predictive accuracy.
• The results of distinct logistic regression and elastic net regression models were
compared using logloss and AUC.
INTRODUCTION
• The properties of the training dataset, which contains 13 numeric and 26
categorical variables, were analyzed with histograms, box plots, and frequency tables.
• Numeric variables: a log transformation was applied in the model since the data
were highly right-skewed.
• Categorical variables: some variables had over 100 thousand levels, so we
combined the levels with frequency below 1% to avoid rare levels and too many
dummy variables.
• Pearson correlation coefficients and interaction plots were checked for
collinearity and two-factor interactions.
• Univariate analyses tested the association of one predictor at a time with the
response to shortlist variables for modelling. Here we identified I5, I6, I11, and I13.
• Over 70% of observations had missing values, so we tried unconditional mean and
median imputation and compared the results in terms of logloss and AUC.
Ideally, the logloss should be low and the AUC should be high.
On this basis we adopted median imputation to obtain the full dataset.
• The full dataset was divided into two parts: 70% as the training dataset and 30% as
the testing dataset. Holding out a test set let us detect overfitting and compare
different models, since the actual response was available there.
• Three link functions, logit, probit and cloglog, were compared with the main
effects model. Clearly, the logit link function gave the best performance when
considering both Logloss and AUC.
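The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the "OTHER" label for merged levels, the random seed, and the toy values are all assumptions made for the example. The shift of 4 inside the logarithm follows the poster's model formulas.

```python
import math
import random
from collections import Counter
from statistics import median

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def log_shift(values, shift=4):
    """Apply log(x + shift) to reduce right skew."""
    return [math.log(v + shift) for v in values]

def collapse_rare_levels(levels, min_share=0.01, other="OTHER"):
    """Merge categorical levels whose relative frequency is below min_share."""
    counts = Counter(levels)
    n = len(levels)
    keep = {lvl for lvl, c in counts.items() if c / n >= min_share}
    return [lvl if lvl in keep else other for lvl in levels]

def split_train_test(rows, train_share=0.7, seed=0):
    """Shuffle row indices and split them into training and testing parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(train_share * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]

# Toy data (hypothetical, not taken from the Criteo dataset).
numeric = [0, 3, None, 1, 250, None, 2]
imputed = impute_median(numeric)
transformed = log_shift(imputed)

levels = ["a"] * 150 + ["b"] * 49 + ["c"]
collapsed = collapse_rare_levels(levels)

train, test = split_train_test(list(range(10)))
```

In practice the median and the set of frequent levels would be computed on the training part only and then reused on the testing part; the sketch ignores that detail for brevity.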
DATA ANALYSIS AND IMPUTATION
• LASSO
The least absolute shrinkage and selection operator uses the constraint that
$\|\beta\|_1$, the $L_1$-norm of the parameter vector, is no greater than a given
value. Equivalently, it may solve an unconstrained minimization of the
least-squares penalty with $\lambda_1 \|\beta\|_1$ added.
• Ridge
Ridge regression adds a constraint that $\|\beta\|_2$, the $L_2$-norm of the
parameter vector, is not greater than a given value. Equivalently, it may solve an
unconstrained minimization of the least-squares penalty with
$\lambda_2 \|\beta\|_2$ added.
• Elastic net
The elastic net combines the LASSO and ridge penalties; with the mixing parameter
α used here, α = 0 gives the LASSO and α = 1 gives ridge regression.
Here λ is the regularization parameter; changing it allows us to directly balance
the bias-variance tradeoff. When λ is close to zero, the penalty has almost no
effect and the solution is just the usual multiple linear least-squares regression
of y. For larger values of λ, the solutions are shrunken versions of the
least-squares estimates.
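To make the roles of α and λ concrete, the sketch below evaluates a penalized least-squares objective in the parameterization described on this poster, assuming the objective is read as the residual sum of squares plus (1 − α)λ‖β‖₁ + αλ‖β‖₂. The function name and the toy numbers are hypothetical. Note that the R glmnet package uses a different parameterization (its `alpha = 1` is the lasso, its ridge term is squared, and the loss is scaled), so values of α and λ are not directly interchangeable with it.

```python
import math

def elastic_net_objective(beta, X, y, alpha, lam):
    """Residual sum of squares plus the mixed L1/L2 penalty.

    Assumed convention (from the poster): alpha = 0 is pure LASSO,
    alpha = 1 is pure ridge, lam is the overall penalty strength.
    """
    rss = 0.0
    for row, target in zip(X, y):
        fitted = sum(b * x for b, x in zip(beta, row))
        rss += (target - fitted) ** 2
    l1_norm = sum(abs(b) for b in beta)
    l2_norm = math.sqrt(sum(b * b for b in beta))
    return rss + (1 - alpha) * lam * l1_norm + alpha * lam * l2_norm

# Toy example (hypothetical numbers): a perfect fit, so only the penalty remains.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, -4.0]
beta = [3.0, -4.0]

lasso_value = elastic_net_objective(beta, X, y, alpha=0.0, lam=2.0)  # 2 * |beta|_1
ridge_value = elastic_net_objective(beta, X, y, alpha=1.0, lam=2.0)  # 2 * |beta|_2
no_penalty = elastic_net_objective(beta, X, y, alpha=0.5, lam=0.0)   # plain least squares
```

With λ = 0 the objective reduces to ordinary least squares; increasing λ makes large coefficients more costly, which is what shrinks the estimates.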
ELASTIC NET REGRESSION
Logistic regression is used to predict the binary outcome of a response variable
using one or several predictor variables. The predictors can be binary,
categorical, or numerical. It maps a linear combination of the predictors to
the probability of a binary response, a value between 0 and 1.
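As an illustration of this mapping, the sketch below computes a click probability from a linear predictor with the inverse logit function. It is a simplified example, not the authors' implementation: the function name, the coefficient values, and the treatment of each categorical variable as a single 0/1 indicator are assumptions. The `log(x + 4)` transformation of the numeric variables follows the poster's formulas.

```python
import math

def predict_click_probability(intercept, numeric_coefs, numeric_values,
                              indicator_coefs, indicator_values, shift=4):
    """Inverse logit of the linear predictor.

    Numeric predictors enter as log(x + shift); categorical predictors are
    assumed to be coded as 0/1 indicators for this sketch.
    """
    eta = intercept
    eta += sum(b * math.log(x + shift)
               for b, x in zip(numeric_coefs, numeric_values))
    eta += sum(b * d for b, d in zip(indicator_coefs, indicator_values))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and one hypothetical impression.
p_baseline = predict_click_probability(0.0, [], [], [], [])
p_example = predict_click_probability(
    intercept=-1.0,
    numeric_coefs=[0.3, -0.2], numeric_values=[5, 0],
    indicator_coefs=[0.8], indicator_values=[1],
)
```

A linear predictor of 0 corresponds to a probability of 0.5, and any real-valued linear predictor is mapped into the open interval (0, 1).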
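The main effects model (logit link function):
Here, the $\hat{\beta}$ terms are the estimated coefficients for the predictors.
We add 4 to each numeric variable inside the logarithm to avoid negative values.
Two-factor interaction model (logit link function):
Here, X and Y are two distinct predictors, each either a log-transformed numeric
variable or a categorical variable.
Prediction (logit link function):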
LOGISTIC REGRESSION RESULTS
• Imputation:
In this setting, median imputation performed better than mean imputation for the
numeric variables. Mean imputation created non-integer values for integer-valued
predictor variables, which might lead to worse prediction.
• Link function:
Out of the three link functions, the logit link function performed better than the
probit and cloglog link functions when considering both Logloss and AUC.
• Logistic regression:
The two-factor interaction analysis added one interaction term involving the
significant variables (I5, I6, I11, and I13) of the main effects model. The logloss
and AUC were then calculated to compare the resulting models. Both criteria
indicated that the I5*I11 interaction improved the model performance.
MODEL SELECTION
• Best model for logistic regression:
The Logloss is 0.4997, and AUC is 0.7371.
• Best model for elastic net regression:
α = 0.25, λ = 0.001. The Logloss is 0.5131, and AUC is 0.7199.
Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, 60660
Fan Yang, Hyunyong Cho, Earvin Balderama, Gregory J. Matthews
CriteoLabs Display Advertising Challenge
• Criteria
Logloss: the lower the better
AUC: the higher the better
In an ROC curve, the true positive rate is plotted as a function of the false
positive rate for different cut-off points of a parameter. The area under the ROC
curve (AUC) is a measure of how well a parameter can distinguish between two
response groups (i.e., click or no click).
• Stepwise model selection
1. Start with an intercept only model.
2. Do a forward selection step.
3. Do a backward selection step.
4. Repeat until no further variable can be added to the model, or until the variable
just entered into the model is the only one eliminated in the subsequent backward
elimination.
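The stepwise procedure above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes a single user-supplied `score` function (lower is better, for example a penalized logloss) instead of separate entry and removal tests, and the variable names and `toy_score` are hypothetical. With a single score, removing the variable that was just added can never improve the model, so the loop effectively stops when no addition helps.

```python
def stepwise_select(candidates, score):
    """Forward-backward selection; score(list_of_variables) is minimized."""
    selected = []                      # step 1: intercept-only model
    best = score(selected)
    while True:
        # Step 2: forward step, try adding each remaining variable.
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        added = min(remaining, key=lambda c: score(selected + [c]))
        forward_score = score(selected + [added])
        if forward_score >= best:
            break                      # no variable improves the model
        selected.append(added)
        best = forward_score
        # Step 3: backward step, try dropping each selected variable.
        dropped = min(selected,
                      key=lambda c: score([s for s in selected if s != c]))
        backward_score = score([s for s in selected if s != dropped])
        if backward_score < best:
            selected.remove(dropped)
            best = backward_score
    return selected

def toy_score(variables):
    """Hypothetical score: A helps alone but is redundant once B and C are in."""
    chosen = set(variables)
    if {"B", "C"} <= chosen:
        fit = 0.06
    else:
        fit = (0.03 if "A" in chosen else 0.0) + 0.02 * len(chosen & {"B", "C"})
    return 0.6 - fit + 0.005 * len(chosen)   # small penalty per variable

result = stepwise_select(["A", "B", "C"], toy_score)
print(result)
```

In the toy run, A enters first, then B and C; once both B and C are in the model the backward step removes A, which shows why the backward step is worth keeping.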
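Main effects model (logit link function):
\[
\log\left(\frac{\hat{P}(\text{Click})}{1-\hat{P}(\text{Click})}\right)
= \hat{\beta}_0 + \hat{\beta}_1 \log(I_1+4) + \hat{\beta}_2 \log(I_2+4) + \dots
+ \hat{\beta}_{13} \log(I_{13}+4) + \hat{\beta}_{14} C_1 + \hat{\beta}_{15} C_2
+ \dots + \hat{\beta}_{39} C_{26}
\]
Estimated coefficients: $\hat{\beta}_1, \dots, \hat{\beta}_{39}$

Two-factor interaction model (logit link function):
\[
\log\left(\frac{\hat{P}(\text{Click})}{1-\hat{P}(\text{Click})}\right)
= \hat{\beta}_0 + \hat{\beta}_1 \log(I_1+4) + \dots + \hat{\beta}_{39} C_{26}
+ \hat{\beta}_{40} XY
\]
\[
X, Y \in \{\log(I_i+4),\ C_j\}, \quad i = 1, \dots, 13, \quad j = 1, \dots, 26, \quad X \neq Y
\]

Prediction (logit link function):
\[
\hat{P}(\text{Click})
= \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 \log(I_1+4) + \dots + \hat{\beta}_{39} C_{26} + \hat{\beta}_{40} XY}}
       {1 + e^{\hat{\beta}_0 + \hat{\beta}_1 \log(I_1+4) + \dots + \hat{\beta}_{39} C_{26} + \hat{\beta}_{40} XY}}
\]

Logloss:
\[
\text{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1-y_i) \log(1-p_i) \right]
\]

Elastic net estimate:
\[
\hat{\beta} = \arg\min_{\beta} \left( \|y - X\beta\|_2^2 + (1-\alpha)\lambda \|\beta\|_1 + \alpha\lambda \|\beta\|_2 \right)
\]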
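The two evaluation criteria used on this poster, logloss and AUC, can be computed as in the sketch below. It is a plain-Python illustration with hypothetical labels and predicted probabilities; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero, and the AUC is computed by comparing every clicked/non-clicked pair, which is only practical for small samples.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)          # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def auc(y_true, scores):
    """Share of (click, no-click) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = click) and predicted click probabilities.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]

example_logloss = log_loss(labels, probs)
example_auc = auc(labels, probs)
```

This pairwise definition of the AUC equals the area under the ROC curve: it is the probability that a randomly chosen clicked impression receives a higher score than a randomly chosen non-clicked one.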
[Figure: Logloss and AUC of the main effects + one two-factor interaction models]
• Elastic net regression: α
In the parameterization used on this poster, α = 0 stands for LASSO while α = 1
stands for ridge (the `alpha` argument of the R glmnet package uses the opposite
convention: alpha = 1 is the lasso and alpha = 0 is ridge). LASSO led to the
highest logloss while ridge performed relatively better.
The lowest logloss was reached at α = 0.25.
• Elastic net regression: λ
For each α, the logloss was calculated as λ varied from 0 to 100. As
expected, smaller λ, which imposes weaker shrinkage and hence a less constrained
set of possible solutions, performed better in this setting.
CONCLUSION
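Best logistic regression model (main effects plus the I5 × I11 interaction):
\[
\log\left(\frac{\hat{P}(\text{Click})}{1-\hat{P}(\text{Click})}\right)
= \hat{\beta}_0 + \hat{\beta}_1 \log(I_1+4) + \dots + \hat{\beta}_{39} C_{26}
+ \hat{\beta}_{40} \log(I_5+4) \times \log(I_{11}+4)
\]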
