INTRODUCTION
• Display advertising is graphical advertising that appears next to content on
web pages, in instant message (IM) applications, in email, etc. According to
recent reports by Forbes.com, 90% of ad agencies and marketers believe that
display ads are a great way to increase branding for a company.
• An important part of optimizing the profit from display advertising is predicting
the ad click-through rate (CTR), i.e., the probability that a visitor to a web page
clicks on a given ad.
• The publicly available dataset, provided by CriteoLabs, contains data on millions
of ad impressions: characteristics of each impression and a record of whether the
ad was clicked on. A substantial number of records have some degree of missing
data.
• We built different types of linear models, using variable selection procedures
to maximize predictive accuracy.
• The results of the logistic regression and elastic net regression models were
compared using logloss and AUC.
DATA ANALYSIS AND IMPUTATION
• The properties of the training dataset, which contains 13 numeric and 26
categorical variables, were analyzed with histograms, box plots and frequency tables.
• Numeric variables: a log transformation was adopted in the model because the
data were highly right-skewed.
• Categorical variables: some variables had over 100 thousand levels, so we
combined the levels with frequency below 1% to avoid rare levels acting as outliers
and to limit the number of dummy variables.
• Pearson correlation coefficients and interaction plots were checked for
collinearity and two-factor interactions.
• Univariate analyses tested the association of one predictor at a time with the
response to shortlist variables for modelling. This step identified I5, I6, I11,
and I13.
• Over 70% of observations had missing values, so we tried unconditional mean and
median imputation and compared the results in terms of logloss and AUC. Ideally,
the logloss score should be low while the AUC score should be high. On that basis
we adopted median imputation to obtain the full dataset.
• The full dataset was divided into two parts: 70% as the training dataset and 30%
as the testing dataset. Holding out the testing dataset guards against overfitting
and lets us check and compare different models, since the actual response is
available there.
• Three link functions (logit, probit and cloglog) were compared using the main
effects model. The logit link function gave the best performance when
considering both logloss and AUC.
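The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' code: the function names, the "OTHER" label, the toy data and the fixed random seed are assumptions made for the example. The shift of 4, the 1% threshold, median imputation and the 70/30 split come from the text.

```python
import math
import random
import statistics
from collections import Counter


def median_impute(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def log_shift(values, shift=4):
    """Apply log(x + shift); shift=4 mirrors the log(I + 4) terms of the poster."""
    return [math.log(v + shift) for v in values]


def collapse_rare_levels(levels, min_share=0.01, other="OTHER"):
    """Merge categorical levels whose relative frequency is below min_share."""
    counts = Counter(levels)
    n = len(levels)
    return [lvl if counts[lvl] / n >= min_share else other for lvl in levels]


def train_test_split(rows, train_share=0.7, seed=0):
    """Shuffle the rows reproducibly and split them into training and testing parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_share * len(rows)))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


# Toy data (made up): one numeric column with gaps, one categorical column.
numeric = [0, 3, None, 10, None, 250, 1, 2, 5, 7]
categorical = ["a"] * 150 + ["b"] * 49 + ["rare"]

imputed = median_impute(numeric)                # gaps filled with the median
transformed = log_shift(imputed)                # right-skewed values compressed
collapsed = collapse_rare_levels(categorical)   # "rare" has a 0.5% share
train, test = train_test_split(list(range(10)))
```

In this sketch the median is computed on the full column before splitting, which follows the order of steps in the bullets above.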
ELASTIC NET REGRESSION
• LASSO
The least absolute shrinkage and selection operator (LASSO) uses the constraint
that $\|\beta\|_1$, the $L_1$-norm of the parameter vector, is no greater than a
given value. Equivalently, it solves an unconstrained minimization of the
least-squares loss with the penalty $\lambda_1 \|\beta\|_1$ added.
• Ridge
Ridge regression adds the constraint that $\|\beta\|_2$, the $L_2$-norm of the
parameter vector, is no greater than a given value. Equivalently, it solves an
unconstrained minimization of the least-squares loss with the penalty
$\lambda_2 \|\beta\|_2^2$ added.
• Elastic net
In the parameterization used here, the elastic net includes the LASSO (α = 0) and
ridge (α = 1) regression as special cases. Here $\lambda$ is the regularization
parameter; changing it lets us directly balance the bias-variance tradeoff. When
the constraint bound is large enough (equivalently, when $\lambda$ is close to 0),
the constraint has no effect and the solution is just the usual multiple linear
least squares regression of y. For smaller bounds (larger $\lambda$), the solutions
are shrunken versions of the least squares estimates.
LOGISTIC REGRESSION
Logistic regression is used to predict the binary outcome of a response variable
from one or several predictor variables. The predictors can be binary,
categorical, or numerical. It maps a linear combination of the predictors to the
probability of a binary response, a value between 0 and 1.
The main effects model (logit link function):
Here, $\hat\beta_1, \dots, \hat\beta_{39}$ are the estimated coefficients for the
predictors. We add 4 to each numeric variable before taking the logarithm to avoid
negative values.
Two-factor interaction (logit link function):
Here, $X, Y \in \{\log(I_i + 4), C_j\}$, $i = 1, \dots, 13$, $j = 1, \dots, 26$, $X \neq Y$.
Prediction (logit link function):
RESULTS
• Imputation:
In this setting, median imputation performed better than mean imputation for the
numeric variables. Mean imputation creates non-integer values for integer-valued
predictor variables, which might lead to worse predictions.
• Link function:
Of the three link functions, the logit link performed better than the probit and
cloglog links when considering both logloss and AUC.
• Logistic regression:
The two-factor interaction analysis added a single interaction term between two of
the significant variables (I5, I6, I11 and I13) from the main effects model.
Logloss and AUC were then calculated to compare the resulting models. Both criteria
indicated that the I5*I11 interaction improved model performance.
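The three link functions compared above differ only in how the linear predictor is mapped to a probability. The sketch below writes out the standard inverse links for illustration; the function names are assumptions, and nothing here reproduces the fitted models of the poster.

```python
import math


def inv_logit(eta):
    """Inverse logit link: logistic function of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit link: standard normal CDF, written with erf."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta):
    """Inverse complementary log-log link (asymmetric around eta = 0)."""
    return 1.0 - math.exp(-math.exp(eta))


# Compare the three mappings on a few values of the linear predictor.
for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

Logit and probit are symmetric around a probability of 0.5, while cloglog is not, which is one reason the links can rank differently on the same data.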
MODEL SELECTION
• Best model for logistic regression:
The Logloss is 0.4997, and AUC is 0.7371.
• Best model for elastic net regression:
α = 0.25, λ = 0.001. The Logloss is 0.5131, and AUC is 0.7199.
CriteoLabs Display Advertising Challenge
Fan Yang, Hyunyong Cho, Earvin Balderama, Gregory J. Matthews
Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, 60660
• Criteria
Logloss: the lower the better.
AUC: the higher the better.
In an ROC curve, the true positive rate is plotted as a function of the false
positive rate for different cut-off points of a parameter (here, the predicted
click probability). The area under the ROC curve (AUC) measures how well that
parameter distinguishes between the two response groups (i.e., click or no click).
• Stepwise model selection
1. Start with an intercept-only model.
2. Do a forward selection step.
3. Do a backward elimination step.
4. Repeat steps 2 and 3 until no further variable can be added to the model, or
until the variable just entered into the model is the only one eliminated in the
subsequent backward elimination.
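The stepwise procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the poster does not say which criterion drives entry and removal, so the sketch takes a hypothetical `score` function (lower is better, for example AIC or validation logloss) and uses a made-up `toy_score` in place of fitting real models. With a single score for both directions, a variable that has just entered is never removed in the same pass, so the second stopping rule of step 4 is not modeled separately.

```python
def stepwise_select(candidates, score, tol=1e-9):
    """Forward selection with a backward check after every addition.

    candidates: list of variable names
    score: function mapping a frozenset of names to a number (lower is better)
    """
    selected = []
    best = score(frozenset())          # step 1: intercept-only model
    while True:
        # Step 2 (forward): try each unused variable, keep the best if it helps.
        remaining = [v for v in candidates if v not in selected]
        if not remaining:
            break
        scored = [(score(frozenset(selected + [v])), v) for v in remaining]
        new_score, new_var = min(scored, key=lambda t: t[0])
        if new_score >= best - tol:
            break                      # nothing can be added: stop
        selected.append(new_var)
        best = new_score
        # Step 3 (backward): drop the variable whose removal helps most, if any.
        drops = [(score(frozenset(u for u in selected if u != v)), v)
                 for v in selected]
        drop_score, drop_var = min(drops, key=lambda t: t[0])
        if drop_score < best - tol:
            selected.remove(drop_var)
            best = drop_score
    return selected, best


def toy_score(subset):
    """Made-up criterion: a fit term plus a penalty of 0.02 per variable."""
    fit = 1.0
    if "B" in subset:
        fit -= 0.25
    if "C" in subset:
        fit -= 0.25
    if "B" in subset and "C" in subset:
        fit -= 0.20                    # B and C work well together
    elif "A" in subset:
        fit -= 0.30                    # A only helps until B and C are both in
    return fit + 0.02 * len(subset)


selected, best = stepwise_select(["A", "B", "C", "D"], toy_score)
```

In the toy run, A enters first, B and C follow, and the backward step then removes A because it has become redundant, which is the situation the backward check exists for.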
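Main effects model (logit link function), with estimated coefficients $\hat\beta_1, \dots, \hat\beta_{39}$:
$$\log\left(\frac{\hat{P}(\text{Click})}{1-\hat{P}(\text{Click})}\right) = \hat\beta_0 + \hat\beta_1 \log(I_1+4) + \hat\beta_2 \log(I_2+4) + \dots + \hat\beta_{13} \log(I_{13}+4) + \hat\beta_{14} C_1 + \hat\beta_{15} C_2 + \dots + \hat\beta_{39} C_{26}$$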
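Two-factor interaction model (logit link function):
$$\log\left(\frac{\hat{P}(\text{Click})}{1-\hat{P}(\text{Click})}\right) = \hat\beta_0 + \hat\beta_1 \log(I_1+4) + \dots + \hat\beta_{39} C_{26} + \hat\beta_{40} XY$$
where $X, Y \in \{\log(I_i+4), C_j\}$, $i = 1, \dots, 13$, $j = 1, \dots, 26$, $X \neq Y$.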
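Prediction (logit link function):
$$\hat{P}(\text{Click}) = \frac{e^{\hat\beta_0 + \hat\beta_1 \log(I_1+4) + \dots + \hat\beta_{39} C_{26} + \hat\beta_{40} XY}}{1 + e^{\hat\beta_0 + \hat\beta_1 \log(I_1+4) + \dots + \hat\beta_{39} C_{26} + \hat\beta_{40} XY}}$$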
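The prediction formula above can be evaluated in two steps: build the linear predictor, then apply the inverse logit. The sketch below is only illustrative. The coefficient values are made up, the function and argument names are assumptions, and each categorical variable is represented by an already looked-up contribution rather than by dummy columns.

```python
import math


def linear_predictor(beta0, beta_num, numeric, cat_effects,
                     beta_int=0.0, pair=None):
    """Intercept + sum of beta_k * log(I_k + 4) + categorical contributions,
    plus an optional interaction between two transformed numeric variables."""
    logs = [math.log(x + 4) for x in numeric]
    eta = beta0 + sum(b * z for b, z in zip(beta_num, logs)) + sum(cat_effects)
    if pair is not None:
        i, j = pair                    # positions of X and Y in `numeric`
        eta += beta_int * logs[i] * logs[j]
    return eta


def click_probability(eta):
    """e^eta / (1 + e^eta), arranged so that exp() never overflows."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Made-up example: two numeric variables, one categorical contribution,
# and an interaction between the two numeric variables.
eta = linear_predictor(beta0=-1.0, beta_num=[0.5, -0.2], numeric=[0, 6],
                       cat_effects=[0.3], beta_int=0.1, pair=(0, 1))
p = click_probability(eta)
```

Splitting the calculation this way keeps the transformation of the inputs separate from the link function, so the same linear predictor could be passed to a different inverse link.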


Logloss:
$$\text{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1-y_i) \log(1-p_i) \right]$$
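Both evaluation criteria are short to compute by hand. The sketch below implements the logloss formula above and a pairwise version of AUC, assuming labels coded as 0 and 1. The clipping constant `eps` is an assumption added to avoid log(0); it is not part of the formula. The pairwise AUC is quadratic in the number of observations, so it is meant for illustration rather than for millions of impressions.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)    # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted click probabilities.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.7]
example_logloss = log_loss(y, p)
example_auc = auc(y, p)
```

Logloss rewards well-calibrated probabilities, while AUC depends only on the ranking of the predictions, which is why the poster reports both.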
Elastic net objective:
$$\hat\beta = \arg\min_{\beta} \left( \|y - X\beta\|^2 + (1-\alpha)\lambda \|\beta\|_1 + \alpha\lambda \|\beta\|_2^2 \right)$$
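To make the roles of α and λ concrete, the objective above can be evaluated directly for a given coefficient vector. The sketch below follows the parameterization of this poster, in which α weights the ridge term, and assumes a squared $L_2$ penalty. The function name and the toy numbers are made up, and the scaling differs from what packaged solvers such as glmnet use internally.

```python
def elastic_net_objective(beta, X, y, alpha, lam):
    """Residual sum of squares + (1 - alpha) * lam * L1 + alpha * lam * squared L2."""
    residuals = [yi - sum(b * x for b, x in zip(beta, row))
                 for row, yi in zip(X, y)]
    rss = sum(r * r for r in residuals)
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return rss + (1.0 - alpha) * lam * l1 + alpha * lam * l2_squared


# Toy design matrix, response and coefficients (made up).
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
beta = [1.0, -2.0]

lasso_value = elastic_net_objective(beta, X, y, alpha=0.0, lam=0.5)   # L1 penalty only
ridge_value = elastic_net_objective(beta, X, y, alpha=1.0, lam=0.5)   # L2 penalty only
mixed_value = elastic_net_objective(beta, X, y, alpha=0.25, lam=0.5)  # blend of both
```

Setting `lam=0` removes both penalties and leaves the ordinary least squares criterion, which is the unshrunken end of the bias-variance tradeoff.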
[Figure: Logloss and AUC for the main effects model plus one two-factor interaction]
• Elastic net regression: α
The models were fit with the R glmnet package. In the parameterization used here,
α = 0 stands for LASSO while α = 1 stands for ridge; note that the glmnet
documentation itself defines alpha = 1 as the LASSO penalty and alpha = 0 as the
ridge penalty. LASSO led to the highest Logloss, while ridge performed relatively
better. The lowest Logloss was reached at α = 0.25.
• Elastic net regression: λ
Within each α, the Logloss was calculated as λ varied from 0 to 100. As expected,
smaller λ, which corresponds to a weaker penalty and therefore a less constrained
set of possible solutions, performed better in this setting.
CONCLUSION
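Final logistic regression model (main effects plus the I5 × I11 interaction):
$$\log\left(\frac{\hat{P}(\text{Click})}{1-\hat{P}(\text{Click})}\right) = \hat\beta_0 + \hat\beta_1 \log(I_1+4) + \dots + \hat\beta_{39} C_{26} + \hat\beta_{40} \log(I_5+4) \times \log(I_{11}+4)$$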
