3. Overview
● RTB quick intro
● ML competitions
○ Criteo Ad Placement Challenge
○ Outbrain Click Prediction
● Real Life
4. Real Time Bidding
● Run the auction or not?
● Price
● Which DSPs to send?
Publisher → SSP → DSP → Advertiser
● Bid or not?
● How much?
● Which campaign to show?
Click Prediction
6. Criteo Ad Placement Challenge
● Create a policy that:
○ Scores candidates
○ Picks the best-scored one
Creating the policy?
● From historical data!
● Record everything
● Train the model
● Deploy to production
https://www.crowdai.org/challenges/nips-17-workshop-criteo-ad-placement-challenge
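A minimal sketch of such a policy: score every candidate, then pick the best-scored one. The scoring function, its weights, and the feature names below are hypothetical stand-ins for a model trained on historical data; they are not taken from the challenge.

```python
# Hypothetical linear scoring model: one weight per binary feature.
WEIGHTS = {"size_big": 0.4, "color_red": -0.1, "position_top": 0.7}


def linear_score(features):
    """Score one candidate as the sum of the weights of its active features."""
    return sum(WEIGHTS.get(f, 0.0) for f in features)


def greedy_policy(candidates, score):
    """Return the index of the best-scored candidate."""
    scores = [score(c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


candidates = [
    ["size_big", "color_red"],
    ["position_top"],
    ["color_red"],
]
chosen = greedy_policy(candidates, linear_score)
```

In production the same two steps run per request; only the scoring model changes.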
9. FTRL = Follow The Regularized Leader
● Simple online linear model
● “Ad Click Prediction: a View from the Trenches” by McMahan et al.*
● Good for sparse problems
● Logistic regression from LIBLINEAR was too slow
● Used my own implementation: libftrl-python**
* https://ai.google/research/pubs/pub41159
** https://github.com/alexeygrigorev/libftrl-python
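The per-coordinate update from the McMahan et al. paper can be sketched in a few lines. This is an illustrative pure-Python version for binary (one-hot) features with logistic loss; it is not the libftrl-python API, and the class name, hyperparameter values, and toy data are assumptions.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic loss on binary features."""

    def __init__(self, alpha=0.5, beta=1.0, l1=0.01, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # adjusted gradient sums
        self.n = defaultdict(float)  # sums of squared gradients

    def _weight(self, i):
        """Weights are derived lazily from z and n; L1 keeps small ones at zero."""
        z, n = self.z.get(i, 0.0), self.n.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, features):
        logit = sum(self._weight(i) for i in set(features))
        logit = max(min(logit, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-logit))

    def update(self, features, y):
        """One online step on a single example with label y in {0, 1}."""
        p = self.predict(features)
        g = p - y  # gradient of log loss w.r.t. each active binary feature
        for i in set(features):
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


# Toy data (hypothetical): one ad is always clicked, the other never.
model = FTRLProximal()
data = [(["bias", "ad=good"], 1), (["bias", "ad=bad"], 0)]
for _ in range(20):
    for features, y in data:
        model.update(features, y)
p_good = model.predict(["bias", "ad=good"])
p_bad = model.predict(["bias", "ad=bad"])
```

Only `z` and `n` are stored, and only for features that were actually seen, which is what makes the method convenient for sparse problems.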
12. Ensembling
● First level models:
○ Strong CTR features
○ SVM, FTRL
○ XGB, ET on CTR features
○ FFM on base+leaf features
○ Rank features
● Second level:
○ XGBoost on ½ of data
○ Pairwise Loss (LambdaMART)
13. FM & FFM
● FM - Factorization Machines
● Models all quadratic (pairwise) interactions (~ polynomial kernel in SVM)
○ Interactions: product of a low-rank matrix with its own transpose
○ Better than explicit modeling of interactions for sparse data
○ ~ Like SVD or ALS (which use two low-rank matrices)
○ Allows adding arbitrary features
● “Factorization Machines” by S. Rendle
● FFM: Field-Aware FM
○ Interactions: not everything with everything, limited to pairs of fields
○ Suppose we have 3 fields: F1, F2, F3 (user, doc_on, doc_ad)
○ Factorize (F1, F2), (F2, F3), (F1, F3) into separate latent spaces
○ LibFFM - implementation https://github.com/guestwalk/libffm/
○ Wrapper - https://github.com/alexeygrigorev/libffm-python
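The difference between the two interaction terms can be sketched in plain Python. The FM term is computed twice, naively over all pairs and with the linear-time identity from Rendle's paper; the FFM term picks a latent vector per (feature, field of the other feature). The function names, sizes, and random values are assumptions for illustration, and this is not the LibFFM implementation.

```python
import random
from itertools import combinations


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def fm_pairwise_naive(x, V):
    """sum over i<j of <V[i], V[j]> * x[i] * x[j]."""
    return sum(dot(V[i], V[j]) * x[i] * x[j]
               for i, j in combinations(range(len(x)), 2))


def fm_pairwise_fast(x, V):
    """Same value in O(k * n): 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    total = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        total += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return total


def ffm_pairwise(x, fields, V):
    """FFM: feature i uses the latent vector dedicated to the field of feature j."""
    return sum(dot(V[i][fields[j]], V[j][fields[i]]) * x[i] * x[j]
               for i, j in combinations(range(len(x)), 2))


random.seed(0)
n_features, n_fields, k = 6, 3, 4
x = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]   # one active feature per field
fields = [0, 0, 1, 1, 2, 2]          # field of each feature (e.g. user, doc_on, doc_ad)
V_fm = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_features)]
V_ffm = [[[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_fields)]
         for _ in range(n_features)]

fm_naive = fm_pairwise_naive(x, V_fm)
fm_fast = fm_pairwise_fast(x, V_fm)
ffm_value = ffm_pairwise(x, fields, V_ffm)
```

FFM stores `n_fields` latent vectors per feature instead of one, so it trades memory for separate latent spaces per field pair.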
14. Leaf Features
● Consider 3 trees:
● Generate 3 features: tree1=4, tree2=7, tree3=6
● Add them to the original feature set, then train FTRL, FFM, etc.
Source: http://www.csie.ntu.edu.tw/%7Er01922136/kaggle-2014-criteo.pdf, slide 9
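A small sketch of how leaf features are produced: each tree routes the example to one leaf, and the index of that leaf becomes a categorical feature. The trees, thresholds, and feature names here are hand-made and hypothetical; with a real gradient-boosting library the leaf indices would come from the trained model (XGBoost, for example, can return them via `pred_leaf=True`).

```python
def leaf_id(tree, x):
    """Walk one tree (nested dicts) and return the id of the leaf x lands in."""
    node = tree
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def leaf_features(trees, x):
    """One categorical token per tree, e.g. 'leaf_0_1' = tree 0, leaf 1."""
    return [f"leaf_{t}_{leaf_id(tree, x)}" for t, tree in enumerate(trees)]


# Two hypothetical trees over hypothetical numeric features.
trees = [
    {"feature": "hour", "threshold": 12,
     "left": {"leaf": 1}, "right": {"leaf": 2}},
    {"feature": "ad_ctr", "threshold": 0.05,
     "left": {"leaf": 1},
     "right": {"feature": "hour", "threshold": 18,
               "left": {"leaf": 3}, "right": {"leaf": 4}}},
]

example = {"hour": 4, "ad_ctr": 0.08}
tokens = leaf_features(trees, example)
```

The resulting tokens are appended to the original sparse features before training the linear or factorization model.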
15. FFM + XGB Leaves
● Train FFM on the following features:
-1 |u cb8c55702adb93
|p p_3
|g US US_SC US_SC_519
|t dow_1 hour_4
|a d_938164 src_5802 pub_0
|o d_379743 src_6482 pub_24
|x leaf_0_65 leaf_1_90 leaf_2_102 ...
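One possible way to feed such a row to an FFM trainer is to treat each namespace as a field and hash each token to a feature index, producing the `label field:index:value` text format that LibFFM reads. This is a sketch under assumptions: the field order, the hashing scheme, the bucket count, and passing the label through unchanged are all illustrative choices, not the author's pipeline.

```python
import zlib

# Assumed fixed field order, so field ids stay stable across rows.
FIELDS = ["u", "p", "g", "t", "a", "o", "x"]


def parse_namespaced(line):
    """Split '-1 |u tok |p tok ...' into a label and {namespace: [tokens]}."""
    label, *chunks = line.split("|")
    fields = {}
    for chunk in chunks:
        name, *tokens = chunk.split()
        fields[name] = tokens
    return label.strip(), fields


def to_ffm_line(label, fields, n_buckets=2 ** 20):
    """Encode every token as field_id:hashed_index:1."""
    parts = [label]
    for field_id, name in enumerate(FIELDS):
        for token in fields.get(name, []):
            index = zlib.crc32(f"{name}={token}".encode()) % n_buckets
            parts.append(f"{field_id}:{index}:1")
    return " ".join(parts)


row = ("-1 |u cb8c55702adb93 |p p_3 |g US US_SC US_SC_519 |t dow_1 hour_4 "
       "|a d_938164 src_5802 pub_0 |o d_379743 src_6482 pub_24 "
       "|x leaf_0_65 leaf_1_90 leaf_2_102")
label, fields = parse_namespaced(row)
ffm_line = to_ffm_line(label, fields)
```

Hashing the namespace together with the token keeps identical tokens in different fields (such as `d_...` in `a` and `o`) from colliding on purpose.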
20. User Profiles
More features:
● Apps user has + campaign app
● Genres of apps + campaign
● Activity (when clicks)
SSPs → DSP → DB
DB:
device_a: [app1, app2]
device_b: [app1]
device_c: [app3, app4]
21. Precompute everything!
● Traffic features: online
● Device + campaign features: offline
DB:
            campaign_a  campaign_b  campaign_c
device_a:   -5.2        -3.0        -4.0
device_b:   -3.1        -4.1        -3.4
...
Serving time: -3 + 0.3 + device_b
bias
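A sketch of the serving-time arithmetic, assuming the model is additive in logit space: the device + campaign part is looked up from the precomputed DB and only the traffic part is computed online. The traffic weights, the function names, and the exact split of terms are assumptions; the per-device values are the ones stored in the DB.

```python
import math

# Offline: one precomputed logit contribution per (device, campaign).
PRECOMPUTED = {
    "device_a": {"campaign_a": -5.2, "campaign_b": -3.0, "campaign_c": -4.0},
    "device_b": {"campaign_a": -3.1, "campaign_b": -4.1, "campaign_c": -3.4},
}

# Online: hypothetical weights for traffic features known only at request time.
TRAFFIC_WEIGHTS = {"hour_4": 0.3, "dow_1": 0.0}


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def serve(device, traffic_features):
    """Return (best campaign, predicted click probability), or None for unknown devices."""
    offline = PRECOMPUTED.get(device)
    if offline is None:
        return None
    online = sum(TRAFFIC_WEIGHTS.get(f, 0.0) for f in traffic_features)
    # The online part is the same for every campaign, so ranking needs only the lookup.
    campaign = max(offline, key=offline.get)
    return campaign, sigmoid(online + offline[campaign])


result = serve("device_b", ["hour_4"])
```

At request time this costs one key lookup and a handful of additions, which is the point of precomputing the expensive part offline.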
22. Are competitions useful?
Yes!
● Research advancement
● Playground to test ideas
● Tools and libraries
● Inspiration
● Learning (a lot!)
● Modeling is quite important
● Visibility & self-branding
● Can talk about them :)