Click Prediction
Kaggle Competitions vs Reality
Alexey Grigorev
14.03.2019
About me
● Software Engineer
● BI Masters @ TU Berlin
● Data Scientist
Overview
● RTB quick intro
● ML competitions
○ Criteo Ad Placement Challenge
○ Outbrain Click Prediction
● Real Life
Real Time Bidding
Publisher → SSP → DSP → Advertiser
SSP:
● Run the auction or not?
● Price
● Which DSPs to send the request to?
DSP:
● Bid or not?
● How much?
● Which campaign to show?
Click Prediction
ML Competitions
● Kaggle
● CrowdAI
● TopCoder
● Codalab
● Conference workshops
○ KDD, RecSys, WSDM, etc
● Many others
Criteo Ad Placement Challenge
● Create a policy
● The policy scores candidates
● Pick the best-scoring candidate
Creating the policy?
● From historical data!
● Record everything
● Train the model
● Deploy to production
https://www.crowdai.org/challenges/nips-17-workshop-criteo-ad-placement-challenge
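The "score candidates, pick the best" step can be sketched as follows. This is a minimal illustration, not the challenge's actual interface: the feature names, the weights, and the `pick_best` helper are all hypothetical.

```python
from typing import Callable, Dict, Sequence

# Hypothetical representation: a candidate is a dict of feature name -> value.
Candidate = Dict[str, float]


def pick_best(candidates: Sequence[Candidate],
              score: Callable[[Candidate], float]) -> int:
    """Return the index of the highest-scoring candidate."""
    if not candidates:
        raise ValueError("candidate set is empty")
    scores = [score(c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up weights standing in for a model trained on historical data.
WEIGHTS = {"banner_large": 0.4, "color_red": -0.1, "color_blue": 0.3}


def linear_score(candidate: Candidate) -> float:
    """Toy linear scorer: weighted sum of the candidate's features."""
    return sum(WEIGHTS.get(name, 0.0) * value
               for name, value in candidate.items())


candidates = [
    {"banner_large": 1.0, "color_red": 1.0},
    {"color_blue": 1.0},
    {"banner_large": 1.0, "color_blue": 1.0},
]
best = pick_best(candidates, linear_score)
print(best)
```

Any trained model can be plugged in as `score`; the policy itself is only the argmax over the candidate set.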
Data (from Criteo)
● Candidate set
● Click
● Anonymised features
Approach
Binary OHE (present/absent)
Only first rows with target info
FTRL = Follow The Regularized Leader
● Simple online linear model
● “Ad Click Prediction: a View from the Trenches” by McMahan et al.*
● Good for sparse problems
● Logistic regression from LIBLINEAR was too slow
● Used my own implementation: libftrl-python**
* https://ai.google/research/pubs/pub41159
** https://github.com/alexeygrigorev/libftrl-python
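To make the FTRL bullets concrete, here is a small pure-Python sketch of the per-coordinate FTRL-Proximal update for logistic regression, following the update rule in the McMahan et al. paper. It is not the libftrl-python API; the class name, defaults, and toy data are assumptions. Each example is a list of unique active feature indices, which matches the binary present/absent encoding.

```python
import math
from collections import defaultdict
from typing import List


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic loss on binary features."""

    def __init__(self, alpha: float = 0.1, beta: float = 1.0,
                 l1: float = 0.0, l2: float = 0.0) -> None:
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i: int) -> float:
        """Closed-form weight; exactly zero while |z| <= l1 (sparsity)."""
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, active: List[int]) -> float:
        """Click probability for an example given its active feature indices."""
        s = sum(self._weight(i) for i in active)
        s = max(min(s, 35.0), -35.0)  # clip to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, active: List[int], y: int) -> float:
        """One online step on (active, y); returns the pre-update prediction."""
        p = self.predict(active)
        g = p - y  # log-loss gradient w.r.t. each active weight (x_i = 1)
        for i in active:
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


# Toy data: feature 0 is always on, feature 1 goes with clicks, 2 with no click.
model = FTRLProximal(alpha=0.5)
for active, y in [([0, 1], 1), ([0, 2], 0)] * 200:
    model.update(active, y)
p_click = model.predict([0, 1])
p_no_click = model.predict([0, 2])
```

The model keeps only two numbers per feature and sees each example once, which is why this family of models suits very large sparse data.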
Outbrain Click Prediction
https://www.kaggle.com/c/outbrain-click-prediction
Data (table sizes shown on the slide): 2b, 87m + 32m, 23m, 600k, 3m
Ensembling
● First level models:
○ Strong CTR features
○ SVM, FTRL
○ XGB, ET on CTR features
○ FFM on base+leaf features
○ Rank features
● Second level:
○ XGBoost on ½ of data
○ Pairwise Loss (LambdaMART)
FM & FFM
● FM - Factorization Machines
● Models all quadratic (pairwise) interactions (~ polynomial kernel in SVM)
○ Interactions: outer product of a low-rank matrix with itself
○ Better than explicit modeling of interactions for sparse data
○ ~ Like in SVD or ALS (two low-rank matrices)
○ Allows adding arbitrary features
● “Factorization Machines” by S. Rendle
● FFM: Field-Aware FM
○ Interactions: not everything with everything, limited to fields
○ Suppose we have 3 fields: F1, F2, F3 (user, doc_on, doc_ad)
○ Factorize (F1, F2), (F2, F3), (F1, F3) into separate latent spaces
○ LibFFM - implementation https://github.com/guestwalk/libffm/
○ Wrapper - https://github.com/alexeygrigorev/libffm-python
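The pairwise terms of FM and FFM can be written out in a few lines. The sketch below is pure Python on toy numbers and is not the LibFFM code; the function names and the layout of `V` and `W` are assumptions. The FM term is computed both naively and with the linear-time identity from Rendle's paper. The FFM term follows the usual field-aware formulation, where feature i uses the latent vector it keeps for the field of feature j.

```python
from itertools import combinations
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def fm_pairwise_naive(x: List[float], V: List[List[float]]) -> float:
    """sum over i<j of <V[i], V[j]> * x_i * x_j, in O(n^2 * k)."""
    return sum(dot(V[i], V[j]) * x[i] * x[j]
               for i, j in combinations(range(len(x)), 2))


def fm_pairwise_fast(x: List[float], V: List[List[float]]) -> float:
    """Same value in O(n * k): 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    total = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        total += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return total


def ffm_pairwise(x: List[float], fields: List[int],
                 W: List[List[List[float]]]) -> float:
    """FFM term: W[i][f] is the latent vector of feature i towards field f."""
    return sum(dot(W[i][fields[j]], W[j][fields[i]]) * x[i] * x[j]
               for i, j in combinations(range(len(x)), 2))


# Toy example: 4 features, latent dimension k = 2.
x = [1.0, 0.0, 1.0, 1.0]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.0]]
naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)

# If every feature uses the same vector for all fields, FFM reduces to FM.
fields = [0, 1, 1, 2]
W = [[v, v, v] for v in V]
ffm = ffm_pairwise(x, fields, W)
```

In a real FFM each `W[i][f]` is learned separately, so the number of latent vectors grows with the number of fields; that is the cost of the separate latent spaces.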
Leaf Features
● Consider 3 trees:
● Generate 3 features: tree1=4, tree2=7, tree3=6
● Add them to the old feature set, train FTRL, FFM, etc
Source: http://www.csie.ntu.edu.tw/%7Er01922136/kaggle-2014-criteo.pdf, slide 9
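A small sketch of how leaf features are produced: each tree routes the row to exactly one leaf, and the index of that leaf becomes a categorical feature. The tree structures, feature names, and thresholds below are invented for illustration; only the resulting leaf ids (4, 7, 6) mirror the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    leaf_id: int


@dataclass
class Split:
    feature: str
    threshold: float
    left: "Node"   # taken when the feature value is below the threshold
    right: "Node"  # taken otherwise


Node = Union[Leaf, Split]


def leaf_index(tree: Node, row: Dict[str, float]) -> int:
    """Walk one tree from the root and return the id of the leaf reached."""
    node = tree
    while isinstance(node, Split):
        value = row.get(node.feature, 0.0)
        node = node.left if value < node.threshold else node.right
    return node.leaf_id


def leaf_features(trees: List[Node], row: Dict[str, float]) -> List[str]:
    """One categorical token per tree, e.g. 'leaf_0_4'."""
    return [f"leaf_{t}_{leaf_index(tree, row)}" for t, tree in enumerate(trees)]


# Three toy trees (hypothetical features 'ctr' and 'hour').
trees = [
    Split("ctr", 0.5, Leaf(3), Leaf(4)),
    Split("hour", 12, Leaf(7), Leaf(8)),
    Split("ctr", 0.1, Leaf(5), Split("hour", 6, Leaf(6), Leaf(9))),
]
tokens = leaf_features(trees, {"ctr": 0.7, "hour": 4})
print(tokens)
```

With XGBoost the same information is available from a trained booster by calling `predict` with `pred_leaf=True`, which returns one leaf index per tree for each row.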
FFM + XGB Leaves
● Train FFM on the following features:
-1 |u cb8c55702adb93
|p p_3
|g US US_SC US_SC_519
|t dow_1 hour_4
|a d_938164 src_5802 pub_0
|o d_379743 src_6482 pub_24
|x leaf_0_65 leaf_1_90 leaf_2_102 ...
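One possible way to turn such a namespaced line into LibFFM's text format (`label field:index:value`) is sketched below. The namespace-to-field mapping, the hashing scheme, and the bucket count are assumptions, not the competition code.

```python
import zlib

# Assumed mapping from the namespace letters on the slide to FFM field ids.
FIELDS = {"u": 0, "p": 1, "g": 2, "t": 3, "a": 4, "o": 5, "x": 6}


def to_libffm(line: str, n_buckets: int = 2 ** 20) -> str:
    """Convert '<label> |ns tok tok |ns tok ...' to '<label> field:index:1 ...'."""
    label, *groups = [part.strip() for part in line.split("|")]
    out = [label]
    for group in groups:
        namespace, *tokens = group.split()
        field = FIELDS[namespace]  # raises KeyError on an unknown namespace
        for token in tokens:
            # Hashing trick: a stable hash of namespace + token picks the index.
            key = f"{namespace}^{token}".encode("utf-8")
            index = zlib.crc32(key) % n_buckets
            out.append(f"{field}:{index}:1")
    return " ".join(out)


line = ("-1 |u cb8c55702adb93 |p p_3 |g US US_SC US_SC_519 |t dow_1 hour_4 "
        "|a d_938164 src_5802 pub_0 |o d_379743 src_6482 pub_24 "
        "|x leaf_0_65 leaf_1_90 leaf_2_102")
converted = to_libffm(line)
parts = converted.split()
print(converted)
```

Hashing avoids keeping a vocabulary in memory, at the cost of occasional collisions between rare features.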
Reality
https://en.wikipedia.org/wiki/CRISP-DM
"I will take my model from Kaggle and deploy it to production"
Reality in Ad-Tech
● Response Time! <= 100 ms
● High QPS (30k-100k per second)
● Billions of events and TBs of data per day
● Very sparse
Bid Requests
{
"app": {
"cat": ["IAB14"],
"id": "10000001", "name": "Demo_US_480x80",
"publisher": { "id": "100000001", "name": "Demo" }
},
"device": {
"connectiontype": 0, "devicetype": 1,
"ifa": "e4273e31-97a9-4b29-93a8-8a99f0cea068",
"geo": {
"country": "USA", "lat": 29.8327, "lon": -95.6627,
"type": 1, "zip": "77084"
},
"ip": "172.56.14.6",
"make": "Generic", "model": "Windows Phone 8",
"os": "Windows Phone OS", "osv": "8"
},
"user": {
"gender": "M", "yob": 1976
}
}
https://wiki.smaato.com/pages/viewpage.action?pageId=3670020
● App: name, category, publisher
● Device: device id, type, connection
● Device: make, model, os
● Geo: country, lat/lon
● User: gender, yob
OHE ⇒ model.predict(X)
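The "OHE" step for a bid request can be sketched as: flatten the nested JSON into `path=value` tokens, then map each known token to a column index. This is an illustrative sketch on a trimmed-down request; the `flatten` and `OneHotIndexer` helpers are hypothetical, and continuous fields such as lat/lon would need bucketing first.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def flatten(obj: Any, prefix: str = "") -> Iterator[str]:
    """Yield one 'path=value' token per scalar in a nested JSON object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for value in obj:
            yield from flatten(value, prefix)
    else:
        yield f"{prefix.rstrip('.')}={obj}"


class OneHotIndexer:
    """Maps tokens seen during fit() to column indices of a binary vector."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}

    def fit(self, token_lists: Iterable[List[str]]) -> "OneHotIndexer":
        for tokens in token_lists:
            for token in tokens:
                self.index.setdefault(token, len(self.index))
        return self

    def transform(self, tokens: Iterable[str]) -> List[int]:
        """Sorted indices of active columns; unseen tokens are dropped."""
        return sorted({self.index[t] for t in tokens if t in self.index})


request = json.loads("""
{"app": {"cat": ["IAB14"], "name": "Demo_US_480x80"},
 "device": {"devicetype": 1, "geo": {"country": "USA"}, "os": "Windows Phone OS"},
 "user": {"gender": "M", "yob": 1976}}
""")
tokens = list(flatten(request))
indexer = OneHotIndexer().fit([tokens])
active = indexer.transform(tokens + ["device.os=SomethingNew"])
print(tokens)
print(active)
```

The list of active indices is the sparse binary row that a linear model consumes at prediction time.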
User Profiles
More features:
● Apps user has + campaign app
● Genres of apps + campaign
● Activity (when clicks)
Several SSPs → DSP → DB
DB (apps per device):
device_a: [app1, app2]
device_b: [app1]
device_c: [app3, app4]
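Precompute everything!
● Traffic features: online
● Device + campaign features: offline
DB (precomputed values, one column per campaign):
          campaign_a  campaign_b  campaign_c
device_a: -5.2        -3.0        -4.0
device_b: -3.1        -4.1        -3.4
...
Serving time: -3 + 0.3 + [precomputed device_b value] (the slide also labels a bias term)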
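One reading of this slide is that, for an additive model, the device + campaign part of the score is computed offline and stored, so serving only adds a bias, the online traffic part, and a lookup. The sketch below follows that reading. The assignment of -3 to the bias and 0.3 to a traffic feature, the table layout, and all names are assumptions.

```python
import math
from typing import Dict, List

BIAS = -3.0  # assumed to be the "-3" term on the slide

# Assumed online part: weights of traffic features seen in the bid request.
TRAFFIC_WEIGHTS = {"hour=4": 0.3}

# Offline part: precomputed device + campaign contribution per (device, campaign).
PRECOMPUTED = {
    "device_a": {"campaign_a": -5.2, "campaign_b": -3.0, "campaign_c": -4.0},
    "device_b": {"campaign_a": -3.1, "campaign_b": -4.1, "campaign_c": -3.4},
}


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def score_campaigns(device_id: str, traffic_tokens: List[str]) -> Dict[str, float]:
    """Click probability per campaign: bias + online part + one DB lookup."""
    online = sum(TRAFFIC_WEIGHTS.get(t, 0.0) for t in traffic_tokens)
    offline = PRECOMPUTED.get(device_id, {})  # unknown device: no candidates
    return {campaign: sigmoid(BIAS + online + value)
            for campaign, value in offline.items()}


scores = score_campaigns("device_b", ["hour=4"])
best_campaign = max(scores, key=scores.get)
print(best_campaign, scores[best_campaign])
```

At serving time the work per request is a handful of additions and one key lookup, which is what makes a response time of 100 ms or less realistic at high QPS.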
Are competitions useful?
Yes!
● Research advancement
● Playground to test ideas
● Tools and libraries
● Inspiration
● Learning (a lot!)
● Modeling is quite important
● Visibility & self-branding
● Can talk about them :)
Contact Info
● http://alexeygrigorev.com & contact@alexeygrigorev.com
● https://github.com/alexeygrigorev
○ https://github.com/alexeygrigorev/libffm-python
○ https://github.com/alexeygrigorev/libftrl-python
● https://www.linkedin.com/in/agrigorev
Questions
