write your own data story!
Personalized Web Search 
Fri 11 Oct 2013 – Fri 10 Jan 2014 
194 Teams 
$9,000 cash prize 
Using Historical Logs of a search engine 
QUERIES 
RESULTS 
CLICKS 
and a set of new QUERIES and RESULTS 
rerank the RESULTS in order to optimize relevance 
34,573,630 Sessions with user id 
21,073,569 Queries 
64,693,054 Clicks 
~ 15GB
A METRIC FOR RELEVANCE RIGHT FROM THE LOG? 
ASSUMING WE SEARCH FOR "FRENCH NEWSPAPER", WE TAKE 
A LOOK AT THE LOGS.
DWELL TIME 
WE COMPUTE THE SO-CALLED DWELL TIME OF A CLICK,
I.E. THE TIME ELAPSED BEFORE THE NEXT ACTION.
DWELL TIME HAS BEEN SHOWN TO BE CORRELATED WITH
RELEVANCE.
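For illustration, here is a minimal sketch of turning a session's action log into dwell times and 0/1/2 relevance grades. The 50 / 400 time-unit thresholds and the "last click of the session counts as highly relevant" rule follow the challenge's grading convention, but treat the exact cut-offs and the log schema here as assumptions.

```python
def dwell_times(actions):
    """actions: time-ordered list of (timestamp, kind, url) tuples for one session,
    where kind is 'query' or 'click'. Returns {click_index: dwell_or_None}."""
    dwells = {}
    for i, (ts, kind, url) in enumerate(actions):
        if kind != "click":
            continue
        if i + 1 < len(actions):
            dwells[i] = actions[i + 1][0] - ts   # time elapsed before the next action
        else:
            dwells[i] = None                     # last click of the session
    return dwells

def relevance_grade(dwell):
    """None (last click) or a long dwell => 2, medium dwell => 1, short => 0.
    The 50 / 400 thresholds are the assumed challenge convention."""
    if dwell is None or dwell >= 400:
        return 2
    if dwell >= 50:
        return 1
    return 0
```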
GOOD, WE HAVE A MEASURE OF RELEVANCE!
CAN WE GET AN OVERALL SCORE FOR OUR SEARCH ENGINE 
NOW?
Emphasis on relevant 
documents 
Discount per ranking 
Discounted Cumulative Gain
Normalized Discounted Cumulative Gain
Just normalize between 0 and 1
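Putting those ingredients together (a gain that emphasizes relevant documents, a logarithmic discount per rank, and normalization by the ideal ordering):

```latex
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i+1)},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k} \in [0, 1]
```

where rel_i is the relevance grade of the result at rank i and IDCG@k is the DCG of the ideal (relevance-sorted) ordering. Whether the gain is 2^rel_i - 1 or simply rel_i is a matter of convention; the discount and the normalization are what the slide describes.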
PERSONALIZED RERANKING 
IS ABOUT REORDERING THE N-BEST RESULTS BASED ON 
THE USER'S PAST SEARCH HISTORY
Results Obtained in the contest: 
Original NDCG: 0.79056
Re-ranked NDCG: 0.80714
Equivalent to
~ raising the rank of a relevant (relevance = 2) result
from Rank #6 to Rank #5 on every query
~ raising the rank of a relevant (relevance = 2) result
from Rank #6 to Rank #2 in 20% of the queries
No researcher. 
No experience in reranking. 
Not much experience in ML for most of us. 
Not exactly our job. No expectations. 
Kenji Lefevre 
37 
Algebraic Geometry 
Learning Python 
Christophe Bourguignat 
37 
Signal Processing Eng. 
Learning Scikit 
Mathieu Scordia 
24 
Data Scientist 
Paul Masurel 
33 
Soft. Engineer 
The Team
A-Team?
Data Hobbits
Understanding 
The Problem
53% OF THE COMPETITORS 
COULD NOT IMPROVE THE BASELINE 
Worse 
53% 
Better 
47%
IDEAL SETUP 
1. compute non-personalized rank 
2. select the 10 best hits and serve them in order
3. re-rank using log analysis. 
4. put new ranking algorithm in prod (yeah right!) 
5. compute NDCG on new logs 
6. … 
7. Profits !!
REAL SETUP 
1. compute non-personalized rank 
2. select the 10 best hits
3. serve the 10 best hits ranked in random order
4. re-rank using log analysis, including non-personalized 
rank as a feature 
5. compute score against the log with the 
former rank 
PROBLEM 
Users tend to click on the first few URLs.
User satisfaction metric is influenced by the display rank. 
Our score is not aligned with our goal. 
We cannot separate the effect of the non-personalized rank signal
from the effect of the display rank.
THIS PROMOTES AN
OVER-CONSERVATIVE RE-RANKING POLICY
Even if we knew for sure that the URL at rank 9 would be clicked by the user if it were presented at
rank 1, it would probably be a bad idea to rerank it to rank 1 in this contest.
(Chart: average, per session, of the maximum position jump.)
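That diagnostic can be computed directly from original vs. re-ranked positions; a minimal sketch under an assumed data layout (not the team's code):

```python
def avg_max_position_jump(sessions):
    """sessions: iterable of (original_rank, reranked_rank) dict pairs, both keyed
    by URL with ranks 1..10. Returns the average, over sessions, of the largest
    upward move (original rank minus new rank) made by any URL."""
    jumps = []
    for original, reranked in sessions:
        best = max(original[url] - reranked[url] for url in original)
        jumps.append(max(best, 0))       # a session with no upward move counts as 0
    return sum(jumps) / len(jumps) if jumps else 0.0

# Example: one session where the URL at rank 9 was promoted to rank 4 (jump of 5).
example = [({"a": 1, "b": 9}, {"a": 2, "b": 4})]
print(avg_max_position_jump(example))    # 5.0
```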
Simple, point-wise approach
(Diagram: relevance grades 2, 1, 0 assigned to the URLs of Session 1, Session 2, ....)
For each (URL, session), predict a relevance grade (0, 1, or 2).
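A minimal sketch of this point-wise formulation with scikit-learn, on placeholder data; the 30 features and 24 trees echo numbers quoted later in the deck, everything else (names, shapes, data) is illustrative, not the team's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder feature matrix: one row per (URL, session) pair, 0/1/2 grade as target.
X = rng.normal(size=(1000, 30))          # 30 features, echoing the final point-wise model
y = rng.integers(0, 3, size=1000)

clf = RandomForestClassifier(n_estimators=24, n_jobs=-1, random_state=0)  # 24 trees
clf.fit(X, y)

# Rerank one session's 10 candidate URLs by expected relevance,
# E[rel] = 1 * P(rel=1) + 2 * P(rel=2); predict_proba columns follow clf.classes_.
session_X = rng.normal(size=(10, 30))
proba = clf.predict_proba(session_X)
expected_rel = proba[:, 1] + 2 * proba[:, 2]
new_order = np.argsort(-expected_rel)    # candidate indices, best first
```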
Supervised Learning on History 
We split the 27 days of the train dataset into 24 days (history) + 3 days (annotated).
Stop randomly within the last 3 days at a "test" session (as Yandex did).
(Timeline: Train Set (24 days, history) | Train Set (annotation) | Test Set)
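A minimal sketch of that split under an assumed flat schema (one row per session with a user id and a 1-27 day index; column names are illustrative):

```python
import pandas as pd

# Illustrative schema: one row per session, with a user id and a day index in 1..27.
sessions = pd.DataFrame({
    "session_id": range(8),
    "user_id":    [1, 1, 1, 2, 2, 2, 2, 1],
    "day":        [3, 10, 26, 5, 18, 25, 27, 24],
})

history   = sessions[sessions["day"] <= 24]   # 24 days: used only to build features
annotated = sessions[sessions["day"] > 24]    # last 3 days: labelled sessions

# Mirror Yandex's construction: keep one randomly chosen "test"-like session
# per user inside the annotated window.
test_like = annotated.groupby("user_id").sample(n=1, random_state=0)
```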
How They Did It
Feature construction:
Split train & validation
Team members work independently
FEATURES 
The Existing Rank (base rank) 
Revisits (Query-(User)-URL) features and variants 
Query Features 
Cumulative Features 
User Click Habits Features 
Collaborative Filtering Features 
Seasonality Features
REVISITS 
In the past, when the user was shown this URL for the exact same query,
what is the probability that:
• satisfaction = 2
• satisfaction = 1
• satisfaction = 0
• miss (not clicked)
• skipped (after the last click)
5 conditional probability features
1 overall display counter
4 mean reciprocal rank features
(a kind of harmonic mean of the rank)
1 snippet quality score
(a twisted formula used to compute
snippet quality)
= 11 base features
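A hedged sketch of the conditional-probability part, using additive smoothing (the technique behind the blog post in the References); the outcome categories come from the slide, while alpha and the function shape are illustrative:

```python
from collections import Counter

OUTCOMES = ["sat2", "sat1", "sat0", "miss", "skip"]

def revisit_probabilities(past_outcomes, alpha=1.0):
    """past_outcomes: one outcome string per past display of this URL to this user
    for this exact query. Returns additively smoothed P(outcome) for each outcome."""
    counts = Counter(past_outcomes)
    n, k = len(past_outcomes), len(OUTCOMES)
    return {o: (counts[o] + alpha) / (n + alpha * k) for o in OUTCOMES}

# Example: displayed 3 times; clicked with satisfaction 2 twice, skipped once.
features = revisit_probabilities(["sat2", "sat2", "skip"])
```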
MANY VARIATIONS 
• (in the past | within the same session)
• (with this very query | whatever query | a subquery | a super query)
• and was offered (this URL | this domain)
× 2
× 3
× 2
= 12 variants
With the same user.
Regardless of the user (URL-query features):
• same domain
• same URL
• same query and same URL
= 3 variants
15 variants
× 11 base features
= 165 features
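As a quick check of the arithmetic, using the multipliers shown on the slide:

```python
# Quick check of the feature count, using the slide's own multipliers.
same_user_variants     = 2 * 3 * 2   # time scope x query scope x (URL | domain)
user_agnostic_variants = 3           # same domain / same URL / same query and URL
base_features          = 11
total = (same_user_variants + user_agnostic_variants) * base_features
print(total)                         # 165
```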
30 days of labelled data
Feature construction:
team members work independently
Learning:
team members work independently
Split train & validation
> 200 potential features
on 30 days
Short Story 
Point-wise, Random Forest, 30 features, 4th place (*)
Optimized & trained in ~1 hour (12 cores), 24 trees
List-wise, LambdaMART, 90 features, 1st place (*)
Trained in 2 days, 1,135 trees
(*) A Yandex "PaceMaker" team also posted results on the leaderboard and sat in first place
for the whole competition, even though it was not an official contestant.
LambdaMART
(Diagram: original ranking with 13 pairwise errors vs. re-ranked list with 11 errors;
high-quality vs. low-quality hits; RankNet gradient vs. LambdaRank "gradient".)
Gradient boosted trees
with a special gradient called
"LambdaRank"
From RankNet to LambdaRank to LambdaMART: An Overview 
Christopher J.C. Burges - Microsoft Research Technical Report MSR-TR-2010-82
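The team trained LambdaMART with RankLib (see References). Purely as an illustration of the list-wise setup, here is a hedged sketch using LightGBM's lambdarank objective, a present-day equivalent, on placeholder data grouped by session; the 1,135 trees echo the slide, the learning rate and data are assumptions.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)

# Placeholder data: 200 sessions x 10 candidate URLs, 90 features, 0/1/2 labels.
n_groups, n_per_group, n_features = 200, 10, 90
X = rng.normal(size=(n_groups * n_per_group, n_features))
y = rng.integers(0, 3, size=n_groups * n_per_group)
group = [n_per_group] * n_groups            # group sizes, in row order

# 1,135 trees echoes the slide; the learning rate is an assumption.
ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=1135, learning_rate=0.05)
ranker.fit(X, y, group=group)

# Rerank one session's candidates by the list-wise score.
scores = ranker.predict(X[:n_per_group])
order = np.argsort(-scores)
```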
Grid Search 
We are not doing typical classification here. It is extremely important to perform grid
search directly against the final NDCG score.
NDCG "conservatism" ends up favoring a large "min samples per leaf"
(between 40 and 80).
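A hedged sketch of what grid searching directly against NDCG can look like today, using scikit-learn's ndcg_score on a point-wise model; all data, shapes, and parameter ranges are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(0)
n_sessions, n_per_session, n_features = 200, 10, 30

# Placeholder data: rows grouped in blocks of 10 (one block per session).
X = rng.normal(size=(n_sessions * n_per_session, n_features))
y = rng.integers(0, 3, size=n_sessions * n_per_session)
split = (n_sessions // 2) * n_per_session
X_tr, y_tr, X_va, y_va = X[:split], y[:split], X[split:], y[split:]

best = None
for min_leaf in [20, 40, 60, 80]:           # the slide reports 40-80 as the sweet spot
    clf = RandomForestClassifier(n_estimators=24, min_samples_leaf=min_leaf,
                                 n_jobs=-1, random_state=0).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_va)
    score = proba[:, 1] + 2 * proba[:, 2]   # expected relevance per URL
    # ndcg_score wants one row per session: shape (n_sessions, n_urls).
    ndcg = ndcg_score(y_va.reshape(-1, n_per_session),
                      score.reshape(-1, n_per_session), k=10)
    if best is None or ndcg > best[0]:
        best = (ndcg, min_leaf)
```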
Feature Selection 
Top-down approach: starting from
a high number of features,
iteratively remove subsets of
features. This approach led to the
subset of 90 features used in the
winning LambdaMART solution.
(A similar strategy is now implemented in
sklearn.feature_selection.RFECV.)
Bottom-up approach: starting from a low
number of features, add the feature that
produces the best marginal improvement.
This gave the 30 features that led to the best
point-wise solution. (Both directions are sketched below.)
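A hedged sketch of both directions with scikit-learn; data, feature counts, and the accuracy-based scoring are placeholders (in practice the team scored against NDCG, as on the grid-search slide).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))              # placeholder: 20 candidate features
y = rng.integers(0, 3, size=500)

# Top-down: recursively drop the least important features, cross-validated.
rfecv = RFECV(RandomForestClassifier(n_estimators=24, random_state=0), step=1, cv=3)
rfecv.fit(X, y)
kept = np.flatnonzero(rfecv.support_)

# Bottom-up: greedily add the feature with the best marginal CV improvement
# (scored here with default accuracy; swap in a session-level NDCG scorer in practice).
selected, remaining = [], list(range(X.shape[1]))
for _ in range(5):                          # grow the subset to 5 features in this sketch
    scored = [(cross_val_score(RandomForestClassifier(n_estimators=24, random_state=0),
                               X[:, selected + [f]], y, cv=3).mean(), f)
              for f in remaining]
    _, best_f = max(scored)
    selected.append(best_f)
    remaining.remove(best_f)
```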
Take Away 
Set up a valid and solid cross-validation scheme
Prototype with fast ML methods, optimize with boosting
Be systematic about feature selection
Set up reproducible workflows early on
Split tasks when running as a team
Special Offer 
We offer a free server (with DSS) to
teams competing in Kaggle
competitions.
Conditions:
- Be at least 3 people
- Up to 3 teams max. sponsored
per competition
competitions@dataiku.com 
Florian DOUETTEAU 
florian.douetteau@dataiku.com
References 
RankLib (implementation of LambdaMART)
http://sourceforge.net/p/lemur/wiki/RankLib/
These slides
http://www.slideshare.net/Dataiku
Blog post about additive smoothing
http://fumicoton.com/posts/bayesian_rating
Blog posts about the solution
http://www.dataiku.com/blog/2014/01/14/winning-kaggle.html
http://blog.kaggle.com/2014/02/06/winning-personalized-web-search-team-dataiku/
Contest URL
https://www.kaggle.com/c/yandex-personalized-web-search-challenge
Paper with detailed description
http://research.microsoft.com/en-us/um/people/nickcr/wscd2014/papers/wscdchallenge2014dataiku.pdf
Research papers
From RankNet to LambdaRank to LambdaMART: An Overview. Christopher J.C. Burges. Microsoft Research Technical Report MSR-TR-2010-82.
McRank: Learning to Rank Using Multiple Classification and Gradient Boosting. P. Li, C. J. C. Burges, and Q. Wu. In NIPS, 2007.

Florian Douetteau @ Dataiku
