write your own data story!
Personalized Web Search 
Fri 11 Oct 2013 – Fri 10 Jan 2014 
194 Teams 
$9,000 cash prize 
Using Historical Logs of a search engine 
QUERIES 
RESULTS 
CLICKS 
and a set of new QUERIES and RESULTS 
rerank the RESULTS in order to optimize relevance 
34,573,630 Sessions with user id 
21,073,569 Queries 
64,693,054 Clicks 
~ 15GB
A METRIC FOR RELEVANCE RIGHT FROM THE LOG? 
ASSUMING WE SEARCH FOR "FRENCH NEWSPAPER", WE TAKE 
A LOOK AT THE LOGS.
DWELL TIME 
WE COMPUTE THE SO-CALLED DWELL TIME OF A CLICK,
I.E. THE TIME ELAPSED BEFORE THE NEXT ACTION.
DWELL TIME HAS BEEN SHOWN TO BE CORRELATED WITH
RELEVANCE.
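For illustration, here is a minimal sketch of turning a session's action log into dwell times and 0/1/2 relevance grades. The 50 / 400 time-unit thresholds and the "last click of the session counts as highly relevant" rule follow the challenge's grading convention, but treat the exact cut-offs and the log schema here as assumptions.

```python
def dwell_times(actions):
    """actions: time-ordered list of (timestamp, kind, url) tuples for one session,
    where kind is 'query' or 'click'. Returns {click_index: dwell_or_None}."""
    dwells = {}
    for i, (ts, kind, url) in enumerate(actions):
        if kind != "click":
            continue
        if i + 1 < len(actions):
            dwells[i] = actions[i + 1][0] - ts   # time elapsed before the next action
        else:
            dwells[i] = None                     # last click of the session
    return dwells

def relevance_grade(dwell):
    """None (last click) or a long dwell => 2, medium dwell => 1, short => 0.
    The 50 / 400 thresholds are the assumed challenge convention."""
    if dwell is None or dwell >= 400:
        return 2
    if dwell >= 50:
        return 1
    return 0
```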
GOOD, WE HAVE A MEASURE OF RELEVANCE!
CAN WE GET AN OVERALL SCORE FOR OUR SEARCH ENGINE 
NOW?
Emphasis on relevant 
documents 
Discount per ranking 
Discounted Cumulative Gain
Normalized Discounted Cumulative Gain
Just normalize between 0 and 1
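Putting those ingredients together (a gain that emphasizes relevant documents, a logarithmic discount per rank, and normalization by the ideal ordering):

```latex
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i+1)},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k} \in [0, 1]
```

where rel_i is the relevance grade of the result at rank i and IDCG@k is the DCG of the ideal (relevance-sorted) ordering. Whether the gain is 2^rel_i - 1 or simply rel_i is a matter of convention; the discount and the normalization are what the slide describes.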
PERSONALIZED RERANKING 
IS ABOUT REORDERING THE N-BEST RESULTS BASED ON 
THE USER'S PAST SEARCH HISTORY
Results Obtained in the contest: 
Original NDCG: 0.79056
Re-ranked NDCG: 0.80714
Equivalent to
~ raising the rank of a relevant (relevance = 2) result
from Rank #6 to Rank #5 on every query
~ raising the rank of a relevant (relevance = 2) result
from Rank #6 to Rank #2 in 20% of the queries
No researcher. 
No experience in reranking. 
Not much experience in ML for most of us. 
Not exactly our job. No expectations. 
Kenji Lefevre 
37 
Algebraic Geometry 
Learning Python 
Christophe Bourguignat 
37 
Signal Processing Eng. 
Learning Scikit 
Mathieu Scordia 
24 
Data Scientist 
Paul Masurel 
33 
Soft. Engineer 
The Team
A-Team?
Data Hobbits
Understanding 
The Problem
53% OF THE COMPETITORS 
COULD NOT IMPROVE THE BASELINE 
Worse 
53% 
Better 
47%
IDEAL SETUP 
1. compute non-personalized rank 
2. select the 10 best hits and serve them in order
3. re-rank using log analysis. 
4. put new ranking algorithm in prod (yeah right!) 
5. compute NDCG on new logs 
6. … 
7. Profits !!
REAL SETUP 
1. compute non-personalized rank 
2. select the 10 best hits
3. serve the 10 best hits ranked in random order
4. re-rank using log analysis, including non-personalized 
rank as a feature 
5. compute score against the log with the 
former rank 
PROBLEM 
Users tend to click on the first few URLs.
User satisfaction metric is influenced by the display rank. 
Our score is not aligned with our goal. 
We cannot separate the effect of the non-personalized rank signal
from the effect of the display rank.
THIS PROMOTES AN
OVER-CONSERVATIVE RE-RANKING POLICY
Even if we knew for sure that the URL at rank 9 would be clicked by the user if it were presented at
rank 1, it would probably be a bad idea to rerank it to rank 1 in this contest.
(Chart: average, per session, of the maximum position jump.)
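That diagnostic can be computed directly from original vs. re-ranked positions; a minimal sketch under an assumed data layout (not the team's code):

```python
def avg_max_position_jump(sessions):
    """sessions: iterable of (original_rank, reranked_rank) dict pairs, both keyed
    by URL with ranks 1..10. Returns the average, over sessions, of the largest
    upward move (original rank minus new rank) made by any URL."""
    jumps = []
    for original, reranked in sessions:
        best = max(original[url] - reranked[url] for url in original)
        jumps.append(max(best, 0))       # a session with no upward move counts as 0
    return sum(jumps) / len(jumps) if jumps else 0.0

# Example: one session where the URL at rank 9 was promoted to rank 4 (jump of 5).
example = [({"a": 1, "b": 9}, {"a": 2, "b": 4})]
print(avg_max_position_jump(example))    # 5.0
```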
Simple, point-wise approach
(Diagram: relevance grades 2, 1, 0 assigned to the URLs of Session 1, Session 2, ....)
For each (URL, session), predict a relevance grade (0, 1, or 2).
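A minimal sketch of this point-wise formulation with scikit-learn, on placeholder data; the 30 features and 24 trees echo numbers quoted later in the deck, everything else (names, shapes, data) is illustrative, not the team's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder feature matrix: one row per (URL, session) pair, 0/1/2 grade as target.
X = rng.normal(size=(1000, 30))          # 30 features, echoing the final point-wise model
y = rng.integers(0, 3, size=1000)

clf = RandomForestClassifier(n_estimators=24, n_jobs=-1, random_state=0)  # 24 trees
clf.fit(X, y)

# Rerank one session's 10 candidate URLs by expected relevance,
# E[rel] = 1 * P(rel=1) + 2 * P(rel=2); predict_proba columns follow clf.classes_.
session_X = rng.normal(size=(10, 30))
proba = clf.predict_proba(session_X)
expected_rel = proba[:, 1] + 2 * proba[:, 2]
new_order = np.argsort(-expected_rel)    # candidate indices, best first
```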
Supervised Learning on History 
We split the 27 days of the train dataset into 24 days (history) + 3 days (annotated).
Stop randomly within the last 3 days at a "test" session (as Yandex did).
(Timeline: Train Set (24 days, history) | Train Set (annotation) | Test Set)
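A minimal sketch of that split under an assumed flat schema (one row per session with a user id and a 1-27 day index; column names are illustrative):

```python
import pandas as pd

# Illustrative schema: one row per session, with a user id and a day index in 1..27.
sessions = pd.DataFrame({
    "session_id": range(8),
    "user_id":    [1, 1, 1, 2, 2, 2, 2, 1],
    "day":        [3, 10, 26, 5, 18, 25, 27, 24],
})

history   = sessions[sessions["day"] <= 24]   # 24 days: used only to build features
annotated = sessions[sessions["day"] > 24]    # last 3 days: labelled sessions

# Mirror Yandex's construction: keep one randomly chosen "test"-like session
# per user inside the annotated window.
test_like = annotated.groupby("user_id").sample(n=1, random_state=0)
```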
How They Did It
Feature construction:
Split train & validation
Team members work independently
FEATURES 
The Existing Rank (base rank) 
Revisits (Query-(User)-URL) features and variants 
Query Features 
Cumulative Features 
User Click Habits Features 
Collaborative Filtering Features 
Seasonality Features
REVISITS 
In the past, when the user was shown this URL for the exact same query,
what is the probability that:
• satisfaction = 2
• satisfaction = 1
• satisfaction = 0
• miss (not clicked)
• skipped (after the last click)
5 conditional probability features
1 overall display counter
4 mean reciprocal rank features
(a kind of harmonic mean of the rank)
1 snippet quality score
(a twisted formula used to compute
snippet quality)
= 11 base features
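A hedged sketch of the conditional-probability part, using additive smoothing (the technique behind the blog post in the References); the outcome categories come from the slide, while alpha and the function shape are illustrative:

```python
from collections import Counter

OUTCOMES = ["sat2", "sat1", "sat0", "miss", "skip"]

def revisit_probabilities(past_outcomes, alpha=1.0):
    """past_outcomes: one outcome string per past display of this URL to this user
    for this exact query. Returns additively smoothed P(outcome) for each outcome."""
    counts = Counter(past_outcomes)
    n, k = len(past_outcomes), len(OUTCOMES)
    return {o: (counts[o] + alpha) / (n + alpha * k) for o in OUTCOMES}

# Example: displayed 3 times; clicked with satisfaction 2 twice, skipped once.
features = revisit_probabilities(["sat2", "sat2", "skip"])
```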
MANY VARIATIONS 
• (in the past | within the same session)
• (with this very query | whatever query | a subquery | a super query)
• and was offered (this URL | this domain)
× 2
× 3
× 2
= 12 variants
With the same user.
Regardless of the user (URL-query features):
• same domain
• same URL
• same query and same URL
= 3 variants
15 variants
× 11 base features
= 165 features
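As a quick check of the arithmetic, using the multipliers shown on the slide:

```python
# Quick check of the feature count, using the slide's own multipliers.
same_user_variants     = 2 * 3 * 2   # time scope x query scope x (URL | domain)
user_agnostic_variants = 3           # same domain / same URL / same query and URL
base_features          = 11
total = (same_user_variants + user_agnostic_variants) * base_features
print(total)                         # 165
```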
30 days of labelled data
Feature construction:
team members work independently
Learning:
team members work independently
Split train & validation
> 200 potential features
on 30 days
Short Story 
Point-wise, Random Forest, 30 features, 4th place (*)
Optimized & trained in ~1 hour (12 cores), 24 trees
List-wise, LambdaMART, 90 features, 1st place (*)
Trained in 2 days, 1,135 trees
(*) A Yandex "PaceMaker" team also posted results on the leaderboard and sat in first place
for the whole competition, even though it was not an official contestant.
LambdaMART
(Diagram: original ranking with 13 pairwise errors vs. re-ranked list with 11 errors;
high-quality vs. low-quality hits; RankNet gradient vs. LambdaRank "gradient".)
Gradient boosted trees
with a special gradient called
"LambdaRank"
From RankNet to LambdaRank to LambdaMART: An Overview 
Christopher J.C. Burges - Microsoft Research Technical Report MSR-TR-2010-82
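The team trained LambdaMART with RankLib (see References). Purely as an illustration of the list-wise setup, here is a hedged sketch using LightGBM's lambdarank objective, a present-day equivalent, on placeholder data grouped by session; the 1,135 trees echo the slide, the learning rate and data are assumptions.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)

# Placeholder data: 200 sessions x 10 candidate URLs, 90 features, 0/1/2 labels.
n_groups, n_per_group, n_features = 200, 10, 90
X = rng.normal(size=(n_groups * n_per_group, n_features))
y = rng.integers(0, 3, size=n_groups * n_per_group)
group = [n_per_group] * n_groups            # group sizes, in row order

# 1,135 trees echoes the slide; the learning rate is an assumption.
ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=1135, learning_rate=0.05)
ranker.fit(X, y, group=group)

# Rerank one session's candidates by the list-wise score.
scores = ranker.predict(X[:n_per_group])
order = np.argsort(-scores)
```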
Grid Search 
We are not doing typical classification here. It is extremely important to perform grid
search directly against the final NDCG score.
NDCG "conservatism" ends up favoring a large "min samples per leaf"
(between 40 and 80).
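A hedged sketch of what grid searching directly against NDCG can look like today, using scikit-learn's ndcg_score on a point-wise model; all data, shapes, and parameter ranges are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(0)
n_sessions, n_per_session, n_features = 200, 10, 30

# Placeholder data: rows grouped in blocks of 10 (one block per session).
X = rng.normal(size=(n_sessions * n_per_session, n_features))
y = rng.integers(0, 3, size=n_sessions * n_per_session)
split = (n_sessions // 2) * n_per_session
X_tr, y_tr, X_va, y_va = X[:split], y[:split], X[split:], y[split:]

best = None
for min_leaf in [20, 40, 60, 80]:           # the slide reports 40-80 as the sweet spot
    clf = RandomForestClassifier(n_estimators=24, min_samples_leaf=min_leaf,
                                 n_jobs=-1, random_state=0).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_va)
    score = proba[:, 1] + 2 * proba[:, 2]   # expected relevance per URL
    # ndcg_score wants one row per session: shape (n_sessions, n_urls).
    ndcg = ndcg_score(y_va.reshape(-1, n_per_session),
                      score.reshape(-1, n_per_session), k=10)
    if best is None or ndcg > best[0]:
        best = (ndcg, min_leaf)
```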
Feature Selection 
Top-down approach: starting from
a high number of features,
iteratively remove subsets of
features. This approach led to the
subset of 90 features used in the
winning LambdaMART solution.
(A similar strategy is now implemented in
sklearn.feature_selection.RFECV.)
Bottom-up approach: starting from a low
number of features, add the feature that
produces the best marginal improvement.
This gave the 30 features that led to the best
point-wise solution. (Both directions are sketched below.)
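A hedged sketch of both directions with scikit-learn; data, feature counts, and the accuracy-based scoring are placeholders (in practice the team scored against NDCG, as on the grid-search slide).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))              # placeholder: 20 candidate features
y = rng.integers(0, 3, size=500)

# Top-down: recursively drop the least important features, cross-validated.
rfecv = RFECV(RandomForestClassifier(n_estimators=24, random_state=0), step=1, cv=3)
rfecv.fit(X, y)
kept = np.flatnonzero(rfecv.support_)

# Bottom-up: greedily add the feature with the best marginal CV improvement
# (scored here with default accuracy; swap in a session-level NDCG scorer in practice).
selected, remaining = [], list(range(X.shape[1]))
for _ in range(5):                          # grow the subset to 5 features in this sketch
    scored = [(cross_val_score(RandomForestClassifier(n_estimators=24, random_state=0),
                               X[:, selected + [f]], y, cv=3).mean(), f)
              for f in remaining]
    _, best_f = max(scored)
    selected.append(best_f)
    remaining.remove(best_f)
```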
Take Away 
Set up a valid and solid cross-validation scheme
Prototype with fast ML methods, optimize with boosting
Be systematic about feature selection
Set up reproducible workflows early on
Split tasks when running as a team
Special Offer 
We offer a free server (with DSS) to
teams competing in Kaggle
competitions.
Conditions:
- Be at least 3 people
- Up to 3 teams max. sponsored
per competition
competitions@dataiku.com 
Florian DOUETTEAU 
florian.douetteau@dataiku.com
References 
RankLib (implementation of LambdaMART)
http://sourceforge.net/p/lemur/wiki/RankLib/
These slides
http://www.slideshare.net/Dataiku
Blog post about additive smoothing
http://fumicoton.com/posts/bayesian_rating
Blog posts about the solution
http://www.dataiku.com/blog/2014/01/14/winning-kaggle.html
http://blog.kaggle.com/2014/02/06/winning-personalized-web-search-team-dataiku/
Contest URL
https://www.kaggle.com/c/yandex-personalized-web-search-challenge
Paper with detailed description
http://research.microsoft.com/en-us/um/people/nickcr/wscd2014/papers/wscdchallenge2014dataiku.pdf
Research papers
From RankNet to LambdaRank to LambdaMART: An Overview. Christopher J.C. Burges. Microsoft Research Technical Report MSR-TR-2010-82.
McRank: Learning to Rank Using Multiple Classification and Gradient Boosting. P. Li, C. J. C. Burges, and Q. Wu. In NIPS, 2007.

Florian Douetteau @ Dataiku
