Combining machine learning and search through learning to rank
Jettro Coenradie
I talk about (Elastic) Search
@jettroCoenradie
https://github.com/jettro
https://www.linkedin.com/in/jettro/
Search and Ranking in eCommerce
Order Matters
Changing order influences sales
DEMO
How do we get from ‘U2’ to a list of albums?
Meet Elasticsearch
Content Pipeline
[Diagram: documents are analysed into an inverted index in the Elasticsearch document store; an incoming query is analysed the same way and matched against that index]
Recap: Inverted Index
Term       doc_ids  ttf
against    5        1
band       6        1
beatles    3        1
machine    5        1
radiohead  4        1
rage       5        1
rolling    2        1
stones     2        1
the        5,6      2
u2         1        1

Doc Id  Artist
1       U2
2       Rolling Stones
3       Beatles
4       Radiohead
5       Rage Against the Machine
6       The Band

(ttf = total term frequency)
{
  "album": "War",
  "image": "rs-137178-883f5fe955b2745cd539.jpg",
  "information": "<p>U2 were on the cusp of becoming one of the Eighties&apos; most important groups when their third album came out. It&apos;s the band&apos;s most overtly political album, with songs about Poland&apos;s Solidarity movement (&quot;New Year&apos;s Day&quot;) and Irish unrest (&quot;Sunday Bloody Sunday&quot;) charged with explosive, passionate guitar rock.</p>",
  "sequence": 223,
  "order": 279,
  "label": "Island",
  "artist": "U2",
  "year": "1983",
  "id": 350640
}
curl -XGET "http://localhost:9200/rolling500/_search" \
  -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "u2",
      "fields": [
        "album.analyzed",
        "artist.analyzed",
        "information"
      ]
    }
  }
}'
Matching: the query is run against the fields album.analyzed, artist.analyzed and information, returning the matching documents 1, 3, 20, 211 and 410.
Ranking: the same documents are then ordered by score, e.g. 410, 211, 1, 20, 3.
Ranking with BM25
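For reference, BM25 (the default similarity in Elasticsearch since version 5) scores a document D for a query Q roughly as:

\mathrm{score}(D, Q) = \sum_{q_i \in Q} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

where f(q_i, D) is the frequency of term q_i in D, |D| the document length, avgdl the average document length, and k_1 and b tunable parameters (1.2 and 0.75 by default in Elasticsearch).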
Change the ranking of results
• Add boosting to fields

• Add boosting per document using other signals

• Add decay using other fields (an example query combining
these follows below)
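A sketch of such a query, not taken from the talk: it combines field boosts (the ^ syntax) with a gauss decay, and assumes the demo's rolling500 index with year mapped as a date or numeric field (the sample document stores it as a string, so the mapping would need changing):

GET rolling500/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "u2",
          "fields": ["artist.analyzed^3", "album.analyzed^2", "information"]
        }
      },
      "functions": [
        {
          "gauss": {
            "year": { "origin": "2000", "scale": "25" }
          }
        }
      ]
    }
  }
}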
Re-ranking
Learning To Rank
https://github.com/o19s/elasticsearch-learning-to-rank
Learning to rank
Y = f(X)

X - the set of queries

Y - the ideal ordered set of documents per query

f(X) - the model, with parameters learned by an algorithm
Learning to rank
[Diagram: queries (X) go into the model f(X), which returns an ordered list of document ids (Y) per query, e.g. query 1 → 5 15 67 3 17]
Model Evaluation Types
• Binary relevance (MAP, Precision)

• Graded relevance, position based (DCG, NDCG)

• Graded relevance, cascade based (ERR)
http://bit.ly/eval-metric
Learning to Rank approaches
• Pointwise - Calculate a score for each item independently and sort by that score

• Pairwise - Compare two items at a time: train a binary
classifier that predicts which of the two documents is the
better one

• Listwise - Optimise the complete list directly using one of
the evaluation measures, averaged over all queries

http://bit.ly/list-wise-approach
LTR: Steps to take
1. Create Judgement List (Ground Truth)

2. Define features for the model

3. Log features during usage

4. Train and test the model

5. Deploy and use the model
1. Judgement List
Obtain labelled data to train the model
Judgement List:
Expert Panel
• Time-consuming and expensive

• Error-prone due to differing judgements
http://bit.ly/ebay-hum-judge
Judgement List:
Implicit Feedback
• Log user behaviour

• A click is not a relevance judgement

• Compare actual clicks versus expected clicks
Click models
• Random Click Model -> Every document has the same
chance of being clicked

• Click Through Rate Model -> Uses the fact that the first
document is clicked far more than the second document

• Cascade Model -> A click in the third item also tells us
something about the first two items

• Dynamic Bayesian Network Model -> Uses the historical
clicks in a search session and the actual relevance of a
document.
http://bit.ly/click-model
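To make "actual versus expected clicks" concrete, a minimal sketch, not from the talk and with all names hypothetical, of turning a click log into graded judgements by correcting clicks for position bias, in the spirit of the Click Through Rate model:

from collections import defaultdict

# Click log entries: (query, doc_id, position, clicked)
log = [
    ("u2", 350967, 1, True),
    ("u2", 350897, 2, False),
    ("u2", 350696, 3, True),
    # ... many more sessions
]

# 1. Estimate the expected click-through rate per position (position bias).
shown = defaultdict(int)
clicked = defaultdict(int)
for _query, _doc, pos, was_clicked in log:
    shown[pos] += 1
    clicked[pos] += int(was_clicked)
position_ctr = {pos: clicked[pos] / shown[pos] for pos in shown}

# 2. Credit each click relative to what an average document at that
#    position receives: actual clicks versus expected clicks.
relevance = defaultdict(float)
impressions = defaultdict(int)
for query, doc, pos, was_clicked in log:
    impressions[(query, doc)] += 1
    if was_clicked and position_ctr[pos] > 0:
        relevance[(query, doc)] += 1.0 / position_ctr[pos]

# 3. Normalise per impression and bucket into 0-4 grades for the judgement list.
for key, score in relevance.items():
    grade = min(4, round(score / impressions[key]))
    print(key, grade)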
Judgement List:
Implicit Feedback
• Use as a signal to the ranking algorithm -> Feature

• Use as Label to train the model -> Ground truth
Using python scripts
from demo project
https://github.com/o19s/elasticsearch-learning-to-rank
Our Judgement List
# grade (0-4) queryid docId title
#
# qid:1: u2
# qid:2: metallica
# qid:3: beatles
#
# https://sourceforge.net/p/lemur/wiki/RankLib%20File%20Format/
#
4 qid:1 #350967 U2-Achtung Baby
4 qid:1 #350696 U2-All That You Can't Leave Behind
1 qid:1 #350897 Radiohead-The Bends
1 qid:1 #351105 Radiohead-Kid A
1 qid:1 #351029 The Beatles-Sgt. Pepper's Lonely Hearts Club Band
4 qid:2 #350672 Metallica-Metallica
4 qid:2 #350816 Metallica-Master of Puppets
0 qid:2 #351029 The Beatles-Sgt. Pepper's Lonely Hearts Club Band
4 qid:3 #350976 Beatles-Meet the Beatles!
1 qid:3 #350881 The Byrds-Younger Than Yesterday
2. Features for the model
• Elasticsearch queries

• Raw Term Statistics

• Document Frequency

• Total Term Frequency

• Also max, min, sum (in case of multiple terms, fields)
2. Features for the model
{
"query": {
"match": {
"artist.analyzed": "{{keywords}}"
}
}
}
Two other features are similar for fields “album.analyzed” and “information”
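For context, a sketch of how such a Mustache template is registered as a feature with the LTR plugin; the featureset name rolling_features is made up, and the endpoint and body shape are taken from the plugin's documentation as I recall it:

POST _ltr/_featureset/rolling_features
{
  "featureset": {
    "features": [
      {
        "name": "artist_match",
        "params": ["keywords"],
        "template_language": "mustache",
        "template": {
          "match": {
            "artist.analyzed": "{{keywords}}"
          }
        }
      }
    ]
  }
}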
{
"query": {
"nested": {
"path": "clicks",
"query": {
"function_score": {
"query": {
"term": {
"clicks.term": {
"value": "{{keywords}}"
}
}
},
"functions": [
{
"field_value_factor": {
"field": "clicks.clicks",
"modifier": "log1p"
}
}
]
}
}
}
}
}
3. Log features
4 qid:1 1:5.813487 2:0.0 3:4.599949 4:0.3804092 # 350967 u2
4 qid:1 1:5.813487 2:0.0 3:4.7994814 4:0.3804092 # 350696 u2
4 qid:1 1:5.813487 2:0.0 3:3.1217866 4:0.98334354 # 351003 u2
4 qid:1 1:5.813487 2:0.0 3:4.5399256 4:0.7608184 # 350640 u2
4 qid:1 1:5.813487 2:0.0 3:4.5399256 4:0.3804092 # 351098 u2
1 qid:1 1:0.0 2:0.0 3:4.7994814 4:0.0 # 350897 u2
1 qid:1 1:0.0 2:0.0 3:3.1217866 4:0.3804092 # 351105 u2
1 qid:1 1:0.0 2:0.0 3:0.5061711 4:0.0 # 351029 u2
4 qid:2 1:6.8280644 2:7.9855165 3:5.279917 4:1.3360937 # 350672 metallica
4 qid:2 1:6.8280644 2:0.0 3:5.513398 4:0.66804683 # 350816 metallica
0 qid:2 1:0.0 2:0.0 3:0.87671757 4:0.0 # 351029 metallica
4 qid:3 1:3.757609 2:5.796505 3:3.581085 4:0.2996537 # 350976 beatles
3 qid:3 1:0.0 2:0.0 3:4.0479565 4:0.0 # 351103 beatles
1 qid:3 1:0.0 2:0.0 3:4.0479565 4:0.0 # 350881 beatles
1 qid:3 1:0.0 2:0.0 3:4.0479565 4:0.0 # 351065 beatles
4 qid:3 1:3.757609 2:0.0 3:3.2117074 4:0.2996537 # 351016 beatles
4 qid:3 1:3.757609 2:0.0 3:2.2627318 4:0.2996537 # 351027 beatles
4 qid:3 1:3.757609 2:0.0 3:2.7668488 4:0.0 # 351025 beatles
4 qid:3 1:3.757609 2:0.0 3:3.581085 4:0.2996537 # 350991 beatles
4 qid:3 1:3.757609 2:0.0 3:2.1937072 4:0.2996537 # 351029 beatles
4 qid:3 1:3.757609 2:0.0 3:3.2462234 4:0.2996537 # 350760 beatles
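These lines come out of the plugin's feature logging. Roughly (a sketch; the exact field names per the plugin docs may differ), you replay each judged query/document pair with an sltr filter plus an ltr_log extension, which returns the computed feature values per hit:

POST rolling500/_search
{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "_id": ["350967", "350696", "350897"] } },
        {
          "sltr": {
            "_name": "logged_features",
            "featureset": "rolling_features",
            "params": { "keywords": "u2" }
          }
        }
      ]
    }
  },
  "ext": {
    "ltr_log": {
      "log_specs": {
        "name": "log_entry",
        "named_query": "logged_features"
      }
    }
  }
}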
4. Train and test the model
• Making use of Ranklib (a typical invocation follows below)

• The following algorithms are available:

• MART, RankNet, RankBoost, AdaRank, Coordinate Ascent,
LambdaMART, ListNet, Random Forests

• Supported evaluation metrics to optimise on training data

• MAP, NDCG@k, DCG@k, P@k, RR@k, ERR@k

• Can specify separate train, validation and test sets

• Can normalise feature sets
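A typical invocation (a sketch; the file names are hypothetical, substitute the actual jar version; ranker 6 is LambdaMART in RankLib's numbering):

java -jar RankLib-2.x.jar \
  -train judgements_with_features.txt \
  -ranker 6 \
  -metric2t NDCG@10 \
  -norm zscore \
  -save model_lambdamart.txt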
Models using Ranklib
MART
Multiple Additive Regression Trees, a gradient boosting machine. Can be
used for regression as well as classification.
RankNet
Compares two feature vectors using stochastic gradient descent with the
help of a cost function.
RankBoost
Based on AdaBoost, combining many weak rankings into a single highly
accurate ranking. A pairwise method.
AdaRank
Combines a number of weak learners in a linear way. Builds on AdaBoost,
but aimed more directly at ranking.
Coordinate Ascent
Optimises one parameter at a time, keeping the others constant, iterating
until some convergence criterion is met.
LambdaRank
An optimisation of RankNet that works only with the gradients (lambdas),
which indicate how far each document should move up or down the list.
LambdaMART
Combines the gradients of LambdaRank with the boosted trees of MART.
ListNet
Uses a listwise loss function, a neural network and gradient descent.
Similar to RankNet; the main difference is the listwise versus pairwise loss.
Random Forests
A number of trees vote for the most popular class for a feature vector. A
single tree is hardly better than a random choice, but a forest is.
Linear Regression
Usually too simplistic for a learning-to-rank problem with many features,
but good to have available to at least try.
Evaluation metrics
MAP Mean Average Precision: the average of all P@k
NDCG@k
Normalised DCG: DCG scaled to a value between 0 and 1 by dividing by the
highest achievable score.
DCG@k
Discounted Cumulative Gain: sums the relevance of all documents, discounted
by position, making the 1st document the most important.
P@k Precision: the fraction of relevant documents in the top k
RR@k Reciprocal Rank: 1/k where k is the position of the first relevant document
ERR@k
Expected Reciprocal Rank: discounts documents that appear below a very
relevant document. (Can be used for listwise comparison.)
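As a worked reference, using the exponential gain that RankLib applies:

\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)}
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}

where rel_i is the grade of the document at position i and IDCG@k is the DCG@k of the ideal ordering.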
Training results
                   MAP     NDCG@10  DCG@10   P@10    RR@10  ERR@10
MART               0.999   0.979    38.7047  0.9333  1.0    0.9618
RankNet            0.8806  0.8831   33.298   0.8933  0.9    0.5988
RankBoost          1.0     1.0      39.8699  0.9333  1.0    0.9629
AdaRank (List)     0.9493  0.8271   31.4686  0.8733  0.8    0.7597
Coordinate Ascent  0.9446  0.8973   36.3888  0.9133  1.0    0.96
LambdaRank         0.9562  0.7079   35.7277  0.8933  0.9    0.7598
LambdaMART         0.9725  0.9805   38.788   0.9333  1.0    0.9618
ListNet (List)     0.9496  0.8973   33.1995  0.8933  1.0    0.8229
Random Forests     0.999   0.979    38.7047  0.9333  1.0    0.9618
Linear Regression  0.9463  0.8602   34.1433  0.8933  0.9    0.7906
5. Deploy the model
• The model, including its learned parameters, is stored in
Elasticsearch.

• Using the plugin we can then re-rank the top-n results, as
sketched below
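A sketch of uploading the trained RankLib model so it can be referenced by name (test_6 matches the rescore query below; the rolling_features featureset is the hypothetical one from earlier, and the endpoint shape follows the plugin's documentation as I recall it):

POST _ltr/_featureset/rolling_features/_createmodel
{
  "model": {
    "name": "test_6",
    "model": {
      "type": "model/ranklib",
      "definition": "## LambdaMART\n## (RankLib model file contents go here)"
    }
  }
}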
GET rolling500/_search
{
  "query": {
    "multi_match": {
      "query": "rolling",
      "fields": ["album.analyzed", "artist.analyzed", "information"]
    }
  },
  "rescore": {
    "window_size": 1000,
    "query": {
      "rescore_query": {
        "sltr": {
          "params": { "keywords": "rolling" },
          "model": "test_6"
        }
      }
    }
  }
}
DEMO
