Combining machine
learning and search
through learning to rank
Jettro Coenradie
Jettro Coenradie
I talk about (Elastic) Search
@jettroCoenradie
https://github.com/jettro
https://www.linkedin.com/in/jettro/
Search and Ranking
in eCommerce
Order Matters
Changing order influences sales
DEMO
How do we get from
‘U2’ to a list of albums?
Meet Elasticsearch
[Diagram: content pipeline into Elasticsearch (analyse, inverted index, document store); a query is analysed and matched against the inverted index.]
Recap: Inverted Index
Term       doc_ids  ttf (total term frequency)
against    5        1
band       6        1
beatles    3        1
machine    5        1
radiohead  4        1
rage       5        1
rolling    2        1
stones     2        1
the        5,6      2
u2         1        1

Doc Id  Artist
1       U2
2       Rolling Stones
3       Beatles
4       Radiohead
5       Rage against the machine
6       The Band
{
"album": "War",
"image": "rs-137178-883f5fe955b2745cd539.jpg",
"information": "<p>U2 were on the cusp of becoming one
of the Eighties&apos; most important groups when their
third album came out. It&apos;s the band&apos;s most
overtly political album, with songs about Poland&apos;s
Solidarity movement (&quot;New Year&apos;s Day&quot;)
and Irish unrest (&quot;Sunday Bloody Sunday&quot;)
charged with explosive, passionate guitar rock.</p>",
"sequence": 223,
"order": 279,
"label": "Island",
"artist": "U2",
"year": "1983",
"id": 350640
}
curl -XGET "http://localhost:9200/rolling500/_search" \
-H 'Content-Type: application/json' -d'
{
"query": {
"multi_match": {
"query": "u2",
"fields": [
"album.analyzed",
"artist.analyzed",
"information"
]
}
}
}'
[Diagram: the query is matched against the fields album.analyzed, artist.analyzed and information. Matching finds documents 1, 3, 20, 211 and 410; ranking then orders them 410, 211, 1, 20, 3.]
Ranking with BM25
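BM25 is the default similarity in Elasticsearch. To show how a single term contributes to a score, here is a minimal sketch of the classic BM25 formula with the usual defaults k1 = 1.2 and b = 0.75, run over the artist field from the inverted-index example. The function names are ours, and this is not Lucene's code: recent Lucene versions drop the constant (k1 + 1) factor (which changes the scores but not their order) and store field lengths approximately.

```python
import math
from typing import Dict, List


def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    doc_freq: int, num_docs: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Classic BM25 contribution of a single query term to one document."""
    if tf == 0:
        return 0.0
    # Rare terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly longer fields are penalised.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # k1 makes repeated occurrences of a term saturate instead of growing linearly.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


def bm25_score(query: List[str], doc: List[str], avg_doc_len: float,
               doc_freqs: Dict[str, int], num_docs: int) -> float:
    """Sum of the per-term scores for a tokenised query and document."""
    return sum(bm25_term_score(doc.count(term), len(doc), avg_doc_len,
                               doc_freqs.get(term, 0), num_docs)
               for term in query)


# The artist field from the inverted-index example, already analysed into tokens.
artists = [["u2"], ["rolling", "stones"], ["beatles"], ["radiohead"],
           ["rage", "against", "the", "machine"], ["the", "band"]]
avg_len = sum(len(tokens) for tokens in artists) / len(artists)
doc_freqs: Dict[str, int] = {}
for tokens in artists:
    for term in set(tokens):
        doc_freqs[term] = doc_freqs.get(term, 0) + 1

for tokens in artists:
    score = bm25_score(["the"], tokens, avg_len, doc_freqs, len(artists))
    print(" ".join(tokens), round(score, 3))
```

For the query "the", "the band" outscores "rage against the machine": both contain the term once, but the shorter field receives a smaller length penalty.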
Change Ranking of results
• Add boosting to fields

• Boost documents using other signals (e.g. popularity or clicks)

• Add decay functions on other fields (e.g. release year)
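To give an idea of what a decay function does, here is a sketch of the Gaussian decay described in the Elasticsearch function_score documentation: the multiplier is 1.0 within `offset` of the origin and drops to `decay` at a distance of `offset + scale`. By default function_score multiplies this into the text score. Decaying on album year, and the origin and scale used here, are a hypothetical example of ours.

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay multiplier: 1.0 near the origin, `decay` at offset + scale."""
    # sigma is chosen so that the curve passes through `decay` at distance `scale`.
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2 * sigma_sq))


# Hypothetical: favour recent albums, with half the weight 30 years before 2020.
text_score = 5.0  # pretend every album has the same BM25 score
for year in (2020, 2005, 1990, 1970):
    print(year, round(text_score * gauss_decay(year, origin=2020, scale=30), 3))
```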
Re-ranking
Learning To Rank
https://github.com/o19s/elasticsearch-learning-to-rank
Learning to rank
Y = f(X)

X - the set of queries

Y - the ideal ordering of documents for each query

f - the model, whose parameters are learned by a training algorithm
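In practice f usually scores each (query, document) pair from a vector of features, and the ordering Y is what you get by sorting on that score. A minimal sketch with a hypothetical linear model; the weights and feature values are made up, and learning those weights is exactly what the training step does.

```python
from typing import Dict, List, Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """f for a single (query, document) pair: a weighted sum of its features."""
    return sum(w * x for w, x in zip(weights, features))


def rank(weights: Sequence[float], candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Turn one query's candidate documents (X) into an ordered list (Y)."""
    return sorted(candidates, key=lambda doc: score(weights, candidates[doc]), reverse=True)


# Hypothetical learned weights for two features: [artist match, album match].
weights = [0.8, 0.2]
candidates = {"doc_a": [0.0, 3.0], "doc_b": [5.0, 0.0], "doc_c": [1.0, 1.0]}
print(rank(weights, candidates))
```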
Learning to rank
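[Diagram: queries 1 to 4 (X) pass through the model f(X), which returns an ordered list of document ids for each query (Y):
Query 1: 5, 15, 67, 3, 17
Query 2: 23, 3, 7, 88, 45
Query 3: 3, 27, 25, 23, 5
Query 4: 6, 99, 22, 27, 33]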
Model Evaluation Types
• Binary relevance (MAP, Precision)

• Graded relevance, position based (DCG, NDCG)

• Graded relevance, cascade based (ERR)
http://bit.ly/eval-metric
Learning to Rank
approaches
• Pointwise - Calculate a score for each item independently and sort on that score.

• Pairwise - Compare two items at a time: a binary classifier predicts which of the two documents is better, and the list is sorted from these preferences.

• Listwise - Optimise the ordering of the complete list directly, using one of the evaluation measures averaged over all queries.
http://bit.ly/list-wise-approach
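The approaches differ mainly in what the model is trained on. A pairwise learner never sees the grades themselves, only preferences between two documents of the same query. A minimal sketch of that transformation, using a few grades from the judgement list in step 1; the function name is ours.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# (query id, document id, grade) triples, as in a judgement list.
Judgement = Tuple[int, str, int]


def preference_pairs(judgements: List[Judgement]) -> List[Tuple[int, str, str]]:
    """Turn graded judgements into (qid, better_doc, worse_doc) training pairs.

    Pairs are only formed within one query, and equal grades produce no pair.
    """
    by_query: Dict[int, List[Tuple[str, int]]] = {}
    for qid, doc, grade in judgements:
        by_query.setdefault(qid, []).append((doc, grade))
    pairs = []
    for qid, docs in by_query.items():
        for (doc_a, grade_a), (doc_b, grade_b) in combinations(docs, 2):
            if grade_a > grade_b:
                pairs.append((qid, doc_a, doc_b))
            elif grade_b > grade_a:
                pairs.append((qid, doc_b, doc_a))
    return pairs


example = [(1, "350967", 4), (1, "350696", 4), (1, "350897", 1),
           (2, "350672", 4), (2, "351029", 0)]
for pair in preference_pairs(example):
    print(pair)
```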
LTR: Steps to take
1. Create Judgement List (Ground Truth)

2. Define features for the model

3. Log features during usage

4. Training and testing the model

5. Deploying and using the model
1. Judgement List
Obtain labelled data to train the model
Judgement List:
Expert Panel
• Time-consuming and expensive

• Error-prone, because different judges give different judgements
http://bit.ly/ebay-hum-judge
Judgement List:
Implicit Feedback
• Log user behaviour

• A click is not a relevance judgement

• Compare actual clicks versus expected clicks
Click models
• Random Click Model -> Every document has the same chance of being clicked

• Click Through Rate Model -> Uses the fact that higher positions are clicked far more often; the first document is clicked far more than the second

• Cascade Model -> The user scans from top to bottom, so a click on the third item also tells us the first two were seen and skipped

• Dynamic Bayesian Network Model -> Extends the cascade idea with the clicks in a search session, separating how attractive a result looks from its actual relevance (whether the user was satisfied)
http://bit.ly/click-model
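The cascade model fits in a few lines. A minimal sketch, assuming each result has a probability of attracting the click (the values below are hypothetical) and that the user stops after the first click:

```python
from typing import List


def cascade_click_probabilities(attractiveness: List[float]) -> List[float]:
    """Click probability per rank under the cascade model.

    The user scans top-down and clicks the first result that attracts them
    (with probability attractiveness[i]), then stops. A click at rank i
    therefore means the results above it were examined and skipped.
    """
    probabilities = []
    p_examined = 1.0  # probability that the user reaches this rank
    for a in attractiveness:
        probabilities.append(p_examined * a)
        p_examined *= 1.0 - a
    return probabilities


print(cascade_click_probabilities([0.5, 0.5, 0.5]))
```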
Judgement List:
Implicit Feedback
• Use as a signal to the ranking algorithm -> Feature

• Use as Label to train the model -> Ground truth
Using python scripts
from demo project
https://github.com/o19s/elasticsearch-learning-to-rank
Our Judgement List
# grade (0-4) queryid docId title
#
# qid:1: u2
# qid:2: metallica
# qid:3: beatles
#
# https://sourceforge.net/p/lemur/wiki/RankLib%20File%20Format/
#
4 qid:1 #350967 U2-Achtung Baby
4 qid:1 #350696 U2-All That You Can't Leave Behind
1 qid:1 #350897 Radiohead-The Bends
1 qid:1 #351105 Radiohead-Kid A
1 qid:1 #351029 The Beatles-Sgt. Pepper's Lonely Hearts Club Band
4 qid:2 #350672 Metallica-Metallica
4 qid:2 #350816 Metallica-Master of Puppets
0 qid:2 #351029 The Beatles-Sgt. Pepper's Lonely Hearts Club Band
4 qid:3 #350976 Beatles-Meet the Beatles!
1 qid:3 #350881 The Byrds-Younger Than Yesterday
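Before training, the judgement list has to be read back in. A minimal sketch of a parser, assuming exactly the layout shown above (header lines `# qid:N: keywords`, then `grade qid:N #docId title`); the demo project's own scripts may parse it differently, and all names here are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class Judgement(NamedTuple):
    grade: int   # 0 (irrelevant) to 4 (perfect)
    qid: int
    doc_id: str
    title: str


# "# qid:1: u2" maps a query id to its keywords.
QUERY_LINE = re.compile(r"^#\s*qid:(\d+):\s*(.+)$")
# "4 qid:1 #350967 U2-Achtung Baby" is one graded judgement.
JUDGEMENT_LINE = re.compile(r"^(\d+)\s+qid:(\d+)\s+#\s*(\S+)\s*(.*)$")


def parse_judgements(text: str) -> Tuple[Dict[int, str], List[Judgement]]:
    """Return the qid -> keywords map and the judgements; other comments are skipped."""
    queries: Dict[int, str] = {}
    judgements: List[Judgement] = []
    for raw in text.splitlines():
        line = raw.strip()
        query = QUERY_LINE.match(line)
        if query:
            queries[int(query.group(1))] = query.group(2).strip()
            continue
        judged = JUDGEMENT_LINE.match(line)
        if judged:
            grade, qid, doc_id, title = judged.groups()
            judgements.append(Judgement(int(grade), int(qid), doc_id, title))
    return queries, judgements


SAMPLE = """\
# grade (0-4) queryid docId title
# qid:1: u2
# qid:2: metallica
4 qid:1 #350967 U2-Achtung Baby
1 qid:1 #350897 Radiohead-The Bends
0 qid:2 #351029 The Beatles-Sgt. Pepper's Lonely Hearts Club Band
"""

queries, judgements = parse_judgements(SAMPLE)
for j in judgements:
    print(queries[j.qid], j.grade, j.doc_id, j.title)
```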
2. Features for the model
• Elasticsearch queries

• Raw term statistics:

  • Document frequency

  • Total term frequency

  • Aggregated as max, min or sum when there are multiple terms or fields
2. Features for the model
{
"query": {
"match": {
"artist.analyzed": "{{keywords}}"
}
}
}
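Two other features use the same match query on the fields “album.analyzed” and “information”. The fourth feature matches the keywords against the nested “clicks” data and scores each match by its log1p-scaled click count: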
{
"query": {
"nested": {
"path": "clicks",
"query": {
"function_score": {
"query": {
"term": {
"clicks.term": {
"value": "{{keywords}}"
}
}
},
"functions": [
{
"field_value_factor": {
"field": "clicks.clicks",
"modifier": "log1p"
}
}
]
}
}
}
}
}
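The plugin stores these queries together as a featureset. Its documentation describes each feature as a `name`, its `params`, a `template_language` of `mustache` and the query `template` (the bare query clause, without the outer `"query"` wrapper shown above). A minimal sketch that assembles the four features as a request body; the feature names, the featureset name and the order artist, album, information, clicks are our assumptions. We also assume the order determines which feature id (1 to 4) each feature gets in the logged training data.

```python
import json


def match_feature(name: str, field: str) -> dict:
    """A mustache-templated match query on one field, filled in from {{keywords}}."""
    return {
        "name": name,
        "params": ["keywords"],
        "template_language": "mustache",
        "template": {"match": {field: "{{keywords}}"}},
    }


# The nested clicks query from the slide, without its outer "query" wrapper.
clicks_template = {
    "nested": {
        "path": "clicks",
        "query": {
            "function_score": {
                "query": {"term": {"clicks.term": {"value": "{{keywords}}"}}},
                "functions": [{"field_value_factor": {
                    "field": "clicks.clicks", "modifier": "log1p"}}],
            }
        },
    }
}

featureset = {
    "featureset": {
        "features": [
            match_feature("artist", "artist.analyzed"),
            match_feature("album", "album.analyzed"),
            match_feature("information", "information"),
            {"name": "clicks", "params": ["keywords"],
             "template_language": "mustache", "template": clicks_template},
        ]
    }
}

# Request body for storing the featureset, e.g. POST _ltr/_featureset/<featureset name>.
print(json.dumps(featureset, indent=2))
```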
3. Log features
4 qid:1 1:5.813487 2:0.0 3:4.599949 4:0.3804092 # 350967 u2
4 qid:1 1:5.813487 2:0.0 3:4.7994814 4:0.3804092 # 350696 u2
4 qid:1 1:5.813487 2:0.0 3:3.1217866 4:0.98334354 # 351003 u2
4 qid:1 1:5.813487 2:0.0 3:4.5399256 4:0.7608184 # 350640 u2
4 qid:1 1:5.813487 2:0.0 3:4.5399256 4:0.3804092 # 351098 u2
1 qid:1 1:0.0 2:0.0 3:4.7994814 4:0.0 # 350897 u2
1 qid:1 1:0.0 2:0.0 3:3.1217866 4:0.3804092 # 351105 u2
1 qid:1 1:0.0 2:0.0 3:0.5061711 4:0.0 # 351029 u2
4 qid:2 1:6.8280644 2:7.9855165 3:5.279917 4:1.3360937 # 350672 metallica
4 qid:2 1:6.8280644 2:0.0 3:5.513398 4:0.66804683 # 350816 metallica
0 qid:2 1:0.0 2:0.0 3:0.87671757 4:0.0 # 351029 metallica
4 qid:3 1:3.757609 2:5.796505 3:3.581085 4:0.2996537 # 350976 beatles
3 qid:3 1:0.0 2:0.0 3:4.0479565 4:0.0 # 351103 beatles
1 qid:3 1:0.0 2:0.0 3:4.0479565 4:0.0 # 350881 beatles
1 qid:3 1:0.0 2:0.0 3:4.0479565 4:0.0 # 351065 beatles
4 qid:3 1:3.757609 2:0.0 3:3.2117074 4:0.2996537 # 351016 beatles
4 qid:3 1:3.757609 2:0.0 3:2.2627318 4:0.2996537 # 351027 beatles
4 qid:3 1:3.757609 2:0.0 3:2.7668488 4:0.0 # 351025 beatles
4 qid:3 1:3.757609 2:0.0 3:3.581085 4:0.2996537 # 350991 beatles
4 qid:3 1:3.757609 2:0.0 3:2.1937072 4:0.2996537 # 351029 beatles
4 qid:3 1:3.757609 2:0.0 3:3.2462234 4:0.2996537 # 350760 beatles
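Each logged line combines the grade from the judgement list with the values the featureset produced for that document, in the RankLib file format. Before training it helps to check that logging worked, for example that a feature is not always 0.0. A minimal sketch of a parser for one line; the names are ours.

```python
from typing import Dict, NamedTuple


class TrainingRow(NamedTuple):
    grade: int
    qid: int
    features: Dict[int, float]
    comment: str


def parse_training_line(line: str) -> TrainingRow:
    """Parse '<grade> qid:<id> <fid>:<value> ... # <comment>' into its parts."""
    data, _, comment = line.partition("#")
    grade, qid_token, *pairs = data.split()
    qid = int(qid_token.split(":", 1)[1])
    features: Dict[int, float] = {}
    for pair in pairs:
        feature_id, value = pair.split(":", 1)
        features[int(feature_id)] = float(value)
    return TrainingRow(int(grade), qid, features, comment.strip())


row = parse_training_line(
    "4 qid:2 1:6.8280644 2:7.9855165 3:5.279917 4:1.3360937 # 350672 metallica")
print(row.grade, row.qid, row.features, row.comment)
```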
4. Train and test the model
• We use RankLib

• The following algorithms are available:

  • MART, RankNet, RankBoost, AdaRank, Coordinate Ascent, LambdaMART, ListNet, Random Forests

• Supported evaluation metrics to optimise on the training data:

  • MAP, NDCG@k, DCG@k, P@k, RR@k, ERR@k

• Separate training, validation and test sets can be specified

• Feature values can be normalised
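Raw feature values live on very different scales (a BM25 score next to a log-scaled click count). As far as we know, RankLib's normalisation options work per ranked list, that is per query. To illustrate the idea (this is not RankLib's code), a sketch of min-max "linear" normalisation per query, applied to the logged "metallica" rows:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (qid, {feature_id: value}) rows, e.g. parsed from the logged training file.
Row = Tuple[int, Dict[int, float]]


def normalise_linear_per_query(rows: List[Row]) -> List[Row]:
    """Min-max scale each feature to [0, 1] within the result list of each query."""
    values = defaultdict(list)
    for qid, features in rows:
        for fid, value in features.items():
            values[(qid, fid)].append(value)
    bounds = {key: (min(vals), max(vals)) for key, vals in values.items()}

    normalised = []
    for qid, features in rows:
        scaled = {}
        for fid, value in features.items():
            low, high = bounds[(qid, fid)]
            # A feature that is constant within a query cannot separate its documents.
            scaled[fid] = 0.0 if high == low else (value - low) / (high - low)
        normalised.append((qid, scaled))
    return normalised


# The three logged "metallica" rows (qid 2), features 1 and 3 only.
rows = [(2, {1: 6.8280644, 3: 5.279917}),
        (2, {1: 6.8280644, 3: 5.513398}),
        (2, {1: 0.0, 3: 0.87671757})]
for qid, features in normalise_linear_per_query(rows):
    print(qid, {fid: round(v, 3) for fid, v in features.items()})
```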
Models in RankLib
MART
Multiple Additive Regression Trees, a gradient boosting machine. Can be
used for regression as well as classification.
RankNet
Pairwise: a neural network scores both documents of a pair, and a
cross-entropy cost on the difference of their scores is minimised with
stochastic gradient descent.
RankBoost
Based on AdaBoost: combines many weak rankings into a single, highly
accurate ranking. Works on pairwise comparisons.
AdaRank
Builds on AdaBoost, but aimed at ranking: combines a number of weak
learners linearly, giving more weight to queries that are still ranked poorly.
Coordinate Ascent
Optimises one parameter at a time while keeping the others constant,
iterating until a convergence criterion is met.
LambdaRank
Speeds up RankNet by working directly with the gradients ("lambdas"): arrows
indicating how far each document needs to move up or down, scaled by how
much swapping documents would change the evaluation metric.
LambdaMART
Combines the gradients of LambdaRank with MART (boosted regression trees).
ListNet
Uses a listwise loss function, a neural network and gradient descent.
Similar to RankNet; the only difference is the listwise instead of pairwise loss function.
Random Forests
Many decision trees, each trained on a random sample of the data, vote on
(or average) the output. A single tree is a weak model, but the forest as a whole is much stronger.
Linear Regression
Often too simplistic for a learning to rank problem with many features,
but worth having available as a baseline to at least try.
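RankNet's pairwise cost is small enough to write out. A minimal sketch for one pair of documents, assuming the usual formulation with a sigmoid of the score difference and a scale factor sigma = 1; the function name is ours. LambdaRank's "arrows" are these per-pair gradients, rescaled by how much swapping the two documents would change a metric such as NDCG.

```python
import math


def ranknet_pair(s_i: float, s_j: float, target: float, sigma: float = 1.0):
    """RankNet probability, cross-entropy cost and score gradient for one pair.

    target is 1.0 if document i should rank above j, 0.0 if below, 0.5 for a tie.
    Numerically naive: very large score differences make log(1 - p) fail.
    """
    p_ij = 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))  # modelled P(i above j)
    cost = -target * math.log(p_ij) - (1.0 - target) * math.log(1.0 - p_ij)
    # dC/ds_i; dC/ds_j is its negation. Gradient descent moves s_i by -lr * grad.
    grad_s_i = sigma * (p_ij - target)
    return p_ij, cost, grad_s_i


# Equal scores, but i is more relevant: the negative gradient pushes s_i up.
print(ranknet_pair(0.0, 0.0, target=1.0))
```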
Evaluation metrics
MAP
Mean Average Precision: for each query, average the P@k values at every
rank k that holds a relevant document; then average over all queries.
NDCG@k
Normalised DCG: the DCG divided by the DCG of the ideal ordering, giving a
value between 0 and 1.
DCG@k
Discounted Cumulative Gain: the sum of the relevance of the top k documents,
each discounted by its position, which makes the first document the most important.
P@k
Precision: the fraction of the top k documents that are relevant.
RR@k
Reciprocal Rank: 1/r, where r is the rank of the first relevant document (0 if none appears in the top k).
ERR@k
Expected Reciprocal Rank: discounts documents that appear below a highly
relevant document, since the user has probably stopped there. (Can be used for listwise comparison.)
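The metrics above can be computed from the grades of a ranked result list. A minimal sketch, assuming grades 0 to 4 as in the judgement list, "relevant" meaning grade > 0, and the exponential gain 2^grade - 1 for DCG (another common variant uses the grade directly). Average precision here divides by the relevant documents found in the list. This is an illustration, not RankLib's implementation.

```python
import math
from typing import List

# Graded relevance (0-4) of the documents, in the order the model ranked them.
Grades = List[int]


def precision_at_k(grades: Grades, k: int) -> float:
    """Fraction of the top k documents that are relevant (grade > 0)."""
    return sum(1 for g in grades[:k] if g > 0) / k


def reciprocal_rank_at_k(grades: Grades, k: int) -> float:
    """1 / rank of the first relevant document in the top k, or 0 if there is none."""
    for rank, g in enumerate(grades[:k], start=1):
        if g > 0:
            return 1.0 / rank
    return 0.0


def average_precision(grades: Grades) -> float:
    """Mean of P@k over the ranks k that hold a relevant document; MAP averages this over queries."""
    hits = [precision_at_k(grades, rank)
            for rank, g in enumerate(grades, start=1) if g > 0]
    return sum(hits) / len(hits) if hits else 0.0


def dcg_at_k(grades: Grades, k: int) -> float:
    """Exponential gain (2^grade - 1), discounted by log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))


def ndcg_at_k(grades: Grades, k: int) -> float:
    """DCG divided by the DCG of the ideal (sorted) ordering: a value in [0, 1]."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


def err_at_k(grades: Grades, k: int, max_grade: int = 4) -> float:
    """Expected Reciprocal Rank for a cascade user who stops once satisfied."""
    err, p_continue = 0.0, 1.0
    for rank, g in enumerate(grades[:k], start=1):
        satisfied = (2 ** g - 1) / 2 ** max_grade
        err += p_continue * satisfied / rank
        p_continue *= 1 - satisfied
    return err


# Hypothetical grades of the top 5 results for one query, in ranked order.
ranked = [4, 1, 4, 0, 1]
print("P@5", precision_at_k(ranked, 5), "RR@5", reciprocal_rank_at_k(ranked, 5))
print("AP", average_precision(ranked), "NDCG@5", ndcg_at_k(ranked, 5), "ERR@5", err_at_k(ranked, 5))
```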
Training results
                   MAP     NDCG@10  DCG@10   P@10    RR@10  ERR@10
MART               0.999   0.979    38.7047  0.9333  1.0    0.9618
RankNet            0.8806  0.8831   33.298   0.8933  0.9    0.5988
RankBoost          1.0     1.0      39.8699  0.9333  1.0    0.9629
AdaRank (List)     0.9493  0.8271   31.4686  0.8733  0.8    0.7597
Coordinate Ascent  0.9446  0.8973   36.3888  0.9133  1.0    0.96
LambdaRank         0.9562  0.7079   35.7277  0.8933  0.9    0.7598
LambdaMART         0.9725  0.9805   38.788   0.9333  1.0    0.9618
ListNet (List)     0.9496  0.8973   33.1995  0.8933  1.0    0.8229
Random Forests     0.999   0.979    38.7047  0.9333  1.0    0.9618
Linear Regression  0.9463  0.8602   34.1433  0.8933  0.9    0.7906
5. Deploy the model
• The trained model, including its learned parameters, is stored in
Elasticsearch.

• Using the plugin, we can now re-rank the top-N results of a normal query (here the top 1000, set by window_size)
GET rolling500/_search
{
"query": {
"multi_match": {
"query": "rolling",
"fields": ["album.analyzed", "artist.analyzed", "information"]
}
},
"rescore": {
"window_size": 1000,
"query": {
"rescore_query": {
"sltr": {
"params": {"keywords": "rolling"},
"model": "test_6"
}
}
}
}
}
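Conceptually, the rescore step only reorders the top `window_size` hits of the first query. A minimal sketch of that behaviour, assuming Elasticsearch's default combination of the two scores (query_weight 1.0 and rescore_query_weight 1.0, added together) and that hits outside the window keep their original order after it; the function and the scores are hypothetical.

```python
from typing import Dict, List, Tuple


def rescore(hits: List[Tuple[str, float]], model_scores: Dict[str, float],
            window_size: int, query_weight: float = 1.0,
            rescore_query_weight: float = 1.0) -> List[str]:
    """Re-rank only the top `window_size` hits; the rest keep their original order.

    Inside the window each score becomes
    query_weight * original score + rescore_query_weight * model score.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    window, rest = ranked[:window_size], ranked[window_size:]
    rescored = sorted(
        window,
        key=lambda hit: query_weight * hit[1] + rescore_query_weight * model_scores[hit[0]],
        reverse=True,
    )
    return [doc for doc, _ in rescored] + [doc for doc, _ in rest]


hits = [("a", 3.0), ("b", 2.0), ("c", 1.0)]
model_scores = {"a": 0.0, "b": 5.0, "c": 10.0}
print(rescore(hits, model_scores, window_size=2))
```

Document "c" has the best model score but stays last because it falls outside the window, which is why window_size has to be large enough to contain the documents the model should be able to promote.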
DEMO

More Related Content

Similar to Combining machine learning and search through learning to rank

Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
Gábor Szárnyas
 
Data Mining Course Overview Overview.ppt
Data Mining Course Overview Overview.pptData Mining Course Overview Overview.ppt
Data Mining Course Overview Overview.ppt
fatimaezzahraboumaiz2
 
lghjghgggkgjhgjghhjgjhgkhjghjghjghjghect1.ppt
lghjghgggkgjhgjghhjgjhgkhjghjghjghjghect1.pptlghjghgggkgjhgjghhjgjhgkhjghjghjghjghect1.ppt
lghjghgggkgjhgjghhjgjhgkhjghjghjghjghect1.ppt
JITENDER773791
 
data mining presentation power point for the study
data mining presentation power point for the studydata mining presentation power point for the study
data mining presentation power point for the study
anjanishah774
 
lect1.ppt
lect1.pptlect1.ppt
lect1.ppt
ssuserb26f53
 
lect1lect1lect1lect1lect1lect1lect1lect1.ppt
lect1lect1lect1lect1lect1lect1lect1lect1.pptlect1lect1lect1lect1lect1lect1lect1lect1.ppt
lect1lect1lect1lect1lect1lect1lect1lect1.ppt
DEEPAK948083
 
Introduction to Recommendation System
Introduction to Recommendation SystemIntroduction to Recommendation System
Introduction to Recommendation System
Minha Hwang
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
Shree Shree
 
lecture8-evaluation.pptxnnnnnnnnnnnnnnnnnnnnnnnnn
lecture8-evaluation.pptxnnnnnnnnnnnnnnnnnnnnnnnnnlecture8-evaluation.pptxnnnnnnnnnnnnnnnnnnnnnnnnn
lecture8-evaluation.pptxnnnnnnnnnnnnnnnnnnnnnnnnn
RAtna29
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Spark Summit
 
ADV Slides: Graph Databases on the Edge
ADV Slides: Graph Databases on the EdgeADV Slides: Graph Databases on the Edge
ADV Slides: Graph Databases on the Edge
DATAVERSITY
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
Tao Feng
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
markgrover
 
Data_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfData_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdf
jill734733
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
Neo4j
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Hyderabad Scalability Meetup
 
Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect data
Andy Stretton
 
Pistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinarPistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinar
Pistoia Alliance
 
How to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data TeamHow to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data Team
Traveloka
 
datamining-lect1.pptx
datamining-lect1.pptxdatamining-lect1.pptx
datamining-lect1.pptx
GautamDematti1
 

Similar to Combining machine learning and search through learning to rank (20)

Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
 
Data Mining Course Overview Overview.ppt
Data Mining Course Overview Overview.pptData Mining Course Overview Overview.ppt
Data Mining Course Overview Overview.ppt
 
lghjghgggkgjhgjghhjgjhgkhjghjghjghjghect1.ppt
lghjghgggkgjhgjghhjgjhgkhjghjghjghjghect1.pptlghjghgggkgjhgjghhjgjhgkhjghjghjghjghect1.ppt
lghjghgggkgjhgjghhjgjhgkhjghjghjghjghect1.ppt
 
data mining presentation power point for the study
data mining presentation power point for the studydata mining presentation power point for the study
data mining presentation power point for the study
 
lect1.ppt
lect1.pptlect1.ppt
lect1.ppt
 
lect1lect1lect1lect1lect1lect1lect1lect1.ppt
lect1lect1lect1lect1lect1lect1lect1lect1.pptlect1lect1lect1lect1lect1lect1lect1lect1.ppt
lect1lect1lect1lect1lect1lect1lect1lect1.ppt
 
Introduction to Recommendation System
Introduction to Recommendation SystemIntroduction to Recommendation System
Introduction to Recommendation System
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
 
lecture8-evaluation.pptxnnnnnnnnnnnnnnnnnnnnnnnnn
lecture8-evaluation.pptxnnnnnnnnnnnnnnnnnnnnnnnnnlecture8-evaluation.pptxnnnnnnnnnnnnnnnnnnnnnnnnn
lecture8-evaluation.pptxnnnnnnnnnnnnnnnnnnnnnnnnn
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
ADV Slides: Graph Databases on the Edge
ADV Slides: Graph Databases on the EdgeADV Slides: Graph Databases on the Edge
ADV Slides: Graph Databases on the Edge
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
Data_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfData_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdf
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep Dive
 
Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect data
 
Pistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinarPistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinar
 
How to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data TeamHow to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data Team
 
datamining-lect1.pptx
datamining-lect1.pptxdatamining-lect1.pptx
datamining-lect1.pptx
 

Recently uploaded

Girls Call Bandra East 9910780858 Provide Best And Top Girl Service And No1 i...
Girls Call Bandra East 9910780858 Provide Best And Top Girl Service And No1 i...Girls Call Bandra East 9910780858 Provide Best And Top Girl Service And No1 i...
Girls Call Bandra East 9910780858 Provide Best And Top Girl Service And No1 i...
margaretblush
 
UCI biyezheng degree offer diploma Transcript
UCI biyezheng degree offer diploma TranscriptUCI biyezheng degree offer diploma Transcript
UCI biyezheng degree offer diploma Transcript
xmevus
 
Cornell biyezheng degree offer diploma Transcript
Cornell biyezheng degree offer diploma TranscriptCornell biyezheng degree offer diploma Transcript
Cornell biyezheng degree offer diploma Transcript
xmevus
 
Strategies for Adoption of SDGs in organizations
Strategies for Adoption of SDGs in organizationsStrategies for Adoption of SDGs in organizations
Strategies for Adoption of SDGs in organizations
Amgad Morgan
 
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
rawankhanlove256
 
Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...
Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...
Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...
parichopra4
 
stackconf 2024 | Using European Open Source to build a Sovereign Multi-Cloud ...
stackconf 2024 | Using European Open Source to build a Sovereign Multi-Cloud ...stackconf 2024 | Using European Open Source to build a Sovereign Multi-Cloud ...
stackconf 2024 | Using European Open Source to build a Sovereign Multi-Cloud ...
NETWAYS
 
Biography of the late Mrs. Stella Atsupui Eddah.pdf
Biography of the late Mrs. Stella Atsupui Eddah.pdfBiography of the late Mrs. Stella Atsupui Eddah.pdf
Biography of the late Mrs. Stella Atsupui Eddah.pdf
AbdulSadickZutah
 
GT biyezheng degree offer diploma Transcript
GT biyezheng degree offer diploma TranscriptGT biyezheng degree offer diploma Transcript
GT biyezheng degree offer diploma Transcript
xmevus
 
Mysore Girls Call Mysore 0X0000000X Payment On Delevery Cash Hot Premium Genu...
Mysore Girls Call Mysore 0X0000000X Payment On Delevery Cash Hot Premium Genu...Mysore Girls Call Mysore 0X0000000X Payment On Delevery Cash Hot Premium Genu...
Mysore Girls Call Mysore 0X0000000X Payment On Delevery Cash Hot Premium Genu...
seenaoberoi
 
Chandigarh Girls Call Chandigarh 0X0000000X Provide Best And Top Girl Service...
Chandigarh Girls Call Chandigarh 0X0000000X Provide Best And Top Girl Service...Chandigarh Girls Call Chandigarh 0X0000000X Provide Best And Top Girl Service...
Chandigarh Girls Call Chandigarh 0X0000000X Provide Best And Top Girl Service...
kishanaaani
 
A study on drug utilization evaluation of bronchodilators using DDD method
A study on drug utilization evaluation of bronchodilators using DDD methodA study on drug utilization evaluation of bronchodilators using DDD method
A study on drug utilization evaluation of bronchodilators using DDD method
Dr. Chihiro
 
calcaneal fracture seminar by dr vishu.pptx
calcaneal fracture seminar by dr vishu.pptxcalcaneal fracture seminar by dr vishu.pptx
calcaneal fracture seminar by dr vishu.pptx
Skmch
 
UC Davis biyezheng degree offer diploma Transcript
UC Davis biyezheng degree offer diploma TranscriptUC Davis biyezheng degree offer diploma Transcript
UC Davis biyezheng degree offer diploma Transcript
xmevus
 
@ℂall Lucknow @Girls Chinhat 08630512678
@ℂall Lucknow  @Girls Chinhat 08630512678 @ℂall Lucknow  @Girls Chinhat 08630512678
@ℂall Lucknow @Girls Chinhat 08630512678
veenita788
 
HERO.pdf hero company working cap management project
HERO.pdf hero company working cap management projectHERO.pdf hero company working cap management project
HERO.pdf hero company working cap management project
SambalpurTokaSatyaji
 
TEST WORTHINESS: VALIDITY, RELIABILITY, PRACTICALITY
TEST WORTHINESS: VALIDITY, RELIABILITY, PRACTICALITYTEST WORTHINESS: VALIDITY, RELIABILITY, PRACTICALITY
TEST WORTHINESS: VALIDITY, RELIABILITY, PRACTICALITY
AaSs197122
 
Call India - AmanTel on the App Store.ppt
Call India - AmanTel on the App Store.pptCall India - AmanTel on the App Store.ppt
Call India - AmanTel on the App Store.ppt
Best International calling app on the market
 
Lucknow Girls Call Fazullaganj 08630512678 Provide Best And Top Girl Service ...
Lucknow Girls Call Fazullaganj 08630512678 Provide Best And Top Girl Service ...Lucknow Girls Call Fazullaganj 08630512678 Provide Best And Top Girl Service ...
Lucknow Girls Call Fazullaganj 08630512678 Provide Best And Top Girl Service ...
bangaloreakshitakaus
 
Hyderabad Girls Call Hyderabad 0X0000000X Unlimited Short Providing Girls Ser...
Hyderabad Girls Call Hyderabad 0X0000000X Unlimited Short Providing Girls Ser...Hyderabad Girls Call Hyderabad 0X0000000X Unlimited Short Providing Girls Ser...
Hyderabad Girls Call Hyderabad 0X0000000X Unlimited Short Providing Girls Ser...
rashmikasinghdelhiro
 

Recently uploaded (20)

Girls Call Bandra East 9910780858 Provide Best And Top Girl Service And No1 i...
Girls Call Bandra East 9910780858 Provide Best And Top Girl Service And No1 i...Girls Call Bandra East 9910780858 Provide Best And Top Girl Service And No1 i...
Girls Call Bandra East 9910780858 Provide Best And Top Girl Service And No1 i...
 
UCI biyezheng degree offer diploma Transcript
UCI biyezheng degree offer diploma TranscriptUCI biyezheng degree offer diploma Transcript
UCI biyezheng degree offer diploma Transcript
 
Cornell biyezheng degree offer diploma Transcript
Cornell biyezheng degree offer diploma TranscriptCornell biyezheng degree offer diploma Transcript
Cornell biyezheng degree offer diploma Transcript
 
Strategies for Adoption of SDGs in organizations
Strategies for Adoption of SDGs in organizationsStrategies for Adoption of SDGs in organizations
Strategies for Adoption of SDGs in organizations
 
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
 
Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...
Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...
Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...
 
stackconf 2024 | Using European Open Source to build a Sovereign Multi-Cloud ...
stackconf 2024 | Using European Open Source to build a Sovereign Multi-Cloud ...stackconf 2024 | Using European Open Source to build a Sovereign Multi-Cloud ...
stackconf 2024 | Using European Open Source to build a Sovereign Multi-Cloud ...
 
Biography of the late Mrs. Stella Atsupui Eddah.pdf
Biography of the late Mrs. Stella Atsupui Eddah.pdfBiography of the late Mrs. Stella Atsupui Eddah.pdf
Biography of the late Mrs. Stella Atsupui Eddah.pdf
 
GT biyezheng degree offer diploma Transcript
GT biyezheng degree offer diploma TranscriptGT biyezheng degree offer diploma Transcript
GT biyezheng degree offer diploma Transcript
 
Mysore Girls Call Mysore 0X0000000X Payment On Delevery Cash Hot Premium Genu...
Mysore Girls Call Mysore 0X0000000X Payment On Delevery Cash Hot Premium Genu...Mysore Girls Call Mysore 0X0000000X Payment On Delevery Cash Hot Premium Genu...
Mysore Girls Call Mysore 0X0000000X Payment On Delevery Cash Hot Premium Genu...
 
Chandigarh Girls Call Chandigarh 0X0000000X Provide Best And Top Girl Service...
Chandigarh Girls Call Chandigarh 0X0000000X Provide Best And Top Girl Service...Chandigarh Girls Call Chandigarh 0X0000000X Provide Best And Top Girl Service...
Chandigarh Girls Call Chandigarh 0X0000000X Provide Best And Top Girl Service...
 
A study on drug utilization evaluation of bronchodilators using DDD method
A study on drug utilization evaluation of bronchodilators using DDD methodA study on drug utilization evaluation of bronchodilators using DDD method
A study on drug utilization evaluation of bronchodilators using DDD method
 
calcaneal fracture seminar by dr vishu.pptx
calcaneal fracture seminar by dr vishu.pptxcalcaneal fracture seminar by dr vishu.pptx
calcaneal fracture seminar by dr vishu.pptx
 
UC Davis biyezheng degree offer diploma Transcript
UC Davis biyezheng degree offer diploma TranscriptUC Davis biyezheng degree offer diploma Transcript
UC Davis biyezheng degree offer diploma Transcript
 
@ℂall Lucknow @Girls Chinhat 08630512678
@ℂall Lucknow  @Girls Chinhat 08630512678 @ℂall Lucknow  @Girls Chinhat 08630512678
@ℂall Lucknow @Girls Chinhat 08630512678
 
HERO.pdf hero company working cap management project
HERO.pdf hero company working cap management projectHERO.pdf hero company working cap management project
HERO.pdf hero company working cap management project
 
TEST WORTHINESS: VALIDITY, RELIABILITY, PRACTICALITY
TEST WORTHINESS: VALIDITY, RELIABILITY, PRACTICALITYTEST WORTHINESS: VALIDITY, RELIABILITY, PRACTICALITY
TEST WORTHINESS: VALIDITY, RELIABILITY, PRACTICALITY
 
Call India - AmanTel on the App Store.ppt
Call India - AmanTel on the App Store.pptCall India - AmanTel on the App Store.ppt
Call India - AmanTel on the App Store.ppt
 
Lucknow Girls Call Fazullaganj 08630512678 Provide Best And Top Girl Service ...
Lucknow Girls Call Fazullaganj 08630512678 Provide Best And Top Girl Service ...Lucknow Girls Call Fazullaganj 08630512678 Provide Best And Top Girl Service ...
Lucknow Girls Call Fazullaganj 08630512678 Provide Best And Top Girl Service ...
 
Hyderabad Girls Call Hyderabad 0X0000000X Unlimited Short Providing Girls Ser...
Hyderabad Girls Call Hyderabad 0X0000000X Unlimited Short Providing Girls Ser...Hyderabad Girls Call Hyderabad 0X0000000X Unlimited Short Providing Girls Ser...
Hyderabad Girls Call Hyderabad 0X0000000X Unlimited Short Providing Girls Ser...
 

Combining machine learning and search through learning to rank

  • 1. Combining machine learning and search through learning to rank Jettro Coenradie
  • 6. Jettro Coenradie I talk about (Elastic) Search @jettroCoenradie https://github.com/jettro https://www.linkedin.com/in/jettro/
  • 7. Combining machine learning and search through learning to rank
  • 12. Order Matters Changing order influences sales
  • 13. DEMO
  • 14. How do we come from ‘U2’ to a list of albums?
  • 17. Recap: Inverted Index Terms doc_ids ttf against 5 1 band 6 1 beatles 3 1 machine 5 1 radiohead 4 1 rage 5 1 rolling 2 1 stones 2 1 the 5,6 2 u2 1 1 Doc Id Artist 1 U2 2 Rolling Stones 3 Beatles 4 Radiohead 5 Rage against the machine 6 The Band
  • 18. { "album": "War", "image": "rs-137178-883f5fe955b2745cd539.jpg", "information": "<p>U2 were on the cusp of becoming one of the Eighties&apos; most important groups when their third album came out. It&apos;s the band&apos;s most overtly political album, with songs about Poland&apos;s Solidarity movement (&quot;New Year&apos;s Day&quot;) and Irish unrest (&quot;Sunday Bloody Sunday&quot;) charged with explosive, passionate guitar rock.</p>", "sequence": 223, "order": 279, "label": "Island", "artist": "U2", "year": "1983", "id": 350640 }
  • 19. curl -XGET "http://localhost:9200/rolling500/_search" -H 'Content-Type: application/json' -d' { "query": { "multi_match": { "query": "u2", "fields": [ "album.analyzed", "artist.analyzed", "information" ] } } }'
  • 22. Change Ranking of results • Add boosting to fields • Add boosting by doc using other signals • Add decay using other fields
  • 25. Learning to rank Y = f(X) X - set of queries Y - Ideal ordered set of documents per query f(X) - The model with learned parameters through an algorithm
  • 26. Learning to rank Query 1 Model X f(X) Y Query 2 Query 3 Query 4 5 15 67 3 17 23 3 7 88 45 3 27 25 23 5 6 99 22 27 33
  • 27. Model Evaluation Types • Binary relevance (MAP, Precision) • Graded relevance, position based (DCG, NDCG) • Graded relevance, cascade based (ERR) http://bit.ly/eval-metric
  • 28. Learning to Rank approaches • Point wise - Calculate a score for each item and sort them • Pair wise - Compare two items each time and sort them. Use binary classifier for one document that is better than the other. • List wise - Optimise the complete list directly using one of the evaluation measures, averaged over all queries http://bit.ly/list-wise-approach
  • 29. LTR: Steps to take 1. Create Judgement List (Ground Truth) 2. Define features for the model 3. Log features during usage 4. Training and testing the model 5. Deploying and using the model
  • 30. 1. Judgement List Obtain labelled data to train the model
  • 31. Judgement List: Expert Panel • Time consuming and Expensive • Error prone due to different judgements http://bit.ly/ebay-hum-judge
  • 32. Judgement List: Implicit Feedback • Log user behaviour • A click is not a relevance judgement • Compare actual clicks versus expected clicks
  • 33. Click models • Random Click Model -> Every document has the same chance of being clicked • Click Through Rate Model -> Uses the fact that the first document is clicked far more than the second document • Cascade Model -> A click in the third item also tells us something about the first two items • Dynamic Bayesian Network Model -> Uses the historical clicks in a search session and the actual relevance of a document. http://bit.ly/click-model
  • 34. Judgement List: Implicit Feedback • Use as a signal to the ranking algorithm -> Feature • Use as Label to train the model -> Ground truth
  • 35. Using python scripts from demo project https://github.com/o19s/elasticsearch-learning-to-rank
  • 36. Our Judgement List # grade (0-4) queryid docId title # # qid:1: u2 # qid:2: metallica # qid:3: beatles # # https://sourceforge.net/p/lemur/wiki/RankLib%20File%20Format/ # 4 qid:1 #350967 U2-Achtung Baby 4 qid:1 #350696 U2-All That You Can't Leave Behind 1 qid:1 #350897 Radiohead-The Bends 1 qid:1 #351105 Radiohead-Kid A 1 qid:1 #351029 The Beatles-Sgt. Pepper's Lonely Hearts Club Band 4 qid:2 #350672 Metallica-Metallica 4 qid:2 #350816 Metallica-Master of Puppets 0 qid:2 #351029 The Beatles-Sgt. Pepper's Lonely Hearts Club Band 4 qid:3 #350976 Beatles-Meet the Beatles! 1 qid:3 #350881 The Byrds-Younger Than Yesterday
  • 37. 2. Features for the model • Elasticsearch queries • Raw term statistics • Document frequency • Total term frequency • Also max, min and sum (when there are multiple terms or fields)
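To make the raw term statistics concrete, here is a small Python sketch that computes document frequency (df) and total term frequency (ttf) over the six artist names from the inverted-index recap. It then aggregates a statistic over a multi-term query as max, min and sum. The lowercase-and-split "analyzer" is a simplifying assumption. This only illustrates the definitions; the plugin computes these statistics from the real Lucene index.

```python
from collections import Counter

# Artist field of the six documents from the inverted-index recap
docs = {1: "U2", 2: "Rolling Stones", 3: "Beatles", 4: "Radiohead",
        5: "Rage against the machine", 6: "The Band"}

def analyse(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def term_stats(term):
    """Document frequency (df) and total term frequency (ttf) for one term."""
    counts = [Counter(analyse(text))[term] for text in docs.values()]
    df = sum(1 for c in counts if c > 0)
    ttf = sum(counts)
    return df, ttf

def aggregate(query, stat):
    """Combine a per-term statistic (0 = df, 1 = ttf) as max, min and sum."""
    values = [term_stats(term)[stat] for term in analyse(query)]
    return {"max": max(values), "min": min(values), "sum": sum(values)}

print(term_stats("the"))         # (2, 2)
print(aggregate("the band", 0))  # {'max': 2, 'min': 1, 'sum': 3}
```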
  • 38. 2. Features for the model { "query": { "match": { "artist.analyzed": "{{keywords}}" } } } The other two features are similar, using the fields “album.analyzed” and “information”
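A sketch of how the three match-query features could be assembled into a featureset body for the Elasticsearch LTR plugin (sent with `POST _ltr/_featureset/<name>`). It assumes the plugin's `featureset` / `features` / `params` / `template` layout with mustache templates. In that layout the template holds the query itself, without the outer `"query"` key. The feature names are hypothetical. The order (artist, album, information) matches the feature ids 1 to 3 in the logged features.

```python
import json

def match_feature(name, field):
    """One feature: a match query on `field`, filled with the keywords param."""
    return {
        "name": name,  # hypothetical feature name
        "params": ["keywords"],
        "template_language": "mustache",
        # The plugin stores the query itself, without the outer "query" key
        "template": {"match": {field: "{{keywords}}"}},
    }

featureset = {
    "featureset": {
        "features": [
            match_feature("artist", "artist.analyzed"),
            match_feature("album", "album.analyzed"),
            match_feature("information", "information"),
        ]
    }
}

print(json.dumps(featureset, indent=2))
```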
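  • 39. 2. Features for the model: clicks. Documents that were clicked for the same search term score higher, scaled by the log1p of their click count { "query": { "nested": { "path": "clicks", "query": { "function_score": { "query": { "term": { "clicks.term": { "value": "{{keywords}}" } } }, "functions": [ { "field_value_factor": { "field": "clicks.clicks", "modifier": "log1p" } } ] } } } } }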
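The `log1p` modifier is easy to misread. In Elasticsearch's `field_value_factor` it means "add 1, then take the common (base-10) logarithm", which is not Python's `math.log1p` (natural log). A small sketch of just that function part, assuming the default `factor` of 1. In the real query the result is also combined with the term query score and averaged over the nested click documents, which this sketch leaves out.

```python
import math

def field_value_factor_log1p(clicks, factor=1.0):
    """Elasticsearch field_value_factor with modifier "log1p".

    Computes log10(1 + factor * clicks). Note: Python's math.log1p would
    use the natural logarithm and give different values.
    """
    return math.log10(1 + factor * clicks)

for clicks in (0, 9, 99):
    print(clicks, field_value_factor_log1p(clicks))
```

Ten times more clicks adds roughly one point, so heavily clicked albums do not completely drown out the text-match features.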
  • 40. 3. Log features
    4 qid:1 1:5.813487 2:0.0 3:4.599949 4:0.3804092 # 350967 u2
    4 qid:1 1:5.813487 2:0.0 3:4.7994814 4:0.3804092 # 350696 u2
    4 qid:1 1:5.813487 2:0.0 3:3.1217866 4:0.98334354 # 351003 u2
    4 qid:1 1:5.813487 2:0.0 3:4.5399256 4:0.7608184 # 350640 u2
    4 qid:1 1:5.813487 2:0.0 3:4.5399256 4:0.3804092 # 351098 u2
    1 qid:1 1:0.0 2:0.0 3:4.7994814 4:0.0 # 350897 u2
    1 qid:1 1:0.0 2:0.0 3:3.1217866 4:0.3804092 # 351105 u2
    1 qid:1 1:0.0 2:0.0 3:0.5061711 4:0.0 # 351029 u2
    4 qid:2 1:6.8280644 2:7.9855165 3:5.279917 4:1.3360937 # 350672 metallica
    4 qid:2 1:6.8280644 2:0.0 3:5.513398 4:0.66804683 # 350816 metallica
    0 qid:2 1:0.0 2:0.0 3:0.87671757 4:0.0 # 351029 metallica
    4 qid:3 1:3.757609 2:5.796505 3:3.581085 4:0.2996537 # 350976 beatles
    3 qid:3 1:0.0 2:0.0 3:4.0479565 4:0.0 # 351103 beatles
    1 qid:3 1:0.0 2:0.0 3:4.0479565 4:0.0 # 350881 beatles
    1 qid:3 1:0.0 2:0.0 3:4.0479565 4:0.0 # 351065 beatles
    4 qid:3 1:3.757609 2:0.0 3:3.2117074 4:0.2996537 # 351016 beatles
    4 qid:3 1:3.757609 2:0.0 3:2.2627318 4:0.2996537 # 351027 beatles
    4 qid:3 1:3.757609 2:0.0 3:2.7668488 4:0.0 # 351025 beatles
    4 qid:3 1:3.757609 2:0.0 3:3.581085 4:0.2996537 # 350991 beatles
    4 qid:3 1:3.757609 2:0.0 3:2.1937072 4:0.2996537 # 351029 beatles
    4 qid:3 1:3.757609 2:0.0 3:3.2462234 4:0.2996537 # 350760 beatles
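Each line above combines a judgement (grade, query id) with the feature values logged for that document, written in RankLib's training format. A minimal sketch of the formatting step, with a hypothetical function name. Feature ids are 1-based and follow the featureset order.

```python
def ranklib_line(grade, qid, features, doc_id, keywords):
    """Format one judged document and its logged feature values for RankLib."""
    feats = " ".join(f"{i}:{value}" for i, value in enumerate(features, start=1))
    return f"{grade} qid:{qid} {feats} # {doc_id} {keywords}"

line = ranklib_line(4, 1, [5.813487, 0.0, 4.599949, 0.3804092], 350967, "u2")
print(line)
# 4 qid:1 1:5.813487 2:0.0 3:4.599949 4:0.3804092 # 350967 u2
```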
  • 41. 4. Train and test the model • Using RankLib • The following algorithms are available: • MART, RankNet, RankBoost, AdaRank, Coordinate Ascent, LambdaMART, ListNet, Random Forests • Supported evaluation metrics to optimise on the training data: • MAP, NDCG@k, DCG@k, P@k, RR@k, ERR@k • Separate training, validation and test sets can be specified • Feature sets can be normalised
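A sketch of how a RankLib training run could be assembled from Python. It uses RankLib's `-train`, `-ranker`, `-metric2t`, `-save`, `-validate`, `-test` and `-norm` options, with the ranker ids printed by RankLib's command-line help. The jar name and file names are assumptions; adjust them to your setup. The sketch only builds the command, so it runs without Java installed.

```python
import shlex

# Ranker ids as listed in RankLib's command-line help
RANKERS = {
    "MART": 0, "RankNet": 1, "RankBoost": 2, "AdaRank": 3,
    "Coordinate Ascent": 4, "LambdaMART": 6, "ListNet": 7,
    "Random Forests": 8, "Linear Regression": 9,
}

def ranklib_command(ranker, train, model_out, metric="NDCG@10",
                    validate=None, test=None, norm=None, jar="RankLib.jar"):
    """Build the argument list for one RankLib training run."""
    cmd = ["java", "-jar", jar, "-train", train,
           "-ranker", str(RANKERS[ranker]), "-metric2t", metric,
           "-save", model_out]
    if validate:
        cmd += ["-validate", validate]
    if test:
        cmd += ["-test", test]
    if norm:
        cmd += ["-norm", norm]  # e.g. "sum", "zscore" or "linear"
    return cmd

cmd = ranklib_command("LambdaMART", "train.txt", "model.txt",
                      test="test.txt", norm="zscore")
print(shlex.join(cmd))
# Once Java and the jar are available: subprocess.run(cmd, check=True)
```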
  • 42. Models in RankLib
    MART: Multiple Additive Regression Trees, a gradient boosting machine. Can be used for regression as well as classification.
    RankNet: A neural network that compares two feature vectors, trained with stochastic gradient descent on a pairwise cost function.
    RankBoost: Based on AdaBoost; combines many weak rankings into a single highly accurate ranking. Uses pairwise comparisons.
    AdaRank: Combines a number of weak learners linearly. Builds on AdaBoost, but is aimed more at ranking.
    Coordinate Ascent: Optimises one parameter at a time while keeping the others constant, repeating until a convergence criterion is met.
    LambdaRank: An optimisation of RankNet that works directly with the gradients (lambdas), which indicate how far each document needs to move up or down.
    LambdaMART: Combines the gradients of LambdaRank with the boosted regression trees of MART.
    ListNet: Uses a list wise loss function, a neural network and gradient descent. Similar to RankNet; the only difference is the list wise versus pairwise loss function.
    Random Forests: Many decision trees, each trained on a random sample of the data, whose predictions are combined. A single tree is a weak predictor, but the forest as a whole is much stronger.
    Linear Regression: Usually too simplistic for a learning to rank problem with lots of features, but good to have available to at least try.
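To illustrate the pairwise idea behind RankNet, here is a sketch that replaces the neural network with a simple linear scorer. For a pair where document i should rank above document j, RankNet models P(i > j) = sigmoid(s_i - s_j) and minimises the cross-entropy cost -log P(i > j) with stochastic gradient descent. The feature vectors are taken from the logged features above. The learning rate, epoch count and function names are arbitrary assumptions. This is not RankLib's implementation.

```python
import math

def score(w, x):
    """Linear scorer standing in for RankNet's neural network."""
    return sum(wk * xk for wk, xk in zip(w, x))

def ranknet_train(pairs, n_features, lr=0.1, epochs=200):
    """SGD on the RankNet cost -log(sigmoid(s_i - s_j)) for each pair (i, j)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x_i, x_j in pairs:
            s_diff = score(w, x_i) - score(w, x_j)
            g = 1.0 / (1.0 + math.exp(s_diff))  # = 1 - P(i > j)
            # The gradient of the cost w.r.t. w is -g * (x_i - x_j); step against it
            w = [wk + lr * g * (a - b) for wk, a, b in zip(w, x_i, x_j)]
    return w

# (better document, worse document) pairs from the logged features
pairs = [
    ([5.813487, 0.0, 4.599949, 0.3804092], [0.0, 0.0, 4.7994814, 0.0]),        # u2
    ([6.8280644, 0.0, 5.513398, 0.66804683], [0.0, 0.0, 0.87671757, 0.0]),    # metallica
    ([3.757609, 5.796505, 3.581085, 0.2996537], [0.0, 0.0, 4.0479565, 0.0]),  # beatles
]
weights = ranknet_train(pairs, n_features=4)
print(weights)
```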
  • 43. Evaluation metrics
    MAP: Mean Average Precision. For each query, average P@k over the positions k where a relevant document appears; then take the mean over all queries.
    NDCG@k: Normalised DCG. The DCG divided by the DCG of the ideal ordering, giving a value between 0 and 1.
    DCG@k: Discounted Cumulative Gain. Sums the relevance of the top k documents, discounted by position, so the first document counts most.
    P@k: Precision at k. The fraction of the top k documents that are relevant.
    RR@k: Reciprocal Rank. 1/r, where r is the rank of the first relevant document in the top k (0 if there is none).
    ERR@k: Expected Reciprocal Rank. Discounts documents that appear below a very relevant document. (Can be used for list wise comparison.)
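A sketch of these metrics on a single ranked list of grades (0 to 4, top result first). Some choices here are assumptions: the common exponential gain (2^grade - 1) / log2(rank + 1) for DCG, treating grade 1 and higher as "relevant" for P@k, RR@k and AP, and computing AP over the retrieved list only. MAP is the mean of `average_precision` over all queries. RankLib's exact conventions may differ in details.

```python
import math

def dcg_at_k(grades, k):
    """DCG with exponential gain: sum of (2^g - 1) / log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))

def ndcg_at_k(grades, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0

def precision_at_k(grades, k, threshold=1):
    return sum(1 for g in grades[:k] if g >= threshold) / k

def rr_at_k(grades, k, threshold=1):
    for rank, g in enumerate(grades[:k], start=1):
        if g >= threshold:
            return 1.0 / rank
    return 0.0

def average_precision(grades, threshold=1):
    """Mean of P@k over the ranks k that hold a relevant document."""
    hits, total = 0, 0.0
    for rank, g in enumerate(grades, start=1):
        if g >= threshold:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def err_at_k(grades, k, max_grade=4):
    """Expected Reciprocal Rank: a very relevant document early in the list
    makes it unlikely the user continues, so later documents count less."""
    err, p_continue = 0.0, 1.0
    for rank, g in enumerate(grades[:k], start=1):
        r = (2 ** g - 1) / 2 ** max_grade  # probability this document satisfies
        err += p_continue * r / rank
        p_continue *= 1 - r
    return err

ranking = [4, 1, 4, 0]
print(ndcg_at_k(ranking, 4), precision_at_k(ranking, 4), err_at_k(ranking, 4))
```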
  • 44. Training results
    Model               MAP     NDCG@10  DCG@10   P@10    RR@10  ERR@10
    MART                0.999   0.979    38.7047  0.9333  1.0    0.9618
    RankNet             0.8806  0.8831   33.298   0.8933  0.9    0.5988
    RankBoost           1.0     1.0      39.8699  0.9333  1.0    0.9629
    AdaRank (List)      0.9493  0.8271   31.4686  0.8733  0.8    0.7597
    Coordinate Ascent   0.9446  0.8973   36.3888  0.9133  1.0    0.96
    LambdaRank          0.9562  0.7079   35.7277  0.8933  0.9    0.7598
    LambdaMART          0.9725  0.9805   38.788   0.9333  1.0    0.9618
    ListNet (List)      0.9496  0.8973   33.1995  0.8933  1.0    0.8229
    Random Forests      0.999   0.979    38.7047  0.9333  1.0    0.9618
    Linear Regression   0.9463  0.8602   34.1433  0.8933  0.9    0.7906
  • 45. 5. Deploy the model • The model, including its learned parameters, is stored in Elasticsearch. • Using the plugin, we can now re-rank the top-n results
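A sketch of the request that stores a trained RankLib model in Elasticsearch. It assumes the LTR plugin's `_ltr/_featureset/<featureset>/_createmodel` endpoint and its `model/ranklib` model type, with the text saved by RankLib's `-save` option as the definition. The featureset name is hypothetical. The model name `test_6` is the one used in the rescore query on the next slide. The sketch only builds the path and body; sending them is left to your HTTP client.

```python
import json

FEATURESET = "rolling500_features"  # hypothetical featureset name

def create_model_request(featureset, model_name, ranklib_model_text):
    """Return (path, body) for uploading a RankLib model to the LTR plugin."""
    path = f"_ltr/_featureset/{featureset}/_createmodel"
    body = {
        "model": {
            "name": model_name,
            "model": {
                "type": "model/ranklib",
                "definition": ranklib_model_text,
            },
        }
    }
    return path, body

# In practice, read the file written by RankLib's -save option instead
model_text = "## LambdaMART\n## (placeholder for the saved model)\n"
path, body = create_model_request(FEATURESET, "test_6", model_text)
print("POST", path)
print(json.dumps(body, indent=2))
```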
  • 46. GET rolling500/_search { "query": { "multi_match": { "query": "rolling", "fields": ["album.analyzed", "artist.analyzed", "information"] } }, "rescore": { "window_size": 1000, "query": { "rescore_query": { "sltr": { "params": {"keywords": "rolling"}, "model": "test_6" } } } } }
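What the `rescore` section does can be sketched in a few lines. This is a simplified, single-shard model that assumes Elasticsearch's defaults (`score_mode` "total", `query_weight` and `rescore_query_weight` both 1). Only the top `window_size` hits get the model score (`sltr`) added and are re-sorted; the remaining hits keep their original order after them. The `model_score` callable is a hypothetical stand-in for the stored model. Real Elasticsearch applies the window per shard.

```python
def rescore(hits, model_score, window_size,
            query_weight=1.0, rescore_query_weight=1.0):
    """Simplified rescoring with score_mode "total" on one shard.

    `hits` is a list of (doc_id, query_score) sorted by query_score.
    """
    window = [
        (doc, query_weight * score + rescore_query_weight * model_score(doc))
        for doc, score in hits[:window_size]
    ]
    window.sort(key=lambda hit: hit[1], reverse=True)
    return window + hits[window_size:]

hits = [("a", 3.0), ("b", 2.0), ("c", 1.0)]
model_scores = {"a": 0.0, "b": 5.0, "c": 10.0}  # hypothetical model output
print(rescore(hits, model_scores.get, window_size=2))
```

Document "c" would have the highest model score, but it falls outside the window and is never rescored. This is why `window_size` is set generously (1000) in the query above.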
  • 47. DEMO