LEARNING TO RANK
SEARCH RESULTS
Jettro Coenradie &
Byron Voorbach
#JFALL
COMBINE MACHINE LEARNING WITH SEARCH
[Image: the planets of the solar system, source: https://solarsystem.nasa.gov/resources/all/]
Jettro Coenradie
@jettroCoenradie
https://www.linkedin.com/in/jettro/
https://github.com/jettro
• Fellow at Luminis Amsterdam
• specialised in (Elastic) search
• experimenting with Machine Learning
Byron Voorbach
@byronvoorbach
https://www.linkedin.com/in/byronvoorbach/
https://github.com/byronvoorbach
• Search & Data Engineer at Luminis Amsterdam
• building and optimising search engines
Search and Ranking
in eCommerce
Order Matters
DEMO
How do we get from ‘Call of duty’ to a list of games?
Meet Elasticsearch
[Diagram: Content Pipeline feeding Elasticsearch (Analyse, Document Store, Inverted Index); a { query } is matched against the index (Match)]
Recap: Inverted Index
Doc Id  Title
1       Fifa
2       Call of Duty
3       God of War
4       PES
5       Doodle God

Term    doc_ids  ttf
fifa    1        1
call    2        1
of      2,3      2
duty    2        1
god     3,5      2
war     3        1
pes     4        1
doodle  5        1
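The recap table can be reproduced with a few lines of Python. This is a minimal sketch, assuming a lowercase-and-split-on-whitespace analyser; Elasticsearch's real analysis chain is configurable and does more. The name `build_inverted_index` is illustrative only.

```python
from collections import defaultdict

# Titles from the recap slide, keyed by document id.
DOCS = {
    1: "Fifa",
    2: "Call of Duty",
    3: "God of War",
    4: "PES",
    5: "Doodle God",
}


def build_inverted_index(docs):
    """Map each term to (sorted doc ids, total term frequency)."""
    postings = defaultdict(set)
    ttf = defaultdict(int)
    for doc_id, title in docs.items():
        # Assumption: lowercase + whitespace split stands in for the analyser.
        for term in title.lower().split():
            postings[term].add(doc_id)
            ttf[term] += 1
    return {term: (sorted(ids), ttf[term]) for term, ids in postings.items()}


index = build_inverted_index(DOCS)
for term, (doc_ids, freq) in index.items():
    print(term, doc_ids, freq)
```

Looking up a query term is then a dictionary access, which is why matching stays fast even for large collections.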
{
  "title": "Call of Duty®: Black Ops 4",
  "image": "rs-137178-883f5fe955b2745cd539.jpg",
  "description": "<p>Digital Standard Edition includes: - 1,100 Call of Duty® Points* - Digital Edition Bonus Items: -- Specialist Outfit for all Specialists -- Gesture -- Calling Card, Emblem, Sticker and Tag inspired by the iconic Call of Duty®: Black Ops 4 skull. Black Ops is back! Featuring gritty, grounded, fluid Multiplayer combat, the biggest Zombies offering ever with three full undead adventures at launch, and Blackout, where the universe of Black Ops comes to life in one massive battle royale experience.</p>",
  "rating": 4.5,
  "numberOfRatings": 279,
  "vendor": "Activision",
  "price": 59.99,
  "releaseDate": "2018-12-10",
  "id": 350640
}
curl -XGET "http://localhost:9200/ecommproduct/_search" \
  -H 'Content-Type: application/json' -d'
{
"query": {
"multi_match": {
"query": "call of duty",
"fields": [
"title",
"description"
]
}
}
}'
Matching
{ query } → matching documents (unordered): 1, 3, 20, 211, 410
Ranking
{ query } → the same documents ordered by score: 410, 211, 1, 20, 3
Ranking with BM25
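To make the BM25 score less of a black box, here is a minimal sketch of the textbook per-term formula. The defaults `k1=1.2` and `b=0.75` are the usual Elasticsearch defaults; Lucene versions differ in constant factors, so treat the absolute numbers as illustrative. The function name and the tiny corpus statistics (taken from the five titles of the inverted index recap) are assumptions of this sketch.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Score one query term against one field of one document."""
    # Rare terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Term frequency saturates with k1; b controls length normalisation.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm


# "call of duty" against the title "Call of Duty" (3 terms) in a corpus of
# 5 titles with an average length of 2 terms.
doc_freqs = {"call": 1, "of": 2, "duty": 1}
score = sum(
    bm25_term_score(tf=1, doc_len=3, avg_doc_len=2.0, doc_count=5, doc_freq=df)
    for df in doc_freqs.values()
)
print(score)
```

The score of a multi-term query is the sum of the per-term scores, so a frequent term such as "of" contributes less than "call" or "duty".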
Change Ranking of results
Field centric boosting
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "call of duty",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "description": "call of duty"
          }
        }
      ]
    }
  }
}
Change Ranking of results
Function score: field value
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "call of duty"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "numberOfRatings",
            "modifier": "log1p"
          }
        }
      ]
    }
  }
}
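What the `log1p` modifier does to the score can be sketched in a few lines. This assumes the documented behaviour of `field_value_factor` (add 1 to the field value and take the common logarithm) and the default `boost_mode` of multiplying the function result with the query score; the function names are illustrative.

```python
import math


def log1p_factor(field_value, factor=1.0):
    """field_value_factor with modifier log1p: log10(1 + factor * value)."""
    return math.log10(1 + factor * field_value)


def function_score(query_score, field_value):
    """Assumes the default boost_mode, which multiplies both scores."""
    return query_score * log1p_factor(field_value)


# A product with 279 ratings versus one with 9 ratings, same text score.
print(function_score(2.0, 279))
print(function_score(2.0, 9))
```

The logarithm dampens the effect: 31 times as many ratings yields a boost that is only about 2.4 times as large.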
Change Ranking of results
Function score: decay
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "call of duty"
        }
      },
      "gauss": {
        "releaseDate": {
          "scale": "365d",
          "offset": "180d",
          "decay": 0.5
        }
      }
    }
  }
}
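The shape of the Gaussian decay can be sketched as follows, based on the formula in the Elasticsearch function score documentation. Expressing distances in days is an assumption of this sketch, as is the function name `gauss_decay`; the example values stand for roughly one year of scale and six months of offset.

```python
import math


def gauss_decay(distance_days, scale_days, offset_days=0.0, decay=0.5):
    """Score multiplier for a document `distance_days` away from the origin."""
    # sigma^2 is chosen so that the score equals `decay` at offset + scale.
    sigma_sq = -scale_days ** 2 / (2 * math.log(decay))
    # Within the offset there is no penalty at all.
    effective = max(0.0, abs(distance_days) - offset_days)
    return math.exp(-effective ** 2 / (2 * sigma_sq))


for days in (0, 180, 545, 1000):
    print(days, round(gauss_decay(days, scale_days=365, offset_days=180), 3))
```

Documents released within the offset keep their full score; at offset plus scale the multiplier has dropped to the configured `decay` value, and it keeps falling beyond that.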
Learning To Rank
Learning to rank or machine-learned ranking is the application of machine
learning, typically supervised, semi-supervised or reinforcement learning, in the
construction of ranking models for information retrieval systems.
~ Wikipedia
http://bit.ly/ltr-wp
Supervised Machine Learning
Input              Labelled output   Predicted
4 rooms, 100 m2    € 200k            € 210k
6 rooms, 150 m2    € 350k            € 370k
3 rooms, 500 m2    € 750k            € 800k
Train the Model on the labelled output; the Error is the difference between labelled and predicted output.
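Learning to rank
X: Query 1, Query 2, Query 3, Query 4
Model: f(X)
Document lists per query, as shown on the slide (Y and Y’):
Query 1:  5 15 67 3 17   |  5 3 67 15 17
Query 2:  23 3 7 88 45   |  23 3 17 8 45
Query 3:  3 27 25 23 5   |  3 27 25 6 15
Query 4:  6 99 22 27 33  |  6 9 32 27 33
Predict, Error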
Model Evaluation Types (Errors)
• Binary relevance (MAP, Precision)
• Graded relevance, position based (DCG, NDCG)
  • Discount depends only on the position in the result list
• Graded relevance, cascade based (ERR)
  • Discounts based on user interaction with the results
http://bit.ly/eval-metric
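MAP using Binary relevance
X: fifa
Y’    Relevance      Precision
23    relevant       1
9     not relevant
88    relevant       0.67
33    not relevant
45    relevant       0.6
Average Precision = (1 + 0.67 + 0.6) / 3 = 0.76
MAP is the mean of the Average Precision over all queries.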
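The calculation on this slide can be sketched in Python. The relevance pattern (relevant at positions 1, 3 and 5) is inferred from the precision values 1, 0.67 and 0.6; the function names are illustrative. Note that this sketch averages over the relevant documents that were retrieved, which matches the slide; some definitions divide by all relevant documents in the collection instead.

```python
def average_precision(relevant_flags):
    """Average of the precision at each position that holds a relevant doc."""
    hits = 0
    precisions = []
    for position, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_query_flags):
    """MAP: the mean of the average precision over all queries."""
    scores = [average_precision(flags) for flags in per_query_flags]
    return sum(scores) / len(scores)


# Result list for "fifa": documents 23, 9, 88, 33, 45.
fifa = [True, False, True, False, True]
print(round(average_precision(fifa), 2))  # 0.76
```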
DCG using Graded relevance
X: fifa
Documents ranked on 0 - 4 relevance scale
Y’          Discounted Cumulative Gain
23 rel=3    3
9 rel=2     3 + 2/log2(2) = 5
88 rel=4    5 + 4/log2(3) = 7.52
33 rel=1    7.52 + 1/log2(4) = 8.02
45 rel=0    8.02
NDCG using Graded relevance
X: fifa
Documents ranked on 0 - 4 relevance scale
Y’          DCG     Y (ideal order)   MaxDCG   NDCG
23 rel=3    3       88 rel=4          4        0.75
9 rel=2     5       23 rel=3          7        0.71
88 rel=4    7.52    9 rel=2           8.26     0.91
33 rel=1    8.02    33 rel=1          8.76     0.92
45 rel=0    8.02    45 rel=0          8.76     0.92
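A minimal sketch of DCG and NDCG, following the worked sums on the slides: the first position is not discounted and position i (for i of 2 or more) is divided by log2(i). Many libraries use log2(i + 1) instead, so absolute values can differ between tools. The function names are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain over a ranked list of relevance grades."""
    total = 0.0
    for position, rel in enumerate(relevances, start=1):
        # log2(1) is 0, so the first position is left undiscounted.
        discount = 1.0 if position == 1 else math.log2(position)
        total += rel / discount
    return total


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Grades for documents 23, 9, 88, 33, 45 in the predicted order.
predicted = [3, 2, 4, 1, 0]
for k in range(1, len(predicted) + 1):
    top_k = predicted[:k]
    ideal_k = sorted(predicted, reverse=True)[:k]
    print(k, dcg(top_k), dcg(ideal_k), dcg(top_k) / dcg(ideal_k))
```

The loop prints DCG, MaxDCG and NDCG at each cut-off k, where MaxDCG uses the k best grades of the whole list.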
Learning to rank - Model
X
Model
Y
Y’
Algorithm
Parameters
Cost function
LTR: Steps to take
1. Create Judgement List (Ground Truth)
2. Define features for the model
3. Log features during usage
4. Training and testing the model
5. Deploying and using the model
6. Feedback loop
1. Judgement List
Obtain labelled data to train the model
Judgement List: Expert Panel
• Time consuming and Expensive
• Error prone due to different judgements
http://bit.ly/ebay-hum-judge
Experiment: Pet or Not
Pet or Not
Pet or Not
Pet or Not
Judgement List: Implicit Feedback
• Log user behaviour
• Compare actual clicks versus expected clicks
• A click is not a relevance judgement
Judgement List: Implicit Feedback
• Use as a signal to the ranking algorithm -> Feature
• Use as Label to train the model -> Ground truth
Using the LTR Plugin and the
python scripts
https://github.com/o19s/elasticsearch-learning-to-rank
Our Judgement List
# grade (0-4) queryid docId title
#
# Add your keyword strings below, the feature script will
# Use them to populate your query templates
#
# qid:1: fifa
# qid:2: football
# qid:3: call of duty
# qid:4: marvel
# qid:5: basketball
# qid:6: god
#
4 qid:1 # 1538781503000 FIFA 18
2 qid:1 # 1536187840000 EA SPORTS FIFA 16
3 qid:1 # 1538107776000 FIFA 19 Ultimate Edition
4 qid:1 # 1538694141000 FIFA 19
3 qid:1 # 1536293937000 EA SPORTS FIFA 17 Standard Edition
1 qid:1 # 1536022097000 FIFA 15
3 qid:2 # 1538509479000 PRO EVOLUTION SOCCER 2018
4 qid:2 # 1535257488000 Pro Evolution Soccer 2018 FC Barcelona Edition
1 qid:2 # 1536293937000 EA SPORTS FIFA 17 Standard Edition
1 qid:2 # 1538781636000 2MD: VR Football
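A judgement list in this layout is easy to read back in Python. The sketch below assumes whitespace-separated fields exactly as shown above (grade, `qid:<n>`, `#`, document id, title) and keyword headers of the form `# qid:<n>: <keywords>`; the parser and its names are illustrative and not part of the plugin's scripts.

```python
import re

HEADER = re.compile(r"^#\s*qid:(\d+):\s*(.+)$")
JUDGEMENT = re.compile(r"^(\d+)\s+qid:(\d+)\s+#\s+(\S+)\s*(.*)$")


def parse_judgements(lines):
    """Return ({qid: keywords}, [(grade, qid, doc_id, title), ...])."""
    keywords = {}
    judgements = []
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            keywords[int(header.group(1))] = header.group(2)
            continue
        match = JUDGEMENT.match(line)
        if match:
            grade, qid, doc_id, title = match.groups()
            judgements.append((int(grade), int(qid), doc_id, title))
        # Anything else (plain comments, blank lines) is skipped.
    return keywords, judgements


SAMPLE = [
    "# grade (0-4) queryid docId title",
    "# qid:1: fifa",
    "# qid:2: football",
    "4 qid:1 # 1538781503000 FIFA 18",
    "1 qid:2 # 1538781636000 2MD: VR Football",
]
keywords, judgements = parse_judgements(SAMPLE)
print(keywords)
print(judgements)
```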
2. features for the model
• Raw Term Statistics
• Document Frequency
• Total Term Frequency
• Also max, min, sum (in case of multiple terms, fields)
• Elasticsearch queries
2. features for the model
{
"query": {
"match": {
"title": "{{keywords}}"
}
}
}
{
  "query": {
    "match": {
      "description": "{{keywords}}"
    }
  }
}
{
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "rating",
"missing": 0
}
}
]
}
}
}
{
"query": {
"function_score": {
"gauss": {
"releaseDate": {
"scale": "720d",
"decay": 0.25
}
}
}
}
}
{
"query": {
"nested": {
"path": "basket",
"query": {
"function_score": {
"query": {
"term": {
"basket.term": {
"value": "{{keywords}}"
}
}
},
"functions": [
{
"field_value_factor": {
"field": "basket.clicks",
"modifier": "log1p"
}
}
]
}
}
}
}
}
{
"query": {
"nested": {
"path": "clicks",
"query": {
"function_score": {
"query": {
"term": {
"clicks.term": {
"value": "{{keywords}}"
}
}
},
"functions": [
{
"field_value_factor": {
"field": "clicks.clicks",
"modifier": "log1p"
}
}
]
}
}
}
}
}
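Features like these are registered with the LTR plugin as a feature set. The sketch below only builds and prints the request body; it makes no network call. It assumes the body layout described in the plugin's documentation (a `featureset` with named `features`, each holding a mustache `template` that is the query clause itself), so check it against the plugin version you run. The feature set name `games_features` and the feature names are hypothetical.

```python
import json

# Hypothetical feature set; would be sent as POST _ltr/_featureset/games_features
FEATURESET_NAME = "games_features"

body = {
    "featureset": {
        "features": [
            {
                "name": "title_match",
                "params": ["keywords"],
                "template_language": "mustache",
                "template": {"match": {"title": "{{keywords}}"}},
            },
            {
                "name": "description_match",
                "params": ["keywords"],
                "template_language": "mustache",
                "template": {"match": {"description": "{{keywords}}"}},
            },
            {
                "name": "rating",
                "params": [],
                "template_language": "mustache",
                "template": {
                    "function_score": {
                        "functions": [
                            {"field_value_factor": {"field": "rating", "missing": 0}}
                        ]
                    }
                },
            },
        ]
    }
}

print("POST _ltr/_featureset/" + FEATURESET_NAME)
print(json.dumps(body, indent=2))
```

The order of the features matters: it determines the feature numbers (1, 2, 3, ...) that show up in the logged training data.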
3. Log features
2 qid:1 1:7.7917995 2:10.646265 3:4.5 4:0.99658567 5:1.0154246 6:1.475652 # 1538694141000 fifa
0 qid:1 1:7.7917995 2:12.036756 3:4.0 4:0.6522283 5:0.0 6:0.0 # 1538781503000 fifa
2 qid:1 1:5.3206625 2:10.542777 3:4.5 4:0.20758471 5:0.0 6:0.0 # 1536293937000 fifa
2 qid:1 1:7.7917995 2:9.694633 3:4.5 4:0.00240296 5:0.0 6:0.0 # 1536022097000 fifa
2 qid:1 1:6.3233795 2:11.195704 3:4.5 4:0.03137532 5:0.0 6:0.0 # 1536187840000 fifa
4 qid:1 1:6.3233795 2:11.144586 3:0.0 4:0.99658567 5:1.0154246 6:1.475652 # 1538107776000 fifa
4 qid:2 1:0.0 2:6.58323 3:4.5 4:0.99142087 5:1.0787467 6:1.1217693 # 1538101584000 basketball
0 qid:2 1:0.0 2:5.237752 3:4.0 4:0.99915665 5:0.0 6:0.0 # 1538152581000 basketball
1 qid:2 1:0.0 2:5.9230056 3:4.0 4:0.9891866 5:0.8548865 6:0.8889811 # 1538360829000 basketball
1 qid:2 1:0.0 2:4.897768 3:5.0 4:0.19836442 5:0.0 6:0.0 # 1537838613000 basketball
0 qid:2 1:0.0 2:5.237752 3:4.0 4:0.6326628 5:0.0 6:0.0 # 1537838447000 basketball
1 qid:2 1:0.0 2:5.3618174 3:3.0 4:0.259035 5:0.0 6:0.0 # 1538781628000 basketball
1 qid:2 1:0.0 2:7.574521 3:5.0 4:0.7088474 5:0.0 6:0.0 # 1535173832000 basketball
0 qid:2 1:0.0 2:5.237752 3:4.5 4:0.9410281 5:0.0 6:0.0 # 1538518103000 basketball
1 qid:2 1:0.0 2:7.3292704 3:4.5 4:0.99142087 5:1.0787467 6:1.1217693 # 1538619230000 basketball
1 qid:2 1:0.0 2:9.499237 3:4.0 4:0.27305022 5:0.0 6:0.0 # 1536946885000 basketball
1 qid:3 1:5.0056663 2:7.690156 3:5.0 4:0.18467863 5:0.0 6:0.0 # 1538509528000 god
1 qid:3 1:4.608784 2:7.9265423 3:5.0 4:0.9016471 5:0.5393733 6:0.8889811 # 1536297114000 god
1 qid:3 1:4.2702136 2:8.228076 3:5.0 4:0.34165436 5:0.0 6:0.0 # 1535775896000 god
1 qid:3 1:0.0 2:7.043888 3:0.0 4:0.7977039 5:0.0 6:0.0 # 1537244713000 god
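Each logged line holds the grade, the query id, the numbered feature values and, after the `#`, the document id and the keywords. A minimal sketch of a parser for this layout follows; the function name is illustrative, and the comment part is assumed to hold exactly the document id followed by the keywords, as in the lines above.

```python
def parse_feature_line(line):
    """Split one training line into grade, qid, features, doc id, keywords."""
    data, _, comment = line.partition("#")
    tokens = data.split()
    grade = int(tokens[0])
    qid = int(tokens[1].split(":", 1)[1])
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    # Assumption: the comment is "<doc id> <keywords>".
    comment_parts = comment.split(None, 1)
    doc_id = comment_parts[0] if comment_parts else None
    keywords = comment_parts[1].strip() if len(comment_parts) > 1 else None
    return grade, qid, features, doc_id, keywords


line = ("2 qid:1 1:7.7917995 2:10.646265 3:4.5 4:0.99658567 "
        "5:1.0154246 6:1.475652 # 1538694141000 fifa")
grade, qid, features, doc_id, keywords = parse_feature_line(line)
print(grade, qid, features, doc_id, keywords)
```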
4. Train and test model
• Making use of Ranklib
• Can specify separate train, validation and test set
• Can normalise feature sets
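A training run boils down to one RankLib command line. The sketch below only assembles that command in Python and prints it. The flags (`-train`, `-validate`, `-test`, `-ranker`, `-metric2t`, `-metric2T`, `-norm`, `-save`) are RankLib options; the jar location and all file names are hypothetical. Ranker 6 selects LambdaMART.

```python
def ranklib_train_command(train_file, model_file, ranker=6, metric="NDCG@10",
                          validation_file=None, test_file=None, norm=None,
                          jar="RankLib.jar"):
    """Build the argument list for a RankLib training run."""
    cmd = ["java", "-jar", jar,
           "-train", train_file,
           "-ranker", str(ranker),
           "-metric2t", metric,   # metric optimised on the training data
           "-save", model_file]
    if validation_file:
        cmd += ["-validate", validation_file]
    if test_file:
        # metric2T is the metric reported on the test data.
        cmd += ["-test", test_file, "-metric2T", metric]
    if norm:
        cmd += ["-norm", norm]    # sum, zscore or linear
    return cmd


cmd = ranklib_train_command(
    "train.txt", "model.txt",
    validation_file="validation.txt", test_file="test.txt", norm="zscore",
)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the jar and the files exist.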
Models using Ranklib
MART
Multiple Additive Regression Trees, a gradient boosting machine. Can be used for
regression as well as classification.
RankNet
Compares two feature vectors (a pair of documents) and is trained with stochastic
gradient descent on a cost function.
RankBoost
Based on AdaBoost, combining many weak rankings into a single highly accurate ranking.
Uses pairwise comparison.
AdaRank
Combines a number of weak learners in a linear way. Builds on AdaBoost, but directed
more at ranking.
Coordinate Ascent
Optimises one parameter at a time, keeping the others constant. Done iteratively until
a convergence criterion is met.
LambdaRank
Optimisation of RankNet that only looks at the gradients, pictured as arrows
indicating how much each document needs to move up or down.
LambdaMART
Combines the gradients of LambdaRank with the trees of MART.
ListNet
Uses a listwise loss function, a neural network and gradient descent. Similar to RankNet;
the only difference is the list versus pair loss function.
Random Forests
A number of trees vote for the most popular class for a vector of features. One tree
may be no better than a random choice, but a forest is.
Linear Regression
Most of the time too simplistic for the learning to rank problem with lots of features, but
good to have available to at least try.
Evaluation metrics
MAP: Mean Average Precision. The mean over all queries of the Average Precision, which averages P@k over the positions k of the relevant documents.
DCG@k: Discounted Cumulative Gain. Adds up the relevance of the top k documents, each discounted by its position, making the first documents more important.
NDCG@k: Normalised DCG. The DCG divided by the highest achievable DCG, giving a value between 0 and 1.
P@k: Precision at k. The fraction of the top k documents that is relevant.
RR@k: Reciprocal Rank. 1/K, where K is the position of the first relevant document.
ERR@k: Expected Reciprocal Rank. Discounts documents that are below a very relevant document.
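The three remaining metrics fit in a few lines each. This is a minimal sketch: P@k and RR work on binary relevance flags, while ERR follows the common cascade formulation in which a grade g satisfies the user with probability (2^g - 1) / 2^max_grade. The function names and the default `max_grade=4` (the 0 - 4 scale used in this deck) are assumptions of the sketch.

```python
def precision_at_k(relevant_flags, k):
    """Fraction of the top k results that is relevant."""
    return sum(1 for flag in relevant_flags[:k] if flag) / k


def reciprocal_rank(relevant_flags, k=None):
    """1 / position of the first relevant result, 0.0 if there is none."""
    candidates = relevant_flags if k is None else relevant_flags[:k]
    for position, flag in enumerate(candidates, start=1):
        if flag:
            return 1.0 / position
    return 0.0


def expected_reciprocal_rank(grades, max_grade=4, k=None):
    """ERR: reciprocal rank weighted by the chance the user gets that far."""
    candidates = grades if k is None else grades[:k]
    err = 0.0
    p_continue = 1.0  # probability that no earlier result satisfied the user
    for position, grade in enumerate(candidates, start=1):
        p_satisfied = (2 ** grade - 1) / 2 ** max_grade
        err += p_continue * p_satisfied / position
        p_continue *= 1 - p_satisfied
    return err


flags = [True, False, True, False, True]
print(precision_at_k(flags, 5), reciprocal_rank(flags))
print(expected_reciprocal_rank([3, 2, 4, 1, 0]))
```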
Training results
Model              MAP     NDCG@10  DCG@10   P@10    RR@10  ERR@10
MART               0.999   0.979    38.7047  0.9333  1.0    0.9618
RankNet            0.8806  0.8831   33.298   0.8933  0.9    0.5988
RankBoost          1.0     1.0      39.8699  0.9333  1.0    0.9629
AdaRank (List)     0.9493  0.8271   31.4686  0.8733  0.8    0.7597
Coordinate Ascent  0.9446  0.8973   36.3888  0.9133  1.0    0.96
LambdaRank         0.9562  0.7079   35.7277  0.8933  0.9    0.7598
LambdaMART         0.9725  0.9805   38.788   0.9333  1.0    0.9818
ListNet (List)     0.9496  0.8973   33.1995  0.8933  1.0    0.8229
Random Forests     0.999   0.979    38.7047  0.9333  1.0    0.9618
Linear Regression  0.9463  0.8602   34.1433  0.8933  0.9    0.7906
5. Deploy the model
• The model, including the learned parameters, is stored in Elasticsearch.
• Using the plugin we can now re-rank the top-n results
GET ecommproduct/_search
{
  "query": {
    "multi_match": {
      "query": "call of duty",
      "fields": ["title", "description"]
    }
  },
  "rescore": {
    "window_size": 100,
    "query": {
      "rescore_query": {
        "sltr": {
          "params": {"keywords": "call of duty"},
          "model": "test_6"
        }
      }
    }
  }
}
DEMO
6. Feedback Loop
• Register clicks by users (and other events)
• Use click data for predicting labels
Click models
• Random Click Model -> Every document has the same chance of being
clicked
• Click Through Rate Model -> Uses the fact that the first document is
clicked far more than the second document
• Cascade Model -> A click in the third item also tells us something about
the first two items. Only one click per session is assumed.
• Dynamic Bayesian Network Model -> Supports multiple clicks in a search
session and the difference in actual relevance of a document.
http://bit.ly/click-model
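To give a feel for how a click model turns logs into relevance estimates, here is a minimal sketch of the cascade model. It assumes the user scans from the top and stops at the first click, so every document up to and including that click counts as examined; sessions without a click examine the whole list. This is a simplified illustration, not the implementation from the click model library used in the talk, and the function name is hypothetical.

```python
from collections import defaultdict


def cascade_attractiveness(sessions):
    """Estimate per-document attractiveness as clicks / examinations.

    sessions: iterable of (doc_ids, clicks) pairs in ranked order.
    """
    examined = defaultdict(int)
    clicked = defaultdict(int)
    for doc_ids, clicks in sessions:
        for doc_id, click in zip(doc_ids, clicks):
            examined[doc_id] += 1
            if click:
                clicked[doc_id] += 1
                break  # cascade assumption: the user stops at the first click
    return {doc_id: clicked[doc_id] / examined[doc_id] for doc_id in examined}


sessions = [
    (["a", "b", "c"], [0, 1, 0]),
    (["a", "b", "c"], [1, 0, 0]),
    (["b", "a", "c"], [0, 0, 0]),
]
print(cascade_attractiveness(sessions))
```

A skipped document above a click counts against it, which is exactly the extra information the cascade model extracts compared to a plain click-through rate.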
Dynamic Bayesian Network
[Diagram: network with nodes Ei-1, Ei, Ei+1, Ai, Si and Ci, and parameters au and su]
http://bit.ly/dbn-clickmodel
Ei - Did the user examine the url
Ai - Was the user attracted by the url
Ci - Did the user click the url
Si - Was the user satisfied with the landing page
au - Probability of being attracted by the url
su - Probability of being satisfied by landing page
[Diagram labels: Webshop, Logs (Queries, Clicks), varepsilon, Judgments, Learning data]
Query id: 82369cad-7ca3-4b69-ba8c-2eaf16446e55
Query Text: call of duty
Document ids: ["1538784192000","1537838838000","1538799761000","1538800073000","1536295676000","1536295271000","1536296646000","1536292134000","1536292118000","1536295845000"]
Clicks: [1,0,0,0,1,0,0,0,1,0]
Python
…
fifa:1538694141000:2
fifa:1538781503000:0
fifa:1536293937000:2
fifa:1536022097000:2
fifa:1536187840000:2
fifa:1538107776000:4
call of duty:1536295845000:3
call of duty:1537838838000:2
call of duty:1538784192000:4
call of duty:1536295271000:2
call of duty:1538799761000:2
call of duty:1538800073000:2
call of duty:1536296646000:2
call of duty:1536295676000:3
call of duty:1536292134000:4
call of duty:1536292118000:4
…
# qid:1: fifa
# qid:2: basketball
# qid:3: god
# qid:4: football
# qid:5: call of duty
# qid:6: marvel
#
#
2 qid:1 # 1538694141000
0 qid:1 # 1538781503000
2 qid:1 # 1536293937000
2 qid:1 # 1536022097000
2 qid:1 # 1536187840000
4 qid:1 # 1538107776000
4 qid:2 # 1538101584000
0 qid:2 # 1538152581000
1 qid:2 # 1538360829000
1 qid:2 # 1537838613000
0 qid:2 # 1537838447000
1 qid:2 # 1538781628000
1 qid:2 # 1535173832000
0 qid:2 # 1538518103000
DEMO
Questions
https://www.elastic.co
https://github.com/o19s/elasticsearch-learning-to-rank
https://github.com/varepsilon/clickmodels
https://www.linkedin.com/in/byronvoorbach/
https://www.linkedin.com/in/jettro/
“Please rate our
talk in the official
J-FALL app”
#JFALL

More Related Content

What's hot

Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Databricks
 
Elk devops
Elk devopsElk devops
Elk devops
Ideato
 
Elk - An introduction
Elk - An introductionElk - An introduction
Elk - An introduction
Hossein Shemshadi
 
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
Edureka!
 
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in FlinkMaxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Flink Forward
 
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Databricks
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
Databricks
 
Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdf
Sease
 
Machine Learning and the Elastic Stack
Machine Learning and the Elastic StackMachine Learning and the Elastic Stack
Machine Learning and the Elastic Stack
Yann Cluchey
 
Eland: A Python client for data analysis and exploration
Eland: A Python client for data analysis and explorationEland: A Python client for data analysis and exploration
Eland: A Python client for data analysis and exploration
Elasticsearch
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
Flink Forward
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
YuHsuan Chen
 
What is Python? | Edureka
What is Python? | EdurekaWhat is Python? | Edureka
What is Python? | Edureka
Edureka!
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
confluent
 
OSMC 2022 | Logstash, Beats, Elastic Agent, Open Telemetry — what’s the right...
OSMC 2022 | Logstash, Beats, Elastic Agent, Open Telemetry — what’s the right...OSMC 2022 | Logstash, Beats, Elastic Agent, Open Telemetry — what’s the right...
OSMC 2022 | Logstash, Beats, Elastic Agent, Open Telemetry — what’s the right...
NETWAYS
 
GraphQL Basics
GraphQL BasicsGraphQL Basics
GraphQL Basics
LeanIX GmbH
 
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU ClustersScalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Databricks
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Lucidworks
 
Log analysis with elastic stack
Log analysis with elastic stackLog analysis with elastic stack
Log analysis with elastic stack
Bangladesh Network Operators Group
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
Knoldus Inc.
 

What's hot (20)

Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
Elk devops
Elk devopsElk devops
Elk devops
 
Elk - An introduction
Elk - An introductionElk - An introduction
Elk - An introduction
 
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
 
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in FlinkMaxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
 
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdf
 
Machine Learning and the Elastic Stack
Machine Learning and the Elastic StackMachine Learning and the Elastic Stack
Machine Learning and the Elastic Stack
 
Eland: A Python client for data analysis and exploration
Eland: A Python client for data analysis and explorationEland: A Python client for data analysis and exploration
Eland: A Python client for data analysis and exploration
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 
What is Python? | Edureka
What is Python? | EdurekaWhat is Python? | Edureka
What is Python? | Edureka
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 
OSMC 2022 | Logstash, Beats, Elastic Agent, Open Telemetry — what’s the right...
OSMC 2022 | Logstash, Beats, Elastic Agent, Open Telemetry — what’s the right...OSMC 2022 | Logstash, Beats, Elastic Agent, Open Telemetry — what’s the right...
OSMC 2022 | Logstash, Beats, Elastic Agent, Open Telemetry — what’s the right...
 
GraphQL Basics
GraphQL BasicsGraphQL Basics
GraphQL Basics
 
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU ClustersScalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
Log analysis with elastic stack
Log analysis with elastic stackLog analysis with elastic stack
Log analysis with elastic stack
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
 

Similar to Learning to rank search results

Supercharging your Organic CTR
Supercharging your Organic CTRSupercharging your Organic CTR
Supercharging your Organic CTR
Phil Pearce
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
Trey Grainger
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrlucenerevolution
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
Chetan Khatri
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Chester Chen
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Chetan Khatri
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
Modern Data Stack France
 
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
NETWAYS
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
Scott Clark
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
SigOpt
 
WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...Fabio Franzini
 
Useful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvmUseful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvm
Anton Shapin
 
Deploying Machine Learning Models to Production
Deploying Machine Learning Models to ProductionDeploying Machine Learning Models to Production
Deploying Machine Learning Models to Production
Anass Bensrhir - Senior Data Scientist
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
Vijayananda Mohire
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
Vijayananda Mohire
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
SigOpt
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
MLconf
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Databricks
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Jim Dowling
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Sease
 

Similar to Learning to rank search results (20)

Supercharging your Organic CTR
Supercharging your Organic CTRSupercharging your Organic CTR
Supercharging your Organic CTR
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solr
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
 
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 

Learning to rank search results

  • 1. LEARNING TO RANK SEARCH RESULTS Jettro Coenradie & Byron Voorbach #JFALL COMBINE MACHINE LEARNING WITH SEARCH
  • 3. Uranus, Saturn, Neptune, Jupiter, Mars, Venus, Earth, Mercury (images: https://solarsystem.nasa.gov/resources/all/)
  • 4. LEARNING TO RANK SEARCH RESULTS COMBINE MACHINE LEARNING WITH SEARCH
  • 5. Jettro Coenradie @jettroCoenradie https://www.linkedin.com/in/jettro/ https://github.com/jettro • Fellow at Luminis Amsterdam • specialised in (Elastic) search • experimenting with Machine Learning
  • 6. Byron Voorbach @byronvoorbach https://www.linkedin.com/in/byronvoorbach/ https://github.com/byronvoorbach • Search & Data Engineer at Luminis Amsterdam • building and optimising search engines
  • 13. DEMO
  • 14. How do we get from ‘Call of duty’ to a list of games?
  • 17. Recap: Inverted Index
      Doc Id | Title
      1      | Fifa
      2      | Call of Duty
      3      | God of War
      4      | PES
      5      | Doodle God

      Term   | doc_ids | ttf
      fifa   | 1       | 1
      call   | 2       | 1
      of     | 2,3     | 2
      duty   | 2       | 1
      god    | 3,5     | 2
      war    | 3       | 1
      pes    | 4       | 1
      doodle | 5       | 1
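A minimal Python sketch of how such an inverted index could be built from the five example titles. It assumes a naive analyser (lowercase, split on whitespace); a real Elasticsearch analyser does considerably more, and the function name `build_inverted_index` is only illustrative.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the doc ids containing it, plus its total term frequency (ttf)."""
    index = defaultdict(list)
    ttf = defaultdict(int)
    for doc_id, title in sorted(docs.items()):
        # Assumption: lowercase + whitespace split stands in for the analyser.
        for term in title.lower().split():
            ttf[term] += 1
            if doc_id not in index[term]:
                index[term].append(doc_id)
    return dict(index), dict(ttf)


docs = {1: "Fifa", 2: "Call of Duty", 3: "God of War", 4: "PES", 5: "Doodle God"}
index, ttf = build_inverted_index(docs)
print(index["of"], ttf["of"])
print(index["god"], ttf["god"])
```

Looking up each term of a query such as "call of duty" in `index` then yields the candidate documents to score.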
  • 18. { "title": "Call of Duty®: Black Ops 4", "image": "rs-137178-883f5fe955b2745cd539.jpg", "description": "<p>Digital Standard Edition includes: - 1,100 Call of Duty® Points* - Digital Edition Bonus Items: -- Specialist Outfit for all Specialists -- Gesture -- Calling Card, Emblem, Sticker and Tag inspired by the iconic Call of Duty®: Black Ops 4 skull. Black Ops is back! Featuring gritty, grounded, fluid Multiplayer combat, the biggest Zombies offering ever with three full undead adventures at launch, and Blackout, where the universe of Black Ops comes to life in one massive battle royale experience.</p>", "rating": 4.5, "numberOfRatings": 279, "vendor": "Activision", "price": 59.99, "releaseDate": "2018-12-10", "id": 350640 }
  • 19. curl -XGET "http://localhost:9200/ecommproduct/_search" -H 'Content-Type: application/json' -d' { "query": { "multi_match": { "query": "call of duty", "fields": [ "title", "description" ] } } }'
  • 22. Change Ranking of results: field centric boosting { "query": { "bool": { "should": [ { "match": { "title": { "query": "call of duty", "boost": 2 } } }, { "match": { "description": "call of duty" } } ] } } }
  • 23. Change Ranking of results. Function score: field value { "query": { "function_score": { "query": { "match": { "title": "call of duty" } }, "functions": [ { "field_value_factor": { "field": "numberOfRatings", "modifier": "log1p" } } ] } } }
  • 24. Change Ranking of results. Function score: decay { "query": { "function_score": { "query": { "match": { "title": "call of duty" } }, "gauss": { "releaseDate": { "scale": "1y", "offset": "6m", "decay": 0.5 } } } } }
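To get a feel for what the gauss decay does to a score, the sketch below reimplements the Gaussian decay curve as described in the Elasticsearch function score documentation. It is only an illustration, not the engine's code. Distances are expressed in days, and the values 365 and 180 are assumed stand-ins for the slide's scale and offset.

```python
import math


def gauss_decay(distance_days, scale_days, offset_days=0.0, decay=0.5):
    """Score multiplier for a document whose date is `distance_days` from the origin."""
    # sigma^2 is chosen so that the multiplier equals `decay` at offset + scale.
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    # Inside the offset nothing is discounted.
    effective = max(0.0, abs(distance_days) - offset_days)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


# Assumed example values: scale of 365 days, offset of 180 days, decay 0.5.
for days in (0, 180, 545, 910):
    print(days, round(gauss_decay(days, 365, 180, 0.5), 3))
```

Documents released within the offset keep their full score, and the multiplier drops to the configured decay value once a document is offset + scale away from the origin.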
  • 25. Learning To Rank Learning to rank or machine-learned ranking is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. ~ Wikipedia http://bit.ly/ltr-wp
  • 26. Machine Learning: Supervised. Input: 4 rooms, 100 m2; 6 rooms, 150 m2; 3 rooms, 500 m2. Labelled output: € 200k; € 350k; € 750k. Train the model, then predict: € 210k; € 370k; € 800k. Error: the difference between predicted and labelled output.
  • 27. Learning to rank: the queries X (Query 1 to Query 4) go into the model f(X), which predicts a ranked list of document ids Y' per query; the error is the difference between the predicted ranking Y' and the labelled ranking Y. [Diagram: example lists of document ids for Y and Y']
  • 28. Model Evaluation Types (Errors) • Binary relevance (MAP, Precision) • Graded relevance, position based (DCG, NDCG) • Only discounts based on relevance • Graded relevance, cascade based (ERR) • Discounts based on user interaction with the results http://bit.ly/eval-metric
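  • 29. MAP using Binary relevance. Query X: fifa. Predicted ranking Y': 23 (relevant), 9 (not relevant), 88 (relevant), 33 (not relevant), 45 (relevant). Precision at each relevant position: 1, 0.67, 0.6. Average Precision = (1 + 0.67 + 0.6) / 3 = 0.76. MAP is the mean of the Average Precision over all queries.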
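The Average Precision calculation above can be sketched in a few lines of Python. The sketch assumes every relevant document appears somewhere in the ranked list, so it divides by the number of relevant documents retrieved; the function names are illustrative.

```python
def average_precision(relevance):
    """Average of precision@k over the positions k that hold a relevant document."""
    hits = 0
    precisions = []
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(relevance_per_query):
    """Mean of the average precision over all queries."""
    scores = [average_precision(r) for r in relevance_per_query]
    return sum(scores) / len(scores)


# Ranking from the slide: relevant, not relevant, relevant, not relevant, relevant.
slide_ranking = [1, 0, 1, 0, 1]
print(round(average_precision(slide_ranking), 2))  # 0.76
```

With a single query, MAP and Average Precision coincide, which is why the slide reports 0.76 for both.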
  • 30. DCG using Graded relevance. Query X: fifa. Documents are graded on a 0 - 4 relevance scale. Predicted ranking Y': 23 (rel=3), 9 (rel=2), 88 (rel=4), 33 (rel=1), 45 (rel=0). Discounted Cumulative Gain per position: 3; 3 + 2/log2(2) = 5; 5 + 4/log2(3) = 7.52; 7.52 + 1/log2(4) = 8.02; 8.02 (rel=0 adds nothing).
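  • 31. NDCG using Graded relevance. Query X: fifa. Documents are graded on a 0 - 4 relevance scale. Predicted ranking Y': 23 (rel=3), 9 (rel=2), 88 (rel=4), 33 (rel=1), 45 (rel=0); DCG per position: 3, 5, 7.52, 8.02, 8.02. Ideal ranking Y: 88 (rel=4), 23 (rel=3), 9 (rel=2), 33 (rel=1), 45 (rel=0); MaxDCG per position: 4, 7, 8.26, 8.76, 8.76. NDCG = DCG / MaxDCG per position: 0.75, 0.71, 0.91, 0.92, 0.92.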
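A small Python sketch of DCG and NDCG for the example grades. It follows the discount used on these slides (no discount at position 1, then division by log2 of the position); another common variant divides by log2(position + 1), so treat the exact formula as an assumption.

```python
import math


def dcg(grades):
    """Discounted cumulative gain: grade_1 + sum(grade_i / log2(i)) for i >= 2."""
    total = 0.0
    for position, grade in enumerate(grades, start=1):
        discount = 1.0 if position == 1 else math.log2(position)
        total += grade / discount
    return total


def ndcg(grades, k=None):
    """DCG of the top k divided by the DCG of the ideal (grade-sorted) top k."""
    k = len(grades) if k is None else k
    ideal = sorted(grades, reverse=True)
    max_dcg = dcg(ideal[:k])
    return dcg(grades[:k]) / max_dcg if max_dcg > 0 else 0.0


grades = [3, 2, 4, 1, 0]  # relevance grades in predicted order
for k in range(1, len(grades) + 1):
    print(k, round(dcg(grades[:k]), 2), round(ndcg(grades, k), 2))
```

Because the ideal ordering puts the grade-4 document first, the predicted ranking is penalised most at the top positions and recovers as k grows.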
  • 32. Learning to rank - Model. [Diagram: input X goes into the Model (Algorithm + Parameters), which outputs the prediction Y'; a Cost function compares Y' with the labels Y]
  • 33. LTR: Steps to take 1. Create Judgement List (Ground Truth) 2. Define features for the model 3. Log features during usage 4. Training and testing the model 5. Deploying and using the model 6. Feedback loop
  • 34. 1. Judgement List Obtain labelled data to train the model
  • 35. Judgement List: Expert Panel • Time consuming and Expensive • Error prone due to different judgements http://bit.ly/ebay-hum-judge
  • 40. Judgement List: Implicit Feedback • Log user behaviour • Compare actual clicks versus expected clicks • A click is not a relevance judgement
  • 41. Judgement List: Implicit Feedback • Use as a signal to the ranking algorithm -> Feature • Use as Label to train the model -> Ground truth
  • 42. Using the LTR Plugin and the python scripts https://github.com/o19s/elasticsearch-learning-to-rank
  • 43. Our Judgement List
      # grade (0-4) queryid docId title
      #
      # Add your keyword strings below, the feature script will
      # Use them to populate your query templates
      #
      # qid:1: fifa
      # qid:2: football
      # qid:3: call of duty
      # qid:4: marvel
      # qid:5: basketball
      # qid:6: god
      #
      4 qid:1 # 1538781503000 FIFA 18
      2 qid:1 # 1536187840000 EA SPORTS FIFA 16
      3 qid:1 # 1538107776000 FIFA 19 Ultimate Edition
      4 qid:1 # 1538694141000 FIFA 19
      3 qid:1 # 1536293937000 EA SPORTS FIFA 17 Standard Edition
      1 qid:1 # 1536022097000 FIFA 15
      3 qid:2 # 1538509479000 PRO EVOLUTION SOCCER 2018
      4 qid:2 # 1535257488000 Pro Evolution Soccer 2018 FC Barcelona Edition
      1 qid:2 # 1536293937000 EA SPORTS FIFA 17 Standard Edition
      1 qid:2 # 1538781636000 2MD: VR Football
  • 44. 2. features for the model • Raw Term Statistics • Document Frequency • Total Term Frequency • Also max, min, sum (in case of multiple terms, fields) • Elasticsearch queries
  • 45. 2. features for the model { "query": { "match": { "title": "{{keywords}}" } } } { "query": { "match": { "description": "{{keywords}}" } } }
  • 46. { "query": { "function_score": { "functions": [ { "field_value_factor": { "field": "rating", "missing": 0 } } ] } } }
  • 47. { "query": { "function_score": { "gauss": { "releaseDate": { "scale": "720d", "decay": 0.25 } } } } }
  • 48. { "query": { "nested": { "path": "basket", "query": { "function_score": { "query": { "term": { "basket.term": { "value": "{{keywords}}" } } }, "functions": [ { "field_value_factor": { "field": "basket.clicks", "modifier": "log1p" } } ] } } } } }
  • 49. { "query": { "nested": { "path": "clicks", "query": { "function_score": { "query": { "term": { "clicks.term": { "value": "{{keywords}}" } } }, "functions": [ { "field_value_factor": { "field": "clicks.clicks", "modifier": "log1p" } } ] } } } } }
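The feature queries above have to be registered with the LTR plugin as a feature set. The Python sketch below only builds the request body. It assumes the featureset layout from the plugin's documentation as I understand it (a `featureset` object holding `features`, each with `name`, `params`, `template_language` and `template`), which would be sent to `_ltr/_featureset/<name>`; please verify this against the plugin docs. The numeric feature names and the helper `build_featureset` are illustrative.

```python
import json


def build_featureset(templates, param="keywords"):
    """Wrap query templates in the (assumed) featureset body of the LTR plugin."""
    features = []
    for number, template in enumerate(templates, start=1):
        features.append({
            "name": str(number),              # assumption: features are numbered 1..n
            "params": [param],                # filled in at query time
            "template_language": "mustache",
            "template": template,             # the inner query, without a "query" wrapper
        })
    return {"featureset": {"features": features}}


templates = [
    {"match": {"title": "{{keywords}}"}},
    {"match": {"description": "{{keywords}}"}},
]
payload = build_featureset(templates)
print(json.dumps(payload, indent=2))
```

Numbering the features keeps them aligned with the `1:`, `2:`, ... columns that appear in the logged training data.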
  • 50. 3. Log features
      2 qid:1 1:7.7917995 2:10.646265 3:4.5 4:0.99658567 5:1.0154246 6:1.475652 # 1538694141000 fifa
      0 qid:1 1:7.7917995 2:12.036756 3:4.0 4:0.6522283 5:0.0 6:0.0 # 1538781503000 fifa
      2 qid:1 1:5.3206625 2:10.542777 3:4.5 4:0.20758471 5:0.0 6:0.0 # 1536293937000 fifa
      2 qid:1 1:7.7917995 2:9.694633 3:4.5 4:0.00240296 5:0.0 6:0.0 # 1536022097000 fifa
      2 qid:1 1:6.3233795 2:11.195704 3:4.5 4:0.03137532 5:0.0 6:0.0 # 1536187840000 fifa
      4 qid:1 1:6.3233795 2:11.144586 3:0.0 4:0.99658567 5:1.0154246 6:1.475652 # 1538107776000 fifa
      4 qid:2 1:0.0 2:6.58323 3:4.5 4:0.99142087 5:1.0787467 6:1.1217693 # 1538101584000 basketball
      0 qid:2 1:0.0 2:5.237752 3:4.0 4:0.99915665 5:0.0 6:0.0 # 1538152581000 basketball
      1 qid:2 1:0.0 2:5.9230056 3:4.0 4:0.9891866 5:0.8548865 6:0.8889811 # 1538360829000 basketball
      1 qid:2 1:0.0 2:4.897768 3:5.0 4:0.19836442 5:0.0 6:0.0 # 1537838613000 basketball
      0 qid:2 1:0.0 2:5.237752 3:4.0 4:0.6326628 5:0.0 6:0.0 # 1537838447000 basketball
      1 qid:2 1:0.0 2:5.3618174 3:3.0 4:0.259035 5:0.0 6:0.0 # 1538781628000 basketball
      1 qid:2 1:0.0 2:7.574521 3:5.0 4:0.7088474 5:0.0 6:0.0 # 1535173832000 basketball
      0 qid:2 1:0.0 2:5.237752 3:4.5 4:0.9410281 5:0.0 6:0.0 # 1538518103000 basketball
      1 qid:2 1:0.0 2:7.3292704 3:4.5 4:0.99142087 5:1.0787467 6:1.1217693 # 1538619230000 basketball
      1 qid:2 1:0.0 2:9.499237 3:4.0 4:0.27305022 5:0.0 6:0.0 # 1536946885000 basketball
      1 qid:3 1:5.0056663 2:7.690156 3:5.0 4:0.18467863 5:0.0 6:0.0 # 1538509528000 god
      1 qid:3 1:4.608784 2:7.9265423 3:5.0 4:0.9016471 5:0.5393733 6:0.8889811 # 1536297114000 god
      1 qid:3 1:4.2702136 2:8.228076 3:5.0 4:0.34165436 5:0.0 6:0.0 # 1535775896000 god
      1 qid:3 1:0.0 2:7.043888 3:0.0 4:0.7977039 5:0.0 6:0.0 # 1537244713000 god
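Each logged line has the shape `grade qid:N feature:value ... # docId keywords`. A minimal Python sketch for reading one such line into a dictionary could look like this; the function name and the returned keys are illustrative choices, not part of the plugin or of Ranklib.

```python
def parse_training_line(line):
    """Split a 'grade qid:N 1:v1 2:v2 ... # docId keywords' line into its parts."""
    data, _, comment = line.partition("#")
    tokens = data.split()
    grade = int(tokens[0])
    qid = int(tokens[1].split(":", 1)[1])
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    # The comment holds the document id followed by the query keywords.
    doc_id, _, keywords = comment.strip().partition(" ")
    return {"grade": grade, "qid": qid, "features": features,
            "doc_id": doc_id, "keywords": keywords}


line = ("2 qid:1 1:7.7917995 2:10.646265 3:4.5 4:0.99658567 "
        "5:1.0154246 6:1.475652 # 1538694141000 fifa")
row = parse_training_line(line)
print(row["grade"], row["qid"], row["features"][3], row["doc_id"], row["keywords"])
```

Parsing the file this way makes it easy to inspect which features are zero for most documents before training.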
  • 51. 4. Train and test model • Making use of Ranklib • Can specify separate train, validation and test set • Can normalise feature sets
  • 52. Models using Ranklib
      MART: Multiple Additive Regression Trees, a gradient boosting machine. Can be used for regression as well as classification.
      RankNet: Compares pairs of feature vectors, trained with stochastic gradient descent on a cost function.
      RankBoost: Based on AdaBoost, combining many weak rankings into a single highly accurate ranking. Uses pairwise comparison.
      AdaRank: Combines a number of weak learners in a linear way. Builds on AdaBoost, but directed more at ranking.
      Coordinate Ascent: Optimises one parameter at a time, keeping the others constant. Repeated until a convergence criterion is met.
      LambdaRank: An optimisation of RankNet that works directly with the gradients, pictured as arrows that indicate how far each document needs to move up or down.
      LambdaMART: Combines the gradients of LambdaRank with MART.
      ListNet: Uses a listwise loss function, a neural network and gradient descent. Similar to RankNet; the only difference is a list versus a pair loss function.
      Random Forests: A number of trees vote for the most popular class for a vector of features. A single tree may be little better than a random choice, but a forest is.
      Linear Regression: Usually too simplistic for a learning to rank problem with many features, but good to have available to at least try.
  • 53. Evaluation metrics
      MAP - Mean Average Precision: per query, the average of P@k taken at each position k that holds a relevant document, then the mean over all queries.
      DCG@k - Discounted Cumulative Gain: the sum of the relevance of the top k documents, each discounted by its position, which makes the first documents more important.
      NDCG@k - Normalised DCG: the DCG divided by the DCG of the ideal ordering, giving a value between 0 and 1.
      P@k - Precision at k: the fraction of the top k documents that are relevant.
      RR@k - Reciprocal Rank: 1/r, where r is the rank of the first relevant document in the top k.
      ERR@k - Expected Reciprocal Rank: discounts documents that are ranked below a very relevant document.
  • 54. Training results
      Model             | MAP    | NDCG@10 | DCG@10  | P@10   | RR@10 | ERR@10
      MART              | 0.999  | 0.979   | 38.7047 | 0.9333 | 1.0   | 0.9618
      RankNet           | 0.8806 | 0.8831  | 33.298  | 0.8933 | 0.9   | 0.5988
      RankBoost         | 1.0    | 1.0     | 39.8699 | 0.9333 | 1.0   | 0.9629
      AdaRank (List)    | 0.9493 | 0.8271  | 31.4686 | 0.8733 | 0.8   | 0.7597
      Coordinate Ascent | 0.9446 | 0.8973  | 36.3888 | 0.9133 | 1.0   | 0.96
      LambdaRank        | 0.9562 | 0.7079  | 35.7277 | 0.8933 | 0.9   | 0.7598
      LambdaMART        | 0.9725 | 0.9805  | 38.788  | 0.9333 | 1.0   | 0.9818
      ListNet (List)    | 0.9496 | 0.8973  | 33.1995 | 0.8933 | 1.0   | 0.8229
      Random Forests    | 0.999  | 0.979   | 38.7047 | 0.9333 | 1.0   | 0.9618
      Linear Regression | 0.9463 | 0.8602  | 34.1433 | 0.8933 | 0.9   | 0.7906
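The three metrics not worked out on earlier slides (P@k, RR@k and ERR@k) can be sketched in Python as follows. The sketch assumes a grade of 1 or higher counts as relevant and uses the usual ERR satisfaction probability (2^grade - 1) / 2^max_grade; Ranklib's own implementation may differ in details.

```python
def precision_at_k(grades, k, threshold=1):
    """Fraction of the top k documents whose grade reaches the threshold."""
    return sum(1 for g in grades[:k] if g >= threshold) / k


def reciprocal_rank(grades, k, threshold=1):
    """1 / rank of the first relevant document in the top k, or 0 if there is none."""
    for position, g in enumerate(grades[:k], start=1):
        if g >= threshold:
            return 1.0 / position
    return 0.0


def expected_reciprocal_rank(grades, k, max_grade=4):
    """Expected 1/rank at which a user scanning top-down is satisfied and stops."""
    err = 0.0
    p_continue = 1.0  # probability the user reaches the current position
    for position, g in enumerate(grades[:k], start=1):
        p_satisfied = (2 ** g - 1) / (2 ** max_grade)
        err += p_continue * p_satisfied / position
        p_continue *= 1.0 - p_satisfied
    return err


grades = [3, 2, 4, 1, 0]  # grades in ranked order, 0 - 4 scale
print(precision_at_k(grades, 5))
print(reciprocal_rank(grades, 5))
print(round(expected_reciprocal_rank(grades, 5), 4))
```

ERR rewards putting a highly graded document first: anything ranked below it contributes little, because the user has probably already stopped.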
  • 55. 5. Deploy the model • The model including learned parameters is stored in elasticsearch. • Using the plugin we can now re-rank the top-n results
  • 56. GET ecommproduct/_search { "query": { "multi_match": { "query": "call of duty", "fields": ["title", "description"] } }, "rescore": { "window_size": 100, "query": { "rescore_query": { "sltr": { "params": {"keywords": "call of duty"}, "model": "test_6" } } } } }
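In an application the rescore request is usually built per search. The Python sketch below only assembles the same body as on the slide for arbitrary keywords; the helper name `build_ltr_rescore_request` is hypothetical, and sending the body to Elasticsearch is left out.

```python
import json


def build_ltr_rescore_request(keywords, model, window_size=100):
    """Base multi_match query plus an sltr rescore over the top `window_size` hits."""
    return {
        "query": {
            "multi_match": {"query": keywords, "fields": ["title", "description"]}
        },
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {"params": {"keywords": keywords}, "model": model}
                }
            },
        },
    }


body = build_ltr_rescore_request("call of duty", "test_6")
print(json.dumps(body, indent=2))
```

Only the top `window_size` results of the base query are re-ranked by the model, which keeps the cost of evaluating the model bounded.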
  • 57. DEMO
  • 58. 6. Feedback Loop • Register clicks by users (and other events) • Use click data for predicting labels
  • 59. Click models • Random Click Model -> Every document has the same chance of being clicked • Click Through Rate Model -> Uses the fact that the first document is clicked far more than the second document • Cascade Model -> A click in the third item also tells us something about the first two items. Only one click per session is assumed. • Dynamic Bayesian Network Model -> Supports multiple clicks in a search session and the difference in actual relevance of a document. http://bit.ly/click-model
  • 60. Dynamic Bayesian Network (http://bit.ly/dbn-clickmodel). Ei - Did the user examine the url; Ai - Was the user attracted by the url; Ci - Did the user click the url; Si - Was the user satisfied with the landing page; au - Probability of being attracted by the url; su - Probability of being satisfied by the landing page. [Diagram: nodes Ei-1, Ei, Ei+1, Ai, Ci and Si with parameters au and su]
  • 63. Query id: 82369cad-7ca3-4b69-ba8c-2eaf16446e55. Query text: call of duty. Document ids: ["1538784192000","1537838838000","1538799761000","1538800073000","1536295676000","1536295271000","1536296646000","1536292134000","1536292118000","1536295845000"]. Clicks: [1,0,0,0,1,0,0,0,1,0]
  • 64. Python
      Entries in the form query:docId:grade:
      …
      fifa:1538694141000:2
      fifa:1538781503000:0
      fifa:1536293937000:2
      fifa:1536022097000:2
      fifa:1536187840000:2
      fifa:1538107776000:4
      call of duty:1536295845000:3
      call of duty:1537838838000:2
      call of duty:1538784192000:4
      call of duty:1536295271000:2
      call of duty:1538799761000:2
      call of duty:1538800073000:2
      call of duty:1536296646000:2
      call of duty:1536295676000:3
      call of duty:1536292134000:4
      call of duty:1536292118000:4
      …
      Resulting judgement list:
      # qid:1: fifa
      # qid:2: basketball
      # qid:3: god
      # qid:4: football
      # qid:5: call of duty
      # qid:6: marvel
      #
      #
      2 qid:1 # 1538694141000
      0 qid:1 # 1538781503000
      2 qid:1 # 1536293937000
      2 qid:1 # 1536022097000
      2 qid:1 # 1536187840000
      4 qid:1 # 1538107776000
      4 qid:2 # 1538101584000
      0 qid:2 # 1538152581000
      1 qid:2 # 1538360829000
      1 qid:2 # 1537838613000
      0 qid:2 # 1537838447000
      1 qid:2 # 1538781628000
      1 qid:2 # 1535173832000
      0 qid:2 # 1538518103000
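As a rough illustration of how au and su can be estimated from click logs, here is a Python sketch of the simplified variant of the DBN model (often called SDBN), not the full model. It assumes the user examines every result up to and including the last click, skips sessions without clicks, and maps relevance to a 0 - 4 grade by simple rounding; that mapping and all names are my own assumptions.

```python
from collections import defaultdict


def simplified_dbn(sessions):
    """Estimate relevance = attractiveness * satisfaction per document."""
    examined = defaultdict(int)
    clicked = defaultdict(int)
    satisfied = defaultdict(int)
    for doc_ids, clicks in sessions:
        if 1 not in clicks:
            continue  # assumption: sessions without clicks are ignored
        last_click = len(clicks) - 1 - clicks[::-1].index(1)
        # Everything up to and including the last click is assumed examined.
        for position in range(last_click + 1):
            doc = doc_ids[position]
            examined[doc] += 1
            if clicks[position]:
                clicked[doc] += 1
                if position == last_click:
                    satisfied[doc] += 1  # the user stopped after this click
    relevance = {}
    for doc in examined:
        attractiveness = clicked[doc] / examined[doc]
        satisfaction = satisfied[doc] / clicked[doc] if clicked[doc] else 0.0
        relevance[doc] = attractiveness * satisfaction
    return relevance


def to_grade(relevance, max_grade=4):
    """Hypothetical mapping from a 0..1 relevance estimate to a 0..max_grade label."""
    return int(round(relevance * max_grade))


sessions = [
    (["a", "b", "c", "d"], [1, 0, 1, 0]),
    (["a", "b", "c", "d"], [0, 1, 0, 0]),
]
relevance = simplified_dbn(sessions)
print({doc: to_grade(score) for doc, score in relevance.items()})
```

Document "a" is clicked but never the last click, so its satisfaction estimate is zero, while "c" is both attractive and satisfying in the one session where it was examined.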
  • 65. DEMO
  • 67. “Please rate our talk in the official J-FALL app” #JFALL