PyData Global 2023
Sujit Pal, Elsevier Health
ORCID Id: https://orcid.org/0000-0002-6225-110X
Building Learning to Rank models for search using Large Language Models
2023
About Me
• Work at the intersection of search and machine learning
• Interested in Information Retrieval, Natural Language Processing, Knowledge Graphs and Machine Learning, and now LLMs and Generative AI
sujit.pal@elsevier.com
https://www.linkedin.com/in/sujitpal
@palsujit@hachyderm.io
Agenda
• Overview
• Components
• Future Work
Overview
Basic Idea (what)
• Use LLMs to generate relevance judgements
• Use relevance judgements to train LTR models
• Use LTR models to rerank query results
• Profit!
Rationale (why)
LTR
• Easy way to jumpstart a relevance model
• Practical in situations where judgement data is cheap and plentiful
LLM
• Potentially unlimited source of judgement data
• LLMs with 70B+ parameters are capable of mimicking human preferences
Large language models can accurately predict searcher preferences (Thomas et al., 2023)
Rationale (results)
Baselines
Model                          MAP@10
BM25 (Elasticsearch OOB)       6.42
Cosine Sim. (Qdrant OOB)       8.35
CKPOC Heuristics               8.50
LTR models
Model                          MAP@10
Pointwise (regression)         8.12
Pairwise (RankNet)             8.38
Pairwise (LambdaRank)          7.92
Pairwise (LambdaMART)          7.58
(OOB = out of the box)
Workflow (how)
[Figure: workflow diagram. Training: query logs → Query Sampler → q → Elasticsearch Index (BM25) → (q, dk) → Label Generator → (q, dk, yk) → Feature Generator → (Xk, yk) → LTR Model. Inference: gold set queries → q → Elasticsearch Index (BM25) → (q, dk) → Feature Generator → (q, dk, Xk) → Trained LTR Model (Reranker) → (q, dk, Xk, y'k). Evaluation: Label Generator → (q, dk, yk) → P@10.]
Components
Query Sampler
• Determine a set of representative queries that the system is expected to answer (for training the LTR model)
• This was for a specialized search component that answers long queries, so we sampled from our query log
• Assume that the number of tokens (#-tokens) and the number of concepts (#-concepts) per query each follow a normal distribution, and calculate their mean and standard deviation
• Set up boundaries at mean ± s.d.
• Keep the queries from the query log whose #-tokens and #-concepts both fall within the (mean ± s.d.) boundaries
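The sampling rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions: tokens are counted by whitespace splitting, the population standard deviation is used, and `count_concepts` is a hypothetical stand-in for whatever NER or knowledge-graph concept counter the real pipeline uses.

```python
import statistics
from typing import Callable, List


def sample_queries(queries: List[str],
                   count_concepts: Callable[[str], int]) -> List[str]:
    """Keep queries whose #-tokens and #-concepts both lie within mean ± 1 s.d.

    count_concepts is a hypothetical callable (e.g. backed by an NER model).
    """
    n_tokens = [len(q.split()) for q in queries]          # whitespace tokens (assumption)
    n_concepts = [count_concepts(q) for q in queries]
    # Treat each count as roughly normal: compute mean and population std dev
    t_mu, t_sd = statistics.mean(n_tokens), statistics.pstdev(n_tokens)
    c_mu, c_sd = statistics.mean(n_concepts), statistics.pstdev(n_concepts)
    return [
        q for q, t, c in zip(queries, n_tokens, n_concepts)
        if t_mu - t_sd <= t <= t_mu + t_sd and c_mu - c_sd <= c <= c_mu + c_sd
    ]
```

With a toy concept counter such as `lambda q: 0`, only the token-length filter has any effect; in practice both filters narrow the log down to "typical" long queries.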
Label Generator
Label Generation (pointwise)
Flow: q → (q, dk) → (q, dk, yk)
Human: You are a medical expert tasked with
identifying if the provided DOCUMENT addresses the
information needs for the provided QUERY.
QUERY: `{query}`
DOCUMENT: `{document}`
Your RESPONSE should be:
- RELEVANT if the DOCUMENT addresses the
information needs for the QUERY
- IRRELEVANT otherwise
Explain your REASONING.
Format the output as follows:
<output>
<response>RESPONSE</response>
<reasoning>REASONING</reasoning>
</output>
Assistant: <output>
Prompt used; the label yk is RELEVANT or IRRELEVANT (binary)
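Because the prompt ends with a prefilled `Assistant: <output>` turn, the model's completion should continue with the `<response>` and `<reasoning>` tags. A minimal sketch of turning that completion into a binary label, assuming the encoding 1 = RELEVANT and 0 = IRRELEVANT (the encoding and the name `parse_pointwise` are illustrative, not taken from the talk's code):

```python
import re
from typing import Optional, Tuple

RESPONSE_RE = re.compile(r"<response>\s*(.*?)\s*</response>", re.DOTALL)
REASONING_RE = re.compile(r"<reasoning>\s*(.*?)\s*</reasoning>", re.DOTALL)


def parse_pointwise(completion: str) -> Tuple[Optional[int], str]:
    """Return (label, reasoning); label is 1, 0, or None for malformed output."""
    resp = RESPONSE_RE.search(completion)
    reason = REASONING_RE.search(completion)
    reasoning = reason.group(1) if reason else ""
    if resp is None:
        return None, reasoning  # caller may retry the LLM call or drop the pair
    answer = resp.group(1).strip().upper()
    # Exact match matters: "IRRELEVANT" contains "RELEVANT" as a substring
    if answer == "RELEVANT":
        return 1, reasoning
    if answer == "IRRELEVANT":
        return 0, reasoning
    return None, reasoning
```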
Label Generation (pairwise)
Human: You are a medical expert who has to judge which
of two DOCUMENTs shown below are relevant for the
given QUERY. Provide your JUDGEMENT as DOCUMENT-
1 or DOCUMENT-2 depending on which DOCUMENT you
think is relevant for the QUERY.
QUERY: `{query}`
DOCUMENT-1: `{document_1}`
DOCUMENT-2: `{document_2}`
Explain your REASONING.
Format your output as follows:
<output>
<response>JUDGEMENT</response>
<reasoning>REASONING</reasoning>
</output>
Assistant: <output>
Flow: q → (q, dk) → generate pairs → (q, dki, dkj) → (q, dki, dkj, yk)
Prompt used; the label yk is DOCUMENT-1 or DOCUMENT-2
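A minimal sketch of the "generate pairs" step and of reading the pairwise judgement back. It assumes pairs are all unordered combinations of one query's retrieved documents and that the label is encoded as +1 (DOCUMENT-1 preferred) or -1 (DOCUMENT-2 preferred); both choices are illustrative assumptions, as are the function names.

```python
import itertools
import re
from typing import List, Optional, Tuple


def make_pairs(doc_ids: List[str]) -> List[Tuple[str, str]]:
    """All unordered pairs (dki, dkj) from one query's retrieved documents."""
    return list(itertools.combinations(doc_ids, 2))


def parse_pairwise(completion: str) -> Optional[int]:
    """+1 if DOCUMENT-1 is judged relevant, -1 if DOCUMENT-2, None otherwise."""
    m = re.search(r"<response>\s*(.*?)\s*</response>", completion, re.DOTALL)
    if m is None:
        return None
    return {"DOCUMENT-1": 1, "DOCUMENT-2": -1}.get(m.group(1).strip().upper())
```

Note that k retrieved documents yield k(k-1)/2 pairs (45 for k = 10), so the number of LLM calls grows quadratically with retrieval depth.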
Label Generation (listwise)
Flow: q → (q, dk) → (q, dk, yk)
Human: You are a medical expert tasked with assigning a SCORE
indicating how relevant the given DOCUMENT is to the given
QUERY.
QUERY: `{query}`
DOCUMENT: `{document}`
Assign the SCORE as follows:
1 - DOCUMENT is completely unrelated to QUERY
2 - DOCUMENT has some relation to QUERY, but mostly off-topic
3 - DOCUMENT is relevant to QUERY, but lacking focus or key
details
4 - DOCUMENT is highly relevant, addressing the main aspects of
QUERY
5 - DOCUMENT is directly relevant and precisely targeted to QUERY
Explain your REASONING for assigning the SCORE.
Format the output as follows:
<output>
<score>SCORE</score>
<reasoning>REASONING</reasoning>
</output>
Assistant: <output>
Prompt used; the label yk is a score on a 5-point scale (1-5, numeric)
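A minimal sketch of extracting the numeric score from the completion and grouping scored documents per query, which is the form of input the scored (listwise) models consume. Rejecting out-of-range scores and the helper names are assumptions for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

SCORE_RE = re.compile(r"<score>\s*(\d+)\s*</score>")


def parse_score(completion: str, lo: int = 1, hi: int = 5) -> Optional[int]:
    """Return the 1-5 SCORE, or None if missing or out of range."""
    m = SCORE_RE.search(completion)
    if m is None:
        return None
    score = int(m.group(1))
    return score if lo <= score <= hi else None


def group_by_query(triples: List[Tuple[str, str, int]]) -> Dict[str, List[Tuple[str, int]]]:
    """Turn (q, dk, yk) triples into per-query lists sorted by score, best first."""
    groups: Dict[str, List[Tuple[str, int]]] = {}
    for q, doc, y in triples:
        groups.setdefault(q, []).append((doc, y))
    return {q: sorted(docs, key=lambda d: d[1], reverse=True) for q, docs in groups.items()}
```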
Feature Generator
Feature Generation
Feature categories: Query Features, Document Features, Query-Document Features
• #-tokens in query
• Total Term Frequency (TTF) for field
• TF (min, max, mean, var) for field
• TF*IDF (min, max, mean, var) for field
• #-overlapping query tokens w/field
• #-overlapping query concepts w/field
• #-overlapping query semantic groups w/field
• BM25 scores for matching query w/field
• Cosine similarity between query and field
Idea Source: Learning to Rank Datasets page from Microsoft Research
Document fields: title, section title, breadcrumbs, text
TF and TF*IDF use multiple point estimates (min, max, mean, var) for the same feature
Feature types: Count, Custom, NER
61 features in all
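A partial sketch of the count-based features only. The concept and semantic-group overlaps (NER), BM25 scores (Elasticsearch) and cosine similarity (embeddings) are omitted, so this does not reproduce the 61-feature vector. Assumptions: whitespace tokenization, a precomputed `idf` dictionary, hypothetical field names in `FIELDS`, and TTF read as the sum of query-term frequencies in the field. The query must be non-empty.

```python
from collections import Counter
from statistics import mean, pvariance
from typing import Dict, List

# Hypothetical field names for the four document fields on the slide
FIELDS = ["title", "section_title", "breadcrumbs", "text"]


def _stats(xs: List[float]) -> List[float]:
    # min, max, mean and population variance of the per-term values
    return [float(min(xs)), float(max(xs)), float(mean(xs)), float(pvariance(xs))]


def field_count_features(query_tokens: List[str], field_tokens: List[str],
                         idf: Dict[str, float]) -> List[float]:
    """10 count-based features for one (query, field) pair."""
    counts = Counter(field_tokens)
    tf = [counts[t] for t in query_tokens]                       # TF per query term
    tfidf = [counts[t] * idf.get(t, 0.0) for t in query_tokens]  # TF*IDF per query term
    ttf = float(sum(tf))                                         # assumed reading of TTF
    overlap = float(len(set(query_tokens) & set(field_tokens)))  # overlapping query tokens
    return [ttf] + _stats(tf) + _stats(tfidf) + [overlap]


def feature_vector(query: str, doc: Dict[str, str], idf: Dict[str, float]) -> List[float]:
    """#-tokens in query, then count features for each field in FIELDS order."""
    q_tokens = query.lower().split()
    features = [float(len(q_tokens))]
    for field in FIELDS:
        features += field_count_features(q_tokens, doc.get(field, "").lower().split(), idf)
    return features
```

Keeping a fixed field order matters: every (q, dk) pair must map to the same feature positions so the LTR model sees a consistent Xk.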
Model
LTR Models Recap
• Pointwise models: take a query and a document as input and return a relevance judgment between 0 and 1
• Pairwise models: take a query and a pair of documents as input and return a judgment between -1 and 1
• Listwise models (not used): take a query and a list of documents and return the list of documents ordered by relevance
• The feature generator takes a query and a document and returns a feature vector
[Figure: pointwise LTR model: query + doc → generate features → pointwise LTR model → judgment; pairwise LTR model: query + doc-1 + doc-2 → generate features → pairwise LTR model → judgment]
Model Performance
• Pointwise
− 2-layer FCN for binary classification, uses binary relevance data
• RankNet
− 3-layer Siamese network for binary classification, uses pairwise relevance data
• LambdaRank
− Pairwise model; needs listwise (scored) input, which it internally converts to pairwise
− Code adapted from houchenyu/L2R
− Also available via XGBoost using the rank:pairwise objective
• LambdaMART
− Also available via XGBoost using the rank:ndcg objective
Label types: binary, pairwise, scored
Model Implementations
Pointwise
• Two-layer FCN
• Binary classifier
• Scores = predicted probability
• Binary labels
Pairwise: RankNet
• 3-layer Siamese network
• Binary classifier
• Pairwise labels
Pairwise: LambdaRank
• Adapted from houchenyu/L2R
• Returns a numeric score
• Scored (listwise) labels, internally converted to pairwise
• XGBoost: rank:pairwise
Pairwise: LambdaMART
• XGBoost: rank:ndcg
Rank Matrix (RankNet only)
Pairwise to Listwise – Pairwise Comparison Method (1000minds)
Given the following pairwise rankings:
• D2 > D1
• D3 > D1
• D4 < D1
• D5 > D1
• D3 < D2
• D4 > D2
• D5 < D2
• D4 > D3
• D5 < D3
• D5 < D4

     D1  D2  D3  D4  D5  total
D1    0   1   1  -1   1      2
D2   -1   0  -1   1  -1     -2
D3   -1   1   0   1  -1      0
D4    1  -1  -1   0  -1     -2
D5   -1   1   1   1   0      2

Sum each row to get its total, then return the list of documents sorted by total (D1 and D5 tie at 2; D2 and D4 tie at -2):
D5, D1, D3, D2, D4
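The row-sum procedure can be sketched directly on the 5 × 5 matrix. This follows the slide's steps (sum each row, sort by total, highest first) and does not try to settle which preference direction a +1 encodes. Ties keep their input order here, so D1 precedes D5, whereas the slide lists D5 first. The function name is illustrative.

```python
from typing import List, Tuple


def rank_from_pairwise_matrix(docs: List[str],
                              matrix: List[List[int]]) -> Tuple[List[int], List[str]]:
    """Sum each row of a +1/-1/0 pairwise matrix and sort docs by total, descending."""
    totals = [sum(row) for row in matrix]
    # sorted() is stable even with reverse=True, so ties keep their input order
    order = sorted(range(len(docs)), key=lambda i: totals[i], reverse=True)
    return totals, [docs[i] for i in order]


docs = ["D1", "D2", "D3", "D4", "D5"]
matrix = [
    [0, 1, 1, -1, 1],
    [-1, 0, -1, 1, -1],
    [-1, 1, 0, 1, -1],
    [1, -1, -1, 0, -1],
    [-1, 1, 1, 1, 0],
]
totals, ranked = rank_from_pairwise_matrix(docs, matrix)
print(totals)  # [2, -2, 0, -2, 2]
print(ranked)  # ['D1', 'D5', 'D3', 'D2', 'D4']
```

Because the matrix is antisymmetric, the row totals always sum to zero, which is a quick sanity check on a filled-in matrix.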
LTR Models Evolution
• RankNet
− Trains using gradient descent
− Gradient computed as ∂C/∂S, where C = cross-entropy cost, which penalizes the difference between the desired and the actual ranking, and S = model score
• LambdaRank
− Multiplies the RankNet ∂C/∂S (λ) values by |ΔNDCG|, the change in NDCG caused by swapping a pair of inputs
• LambdaMART
− Combines Gradient Boosting (MART = Multiple Additive Regression Trees) with the LambdaRank gradient computation
Paper ref: From RankNet to LambdaRank to LambdaMART – an Overview (Burges, 2010)
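A sketch of the per-document gradients (λ values) for one query, following the RankNet and LambdaRank definitions in Burges (2010). Assumptions: σ = 1, exponential gain 2^rel − 1, log2 position discount, and no overflow protection for very large score gaps. LambdaMART would fit regression trees to these λs, which is not shown.

```python
import math
from typing import List


def gain(rel: int) -> float:
    return 2.0 ** rel - 1.0            # exponential NDCG gain


def discount(pos: int) -> float:
    return 1.0 / math.log2(pos + 2)    # pos is 0-based; top position gets 1


def compute_lambdas(scores: List[float], labels: List[int],
                    sigma: float = 1.0, use_ndcg: bool = True) -> List[float]:
    """dC/ds_i per document; use_ndcg=False gives plain RankNet lambdas."""
    n = len(scores)
    # Current rank position of each document under the model's scores
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    pos = [0] * n
    for p, i in enumerate(order):
        pos[i] = p
    # Ideal DCG normalises the NDCG change; guard against all-zero labels
    idcg = sum(gain(r) * discount(p)
               for p, r in enumerate(sorted(labels, reverse=True))) or 1.0
    lambdas = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where document i should outrank j
            # RankNet: C = log(1 + exp(-sigma*(s_i - s_j))),
            # so dC/ds_i = -sigma / (1 + exp(sigma*(s_i - s_j)))
            lam = -sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            if use_ndcg:
                # LambdaRank: scale by |delta NDCG| from swapping i and j
                lam *= abs((gain(labels[i]) - gain(labels[j]))
                           * (discount(pos[i]) - discount(pos[j]))) / idcg
            lambdas[i] += lam  # negative: gradient descent raises s_i
            lambdas[j] -= lam  # dC/ds_j = -dC/ds_i: lowers s_j
    return lambdas
```

Scaling by |ΔNDCG| makes swaps near the top of the list count more than swaps deep in it, which is what lets a pairwise cost optimise a rank-aware metric.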
Evaluation
• Generate the top 50 results for the query from the ES index (lexical search)
• Re-rank them using the trained LTR model and return the top 10 results
• Use the LLM (same prompt as pointwise label generation) to determine relevant / irrelevant judgments
• Aggregate judgments across results, e.g. 7 / 10 relevant → P@10 = 0.7
• Average P@10 scores across all evaluation queries → MAP@10 (here, the mean of per-query P@10)
• Our application treats the top 10 results as equally ranked (order within the top 10 does not matter)
• But the pipeline could also generate ranked lists and compute rank-aware metrics such as MRR@k or NDCG@k if needed
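A minimal sketch of the evaluation arithmetic. `score` is a hypothetical stand-in for the trained LTR model, the candidate list stands in for the ES top 50, and judgments are assumed to be 1 (RELEVANT) or 0 (IRRELEVANT) from the pointwise prompt. The averaging step produces what the slides call MAP@10.

```python
from statistics import mean
from typing import Callable, Dict, List


def rerank(candidates: List[str], score: Callable[[str], float], k: int = 10) -> List[str]:
    """Re-order lexical candidates by model score and keep the top k."""
    return sorted(candidates, key=score, reverse=True)[:k]


def precision_at_k(judgments: List[int], k: int = 10) -> float:
    """Fraction of the top-k results judged relevant (1) by the LLM."""
    return sum(judgments[:k]) / k


def mean_precision_at_k(per_query: Dict[str, List[int]], k: int = 10) -> float:
    """Average P@k over all evaluation queries."""
    return mean(precision_at_k(j, k) for j in per_query.values())
```

Usage: `precision_at_k([1] * 7 + [0] * 3)` gives 0.7, matching the 7 / 10 example.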
Conclusion and Future Work
Conclusions
• Low-effort way to quickly build LTR relevance models of medium to high quality
• LLMs provide (relatively) cheap and plentiful judgment labels to train LTR models
• Can be used to jumpstart the development of search pipelines
• Feature engineering can inject informative features from different search modalities: lexical, vector, knowledge graph, etc.
Alignment with human judgments
• Human judgments are hard to acquire, so using an LLM makes sense
• Human and LLM judgments follow a similar trend (accuracy: 71%), but LLMs are more “lenient” than humans
• Overall correlation is 0.43, but it decreases with increasing scores
• Observation: the LLM tries too hard to conclude “RELEVANT”, making leaps of reasoning that humans would not
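Agreement figures like these can be computed from paired human and LLM labels. A minimal sketch on toy data: accuracy is the exact-match rate, correlation is Pearson's r, and `leniency` (a hypothetical measure, not from the talk) is the share of items where the LLM's label is higher than the human's.

```python
import math
from typing import List


def accuracy(human: List[int], llm: List[int]) -> float:
    """Fraction of items where the LLM label equals the human label."""
    return sum(h == l for h, l in zip(human, llm)) / len(human)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leniency(human: List[int], llm: List[int]) -> float:
    """Share of items where the LLM is more generous than the human judge."""
    return sum(l > h for h, l in zip(human, llm)) / len(human)
```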
Active Learning
Image Credit: SuperAnnotate Webinars Page
1. Train LTR model with fully automated pipeline
2. Deploy as re-ranker
3. Generate search results for user queries
4. Identify low-confidence predictions and re-annotate them using human experts
5. Retrain LTR model with additional labels from step 4
6. Go to step 2
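Step 4 can be sketched as uncertainty sampling: for a pointwise model that outputs a relevance probability, the least confident predictions are those closest to the decision threshold. The 0.5 threshold, the fixed batch size `n`, and the function name are illustrative assumptions.

```python
from typing import Dict, List


def low_confidence(preds: Dict[str, float], n: int, threshold: float = 0.5) -> List[str]:
    """Return the n (query, doc) keys whose predicted probability is nearest the threshold."""
    return sorted(preds, key=lambda key: abs(preds[key] - threshold))[:n]
```

The selected items would then go to human experts, and their labels are added to the training set before retraining (step 5).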
Ensemble of LLM Judges
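One simple way to combine several LLM judges is majority voting over their labels. This is an assumption for illustration (the slide does not specify an aggregation rule): unparseable outputs (`None`) are ignored, and a tie yields no consensus, which could be routed to a human as in the active learning loop.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(labels: List[Optional[str]]) -> Optional[str]:
    """Most common label across judges; None if no votes or a tie at the top."""
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None
    (top, count), *rest = votes.most_common()
    if rest and rest[0][1] == count:
        return None  # tie: no consensus
    return top
```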
Prompt Engineering
• Prompt engineering
• Additional guidelines based on (human) expert feedback
• Few-shot prompting
• Chain of Thought / auto-prompting
• Other advanced techniques, e.g., APE, self-consistency
• Prefix tuning
• Fine tuning
More advanced prompting techniques in: Prompt Engineering Guide
elsevierlabs-os/build-ltr-models-using-llm

Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
 
Leslie Smith's Papers discussion for DL Journal Club
Leslie Smith's Papers discussion for DL Journal ClubLeslie Smith's Papers discussion for DL Journal Club
Leslie Smith's Papers discussion for DL Journal Club
 
Using Graph and Transformer Embeddings for Vector Based Retrieval
Using Graph and Transformer Embeddings for Vector Based RetrievalUsing Graph and Transformer Embeddings for Vector Based Retrieval
Using Graph and Transformer Embeddings for Vector Based Retrieval
 
Transformer Mods for Document Length Inputs
Transformer Mods for Document Length InputsTransformer Mods for Document Length Inputs
Transformer Mods for Document Length Inputs
 
Building Named Entity Recognition Models Efficiently using NERDS
Building Named Entity Recognition Models Efficiently using NERDSBuilding Named Entity Recognition Models Efficiently using NERDS
Building Named Entity Recognition Models Efficiently using NERDS
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language Processing
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
 
Search summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSearch summit-2018-ltr-presentation
Search summit-2018-ltr-presentation
 
Search summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slidesSearch summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slides
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming text
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity Search
 
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Building Learning to Rank (LTR) search reranking models using Large Language Models (LLM)

  • 1. Building Learning to Rank models for search using Large Language Models. Sujit Pal, Elsevier Health (ORCID Id: https://orcid.org/0000-0002-6225-110X). PyData Global 2023.
  • 2. About Me: work at the intersection of search and machine learning; interested in Information Retrieval, Natural Language Processing, Knowledge Graphs and Machine Learning, and now LLMs and Generative AI. Contact: sujit.pal@elsevier.com, https://www.linkedin.com/in/sujitpal, @palsujit@hachyderm.io
  • 5. Basic Idea (what): use LLMs to generate relevance judgements; use the relevance judgements to train LTR models; use the LTR models to rerank query results; profit!
  • 6. Rationale (why). LTR: an easy way to jumpstart a relevance model; practical for situations where judgement data is cheap and plentiful. LLM: a potentially unlimited source of judgement data; LLMs with 70B+ parameters are capable of mimicking human preferences (Large language models can accurately predict searcher preferences, Thomas et al., 2023).
  • 7. Rationale (results):
      Baselines                          MAP@10
      BM25 (Elasticsearch OOB)           6.42
      Cosine Sim. (Qdrant OOB)           8.35
      CKPOC Heuristics                   8.50
      LTR models                         MAP@10
      Pointwise (regression)             8.12
      Pairwise (RankNet)                 8.38
      Pairwise (LambdaRank)              7.92
      Pairwise (LambdaMART)              7.58
  • 8. Workflow (how) [diagram]. Query sources: gold set queries and query logs. Training: Query Sampler → q → Elasticsearch index (BM25) → (q, dk) → Label Generator → (q, dk, yk) → Feature Generator → (Xk, yk) → LTR Model. Inference: q → Elasticsearch index (BM25) → (q, dk) → Feature Generator → (q, dk, Xk) → trained LTR model (Reranker) → (q, dk, Xk, y'k). Evaluation: Label Generator → (q, dk, yk) → P@10.
  • 10. Query Sampler [workflow diagram from slide 8, Query Sampler stage highlighted]
  • 11. Query Sampler. Goal: determine a set of representative queries the system is expected to answer (for training the LTR model). This was for a specialized search component that answers long queries, so we sampled from our query log. Assume the number of tokens and the number of concepts per query each follow a normal distribution, and calculate their mean and standard deviation. Set the boundaries at mean ± s.d. Keep the queries from the query log whose token count and concept count both fall within the (mean ± s.d.) boundaries.
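The sampling rule on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: it assumes each logged query already carries a token count and a concept count (how concepts are extracted is not specified on the slide, so `num_concepts` is a hypothetical precomputed field), and it uses the population standard deviation.

```python
import statistics
from typing import List, NamedTuple, Tuple


class LoggedQuery(NamedTuple):
    text: str
    num_tokens: int
    num_concepts: int  # hypothetical: count produced by some concept extractor / NER


def mean_sd_bounds(values: List[int]) -> Tuple[float, float]:
    """Return (mean - s.d., mean + s.d.), treating the counts as normally distributed."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return mu - sd, mu + sd


def sample_queries(log: List[LoggedQuery]) -> List[LoggedQuery]:
    """Keep queries whose token count AND concept count both fall within mean ± 1 s.d."""
    tok_lo, tok_hi = mean_sd_bounds([q.num_tokens for q in log])
    con_lo, con_hi = mean_sd_bounds([q.num_concepts for q in log])
    return [
        q for q in log
        if tok_lo <= q.num_tokens <= tok_hi and con_lo <= q.num_concepts <= con_hi
    ]


if __name__ == "__main__":
    log = [LoggedQuery("a", 1, 0), LoggedQuery("b", 5, 2), LoggedQuery("c", 5, 2),
           LoggedQuery("d", 5, 2), LoggedQuery("e", 9, 4)]
    print([q.text for q in sample_queries(log)])  # ['b', 'c', 'd']
```

Requiring both counts to be in range trims outliers on either axis; a wider band (for example mean ± 2 s.d.) would keep more of the log.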
  • 12. Label Generator [workflow diagram from slide 8, Label Generator stage highlighted]
  • 13. Label Generation (pointwise): (q, dk) → (q, dk, yk), where yk is RELEVANT or IRRELEVANT (binary). Prompt used:
      Human: You are a medical expert tasked with identifying if the provided DOCUMENT addresses the information needs for the provided QUERY.
      QUERY: `{query}`
      DOCUMENT: `{document}`
      Your RESPONSE should be:
      - RELEVANT if the DOCUMENT addresses the information needs for the QUERY
      - IRRELEVANT otherwise
      Explain your REASONING.
      Format the output as follows:
      <output>
      <response>RESPONSE</response>
      <reasoning>REASONING</reasoning>
      </output>
      Assistant: <output>
  • 14. Label Generation (pairwise): (q, dk) → generate pairs → (q, dki, dkj) → (q, dki, dkj, yk), where yk is DOCUMENT-1 or DOCUMENT-2. Prompt used:
      Human: You are a medical expert who has to judge which of two DOCUMENTs shown below are relevant for the given QUERY. Provide your JUDGEMENT as DOCUMENT-1 or DOCUMENT-2 depending on which DOCUMENT you think is relevant for the QUERY.
      QUERY: `{query}`
      DOCUMENT-1: `{document_1}`
      DOCUMENT-2: `{document_2}`
      Explain your REASONING.
      Format your output as follows:
      <output>
      <response>JUDGEMENT</response>
      <reasoning>REASONING</reasoning>
      </output>
      Assistant: <output>
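The "generate pairs" step and the parsing of the pairwise judgement on slide 14 might look like the sketch below. It assumes unordered pairs (each pair asked once) and an encoding of +1 when DOCUMENT-1 wins and -1 when DOCUMENT-2 wins; both choices are assumptions for illustration, not confirmed by the slides.

```python
import itertools
import re
from typing import List, Optional, Tuple


def generate_pairs(query: str, docs: List[str]) -> List[Tuple[str, str, str]]:
    """All unordered document pairs for one query: (q, d_i, d_j) with i < j."""
    return [(query, d_i, d_j) for d_i, d_j in itertools.combinations(docs, 2)]


def parse_pairwise_label(completion: str) -> Optional[int]:
    """Map 'DOCUMENT-1' -> +1 (d_i preferred) and 'DOCUMENT-2' -> -1 (d_j preferred)."""
    match = re.search(r"<response>\s*DOCUMENT-\s*([12])\s*</response>", completion)
    if match is None:
        return None
    return 1 if match.group(1) == "1" else -1
```

Note the cost: n candidates yield n(n-1)/2 LLM calls per query, so 50 results would mean 1225 judgements, which is why pairwise labeling is usually restricted to a short candidate list.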
  • 15. Label Generation (listwise): (q, dk) → (q, dk, yk), where yk is a score on a 5-point scale (1-5, numeric). Prompt used:
      Human: You are a medical expert tasked with assigning a SCORE indicating how relevant the given DOCUMENT is to the given QUERY.
      QUERY: `{query}`
      DOCUMENT: `{document}`
      Assign the SCORE as follows:
      1 - DOCUMENT is completely unrelated to QUERY
      2 - DOCUMENT has some relation to QUERY, but mostly off-topic
      3 - DOCUMENT is relevant to QUERY, but lacking focus or key details
      4 - DOCUMENT is highly relevant, addressing the main aspects of QUERY
      5 - DOCUMENT is directly relevant and precisely targeted to QUERY
      Explain your REASONING for assigning the SCORE.
      Format the output as follows:
      <output>
      <score>SCORE</score>
      <reasoning>REASONING</reasoning>
      </output>
      Assistant: <output>
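For the 5-point prompt on slide 15, a minimal sketch would extract the `<score>` and, because LambdaRank consumes scored labels but internally works on pairs, expand one query's graded labels into (better, worse) index pairs. The function names are hypothetical, and skipping ties is an assumed (though common) convention.

```python
import re
from typing import List, Optional, Tuple


def parse_score(completion: str) -> Optional[int]:
    """Extract the 1-5 SCORE from the LLM completion; None if missing or out of range."""
    match = re.search(r"<score>\s*(\d+)\s*</score>", completion)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 5 else None


def scores_to_pairs(scores: List[int]) -> List[Tuple[int, int]]:
    """Turn graded labels for one query into (better, worse) index pairs; ties are skipped."""
    return [
        (i, j)
        for i in range(len(scores))
        for j in range(len(scores))
        if scores[i] > scores[j]
    ]
```

One graded judgement per document gives pairwise training signal without the quadratic number of LLM calls that the pairwise prompt needs.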
  • 16. Feature Generator [workflow diagram from slide 8, Feature Generator stage highlighted]
  • 17. Feature Generation. Query features: #-tokens in query. Document features: Total Term Frequency (TTF) for field. Query-document features: TF (min, max, mean, var) for field; TF*IDF (min, max, mean, var) for field; #-overlapping query tokens with field; #-overlapping query concepts with field; #-overlapping query semantic groups with field; BM25 score for matching query with field; cosine similarity between query and field. Idea source: Learning to Rank Datasets page from Microsoft Research.
  • 18. Feature Generation (same feature table as slide 17). Document fields: title, section title, breadcrumbs, text.
  • 19. Feature Generation (same feature table as slide 17), annotated: multiple point estimates (min, max, mean, var) for the same feature.
  • 20. Feature Generation (same feature table as slide 17), annotated: Count, Custom NER. 61 features in all.
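A small subset of the per-field features from slides 17-20 can be sketched without a search engine. This is an illustration under loud assumptions: lowercase whitespace tokenization stands in for the index analyzer, and a bag-of-words cosine stands in for the embedding-based cosine similarity on the slide. BM25, TF*IDF, concept and semantic-group overlaps need index statistics or an NER model and are left out.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # assumption: lowercase + whitespace split; a real pipeline would reuse the index analyzer
    return text.lower().split()


def query_field_features(query: str, field_text: str) -> Dict[str, float]:
    """A few query and query-document features for ONE document field (title, text, ...)."""
    q_toks = tokenize(query)
    f_toks = tokenize(field_text)
    q_counts, f_counts = Counter(q_toks), Counter(f_toks)
    # term frequency of each query token in the field; [0] guards against an empty query
    tfs = [f_counts[t] for t in q_toks] or [0]
    dot = sum(q_counts[t] * f_counts[t] for t in q_counts)
    norm = (math.sqrt(sum(v * v for v in q_counts.values()))
            * math.sqrt(sum(v * v for v in f_counts.values())))
    return {
        "query_num_tokens": len(q_toks),
        # several point estimates of the same TF feature
        "tf_min": min(tfs),
        "tf_max": max(tfs),
        "tf_mean": statistics.mean(tfs),
        "tf_var": statistics.pvariance(tfs),
        "num_overlap_tokens": len(set(q_toks) & set(f_toks)),
        # bag-of-words cosine: a stand-in for the embedding cosine similarity on the slide
        "cosine_bow": dot / norm if norm else 0.0,
    }
```

Calling this once per field (title, section title, breadcrumbs, text) and concatenating the dictionaries gives a flat feature vector Xk for each (q, dk) pair.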
  • 21. Model Training [workflow diagram from slide 8, LTR Model stage highlighted]
  • 22. LTR Models Recap. The feature generator takes a query and a document and returns a feature vector. Pointwise models take a query and a document as input and return a relevance judgment between 0 and 1 (query, doc → generate features → pointwise LTR model → judgment). Pairwise models take a query and a pair of documents as input and return a judgment between -1 and 1 (query, doc-1, doc-2 → generate features → pairwise LTR model → judgment). Listwise models (not used) take a query and a list of documents and return the documents ordered by relevance.
  • 23. Model Performance (annotated by label type: binary, pairwise, scored). Pointwise: 2-layer FCN for binary classification, uses binary relevance data. RankNet: 3-layer Siamese network for binary classification, uses pairwise relevance data. LambdaRank: pairwise model that needs listwise (scored) input and internally converts it to pairwise; code adapted from houchenyu/L2R; also available via XGBoost using the rank:pairwise objective. LambdaMART: also available via XGBoost using the rank:ndcg objective.
  • 24. Model Implementations. Pointwise: two-layer FCN; binary classifier; scores = predicted probability; binary labels. Pairwise RankNet: 3-layer Siamese network; binary classifier; pairwise labels. LambdaRank: adapted from houchenyu/L2R; returns a numeric score; scored (listwise) labels, internally converted to pairwise; XGBoost rank:pairwise. LambdaMART: XGBoost rank:ndcg.
  • 25. Rank Matrix (RankNet only). Given the following pairwise rankings: D2 > D1, D3 > D1, D4 < D1, D5 > D1, D3 < D2, D4 > D2, D5 < D2, D4 > D3, D5 < D3, D5 < D4, the rank matrix is:
             D1  D2  D3  D4  D5   total
        D1    0   1   1  -1   1     2
        D2   -1   0  -1   1  -1    -2
        D3   -1   1   0   1  -1     0
        D4    1  -1  -1   0  -1    -2
        D5   -1   1   1   1   0     2
      Return the sorted list of documents: D5 D1 D3 D2 D4 (D1/D5 and D2/D4 tie). Pairwise to listwise: Pairwise Comparison Method (1000minds).
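The pairwise-to-listwise step on slide 25 amounts to summing each row of the rank matrix and sorting by the totals. The sketch below assumes that an entry of +1 in row i, column j counts in favour of document i, which is the reading under which sorting by descending row total produces the slide's ordering. Summing the rows of the slide's matrix gives 2, -2, 0, -2, 2, so D1 and D5 tie at the top.

```python
from typing import List, Sequence, Tuple


def rank_from_matrix(names: Sequence[str], matrix: List[List[int]]) -> Tuple[List[int], List[str]]:
    """Sum each row of a pairwise rank matrix and sort documents by descending total.

    Assumes matrix[i][j] = +1 counts in favour of doc i over doc j, -1 against it,
    and 0 on the diagonal. Python's sort is stable, so ties keep their input order.
    """
    totals = [sum(row) for row in matrix]
    order = sorted(range(len(names)), key=lambda i: -totals[i])
    return totals, [names[i] for i in order]


if __name__ == "__main__":
    names = ["D1", "D2", "D3", "D4", "D5"]
    matrix = [  # copied from slide 25
        [0, 1, 1, -1, 1],
        [-1, 0, -1, 1, -1],
        [-1, 1, 0, 1, -1],
        [1, -1, -1, 0, -1],
        [-1, 1, 1, 1, 0],
    ]
    totals, ranking = rank_from_matrix(names, matrix)
    print(totals)   # [2, -2, 0, -2, 2]
    print(ranking)  # ['D1', 'D5', 'D3', 'D2', 'D4']
```

With a trained RankNet, each off-diagonal entry would come from scoring the pair (Di, Dj), so ranking n documents costs n(n-1) model calls (or half that if the model is antisymmetric).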
  • 26. LTR Models Evolution. RankNet: trains using gradient descent; the gradient is computed as ∂C/∂s, where C is the cross-entropy cost, which penalizes the difference between the desired and the actual ranking, and s is the model score. LambdaRank: multiplies RankNet's ∂C/∂s values (λ) by |ΔNDCG|, the change in NDCG caused by swapping a pair of inputs. LambdaMART: combines gradient boosting (MART = Multiple Additive Regression Trees) with the LambdaRank gradient computation. Paper ref: From RankNet to LambdaRank to LambdaMART: An Overview (Burges, 2010).
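The gradients summarized on slide 26 can be written out directly. This pure-Python sketch follows the formulation in the Burges (2010) overview for a pair where document i should rank above document j: the RankNet cost is C = log(1 + exp(-σ(s_i - s_j))), so λ_ij = ∂C/∂s_i = -σ / (1 + exp(σ(s_i - s_j))), and LambdaRank scales it by |ΔNDCG|. The exponential gain 2^rel - 1 and log2 discount are the standard NDCG choices, assumed here; function names are illustrative.

```python
import math
from typing import Sequence


def ranknet_lambda(s_i: float, s_j: float, sigma: float = 1.0) -> float:
    """RankNet gradient dC/ds_i for a pair where doc i should rank above doc j."""
    x = sigma * (s_i - s_j)
    if x >= 0:
        z = math.exp(-x)  # equivalent form that avoids overflow in exp() for large x
        return -sigma * z / (1.0 + z)
    return -sigma / (1.0 + math.exp(x))


def dcg(labels_in_rank_order: Sequence[int]) -> float:
    """DCG with gain 2^rel - 1 and discount log2(position + 1), positions starting at 1."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels_in_rank_order))


def delta_ndcg(labels: Sequence[int], scores: Sequence[float], i: int, j: int) -> float:
    """|change in NDCG| if docs i and j swap positions in the ranking induced by scores."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    ideal = dcg(sorted(labels, reverse=True))
    if ideal == 0:
        return 0.0
    ranked = [labels[k] for k in order]
    pos = {doc: p for p, doc in enumerate(order)}
    swapped = list(ranked)
    swapped[pos[i]], swapped[pos[j]] = swapped[pos[j]], swapped[pos[i]]
    return abs(dcg(swapped) - dcg(ranked)) / ideal


def lambdarank_lambda(labels: Sequence[int], scores: Sequence[float], i: int, j: int,
                      sigma: float = 1.0) -> float:
    """LambdaRank: the RankNet lambda scaled by |delta NDCG| for swapping i and j."""
    return ranknet_lambda(scores[i], scores[j], sigma) * delta_ndcg(labels, scores, i, j)
```

A negative λ for document i means gradient descent pushes its score up; the |ΔNDCG| factor makes mis-ordered pairs near the top of the list matter more than pairs deep in the ranking. LambdaMART fits regression trees to these same λ values.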
  • 27. Evaluation [workflow diagram from slide 8, Evaluation stage highlighted]
  • 28. Evaluation. Generate the top 50 results for each query from the Elasticsearch index (lexical search). Re-rank them using the trained LTR model and return the top 10. Use the LLM (same prompt as pointwise label generation) to judge each result relevant or irrelevant. Aggregate the judgments across results, e.g. 7 of 10 relevant → P@10 = 0.7. Average the P@10 scores across all evaluation queries → MAP@10 (here, the mean of P@10 over queries). Our application called for the top 10 results to be treated as equally ranked, but the pipeline could also generate ranked lists and compute rank-aware metrics such as MRR@k or NDCG@k if needed.
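The evaluation loop on slide 28 reduces to a few helpers, sketched below under the assumption that the LLM judgments have already been mapped to 1 (relevant) and 0 (irrelevant), for example with a parser like the pointwise one above. `score_fn` stands in for the trained LTR model and is hypothetical; MRR@k is included because the slide mentions it as a rank-aware option.

```python
from typing import Callable, List, Sequence


def rerank(candidates: Sequence[str], score_fn: Callable[[str], float], k: int = 10) -> List[str]:
    """Re-order lexical candidates (e.g. BM25 top 50) by LTR score and keep the top k."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]


def precision_at_k(judgments: Sequence[int], k: int = 10) -> float:
    """judgments[i] is 1 if result i was judged RELEVANT, else 0; e.g. 7 of 10 -> 0.7."""
    return sum(judgments[:k]) / k


def mean_precision_at_k(all_judgments: Sequence[Sequence[int]], k: int = 10) -> float:
    """Average P@k over evaluation queries (the quantity reported as MAP@10 on the slides)."""
    return sum(precision_at_k(j, k) for j in all_judgments) / len(all_judgments)


def mrr_at_k(all_judgments: Sequence[Sequence[int]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant result within the top k (0 if none)."""
    total = 0.0
    for judgments in all_judgments:
        for rank, rel in enumerate(judgments[:k], start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(all_judgments)
```

Dividing by k rather than by the number of returned results keeps P@10 comparable across queries that return fewer than 10 hits.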
  • 30. Conclusions. A low-effort way to quickly build LTR relevance models of medium to high quality. LLMs provide (relatively) cheap and plentiful judgment labels to train LTR models. Can be used to jumpstart development of search pipelines. Feature engineering is used to inject informative features from different search modalities: lexical, vector, knowledge graph, etc.
  • 31. Alignment with human judgments. Human judgments are hard to acquire, so using an LLM makes sense. Human and LLM judgments follow a similar trend (accuracy: 71%), but the LLM is more "lenient" than humans. The overall correlation is 0.43, but it decreases with increasing scores. Observation: the LLM tries too hard to conclude "RELEVANT", making leaps of reasoning that humans would not.
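Agreement numbers like those on slide 31 can be computed from a sample of items judged by both humans and the LLM. The slide does not say which correlation coefficient was used, so Pearson is assumed here, and the "leniency" helper (mean LLM score minus mean human score) is an illustrative way to quantify the observation, not the author's metric.

```python
import math
from typing import Sequence


def accuracy(human: Sequence[int], llm: Sequence[int]) -> float:
    """Fraction of items where the LLM label equals the human label."""
    return sum(h == m for h, m in zip(human, llm)) / len(human)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed; undefined if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def leniency(human: Sequence[float], llm: Sequence[float]) -> float:
    """Mean (LLM - human) score; a positive value means the LLM grades more generously."""
    return sum(m - h for h, m in zip(human, llm)) / len(human)
```

Computing these separately for each human score level is one way to check the slide's observation that agreement drops as scores increase.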
  • 32. Active Learning (image credit: SuperAnnotate Webinars page):
      1. Train the LTR model with the fully automated pipeline
      2. Deploy it as a re-ranker
      3. Generate search results for user queries
      4. Identify low-confidence predictions and re-annotate them using human experts
      5. Retrain the LTR model with the additional labels from step 4
      6. Go to step 2
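Step 4 of the active-learning loop on slide 32 needs a definition of "low confidence". One simple choice, assumed here rather than taken from the slides, is to use the pointwise model's predicted relevance probability and flag predictions within a margin of 0.5. The function name and the default margin are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_low_confidence(
    items: Sequence[Tuple[str, str]], probs: Sequence[float], margin: float = 0.1
) -> List[Tuple[str, str]]:
    """Return the (query, doc) pairs whose relevance probability lies within `margin` of 0.5,
    i.e. where the model is least sure; these are the candidates for human re-annotation."""
    return [item for item, p in zip(items, probs) if abs(p - 0.5) <= margin]
```

With a fixed annotation budget, sorting by `abs(p - 0.5)` and taking the first N items is an alternative to a fixed margin.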
  • 33. Ensemble of LLM Judges
  • 34. Prompt Engineering: additional guidelines based on (human) expert feedback; few-shot prompting; Chain of Thought / auto-prompting; other advanced techniques, e.g., APE, self-consistency; prefix tuning; fine tuning. More advanced prompting techniques in: Prompt Engineering Guide.