RELX Search Summit 2018
Learning to Rank
Sujit Pal, Elsevier Labs
September 25-27, 2018
What it is, how it’s done, and what it can do for you
Outline
• History
• Problem setup
• Learning to Rank Algorithms
• Practical Considerations
• LTR Case Studies (Solr, Elasticsearch, DIY)
• Wrap Up
History
• 1992: Idea of LTR (or Machine Learned Ranking) first proposed
• 2003: Altavista (later acquired by Yahoo!) using LTR in its engine
• 2005: Microsoft invents RankNet, deploys in Bing
• 2008: In contrast, Google’s engine is hand tuned, relying on up to ~200 signals
• 2009: Yandex invents and deploys MatrixNet in its engine
• 2016: Google says RankBrain is the #3 signal in its search engine
• 2016: Bloomberg contributes LTR plugin to Solr
• 2017: Open Source Connections contributes LTR plugin for Elasticsearch
Problem Setup
LTR Pipeline
Image Credit: https://towardsdatascience.com/when-to-use-a-machine-learned-vs-score-based-search-ranker-aa8762cd9aa9
• Training: build the LTR model using training data of (query, document, label) triples
• The label is the rank
• Inference: use the model to predict the label ŷ = h(x) for unseen (query, document) pairs
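As a concrete (if minimal) sketch of this flow, the snippet below trains a model h on (query, document, label) triples and then scores unseen pairs. The extract_features helper and the scikit-learn regressor are illustrative assumptions, not anything prescribed by the slides.

```python
# Minimal sketch of the LTR pipeline: train h on (query, document, label)
# triples, then predict labels for unseen (query, document) pairs.
from sklearn.ensemble import GradientBoostingRegressor

def extract_features(query, doc):
    # Hypothetical helper: return a fixed-length numeric feature vector for (query, doc)
    return [len(query.split()), len(doc.split()), float(query.lower() in doc.lower())]

def train(triples):
    # triples: iterable of (query, document, label), where label is a relevance grade
    X = [extract_features(q, d) for q, d, _ in triples]
    y = [label for _, _, label in triples]
    return GradientBoostingRegressor().fit(X, y)   # h

def predict(h, query, doc):
    # y_hat = h(x) for an unseen (query, document) pair
    return h.predict([extract_features(query, doc)])[0]
```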
Difference between search and LTR
• Search engines
• Use text based relevance – TF-IDF, BM25, etc.
• Unsupervised, backed by statistical models.
• LTR
• Can support different (application specific) notions of relevance. For example:
• Recommendations – depends on price, geolocation or user ratings.
• Question Answering – the best text match might not return the best answer, and the right set of features may be hard to articulate explicitly.
• Supervised technique, needs labeled data to train.
• Just a re-ranker, search layer must return results to re-rank.
Difference between ML and LTR
• ML solves a prediction problem (classification or regression) for a
single instance at a time.
• LTR solves a ranking problem for a list of items – objective is to find an
optimal ordering of items.
Reasons to consider LTR
• Too many parameters to tune manually without overfitting to a particular query set.
• Ranking requirements not being met with traditional text based
search tools (including use of metadata fields).
• Availability of enough (implicit or explicit) good training data to train
LTR model.
Learning to Rank Algorithms
Traditional Ranking Models
• Vector Space Models
• Boolean – predicts whether a document is relevant to the query or not
• TF-IDF – rank documents by cosine similarity between document and query vectors
• Probabilistic Models
• BM25 – rank documents by log odds of relevance to the query
• LMIR – rank documents by the probability of the document’s language model generating the query terms
• Importance-based Models
• HITS – rank documents by hubness/authority (inlinks/outlinks)
• PageRank – rank documents by the probability of a random surfer arriving on the page
• Impact Factor – rank documents by number of citations
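For reference, here is a small sketch of Okapi BM25 scoring with the usual k1/b defaults; it is a textbook formulation rather than any particular engine's implementation.

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 score of a document for a query.
    doc_freq: dict mapping term -> number of documents containing the term."""
    score = 0.0
    doc_len = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)  # smoothed, Lucene-style IDF
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```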
Evaluation Metrics
• Mean Average Precision (MAP@k)
• Mean Reciprocal Rank (MRR)
• Normalized Discounted Cumulative Gain (NDCG@k)
• Rank Correlation
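A minimal sketch of how some of these metrics are computed over a single ranked result list (MAP and MRR are then averaged over queries); plain Python, no particular evaluation library assumed, and the linear-gain form of DCG is used.

```python
import math

def dcg_at_k(gains, k):
    # gains: graded relevance labels in ranked order (linear gain formulation)
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

def reciprocal_rank(relevant_flags):
    # relevant_flags: 0/1 relevance in ranked order; MRR is the mean over queries
    for i, rel in enumerate(relevant_flags):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

def average_precision_at_k(relevant_flags, k, num_relevant):
    # num_relevant: total relevant documents for the query; MAP@k is the mean over queries
    hits, score = 0, 0.0
    for i, rel in enumerate(relevant_flags[:k]):
        if rel:
            hits += 1
            score += hits / (i + 1)
    return score / min(num_relevant, k) if num_relevant else 0.0
```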
High Level Taxonomy of LTR Algorithms
• Pointwise – documents ranked by predicting the relevance of each (query, document) pair independently
• Pairwise – documents ranked by predicting the relative preference between pairs of documents for the same query
• Listwise – documents ranked by considering the entire relevance ordering of all documents returned for a query
Pointwise Approach
• Input: (query, document) pair (q, d)
• Output: score indicating rank on result list
• Model: 𝒇(q, d) → score
• Regression problem (in case of numeric scores) or Classification
problem (in case of relevant/irrelevant, or multi-level classes like
Perfect/Excellent/Good/Fair/Bad)
• Ordinal regression: include ordinal relationship between labels.
• Examples: SLR (Staged Logistic Regression), Pranking
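A toy pointwise setup, assuming feature vectors have already been extracted per (query, document) pair; scikit-learn models stand in for the learner here rather than the specific SLR or Pranking algorithms named above, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingRegressor

# X: one feature vector per (query, document) pair; y: relevance label per pair
X = np.random.rand(100, 5)
y_binary = np.random.randint(0, 2, size=100)        # relevant / irrelevant
y_graded = np.random.randint(0, 5, size=100)        # e.g. Bad..Perfect as 0..4

clf = LogisticRegression().fit(X, y_binary)          # classification flavour
reg = GradientBoostingRegressor().fit(X, y_graded)   # regression flavour

# At query time, score each candidate independently and sort descending by score
scores = reg.predict(X[:10])
ranking = np.argsort(-scores)
```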
Pairwise Approach
• Input: triples (q, dA, dB) of a query and a pair of candidate documents
• Output: a preference, one of {-1, +1}
• Model: 𝒇(q, dA, dB) → {-1, +1}
• Classification problem – learn a binary classifier that predicts which document in the pair should rank higher for the query
• Goal is to minimize average number of inversions in ranking
• Examples: RankNet, RankSVM, LambdaMART
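The sketch below illustrates the pairwise idea with a RankSVM-style transform: difference vectors between documents of the same query are labelled by which document should rank higher, and a binary classifier is trained on them. The synthetic data and the use of scikit-learn's LinearSVC are assumptions for illustration.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def pairwise_transform(X, y, qids):
    """RankSVM-style transform: for documents of the same query with different
    labels, emit difference vectors labelled +1 / -1 by preferred direction."""
    Xp, yp = [], []
    for qid in np.unique(qids):
        idx = np.where(qids == qid)[0]
        for i, j in combinations(idx, 2):
            if y[i] == y[j]:
                continue
            s = 1 if y[i] > y[j] else -1
            Xp.append(X[i] - X[j]); yp.append(s)      # both directions, so both
            Xp.append(X[j] - X[i]); yp.append(-s)     # classes are always present
    return np.array(Xp), np.array(yp)

# X: feature vectors, y: graded labels, qids: query id per row (synthetic data)
X = np.random.rand(30, 4)
y = np.random.randint(0, 3, size=30)
qids = np.repeat(np.arange(3), 10)

Xp, yp = pairwise_transform(X, y, qids)
svm = LinearSVC().fit(Xp, yp)
scores = X @ svm.coef_.ravel()   # score documents; sort descending to rank
```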
Listwise Approach
• Input: (query, documents {d1, d2, …, dN})
• Output: desired ranked list of documents 𝕯
• Model: 𝒇(q, {d1, d2, …, dN}) → 𝕯
• Optimization problem with indirect loss functions such as RankCosine or KL divergence, or with smoothed versions of IR measures (since NDCG and friends are not directly differentiable), trained by gradient descent
• Examples: AdaRank, ListNet, RankCosine, SVM-MAP
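As one concrete listwise example, the sketch below trains a linear scoring function with the ListNet top-one cross-entropy loss (softmax over true labels vs. softmax over predicted scores) by gradient descent; the toy data is made up for illustration.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def listnet_grad(w, X, y):
    """Gradient of the ListNet top-one cross-entropy loss for one query.
    X: docs x features for the query's result list, y: graded relevance labels."""
    p_true = softmax(y.astype(float))    # target distribution from labels
    p_pred = softmax(X @ w)              # distribution from current scores
    return X.T @ (p_pred - p_true)

# Toy training loop over queries with a linear scoring function s = Xw
rng = np.random.default_rng(0)
queries = [(rng.random((8, 4)), rng.integers(0, 3, 8)) for _ in range(20)]
w = np.zeros(4)
for _ in range(200):
    for X, y in queries:
        w -= 0.1 * listnet_grad(w, X, y)

# Rank documents of a list by descending predicted score
X_new, _ = queries[0]
order = np.argsort(-(X_new @ w))
```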
Commonly used Algorithms
• Linear Model
• Predicted score is a linear combination of input features
• RankNet
• Neural network based
• Good for binary (relevant/irrelevant) labels
• Weight matrix transforms input features into rank probabilities
• LambdaMART
• Tree (forest) based
• Good for multi-class labels
• Feature splits with thresholds
Practical Considerations
Acquiring labels
• Implicit
• Intrinsic features (words, phrases)
• Document metadata
• User Clicks
• Time spent on document
• Purchases (if applicable)
• Cheap to build but noisy
• Explicit
• Human expert rates relevancy of each document against query
• Cleaner but expensive to build
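A rough sketch of turning implicit signals into graded labels; the click-log schema and grading thresholds below are assumptions, and a real system would also need to correct for position bias.

```python
from collections import defaultdict

# Hypothetical click-log entries: (query, docid, clicked, dwell_seconds)
click_log = [
    ("machine learning", "d1", True, 120),
    ("machine learning", "d2", True, 5),
    ("machine learning", "d3", False, 0),
]

def implicit_labels(log, min_dwell=30):
    """Very rough graded labels from clicks: 2 = clicked with long dwell,
    1 = clicked, 0 = shown but not clicked. Thresholds are assumptions;
    position bias is ignored in this sketch."""
    labels = defaultdict(dict)
    for query, docid, clicked, dwell in log:
        grade = 2 if (clicked and dwell >= min_dwell) else (1 if clicked else 0)
        labels[query][docid] = max(grade, labels[query].get(docid, 0))
    return labels
```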
Feature Selection
• Document Features
• Document Length
• URL Length
• Publication Date
• Number of outlinks
• PageRank
• Query Features
• Number of words
• PER or ORG in query
• Query-Document Features
• TF-IDF, BM25 similarity
• Frequency of query in anchor text
• Document contains query words in title
• User Dependent Features
• Star ratings
• Age, gender
• Device
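An illustrative feature-vector builder touching most of the groups above; the document schema (title, text, pub_date, n_outlinks, pagerank) is hypothetical, and a simple term-overlap count stands in for TF-IDF/BM25 similarity.

```python
from datetime import date

def query_doc_features(query, doc, today=date(2018, 9, 25)):
    """Illustrative feature vector; `doc` is a hypothetical dict with
    title, text, pub_date, n_outlinks and pagerank fields."""
    q_terms = query.lower().split()
    title_terms = doc["title"].lower().split()
    text_terms = doc["text"].lower().split()
    return [
        len(text_terms),                               # document length
        (today - doc["pub_date"]).days,                # recency
        doc["n_outlinks"],                             # outlinks
        doc["pagerank"],                               # importance
        len(q_terms),                                  # query length
        sum(t in text_terms for t in q_terms),         # term overlap (stand-in for TF-IDF/BM25)
        float(all(t in title_terms for t in q_terms)), # query words in title
    ]
```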
Unbalanced Datasets
• If dataset is unbalanced, i.e., classes are not represented
approximately equally, then use under- or oversampling to balance.
• Consider using something like SMOTE for oversampling instead of
naïve oversampling by duplication.
• Make sure no data leakage in case of oversampling.
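A minimal sketch using the imbalanced-learn implementation of SMOTE; note that the split happens before oversampling, which is one way to avoid the leakage mentioned above. The synthetic data is for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE   # pip install imbalanced-learn

X = np.random.rand(500, 6)
y = np.random.choice([0, 1], size=500, p=[0.9, 0.1])   # 9:1 class imbalance

# Split FIRST, then oversample only the training fold; oversampling before the
# split would leak synthetic copies of test points into training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
```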
LTR used as re-ranker
• LTR models are usually more computationally expensive than search engines.
• The search engine is used to pull out matched documents.
• The top N of these documents are fed into the LTR model, and the top n results are replaced with the model’s re-ranked output, for N >> n (typically 50-100x).
[Diagram: Query → Index → Matched (10k) → Scored (10k) → Top 1000 retrieved → Ranking Model → Re-ranked Top 10]
Image Credit: https://lucidworks.com/2016/08/17/learning-to-rank-solr/
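In code, the flow in the diagram looks roughly like the sketch below; search_engine, ltr_model and extract_features are hypothetical stand-ins for whatever index client, trained model and feature pipeline are in use.

```python
def rerank(query, search_engine, ltr_model, extract_features, N=1000, n=10):
    """Re-ranker flow: the engine retrieves and scores a large candidate set,
    the LTR model re-orders the top N, and only the top n are returned.
    All four callables/objects passed in are hypothetical stand-ins."""
    candidates = search_engine.search(query, rows=N)                  # top N by BM25 etc.
    features = [extract_features(query, doc) for doc in candidates]
    ltr_scores = ltr_model.predict(features)
    reranked = [doc for _, doc in sorted(zip(ltr_scores, candidates),
                                         key=lambda p: -p[0])]
    return reranked[:n]
```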
LTR Algorithm Implementations
• RankLib (Java) – from Lemur Project (UMass, CMU), provides
Coordinate Ascent, Random Forest (pointwise), MART, RankNet,
RankBoost (pairwise), LambdaMART (pair/listwise), AdaRank and
ListNet (listwise)
• SVMRank (C++) – from Cornell, provides SVMRank (pairwise)
• XGBoost (Python/C++) – LambdaRank (pairwise)
• PyLTR (Python) – LambdaMART (pairwise)
• Michael Alcorn (Python) – RankNet and LambdaMART (pairwise)
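As a quick example of one of these implementations, XGBoost exposes a ranking interface (XGBRanker) that takes per-query group sizes alongside the feature matrix and labels; the data below is synthetic and the hyperparameters are illustrative.

```python
import numpy as np
import xgboost as xgb

# X: feature matrix, y: graded labels; group sizes must sum to len(X)
X = np.random.rand(120, 6)
y = np.random.randint(0, 5, size=120)
group = [40, 40, 40]                      # three queries with 40 candidates each

ranker = xgb.XGBRanker(objective="rank:pairwise", n_estimators=100, max_depth=4)
ranker.fit(X, y, group=group)

# Score the candidates of one query and sort descending to get the ranking
scores = ranker.predict(X[:40])
order = np.argsort(-scores)
```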
LETOR Data Format
2 qid:1 1:3 2:3 3:0 4:0 5:3 6:1 7:1 8:0 9:0 10:1 11:156... # 11
2 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:406... # 23
0 qid:1 1:3 2:0 3:2 4:0 5:3 6:1 7:0 8:0.666667 9:0 10:1 ... # 44
2 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:287 ... # 57
1 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:2009 ... # 89
Each line: <label> qid:<query ID> <feature ID>:<value> … # <comment, e.g. docID>
Features may be query, document, query/document, or other features, in sparse or dense format.
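A small helper that serializes one example into this format (dense feature numbering starting at 1; the comment field typically carries the docID):

```python
def letor_line(label, qid, features, comment=""):
    """Serialize one (query, document) example into LETOR / SVMrank format:
    <label> qid:<qid> <feature_id>:<value> ... # <comment>"""
    feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    return f"{label} qid:{qid} {feats} # {comment}"

# letor_line(2, 1, [3, 3, 0, 0, 3, 1], comment="11")
# -> '2 qid:1 1:3 2:3 3:0 4:0 5:3 6:1 # 11'
```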
Case Studies
Preprocessing Data
• We use The Movie Database (TMDB) from Kaggle.
• 45k movies, 20 genres, 31k unique keywords
• We extract the following fields: (docID, title, description, popularity, release date, running time, rating (0-10), keywords, genres)
• Categorical labels 1-5 created from rating
• Objective is to build LTR model that learns the ordering implied by
rating and re-rank top 10 results using this model
• Features chosen: (query-title and query-description similarity using
TF-IDF and BM25, document recency, original score, and boolean 0/1
for each genre)
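A sketch of two of these preprocessing steps; the exact rating-to-label binning and the genre list shown are assumptions, since the slides only state that 1-5 labels come from the 0-10 rating and that each genre becomes a 0/1 feature.

```python
def rating_to_label(rating):
    """Map the TMDB 0-10 rating to a 1-5 relevance grade.
    The binning below is an assumption for illustration."""
    return min(5, max(1, int(rating // 2) + 1))    # 0-1.9 -> 1, ..., 8-10 -> 5

ALL_GENRES = ["Action", "Comedy", "Drama"]          # illustrative subset of the 20 genres

def genre_features(movie_genres):
    """Boolean 0/1 indicator per genre, as in the feature list above."""
    return [int(g in movie_genres) for g in ALL_GENRES]
```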
LTR with Solr
• Prepare Solr for LTR (add snippet to solrconfig.xml) and start Solr with -Dsolr.ltr.enabled=true
• Load data
• Define the LTR features to be used and upload them to Solr
• Define a dummy linear model so Solr can be used to extract features (via rq) for some queries into LETOR format
• Train a RankLib LambdaMART model using the extracted features
• Upload the trained model definition to Solr
• Run Solr re-rank query (rq) using the trained LTR model (see the sketch below)
• See notebooks – 02-solr/01 .. 04
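A hedged sketch of the final re-rank step from Python, using the rq syntax from the Solr LTR documentation; the collection name, model name and efi parameter name are assumptions that would need to match the feature definitions actually uploaded.

```python
import requests

SOLR_URL = "http://localhost:8983/solr/tmdbindex/select"   # collection name is assumed

def solr_ltr_query(user_query, model_name="lambdamart_model", rerank_docs=100, rows=10):
    """Issue a normal query, then re-rank the top documents with the LTR model
    via the rq parameter; [features] returns the extracted feature values."""
    params = {
        "q": user_query,
        "rows": rows,
        "fl": "id,title,score,[features]",
        "rq": f"{{!ltr model={model_name} reRankDocs={rerank_docs} "
              f"efi.query='{user_query}'}}",       # efi.* name must match the feature definitions
    }
    return requests.get(SOLR_URL, params=params).json()
```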
LTR with Elasticsearch
• Install LTR plugin and load data
• Initialize feature store
• Define features – load feature templates into Elasticsearch
• Extract features (sltr) to LETOR format
• Train a RankLib model (XGBoost and SVMRank model formats are also supported natively)
• Upload the trained LTR model to Elasticsearch
• Run re-rank query (rescore) using the trained LTR model (see the sketch below)
• See notebooks – 03-elasticsearch/01 .. 04
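A hedged sketch of the rescore step using the plugin's sltr query; the index name, feature-template parameter name and model name are assumptions that must match what was registered in the feature store.

```python
import requests

ES_URL = "http://localhost:9200/tmdb/_search"      # index name is assumed

def es_ltr_query(user_query, model_name="lambdamart_model", window=100, size=10):
    """First-pass match query, then rescore the top `window` hits with the
    trained LTR model via the plugin's sltr query."""
    body = {
        "size": size,
        "query": {"match": {"title": user_query}},
        "rescore": {
            "window_size": window,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": user_query},  # fed into the feature templates
                        "model": model_name,
                    }
                }
            }
        },
    }
    return requests.post(ES_URL, json=body).json()
```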
DIY LTR – Index Agnostic
• Run queries, generate features from results to LETOR format
• Train RankLib (or other third party LTR) model
• Run re-rank query on trained model
• Merge output of re-rank with actual results from index
• See notebooks – 04-ranklib/02..04
• Pros: index agnostic; more freedom to add novel features
• Cons: less support from index
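A minimal sketch of the merge step: re-rank the candidates with the LTR scores, keep the re-ranked top n, and fall back to the engine's original ordering for the rest (results assumed to be document IDs).

```python
def diy_merge(result_ids, ltr_scores, n=10):
    """Index-agnostic merge: re-rank all retrieved candidates by LTR score,
    keep the re-ranked top n, then append the remaining results in the
    engine's original order."""
    reranked = [doc for _, doc in sorted(zip(ltr_scores, result_ids),
                                         key=lambda p: -p[0])]
    head = reranked[:n]
    head_set = set(head)
    tail = [doc for doc in result_ids if doc not in head_set]
    return head + tail
```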
Wrapping Up
Resources
• Book – Learning to Rank for Information Retrieval, by Tie-Yan Liu.
• Paper – From RankNet to LambdaRank to LambdaMART: An
Overview, by Christopher J. C. Burges
• Tutorials
• Solr - https://github.com/airalcorn2/Solr-LTR
• Elasticsearch – Learning to Rank 101 by Pere Urbon-Bayes, ES-LTR Demo by
Doug Turnbull.
• Product Centric LTR Documentation
• Solr Learning To Rank Docs
• Elasticsearch Learning to Rank Docs
Thank you!
• Contact: sujit.pal@elsevier.com


Editor's Notes

  • #4 Most of the key work was done between 2008-2011, with competitions sponsored by Microsoft, Yahoo! and Yandex. Bloomberg LTR meetup – Michael Nillson, Erick Erickson. OSC LTR – presented at Haystack earlier this year.
  • #12 In all cases you need a judgment list (i.e., relevant vs. irrelevant). For MRR you need the first good result, so a notion of position; for DCG you need graded results; and for NDCG and Rank Correlation you also need the ideal ordering.
  • #21 SMOTE – take a minority-class instance, pick one of its k nearest neighbors, and create synthetic data as a mix of the original and the neighbor.