
How to Build your Training Set for a Learning To Rank Project

Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, in the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular (Apache Solr has supported it since January 2017), organisations struggle with the problem of how to collect and structure the relevance signals necessary to train their ranking models.
This talk is a technical guide to explore and master various techniques to generate your training set(s) correctly and efficiently.
Expect to learn how to:
– model and collect the necessary feedback from the users (implicit or explicit)
– calculate for each training sample a relevance label that is meaningful and unambiguous (Click Through Rate, Sales Rate …)
– transform the raw data collected into an effective training set (in the numerical vector format most LTR training libraries expect)
Join us as we explore real world scenarios and dos and don’ts from the e-commerce industry.

How to Build your Training Set for a Learning To Rank Project

  1. 1. London Information Retrieval Meetup 21 Oct 2019 How to Build your Training Set for a Learning to Rank Project Alessandro Benedetti, Software Engineer, 21st October 2019
  2. 2. London Information Retrieval Meetup Sease Search Services ● Open Source Enthusiasts ● Apache Lucene/Solr experts ● Community Contributors ● Active Researchers ● Hot Trends: Learning To Rank, Document Similarity, Search Quality Evaluation, Relevancy Tuning
  3. 3. London Information Retrieval Meetup Who I am ▪ Search Consultant ▪ R&D Software Engineer ▪ Master in Computer Science ▪ Apache Lucene/Solr Enthusiast ▪ Passionate about Semantic, NLP and Machine Learning technologies ▪ Beach Volleyball Player & Snowboarder Alessandro Benedetti
  4. 4. London Information Retrieval Meetup Agenda ● Learning To Rank ● Training Set Definition ● Implicit/Explicit Feedback ● Feature Engineering ● Relevance Label ● Metric Evaluation/Loss Function ● Training Set Split
  5. 5. London Information Retrieval Meetup Learning To Rank What is it? Learning from user implicit/explicit feedback To Rank documents (sensu lato)
  6. 6. London Information Retrieval Meetup Learning To Rank - What is NOT “Learning to rank is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems.” (Wikipedia) [Diagram: UI Interactions, Interactions Logger, Judgement Collector, Training, Learning To Rank]
  7. 7. London Information Retrieval Meetup • A sentient system that learns by itself • Continuously improving itself by ingesting additional feedback • Easy to set up and tune • Easy to give a human-understandable explanation of why the model operates in certain ways Learning To Rank - What is NOT
  8. 8. London Information Retrieval Meetup Training Set: What is it? • A set of examples to train the model on • It will be considered the golden truth • Each example is composed of:
 - relevance rating
 - query Id
 - feature vector • The feature vector is composed of N features (<id>:<value>), as in the example lines below (a parsing sketch follows them) 3 qid:1 0:3.4 1:0.7 2:1.5 3:0
 2 qid:1 0:5.0 1:0.4 2:1.3 3:0
 0 qid:1 0:2.4 1:0.7 2:1.5 3:1
 1 qid:2 0:5.7 1:0.2 2:1.1 3:0
 3 qid:2 0:0.0 1:0.5 2:4.0 3:0
 0 qid:3 0:1.0 1:0.7 2:1.5 3:1
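These lines follow the SVMrank/LibSVM-style layout (relevance label, query id, then <feature id>:<value> pairs) that most LTR training libraries expect. A minimal parsing sketch in Python; the file name and in-memory layout are hypothetical:

def parse_ltr_line(line):
    # "3 qid:1 0:3.4 1:0.7 2:1.5 3:0" -> (3, "1", {0: 3.4, 1: 0.7, 2: 1.5, 3: 0.0})
    parts = line.split()
    label = int(parts[0])                      # relevance rating
    query_id = parts[1].split(":", 1)[1]       # "qid:1" -> "1"
    features = {}
    for pair in parts[2:]:
        feature_id, value = pair.split(":", 1)
        features[int(feature_id)] = float(value)
    return label, query_id, features

# Usage (training.txt is a hypothetical file containing lines like the ones above):
# with open("training.txt") as f:
#     samples = [parse_ltr_line(line) for line in f if line.strip()]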
  9. 9. London Information Retrieval Meetup Training Set: Gather Feedback. The Ratings Set can come from Explicit Feedback (via a Judgements Collector) or Implicit Feedback (via an Interactions Logger). [Diagram: the query "Queen music" returning "Bohemian Rhapsody", "Dancing Queen" and "Queen Albums", rated explicitly by judges or implicitly through logged user interactions]
  10. 10. London Information Retrieval Meetup Feature Engineering: Feature Level. Each sample is a <query, document> pair; the feature vector describes this pair numerically. Features sit at one of three levels (a small sketch follows below).
 Document Level: the feature describes a property of the DOCUMENT; its value depends only on the document instance.
 e.g. Document Type = E-commerce Product: <Product price>, <Product colour> and <Product size> are Document Level features.
 e.g. Document Type = Hotel Stay: <Hotel star rating>, <Hotel price> and <Hotel food rating> are Document Level features.
 Query Level: the feature describes a property of the QUERY; its value depends only on the query instance.
 e.g. Query Type = E-commerce Search: <Query length>, <User device> and <User budget> are Query Level features.
 Query Dependent: the feature describes a property of the QUERY in correlation with the DOCUMENT; its value depends on both the query and the document instance.
 e.g. Query Type = E-commerce Search, Document Type = E-commerce Product: <first Query Term TF in Product title>, <first Query Term DF in Product title> and <query selected categories intersecting the product categories> are Query Dependent features.
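A minimal sketch of how a feature vector for one <query, document> pair might be assembled from the three feature levels; the field names (price, colour, text, title) are hypothetical:

def build_feature_vector(query, document):
    # Document Level features: depend only on the document instance.
    features = {
        0: float(document["price"]),                              # <Product price>
        1: 1.0 if document["colour"].lower() == "red" else 0.0,   # encoded <Product colour>
    }
    # Query Level features: depend only on the query instance.
    terms = query["text"].lower().split()
    features[2] = float(len(terms))                               # <Query length>
    # Query Dependent features: depend on both the query and the document.
    first_term = terms[0] if terms else ""
    title_terms = document["title"].lower().split()
    features[3] = float(title_terms.count(first_term))            # <first Query Term TF in Product title>
    return features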
 

  11. 11. London Information Retrieval Meetup Feature Engineering: Feature Type. Features can be Ordinal, Quantitative or Categorical.
 Ordinal: an ordinal feature describes a property whose possible values are ordered. Ordinal variables can be considered “in between” categorical and quantitative variables.
 e.g. Educational level might be categorized as 1: Elementary school education, 2: High school graduate, 3: Some college, 4: College graduate, 5: Graduate degree, with 1 < 2 < 3 < 4 < 5.
 Quantitative: a quantitative feature describes a property whose possible values are a measurable quantity.
 e.g. Document Type = E-commerce Product: <Product price> is a quantity.
 e.g. Document Type = Hotel Stay: <Hotel distance from city center> is a quantity.
 Categorical: a categorical feature represents an attribute of an object that has a set of distinct possible values; in computer science the possible values of a categorical feature are commonly called Enumerations.
 e.g. Document Type = E-commerce Product: <Product colour> and <Product brand> are categorical features.
 N.B. It is easy to observe that giving an order to the values of a categorical feature does not make any sense: for the Colour feature, red < blue < black has no general meaning.
  12. 12. London Information Retrieval Meetup Feature Engineering: One Hot Encoding Categorical Features.
 e.g. Document Type = E-commerce Product: <Product colour> is a categorical feature with values Red, Green, Blue, Other.
 Encoded features: given a cardinality of N, we build N-1 encoded binary features:
 product_colour_red = 0/1
 product_colour_green = 0/1
 product_colour_blue = 0/1
 (product_colour_other = 0/1)
 Dummy Variable Trap: if all N encoded features are kept, the value of one can be predicted from the other features, hence N-1.
 High Cardinality Categoricals: you may need to encode only the most frequent values (a short sketch follows below).
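A minimal one-hot encoding sketch in plain Python, mirroring the colour example above; the helper name and value list are hypothetical:

def one_hot_encode_colour(colour, known_values=("red", "green", "blue")):
    # One binary feature per known value; "Other" is implied by all zeros,
    # which also keeps N-1 features for N values and avoids the dummy variable trap.
    colour = colour.lower()
    return {"product_colour_" + value: (1 if colour == value else 0) for value in known_values}

# one_hot_encode_colour("Green") -> {"product_colour_red": 0, "product_colour_green": 1, "product_colour_blue": 0}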
  13. 13. London Information Retrieval Meetup Feature Engineering: Binary Encoding Categorical Features.
 e.g. Document Type = E-commerce Product: <Product colour> is a categorical feature with values Red, Green, Blue, Other.
 Encoded features:
 1) Ordinal Encoding: Red=0, Green=1, Blue=2, Other=3
 2) Binary Encoding: product_colour_bit1 = 0/1, product_colour_bit2 = 0/1
 Better for high cardinality categoricals.
 Multi valued? You may have collisions and not be able to use binary features (a short sketch follows below).
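A minimal binary encoding sketch under the same colour example, assuming a fixed value-to-ordinal mapping:

COLOUR_ORDINALS = {"red": 0, "green": 1, "blue": 2, "other": 3}

def binary_encode_colour(colour):
    # 1) ordinal encoding, 2) split the ordinal into ceil(log2(4)) = 2 binary features.
    ordinal = COLOUR_ORDINALS.get(colour.lower(), COLOUR_ORDINALS["other"])
    return {
        "product_colour_bit1": ordinal & 1,          # least significant bit
        "product_colour_bit2": (ordinal >> 1) & 1,   # next bit
    }

# binary_encode_colour("Blue") -> {"product_colour_bit1": 0, "product_colour_bit2": 1}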
  14. 14. London Information Retrieval Meetup Feature Engineering: Missing Values
 ● Sometimes a missing value is equivalent to a 0-value semantic.
 e.g. Domain: e-commerce products. Feature: Discount Percentage [quantitative, document level feature]. A missing discount percentage could model a 0 discount percentage, so missing values can be filled with 0 values.
 ● Sometimes a missing feature value can have a completely different semantic.
 e.g. Domain: Hotel Stay. Feature: Star Rating [quantitative, document level feature]. A missing star rating is not equivalent to a 0-star rating, so an additional feature should be added to distinguish the two cases (a short sketch follows below).
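A minimal sketch of the two strategies above; the field names are hypothetical:

def discount_feature(product):
    # A missing discount percentage means "no discount": fill with 0.
    return product.get("discount_percentage", 0.0)

def star_rating_features(hotel):
    # A missing star rating is NOT a 0-star rating: keep a fill value
    # plus an explicit indicator feature marking that the value was missing.
    rating = hotel.get("star_rating")
    return {
        "star_rating": rating if rating is not None else 0.0,
        "star_rating_missing": 1 if rating is None else 0,
    }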

  15. 15. London Information Retrieval Meetup Relevance Label: Signal Intensity, Discordant Training Samples ● Each sample is a user interaction (click, add to cart, sale, etc.) ● Some samples are impressions (we showed the document to the user) ● A rank is attributed to each user interaction type,
 e.g. 0-Impression < 1-Click < 2-Add to cart < 3-Sale ● The rank becomes the relevance label for the sample (a short sketch follows below)
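A minimal sketch of the signal-intensity ranking above, used directly as the relevance label; the interaction type names are hypothetical:

# Interaction ranks from the slide: impression < click < add to cart < sale.
INTERACTION_RANK = {"impression": 0, "click": 1, "add_to_cart": 2, "sale": 3}

def signal_intensity_label(interaction_type):
    # The rank of the interaction type becomes the sample's relevance label.
    return INTERACTION_RANK[interaction_type]

# signal_intensity_label("add_to_cart") -> 2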
  16. 16. London Information Retrieval Meetup Relevance Label: Simple Click Model ● Each sample is a user interaction (click, add to cart, sale, etc.) ● Some samples are impressions (we showed the document to the user) ● One interaction type is set as the target of the optimisation ● Identical samples are aggregated; the new sample generated will have a new feature:
 [Interaction Type Count / Impressions]
 e.g. CTR (Click Through Rate) = for the sample, number of clicks / number of impressions ● We then take the resulting score (for CTR, 0 < x < 1) and normalise it to get the relevance label.
 The relevance label scale will depend on the training algorithm chosen (a short sketch follows below).
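A minimal sketch of this aggregation, assuming each raw log row carries a query id, a feature vector and an interaction type; the 0..4 label scale is only an example:

from collections import defaultdict

def ctr_relevance_labels(interactions, max_label=4):
    # Aggregate identical samples (same query id and feature vector) and turn
    # their Click Through Rate into a graded relevance label on a 0..max_label scale.
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for row in interactions:   # row: {"query_id": ..., "features": {...}, "type": "impression"/"click"}
        key = (row["query_id"], tuple(sorted(row["features"].items())))
        if row["type"] == "impression":
            impressions[key] += 1
        elif row["type"] == "click":
            clicks[key] += 1
    labels = {}
    for key, shown in impressions.items():
        ctr = clicks[key] / shown                # 0 <= CTR <= 1
        labels[key] = round(ctr * max_label)     # normalise to the chosen label scale
    return labels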
  17. 17. London Information Retrieval Meetup Relevance Label: Advanced Click Model. Given a sample:
 ● We have the CTR from the previous model ● We compare it with the average CTR of all samples ● We take into account the statistical significance of that comparison (how many initial samples generated our estimation?)
 ● The relevance label will be the product of those two factors (scaled according to the training algorithm's scale). More info in John Berryman's blog:
 http://blog.jnbrymn.com/2018/04/16/better-click-tracking-1/
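The exact formulation is in the referenced blog post; as a rough illustration only, here is one possible way to combine "CTR versus the average CTR" with "how much evidence backs the estimate", using a simple smoothing prior (all constants and the mapping to the label scale are hypothetical, not the blog's formula):

def advanced_relevance_label(clicks, impressions, avg_ctr, max_label=4, prior_strength=10):
    # Smooth the raw CTR towards the global average: few impressions -> low confidence
    # in the raw CTR, so the estimate stays close to avg_ctr (statistical significance).
    smoothed_ctr = (clicks + prior_strength * avg_ctr) / (impressions + prior_strength)
    if avg_ctr <= 0:
        return 0
    # Compare against the average CTR; ratio 1.0 means "clicked as often as average".
    ratio = smoothed_ctr / avg_ctr
    # Map the ratio onto the label scale expected by the training algorithm.
    return max(0, min(max_label, round(ratio * (max_label / 2))))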
  18. 18. London Information Retrieval Meetup Learning To Rank: Metric Evaluation. Point-wise, Pair-wise, List-wise: how many documents do you consider at a time when calculating your loss function?
 Point-wise ● Single document ● You estimate a function that predicts the best score for the document ● Results are ranked on the predicted score ● The score of a document is independent of the other scores in the same result list ● You can use any regression or classification algorithm
 Pair-wise ● Pair of documents ● You estimate the optimal local ordering to maximise the quality of the global ordering ● The objective is to set local orderings so as to minimise the number of inversions across all pairs ● Works better than point-wise because predicting a local ordering is closer to solving the ranking problem than just estimating a regression score
 List-wise ● Entire list of documents for a given query ● Direct optimisation of IR measures such as NDCG ● Minimises a specific loss function ● The evaluation measure is averaged across the queries ● Works better than pair-wise (a pairwise inversion count sketch follows below)
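A minimal sketch of the core pair-wise quantity: counting ordering inversions between predicted scores and relevance labels within one result list:

def count_inversions(labels, scores):
    # For every pair of documents in the same result list, count the cases
    # where the more relevant document received the lower predicted score.
    inversions = 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                better, worse = (i, j) if labels[i] > labels[j] else (j, i)
                if scores[better] <= scores[worse]:
                    inversions += 1
    return inversions

# count_inversions([3, 2, 0], [0.2, 0.9, 0.1]) -> 1 (the label-3 doc is scored below the label-2 doc)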
  19. 19. London Information Retrieval Meetup Offline Evaluation Metrics [1/3] • precision = TruePositives / (TruePositives + FalsePositives) • precision@K = TruePositives / (TruePositives + FalsePositives) computed over the top K results (precision@1, precision@2, precision@10) • recall = TruePositives / (TruePositives + FalseNegatives) Learning To Rank: Metric Evaluation
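A minimal precision@K / recall sketch over a ranked list of binary relevance judgements (1 = relevant):

def precision_at_k(relevance, k):
    # relevance: 0/1 judgements of the retrieved results, in ranked order.
    return sum(relevance[:k]) / k

def recall(relevance, total_relevant):
    # total_relevant: number of relevant documents that exist for the query.
    return sum(relevance) / total_relevant if total_relevant else 0.0

# precision_at_k([1, 0, 1, 1, 0], 3) -> 0.666...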
  20. 20. London Information Retrieval Meetup Offline Evaluation Metrics [2/3] Let's combine Precision and Recall: Learning To Rank: Metric Evaluation
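The combined metric appears only as an image on the original slide; it is presumably the standard F-measure, reproduced here for completeness:
 F1 = 2 * Precision * Recall / (Precision + Recall)
 F_beta = (1 + beta^2) * Precision * Recall / (beta^2 * Precision + Recall)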
  21. 21. London Information Retrieval Meetup Offline Evaluation Metrics [3/3] • DCG@K = Discounted Cumulative Gain@K • NDCG@K = DCG@K / Ideal DCG@K
 Example relevance grades by rank position (one column per model, plus the ideal ordering), with the resulting NDCG in the last row:
 Rank   Model1  Model2  Model3  Ideal
 1      1       2       2       4
 2      2       3       4       3
 3      3       2       3       2
 4      4       4       2       2
 5      2       1       1       1
 6      0       0       0       0
 7      0       0       0       0
 NDCG   0.64    0.73    0.79    1.0
 Learning To Rank: Metric Evaluation
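A minimal NDCG@K sketch, using the common 2^rel - 1 gain with a log2(position + 1) discount (the exact numbers in the table depend on which DCG variant the deck used):

import math

def dcg_at_k(relevance, k):
    # relevance: graded labels in ranked order; position is 1-based in the discount.
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevance[:k]))

def ndcg_at_k(relevance, k):
    ideal_dcg = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# ndcg_at_k([1, 2, 3, 4, 2, 0, 0], 7) compares Model1's ordering against its ideal ordering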
  22. 22. London Information Retrieval Meetup Learning To Rank: List-wise and NDCG ● The list is the result set of documents for a given query Id ● In LambdaMART, NDCG@K per list is often used ● The evaluation metric is averaged over the query Ids when evaluating a training iteration. It is extremely important to assess the distribution of training samples per query Id.
 Query1: Model1  Model2  Model3  Ideal
         1       1       1       1
         1       1       1       1
 Query2: Model1  Model2  Model3  Ideal
         3       3       3       3
         7       7       7       7
 An undersampled query Id can potentially skyrocket your NDCG average.
  23. 23. London Information Retrieval Meetup Build the Lists: QueryId Hashing ● Assess how to calculate your query Id:
 each query Id should bring a separate set of results ● No free text query? Group the query level features (hash them into a query Id) ● Target: a uniform distribution of samples per query ● Drop training samples if their query Ids are undersampled (a short sketch follows below)
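A minimal query Id hashing sketch, assuming the query is identified by its text plus its query level features (the field names are hypothetical):

import hashlib

def query_id(query_text, query_level_features):
    # Every distinct combination of query text and query level features
    # (e.g. user device, selected filters) defines a separate result list.
    key = query_text.strip().lower() + "|" + "|".join(
        f"{name}={value}" for name, value in sorted(query_level_features.items()))
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:12]

# query_id("queen music", {"device": "mobile", "category": "albums"})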
  24. 24. London Information Retrieval Meetup Split the Set: Training/Validation(dev)/Test ● Each training iteration will assess the evaluation metric on the training and validation sets ● At the end of the iterations the final model is evaluated on an unseen Test Set ● This split can be random ● This split can also depend on the time the interactions were collected (a short sketch follows below)
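A minimal sketch of a query-grouped random split, so that all samples of a query Id land in the same partition; the 80/10/10 proportions are only an example:

import random

def split_by_query(samples, train=0.8, validation=0.1, seed=42):
    # samples: list of (label, query_id, features); keep whole queries together.
    query_ids = sorted({qid for _, qid, _ in samples})
    random.Random(seed).shuffle(query_ids)
    n = len(query_ids)
    train_ids = set(query_ids[: int(n * train)])
    validation_ids = set(query_ids[int(n * train): int(n * (train + validation))])
    buckets = {"train": [], "validation": [], "test": []}
    for sample in samples:
        qid = sample[1]
        if qid in train_ids:
            buckets["train"].append(sample)
        elif qid in validation_ids:
            buckets["validation"].append(sample)
        else:
            buckets["test"].append(sample)
    return buckets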
  25. 25. London Information Retrieval Meetup Split the Set: Training/Validation(dev)/Test K-fold Cross Validation 
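The deck only names k-fold cross validation here; a minimal query-grouped k-fold sketch, as an illustration of the idea rather than the deck's own code:

def k_fold_by_query(samples, k=5):
    # Yield (train, validation) pairs; folds are built over query Ids so that
    # a query's documents never appear in both partitions of the same fold.
    query_ids = sorted({qid for _, qid, _ in samples})
    folds = [query_ids[i::k] for i in range(k)]
    for held_out in folds:
        held_out_set = set(held_out)
        train = [s for s in samples if s[1] not in held_out_set]
        validation = [s for s in samples if s[1] in held_out_set]
        yield train, validation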

