This talk is about the journey of bringing Learning To Rank (LTR from now on) to the e-commerce domain in a real-world scenario, including all the pitfalls and disillusions involved.
LTR is a fantastic approach to solving complex ranking problems, but industry domains are far from the ideal world where these technologies were designed and experimented with: open source implementations do not work perfectly out of the box and require advanced tuning, and industry training data is dirty, noisy and incomplete.
This talk will guide you through the different phases and technologies involved in a LTR project with a pragmatic approach.
Feature Engineering, Domain Modelling, Training Set Building, Model Training, Search Integration and Online Evaluation: each of them presents different challenges in the real world and must be carefully approached.
From Academic Papers To Production : A Learning To Rank Story
Alessandro Benedetti, Software Engineer, Sease Ltd.
● Search Consultant
● R&D Software Engineer
● Master in Computer Science
● Apache Lucene/Solr Enthusiast
● Passionate about Semantic, NLP and Machine Learning technologies
● Beach Volleyball Player & Snowboarder
Who I am
● Open Source Enthusiasts
● Apache Lucene/Solr experts
● Community Contributors
● Active Researchers
● Hot Trends: Learning To Rank, Document Similarity, Measuring Search Quality, Relevancy Tuning
● Learning To Rank
● Technologies Involved
● Data Preparation
● Model Training
● Apache Solr Integration
Learning To Rank - What is it?
Learning from user implicit/explicit feedback
Rank documents (in a broad sense)
Learning To Rank - What is NOT
- A sentient system that learns by itself
- A system that continuously improves itself by ingesting additional data
- Easy to set up and tune
- Easy to give a human-understandable explanation of why the model operates in certain ways
Learning To Rank - Technologies Used
- Spring Boot
- Apache Solr >=6.4 
- User feedback harvesting
- Feature Engineering
- Dataset clean-up
- Training/Validation/Test split
User Feedback Harvesting
- Explicit user feedback (Experts/Crowdsourcing)
- Implicit user feedback (eCommerce Sales Funnel)
How to assign the relevance label?
- Signal intensity to model relevance (sale > add to cart)
- Identify a target signal, calculate rates and normalise
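The rate-based labelling above can be sketched as follows; the signal names, thresholds and grade scale are all hypothetical and must be tuned per domain:

```python
# Sketch: derive graded relevance labels from implicit feedback.
# Assumes raw interaction counts per (query, document) pair;
# the thresholds below are illustrative, not recommendations.

def relevance_label(impressions, carts, sales):
    """Map signal intensity to an ordinal label: sale > add-to-cart > click."""
    if impressions == 0:
        return 0
    sale_rate = sales / impressions
    cart_rate = carts / impressions
    if sale_rate > 0.05:
        return 3          # strong signal: the item sells for this query
    if cart_rate > 0.10:
        return 2          # medium signal: often added to cart
    if carts or sales:
        return 1          # weak positive signal
    return 0              # shown but ignored -> a needed "bad example"

label = relevance_label(impressions=200, carts=30, sales=15)  # -> 3
```

Normalising by impressions keeps popular items from dominating purely because they were shown more often.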
Discordant training samples
- Query level / Document level / Query dependent
- Ordinal/Categorical features -> one-hot encoding
- Missing values
- High cardinality categorical features
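The three encoding issues above can be sketched in plain Python; the category list and feature names are hypothetical, and the hash-bucket approach for high-cardinality categoricals is one common option among several (grouping rare values is another):

```python
import hashlib

CATEGORIES = ["shoes", "shirts", "accessories"]  # known, low-cardinality

def one_hot(value, categories=CATEGORIES):
    """Ordinal/categorical feature -> one binary feature per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def hashed_bucket(value, buckets=16):
    """High-cardinality categorical (e.g. brand) -> stable hash bucket,
    so the feature space stays small and new values still get a slot."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

def impute(value, default=0.0):
    """Missing value -> explicit default the model can still split on."""
    return default if value is None else value
```

Using a cryptographic hash here is only for a stable, platform-independent bucket; Python's built-in `hash()` is randomised per process.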
Data Set Cleanup
- Resample the dataset
- Query Id Hashing
- You need bad examples! (NDCG -> not reflecting real quality)
Oversampling by duplication -> over-fitting
This strongly affects the evaluation metric (NDCG)
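Query id hashing, mentioned above, can be as simple as normalising the query text and hashing it, so that all samples of textually equivalent queries end up in the same rankList; the normalisation rules here are illustrative:

```python
import hashlib

def query_id(query_text):
    """Collapse textually equivalent queries into one compact query id."""
    normalised = " ".join(query_text.lower().split())  # lowercase, squash whitespace
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()[:12]

# "Red Shoes" and "  red   shoes " share one id and hence one rankList
same = query_id("Red Shoes") == query_id("  red   shoes ")
```

In practice the normalisation (stemming, synonym folding) should match what the search engine itself does at query time.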
- K-fold Cross Validation
- Temporal Split
- Manual split after shuffling
Per rankList (the subset of samples sharing one query id)
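The split strategies above can be sketched in plain Python; the sample fields (`qid`, `timestamp`) are hypothetical, and the key property is that a rankList is never split across train and test:

```python
def temporal_split(samples, cutoff):
    """Train on the past, evaluate on the future."""
    train = [s for s in samples if s["timestamp"] < cutoff]
    test = [s for s in samples if s["timestamp"] >= cutoff]
    return train, test

def split_by_query(samples, test_ratio=0.2):
    """Keep every rankList (all samples of one query id) in a single set.
    Deterministic here for clarity; shuffle qids first in practice."""
    qids = sorted({s["qid"] for s in samples})
    test_qids = set(qids[: int(len(qids) * test_ratio)])
    train = [s for s in samples if s["qid"] not in test_qids]
    test = [s for s in samples if s["qid"] in test_qids]
    return train, test
```

Splitting per query rather than per sample avoids leaking documents of the same query into both sets, which would inflate offline metrics.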
- LambdaMART + NDCG@K
- Threshold Candidates Count For Splitting -> simplify!
- Minimum Leaf Support -> remove outliers
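Since LambdaMART optimises NDCG@K directly, a compact reference implementation of the metric helps sanity-check the numbers the training tool reports (this is the standard `2^rel - 1` gain formulation, not tied to any specific library):

```python
import math

def dcg_at_k(labels, k):
    """Discounted cumulative gain of a ranked list of relevance labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(labels[:k]))

def ndcg_at_k(labels, k):
    """NDCG@K: 1.0 means the ranking matches the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

# Without bad examples every labelled document is "relevant", so any
# ordering of them scores a perfect NDCG - the metric stops discriminating:
inflated = ndcg_at_k([2, 2, 2], 3)  # -> 1.0 regardless of order
```

This also illustrates the earlier caveat: rankLists without negative examples make NDCG overstate real quality.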
Reason: the searched location was missing from the training set
Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene™ project. Its major features include powerful full-text search, hit highlighting, faceted search and analytics, rich document parsing, geospatial search and extensive REST APIs.
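With the Solr LTR contrib module, integration boils down to a rerank query: the collection name, model name and `efi.*` parameter below are examples, while the `{!ltr ...}` syntax itself comes from the module:

```python
from urllib.parse import urlencode

# Sketch: rerank the top-100 results of a normal query with a trained model.
params = {
    "q": "red shoes",
    # {!ltr ...} is Solr's LTR rerank-query parser; efi.* passes
    # external feature information (here the raw user query) to features.
    "rq": "{!ltr model=myLambdaMartModel reRankDocs=100 "
          "efi.user_query='red shoes'}",
    "fl": "id,score,[features]",  # [features] returns extracted feature values
}
url = "http://localhost:8983/solr/products/select?" + urlencode(params)
```

Requesting `[features]` in `fl` is also how feature vectors are logged at query time to build the training set in the first place.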
Classic Business Level Questions
- Given X, Y, Z input features, I would have expected a
different ranking -> can you fix this?
Solution: no single-line manual fix -> trial and error!
- How does the model work? What are the most important features?
Solution: index the model to extract information such as
- most frequent features in splits
- unique thresholds
Classic Business Level Questions
- What are good items generally?
Solution: developed a simple tool to extract the top scoring
leaves from the model
- Why, for query X, is doc Y scored higher than doc Z?
Solution: debug the Solr score and investigate the tree paths
- LTR is a promising and deep technology
- It requires effort! (it’s not as automatic as you think)
- Start collecting user feedback! (if you plan to use LTR)
- Good open source support available (Apache Solr + Elasticsearch)
- Not easy to debug/explain