SlideShare a Scribd company logo
Learning To Rank
with Apache Solr and Fusion
Trey Grainger
Chief Algorithms Officer
Andy Liu
Senior Data Scientist
The Problem: Classic Similarity
Keyword search isn’t always enough for relevance
The Problem
User searches for “outdoor rock speaker”
Should see this:Sees this:
The Problem
• Improving search relevance is hard,
• TF-IDF and BM25 are good for text-keyword but what about other
models of relevance?
• Text matching is sometimes not the best solution
• Users don’t always say what they mean
The Solution:
Fusion 4 Signals + Solr 7 LTR
The Solution : Learning to Rank Overview
• Learning to rank lets you pick “features” of a document that
“matter” and teach the machine how to rank a set of items.
• One possible source of ordering is user behavior (i.e. the only clicks were
on the speaker shaped like a rock)
• Solr provides a Learning to Rank implementation.
• Fusion provides a way of capturing user behavior through signals.
The Solution: Learning to Rank Overview
The Solution: Learning to Rank Overview
The Solution: Fusion Signals Overview
The Solution
• Define features (relevancy factors)
• Derive Ground Truth using Fusion’s signals
• Use Solr’s Learning to Rank implementation
Some notes
• Fusion’s normal click boosting is an alternative and pretty good
• It is possible to use them together or one where the other
doesn’t work
• Do other more simple things first, learning to rank without an
adequate schema won’t accomplish much.
Some notes
• Using click signals for ground truth
• Pros:
• Voluminous
• Cheap
• Reflects a captive user’s intent (especially when supplemented with purchase, add to
cart events)
• Tacitly, implicitly labeled data the key to an OOTB “self-learning” system
• Cons
• Noisy
• Potential for reinforcing existing ranking
Putting it together
Building an LTR Pipeline
…but is it better?
• Models compared:
• Solr Out-of-the-box BM25
ranking using textual
features only
• Logistic Regression using all
features except the signals
feature
• Logistic Regression using all
features
Why is it better?
• Summary of Benefits:
• LTR offers automated relevancy tuning
• Using Fusion to implement LTR greatly reduces the time and complexity
required to train and deploy LTR models in production
• Leveraging Fusion’s signals as features in an LTR model offers an easy way of
boosting search relevance performance beyond what is possible using textual
features alone
A/B and experiments
• Do this carefully.
• A/B testing is the safest way to make sure you don’t ruin different user
experiences.
• Stay tuned for a future webinar on Experiments and A/B testing
Where to learn more?
• Grab the technical paper (with step by step instructions):
https://lucidworks.com/ebook/learning-to-rank/
• Grab the code: https://github.com/lucidworks/fusion-ltr-
webinar#fusionsolr-setup
Thank you
Register by Sep 6
to save $200
SEPTEMBER 9-12,
2019 WASHINGTON DC
Check out the site here: https:/ / activate-conf.com/
JOIN ANDY AND TREY AT ACTIVATE
• Productionizing Python ML Models Using Fusion 5, Sanket Shahane,
Andy Liu
• Natural Language Search with Knowledge Graphs, Trey Grainger
• Closing Keynote: The Next Generation of AI-powered Search, Trey
Grainger
AI, ML & DATA SCIENCE TRACK
• Supporting Query Tagging/Suggestion in Fusion 4.2, Uber
• Building a Health QA Chatbot with Solr, Healthwise Incorporated
• Tackling a “Small Data” Search Challenge at Airbnb Experiences,
Airbnb
• Using Deep Learning and Customized Solr Components to Improve
Search Relevancy at Target, Target
THE SEARCH AND AI
CONFERENCE
SEPTEMBER 9-
12,2019 WASHINGTON DC
Check out the site here: https://activate-conf.com/
Register by Sep 6
to save $200
LIVE Q&A:
Enter your questions in the chat box now
for Trey to answer live

More Related Content

Similar to Learning to Rank with Apache Solr and Fusion

50 Shades of Fail KScope16
50 Shades of Fail KScope1650 Shades of Fail KScope16
50 Shades of Fail KScope16
Christian Berg
 
Introduction to Test Driven Development
Introduction to Test Driven DevelopmentIntroduction to Test Driven Development
Introduction to Test Driven Development
Siva Arunachalam
 
odsc_2023.pdf
odsc_2023.pdfodsc_2023.pdf
odsc_2023.pdf
Sanghamitra Deb
 
Sharpest tool in the box: Choosing the right authoring tool for your learning...
Sharpest tool in the box: Choosing the right authoring tool for your learning...Sharpest tool in the box: Choosing the right authoring tool for your learning...
Sharpest tool in the box: Choosing the right authoring tool for your learning...
Brightwave Group
 
Retrieval Performance Bound Analysis for Single Term Queries
Retrieval Performance Bound Analysis for Single Term QueriesRetrieval Performance Bound Analysis for Single Term Queries
Retrieval Performance Bound Analysis for Single Term Queries
Twitter Inc.
 
TDD - Seriously, try it! (updated '22)
TDD - Seriously, try it! (updated '22)TDD - Seriously, try it! (updated '22)
TDD - Seriously, try it! (updated '22)
Nacho Cougil
 
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Into...
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Into...Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Into...
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Into...
Ortus Solutions, Corp
 
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Adob...
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Adob...Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Adob...
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Adob...
Uma Ghotikar
 
Eurosport's Kodakademi #2
Eurosport's Kodakademi #2Eurosport's Kodakademi #2
Eurosport's Kodakademi #2
Benjamin Baumann
 
Avoiding test hell
Avoiding test hellAvoiding test hell
Avoiding test hell
Yun Ki Lee
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
Databricks
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
Simon Hughes
 
Sample_CPT_Presentation-by_Dongwei_Mei.pdf
Sample_CPT_Presentation-by_Dongwei_Mei.pdfSample_CPT_Presentation-by_Dongwei_Mei.pdf
Sample_CPT_Presentation-by_Dongwei_Mei.pdf
SURYAPRAKASH281978
 
Mulesoft torronto meetup_16
Mulesoft torronto meetup_16Mulesoft torronto meetup_16
Mulesoft torronto meetup_16
Anurag Dwivedi
 
Critical Capabilities to Shifting Left the Right Way
Critical Capabilities to Shifting Left the Right WayCritical Capabilities to Shifting Left the Right Way
Critical Capabilities to Shifting Left the Right Way
SmartBear
 
Client Technical Analysis of Legacy Software and Future Replacement
Client Technical Analysis of Legacy Software and Future ReplacementClient Technical Analysis of Legacy Software and Future Replacement
Client Technical Analysis of Legacy Software and Future Replacement
VictorSzoltysek
 
Performance and Abstractions
Performance and AbstractionsPerformance and Abstractions
Performance and Abstractions
Metosin Oy
 
TDD - Seriously, try it! - Trójmiasto Java User Group (17th May '23)
TDD - Seriously, try it! - Trójmiasto Java User Group (17th May '23)TDD - Seriously, try it! - Trójmiasto Java User Group (17th May '23)
TDD - Seriously, try it! - Trójmiasto Java User Group (17th May '23)
ssusercaf6c1
 
TDD - Seriously, try it! - Trjjmiasto JUG (17th May '23)
TDD - Seriously, try it! - Trjjmiasto JUG (17th May '23)TDD - Seriously, try it! - Trjjmiasto JUG (17th May '23)
TDD - Seriously, try it! - Trjjmiasto JUG (17th May '23)
Nacho Cougil
 
Building trust within the organization, first steps towards DevOps
Building trust within the organization, first steps towards DevOpsBuilding trust within the organization, first steps towards DevOps
Building trust within the organization, first steps towards DevOps
Guido Serra
 

Similar to Learning to Rank with Apache Solr and Fusion (20)

50 Shades of Fail KScope16
50 Shades of Fail KScope1650 Shades of Fail KScope16
50 Shades of Fail KScope16
 
Introduction to Test Driven Development
Introduction to Test Driven DevelopmentIntroduction to Test Driven Development
Introduction to Test Driven Development
 
odsc_2023.pdf
odsc_2023.pdfodsc_2023.pdf
odsc_2023.pdf
 
Sharpest tool in the box: Choosing the right authoring tool for your learning...
Sharpest tool in the box: Choosing the right authoring tool for your learning...Sharpest tool in the box: Choosing the right authoring tool for your learning...
Sharpest tool in the box: Choosing the right authoring tool for your learning...
 
Retrieval Performance Bound Analysis for Single Term Queries
Retrieval Performance Bound Analysis for Single Term QueriesRetrieval Performance Bound Analysis for Single Term Queries
Retrieval Performance Bound Analysis for Single Term Queries
 
TDD - Seriously, try it! (updated '22)
TDD - Seriously, try it! (updated '22)TDD - Seriously, try it! (updated '22)
TDD - Seriously, try it! (updated '22)
 
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Into...
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Into...Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Into...
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Into...
 
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Adob...
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Adob...Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Adob...
Introduction to Unit Testing, BDD and Mocking using TestBox & MockBox at Adob...
 
Eurosport's Kodakademi #2
Eurosport's Kodakademi #2Eurosport's Kodakademi #2
Eurosport's Kodakademi #2
 
Avoiding test hell
Avoiding test hellAvoiding test hell
Avoiding test hell
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
 
Sample_CPT_Presentation-by_Dongwei_Mei.pdf
Sample_CPT_Presentation-by_Dongwei_Mei.pdfSample_CPT_Presentation-by_Dongwei_Mei.pdf
Sample_CPT_Presentation-by_Dongwei_Mei.pdf
 
Mulesoft torronto meetup_16
Mulesoft torronto meetup_16Mulesoft torronto meetup_16
Mulesoft torronto meetup_16
 
Critical Capabilities to Shifting Left the Right Way
Critical Capabilities to Shifting Left the Right WayCritical Capabilities to Shifting Left the Right Way
Critical Capabilities to Shifting Left the Right Way
 
Client Technical Analysis of Legacy Software and Future Replacement
Client Technical Analysis of Legacy Software and Future ReplacementClient Technical Analysis of Legacy Software and Future Replacement
Client Technical Analysis of Legacy Software and Future Replacement
 
Performance and Abstractions
Performance and AbstractionsPerformance and Abstractions
Performance and Abstractions
 
TDD - Seriously, try it! - Trójmiasto Java User Group (17th May '23)
TDD - Seriously, try it! - Trójmiasto Java User Group (17th May '23)TDD - Seriously, try it! - Trójmiasto Java User Group (17th May '23)
TDD - Seriously, try it! - Trójmiasto Java User Group (17th May '23)
 
TDD - Seriously, try it! - Trjjmiasto JUG (17th May '23)
TDD - Seriously, try it! - Trjjmiasto JUG (17th May '23)TDD - Seriously, try it! - Trjjmiasto JUG (17th May '23)
TDD - Seriously, try it! - Trjjmiasto JUG (17th May '23)
 
Building trust within the organization, first steps towards DevOps
Building trust within the organization, first steps towards DevOpsBuilding trust within the organization, first steps towards DevOps
Building trust within the organization, first steps towards DevOps
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
Lucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
Lucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
Lucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
Lucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
Lucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
Lucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Lucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
Lucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Lucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Lucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
Lucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
Lucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
AlexanderRichford
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
ScyllaDB
 
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Ukraine
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
DianaGray10
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!
Tobias Schneck
 

Recently uploaded (20)

[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
 
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!
 

Learning to Rank with Apache Solr and Fusion

  • 1. Learning To Rank with Apache Solr and Fusion Trey Grainger Chief Algorithms Officer Andy Liu Senior Data Scientist
  • 2. The Problem: Classic Similarity Keyword search isn’t always enough for relevance
  • 3. The Problem User searches for “outdoor rock speaker” Should see this:Sees this:
  • 4. The Problem • Improving search relevance is hard, • TF-IDF and BM25 are good for text-keyword but what about other models of relevance? • Text matching is sometimes not the best solution • Users don’t always say what they mean
  • 5. The Solution: Fusion 4 Signals + Solr 7 LTR
  • 6. The Solution : Learning to Rank Overview • Learning to rank lets you pick “features” of a document that “matter” and teach the machine how to rank a set of items. • One possible source of ordering is user behavior (i.e. the only clicks were on the speaker shaped like a rock) • Solr provides a Learning to Rank implementation. • Fusion provides a way of capturing user behavior through signals.
  • 7. The Solution: Learning to Rank Overview
  • 8. The Solution: Learning to Rank Overview
  • 9. The Solution: Fusion Signals Overview
  • 10. The Solution • Define features (relevancy factors) • Derive Ground Truth using Fusion’s signals • Use Solr’s Learning to Rank implementation
  • 11. Some notes • Fusion’s normal click boosting is an alternative and pretty good • It is possible to use them together or one where the other doesn’t work • Do other more simple things first, learning to rank without an adequate schema won’t accomplish much.
  • 12. Some notes • Using click signals for ground truth • Pros: • Voluminous • Cheap • Reflects a captive user’s intent (especially when supplemented with purchase, add to cart events) • Tacitly, implicitly labeled data the key to an OOTB “self-learning” system • Cons • Noisy • Potential for reinforcing existing ranking
  • 14. Building an LTR Pipeline
  • 15. …but is it better? • Models compared: • Solr Out-of-the-box BM25 ranking using textual features only • Logistic Regression using all features except the signals feature • Logistic Regression using all features
  • 16. Why is it better? • Summary of Benefits: • LTR offers automated relevancy tuning • Using Fusion to implement LTR greatly reduces the time and complexity required to train and deploy LTR models in production • Leveraging Fusion’s signals as features in an LTR model offers an easy way of boosting search relevance performance beyond what is possible using textual features alone
  • 17. A/B and experiments • Do this carefully. • A/B testing is the safest way to make sure you don’t ruin different user experiences. • Stay tuned for a future webinar on Experiments and A/B testing
  • 18. Where to learn more? • Grab the technical paper (with step by step instructions): https://lucidworks.com/ebook/learning-to-rank/ • Grab the code: https://github.com/lucidworks/fusion-ltr- webinar#fusionsolr-setup
  • 20. Register by Sep 6 to save $200 SEPTEMBER 9-12, 2019 WASHINGTON DC Check out the site here: https:/ / activate-conf.com/ JOIN ANDY AND TREY AT ACTIVATE • Productionizing Python ML Models Using Fusion 5, Sanket Shahane, Andy Liu • Natural Language Search with Knowledge Graphs, Trey Grainger • Closing Keynote: The Next Generation of AI-powered Search, Trey Grainger AI, ML & DATA SCIENCE TRACK • Supporting Query Tagging/Suggestion in Fusion 4.2, Uber • Building a Health QA Chatbot with Solr, Healthwise Incorporated • Tackling a “Small Data” Search Challenge at Airbnb Experiences, Airbnb • Using Deep Learning and Customized Solr Components to Improve Search Relevancy at Target, Target THE SEARCH AND AI CONFERENCE SEPTEMBER 9- 12,2019 WASHINGTON DC Check out the site here: https://activate-conf.com/ Register by Sep 6 to save $200
  • 21. LIVE Q&A: Enter your questions in the chat box now for Trey to answer live