Haystack 2019 - Autocomplete as Relevancy - Rimple Shah

OpenSource Connections
Autocomplete as Relevancy
Haystack Search Relevance Conference
April 24, 2019
Rimple Shah
Revanth Malay
David Rhodes
2
LexisNexis
• Business – Information for Lawyers and other Professionals
• Mission: Advance the Rule of Law
• Flagship Products: Lexis Advance, Lexis Risk Solutions, Nexis
• Target Markets: Legal, Risk, Government, Academia, Professional Information Users
• Customers in 130 countries
• Subsidiary of RELX (NYSE: RELX) since 1994
• Primary Direct Competitors: Dow Jones, Thomson Reuters, Wolters Kluwer, Bloomberg
• > 10,000 employees worldwide
3
Agenda
• Autocomplete as Search Relevance
• Use case
• Customizing Solr for Autocomplete
• Architecture Design
• Evaluating Relevance
• Future Roadmap
4
Autocomplete is Search Relevance
5
Autocomplete in Lexis Advance
6
Who’s Your Data?
• The suggestion is the document
<doc>
  <field name="query">obamacare</field>
  <field name="display">patient protection and affordable care act</field>
  <field name="token_count">1</field>
  <field name="id">urn:label:7AC7C01A1EF546C4BCDF334557</field>
  <field name="popularity">8464</field>
  <field name="source">KRM</field>
  <field name="region">United States</field>
</doc>
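As a rough illustration of "the suggestion is the document", the sketch below serializes one suggestion record into a Solr-style XML document like the one above. The function name `build_suggestion_doc` and the rule that `token_count` is the number of whitespace-separated tokens in `query` are assumptions made for this example, not details given in the talk.

```python
import xml.etree.ElementTree as ET


def build_suggestion_doc(record: dict) -> str:
    """Serialize one suggestion record as a Solr-style <doc> element."""
    doc = ET.Element("doc")
    fields = dict(record)
    # Assumption: token_count is the number of whitespace-separated query tokens.
    fields.setdefault("token_count", len(record["query"].split()))
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(doc, encoding="unicode")


example = build_suggestion_doc({
    "query": "obamacare",
    "display": "patient protection and affordable care act",
    "id": "urn:label:7AC7C01A1EF546C4BCDF334557",
    "popularity": 8464,
    "source": "KRM",
    "region": "United States",
})
print(example)
```

Treating each suggestion as an ordinary document means fields such as `popularity` or `region` can be used for boosting and filtering like any other indexed field.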
7
Where Do Suggestions Come From?
• Users’ queries
• LexisNexis legal experts
8
Data Preparation
9
Solr Suggester
• Built-in Solr Search Component
• Features
  • Fast in-memory Finite State Transducer data structure
  • Easy to add to an existing Solr index
10
Should We Use Solr Suggester?
Functionality
• “index + query” approach for complex weight calculation, stop-word removal, or basic context filtering
Performance
• Lookups against the in-memory FST are extremely fast
• Performance of a well-tuned Solr index is sufficient for this use case
11
Basic Solr Configuration

Keyword Tokenizer + Lowercase Filter Factory + EdgeNGram Filter Factory (MinGramSize=3, MaxGramSize=30)
Motion to Dismiss → mot, moti, motio, motion, motion t, motion to, motion to di, motion to dis, …

Whitespace Tokenizer + Lowercase Filter Factory + EdgeNGram Filter Factory (MinGramSize=1, MaxGramSize=30)
Motion to Dismiss →
m        t    d
mo       to   di
mot           dis
moti          dism
motio         dismi
motion        dismis
              dismiss

Whitespace Tokenizer + Lowercase Filter Factory
Motion to Dismiss → motion, to, dismiss
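The three analysis chains above can be approximated in a few lines of Python. This is a simplified sketch of what the tokenizers and the edge n-gram filter emit, not Solr's implementation; the function names are made up for the example.

```python
def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def keyword_chain(text: str) -> list[str]:
    # Keyword tokenizer: the whole input is one token; lowercase; edge n-grams 3..30.
    return edge_ngrams(text.lower(), 3, 30)


def whitespace_ngram_chain(text: str) -> list[str]:
    # Whitespace tokenizer; lowercase; edge n-grams 1..30 for each token.
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token, 1, 30))
    return grams


def whitespace_chain(text: str) -> list[str]:
    # Whitespace tokenizer; lowercase; no n-grams.
    return text.lower().split()


print(keyword_chain("Motion to Dismiss"))
print(whitespace_ngram_chain("Motion to Dismiss"))
print(whitespace_chain("Motion to Dismiss"))
```

Unlike the abbreviated listing on the slide, the keyword chain here emits every prefix length, including prefixes that end in a space.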
12
Term Frequency
[Slide diagram, labels only: "Problem" and "Solution" panels for the user query "motion to dis"; suggestions shown: "motion to dismiss", "plaintiff’s motion"; annotation: T.F. = 1.0]
13
EDisMax's pf2 Parameter
• Boost suggestions that have user query tokens next to each other.
• Example:
User Query: plaintiff’s rebuttal expert witness
Suggestions:
Doc1 : rebuttal expert witness | Score: 292
Doc2 : rebuttal witness and expert testimony | Score: 253
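To see why Doc1 outranks Doc2, it helps to count how many adjacent word pairs from the user query also appear adjacent in each suggestion. The sketch below does only that counting; it is a simplified illustration of the idea behind `pf2`, not Solr's scoring, and it does not reproduce the scores shown on the slide. The helper names are assumptions.

```python
def word_bigrams(text: str) -> set[tuple[str, str]]:
    """Return the set of adjacent word pairs in `text`."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def adjacent_pair_matches(query: str, suggestion: str) -> int:
    """Count query word pairs that also occur next to each other in the suggestion."""
    return len(word_bigrams(query) & word_bigrams(suggestion))


query = "plaintiff’s rebuttal expert witness"
doc1 = "rebuttal expert witness"
doc2 = "rebuttal witness and expert testimony"

print(adjacent_pair_matches(query, doc1))
print(adjacent_pair_matches(query, doc2))
```

Both suggestions contain the words "rebuttal", "expert", and "witness", but only Doc1 keeps them in the same adjacent pairs as the query, which is the signal a bigram phrase boost rewards.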
14
Preference for First Word Match
Insert an anchor term as the first token at both index time and query time.
Example:
User query: motion dismiss → KXQHZ motion dismiss
Suggestions:
Documents | Index
motion to dismiss with prejudice | KXQHZ motion to dismiss with prejudice
dismiss motion with prejudice | KXQHZ dismiss motion with prejudice
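A minimal sketch of how the anchor turns "starts with the same word" into an adjacency test. The slide does not say which query feature rewards the anchor; this example assumes a phrase-proximity boost over adjacent word pairs (such as the `pf2` boost discussed earlier), and the function names are hypothetical.

```python
ANCHOR = "KXQHZ"  # anchor term shown on the slide


def with_anchor(text: str) -> str:
    """Prepend the anchor term, as done at both index and query time."""
    return f"{ANCHOR} {text}"


def anchored_bigrams(text: str) -> set[tuple[str, str]]:
    """Adjacent word pairs of the anchored, lowercased text."""
    tokens = with_anchor(text).lower().split()
    return set(zip(tokens, tokens[1:]))


def first_word_matches(query: str, suggestion: str) -> bool:
    """True when the (anchor, first query word) pair is also adjacent in the suggestion."""
    query_tokens = query.lower().split()
    if not query_tokens:
        return False
    anchor_pair = (ANCHOR.lower(), query_tokens[0])
    return anchor_pair in anchored_bigrams(suggestion)


print(first_word_matches("motion dismiss", "motion to dismiss with prejudice"))
print(first_word_matches("motion dismiss", "dismiss motion with prejudice"))
```

Because the anchor always sits in position zero, only suggestions whose first word equals the first query word share the anchored pair, so they can be boosted without a separate "starts with" field.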
15
Incorrectly Matching On Partial Words
• The query treats a complete token as a partial word and returns suggestions containing tokens that merely start with it. Here the complete query tokens "is" and "a" match the prefixes of "islands" and "act".
User Query: government is a
Documents: virgin islands government act
Index:
v        i        g            a
vi       is       go           ac
vir      isl      gov          act
virg     isla     gove
virgi    islan    gover
virgin   island   govern
…….
                  government
16
Correctly Matching On Partial Words
• Condition 1: When the user query has no trailing space
  • Insert ‘xwkq’ at the beginning of the last token
User Query: government is a → government is xwkqa
Documents: virgin islands government act
Index:
xwkqv        xwkqi     xwkqg            xwkqa
xwkqvi       xwkqis    xwkqgo           xwkqac
xwkqvir                xwkqgov          xwkqact
xwkqvirg               xwkqgove
xwkqvirgi              xwkqgover
xwkqvirgin             xwkqgovern
…….
                       xwkqgovernment
17
Correctly Matching On Partial Words
• Condition 2: When the user query has a trailing space
  • The rest of the Solr analyzers do the job here
User Query: government is a_ (the underscore marks the trailing space)
Documents: virgin islands government act
Index:
xwkqv        xwkqi     xwkqg            xwkqa
xwkqvi       xwkqis    xwkqgo           xwkqac
xwkqvir                xwkqgov          xwkqact
xwkqvirg               xwkqgove
xwkqvirgi              xwkqgover
xwkqvirgin             xwkqgovern
…….
                       xwkqgovernment
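The two conditions can be sketched together as follows. Several details are assumptions made for the example and are not stated on the slides: complete tokens are also indexed without the marker, every query token must match, and the helper names are invented.

```python
MARKER = "xwkq"  # marker string shown on the slides


def index_terms(document: str) -> set[str]:
    """Index-side terms: complete tokens plus marker-prefixed edge n-grams."""
    terms = set()
    for token in document.lower().split():
        terms.add(token)  # assumption: complete tokens are indexed unprefixed
        for n in range(1, len(token) + 1):
            terms.add(MARKER + token[:n])
    return terms


def query_terms(raw_query: str) -> list[str]:
    """Mark the last token as partial only when the query has no trailing space."""
    tokens = raw_query.lower().split()
    if tokens and not raw_query.endswith(" "):
        tokens[-1] = MARKER + tokens[-1]
    return tokens


def matches(raw_query: str, document: str) -> bool:
    # Assumption: all query terms must be present (AND semantics).
    terms = index_terms(document)
    return all(term in terms for term in query_terms(raw_query))


doc = "virgin islands government act"
print(query_terms("government is a"))   # last token is still being typed
print(query_terms("government is a "))  # trailing space: every token is complete
print(matches("government is a", doc))
print(matches("virgin isl", doc))
```

With the marker in place, a complete token such as "is" can no longer match the prefix of "islands"; only the token the user is still typing is compared against prefixes.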
18
Exact Token Match Before Stemmed & Synonym Match
• Exact match (^8): Standard Tokenizer Factory + Lowercase Filter Factory
• Stemmed match (^6): Standard Tokenizer Factory + Lowercase Filter Factory + Snowball Porter Filter Factory + English Possessive Filter Factory
• Synonym match (^6): Standard Tokenizer Factory + Lowercase Filter Factory + Synonym Graph Filter Factory
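One plausible way to apply these boosts is through the eDisMax `qf` parameter, with one field per analysis chain. The sketch below only builds the request parameters; the field names `suggest_exact`, `suggest_stemmed`, and `suggest_synonym` are hypothetical, and the slides do not confirm that `qf` is how the boosts were wired in.

```python
from urllib.parse import urlencode

# Hypothetical field names, one per analysis chain, with the boosts from the slide.
FIELD_BOOSTS = {
    "suggest_exact": 8,
    "suggest_stemmed": 6,
    "suggest_synonym": 6,
}


def build_params(user_query: str) -> dict:
    """Build eDisMax request parameters that weight exact matches highest."""
    qf = " ".join(f"{field}^{boost}" for field, boost in FIELD_BOOSTS.items())
    return {"q": user_query, "defType": "edismax", "qf": qf}


params = build_params("motion to dis")
query_string = urlencode(params)
print(params["qf"])
print(query_string)
```

Giving the exact-token field the largest boost lets a literal match outrank a suggestion that matches only after stemming or synonym expansion.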
19
Duplicate and Near Duplicate Suggestions
• Reduce the impression of repetitive suggestions by limiting suggestions whose words share the same root
• User Query: zoning var
• Suggestions:
20
Reduce Near Duplicate Suggestions
• zone variance
• zoning variance
• zone variances
• variance of zoning
→ variance_zone
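The slide suggests that all four variants collapse to a single key, `variance_zone`. A minimal sketch of one way to derive such a key: drop stop words, reduce each word to its root, sort, and join. The stop-word set and the tiny `ROOTS` lookup are toy stand-ins for a real stop filter and stemmer, and the function names are assumptions.

```python
STOPWORDS = {"of", "the", "a", "an", "for", "in"}

# Toy stand-in for a real stemmer; maps inflected forms to a shared root.
ROOTS = {"zoning": "zone", "zones": "zone", "variances": "variance"}


def dedup_key(suggestion: str) -> str:
    """Build an order-independent key from the roots of the non-stop words."""
    tokens = [t for t in suggestion.lower().split() if t not in STOPWORDS]
    roots = sorted({ROOTS.get(token, token) for token in tokens})
    return "_".join(roots)


def drop_near_duplicates(suggestions: list[str]) -> list[str]:
    """Keep only the first suggestion seen for each key."""
    seen = set()
    kept = []
    for suggestion in suggestions:
        key = dedup_key(suggestion)
        if key not in seen:
            seen.add(key)
            kept.append(suggestion)
    return kept


candidates = [
    "zone variance",
    "zoning variance",
    "zone variances",
    "variance of zoning",
    "zoning board",
]
print([dedup_key(c) for c in candidates])
print(drop_near_duplicates(candidates))
```

If the candidates arrive sorted by score, keeping the first suggestion per key retains the highest-ranked variant of each group.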
21
Spelling Correction
• About 10-12% of user queries to web search engines have spelling errors
22
Spelling Correction
23
Architecture Design
24
Architecture Design
25
Offline Evaluation
26
Offline Evaluation Feedback
27
Online Evaluation
• Measure user engagement with autocomplete suggestions
• Click-rate
• Mean Reciprocal Rank (MRR)
• Minimum Keystroke (MKS)
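The slide does not define these metrics, so the sketch below uses common formulations as assumptions: MRR averages the reciprocal of the 1-based rank of the selected suggestion (0 when nothing is selected), and MKS is the smallest value of "characters typed plus list position of the target", capped at typing the whole query. The toy corpus and suggester are illustrative only.

```python
from typing import Callable, Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """ranks: 1-based rank of the selected suggestion per session, None if none."""
    if not ranks:
        return 0.0
    return sum(1.0 / rank for rank in ranks if rank) / len(ranks)


def minimum_keystrokes(target: str, suggest: Callable[[str], Sequence[str]]) -> int:
    """Fewest key presses to reach `target`: characters typed plus list position."""
    best = len(target)  # fallback: type the whole query
    for typed in range(1, len(target) + 1):
        suggestions = list(suggest(target[:typed]))
        if target in suggestions:
            rank = suggestions.index(target) + 1
            best = min(best, typed + rank)
    return best


# Toy suggester over a fixed, pre-ranked list (illustrative data only).
CORPUS = ["motion to dismiss", "motion to compel", "motion in limine"]


def toy_suggest(prefix: str) -> list[str]:
    return [s for s in CORPUS if s.startswith(prefix)][:3]


print(mean_reciprocal_rank([1, 2, None]))
print(minimum_keystrokes("motion to compel", toy_suggest))
```

A lower MKS means users reach their intended query with less typing, while a higher MRR means the selected suggestion tends to sit nearer the top of the list.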
28
Future Roadmap
29
Any Questions?
30
Thank You
• Rimple Shah
• rimple.shah@lexisnexis.com
• Revanth Malay
• revanth.malay@lexisnexis.com
• David Rhodes
• david.rhodes@lexisnexis.com
Stay in touch with us
