SlideShare a Scribd company logo
1 of 21
STAY CONNECTED
Twitter @activate_conf
Facebook @activateconf
#Activate19
Log in to wifi, follow Activate on social media,
and download the event app where you can
submit an evaluation after the session
WIFI NETWORK: Activate2019
PASSWORD: Lucidworks
DOWNLOAD THE ACTIVATE 2019 MOBILE APP
Search Activate2019 in the App/Play store
Or visit: http://crowd.cc/activate19
Elevation query
Solr plugin
Speaker Slide
R O B E R T
K I R C H G E S S N E R
Search Technology Architect
Wolters Kluwer
E X P E R I E N C E
• Search Algorithms Development
• Content Analysis
• Entity Recognition
• Solr plugins / extensions
• Strong software development experience for about 14 years in different commercial projects
• Last 4 years working on search expertise, particularly with Apache Solr and cloud-based solution for this
including availability and scalability.
• Customers: Wolters Kluwer, TRAFIGURA, Daadkracht...
N A Z A R S E N I U K
Lead Software Engineer
EPAM
Agenda
• Motivation
• Implementation Idea
• Implementation status
• Case study autosuggest
• Summary
Some background
• Developing search applications for legal market since 2003
• Inhomogeneous, structured content, rich metadata (laws, cases, commentaries)
• Use of metadata for ranking is essential for good results
• Up to 30% of queries contain legal / other entities
• Relying on query cooking using entity recognition in the user input
• Combining with full text search and tuning the results becomes a challenge
Example
User input: § 123 BGB
Transformed to queries Q1, Q2, Q3, Q4
Expected output:
• § 123 BGB (law document)
• Legal commentary A to § 123 BGB (promoted content)
• Legal commentary B to § 123 BGB (promoted content)
• Some latest cases based on § 123 BGB (relevant content)
• Full text (or whatever needed)
How to achieve?
Requirements
C O N T E N T S T R U C T U R E
• Handle entities in the user input properly: legal citations, locations, dates, names
– e.g. place the correct document cited in the query on the top
– given a book title place an entry document (table of contents) on the top
• Top (1-5) hits expected to be unambiguous
• Use the top slots efficiently (10-100 hits)
• Keep balance between numerous document types (legal cases) and relevant or promoted
document types
Generally more precise control of what is going on in the top 10
Possible solutions
• Boost factors on queries, terms, documents
• Sort fields
• Ranking functions
• Function queries
• Reranking (in Solr or application)
• Filtering
• Multiple requests
Works, but…
• Some are too complex
• Some are too slow
• Others are not reliable
• Missing a concept of subquery:
– tracking from which subquery a document is coming from
• Missing LIMIT as in SQL
Example continued
User input: § 123 BGB
Transformed to queries Q1, Q2, Q3, Q4
Expected output:
• § 123 BGB (law document)
• Legal commentary A to § 123 BGB (promoted content)
• Legal commentary B to § 123 BGB (promoted content)
• Some latest cases based on § 123 BGB (relevant content)
• Full text (or whatever needed)
Want the request look like: Q1 << Q2 << Q3 << Q4
Elevation query
Initial Idea / Specification
Given a list of queries Q1, Q2, …, QN produce a result fulfilling the conditions:
• All the documents of Qn are placed before the documents of Qm for m>n
• Each hit should occur in the leftmost possible subset
• No duplication of hits
• Meaningful scores
• Correct faceting
Elevation query
Additional requirements / expectations
• One request / one pass search
• Usable via some new syntax / parser support
• Implemented as plugin
Furthermore it should be possible to
• impose a limit on the results of each subquery
• provide a sort parameter for each subquery
Implementation
Idea
Where to start
• TopFieldCollector.collect
• TFC manages a priority queue
• The priority queue is parametrized with
size and sorting
• DisjunctionMaxQuery:
– „generates the union of documents produced by
its subqueries“
Q1
8
71
7
28
6
13
5
23
4
50
3
10
2
31
1
23
7
28
6
13
4
50
3
10
1
23
9
66
8
71
7
28
6
13
5
23
4
50
3
10
2
31
1
23
1
23
10
42
9
66
8
71
7
28
6
13
5
23
4
50
3
10
2
31
2
31
11
63
10
42
9
66
8
71
7
28
6
13
5
23
4
50
3
10
3
10
12
19
11
63
10
42
9
66
8
71
7
28
6
13
5
23
4
50
4
50
1
23
3
10
13
36
12
19
11
63
10
42
9
66
8
71
7
28
6
13
5
23
14
47
13
36
12
19
11
63
10
42
9
66
8
71
7
28
6
13
6
13
3
10
15
99
14
47
13
36
12
19
11
63
10
42
9
66
8
71
7
28
7
28
1
23
6
13
Implementation
Idea
Where to go
• Provide more than one queue to collector
• Propagate information from
DisjunctionMaxScorer to the collector
• Some additional bookkeeping
– Scores
– Sort field values
– Subquery index
– (Facets)
Q3
Q2
Q1
87654321 87654321 98765432
1
50
1
-
1
-
1
50
1
50
1
-
1
-
109876543
1
50
2
-
2
43
2
-
1
50
2
-
2
43
2
43
2
-
1110987654
1
50
3
-
2
43
3
31
3
55
1110987654
1
50
3
-
2
43
3
31
3
31
3
55
12111098765
1
50
4
51
2
43
3
31
4
-
4
76
4
51
1
50
4
51
2
43
3
31
4
-
4
76
131211109876
2
43
3
31
5
-
5
-
4
51
1
50
5
74
131211109876
2
43
3
31
5
-
5
-
5
74
4
51
5
74
2
43
3
31
4
76
5
74
4
51
5
74
2
43
3
31
4
76
4
76
5
74
4
51
5
74
1413121110987
2
43
3
31
6
88
4
76
6
-
5
74
4
51
6
-
1413121110987
6
88
2
43
3
31
4
76
6
-
5
74
4
51
6
-
15141312111098
6
88
2
43
3
31
7
12
4
76
7
99
5
74
4
51
7
-
15141312111098
6
88
2
43
3
31
7
12
7
12
4
76
7
99
5
74
4
51
7
-
15141312111098
7
12
15141312111099
6
88
2
43
3
31
7
12
8
-
4
76
8
55
5
74
4
51
8
-
15141312111099
6
88
2
43
3
31
7
12
8
-
4
76
8
55
8
55
5
74
4
51
8
-
15141312111099
Implementation status
• https://github.com/rokirx/solr-eq
• Working
– Collector logic / multiple queues
– Sort and limit parameter per subquery
– Parser support
• In testing
– Correct scoring
– Faceting
– Multiple sort fields per subquery
• Works with 6.4, 7.6, 8.0, 8.2
Case Study: Autosuggest
User Input tax
• Assumptions on the relevancy of completion:
– Highest priority if the term at the beginning and exact match, eg tax relief
– Lower priority exact match but term not at the beginnilng, eg income tax
– Lowest priority prefix match anywhere in the phrase, eg estate taxes
• Map this condition to queries:
– Term at the beginning of a phrase and exact match: ^tax$
– Exact match in the middle of a phrase: tax$
– Prefix match (edge n-gram): tax
Case Study: Autosuggest
User Input tax
• Resulting query: ^tax$ << tax$ << tax guarantees the specified behavior
• Additional benefit: optimize the performance by cancelling out subqueries
– If the exact hit count is not necessary
– And the minimum required number of hits in the preceeding queues is collected
– Stop fetching the docs from lower priority queue by cancelling them out of the collector/scorer
– Whitout missing out any relevant documents
Potential benefits
• Reduce the number of search requests
• Reduce the complexity of the architecture
• Additional dimension to control rank
• Pluggable, easy to evaluate
• Improve performance through runtime subquery cancellation
Summary
It is technically possible to implement a concept of subquery into Solr/Lucene
• Single request / one pass collection of results
• Individual limits on each subquery
• Individual sort parameters on each subquery
• Optimization if no total hits number needed
– cancel lower prioritized subqueries during evaluation without affecting top hits
• Plugin
THANK YOU

More Related Content

What's hot

Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...HostedbyConfluent
 
Telling the LivePerson Technology Story at Couchbase [SF] 2013
Telling the LivePerson Technology Story at Couchbase [SF] 2013Telling the LivePerson Technology Story at Couchbase [SF] 2013
Telling the LivePerson Technology Story at Couchbase [SF] 2013LivePerson
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...HostedbyConfluent
 
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian LitaKafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian Litaconfluent
 
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike SpicerKafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicerconfluent
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumarconfluent
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Rapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixRapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixData Con LA
 
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...confluent
 
Ml sprint16 thesis_intro
Ml sprint16 thesis_introMl sprint16 thesis_intro
Ml sprint16 thesis_introThanhNguyen3805
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...HostedbyConfluent
 
Dogfooding data at Lyft
Dogfooding data at LyftDogfooding data at Lyft
Dogfooding data at Lyftmarkgrover
 
Empowering Zillow’s Developers with Self-Service ETL
Empowering Zillow’s Developers with Self-Service ETLEmpowering Zillow’s Developers with Self-Service ETL
Empowering Zillow’s Developers with Self-Service ETLDatabricks
 
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceSpark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceWei Di
 
Correlate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a JediCorrelate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a JediTrevor Parsons
 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkFlink Forward
 
Streaming datasets for personalization
Streaming datasets for personalizationStreaming datasets for personalization
Streaming datasets for personalizationShriya Arora
 
Stream Processing @ Lyft
Stream Processing @ LyftStream Processing @ Lyft
Stream Processing @ LyftJamie Grier
 
Spline: Data Lineage For Spark Structured Streaming
Spline: Data Lineage For Spark Structured StreamingSpline: Data Lineage For Spark Structured Streaming
Spline: Data Lineage For Spark Structured StreamingVaclav Kosar
 

What's hot (20)

Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
 
Telling the LivePerson Technology Story at Couchbase [SF] 2013
Telling the LivePerson Technology Story at Couchbase [SF] 2013Telling the LivePerson Technology Story at Couchbase [SF] 2013
Telling the LivePerson Technology Story at Couchbase [SF] 2013
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian LitaKafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
 
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike SpicerKafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Rapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixRapid Data Analytics @ Netflix
Rapid Data Analytics @ Netflix
 
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
 
Ml sprint16 thesis_intro
Ml sprint16 thesis_introMl sprint16 thesis_intro
Ml sprint16 thesis_intro
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Dogfooding data at Lyft
Dogfooding data at LyftDogfooding data at Lyft
Dogfooding data at Lyft
 
Empowering Zillow’s Developers with Self-Service ETL
Empowering Zillow’s Developers with Self-Service ETLEmpowering Zillow’s Developers with Self-Service ETL
Empowering Zillow’s Developers with Self-Service ETL
 
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceSpark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
 
Correlate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a JediCorrelate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a Jedi
 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache Flink
 
Streaming datasets for personalization
Streaming datasets for personalizationStreaming datasets for personalization
Streaming datasets for personalization
 
Stream Processing @ Lyft
Stream Processing @ LyftStream Processing @ Lyft
Stream Processing @ Lyft
 
Spline: Data Lineage For Spark Structured Streaming
Spline: Data Lineage For Spark Structured StreamingSpline: Data Lineage For Spark Structured Streaming
Spline: Data Lineage For Spark Structured Streaming
 

Similar to STAY CONNECTED WITH ACTIVATE 2019

Schema on read with runtime fields
Schema on read with runtime fieldsSchema on read with runtime fields
Schema on read with runtime fieldsElasticsearch
 
Activate 2019 - Search and relevance at scale for online classifieds
Activate 2019 - Search and relevance at scale for online classifiedsActivate 2019 - Search and relevance at scale for online classifieds
Activate 2019 - Search and relevance at scale for online classifiedsRoger Rafanell Mas
 
The Evolution of Testing Methodology at AWS: From Status Quo to Formal Method...
The Evolution of Testing Methodology at AWS: From Status Quo to Formal Method...The Evolution of Testing Methodology at AWS: From Status Quo to Formal Method...
The Evolution of Testing Methodology at AWS: From Status Quo to Formal Method...C4Media
 
WIPS Global Brochure, New
WIPS Global Brochure, NewWIPS Global Brochure, New
WIPS Global Brochure, Newshikha gupta
 
Behind the Wizard’s Curtain: Scalability and Security at Zuora (Subscribed13)
Behind the Wizard’s Curtain:  Scalability and Security at Zuora (Subscribed13)Behind the Wizard’s Curtain:  Scalability and Security at Zuora (Subscribed13)
Behind the Wizard’s Curtain: Scalability and Security at Zuora (Subscribed13)Zuora, Inc.
 
Enhancements on Spark SQL optimizer by Min Qiu
Enhancements on Spark SQL optimizer by Min QiuEnhancements on Spark SQL optimizer by Min Qiu
Enhancements on Spark SQL optimizer by Min QiuSpark Summit
 
MakeServiceContractEasy_NEOAUG_20120611
MakeServiceContractEasy_NEOAUG_20120611MakeServiceContractEasy_NEOAUG_20120611
MakeServiceContractEasy_NEOAUG_20120611Ravindra Tripathi
 
Monolithic to microservices
Monolithic to microservicesMonolithic to microservices
Monolithic to microservicesRonald Hsu
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant confluent
 
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Lucidworks
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0Matt Lucas
 
SplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with SplunkSplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with SplunkSplunk
 
SplunkLive! Advanced Session
SplunkLive! Advanced SessionSplunkLive! Advanced Session
SplunkLive! Advanced SessionSplunk
 
Creating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentCreating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentRTTS
 
A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology confluent
 
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Databricks
 
Efficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresEfficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresCrai Macdonald
 
Replicating One Billion Records with Minimal API Usage
Replicating One Billion Records with Minimal API UsageReplicating One Billion Records with Minimal API Usage
Replicating One Billion Records with Minimal API UsageSalesforce Developers
 

Similar to STAY CONNECTED WITH ACTIVATE 2019 (20)

Schema on read with runtime fields
Schema on read with runtime fieldsSchema on read with runtime fields
Schema on read with runtime fields
 
Activate 2019 - Search and relevance at scale for online classifieds
Activate 2019 - Search and relevance at scale for online classifiedsActivate 2019 - Search and relevance at scale for online classifieds
Activate 2019 - Search and relevance at scale for online classifieds
 
The Evolution of Testing Methodology at AWS: From Status Quo to Formal Method...
The Evolution of Testing Methodology at AWS: From Status Quo to Formal Method...The Evolution of Testing Methodology at AWS: From Status Quo to Formal Method...
The Evolution of Testing Methodology at AWS: From Status Quo to Formal Method...
 
WIPS Global Brochure, New
WIPS Global Brochure, NewWIPS Global Brochure, New
WIPS Global Brochure, New
 
Behind the Wizard’s Curtain: Scalability and Security at Zuora (Subscribed13)
Behind the Wizard’s Curtain:  Scalability and Security at Zuora (Subscribed13)Behind the Wizard’s Curtain:  Scalability and Security at Zuora (Subscribed13)
Behind the Wizard’s Curtain: Scalability and Security at Zuora (Subscribed13)
 
Enhancements on Spark SQL optimizer by Min Qiu
Enhancements on Spark SQL optimizer by Min QiuEnhancements on Spark SQL optimizer by Min Qiu
Enhancements on Spark SQL optimizer by Min Qiu
 
MakeServiceContractEasy_NEOAUG_20120611
MakeServiceContractEasy_NEOAUG_20120611MakeServiceContractEasy_NEOAUG_20120611
MakeServiceContractEasy_NEOAUG_20120611
 
Monolithic to microservices
Monolithic to microservicesMonolithic to microservices
Monolithic to microservices
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
 
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0
 
SplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with SplunkSplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with Splunk
 
SplunkLive! Advanced Session
SplunkLive! Advanced SessionSplunkLive! Advanced Session
SplunkLive! Advanced Session
 
Creating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentCreating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing Assignment
 
A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology
 
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
 
Efficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresEfficient Query Processing Infrastructures
Efficient Query Processing Infrastructures
 
Spm lecture-3
Spm lecture-3Spm lecture-3
Spm lecture-3
 
Lect3
Lect3Lect3
Lect3
 
Replicating One Billion Records with Minimal API Usage
Replicating One Billion Records with Minimal API UsageReplicating One Billion Records with Minimal API Usage
Replicating One Billion Records with Minimal API Usage
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 

STAY CONNECTED WITH ACTIVATE 2019

  • 1. STAY CONNECTED Twitter @activate_conf Facebook @activateconf #Activate19 Log in to wifi, follow Activate on social media, and download the event app where you can submit an evaluation after the session WIFI NETWORK: Activate2019 PASSWORD: Lucidworks DOWNLOAD THE ACTIVATE 2019 MOBILE APP Search Activate2019 in the App/Play store Or visit: http://crowd.cc/activate19
  • 2.
  • 4. Speaker Slide R O B E R T K I R C H G E S S N E R Search Technology Architect Wolters Kluwer E X P E R I E N C E • Search Algorithms Development • Content Analysis • Entity Recognition • Solr plugins / extensions • Strong software development experience for about 14 years in different commercial projects • Last 4 years working on search expertise, particularly with Apache Solr and cloud-based solution for this including availability and scalability. • Customers: Wolters Kluwer, TRAFIGURA, Daadkracht... N A Z A R S E N I U K Lead Software Engineer EPAM
  • 5. Agenda • Motivation • Implementation Idea • Implementation status • Case study autosuggest • Summary
  • 6. Some background • Developing search applications for legal market since 2003 • Inhomogeneous, structured content, rich metadata (laws, cases, commentaries) • Use of metadata for ranking is essential for good results • Up to 30% of queries contain legal / other entities • Relying on query cooking using entity recognition in the user input • Combining with full text search and tuning the results becomes a challenge
  • 7. Example User input: § 123 BGB Transformed to queries Q1, Q2, Q3, Q4 Expected output: • § 123 BGB (law document) • Legal commentary A to § 123 BGB (promoted content) • Legal commentary B to § 123 BGB (promoted content) • Some latest cases based on § 123 BGB (relevant content) • Full text (or whatever needed) How to achieve?
  • 8. Requirements C O N T E N T S T R U C T U R E • Handle entities in the user input properly: legal citations, locations, dates, names – e.g. place the correct document cited in the query on the top – given a book title place an entry document (table of contents) on the top • Top (1-5) hits expected to be unambiguous • Use the top slots efficiently (10-100 hits) • Keep balance between numerous document types (legal cases) and relevant or promoted document types Generally more precise control of what is going on in the top 10
  • 9. Possible solutions • Boost factors on queries, terms, documents • Sort fields • Ranking functions • Function queries • Reranking (in Solr or application) • Filtering • Multiple requests
  • 10. Works, but… • Some are too complex • Some are too slow • Others are not reliable • Missing a concept of subquery: – tracking from which subquery a document is coming from • Missing LIMIT as in SQL
  • 11. Example continued User input: § 123 BGB Transformed to queries Q1, Q2, Q3, Q4 Expected output: • § 123 BGB (law document) • Legal commentary A to § 123 BGB (promoted content) • Legal commentary B to § 123 BGB (promoted content) • Some latest cases based on § 123 BGB (relevant content) • Full text (or whatever needed) Want the request look like: Q1 << Q2 << Q3 << Q4
  • 12. Elevation query Initial Idea / Specification Given a list of queries Q1, Q2, …, QN produce a result fulfilling the conditions: • All the documents of Qn are placed before the documents of Qm for m>n • Each hit should occur in the leftmost possible subset • No duplication of hits • Meaningful scores • Correct faceting
  • 13. Elevation query Additional requirements / expectations • One request / one pass search • Usable via some new syntax / parser support • Implemented as plugin Furthermore it should be possible to • impose a limit on the results of each subquery • provide a sort parameter for each subquery
  • 14. Implementation Idea Where to start • TopFieldCollector.collect • TFC manages a priority queue • The priority queue is parametrized with size and sorting • DisjunctionMaxQuery: – „generates the union of documents produced by its subqueries“ Q1 8 71 7 28 6 13 5 23 4 50 3 10 2 31 1 23 7 28 6 13 4 50 3 10 1 23 9 66 8 71 7 28 6 13 5 23 4 50 3 10 2 31 1 23 1 23 10 42 9 66 8 71 7 28 6 13 5 23 4 50 3 10 2 31 2 31 11 63 10 42 9 66 8 71 7 28 6 13 5 23 4 50 3 10 3 10 12 19 11 63 10 42 9 66 8 71 7 28 6 13 5 23 4 50 4 50 1 23 3 10 13 36 12 19 11 63 10 42 9 66 8 71 7 28 6 13 5 23 14 47 13 36 12 19 11 63 10 42 9 66 8 71 7 28 6 13 6 13 3 10 15 99 14 47 13 36 12 19 11 63 10 42 9 66 8 71 7 28 7 28 1 23 6 13
  • 15. Implementation Idea Where to go • Provide more than one queue to collector • Propagate information from DisjunctionMaxScorer to the collector • Some additional bookkeeping – Scores – Sort field values – Subquery index – (Facets) Q3 Q2 Q1 87654321 87654321 98765432 1 50 1 - 1 - 1 50 1 50 1 - 1 - 109876543 1 50 2 - 2 43 2 - 1 50 2 - 2 43 2 43 2 - 1110987654 1 50 3 - 2 43 3 31 3 55 1110987654 1 50 3 - 2 43 3 31 3 31 3 55 12111098765 1 50 4 51 2 43 3 31 4 - 4 76 4 51 1 50 4 51 2 43 3 31 4 - 4 76 131211109876 2 43 3 31 5 - 5 - 4 51 1 50 5 74 131211109876 2 43 3 31 5 - 5 - 5 74 4 51 5 74 2 43 3 31 4 76 5 74 4 51 5 74 2 43 3 31 4 76 4 76 5 74 4 51 5 74 1413121110987 2 43 3 31 6 88 4 76 6 - 5 74 4 51 6 - 1413121110987 6 88 2 43 3 31 4 76 6 - 5 74 4 51 6 - 15141312111098 6 88 2 43 3 31 7 12 4 76 7 99 5 74 4 51 7 - 15141312111098 6 88 2 43 3 31 7 12 7 12 4 76 7 99 5 74 4 51 7 - 15141312111098 7 12 15141312111099 6 88 2 43 3 31 7 12 8 - 4 76 8 55 5 74 4 51 8 - 15141312111099 6 88 2 43 3 31 7 12 8 - 4 76 8 55 8 55 5 74 4 51 8 - 15141312111099
  • 16. Implementation status • https://github.com/rokirx/solr-eq • Working – Collector logic / multiple queues – Sort and limit parameter per subquery – Parser support • In testing – Correct scoring – Faceting – Multiple sort fields per subquery • Works with 6.4, 7.6, 8.0, 8.2
  • 17. Case Study: Autosuggest User Input tax • Assumptions on the relevancy of completion: – Highest priority if the term at the beginning and exact match, eg tax relief – Lower priority exact match but term not at the beginnilng, eg income tax – Lowest priority prefix match anywhere in the phrase, eg estate taxes • Map this condition to queries: – Term at the beginning of a phrase and exact match: ^tax$ – Exact match in the middle of a phrase: tax$ – Prefix match (edge n-gram): tax
  • 18. Case Study: Autosuggest User Input tax • Resulting query: ^tax$ << tax$ << tax guarantees the specified behavior • Additional benefit: optimize the performance by cancelling out subqueries – If the exact hit count is not necessary – And the minimum required number of hits in the preceeding queues is collected – Stop fetching the docs from lower priority queue by cancelling them out of the collector/scorer – Whitout missing out any relevant documents
  • 19. Potential benefits • Reduce the number of search requests • Reduce the complexity of the architecture • Additional dimension to control rank • Pluggable, easy to evaluate • Improve performance through runtime subquery cancellation
  • 20. Summary It is technically possible to implement a concept of subquery into Solr/Lucene • Single request / one pass collection of results • Individual limits on each subquery • Individual sort parameters on each subquery • Optimization if no total hits number needed – cancel lower prioritized subqueries during evaluation without affecting top hits • Plugin

Editor's Notes

  1. 2
  2. 2
  3. 5
  4. 7
  5. 9
  6. 11
  7. 12
  8. 12.5
  9. 13
  10. 14.5
  11. 18
  12. 23
  13. 24
  14. 25
  15. 27
  16. 29
  17. 30