Advanced Relevancy Ranking
Paul Nelson
Chief Architect / Search Technologies
Search Technologies Overview
• Formed June 2005
• Over 100 employees and growing
• Over 400 customers worldwide
• Presence in US, Latin America, UK & Germany
• Deep enterprise search expertise
• Consistent revenue growth and profitability
• Search Engine Independent
Lucene Relevancy: Simple Operators
• term(A) → TF(A) * IDF(A)
  • Implemented with DefaultSimilarity / TermQuery
  • TF(A) = sqrt(termInDocCount)
  • IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0
• and(A, B) → A * B
  • Implemented with BooleanQuery()
• or(A, B) → A + B
  • Implemented with BooleanQuery()
• max(A, B) → max(A, B)
  • Implemented with DisjunctionMaxQuery()
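The TF and IDF formulas on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic as written above; the function and parameter names are made up for the example, and real Lucene scoring applies further normalization factors that are not modeled here.

```python
import math


def tf(term_in_doc_count: int) -> float:
    """Term frequency factor: square root of the term's count in the document."""
    return math.sqrt(term_in_doc_count)


def idf(total_docs_in_collection: int, docs_with_term_count: int) -> float:
    """Inverse document frequency factor, as given on the slide."""
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0


def term_score(term_in_doc_count: int,
               total_docs_in_collection: int,
               docs_with_term_count: int) -> float:
    """term(A) -> TF(A) * IDF(A)."""
    return tf(term_in_doc_count) * idf(total_docs_in_collection, docs_with_term_count)


if __name__ == "__main__":
    # A term occurring 4 times in a document, found in 9 of 1000 documents.
    print(term_score(4, 1000, 9))
```

Rarer terms (smaller docsWithTermCount) get a larger IDF, so they contribute more to the score than common terms with the same in-document count.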
Simple Operators - Example
Query: and(or(george, martha), max(washington, custis))
Term scores: george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90
or:  0.10 + 0.20 = 0.30
max: max(0.60, 0.90) = 0.90
and: 0.30 * 0.90 = 0.27
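The worked example can be reproduced with a minimal sketch of the three operators, following the simplified model from the slides (and multiplies, or adds, max takes the maximum). The helper names with trailing underscores are hypothetical, and a score of 0.0 is assumed to mean "no match".

```python
import math


def and_(*scores: float) -> float:
    """and(A, B) -> A * B; any non-matching clause (0.0) zeroes the result."""
    return math.prod(scores)


def or_(*scores: float) -> float:
    """or(A, B) -> A + B."""
    return sum(scores)


def max_(*scores: float) -> float:
    """max(A, B) -> the larger of the clause scores."""
    return max(scores)


# Term scores taken from the slide.
george, martha, washington, custis = 0.10, 0.20, 0.60, 0.90

score = and_(or_(george, martha), max_(washington, custis))

if __name__ == "__main__":
    print(round(score, 2))  # 0.27
```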
Less Used Operators
• boost(f, A) → (A * f)
  • Implemented with Query.setBoost(f)
• constant(f, A) → if(A) then f else 0.0
  • Implemented with ConstantScoreQuery()
• boostPlus(A, B) → if(A) then (A + B) else 0.0
  • Implemented with BooleanQuery()
• boostMul(f, A, B) → if(B) then (A * f) else A
  • Implemented with BoostingQuery()
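The four definitions above translate almost directly into code. The sketch below assumes each sub-query is represented by its score, with 0.0 meaning the sub-query did not match; the snake_case function names are illustrative only and are not the Lucene or QPL API.

```python
def boost(f: float, a: float) -> float:
    """boost(f, A) -> A * f."""
    return a * f


def constant(f: float, a: float) -> float:
    """constant(f, A) -> f if A matched, else 0.0."""
    return f if a > 0.0 else 0.0


def boost_plus(a: float, b: float) -> float:
    """boostPlus(A, B) -> A + B if A matched, else 0.0 (B is optional)."""
    return a + b if a > 0.0 else 0.0


def boost_mul(f: float, a: float, b: float) -> float:
    """boostMul(f, A, B) -> A * f if B matched, else A unchanged."""
    return a * f if b > 0.0 else a


if __name__ == "__main__":
    print(boost_plus(0.5, 0.25))      # required clause matched: scores add
    print(boost_plus(0.0, 0.25))      # required clause missing: no match
    print(boost_mul(0.5, 0.8, 0.1))   # optional clause matched: score is scaled
```

Note the asymmetry: in boostPlus the second argument can only raise the score of a document that already matches A, while in boostMul a factor below 1.0 demotes documents that also match B.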
Problem: Need for More Flexibility
• Difficult / impossible to use all operators
  • Many not available in standard query parsers
• Complex expressions = string manipulation
  • This is messy
• Query construction is in the application layer
  • Your UI programmer is creating query expressions?
    • Seriously?
• Hard to create and use new operators
  • Requires modifying query parsers - yuck
Query Processing Language
[Diagram: User Interface → QPL Engine (inside Solr, driven by a QPL Script) → Search]
Introducing: QPL
• Query Processing Language
  • Domain Specific Language for Constructing Queries
  • Built on Groovy
  • https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
• Solr Plug-Ins
  • Query Parser
  • Search Component
• “The 4GL for Text Search Query Expressions”
• Server-side Solr Access
  • Cores, Analyzers, Embedded Search, Results XML
Solr Plug-Ins
QPL Configuration – solrconfig.xml
Query Parser Configuration:
<queryParser name="qpl"
    class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
  <str name="scriptFile">parser.qpl</str>
  <str name="defaultField">text</str>
</queryParser>
Search Component Configuration:
<searchComponent name="qplSearchFirst"
    class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
  <str name="scriptFile">search.qpl</str>
  <str name="defaultField">text</str>
  <str name="isProcessScript">false</str>
</searchComponent>
QPL Example #1
Tokenize:
myTerms = solr.tokenize(query);
Phrase Query:
phraseQ = phrase(myTerms);
And Query:
andQ = and(myTerms);
Or Query:
orQ = (myTerms.size() <= 2) ? null :
    orMin( (myTerms.size()+1)/2, myTerms);
Put It All Together:
return phraseQ^3.0 | andQ^2.0 | orQ;
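To make the shape of the resulting query easier to see, here is a rough Python model of the same steps that builds a nested tuple instead of a real Solr query. Several things are assumptions made for the sketch: `|` is treated as an "or" of its operands, `orMin(n, terms)` is taken to mean "at least n of the terms must match", and the threshold uses integer division.

```python
def build_query(terms):
    """Model of QPL Example #1 as a nested tuple structure (illustrative only)."""
    phrase_q = ("phrase", list(terms))
    and_q = ("and", list(terms))

    # With one or two terms the slide skips the or-clause entirely;
    # otherwise roughly half of the terms (rounded up) must match.
    if len(terms) <= 2:
        or_q = None
    else:
        or_q = ("orMin", (len(terms) + 1) // 2, list(terms))

    clauses = [("boost", 3.0, phrase_q), ("boost", 2.0, and_q)]
    if or_q is not None:
        clauses.append(or_q)
    return ("or", clauses)


if __name__ == "__main__":
    print(build_query(["george", "washington"]))
    print(build_query(["george", "and", "martha", "washington", "custis"]))
```

The effect is a tiered ranking: exact phrase matches score highest, documents containing all terms come next, and partial matches are admitted only when the query is long enough for "most of the terms" to be meaningful.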
Thesaurus Example #2
Tokenize:
myTerms = solr.tokenize(query);
Load Thesaurus: (cached)
thes = Thesaurus.load("thesaurus.xml")
Thesaurus Expansion:
thesQ = thes.expand(0.8f,
    solr.tokenizer("text"), myTerms);
Put It All Together:
return and(thesQ);
Original Query: bathroom humor
Expanded terms: [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
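The expansion step can be illustrated with a small in-memory thesaurus. This is a sketch of the idea only: the dictionary, the `expand` function, and the string output format are assumptions for the example and do not reflect how QPL's Thesaurus class is implemented.

```python
# Hypothetical thesaurus contents, chosen to match the slide's example.
THESAURUS = {
    "bathroom": ["loo", "wc"],
    "humor": ["jokes"],
}


def expand(weight, terms, thesaurus=THESAURUS):
    """Replace each term with or(term, synonym^weight, ...) when synonyms exist."""
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            parts = [term] + [f"{syn}^{weight}" for syn in synonyms]
            expanded.append("or(" + ", ".join(parts) + ")")
        else:
            # Terms without thesaurus entries pass through unchanged.
            expanded.append(term)
    return expanded


if __name__ == "__main__":
    print(expand(0.8, ["bathroom", "humor"]))
```

Weighting synonyms below 1.0 keeps documents that contain the user's original wording ahead of documents that match only through a synonym.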
More Operators
Boolean Query Parser:
pQ = parseQuery("(george or martha) near/5 washington")
Relevancy Ranking Operators:
q1 = boostPlus(query, optionalQ)
q2 = boostMul(0.5, query, optionalQ)
q3 = constant(0.5, query)
Composite Queries:
compQ = and(compositeMax(
["title":1.5, "body":0.8],
"george", "washington"))
News Feed Use Case
Order  Documents           Date
1      markets+terms       Today
2      markets             Today
3      terms               Today
4      companies           Today
5      markets+terms       Yesterday
6      markets             Yesterday
7      terms               Yesterday
8      companies           Yesterday
9      markets, companies  older
News Feed Use Case – Step 1
Segments:
markets = split(solr.markets, "\\s*;\\s*")
marketsQ = field("markets", or(markets));
Terms:
terms = solr.tokenize(query);
termsQ = field("body",
    or(thesaurus.expand(0.9f, terms)))
Companies:
compIds = split(solr.compIds, "\\s*;\\s*")
compIdsQ = field("companyIds", or(compIds))
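The markets and company IDs arrive as semicolon-separated request parameters. A minimal sketch of that splitting step, assuming the separator is a semicolon with optional surrounding whitespace (the function name and sample values are made up):

```python
import re


def split_list(raw: str) -> list:
    """Split a semicolon-separated parameter, ignoring whitespace around ';'."""
    return [item for item in re.split(r"\s*;\s*", raw.strip()) if item]


if __name__ == "__main__":
    print(split_list("energy ; metals;  tech "))
```

Empty entries are dropped so that a trailing semicolon or an empty parameter does not produce an empty term in the or-query.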
News Feed Use Case – Step 2
sdf = new SimpleDateFormat("yyyy-MM-dd")
cal = Calendar.getInstance()
Today:
todayDate = sdf.format(cal.getTime())
todayQ = field("date_s",todayDate)
Yesterday:
cal.add(Calendar.DAY_OF_MONTH, -1)
yesterdayDate = sdf.format(cal.getTime())
yesterdayQ = field("date_s",yesterdayDate)
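The date arithmetic in this step amounts to formatting today's date and the previous day's date as yyyy-MM-dd strings. A small Python equivalent, with a hypothetical function name and an optional parameter so the result can be checked against a fixed date:

```python
from datetime import date, timedelta


def today_and_yesterday(today=None):
    """Return (today, yesterday) as 'YYYY-MM-DD' strings for matching a date field."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return today.strftime("%Y-%m-%d"), yesterday.strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(today_and_yesterday(date(2024, 3, 1)))
```

Subtracting a day from a date object handles month and year boundaries, which is the same job Calendar.add does in the script above.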
News Feed Use Case – Step 3
Weighted Subject Queries:
sq1 = constant(4.0, and(marketsQ, termsQ))
sq2 = constant(3.0, marketsQ)
sq3 = constant(2.0, termsQ)
sq4 = constant(1.0, compIdsQ)
subjectQ = max(sq1, sq2, sq3, sq4)
Weighted Time Queries:
tq1 = constant(10.0, todayQ)
tq2 = constant(1.0, yesterdayQ)
timeQ = max(tq1, tq2)
Put it All Together:
recentQ = and(subjectQ, timeQ)
return max(recentQ, or(marketsQ, compIdsQ)^0.01)
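To check that these weights yield the ordering in the News Feed Use Case table, the scoring can be simulated under the simplified operator model from the earlier slides (and multiplies, or adds, max takes the maximum). Everything in this sketch is an illustrative assumption: documents are reduced to boolean match flags plus a date bucket, and an unweighted marketsQ or compIdsQ match is assumed to score 1.0.

```python
def constant(f, matched):
    """constant(f, A): fixed score f when A matches, else 0.0."""
    return f if matched else 0.0


def score(markets, terms, companies, day):
    """Score one document; day is 'today', 'yesterday', or 'older'."""
    subject = max(
        constant(4.0, markets and terms),
        constant(3.0, markets),
        constant(2.0, terms),
        constant(1.0, companies),
    )
    time = max(constant(10.0, day == "today"), constant(1.0, day == "yesterday"))
    recent = subject * time  # and(subjectQ, timeQ)
    # or(marketsQ, compIdsQ)^0.01, assuming each raw match scores 1.0.
    fallback = ((1.0 if markets else 0.0) + (1.0 if companies else 0.0)) * 0.01
    return max(recent, fallback)


# Rows of the use-case table, in the desired order:
# (matches markets, matches terms, matches companies, date bucket)
TABLE_ROWS = [
    (True, True, False, "today"),
    (True, False, False, "today"),
    (False, True, False, "today"),
    (False, False, True, "today"),
    (True, True, False, "yesterday"),
    (True, False, False, "yesterday"),
    (False, True, False, "yesterday"),
    (False, False, True, "yesterday"),
    (True, False, True, "older"),
]

scores = [score(*row) for row in TABLE_ROWS]

if __name__ == "__main__":
    for order, s in enumerate(scores, start=1):
        print(order, s)
```

Because the time weights (10 and 1) differ by more than the spread of the subject weights (1 to 4), every document from today outranks every document from yesterday, and the 0.01 fallback keeps older documents at the bottom.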
Embedded Search Example #1
qTerms = solr.tokenize(query);
Execute an Embedded Search:
results = solr.search('subjectsCore', or(qTerms), 50)
Create a query from the results:
subjectsQ = or(results*.subjectId)
Put it all together:
return field("title", and(qTerms)) | subjectsQ^0.9;
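The two-stage pattern here (search one index first, then use its results to build the main query) can be modeled with an in-memory list standing in for the subjects core. The data, function names, and tuple-based query representation are all hypothetical; only the flow mirrors the script above.

```python
# Stand-in for the 'subjectsCore' index; contents are invented for the example.
SUBJECTS_CORE = [
    {"subjectId": "S1", "text": "george washington presidency"},
    {"subjectId": "S2", "text": "martha custis estate"},
    {"subjectId": "S3", "text": "washington state geography"},
]


def embedded_search(index, terms, limit):
    """Return up to `limit` documents containing at least one of the terms."""
    wanted = set(terms)
    hits = [doc for doc in index if wanted & set(doc["text"].split())]
    return hits[:limit]


def build_query(terms, index=SUBJECTS_CORE):
    """Combine a title query with an or-query over the looked-up subject IDs."""
    results = embedded_search(index, terms, 50)
    subject_ids = [doc["subjectId"] for doc in results]
    title_q = ("field", "title", ("and", list(terms)))
    subjects_q = ("boost", 0.9, ("or", subject_ids))
    return ("or", [title_q, subjects_q])


if __name__ == "__main__":
    print(build_query(["washington"]))
```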
Embedded Search Example #2
qTerms = solr.tokenize(query);
Execute an Embedded Search:
results = solr.search('categories', and(qTerms), 10)
Create a Solr named list:
myList = solr.newList();
myList.add("relatedCategories", results*.title);
Add it to the XML response:
solr.addResponse(myList)
Other Features
• Embedded Grouping Queries
  • Oh yes they did!
• Proximity operators
  • ADJ, NEAR/#, BEFORE/#
• Reverse Lemmatizer
  • Prefers exact matches over variants
• Transformer
  • Applies transformations recursively to query trees
Query Processing Language
[Diagram: User Interface → (data as entered by user) → QPL Engine (inside Solr, driven by a QPL Script) → (Boolean query expression) → Search. Ownership labels: Application Dev Team, Search Team.]
QPL: Using External Sources to Build Queries
[Diagram: User Interface → QPL Engine (inside Solr, driven by a QPL Script) → Search; the QPL Engine also draws on an RDBMS, other indexes, and a thesaurus.]
CONTACT
Paul Nelson
pnelson@searchtechnologies.com
