Advanced Query Parsing Techniques

This presentation, given at the November 2013 Basis Technology's Open Source Search Conference, reviews the role advanced query parsing can play in building search systems: relevancy customization; taking input from user interface variables such as the position on a website or geographical indicators; selecting which sources are to be searched; and incorporating third-party data sources. Query parsing can also enhance data security. Best practices for building and maintaining complex query parsing rules are discussed and illustrated. http://www.searchtechnologies.com/query-parsing-language.html

  1. Advanced Query Parsing Techniques
      Aruna Kumar Pamulapati (Arun), Technical Consultant
  2. Search Technologies Overview
      Formed June 2005
      Over 100 employees and growing
      Over 500 customers worldwide
      Presence in US, Latin America, UK & Germany
      Deep enterprise search expertise
      Consistent revenue growth and profitability
      Search engine independent
      The expert in the search space
  3. Lucene Relevancy: Simple Operators
      term(A) → TF(A) * IDF(A), implemented with DefaultSimilarity / TermQuery
        TF(A) = sqrt(termInDocCount)
        IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
      and(A, B) → A * B, implemented with BooleanQuery
      or(A, B) → A + B, implemented with BooleanQuery
      max(A, B) → max(A, B), implemented with DisjunctionMaxQuery
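As a numeric check on the TF and IDF definitions above, here is a minimal Python sketch of the two formulas exactly as written on the slide, using the natural logarithm. The function names are illustrative rather than Lucene API, and the sketch leaves out the other factors that Lucene's DefaultSimilarity folds into a full score (such as query normalization, field-length norms, and coordination).

```python
import math

def tf(term_in_doc_count: int) -> float:
    """Term frequency factor: square root of the raw in-document count."""
    return math.sqrt(term_in_doc_count)

def idf(total_docs: int, docs_with_term: int) -> float:
    """Inverse document frequency: ln(N / (df + 1)) + 1."""
    return math.log(total_docs / (docs_with_term + 1)) + 1.0

def term_score(term_in_doc_count: int, total_docs: int, docs_with_term: int) -> float:
    """Simplified term(A) score from the slide: TF(A) * IDF(A)."""
    return tf(term_in_doc_count) * idf(total_docs, docs_with_term)

if __name__ == "__main__":
    # Term occurs 4 times in the document; 9 of 1,000 documents contain it.
    print(round(term_score(4, 1000, 9), 4))
```

For that example the score is sqrt(4) * (ln(1000 / 10) + 1), which prints 11.2103.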
  4. Simple Operators - Example
      Term scores: george 0.10, martha 0.20, washington 0.60, custis 0.90
      and: 0.3 * 0.9 = 0.27
      or: 0.1 + 0.2 = 0.30
      max: max(0, 0.9) = 0.90
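The three simple operators can be modelled as plain functions over clause scores. This Python sketch reproduces the arithmetic on the example slide; it follows the simplified scoring model from the previous slide (and multiplies, or adds, max keeps the best clause) and treats a non-matching clause as a score of 0.0. The function names are illustrative only.

```python
from functools import reduce
import operator

def and_(*scores: float) -> float:
    """and(A, B, ...) -> A * B * ...; a 0.0 (non-matching) clause zeroes the result."""
    return reduce(operator.mul, scores, 1.0)

def or_(*scores: float) -> float:
    """or(A, B, ...) -> A + B + ..."""
    return sum(scores)

def max_(*scores: float) -> float:
    """max(A, B, ...) -> best single clause score (DisjunctionMax style)."""
    return max(scores)

if __name__ == "__main__":
    print(round(and_(0.3, 0.9), 2))  # 0.27
    print(round(or_(0.1, 0.2), 2))   # 0.3
    print(max_(0.0, 0.9))            # 0.9
```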
  5. Less Used Operators
      boost(f, A) → A * f, implemented with Query.setBoost(f)
      constant(f, A) → if (A) then f else 0.0, implemented with ConstantScoreQuery
      boostPlus(A, B) → if (A) then (A + B) else 0.0, implemented with BooleanQuery
      boostMul(f, A, B) → if (B) then (A * f) else A, implemented with BoostingQuery
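The conditional behaviour of these four operators is easy to misread, so here is a small Python model of the definitions on the slide. It is a sketch, not the Lucene classes: `None` stands in for the slide's "does not match" (0.0), which keeps "scored zero" distinct from "absent", and the snake_case names are hypothetical.

```python
from typing import Optional

# None means "this clause does not match the document".
Score = Optional[float]

def boost(f: float, a: Score) -> Score:
    """boost(f, A) -> A * f."""
    return None if a is None else a * f

def constant(f: float, a: Score) -> Score:
    """constant(f, A) -> f whenever A matches, ignoring A's own score."""
    return None if a is None else f

def boost_plus(a: Score, b: Score) -> Score:
    """boostPlus(A, B) -> A + B if A matches; B alone never makes a document match."""
    if a is None:
        return None
    return a + (b if b is not None else 0.0)

def boost_mul(f: float, a: Score, b: Score) -> Score:
    """boostMul(f, A, B) -> A * f if B also matches, otherwise A unchanged."""
    if a is None:
        return None
    return a * f if b is not None else a
```

For example, `boost_mul(0.5, 0.8, 1.2)` demotes a matching document to 0.4, while `boost_plus(None, 0.9)` stays unmatched because the optional clause cannot add a document on its own.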
  6. Problem: Need for More Flexibility
      Difficult or impossible to use all operators: many are not available in standard query parsers
      Complex expressions mean string manipulation, which is messy
      Query construction is in the application layer: your UI programmer is creating query expressions? Seriously?
      Hard to create and use new operators: requires modifying query parsers (yuck)
  7. Query Processing Language
      [Diagram: User Interface, QPL Engine, QPL Script, Search, Solr]
  8. Introducing QPL: Query Processing Language
      Domain-specific language for constructing queries
      Built on Groovy
      https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
      Solr plug-ins: Query Parser, Search Component
      "The 4GL for Text Search Query Expressions"
      Server-side Solr access: cores, analyzers, embedded search, results XML
  9. Solr Plug-Ins
  10. QPL Configuration – solrconfig.xml
      Query parser configuration:
        <queryParser name="qpl" class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
          <str name="scriptFile">parser.qpl</str>
          <str name="defaultField">text</str>
        </queryParser>
      Search component configuration:
        <searchComponent name="qplSearchFirst" class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
          <str name="scriptFile">search.qpl</str>
          <str name="defaultField">text</str>
          <str name="isProcessScript">false</str>
        </searchComponent>
  11. QPL Example #1
      Tokenize:
        myTerms = solr.tokenize(query);
      Phrase query:
        phraseQ = phrase(myTerms);
      And query:
        andQ = and(myTerms);
      Or query (at least half the terms, rounded up, must match; skipped for two or fewer terms):
        orQ = (myTerms.size() <= 2) ? null : orMin((myTerms.size() + 1).intdiv(2), myTerms);
      Put it all together:
        return phraseQ^3.0 | andQ^2.0 | orQ;
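The effect of Example #1 is a tiered ranking: exact phrase first, then all terms, then "at least half" of the terms. The following Python sketch models which clauses fire for a tokenized document, assuming `|` is the or operator (clause scores add) and that each clause contributes only its boost; the underlying TF-IDF scores are deliberately ignored, and the function names are hypothetical.

```python
from typing import List

def contains_phrase(doc: List[str], terms: List[str]) -> bool:
    """True if the terms occur contiguously and in order in the document."""
    n = len(terms)
    return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))

def example1_score(doc: List[str], terms: List[str]) -> float:
    """Sum of boosts of the matching clauses in phraseQ^3.0 | andQ^2.0 | orQ."""
    present = sum(1 for t in terms if t in doc)
    score = 0.0
    if contains_phrase(doc, terms):
        score += 3.0                                  # phraseQ^3.0
    if present == len(terms):
        score += 2.0                                  # andQ^2.0
    # orQ is null for two or fewer terms; otherwise ceil(n / 2) terms must match
    if len(terms) > 2 and present >= (len(terms) + 1) // 2:
        score += 1.0                                  # orQ
    return score

if __name__ == "__main__":
    q = ["george", "washington", "custis"]
    for text in ["george washington custis papers", "custis george washington",
                 "george washington bridge", "martha"]:
        print(text, example1_score(text.split(), q))
```

Under these assumptions the four sample documents score 6.0, 3.0, 1.0, and 0.0, so a phrase match always outranks an unordered match, which in turn outranks a partial match.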
  12. Thesaurus Example #2
      Tokenize:
        myTerms = solr.tokenize(query);
      Load thesaurus (cached):
        thes = Thesaurus.load("thesaurus.xml")
      Thesaurus expansion:
        thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);
      Put it all together:
        return and(thesQ);
      Original query: bathroom humor
      Expanded: [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
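The expansion shown on this slide keeps each original term at full weight and adds its synonyms at a reduced weight. Here is a minimal Python sketch of that rewrite, with a plain dictionary standing in for `thesaurus.xml`. The dictionary, the function name, and the choice to pass through terms that have no entry unchanged are assumptions, not the QPL `Thesaurus` API.

```python
from typing import Dict, List

def expand(weight: float, thesaurus: Dict[str, List[str]], terms: List[str]) -> List[str]:
    """Rewrite each term as or(term, synonym^weight, ...); unknown terms pass through."""
    clauses = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            parts = [term] + [f"{s}^{weight}" for s in synonyms]
            clauses.append(f"or({', '.join(parts)})")
        else:
            clauses.append(term)
    return clauses

if __name__ == "__main__":
    thesaurus = {"bathroom": ["loo", "wc"], "humor": ["jokes"]}
    expanded = expand(0.8, thesaurus, ["bathroom", "humor"])
    print(f"and({', '.join(expanded)})")
```

This prints `and(or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8))`, matching the expansion on the slide.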
  13. More Operators
      Boolean query parser:
        pQ = parseQuery("(george or martha) near/5 washington")
      Relevancy ranking operators:
        q1 = boostPlus(query, optionalQ)
        q2 = boostMul(0.5, query, optionalQ)
        q3 = constant(0.5, query)
      Composite queries:
        compQ = and(compositeMax(["title":1.5, "body":0.8], "george", "washington"))
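The slide does not define `compositeMax`. One plausible reading, offered here only as an assumption, is that each term is searched across the weighted fields and keeps its best field match (DisjunctionMax style), and `and` then requires every term. This hypothetical Python sketch prints the expression that reading would produce.

```python
from typing import Dict, List

def composite_max(field_boosts: Dict[str, float], *terms: str) -> List[str]:
    """Hypothetical expansion: one max(...) clause per term across the weighted fields."""
    clauses = []
    for term in terms:
        per_field = [f"{field}:{term}^{boost}" for field, boost in field_boosts.items()]
        clauses.append(f"max({', '.join(per_field)})")
    return clauses

if __name__ == "__main__":
    comp = composite_max({"title": 1.5, "body": 0.8}, "george", "washington")
    print(f"and({', '.join(comp)})")
```

Under this reading a document must contain both terms somewhere, but a title hit on either term counts for more than a body hit.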
  14. News Feed Use Case
      Order  Documents matching   Date
      1      markets + terms      Today
      2      markets              Today
      3      terms                Today
      4      companies            Today
      5      markets + terms      Yesterday
      6      markets              Yesterday
      7      terms                Yesterday
      8      companies            Yesterday
      9      markets, companies   Older
  15. News Feed Use Case – Step 1
      Segments:
        markets = split(solr.markets, "\\s*;\\s*")
        marketsQ = field("markets", or(markets));
      Terms:
        terms = solr.tokenize(query);
        termsQ = field("body", or(thesaurus.expand(0.9f, terms)))
      Companies:
        compIds = split(solr.compIds, "\\s*;\\s*")
        compIdsQ = field("companyIds", or(compIds))
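Step 1 turns semicolon-separated UI parameters into field-restricted or() queries. The separator regex `\s*;\s*` tolerates whitespace around each semicolon. Here is a small Python sketch of the same split; the sample parameter value and the choice to drop empty entries are assumptions, and the output is only a readable rendering of the QPL expression.

```python
import re
from typing import List

# Semicolon with optional whitespace on either side.
SEPARATOR = re.compile(r"\s*;\s*")

def split_ids(raw: str) -> List[str]:
    """Split a semicolon-separated UI parameter, dropping empty entries."""
    return [part for part in SEPARATOR.split(raw.strip()) if part]

def field_or(field: str, values: List[str]) -> str:
    """Render field(name, or(values)) as a readable expression string."""
    return f"field({field}, or({', '.join(values)}))"

if __name__ == "__main__":
    print(field_or("markets", split_ids(" equities ;bonds;  fx ")))
```

With the hypothetical input `" equities ;bonds;  fx "` this prints `field(markets, or(equities, bonds, fx))`.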
  16. News Feed Use Case – Step 2
        import java.text.SimpleDateFormat
        sdf = new SimpleDateFormat("yyyy-MM-dd")
        cal = Calendar.getInstance()
      Today:
        todayDate = sdf.format(cal.getTime())
        todayQ = field("date_s", todayDate)
      Yesterday:
        cal.add(Calendar.DAY_OF_MONTH, -1)
        yesterdayDate = sdf.format(cal.getTime())
        yesterdayQ = field("date_s", yesterdayDate)
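Step 2 depends on producing exact `yyyy-MM-dd` strings for today and yesterday, including across month and year boundaries, which `Calendar.add` handles. The equivalent in Python is a one-line `timedelta`. This sketch takes the reference date as a parameter so it can be tested; the function name is illustrative.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

def day_strings(today: Optional[date] = None) -> Tuple[str, str]:
    """Return (today, yesterday) as yyyy-MM-dd strings for exact-match date_s queries."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return today.isoformat(), yesterday.isoformat()

if __name__ == "__main__":
    print(day_strings(date(2013, 11, 1)))  # ('2013-11-01', '2013-10-31')
```

Because `date_s` is matched as an exact string, the formatting must match what was indexed. `isoformat()` produces the same zero-padded layout as the `yyyy-MM-dd` pattern.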
  17. News Feed Use Case – Step 3
      Weighted subject queries:
        sq1 = constant(4.0, and(marketsQ, termsQ))
        sq2 = constant(3.0, marketsQ)
        sq3 = constant(2.0, termsQ)
        sq4 = constant(1.0, compIdsQ)
        subjectQ = max(sq1, sq2, sq3, sq4)
      Weighted time queries:
        tq1 = constant(10.0, todayQ)
        tq2 = constant(1.0, yesterdayQ)
        timeQ = max(tq1, tq2)
      Put it all together:
        recentQ = and(subjectQ, timeQ)
        return max(recentQ, or(marketsQ, compIdsQ)^0.01)
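To check that the Step 3 expression really produces the ordering from the News Feed Use Case slide, here is a Python sketch that scores one document from the set of subject clauses it matches and its age. It uses the simplified model from the Lucene relevancy slide (and = product, or = sum, max = best clause). It assumes, hypothetically, that each raw marketsQ / compIdsQ clause scores 1.0 in the low-weight fallback.

```python
from typing import Optional, Set

def news_score(matches: Set[str], day: str, raw: float = 1.0) -> Optional[float]:
    """Score one document under the Step 3 expression; None means no match.

    matches: which of "markets", "terms", "companies" the document hits.
    day: "today", "yesterday", or anything else for older documents.
    raw: assumed underlying score of each marketsQ / compIdsQ clause.
    """
    tiers = []
    if {"markets", "terms"} <= matches:
        tiers.append(4.0)                              # sq1
    if "markets" in matches:
        tiers.append(3.0)                              # sq2
    if "terms" in matches:
        tiers.append(2.0)                              # sq3
    if "companies" in matches:
        tiers.append(1.0)                              # sq4
    subject = max(tiers, default=None)                 # subjectQ
    time = {"today": 10.0, "yesterday": 1.0}.get(day)  # timeQ

    # recentQ = and(subjectQ, timeQ), modelled as a product
    recent = subject * time if subject is not None and time is not None else None

    # Fallback: or(marketsQ, compIdsQ)^0.01, a sum of matching clauses times 0.01
    hits = [raw for key in ("markets", "companies") if key in matches]
    fallback = 0.01 * sum(hits) if hits else None

    candidates = [s for s in (recent, fallback) if s is not None]
    return max(candidates, default=None)
```

Scoring the nine rows of the use-case table in order gives 40, 30, 20, 10, 4, 3, 2, 1 and 0.02. This is a strictly decreasing sequence, so the 10x weight on "today" keeps every recent document above every older one.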
  18. BT RLP Tokenizer Use Case – Step 1
      Define the field type:
        <tokenizer class="com.basistech.rlp.solr.RLPTokenizerFactory"
                   rlpContext="<PATH>rlp-context-bl1.xml"
                   postAltLemmas="false"
                   lang="eng"
                   postPartOfSpeech="false"/>
      QPL expansion:
        finalExpandedQuery = transform(queryTerms, [
          TERM: { ctx ->
            def btCustomTokens = solr.tokenize("subject_bt", ctx.op.term)
            if (btCustomTokens.size() > 1)
              return or(term(btCustomTokens[0])^1.5, or(btCustomTokens[1..-1]));
            else
              return ctx.op;
          }
        ]);
  19. BT RLP Tokenizer Use Case – Step 2
      Original user query: following is "presentation on QPL"
      QPL parsed:
        and(and(term(following),term(is)),phrase(term(presentation),term(on),term(QPL)))
      BT expansion + QPL transformation:
        and(and(or(term(following)^1.5,term(follow)),or(term(is)^1.5,term(be))),phrase(term(presentation),term(on),term(QPL)))
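To make the transformation concrete, the following Python sketch applies the same rule to a toy query tree. A term whose analysis yields extra tokens becomes or(original^1.5, alternatives). Several parts are assumptions for illustration and are not QPL's `transform` implementation: the tuple encoding, the `lemmas` dictionary standing in for the `subject_bt` RLP analyzer, and leaving phrase nodes untouched. A single alternative is emitted directly rather than wrapped in a one-element or(), which matches the Step 2 output.

```python
from typing import Dict, List, Optional

# Hypothetical tree encoding:
#   ("term", word, boost) | ("phrase", [words]) | ("and" / "or", [children])

def term(word: str, boost: Optional[float] = None) -> tuple:
    return ("term", word, boost)

def render(node: tuple) -> str:
    """Print a tree in the same notation as the slide."""
    kind = node[0]
    if kind == "term":
        boost = node[2]
        return f"term({node[1]})" + (f"^{boost}" if boost is not None else "")
    if kind == "phrase":
        return "phrase(" + ",".join(f"term({w})" for w in node[1]) + ")"
    return f"{kind}(" + ",".join(render(child) for child in node[1]) + ")"

def transform(node: tuple, lemmas: Dict[str, List[str]]) -> tuple:
    """Recursively rewrite term nodes that have extra analyzer tokens."""
    kind = node[0]
    if kind == "term":
        tokens = [node[1]] + lemmas.get(node[1], [])
        if len(tokens) > 1:
            alternatives = [term(t) for t in tokens[1:]]
            rest = alternatives[0] if len(alternatives) == 1 else ("or", alternatives)
            return ("or", [term(tokens[0], 1.5), rest])
        return node
    if kind in ("and", "or"):
        return (kind, [transform(child, lemmas) for child in node[1]])
    return node  # assumption: phrases are left unchanged

if __name__ == "__main__":
    query = ("and", [("and", [term("following"), term("is")]),
                     ("phrase", ["presentation", "on", "QPL"])])
    print(render(transform(query, {"following": ["follow"], "is": ["be"]})))
```

The printed tree is identical to the "BT expansion + QPL transformation" line on the slide. The surface form keeps a 1.5 boost, so exact matches still outrank lemma matches.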
  20. BT RLP Tokenizer Use Case – Step 3 (query tree)
        and
          and
            or: following^1.5, follow
            or: is^1.5, be
          phrase: presentation on QPL
  21. Embedded Search Example #1
        qTerms = solr.tokenize(query);
      Execute an embedded search:
        results = solr.search('subjectsCore', or(qTerms), 50)
      Create a query from the results:
        subjectsQ = or(results*.subjectId)
      Put it all together:
        return field("title", and(qTerms)) | subjectsQ^0.9;
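The embedded-search pattern runs a first query against a second index, harvests a field from the hits, and feeds those values back into the main query. Here is a Python sketch of that flow. The in-memory `SUBJECTS_CORE` list and its contents are hypothetical, results are returned unranked in list order, and skipping the subjects clause when nothing is found is an added assumption.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the 'subjectsCore' Solr core.
SUBJECTS_CORE: List[Dict[str, str]] = [
    {"subjectId": "S17", "text": "american revolution george washington"},
    {"subjectId": "S42", "text": "mount vernon estate washington"},
    {"subjectId": "S99", "text": "french cooking"},
]

def embedded_search(core: List[Dict[str, str]], terms: List[str], rows: int) -> List[Dict[str, str]]:
    """Return up to `rows` documents containing any of the terms (an or-query)."""
    hits = [doc for doc in core if any(t in doc["text"].split() for t in terms)]
    return hits[:rows]

def build_query(terms: List[str], core: List[Dict[str, str]] = SUBJECTS_CORE) -> str:
    results = embedded_search(core, terms, 50)
    subject_ids = [doc["subjectId"] for doc in results]   # like results*.subjectId
    title_q = f"field(title, and({', '.join(terms)}))"
    if not subject_ids:
        return title_q
    return f"{title_q} | or({', '.join(subject_ids)})^0.9"

if __name__ == "__main__":
    print(build_query(["washington"]))
```

For the query `washington` this prints `field(title, and(washington)) | or(S17, S42)^0.9`.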
  22. Embedded Search Example #2
        qTerms = solr.tokenize(query);
      Execute an embedded search:
        results = solr.search('categories', and(qTerms), 10)
      Create a Solr named list:
        myList = solr.newList();
        myList.add("relatedCategories", results*.title);
      Add it to the XML response:
        solr.addResponse(myList)
  23. Other Features
      Embedded grouping queries (oh yes they did!)
      Proximity operators: ADJ, NEAR/#, BEFORE/#
      Reverse lemmatizer: prefers exact matches over variants
      Transformer: applies transformations recursively to query trees
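The proximity operators are listed without definitions. Here is one common interpretation, sketched in Python over token positions and offered as an assumption about QPL's exact semantics. NEAR/n matches when two words occur within n positions in either order, BEFORE/n additionally requires the first word to come first, and ADJ is treated as BEFORE/1.

```python
from typing import List

def positions(doc: List[str], word: str) -> List[int]:
    """All token positions of `word` in the document."""
    return [i for i, tok in enumerate(doc) if tok == word]

def near(doc: List[str], a: str, b: str, n: int) -> bool:
    """NEAR/n (assumed): a and b within n positions of each other, either order."""
    return any(0 < abs(i - j) <= n for i in positions(doc, a) for j in positions(doc, b))

def before(doc: List[str], a: str, b: str, n: int) -> bool:
    """BEFORE/n (assumed): a occurs before b, at most n positions earlier."""
    return any(0 < j - i <= n for i in positions(doc, a) for j in positions(doc, b))

def adj(doc: List[str], a: str, b: str) -> bool:
    """ADJ (assumed): b immediately follows a."""
    return before(doc, a, b, 1)

if __name__ == "__main__":
    doc = "george and martha washington lived at mount vernon".split()
    print(near(doc, "washington", "george", 5), before(doc, "washington", "george", 5))
```

Under this reading, the earlier example `(george or martha) near/5 washington` matches the sample sentence through either name.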
  24. Query Processing Language
      [Diagram: Application Dev Team: User Interface, data as entered by user; Search Team: Solr, QPL Engine, QPL Script, search Boolean query expression]
  25. Query Processing Language
      [Diagram: User Interface, QPL Engine, QPL Script, Search, Solr, with additional sources: RDBMS, Other Indexes, Thesaurus]
  26. More on QPL…
      http://www.searchtechnologies.com/query-parsing-language.html
  27. Thank You
      Contact: apamulapati@searchtechnologies.com
      www.searchtechnologies.com
