Advanced Relevancy Ranking
Lucene and Solr provide a number of options for query parsing, and these are valuable tools for building powerful search applications. This presentation, given at Lucene Revolution 2013, reviews the role that advanced query parsing can play in building search systems. Topics include relevancy customization; taking input from user-interface variables such as the user's position on a website or geographical indicators; choosing which sources are searched; and drawing on third-party data sources. Query parsing can also enhance data security. Best practices for building and maintaining complex query-parsing rules are discussed and illustrated. The presentation is given by Paul Nelson, Chief Architect at Search Technologies.
Search Technologies provides relevancy tuning services for Solr. For further information, see http://www.searchtechnologies.com/solr-lucene-relevancy.html
http://www.searchtechnologies.com


Transcript

  • 1. Advanced Relevancy Ranking: Paul Nelson, Chief Architect, Search Technologies
  • 2. Search Technologies Overview
      • Formed June 2005
      • Over 100 employees and growing
      • Over 400 customers worldwide
      • Presence in US, Latin America, UK & Germany
      • Deep enterprise search expertise
      • Consistent revenue growth and profitability
      • Search engine independent
  • 3. Lucene Relevancy: Simple Operators
      • term(A) → TF(A) * IDF(A)
          • Implemented with DefaultSimilarity / TermQuery
          • TF(A) = sqrt(termInDocCount)
          • IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
      • and(A, B) → A * B
          • Implemented with BooleanQuery()
      • or(A, B) → A + B
          • Implemented with BooleanQuery()
      • max(A, B) → max(A, B)
          • Implemented with DisjunctionMaxQuery()
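The two factors behind term(A) can be sketched in a few lines of Python. This is a minimal sketch of the slide's simplified term score only: it leaves out the other parts of Lucene's full classic scoring formula (such as query and field-length normalization), and the function names are illustrative, not Lucene's API. Like DefaultSimilarity, it uses the natural logarithm.

```python
import math


def tf(term_in_doc_count: int) -> float:
    """Term-frequency factor: square root of the raw count in the document."""
    return math.sqrt(term_in_doc_count)


def idf(total_docs_in_collection: int, docs_with_term_count: int) -> float:
    """Inverse document frequency with the +1 smoothing shown on the slide."""
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0


def term_score(term_in_doc_count: int, total_docs: int, docs_with_term: int) -> float:
    """term(A) -> TF(A) * IDF(A), ignoring norms, boosts, and coord."""
    return tf(term_in_doc_count) * idf(total_docs, docs_with_term)


# A term occurring 4 times in a document and in 9 of 100 documents:
# tf = 2.0, idf = ln(100 / 10) + 1 ≈ 3.3026, score ≈ 6.6052
print(round(term_score(4, 100, 9), 4))
```

The square root damps repeated occurrences, so a term that appears four times counts only twice as much as one that appears once.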
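  • 4. Simple Operators: Example
      • Query tree: and( or(george, martha), max(washington, custis) )
      • Term scores: george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90
      • or: 0.10 + 0.20 = 0.30
      • max: max(0.60, 0.90) = 0.90
      • and: 0.30 * 0.90 = 0.27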
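The worked example can be checked mechanically. This minimal Python sketch follows the slide's arithmetic, in which and() multiplies clause scores. Lucene's own BooleanQuery adds the scores of required clauses instead, so treat this as a model of the slide rather than of Lucene internals. The helper names are hypothetical.

```python
from typing import Dict


def and_(*scores: float) -> float:
    """and(A, B) -> A * B, per the slide's simplified model."""
    result = 1.0
    for s in scores:
        result *= s
    return result


def or_(*scores: float) -> float:
    """or(A, B) -> A + B."""
    return sum(scores)


def max_(*scores: float) -> float:
    """max(A, B) -> max(A, B), like DisjunctionMaxQuery with no tie-breaker."""
    return max(scores)


# Term scores taken from the slide's example.
term: Dict[str, float] = {"george": 0.10, "martha": 0.20, "washington": 0.60, "custis": 0.90}

score = and_(or_(term["george"], term["martha"]), max_(term["washington"], term["custis"]))
print(round(score, 2))  # 0.27
```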
  • 5. Less Used Operators
      • boost(f, A) → A * f
          • Implemented with Query.setBoost(f)
      • constant(f, A) → if (A) then f else 0.0
          • Implemented with ConstantScoreQuery()
      • boostPlus(A, B) → if (A) then (A + B) else 0.0
          • Implemented with BooleanQuery()
      • boostMul(f, A, B) → if (B) then (A * f) else A
          • Implemented with BoostingQuery()
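These four definitions can be expressed as small score functions. In this minimal Python sketch, `None` stands for "the query does not match this document" (the slide's "else 0.0"). The function names are hypothetical and model only the arithmetic on the slide, not the Lucene classes named there.

```python
from typing import Optional

# None means "this query does not match the document".
Score = Optional[float]


def boost(f: float, a: Score) -> Score:
    """boost(f, A) -> A * f."""
    return None if a is None else a * f


def constant(f: float, a: Score) -> Score:
    """constant(f, A) -> f whenever A matches, regardless of A's own score."""
    return None if a is None else f


def boost_plus(a: Score, b: Score) -> Score:
    """boostPlus(A, B) -> A + B when A matches; B alone never makes a match."""
    if a is None:
        return None
    return a + (b if b is not None else 0.0)


def boost_mul(f: float, a: Score, b: Score) -> Score:
    """boostMul(f, A, B) -> A * f when B also matches, otherwise plain A."""
    if a is None:
        return None
    return a * f if b is not None else a


print(boost_plus(0.5, 0.25), boost_mul(0.5, 0.8, 1.0), constant(0.5, 0.123))
```

boostPlus and boostMul share one design idea: the optional query B can only reorder documents that A already matched. It never adds new documents to the result set.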
  • 6. Problem: Need for More Flexibility
      • Difficult / impossible to use all operators
          • Many not available in standard query parsers
      • Complex expressions = string manipulation
          • This is messy
      • Query construction is in the application layer
          • Your UI programmer is creating query expressions?
          • Seriously?
      • Hard to create and use new operators
          • Requires modifying query parsers - yuck
  • 7. Query Processing Language
      [Diagram components: User Interface, QPL Engine, QPL Script, Search (Solr)]
  • 8. Introducing: QPL
      • Query Processing Language
          • Domain-specific language for constructing queries
          • Built on Groovy
          • https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
      • Solr plug-ins
          • Query parser
          • Search component
      • “The 4GL for Text Search Query Expressions”
      • Server-side Solr access
          • Cores, analyzers, embedded search, results XML
  • 9. Solr Plug-Ins
  • 10. QPL Configuration – solrconfig.xml
      Query Parser Configuration:
          <queryParser name="qpl"
                       class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
            <str name="scriptFile">parser.qpl</str>
            <str name="defaultField">text</str>
          </queryParser>
      Search Component Configuration:
          <searchComponent name="qplSearchFirst"
                           class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
            <str name="scriptFile">search.qpl</str>
            <str name="defaultField">text</str>
            <str name="isProcessScript">false</str>
          </searchComponent>
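Once a parser is registered under the name `qpl`, standard Solr request parameters can select it. One option is `defType=qpl` for the main query. Another is the local-params prefix `{!qpl}` inside `q`. The following minimal Python sketch only builds such request URLs. The host, port, and core name (`collection1`) are assumptions, not part of the original configuration.

```python
from urllib.parse import urlencode

# Assumed Solr location and core name; adjust for a real installation.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def qpl_url_deftype(user_query: str) -> str:
    """Select the 'qpl' query parser for the main query via defType."""
    return SOLR_SELECT + "?" + urlencode({"q": user_query, "defType": "qpl", "wt": "json"})


def qpl_url_local_params(user_query: str) -> str:
    """Select the 'qpl' query parser inline with Solr local-params syntax."""
    return SOLR_SELECT + "?" + urlencode({"q": "{!qpl}" + user_query, "wt": "json"})


print(qpl_url_deftype("george washington"))
print(qpl_url_local_params("george washington"))
```

Either way, the user interface sends the raw text as entered. The QPL script on the server decides how that text becomes a query.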
  • 11. QPL Example #1
      Tokenize:
          myTerms = solr.tokenize(query);
      Phrase Query:
          phraseQ = phrase(myTerms);
      And Query:
          andQ = and(myTerms);
      Or Query:
          orQ = (myTerms.size() <= 2) ? null : orMin((myTerms.size() + 1) / 2, myTerms);
      Put It All Together:
          return phraseQ^3.0 | andQ^2.0 | orQ;
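The same construction logic can be traced outside Solr. This minimal Python sketch emits the resulting expression as text in the slide's notation. The whitespace `tokenize` stands in for `solr.tokenize()`, which uses the field's real analyzer. The `(n + 1) // 2` threshold assumes integer division, which gives a majority of the terms rounded up.

```python
from typing import List, Optional


def tokenize(query: str) -> List[str]:
    """Stand-in for solr.tokenize(): lowercase, split on whitespace (assumption)."""
    return query.lower().split()


def build_query(query: str) -> str:
    terms = tokenize(query)
    args = ", ".join(terms)
    phrase_q = f"phrase({args})^3.0"  # exact phrase ranks highest
    and_q = f"and({args})^2.0"        # then documents containing all terms
    # Partial matches only for 3+ terms: at least half the terms, rounded up.
    or_q: Optional[str] = None
    if len(terms) > 2:
        or_q = f"orMin({(len(terms) + 1) // 2}, {args})"
    parts = [phrase_q, and_q] + ([or_q] if or_q is not None else [])
    return " | ".join(parts)


print(build_query("George Martha Washington"))
# phrase(george, martha, washington)^3.0 | and(george, martha, washington)^2.0 | orMin(2, george, martha, washington)
```

The boosts create three relevancy tiers: phrase matches first, then documents containing all terms, then partial matches.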
  • 12. Thesaurus Example #2
      Tokenize:
          myTerms = solr.tokenize(query);
      Load Thesaurus (cached):
          thes = Thesaurus.load("thesaurus.xml")
      Thesaurus Expansion:
          thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);
      Put It All Together:
          return and(thesQ);
      Original query: bathroom humor → [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
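The expansion step can be reproduced with a plain dictionary. This minimal Python sketch turns each term into an or() of the original term plus its down-weighted synonyms. The dictionary contents are hypothetical and simply reproduce the slide's example. Unlike the slide's `thes.expand`, the sketch skips analyzing synonyms with a field tokenizer.

```python
from typing import Dict, List

# Hypothetical thesaurus content, chosen to reproduce the slide's example.
THESAURUS: Dict[str, List[str]] = {
    "bathroom": ["loo", "wc"],
    "humor": ["jokes"],
}


def expand(weight: float, terms: List[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """Turn each term into or(term, synonym^weight, ...); unknown terms stay bare."""
    clauses = []
    for t in terms:
        synonyms = thesaurus.get(t, [])
        if synonyms:
            alts = ", ".join(f"{s}^{weight}" for s in synonyms)
            clauses.append(f"or({t}, {alts})")
        else:
            clauses.append(t)
    return clauses


def and_of(clauses: List[str]) -> str:
    return "and(" + ", ".join(clauses) + ")"


print(and_of(expand(0.8, "bathroom humor".split(), THESAURUS)))
# and(or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8))
```

Because the original term keeps an implicit weight of 1.0, exact matches still outrank synonym matches.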
  • 13. More Operators
      Boolean Query Parser:
          pQ = parseQuery("(george or martha) near/5 washington")
      Relevancy Ranking Operators:
          q1 = boostPlus(query, optionalQ)
          q2 = boostMul(0.5, query, optionalQ)
          q3 = constant(0.5, query)
      Composite Queries:
          compQ = and(compositeMax(["title":1.5, "body":0.8], "george", "washington"))
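The slide does not define compositeMax. One plausible reading is that each term gets the best weighted score across the listed fields, as with a DisjunctionMaxQuery per term, and that and() then requires every term. The following minimal Python sketch is built on that assumption. The per-field term scores are hypothetical inputs, and and() uses the slide's product model.

```python
from typing import Dict, List

FieldWeights = Dict[str, float]
# Per-field term scores for one document: field -> term -> score (hypothetical).
DocScores = Dict[str, Dict[str, float]]


def composite_max(weights: FieldWeights, terms: List[str], doc: DocScores) -> List[float]:
    """Assumed semantics: per term, the best weighted score across fields (0.0 if none)."""
    return [
        max(w * doc.get(field, {}).get(t, 0.0) for field, w in weights.items())
        for t in terms
    ]


def and_score(scores: List[float]) -> float:
    """and() in the slide's product model: zero if any clause does not match."""
    product = 1.0
    for s in scores:
        product *= s
    return product


doc = {"title": {"washington": 1.0}, "body": {"george": 0.5, "washington": 0.5}}
per_term = composite_max({"title": 1.5, "body": 0.8}, ["george", "washington"], doc)
print(per_term, and_score(per_term))  # george ≈ 0.4 (body), washington = 1.5 (title)
```

Under this reading, each term may match in a different field, but a title match counts for more than a body match.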
  • 14. News Feed Use Case
      Order  Documents            Date
      1      markets + terms      Today
      2      markets              Today
      3      terms                Today
      4      companies            Today
      5      markets + terms      Yesterday
      6      markets              Yesterday
      7      terms                Yesterday
      8      companies            Yesterday
      9      markets, companies   Older
  • 15. News Feed Use Case – Step 1
      Segments:
          markets = split(solr.markets, "\\s*;\\s*")
          marketsQ = field("markets", or(markets));
      Terms:
          terms = solr.tokenize(query);
          termsQ = field("body", or(thesaurus.expand(0.9f, terms)))
      Companies:
          compIds = split(solr.compIds, "\\s*;\\s*")
          compIdsQ = field("companyIds", or(compIds))
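Step 1 turns semicolon-separated request parameters into field-restricted or() queries. This minimal Python sketch does the same and renders the result in standard Lucene query syntax. It assumes the values need no escaping of Lucene special characters. The helper names are hypothetical.

```python
import re
from typing import List

# A semicolon with any surrounding whitespace, like "\\s*;\\s*" in the Groovy script.
SEPARATOR = re.compile(r"\s*;\s*")


def split_param(value: str) -> List[str]:
    """Split a semicolon-separated request parameter, dropping empty entries."""
    return [v for v in SEPARATOR.split(value.strip()) if v]


def field_or(field: str, values: List[str]) -> str:
    """Render field(name, or(values)) in Lucene syntax (values assumed safe)."""
    return f"{field}:(" + " OR ".join(values) + ")"


markets = split_param("energy ; metals;agriculture ")
print(field_or("markets", markets))  # markets:(energy OR metals OR agriculture)
```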
  • 16. News Feed Use Case – Step 2
          sdf = new SimpleDateFormat("yyyy-MM-dd")
          cal = Calendar.getInstance()
      Today:
          todayDate = sdf.format(cal.getTime())
          todayQ = field("date_s", todayDate)
      Yesterday:
          cal.add(Calendar.DAY_OF_MONTH, -1)
          yesterdayDate = sdf.format(cal.getTime())
          yesterdayQ = field("date_s", yesterdayDate)
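The date step relies on `date_s` being a string field holding `yyyy-MM-dd` values, so "today" and "yesterday" become exact-match term queries. This minimal Python sketch computes the same two strings with the standard library. Like `Calendar.getInstance()`, it uses the local date by default. The function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def day_strings(today: Optional[date] = None) -> Tuple[str, str]:
    """Return (today, yesterday) formatted like SimpleDateFormat("yyyy-MM-dd")."""
    if today is None:
        today = date.today()
    yesterday = today - timedelta(days=1)
    return today.strftime("%Y-%m-%d"), yesterday.strftime("%Y-%m-%d")


today_s, yesterday_s = day_strings(date(2013, 3, 1))
print(f'date_s:"{today_s}"', f'date_s:"{yesterday_s}"')
# date_s:"2013-03-01" date_s:"2013-02-28"
```

Subtracting a `timedelta` handles month and year boundaries the same way `Calendar.add(Calendar.DAY_OF_MONTH, -1)` does.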
  • 17. News Feed Use Case (repeats the ordering table from slide 14)
  • 18. News Feed Use Case – Step 3
      Weighted Subject Queries:
          sq1 = constant(4.0, and(marketsQ, termsQ))
          sq2 = constant(3.0, marketsQ)
          sq3 = constant(2.0, termsQ)
          sq4 = constant(1.0, compIdsQ)
          subjectQ = max(sq1, sq2, sq3, sq4)
      Weighted Time Queries:
          tq1 = constant(10.0, todayQ)
          tq2 = constant(1.0, yesterdayQ)
          timeQ = max(tq1, tq2)
      Put It All Together:
          recentQ = and(subjectQ, timeQ)
          return max(recentQ, or(marketsQ, compIdsQ)^0.01)
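The constant scores in step 3 are what produce the ordering table. This minimal Python sketch evaluates the formula for one document per table row, using the slide's product model for and(). It assumes each matching marketsQ or compIdsQ clause scores 1.0 in the low-weight fallback, a hypothetical value because real TF-IDF scores vary. The document flags are hypothetical too.

```python
from typing import List, NamedTuple


class Doc(NamedTuple):
    name: str
    markets: bool    # matches marketsQ
    terms: bool      # matches termsQ
    companies: bool  # matches compIdsQ
    age_days: int    # 0 = today, 1 = yesterday, 2+ = older


def score(d: Doc) -> float:
    # Weighted subject queries: constant scores, the best one wins (max).
    subject = max(
        4.0 if d.markets and d.terms else 0.0,
        3.0 if d.markets else 0.0,
        2.0 if d.terms else 0.0,
        1.0 if d.companies else 0.0,
    )
    # Weighted time queries.
    time_score = 10.0 if d.age_days == 0 else (1.0 if d.age_days == 1 else 0.0)
    # recentQ = and(subjectQ, timeQ): product model, 0 if either side is missing.
    recent = subject * time_score
    # Fallback or(marketsQ, compIdsQ)^0.01, assuming each matching clause scores 1.0.
    fallback = 0.01 * (float(d.markets) + float(d.companies))
    return max(recent, fallback)


DOCS: List[Doc] = [
    Doc("9 markets, companies, older", True, False, True, 5),
    Doc("4 companies, today", False, False, True, 0),
    Doc("7 terms, yesterday", False, True, False, 1),
    Doc("1 markets + terms, today", True, True, False, 0),
    Doc("6 markets, yesterday", True, False, False, 1),
    Doc("3 terms, today", False, True, False, 0),
    Doc("8 companies, yesterday", False, False, True, 1),
    Doc("2 markets, today", True, False, False, 0),
    Doc("5 markets + terms, yesterday", True, True, False, 1),
]

for d in sorted(DOCS, key=score, reverse=True):
    print(f"{score(d):6.2f}  {d.name}")
```

The time weights (10 vs. 1) are large enough that every "today" document outranks every "yesterday" document. The subject weights (4 to 1) then order documents within each day.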
  • 19. Embedded Search Example #1
          qTerms = solr.tokenize(qTerms);
      Execute an Embedded Search:
          results = solr.search(subjectsCore, or(qTerms), 50)
      Create a query from the results:
          subjectsQ = or(results*.subjectId)
      Put it all together:
          return field("title", and(qTerms)) | subjectsQ^0.9;
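The embedded-search pattern runs in two stages: search a side index first, then turn fields of the hits into clauses of the main query. This minimal Python sketch uses a small in-memory list as a hypothetical "subjects" core. It ranks records by how many query terms they contain and emits the final expression in the slide's notation. It omits the subjects clause when the side search finds nothing.

```python
from typing import Dict, List

# Hypothetical "subjects" core: each record links descriptive text to a subject ID.
SUBJECTS_CORE: List[Dict[str, str]] = [
    {"subjectId": "S1", "text": "american revolution george washington"},
    {"subjectId": "S2", "text": "first ladies martha washington"},
    {"subjectId": "S3", "text": "civil war"},
]


def embedded_search(core: List[Dict[str, str]], terms: List[str], rows: int) -> List[Dict[str, str]]:
    """or(terms) over an in-memory core, ranked by the number of matching terms."""
    scored = []
    for rec in core:
        words = set(rec["text"].split())
        hits = sum(1 for t in terms if t in words)
        if hits:
            scored.append((hits, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:rows]]


def build_query(terms: List[str]) -> str:
    results = embedded_search(SUBJECTS_CORE, terms, 50)
    subject_ids = [r["subjectId"] for r in results]  # like results*.subjectId in Groovy
    title_q = 'field("title", and(' + ", ".join(terms) + "))"
    if not subject_ids:
        return title_q
    return f"{title_q} | or({', '.join(subject_ids)})^0.9"


print(build_query(["george", "washington"]))
# field("title", and(george, washington)) | or(S1, S2)^0.9
```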
  • 20. Embedded Search Example #2
          qTerms = solr.tokenize(qTerms);
      Execute an Embedded Search:
          results = solr.search(categories, and(qTerms), 10)
      Create a Solr named list:
          myList = solr.newList();
          myList.add("relatedCategories", results*.title);
      Add it to the XML response:
          solr.addResponse(myList)
  • 21. Other Features
      • Embedded grouping queries
          • Oh yes they did!
      • Proximity operators
          • ADJ, NEAR/#, BEFORE/#
      • Reverse lemmatizer
          • Prefers exact matches over variants
      • Transformer
          • Applies transformations recursively to query trees
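The proximity operators come down to comparisons of token positions. The slide gives only the operator names, so this minimal Python sketch assumes one set of semantics. ADJ means B immediately follows A. NEAR/n means A and B lie within n positions in either order. BEFORE/n means A precedes B by at most n positions. The function names are hypothetical.

```python
from typing import Dict, List


def positions(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each token to the positions where it occurs."""
    pos: Dict[str, List[int]] = {}
    for i, tok in enumerate(tokens):
        pos.setdefault(tok, []).append(i)
    return pos


def near(tokens: List[str], a: str, b: str, n: int) -> bool:
    """Assumed NEAR/n: some a and some b at most n positions apart, either order."""
    pos = positions(tokens)
    return any(abs(i - j) <= n for i in pos.get(a, []) for j in pos.get(b, []) if i != j)


def before(tokens: List[str], a: str, b: str, n: int) -> bool:
    """Assumed BEFORE/n: some b occurs 1..n positions after some a."""
    pos = positions(tokens)
    return any(0 < j - i <= n for i in pos.get(a, []) for j in pos.get(b, []))


def adj(tokens: List[str], a: str, b: str) -> bool:
    """Assumed ADJ: b immediately follows a."""
    return before(tokens, a, b, 1)


doc = "martha dandridge custis washington".split()
print(near(doc, "washington", "martha", 5), before(doc, "washington", "martha", 5))
# True False
```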
  • 22. Query Processing Language
      [Diagram components: User Interface, QPL Engine, QPL Script, Search (Solr); labels: "Data as entered by user", "Boolean Query Expression", "Application Dev Team", "Search Team"]
  • 23. QPL: Using External Sources to Build Queries
      [Diagram components: User Interface, QPL Engine, QPL Script, Search (Solr), RDBMS, Other Indexes, Thesaurus]
  • 24. Contact: Paul Nelson, pnelson@searchtechnologies.com
