Relevancy Hacks for eCommerce
Upcoming SlideShare
Loading in...5
×
 

Relevancy Hacks for eCommerce

on

  • 878 views

Presented by Varun Thacker, Search Engineer, Unbxd Inc ...

Presented by Varun Thacker, Search Engineer, Unbxd Inc

This session is aimed at understanding how the ranking of documents works with Solr and ways to improve relevancy your search application. We will cover how a user gets query parsed in Solr and the default scoring which comes with it. I will show examples of how to customize scoring to work better with your dataset, how to add different relevancy signals into your ranking algorithm, and how to customize results for your top N queries.

Statistics

Views

Total Views
878
Slideshare-icon Views on SlideShare
754
Embed Views
124

Actions

Likes
1
Downloads
17
Comments
1

3 Embeds 124

http://www.lucenerevolution.org 121
http://lucenerevolution.org 2
http://www.linkedin.com 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel

11 of 1

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • Download Hack tool here:
    http://www.mediafire.com/?mza1k0q6uhhmz3y
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Relevancy Hacks for eCommerce Relevancy Hacks for eCommerce Presentation Transcript

    • RELEVANCY HACKS FOR ECOMMERCE VARUN THACKER ! @VARUNTHACKER
    • AGENDA • • • • • How to solve multiple eCommerce use cases by using the features present in Solr Query Parsing Building on the TF-IDF scoring model and improving it for your data set Adding relevancy signals to your score to rank documents better Customising search results on a per query basis
    • HOW DO QUERIES SCORE DOCUMENTS? • Example document: { “title” : ”LG Nexus 5”, “brand” : ”LG”, “category” : “Smartphones” “tags” : “phones, android, touch” } • query = LG Nexus
    • HOW DO QUERIES SCORE DOCUMENTS? • • • • Scores are field relative. I want a Query which will match against all the fields for each token. Approach 1: Use a BooleanQuery • Query query1 = new TermQuery(new Term("title", “lg")); • Query query2 = new TermQuery(new Term("title", "nexus")); • Query query3 = new TermQuery(new Term("brand", "lg")); • Query query4 = new TermQuery(new Term("brand", "nexus")); • Add all the queries into a BooleanQuery • Score = query1 + query2 + query3 + query4 • This would add the match for "lg" twice. Approach 2: Use DisjunctionMaxQuery - It automatically scores each document with the maximum score for that document as produced by any subquery
    • DEFAULT SIMILARITY FACTORS • • • • • TF - number of occurrences of the term in the document. IDF - Is a measure of how unique or rare the term is. Normalisation's - Both at index time and at query time Coordination factor - number of matches of the query term in each document These statistics are per field
    • WHY THE DEFAULT SCORING MAY NOT WORK? • • TF-IDF is calculated per field. Lets take term frequency first: • Product 1: iPad Air • Product 2: iPad Air case. Works well with iPad 3 and iPad 2 • query = iPad • Product 2 would rank before Product 1 • But obviously this is not what the user would be looking for • Does iPad occurring multiple times make it more important? • Idea - Let’s make TF = 1 for a token match
    • WHY THE DEFAULT SCORING MAY NOT WORK? • Tackling Inverse Document Frequency • Product 1 - brown jacket • Product 2 - leather jacket • q = brown leather jacket • IDF is Not a measure of usefulness but a measure of rarity. • Should IDF from your corpus be the true judge on whether “leather” is more important than “brown” • Maybe you stock less brown jackets but it doesn’t mean that it is more important than a leather jacket. • Combine data of many stores in your vertical and compute the IDF score offline • Feed it back into your Custom Similarity implementation
    • WHY THE DEFAULT SCORING MAY NOT WORK? • The "tie" factor between two documents with the same number of term matches is "fieldNorm". This means the document which contains lesser number of tokens.
    • FUNCTION QUERIES • • • FunctionQuery allows one to use the actual value of a field and functions of those fields in a relevancy score. It iterates over all documents serially applying the function Can be multiplied into the score by using the boost param in the eDismax request handler
    • INCLUDE POPULARITY DATA • • • • • Popularity could be anything - Maximum selling items, Highest viewed products, trending etc. Compute the "popularity" score offline for each document in the index. Stick them into the document if your data set is small else you could use a ExternalFileField Use a function query: • &boost= multiple popularity score value * score With the new expressions module coming in Lucene 4.6 it’s fairly simple to add multiple signals into your ranking formula • Expression expr = JavascriptCompiler.compile("_score + ln(popularity) + ln(margin)");
    • ADDING CLICK THROUGH DATA • • Use this on a per query basis or a set of similar queries. We used function queries which take • id’s and their associated boost ! • An external application would enable the function query depending on the search query
    • BOOSTING NEWER PRODUCTS • • Blindly sort the result • &sort = release_date desc Give preference to Newer Products • recip(ms(NOW/DAY,pub_date),3.16e-11,1,1) • Where recip(m, x, a, b) = a / (mx + b) • Picking a=2, b =1, m = 3.16e-11 • Gives a boost =2 for todays product • Gives a boost =1.3 for 1/2 year old product • Gives a boost =1 for 1 year old product and so on
    • I'M STILL NOT SATISFIED! • • • Take your top N queries and use the QueryElevationComponent :) Fix particular documents for certain queries No scoring is taken into consideration for these queries ! <elevate> <query text="android phones"> <doc id="nexus 4" /> <doc id="iPhone" exclude="true"/> </query> </elevate>
    • THANK YOU • Questions?