Using the search engine as recommendation engine


Describes a simple approach to using the search engine to drive recommendations.

Published in: Technology

  1. Recommendations from the search engine
     Sesam Hackathon, Warsaw, 2014-03-23
     Lars Marius Garshol
  2. Inspiration
     • This whole presentation is about Ted Dunning’s proposed approach to recommendations
       - based on his 1993 paper (references at the end)
     • Very simple method, dead easy to implement
       - seems to work pretty well
  3. Thoughts on recommendations
     • Recommenders are usually designed as prediction of ratings
       - Dunning believes this is the wrong approach
       - people’s ratings don’t necessarily reflect what they’ll buy
       - go by what people do rather than what they say
     • You don’t want to recommend Bob Dylan
       - everyone has already heard of him, and knows what they think
       - you want to recommend things that are new to the user
     • You don’t want to recommend things everyone likes
  4. The actual approach
     • Step 1: work out which things tend to occur together
       - that is, if you buy this, you’re likely to also buy that
       - however, we only want pairs which are statistically significant
     • Step 2: index up the significant pairs in a search engine
       - use search to produce the actual results
  5. Part the first: Statistically significant co-occurrence
  6. The starting point
     • Some kind of log of user actions
     • User has
       - bought a movie | album | book | ...
       - opened a document
       - ...
     • From this raw material, we can work out what things tend to go together
       - and whether this is significant

     User  Item
     u1    i1
     u1    i2
     u2    i1
     u3    i2
     u3    i3
     u3    i4
     ...   ...
  8. Item-to-item matrix

          i1   i2   i3   i4   i5   i6   i7
     i1    -   23   42    0    0    5    7
     i2   23    -    6    1  129    2   10
     i3   42    6    -    3    0  492    1
     i4    0    1    3    -    2    3    1
     i5    0  129    0    2    -   94    2
     i6    5    2  492    3   94    -    1
     i7    7   10    1    1    2    1    -
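A matrix like this can be built from the user/item log by counting, for each pair of items, how many users touched both. The following is a minimal sketch of that counting step, not the code from the talk; the function name `cooccurrence_counts` is made up for illustration, and the log is the toy data from the "starting point" slide.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(log):
    """Count, for every ordered pair of distinct items, how many users touched both."""
    # Group the raw (user, item) log into one set of items per user.
    items_by_user = defaultdict(set)
    for user, item in log:
        items_by_user[user].add(item)

    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # store both directions so the matrix is symmetric
    return dict(counts)


# Toy log from the "starting point" slide.
log = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"),
       ("u3", "i2"), ("u3", "i3"), ("u3", "i4")]
counts = cooccurrence_counts(log)
```

Pairs that never occur together are simply absent from the dictionary, which keeps the structure sparse; a dense matrix over a real catalogue would be mostly zeros.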
  9. Producing the k 2x2 matrix
     How to compute the k matrix for a given cell in the matrix on the previous slide:
     • k[0][0] = the number in that cell
     • k[0][1] = the sum of that cell’s whole column minus k[0][0]
     • k[1][0] = the sum of that cell’s whole row minus k[0][0]
     • k[1][1] = the sum of the entire matrix minus k[0][0] minus k[1][0] minus k[0][1]
     If the output of LLR(k) is above some threshold, the pair is considered significant.
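The recipe above can be sketched in a few lines of Python. This is an illustration rather than the talk's own script: `k_matrix` follows the four rules on the slide, and `llr` uses the entropy form of the log-likelihood ratio (G²) statistic, which is one common way to write it. The names `x_log_x`, `entropy`, `k_matrix` and `MATRIX` are chosen here for illustration, and the empty diagonal of the item-to-item matrix is assumed to be zero.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalised entropy: N*log(N) - sum(c*log(c)), with N = sum(counts).
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table k."""
    row = entropy(k[0][0] + k[0][1], k[1][0] + k[1][1])
    col = entropy(k[0][0] + k[1][0], k[0][1] + k[1][1])
    mat = entropy(k[0][0], k[0][1], k[1][0], k[1][1])
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def k_matrix(matrix, r, c):
    """Build the 2x2 table for cell (r, c) as described on the slide."""
    k00 = matrix[r][c]
    k01 = sum(row[c] for row in matrix) - k00  # rest of the column
    k10 = sum(matrix[r]) - k00                 # rest of the row
    total = sum(sum(row) for row in matrix)
    k11 = total - k00 - k10 - k01
    return [[k00, k01], [k10, k11]]


# Item-to-item matrix from the previous slide (diagonal assumed to be 0).
MATRIX = [
    [0, 23, 42, 0, 0, 5, 7],
    [23, 0, 6, 1, 129, 2, 10],
    [42, 6, 0, 3, 0, 492, 1],
    [0, 1, 3, 0, 2, 3, 1],
    [0, 129, 0, 2, 0, 94, 2],
    [5, 2, 492, 3, 94, 0, 1],
    [7, 10, 1, 1, 2, 1, 0],
]

k = k_matrix(MATRIX, 2, 5)  # the (i3, i6) cell, which holds 492
score = llr(k)
```

A table whose rows are proportional, such as `[[10, 10], [10, 10]]`, scores zero, while a strongly associated pair scores high. The threshold for "significant" is a tuning parameter the slides leave open.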
  10. Doing it for real
      • Check the Python code on
        - snippets/tree/master/machine-learning/llr
        - this requires a lot of memory and CPU
      • Or just use Mahout
        - RowSimilarityJob does exactly this
  11. Part the second: Search engine as recommender
  12. Indexing with the search engine
      • Take all the items and index them up with the search engine in the usual way
        - that is, each item has an id, a title, a description, etc
      • Then, add a “magic” field
        - put into it the IDs of all the items that appear in a significant pair with this item
        - let’s call this field “indicators”
      • Now we’re ready to do recommendations
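To make the "indicators" field concrete, here is a small sketch that builds the documents as plain Python dictionaries before they would be handed to a search engine. It is an assumption-laden illustration: the function `build_documents`, the IDs `m1` to `m3`, the third title and the list of significant pairs are all made up, and each unordered pair is assumed to be listed once.

```python
def build_documents(items, significant_pairs):
    """Attach an 'indicators' field to every item document."""
    docs = {}
    for item_id, fields in items.items():
        doc = dict(fields)       # copy the ordinary fields (title, description, ...)
        doc["id"] = item_id
        doc["indicators"] = []   # the "magic" field
        docs[item_id] = doc

    # Each significant pair makes the two items indicators of each other.
    for a, b in significant_pairs:
        docs[a]["indicators"].append(b)
        docs[b]["indicators"].append(a)
    return docs


# Hypothetical toy catalogue and hypothetical significant pairs.
items = {
    "m1": {"title": "The Godfather"},
    "m2": {"title": "The Daytrippers"},
    "m3": {"title": "Some Third Movie"},
}
significant_pairs = [("m1", "m2"), ("m1", "m3")]
docs = build_documents(items, significant_pairs)
```

In a real index the indicators would typically be stored as a multi-valued or whitespace-separated field of IDs, so that each ID is matched as a single token.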
  13. Doing recommendations
      • Collect some set of items for which the user has expressed a preference
        - by buying them, looking at them, rating them, whatever
      • The IDs of these items are your query
        - search the “indicators” field
        - the search results are your recommendations
      • That’s it!
        - pack up, go home
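The query step can be imitated without a search engine to show what is going on. The sketch below scores each document by how many of the liked IDs appear in its indicators field; a real engine would weight the matches rather than just count them. The function `recommend` and the toy documents are hypothetical, and skipping items the user already likes is an assumption of this sketch, not something the slides specify.

```python
def recommend(docs, liked, limit=10):
    """Rank documents by how many liked item IDs match their indicators."""
    liked = set(liked)
    scored = []
    for doc_id, doc in docs.items():
        if doc_id in liked:
            continue  # assumption: don't recommend what the user already likes
        hits = len(liked.intersection(doc["indicators"]))
        if hits:
            scored.append((hits, doc_id))
    # Most matches first; ties broken by ID to keep the output deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


# Hypothetical documents carrying only the indicators field.
docs = {
    "m1": {"indicators": ["m2", "m3"]},
    "m2": {"indicators": ["m1", "m3", "m4"]},
    "m3": {"indicators": ["m1", "m2"]},
    "m4": {"indicators": ["m2"]},
}
recommendations = recommend(docs, liked=["m1", "m2"])
```

Here `m3` ranks first because both liked items are among its indicators, while `m4` matches only one.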
  14. Why does it work?
      • Imagine that you’re searching for movies, and you type “the godfather”
        - “the” appears in all documents, so documents matching that get a low relevance score
        - “godfather” appears in very few documents, so matches on that get a high score
        - this is basically TF/IDF in a nutshell
      • Now, imagine you liked two movies: “The Godfather” and “The Daytrippers”
        - nearly all movies have “The Godfather” as an indicator
        - very few have “The Daytrippers”
        - the second will therefore influence recommendations much more
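The effect can be shown with a textbook inverse document frequency, idf = log(N / df), applied to indicator IDs instead of words. This is a simplified sketch in Python 3: the actual scoring formula of Lucene or any other engine differs in detail, and the IDs and documents below are made up to mirror the example on the slide.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Textbook IDF per indicator: log(number of docs / docs containing it)."""
    n = len(docs)
    df = Counter(ind for doc in docs.values() for ind in set(doc["indicators"]))
    return {ind: math.log(n / count) for ind, count in df.items()}


def score(doc, liked, idf):
    """Sum the IDF weights of the liked items found among the doc's indicators."""
    return sum(idf.get(item, 0.0) for item in set(liked) if item in doc["indicators"])


# Hypothetical index: "godfather" is an indicator almost everywhere,
# "daytrippers" on a single document.
docs = {
    "m1": {"indicators": ["godfather", "daytrippers"]},
    "m2": {"indicators": ["godfather"]},
    "m3": {"indicators": ["godfather"]},
    "m4": {"indicators": ["godfather"]},
    "m5": {"indicators": []},
}
idf = idf_weights(docs)
liked = ["godfather", "daytrippers"]
score_m1 = score(docs["m1"], liked, idf)
score_m2 = score(docs["m2"], liked, idf)
```

With these numbers the rare indicator carries a weight of log(5), against log(5/4) for the common one, so `m1` clearly outranks `m2` even though both match "godfather".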
  15. Part the third: Trying it out for real
  16. Real demo with real data
      • Again, the code is on Github
        - very simple webapp based on and Lucene
        - snippets/tree/master/machine-learning/llr
      • The underlying data is the MovieLens dataset
        - 10 million ratings of 10,000 movies by 72,000 users
  17. Three scripts
      • The first chews the data, producing the significant pairs
        - takes a huge amount of memory and about 30 minutes
        - have made absolutely no attempts to optimize it
      • The second reads the output of the previous script and makes the Lucene index
      • The third is the actual web application
  20. Liked one movie
  21. Liked two movies
      Movies with the highest LLR score together with this movie
  22. Liked three movies
      Recommendations are actually now spot-on. At least for me.
  23. Complete code for movie page

      class Movie:
          def GET(self, movieid):
              nocache()
              doc = search.do_query('id', movieid)[0]
              #recoms = search.do_query('indicators', movieid)
              recoms = [search.do_query('id', movieid)[0] for movieid in doc.bets]
              if hasattr(session, 'liked'):
                  youlike = search.do_query('indicators', session.liked)
              else:
                  youlike = []
              return, recoms, youlike)
  24. Winding up: Further work
  25. Things left to do
      • Tweak the parameters a bit to see what happens
      • Can we support a “Dislike” button?
      • Test it with more kinds of data
      • Learn how to do this with Mahout
  26. What is this? (from Ted Dunning’s slides)
  27. And this? (from Ted Dunning’s slides)
  28. And this? (from Ted Dunning’s slides)
  29. References
      • The original 1993 paper
        - .5962
      • Ebook with lots of background but little detail
      • Slides covering the same material
        - recommendation-engines-using-search-engines
      • Blog post with actual equations
        - coincidence.html