Using the search engine as recommendation engine
Describes a simple approach to using the search engine to drive recommendations.

Transcript

  • 1. Recommendations from the search engine – Sesam Hackathon, Warsaw, 2014-03-23 – Lars Marius Garshol, larsga@bouvet.no, http://twitter.com/larsga
  • 2. Inspiration – This whole presentation is about Ted Dunning’s proposed approach to recommendations – based on his 1993 paper (references at the end) – a very simple method, dead easy to implement – seems to work pretty well.
  • 3. Thoughts on recommendations – Recommenders are usually designed as prediction of ratings – Dunning believes this is the wrong approach – people’s ratings don’t necessarily reflect what they’ll buy – go by what people do rather than what they say. You don’t want to recommend Bob Dylan – everyone has already heard of him and knows what they think – you want to recommend things that are new to the user. You don’t want to recommend things everyone likes.
  • 4. The actual approach – Step 1: work out which things tend to occur together – that is, if you buy this, you’re likely to also buy that – however, we only want pairs which are statistically significant. Step 2: index up the significant pairs in a search engine – use search to produce the actual results.
  • 5. Statistically significant co-occurrence – Part the first
  • 6. The starting point – Some kind of log of user actions. The user has – bought a movie | album | book | ... – opened a document – ... From this raw material, we can work out what things tend to go together – and whether this is significant. Example log:

        User  Item
        u1    i1
        u1    i2
        u2    i1
        u3    i2
        u3    i3
        u3    i4
        ...   ...
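A minimal sketch of turning such a log into pair counts – the helper name and the in-memory approach are illustrative (the real llr.py linked later works on the full MovieLens data):

    from collections import defaultdict
    from itertools import combinations

    def cooccurrence_counts(log):
        """log: iterable of (user, item) pairs. Returns {(item_a, item_b): count},
        counting how many users interacted with both items."""
        items_by_user = defaultdict(set)
        for user, item in log:
            items_by_user[user].add(item)

        counts = defaultdict(int)
        for items in items_by_user.values():
            for a, b in combinations(sorted(items), 2):
                counts[(a, b)] += 1   # symmetric: store each pair once
        return counts

    log = [('u1', 'i1'), ('u1', 'i2'), ('u2', 'i1'),
           ('u3', 'i2'), ('u3', 'i3'), ('u3', 'i4')]
    print(dict(cooccurrence_counts(log)))
    # {('i1', 'i2'): 1, ('i2', 'i3'): 1, ('i2', 'i4'): 1, ('i3', 'i4'): 1}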
  • 7. (image-only slide, no text)
  • 8. Item-to-item matrix – how often each pair of items occurs together (diagonal left blank):

             i1    i2    i3    i4    i5    i6    i7
        i1    –    23    42     0     0     5     7
        i2   23     –     6     1   129     2    10
        i3   42     6     –     3     0   492     1
        i4    0     1     3     –     2     3     1
        i5    0   129     0     2     –    94     2
        i6    5     2   492     3    94     –     1
        i7    7    10     1     1     2     1     –
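A small sketch of assembling that matrix, plus the row sums and grand total needed on the next slide, from pair counts like those above – names are illustrative:

    from collections import defaultdict

    def cooccurrence_matrix(pair_counts):
        """pair_counts: {(item_a, item_b): count}. Returns the symmetric
        item-to-item matrix as a dict of dicts, its row sums, and the total."""
        matrix = defaultdict(dict)
        for (a, b), n in pair_counts.items():
            matrix[a][b] = n
            matrix[b][a] = n
        row_sums = {item: sum(row.values()) for item, row in matrix.items()}
        total = sum(row_sums.values())   # each pair counted twice, once per direction
        return matrix, row_sums, total

    matrix, row_sums, total = cooccurrence_matrix({('i1', 'i2'): 23, ('i1', 'i3'): 42})
    print(matrix['i2'])   # {'i1': 23}
    print(row_sums)       # {'i1': 65, 'i2': 23, 'i3': 42}
    print(total)          # 130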
  • 9. Producing the k 2x2 matrix – How to compute the k matrix for a given cell in the matrix on the previous slide: k[0][0] = the number in the matrix on the previous slide; k[0][1] = the sum of that whole column minus k[0][0]; k[1][0] = the sum of that whole row minus k[0][0]; k[1][1] = the sum of the entire matrix minus k[0][0] minus k[1][0] minus k[0][1]. If the output of LLR(k) is above some threshold, the pair is considered significant.
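A minimal sketch of building the 2x2 k matrix and scoring it with the log-likelihood ratio. The LLR is written in the entropy form used in Dunning's blog post listed in the references; the helper names are mine:

    from math import log

    def entropy_term(counts):
        # sum of c * log(c / total) over the non-zero counts
        total = float(sum(counts))
        return sum(c * log(c / total) for c in counts if c > 0)

    def llr(k):
        # k = [[k00, k01], [k10, k11]], the 2x2 matrix from this slide
        (k00, k01), (k10, k11) = k
        rows = entropy_term([k00 + k01, k10 + k11])
        cols = entropy_term([k00 + k10, k01 + k11])
        mat  = entropy_term([k00, k01, k10, k11])
        return 2.0 * (mat - rows - cols)

    def k_matrix(cooc, row_sum, col_sum, total):
        # cooc: the cell from the item-to-item matrix; row_sum/col_sum: sums of
        # its row and column (equal here, since that matrix is symmetric);
        # total: sum of the entire matrix
        k00 = cooc
        k01 = col_sum - k00
        k10 = row_sum - k00
        k11 = total - k00 - k01 - k10
        return [[k00, k01], [k10, k11]]

    # keep the pair if llr(k_matrix(...)) is above a chosen threshold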
  • 10. Doing it for real – Check the Python code on https://github.com/larsga/py-snippets/tree/master/machine-learning/llr – this requires a lot of memory and CPU. Or just use Mahout – RowSimilarityJob does exactly this.
  • 11. Search engine as recommender – Part the second
  • 12. Indexing with the search engine – Take all the items and index them up with the search engine in the usual way – that is, each item has an id, a title, a description, etc. Then, add a “magic” field – put into it the IDs of all the items that appear in a significant pair with this item – let’s call this field “indicators”. Now we’re ready to do recommendations.
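A sketch of the document layout going into the index – the id/title/description and "indicators" fields follow the slide, while the make_document helper and the pair-list format are assumptions; any search engine that indexes the indicators field as ordinary text will do:

    def make_document(item, significant_pairs):
        # significant_pairs: {item_id: [ids of items forming a significant pair with it]}
        return {
            'id': item['id'],
            'title': item['title'],
            'description': item['description'],
            # the "magic" field: IDs of all items significantly paired with this one
            'indicators': ' '.join(significant_pairs.get(item['id'], [])),
        }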
  • 13. Doing recommendations – Collect some set of items for which the user has expressed a preference – by buying them, looking at them, rating them, whatever. The IDs of these items are your query – search the “indicators” field – the search results are your recommendations. That’s it! – pack up, go home.
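Producing recommendations is then a single search – a sketch assuming a do_query(field, text) helper along the lines of the one in the webapp code on slide 23:

    def recommend(liked_ids, search):
        # liked_ids: IDs of items the user has bought / viewed / rated.
        # The IDs themselves are the query; scoring against the "indicators"
        # field ranks items sharing rare indicators with the liked items highest.
        query = ' '.join(liked_ids)          # e.g. "i2 i5"
        return search.do_query('indicators', query)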
  • 14. Why does it work? – Imagine that you’re searching for movies, and you type “the godfather” – “the” appears in all documents, so documents matching that get a low relevance score – “godfather” appears in very few documents, so matches on that get a high score – this is basically TF/IDF in a nutshell. Now, imagine you liked two movies: “The Godfather” and “The Daytrippers” – nearly all movies have “The Godfather” as an indicator – very few have “The Daytrippers” – the second will therefore influence recommendations much more.
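To put rough numbers on that intuition, a toy IDF calculation (the document frequencies are invented for illustration):

    from math import log

    N = 10000              # total number of indexed movies (illustrative)

    def idf(df):
        return log(N / df) # classic inverse document frequency

    print(idf(9000))   # "The Godfather" as indicator of 9,000 movies  -> ~0.11
    print(idf(20))     # "The Daytrippers" as indicator of 20 movies   -> ~6.2
    # the rarer indicator carries roughly 60x the weight in the relevance score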
  • 15. Trying it out for real – Part the third
  • 16. Real demo with real data – Again, the code is on Github – a very simple webapp based on web.py and Lucene – https://github.com/larsga/py-snippets/tree/master/machine-learning/llr. The underlying data is the MovieLens dataset – 10 million ratings of 10,000 movies by 72,000 users – http://grouplens.org/datasets/movielens/
  • 17. Three scripts – llr.py: chews the data, producing the significant pairs – takes a huge amount of memory and about 30 minutes – no attempt has been made to optimize it. llr_index.py: reads the output of the previous script and builds the Lucene index. recom-ui.py: the actual web application.
  • 18. (image-only slide, no text)
  • 19. (image-only slide, no text)
  • 20. Liked one movie
  • 21. Liked two movies – Movies with the highest LLR score together with this movie.
  • 22. Liked three movies – Recommendations are actually now spot-on. At least for me.
  • 23. Complete code for movie page

        class Movie:
            def GET(self, movieid):
                nocache()
                # fetch the document for this movie
                doc = search.do_query('id', movieid)[0]
                #recoms = search.do_query('indicators', movieid)
                # look up each movie listed in doc.bets; these are the
                # recommendations shown for this movie
                recoms = [search.do_query('id', movieid)[0]
                          for movieid in doc.bets]
                if hasattr(session, 'liked'):
                    # recommendations from everything the user has liked so far
                    youlike = search.do_query('indicators', session.liked)
                else:
                    youlike = []
                return render.movie(doc, recoms, youlike)
  • 24. Further work – Winding up
  • 25. Things left to do – Tweak the parameters a bit to see what happens. Can we support a “Dislike” button? Test it with more kinds of data. Learn how to do this with Mahout.
  • 26. What is this? – From Ted Dunning’s slides
  • 27. And this? – From Ted Dunning’s slides
  • 28. And this? – From Ted Dunning’s slides
  • 29. References – The original 1993 paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5962 – Ebook with lots of background but little detail: http://www.mapr.com/practical-machine-learning – Slides covering the same material: www.slideshare.net/tdunning/building-multimodal-recommendation-engines-using-search-engines – Blog post with actual equations: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
