Acts As Recommendable

RubyManor talk on using Recommendation systems in production.

  • Awesome plugin.

    I created a basic, more simple similarity matching plugin.

    http://www.freezzo.com/2009/06/04/acts_as_similar-a-basic-similarity-activerecord-plugin/

Acts As Recommendable Presentation Transcript

  • 1. Recommendations in Production Alex MacCaw
  • 2. Netflix Prize
  • 3. Amazon.com Facebook Last.fm StumbleUpon Google Suggest iTunes Rotten Tomatoes Yelp
  • 4. Google Search
  • 5. Chicken or Egg
  • 6. • Google Reader • IMDB
  • 7. Acts As Recommendable
  • 8. Types of recommendations • Content Based • User Based • Item Based
  • 9. Programming Collective Intelligence
  • 10. Has Many Through Relationship
  • 11. User Has Many Through Book Has Many Has Many UserBooks Can have score (rating)
  • 12. User class User < ActiveRecord::Base has_many :user_books has_many :books, :through => :user_books acts_as_recommendable :books, :through => :user_books end
  • 13. Gives you User#similar_users User#recommended_books Book#similar_books
  • 14. The algorithms • Manhattan Distance • Euclidean distance • Cosine • Pearson correlation coefficient • Jaccard • Levenshtein
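Four of the measures listed on slide 14 can be sketched in plain Ruby as follows. This is a minimal illustration over equal-length rating vectors (and arrays of ids for Jaccard), not the plugin's own code; the `Measures` module and its method names are assumptions made for the example. Pearson and Levenshtein are left out here.

```ruby
# Illustrative only: module and method names are hypothetical,
# not part of acts_as_recommendable.
module Measures
  module_function

  # Sum of absolute differences between paired ratings.
  def manhattan(a, b)
    a.zip(b).sum { |x, y| (x - y).abs }
  end

  # Straight-line distance between two points in rating space.
  def euclidean(a, b)
    Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  end

  # Cosine of the angle between two rating vectors (0.0 if either is all zeros).
  def cosine(a, b)
    dot  = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    norm.zero? ? 0.0 : dot / norm
  end

  # Overlap between two collections of ids: |intersection| / |union|.
  def jaccard(ids_a, ids_b)
    union = ids_a | ids_b
    union.empty? ? 0.0 : (ids_a & ids_b).size.to_f / union.size
  end
end
```

Manhattan and Euclidean are distances (smaller means more alike), while cosine and Jaccard are similarities (larger means more alike), so a distance normally has to be inverted before it is used as a score.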
  • 15. How does it work?
  • 16. Strategy • Map data into Euclidean Space • Calculate similarity • Use similarities to recommend
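The last step of the strategy, using similarities to recommend, is often done with a similarity-weighted average of other users' ratings. The sketch below shows that common user-based approach under stated assumptions: `prefs` is a hash of `{ user => { item => score } }`, `similarity` is any callable that scores two rating hashes, and the `recommend` name is hypothetical. The plugin itself may compute recommendations differently.

```ruby
# Hypothetical helper, not the plugin's API.
# prefs:      { user => { item => score } }
# similarity: callable taking two rating hashes and returning a number
def recommend(prefs, user, similarity)
  totals  = Hash.new(0.0) # item => sum of (similarity * score)
  weights = Hash.new(0.0) # item => sum of similarities

  prefs.each do |other, ratings|
    next if other == user

    sim = similarity.call(prefs[user], ratings)
    next unless sim > 0 # ignore users with no positive similarity

    ratings.each do |item, score|
      next if prefs[user].key?(item) # only recommend unseen items

      totals[item]  += sim * score
      weights[item] += sim
    end
  end

  # Normalise by total similarity and sort best first.
  totals.map { |item, total| [item, total / weights[item]] }
        .sort_by { |_item, score| -score }
end
```

Dividing by the summed similarities keeps an item from ranking highly just because many users rated it.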
  • 17. Ratings:

              The Black Knight   John Tucker Must Die
    James            4                    5
    Jonah            3                    2
    George           5                    3
    Alex             4                    2
  • 18. [Scatter plot: John Tucker Must Die rating on the x-axis, The Black Knight rating on the y-axis, both running from 0 to 5.00]
  • 19. [Same scatter plot as slide 18]
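As a worked example, the slide 17 ratings can be treated as points in the two-dimensional space plotted above and compared by Euclidean distance. The sketch assumes the first number in each row is the rating for The Black Knight and the second is for John Tucker Must Die; the `1 / (1 + distance)` conversion to a similarity score and both method names are assumptions for illustration, not necessarily what the plugin does.

```ruby
# Ratings from slide 17 as [The Black Knight, John Tucker Must Die].
# The column order is an assumption about the flattened slide text.
RATINGS = {
  "James"  => [4, 5],
  "Jonah"  => [3, 2],
  "George" => [5, 3],
  "Alex"   => [4, 2]
}.freeze

# Map a distance onto (0, 1]: identical points score 1.0.
def euclidean_similarity(a, b)
  distance = Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  1.0 / (1.0 + distance)
end

# Everyone except `name`, ordered from most to least similar.
def most_similar(name)
  RATINGS.reject { |other, _point| other == name }
         .map { |other, point| [other, euclidean_similarity(RATINGS[name], point)] }
         .sort_by { |_other, score| -score }
end
```

With these numbers Alex is one unit from Jonah, about 1.41 from George and three from James, so `most_similar("Alex")` lists Jonah first with a score of 0.5.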
  • 20. { 1 => { 1 => 1.0, 2 => 0.0, ... }, ... } (outer key: item id; inner key: user id; value: score)
  • 21. [[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
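Assuming each pair on slide 21 is `[id, similarity score]`, picking the best matches is a sort on the second element. A minimal sketch, with `top_matches` as a hypothetical helper name:

```ruby
# Hypothetical helper: returns the `limit` highest-scoring [id, score] pairs.
def top_matches(pairs, limit = 2)
  pairs.sort_by { |_id, score| -score }.first(limit)
end

pairs = [[1, 0.5554], [2, 0.888], [3, 0.8843]]
top_matches(pairs) # ids 2 and 3 come first
```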
  • 22. Problem 1 It was far too slow to calculate on the fly (obvious)
  • 23. SELECT * FROM "users" WHERE ("users"."id" = 2) SELECT * FROM "books" SELECT * FROM "users" SELECT "user_books".* FROM "user_books" WHERE ("user_books".user_id IN (1,2,3,4,5,6,7,8,9,10)) SELECT * FROM "books" WHERE ("books"."id" IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5)) SELECT * FROM "books" WHERE ("books"."id" IN (20,3,19,6)) All books All user_books
  • 24. Solution Cache the dataset Build offline rake recommendations:build
  • 25. SELECT * FROM "user_books" WHERE ("user_books".user_id = 2) SELECT * FROM "books" WHERE ("books"."id" = 5) SELECT * FROM "books" WHERE ("books"."id" = 4) SELECT * FROM "books" WHERE ("books"."id" = 8) SELECT * FROM "books" WHERE ("books"."id" = 7) SELECT * FROM "books" WHERE ("books"."id" = 2) SELECT * FROM "books" WHERE ("books"."id" = 1)
  • 26. Problem 2 Fetching the dataset took too long since it was so massive
  • 27. Solution Split up the cache by item
  • 28. Rails.cache.write( "aar_books_1", scores )
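Splitting the cache by item means each item's scores live under their own key, so a lookup only loads one small entry instead of the whole dataset. The sketch below follows the `aar_books_1` key shape from slide 28; `MemoryStore` is a stand-in so the example runs without Rails, and the helper names are hypothetical. In an application, `Rails.cache` would take the place of the store, since it also responds to `read` and `write`.

```ruby
# Stand-in for Rails.cache so the sketch runs outside Rails.
class MemoryStore
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
  end

  def read(key)
    @data[key]
  end
end

# Key shape taken from slide 28, e.g. "aar_books_1".
def cache_key(table_name, item_id)
  "aar_#{table_name}_#{item_id}"
end

# Write one cache entry per item instead of a single huge entry.
def write_scores(store, table_name, dataset)
  dataset.each do |item_id, scores|
    store.write(cache_key(table_name, item_id), scores)
  end
end

# Fetch a single item's scores; an uncached item yields an empty hash.
def read_scores(store, table_name, item_id)
  store.read(cache_key(table_name, item_id)) || {}
end
```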
  • 29. Problem 3 The dataset was so big it crashed Ruby!
  • 30. Solution Get rid of ActiveRecord Only deal with integers
  • 31. items = options[:on_class].connection.select_values( "SELECT id from #{options[:on_class].table_name}" ).collect(&:to_i)
  • 32. Problem 4 It still crashed Ruby!
  • 33. { 1 => { 1 => 1.0, 2 => 0.0, ... }, ... }
  • 34. Solution Remove unnecessary cruft from dataset
  • 35. { 1 => { 1 => 1.0, ... }, ... }
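Comparing slides 33 and 35, the entry that disappears is the `2 => 0.0` score, which suggests the "cruft" is zero-valued entries. Under that assumption, a sparse dataset can be built as below. `compact_dataset` is a hypothetical name, and dropping items left with no scores at all is an extra assumption of this sketch.

```ruby
# Hypothetical helper: keep only non-zero scores so the nested hash stays sparse.
# dataset: { item_id => { user_id => score } }
def compact_dataset(dataset)
  dataset.each_with_object({}) do |(item_id, scores), result|
    kept = scores.reject { |_user_id, score| score.zero? }
    # Assumption: an item with no remaining scores is dropped entirely.
    result[item_id] = kept unless kept.empty?
  end
end
```

Readers of the compacted hash then have to treat a missing key as a score of 0.0, which matches the default used in the hash lookup on slide 41.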
  • 36. Problem 5 It was too slow
  • 37. Solution Re-write the slow bits in C
  • 38. Details • RubyInline • Implemented Pearson • Monkey patched original Ruby methods • Very fast
  • 39. InlineC = Module.new do inline do |builder| builder.c ' #include <math.h> #include "ruby.h" double c_sim_pearson(VALUE items) { [callout: Ruby Object]
  • 40. InlineC = Module.new do inline do |builder| builder.c ' #include <math.h> #include "ruby.h" double c_sim_pearson(VALUE items) { [callout: No Floats :(]
  • 41. Hash Lookup if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) { prefs1_item = 0.0; } else { prefs1_item = NUM2DBL(prefs1_item_ob); }
  • 42. Conversion return num / den;
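For reference, the Pearson calculation that slides 38 to 42 move into C can be written in plain Ruby roughly as follows. This is a sketch, not the plugin's source: the three-argument signature is an assumption, a missing rating is treated as 0.0 as in the slide 41 lookup, and the result is `num / den` as on slide 42.

```ruby
# Pure-Ruby sketch of a Pearson correlation over a list of ids.
# prefs1, prefs2: { id => score }; ids absent from a hash count as 0.0.
# The signature is hypothetical; it does not mirror c_sim_pearson exactly.
def sim_pearson(prefs1, prefs2, item_ids)
  n = item_ids.size
  return 0.0 if n.zero?

  xs = item_ids.map { |id| prefs1.fetch(id, 0.0).to_f }
  ys = item_ids.map { |id| prefs2.fetch(id, 0.0).to_f }

  sum_x  = xs.sum
  sum_y  = ys.sum
  sum_xx = xs.sum { |x| x * x }
  sum_yy = ys.sum { |y| y * y }
  sum_xy = xs.zip(ys).sum { |x, y| x * y }

  num = sum_xy - (sum_x * sum_y / n)
  variance_product = (sum_xx - sum_x**2 / n) * (sum_yy - sum_y**2 / n)
  # No variation in one of the inputs means the correlation is undefined.
  return 0.0 unless variance_product > 0

  num / Math.sqrt(variance_product)
end
```

The result ranges from -1.0 (opposite tastes) to 1.0 (identical tastes). The loop body is all arithmetic on floats, which is the kind of work that benefits from being moved to C.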
  • 43. Design Designs • Not too many relationships • Not too many ‘items’ • Similarity matrix for items, not users
  • 44. Changing data
  • 45. Scaling Even Further • K Means clustering • Split cluster by category
  • 46. Adding ratings ActiveRecord::Schema.define(:version => 1) do create_table "books", :force => true do |t| t.string "name" t.datetime "created_at" t.datetime "updated_at" end create_table "user_books", :force => true do |t| t.integer "user_id", :null => false t.integer "book_id", :null => false t.integer "rating", :default => 0 end create_table "users", :force => true do |t| t.string "name" t.datetime "created_at" t.datetime "updated_at" end end
  • 47. class User < ActiveRecord::Base has_many :user_books has_many :books, :through => :user_books acts_as_recommendable :books, :through => :user_books, :score => :rating end
  • 48. That’s it
  • 49. Improvements? • Better API • Perform calculations over a cluster (EC2) using Map/Nanite
  • 50. class AARN < Nanite::Actor expose :sim_pearson def sim_pearson(item1, item2) Optimizations.c_sim_pearson(item1, item2) end end
  • 51. Questions? http://eribium.org/blog twitter : maccman email/jabber: maccman@gmail.com http://github.com/maccman/acts_as_recommendable http://rubyurl.com/kUpk