Acts As Recommendable

RubyManor talk on using recommendation systems in production.

1 Comment
  • Awesome plugin.

    I created a basic, more simple similarity matching plugin.

    http://www.freezzo.com/2009/06/04/acts_as_similar-a-basic-similarity-activerecord-plugin/

Acts As Recommendable

  1. 1. Recommendations in Production Alex MacCaw
  2. 2. Netflix Prize
  3. 3. Amazon.com, Facebook, Last.fm, StumbleUpon, Google Suggest, iTunes, Rotten Tomatoes, Yelp
  4. 4. Google Search
  5. 5. Chicken or Egg
  6. 6. • Google Reader • IMDB
  7. 7. Acts As Recommendable
  8. 8. Types of recommendations • Content Based • User Based • Item Based
  9. 9. Programming Collective Intelligence
  10. 10. Has Many Through Relationship
  11. 11. [Diagram] User has many Books through UserBooks; User has many UserBooks; Book has many UserBooks. UserBooks can have a score (rating).
  12. 12. User
          class User < ActiveRecord::Base
            has_many :user_books
            has_many :books, :through => :user_books
            acts_as_recommendable :books, :through => :user_books
          end
  13. 13. Gives you: User#similar_users, User#recommended_books, Book#similar_books
  14. 14. The algorithms • Manhattan Distance • Euclidean distance • Cosine • Pearson correlation coefficient • Jaccard • Levenshtein
  15. 15. How does it work?
  16. 16. Strategy • Map data into Euclidean Space • Calculate similarity • Use similarities to recommend
  17. 17.          The Black Knight   John Tucker Must Die
          James    4                  5
          Jonah    3                  2
          George   5                  3
          Alex     4                  2
  18. 18. [Plot: y-axis The Black Knight, x-axis John Tucker Must Die, both from 0 to 5.00]
  19. 19. [Plot: y-axis The Black Knight, x-axis John Tucker Must Die, both from 0 to 5.00]
  20. 20. { 1 => { 1 => 1.0, 2 => 0.0, ... }, ... } (outer key: item id; inner key: user id; value: score)
  21. 21. [[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
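
The strategy on slides 16 to 21 can be sketched in plain Ruby using the ratings from slide 17. This is only an illustration, not the plugin's code: the method names are hypothetical, and the 1 / (1 + distance) normalisation is an assumed (though common) way to turn a Euclidean distance into a similarity score.

```ruby
# Ratings from slide 17: user => [The Black Knight, John Tucker Must Die]
RATINGS = {
  "James"  => [4, 5],
  "Jonah"  => [3, 2],
  "George" => [5, 3],
  "Alex"   => [4, 2]
}.freeze

# Euclidean distance between two equal-length rating vectors.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Map a distance into (0, 1]; identical ratings give 1.0.
# Assumption: the 1 / (1 + d) normalisation is not taken from the plugin.
def similarity(a, b)
  1.0 / (1.0 + euclidean_distance(a, b))
end

# Rank every other user by similarity to the given one, best first.
# Returns [[name, score], ...], the same shape as the pairs on slide 21.
def similar_users(name, ratings = RATINGS)
  ratings
    .reject { |other, _| other == name }
    .map { |other, vec| [other, similarity(ratings.fetch(name), vec)] }
    .sort_by { |_, score| -score }
end
```

Calling `similar_users("Alex")` ranks Jonah first, since his ratings sit closest to Alex's in the two-film space.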
  22. 22. Problem 1: It was far too slow to calculate on the fly (obvious)
  23. 23. SELECT * FROM "users" WHERE ("users"."id" = 2)
          SELECT * FROM "books"
          SELECT * FROM "users"
          SELECT "user_books".* FROM "user_books" WHERE ("user_books".user_id IN (1,2,3,4,5,6,7,8,9,10))
          SELECT * FROM "books" WHERE ("books"."id" IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5))
          SELECT * FROM "books" WHERE ("books"."id" IN (20,3,19,6))
          (slide annotations: All books, All user_books)
  24. 24. Solution: Cache the dataset. Build offline: rake recommendations:build
  25. 25. SELECT * FROM "user_books" WHERE ("user_books".user_id = 2)
          SELECT * FROM "books" WHERE ("books"."id" = 5)
          SELECT * FROM "books" WHERE ("books"."id" = 4)
          SELECT * FROM "books" WHERE ("books"."id" = 8)
          SELECT * FROM "books" WHERE ("books"."id" = 7)
          SELECT * FROM "books" WHERE ("books"."id" = 2)
          SELECT * FROM "books" WHERE ("books"."id" = 1)
  26. 26. Problem 2: Fetching the dataset took too long since it was so massive
  27. 27. Solution: Split up the cache by item
  28. 28. Rails.cache.write("aar_books_1", scores)
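
A rough sketch of what splitting the cache by item might look like. It assumes nothing about the plugin beyond the key format shown on slide 28; `MemoryCache` is a hypothetical stand-in for `Rails.cache` so the example runs without Rails, and the helper names are invented for illustration.

```ruby
# Hypothetical in-memory stand-in for Rails.cache (only write/read are mimicked).
class MemoryCache
  def initialize
    @store = {}
  end

  def write(key, value)
    @store[key] = value
  end

  def read(key)
    @store[key]
  end
end

# Key format follows slide 28 ("aar_books_1"); the helper itself is hypothetical.
def cache_key(table, item_id)
  "aar_#{table}_#{item_id}"
end

# Store each item's scores under its own key instead of one large entry.
def write_dataset(cache, table, dataset)
  dataset.each { |item_id, scores| cache.write(cache_key(table, item_id), scores) }
end

# Fetch only the scores for one item; an unknown item yields an empty hash.
def scores_for(cache, table, item_id)
  cache.read(cache_key(table, item_id)) || {}
end
```

With one key per item, a lookup for a single book reads only that book's scores rather than deserialising the whole dataset.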
  29. 29. Problem 3: The dataset was so big it crashed Ruby!
  30. 30. Solution: Get rid of ActiveRecord. Only deal with integers.
  31. 31. items = options[:on_class].connection.select_values(
            "SELECT id from #{options[:on_class].table_name}"
          ).collect(&:to_i)
  32. 32. Problem 4: It still crashed Ruby!
  33. 33. { 1 => { 1 => 1.0, 2 => 0.0, ... }, ... }
  34. 34. Solution: Remove unnecessary cruft from dataset
  35. 35. { 1 => { 1 => 1.0, ... }, ... }
  36. 36. Problem 5: It was too slow
  37. 37. Solution: Re-write the slow bits in C
  38. 38. Details • RubyInline • Implemented Pearson • Monkey patched original Ruby methods • Very fast
  39. 39. InlineC = Module.new do
            inline do |builder|
              builder.c '
                #include <math.h>
                #include "ruby.h"
                double c_sim_pearson(VALUE items) {
          (slide annotation: Ruby Object)
  40. 40. InlineC = Module.new do
            inline do |builder|
              builder.c '
                #include <math.h>
                #include "ruby.h"
                double c_sim_pearson(VALUE items) {
          (slide annotation: No Floats :( )
  41. 41. Hash Lookup
          if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) {
            prefs1_item = 0.0;
          } else {
            prefs1_item = NUM2DBL(prefs1_item_ob);
          }
  42. 42. Conversion: return num / den;
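
The slides show the C implementation only in fragments, so here is a plain-Ruby sketch of a Pearson correlation over two score hashes, for orientation. It is not the plugin's code: the method signature is hypothetical, missing keys default to 0.0 in the spirit of the slide 41 lookup, and returning 0.0 for a zero denominator is an assumption.

```ruby
# Pearson correlation between two score hashes ({ id => score }).
# `ids` lists the keys to compare over. A key missing from either hash
# counts as 0.0, mirroring the fallback in the slide 41 fragment.
def sim_pearson(prefs1, prefs2, ids)
  n = ids.size
  return 0.0 if n.zero?

  xs = ids.map { |id| prefs1.fetch(id, 0.0).to_f }
  ys = ids.map { |id| prefs2.fetch(id, 0.0).to_f }

  sum_x  = xs.sum
  sum_y  = ys.sum
  sum_xx = xs.sum { |x| x * x }
  sum_yy = ys.sum { |y| y * y }
  sum_xy = xs.zip(ys).sum { |x, y| x * y }

  num = sum_xy - (sum_x * sum_y / n)

  # Clamp at zero so floating-point rounding cannot feed sqrt a negative value.
  var_x = [sum_xx - sum_x**2 / n, 0.0].max
  var_y = [sum_yy - sum_y**2 / n, 0.0].max
  den = Math.sqrt(var_x * var_y)

  # Assumption: report "no correlation" when either side has zero variance.
  return 0.0 if den.zero?

  num / den
end
```

The result lies between -1.0 and 1.0: scores that rise together approach 1.0, and scores that move in opposite directions approach -1.0.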
  43. 43. Design Designs • Not too many relationships • Not too many ‘items’ • Similarity matrix for items, not users
  44. 44. Changing data
  45. 45. Scaling Even Further • K Means clustering • Split cluster by category
  46. 46. Adding ratings
          ActiveRecord::Schema.define(:version => 1) do
            create_table "books", :force => true do |t|
              t.string   "name"
              t.datetime "created_at"
              t.datetime "updated_at"
            end
            create_table "user_books", :force => true do |t|
              t.integer "user_id", :null => false
              t.integer "book_id", :null => false
              t.integer "rating",  :default => 0
            end
            create_table "users", :force => true do |t|
              t.string   "name"
              t.datetime "created_at"
              t.datetime "updated_at"
            end
          end
  47. 47. class User < ActiveRecord::Base
            has_many :user_books
            has_many :books, :through => :user_books
            acts_as_recommendable :books, :through => :user_books, :score => :rating
          end
  48. 48. That’s it
  49. 49. Improvements? • Better API • Perform calculations over a cluster (EC2) using Map/Nanite
  50. 50. class AARN < Nanite::Actor
            expose :sim_pearson
            def sim_pearson(item1, item2)
              Optimizations.c_sim_pearson(item1, item2)
            end
          end
  51. 51. Questions? • http://eribium.org/blog • twitter: maccman • email/jabber: maccman@gmail.com • http://github.com/maccman/acts_as_recommendable • http://rubyurl.com/kUpk
