Acts As Recommendable
RubyManor talk on using Recommendation systems in production.

Transcript

  • 1. Recommendations in Production Alex MacCaw
  • 2. Netflix Prize
  • 3. Amazon.com Facebook Last.fm StumbleUpon Google Suggest iTunes Rotten Tomatoes Yelp
  • 4. Google Search
  • 5. Chicken or Egg
  • 6. • Google Reader • IMDB
  • 7. Acts As Recommendable
  • 8. Types of recommendations • Content Based • User Based • Item Based
  • 9. Programming Collective Intelligence
  • 10. Has Many Through Relationship
  • 11. User has many Books through UserBooks. User has many UserBooks; Book has many UserBooks. UserBooks can have a score (rating).
  • 12. User
    class User < ActiveRecord::Base
      has_many :user_books
      has_many :books, :through => :user_books
      acts_as_recommendable :books, :through => :user_books
    end
  • 13. Gives you: User#similar_users, User#recommended_books, Book#similar_books
  • 14. The algorithms • Manhattan Distance • Euclidean distance • Cosine • Pearson correlation coefficient • Jaccard • Levenshtein
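A few of the measures listed on this slide can be sketched in plain Ruby. This is an illustration only, not the plugin's own code: the `Similarity` module and its method names are hypothetical, and preference vectors are assumed to be hashes of item id => score.

```ruby
# Hypothetical helper module (not part of acts_as_recommendable).
# Each argument is a hash of item id => score.
module Similarity
  module_function

  # Item ids rated in both vectors.
  def shared_keys(a, b)
    a.keys & b.keys
  end

  # Manhattan distance: sum of absolute differences over shared items.
  def manhattan(a, b)
    shared_keys(a, b).sum { |k| (a[k] - b[k]).abs }
  end

  # Euclidean distance: square root of summed squared differences.
  def euclidean(a, b)
    Math.sqrt(shared_keys(a, b).sum { |k| (a[k] - b[k])**2 })
  end

  # Cosine similarity: dot product divided by the product of the norms.
  def cosine(a, b)
    dot = shared_keys(a, b).sum { |k| a[k] * b[k] }
    norm_a = Math.sqrt(a.values.sum { |v| v**2 })
    norm_b = Math.sqrt(b.values.sum { |v| v**2 })
    return 0.0 if norm_a.zero? || norm_b.zero?
    dot / (norm_a * norm_b)
  end

  # Jaccard index: shared item count divided by total distinct item count.
  def jaccard(a, b)
    union = (a.keys | b.keys).size
    return 0.0 if union.zero?
    shared_keys(a, b).size.to_f / union
  end
end
```

Manhattan and Euclidean are distances (smaller means closer), while cosine and Jaccard are similarities (larger means closer), so they are not interchangeable without a conversion step.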
  • 15. How does it work?
  • 16. Strategy • Map data into Euclidean Space • Calculate similarity • Use similarities to recommend
  • 17.
              The Black Knight   John Tucker Must Die
    James            4                    5
    Jonah            3                    2
    George           5                    3
    Alex             4                    2
  • 18. [Plot: the ratings as points. x-axis: John Tucker Must Die (0 to 5); y-axis: The Black Knight (0 to 5)]
  • 19. [Plot: same axes as slide 18]
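To make the first two steps of the strategy concrete, here is a minimal sketch that treats each person's two ratings as a point in the plane and ranks the others by closeness. The ratings are taken from slide 17 (the column order is read from the flattened transcript); the `1 / (1 + distance)` mapping and the method names are assumptions for illustration, not necessarily what the plugin does.

```ruby
# Ratings from slide 17, keyed by person, then by film.
RATINGS = {
  "James"  => { "The Black Knight" => 4, "John Tucker Must Die" => 5 },
  "Jonah"  => { "The Black Knight" => 3, "John Tucker Must Die" => 2 },
  "George" => { "The Black Knight" => 5, "John Tucker Must Die" => 3 },
  "Alex"   => { "The Black Knight" => 4, "John Tucker Must Die" => 2 }
}.freeze

# Euclidean distance between two people over the films both have rated.
def distance(a, b)
  shared = a.keys & b.keys
  Math.sqrt(shared.sum { |film| (a[film] - b[film])**2 })
end

# Assumed mapping from distance to a similarity in (0, 1]:
# identical ratings give 1.0, larger distances approach 0.
def similarity(a, b)
  1.0 / (1.0 + distance(a, b))
end

# Everyone except `name`, most similar first, as [name, similarity] pairs.
def similar_people(name)
  others = RATINGS.reject { |other, _| other == name }
  others.map { |other, prefs| [other, similarity(RATINGS[name], prefs)] }
        .sort_by { |_, score| -score }
end
```

With this data, `similar_people("Alex")` ranks Jonah first (distance 1), then George (distance √2), then James (distance 3).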
  • 20. { 1 => { 1 => 1.0, 2 => 0.0, ... }, ... } (outer key: item id; inner key: user id; value: score)
  • 21. [[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
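One common way to use a list of `[id, similarity]` pairs like the one on this slide is a similarity-weighted average of the neighbours' scores. The sketch below assumes that approach; the `recommend` method, its arguments, and the data shapes are hypothetical and not taken from the plugin.

```ruby
# similarities: array of [user_id, similarity] pairs (shape as on slide 21)
# ratings:      hash of user_id => { item_id => score }
# already_seen: item ids the target user has rated, which are skipped
# Returns [item_id, predicted_score] pairs, best first.
def recommend(similarities, ratings, already_seen)
  totals  = Hash.new(0.0) # item_id => sum of score * similarity
  weights = Hash.new(0.0) # item_id => sum of similarities that contributed

  similarities.each do |user_id, sim|
    next if sim <= 0 # ignore dissimilar users
    (ratings[user_id] || {}).each do |item_id, score|
      next if already_seen.include?(item_id)
      totals[item_id]  += score * sim
      weights[item_id] += sim
    end
  end

  # Normalise so items rated by many users are not favoured automatically.
  totals.map { |item_id, total| [item_id, total / weights[item_id]] }
        .sort_by { |_, predicted| -predicted }
end
```

Dividing by the summed weights keeps each prediction on the same scale as the original ratings.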
  • 22. Problem 1 It was far too slow to calculate on the fly (obvious)
  • 23.
    SELECT * FROM "users" WHERE ("users"."id" = 2)
    SELECT * FROM "books"
    SELECT * FROM "users"
    SELECT "user_books".* FROM "user_books" WHERE ("user_books".user_id IN (1,2,3,4,5,6,7,8,9,10))
    SELECT * FROM "books" WHERE ("books"."id" IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5))
    SELECT * FROM "books" WHERE ("books"."id" IN (20,3,19,6))
    (slide annotations: All books; All user_books)
  • 24. Solution Cache the dataset Build offline rake recommendations:build
  • 25.
    SELECT * FROM "user_books" WHERE ("user_books".user_id = 2)
    SELECT * FROM "books" WHERE ("books"."id" = 5)
    SELECT * FROM "books" WHERE ("books"."id" = 4)
    SELECT * FROM "books" WHERE ("books"."id" = 8)
    SELECT * FROM "books" WHERE ("books"."id" = 7)
    SELECT * FROM "books" WHERE ("books"."id" = 2)
    SELECT * FROM "books" WHERE ("books"."id" = 1)
  • 26. Problem 2 Fetching the dataset took too long since it was so massive
  • 27. Solution Split up the cache by item
  • 28. Rails.cache.write("aar_books_1", scores)
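Splitting the cache by item means a lookup reads one small entry instead of the whole dataset. The sketch below illustrates the idea with a tiny in-memory stand-in for `Rails.cache`; `MemoryStore`, `cache_key`, `write_similarities`, and `similar_items` are hypothetical names, and only the `aar_books_1` key shape comes from the slide.

```ruby
# Hypothetical in-memory stand-in for Rails.cache, exposing only
# the read/write calls this sketch needs.
class MemoryStore
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
    true
  end

  def read(key)
    @data[key]
  end
end

# Builds keys shaped like the slide's "aar_books_1".
def cache_key(table, item_id)
  "aar_#{table}_#{item_id}"
end

# matrix: hash of item_id => array of [other_item_id, similarity] pairs.
# Writes one cache entry per item rather than one entry for everything.
def write_similarities(cache, table, matrix)
  matrix.each do |item_id, scores|
    cache.write(cache_key(table, item_id), scores)
  end
end

# Reads only the entry for the requested item; empty if nothing is cached.
def similar_items(cache, table, item_id)
  cache.read(cache_key(table, item_id)) || []
end
```

In a Rails application the same two calls would go to `Rails.cache` instead of `MemoryStore`.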
  • 29. Problem 3 The dataset was so big it crashed Ruby!
  • 30. Solution Get rid of ActiveRecord Only deal with integers
  • 31.
    items = options[:on_class].connection.select_values(
      "SELECT id from #{options[:on_class].table_name}"
    ).collect(&:to_i)
  • 32. Problem 4 It still crashed Ruby!
  • 33. { 1 => { 1 => 1.0, 2 => 0.0, ... }, ... }
  • 34. Solution Remove unnecessary cruft from dataset
  • 35. { 1 => { 1 => 1.0, ... }, ... }
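The difference between slides 33 and 35 is that zero scores are no longer stored. A minimal sketch of that pruning step follows; `compact_dataset` is a hypothetical name, and dropping items left with no scores at all is an extra assumption not shown on the slides.

```ruby
# dataset: hash of item_id => { user_id => score }
# Returns a new hash without zero scores. Items whose scores were all
# zero are dropped entirely (an assumption made for this sketch).
def compact_dataset(dataset)
  dataset.each_with_object({}) do |(item_id, scores), compact|
    kept = scores.reject { |_, score| score.zero? }
    compact[item_id] = kept unless kept.empty?
  end
end
```

Any code reading the compacted dataset then has to treat a missing key as a score of 0.0.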
  • 36. Problem 5 It was too slow
  • 37. Solution Re-write the slow bits in C
  • 38. Details • RubyInline • Implemented Pearson • Monkey patched original Ruby methods • Very fast
  • 39.
    InlineC = Module.new do
      inline do |builder|
        builder.c '
          #include <math.h>
          #include "ruby.h"
          double c_sim_pearson(VALUE items) {
    (slide annotation: Ruby Object)
  • 40.
    InlineC = Module.new do
      inline do |builder|
        builder.c '
          #include <math.h>
          #include "ruby.h"
          double c_sim_pearson(VALUE items) {
    (slide annotation: No Floats :( )
  • 41. Hash Lookup
    if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) {
      prefs1_item = 0.0;
    } else {
      prefs1_item = NUM2DBL(prefs1_item_ob);
    }
  • 42. Conversion return num / den;
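For comparison with the C fragments, here is a pure-Ruby sketch of a Pearson correlation that mirrors two details visible on the slides: a missing hash key is treated as 0.0, and the result is a numerator divided by a denominator. Everything else is an assumption, including the two-argument signature and the choice to iterate over the union of keys.

```ruby
# prefs1, prefs2: hashes of id => score. Missing ids count as 0.0,
# mirroring the hash-lookup fallback on slide 41.
def sim_pearson(prefs1, prefs2)
  items = prefs1.keys | prefs2.keys
  n = items.size
  return 0.0 if n.zero?

  sum1 = sum2 = sum1_sq = sum2_sq = product_sum = 0.0
  items.each do |item|
    x = prefs1.fetch(item, 0.0).to_f
    y = prefs2.fetch(item, 0.0).to_f
    sum1 += x
    sum2 += y
    sum1_sq += x * x
    sum2_sq += y * y
    product_sum += x * y
  end

  num = product_sum - (sum1 * sum2 / n)
  variance_product = (sum1_sq - sum1**2 / n) * (sum2_sq - sum2**2 / n)
  # A constant vector has no variance, so the correlation is undefined;
  # this sketch returns 0.0 in that case.
  return 0.0 if variance_product <= 0

  num / Math.sqrt(variance_product)
end
```

The result ranges from -1.0 (opposite preferences) to 1.0 (perfectly correlated preferences).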
  • 43. Design Designs • Not too many relationships • Not too many ‘items’ • Similarity matrix for items, not users
  • 44. Changing data
  • 45. Scaling Even Further • K Means clustering • Split cluster by category
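The slide only names K-means clustering, so the following is a generic sketch of the standard iterative algorithm, not the talk's implementation. All names are hypothetical, points are assumed to be equal-length arrays of floats, and the initial centroids are supplied by the caller so that the run is deterministic.

```ruby
# Squared Euclidean distance between two equal-length numeric arrays.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Index of the centroid closest to the point.
def nearest_index(point, centroids)
  centroids.each_index.min_by { |i| squared_distance(point, centroids[i]) }
end

# Component-wise mean of a non-empty list of points.
def mean(points)
  points.transpose.map { |column| column.sum / column.size.to_f }
end

# Returns [assignments, centroids], where assignments[i] is the index of
# the cluster that points[i] belongs to.
def k_means(points, seeds, max_iterations = 100)
  centroids = seeds.map(&:dup)
  assignments = nil

  max_iterations.times do
    new_assignments = points.map { |point| nearest_index(point, centroids) }
    break if new_assignments == assignments # converged
    assignments = new_assignments

    centroids = centroids.each_with_index.map do |old, i|
      members = points.each_index.select { |j| assignments[j] == i }
                      .map { |j| points[j] }
      members.empty? ? old : mean(members) # keep empty clusters in place
    end
  end

  [assignments, centroids]
end
```

One plausible use, stated here as an assumption, is to cluster first and then compute similarities only within each cluster, which shrinks every similarity matrix.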
  • 46. Adding ratings
    ActiveRecord::Schema.define(:version => 1) do
      create_table "books", :force => true do |t|
        t.string "name"
        t.datetime "created_at"
        t.datetime "updated_at"
      end
      create_table "user_books", :force => true do |t|
        t.integer "user_id", :null => false
        t.integer "book_id", :null => false
        t.integer "rating", :default => 0
      end
      create_table "users", :force => true do |t|
        t.string "name"
        t.datetime "created_at"
        t.datetime "updated_at"
      end
    end
  • 47.
    class User < ActiveRecord::Base
      has_many :user_books
      has_many :books, :through => :user_books
      acts_as_recommendable :books, :through => :user_books, :score => :rating
    end
  • 48. That’s it
  • 49. Improvements? • Better API • Perform calculations over a cluster (EC2) using Map/Nanite
  • 50.
    class AARN < Nanite::Actor
      expose :sim_pearson
      def sim_pearson(item1, item2)
        Optimizations.c_sim_pearson(item1, item2)
      end
    end
  • 51. Questions? http://eribium.org/blog twitter : maccman email/jabber: maccman@gmail.com http://github.com/maccman/acts_as_recommendable http://rubyurl.com/kUpk