Recommendations
  in Production
     Alex MacCaw
Netflix Prize
Amazon.com
Facebook
Last.fm
StumbleUpon
Google Suggest
iTunes
Rotten Tomatoes
Yelp
Google Search
Chicken or Egg
• Google Reader
• IMDB
Acts As
Recommendable
Types of
   recommendations

• Content Based
• User Based
• Item Based
Programming Collective Intelligence
Has Many Through Relationship
[Diagram: User has many Books through UserBooks; User and Book each have many UserBooks; UserBooks can have a score (rating)]
User

class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books
end
Gives you


User#similar_users
User#recommended_books
Book#similar_books
The algorithms
• Manhattan Distance
• Euclidean distance
• Cosine
• Pearson correlation coefficient
• Jaccard
• Levenshtein
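
Three of the measures listed above, as a minimal Ruby sketch. The method names are made up for this example and are not the plugin's API; the first two assume both arrays hold ratings for the same items in the same order.

```ruby
# Hypothetical helpers, not taken from acts_as_recommendable.

# Euclidean distance turned into a similarity in (0, 1]; 1.0 means identical.
def euclidean_similarity(a, b)
  distance = Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  1.0 / (1.0 + distance)
end

# Cosine of the angle between two rating vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Jaccard index over two lists of item ids: shared ids / all ids.
def jaccard_similarity(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end
```

Jaccard only looks at which items two users share, so it suits data without ratings; the other two need scores.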
How does it work?
Strategy

• Map data into Euclidean Space
• Calculate similarity
• Use similarities to recommend
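
The three steps above, sketched in Ruby with the ratings from the example table later in the talk. `RATINGS`, `similarity` and `similar_users` are illustrative names, and the distance-to-similarity formula is one common choice, not necessarily what the plugin does.

```ruby
# Step 1: each user becomes a point [The Black Knight, John Tucker Must Die].
RATINGS = {
  "James"  => [4, 5],
  "Jonah"  => [3, 2],
  "George" => [5, 3],
  "Alex"   => [4, 2]
}

# Step 2: similarity from the straight-line distance between two points.
def similarity(a, b)
  1.0 / (1.0 + Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 }))
end

# Step 3: rank every other user by similarity to the given one.
def similar_users(name, ratings = RATINGS)
  ratings.reject { |other, _| other == name }
         .map { |other, point| [other, similarity(ratings[name], point)] }
         .sort_by { |_, score| -score }
end

similar_users("Alex").map(&:first) # => ["Jonah", "George", "James"]
```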
          The Black Knight   John Tucker Must Die
James            4                    5
Jonah            3                    2
George           5                    3
Alex             4                    2
[Plot: x-axis John Tucker Must Die (0 to 5.00), y-axis The Black Knight (0 to 5.00)]
{
    1 => {              # item id
       1 => 1.0,        # user id => score
       2 => 0.0,
       ...
    },
    ...
}
[[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
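
A list of [id, score] pairs like this one can be turned into a short "most similar" list by taking the highest scores. A small sketch, assuming the second element of each pair is the similarity; `top_ids` is a hypothetical helper.

```ruby
# Hypothetical helper: ids of the n highest-scoring [id, score] pairs.
def top_ids(pairs, n = 2)
  pairs.max_by(n) { |_, score| score }.map(&:first)
end

pairs = [[1, 0.5554], [2, 0.888], [3, 0.8843]]
top_ids(pairs) # => [2, 3]
```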
Problem 1

It was far too slow to calculate on the fly
                 (obvious)
SELECT * FROM "users" WHERE ("users"."id" = 2)
SELECT * FROM "books"
SELECT * FROM...
Solution
      Cache the dataset
        Build offline


rake recommendations:build
SELECT * FROM "user_books" WHERE ("user_books".user_id = 2)
SELECT * FROM "books" WHER...
Problem 2

Fetching the dataset took too
 long since it was so massive
Solution


Split up the cache by item
Rails.cache.write(
  "aar_books_1", scores
)
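
A rough sketch of splitting the cache by item. `MemoryCache` is a stand-in invented for this example so it runs without Rails; it only mimics `write` and `read`. `cache_key` and `write_scores` are hypothetical names, and the key format is inferred from the one key on the slide.

```ruby
# Stand-in for Rails.cache (assumption for this sketch only).
class MemoryCache
  def initialize
    @store = {}
  end

  def write(key, value)
    @store[key] = value
  end

  def read(key)
    @store[key]
  end
end

# Key format follows the slide: "aar_books_1" for book 1.
def cache_key(table, id)
  "aar_#{table}_#{id}"
end

# One cache entry per item instead of one entry for the whole dataset.
def write_scores(cache, table, dataset)
  dataset.each { |item_id, scores| cache.write(cache_key(table, item_id), scores) }
end
```

A lookup for one book then reads one small entry rather than the whole dataset.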
Problem 3

The dataset was so big it
     crashed Ruby!
Solution


Get rid of ActiveRecord

Only deal with integers
items = options[:on_class].connection.select_values(
  "SELECT id from #{options[:on_class].table_name}"
).collect(&:to_i)
Problem 4


It still crashed Ruby!
{
    1 => {
       1 => 1.0,
       2 => 0.0,
       ...
    },
    ...
}
Solution

Remove unnecessary
 cruft from dataset
{
    1 => {
       1 => 1.0,
       ...
    },
    ...
}
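
The before and after hashes suggest that zero scores were dropped. A minimal sketch under that assumption; `compact_dataset` is a hypothetical name.

```ruby
# Drop 0.0 scores so only real ratings stay in memory
# (an assumed reading of the slides, not the plugin's code).
def compact_dataset(dataset)
  dataset.transform_values do |scores|
    scores.reject { |_, score| score.zero? }
  end
end

dense = { 1 => { 1 => 1.0, 2 => 0.0 } }
compact_dataset(dense) # => { 1 => { 1 => 1.0 } }
```

Anything reading the compacted hash has to treat a missing key as 0.0.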
Problem 5

It was too slow
Solution


Re-write the slow bits in C
Details

• RubyInline
• Implemented Pearson
• Monkey patched original Ruby methods
• Very fast
InlineC = Module.new do
    inline do |builder|
      builder.c '
      #include <math.h>
      #include "ruby.h"
...
InlineC = Module.new do
        inline do |builder|
          builder.c '
          #include <math.h>
          #include q...
Hash Lookup

if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) {
  prefs1_item = 0.0;
} else {
  prefs1_item = NUM2DBL(prefs1_item_ob);
}
Conversion

return num / den;
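
For comparison, a plain-Ruby sketch of a Pearson score over two score hashes. This is not the plugin's C code; the signature and variable names are assumptions. Missing keys count as 0.0, like the failed hash lookup above.

```ruby
# prefs1 and prefs2 map id => score; ids is the list of ids to compare over.
def sim_pearson(prefs1, prefs2, ids)
  n = ids.size
  return 0.0 if n.zero?

  # A missing key is treated as a score of 0.0.
  xs = ids.map { |id| prefs1.fetch(id, 0.0) }
  ys = ids.map { |id| prefs2.fetch(id, 0.0) }

  sum_x = xs.sum
  sum_y = ys.sum
  sum_xy = xs.zip(ys).sum { |x, y| x * y }
  sum_x2 = xs.sum { |x| x * x }
  sum_y2 = ys.sum { |y| y * y }

  num = sum_xy - (sum_x * sum_y / n.to_f)
  den_sq = (sum_x2 - sum_x**2 / n.to_f) * (sum_y2 - sum_y**2 / n.to_f)
  return 0.0 unless den_sq.positive? # no variation, score undefined

  num / Math.sqrt(den_sq)
end
```

The result runs from -1.0 (opposite tastes) to 1.0 (same tastes).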
Design Designs

• Not too many relationships
• Not too many ‘items’
• Similarity matrix for items, not users
Changing data
Scaling Even Further


• K Means clustering
• Split cluster by category
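
A bare-bones k-means sketch, to show the idea of grouping items before comparing them. All names are hypothetical, the starting centroids are passed in by the caller, and nothing here comes from the plugin.

```ruby
# Index of the centroid closest to the point (squared Euclidean distance).
def nearest_index(point, centroids)
  centroids.each_index.min_by do |i|
    point.zip(centroids[i]).sum { |x, c| (x - c)**2 }
  end
end

# Coordinate-wise average of a list of points.
def mean_point(points)
  points.transpose.map { |column| column.sum / column.size.to_f }
end

# Returns { centroid index => points assigned to it }.
def k_means(points, centroids, iterations = 10)
  iterations.times do
    groups = points.group_by { |point| nearest_index(point, centroids) }
    updated = centroids.each_with_index.map do |centroid, i|
      groups[i] ? mean_point(groups[i]) : centroid # empty cluster stays put
    end
    break if updated == centroids

    centroids = updated
  end
  points.group_by { |point| nearest_index(point, centroids) }
end

points = [[1, 1], [1, 2], [8, 8], [9, 8]]
k_means(points, [[0, 0], [10, 10]])
# => { 0 => [[1, 1], [1, 2]], 1 => [[8, 8], [9, 8]] }
```

Similarities would then only need to be calculated inside each cluster.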
Adding ratings
ActiveRecord::Schema.define(:version => 1) do
  create_table "books", :force => true do |t|
    t.s...
class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books, :score => :rating
end
That’s it
Improvements?


• Better API
• Perform calculations over a cluster (EC2)
  using Map/Nanite
class AARN < Nanite::Actor
  expose :sim_pearson

  def sim_pearson(item1, item2)
    Optimizations.c_sim_pearson(item1, item2)
  end
end
Questions?
              http://eribium.org/blog

           twitter : maccman
       email/jabber: maccman@gmail.com



h...
RubyManor talk on using Recommendation systems in production.

Acts As Recommendable

  1. Recommendations in Production Alex MacCaw
  2. Netflix Prize
  3. Amazon.com Facebook Last.fm StumbleUpon Google Suggest iTunes Rotten Tomatoes Yelp
  4. Google Search
  5. Chicken or Egg
  6. • Google Reader • IMDB
  7. Acts As Recommendable
  8. Types of recommendations • Content Based • User Based • Item Based
  9. Programming Collective Intelligence
  10. Has Many Through Relationship
  11. User Has Many Through Book Has Many Has Many UserBooks Can have score (rating)
  12. User class User < ActiveRecord::Base has_many :user_books has_many :books, :through => :user_books acts_as_recommendable :books, :through => :user_books end
  13. Gives you User#similar_users User#recommended_books Book#similar_books
  14. The algorithms • Manhattan Distance • Euclidean distance • Cosine • Pearson correlation coefficient • Jaccard • Levenshtein
  15. How does it work?
  16. Strategy • Map data into Euclidean Space • Calculate similarity • Use similarities to recommend
  17. The Black Knight, John Tucker Must Die: James 4 5, Jonah 3 2, George 5 3, Alex 4 2
  18. [Plot: x-axis John Tucker Must Die (0 to 5.00), y-axis The Black Knight (0 to 5.00)]
  19. [Plot: x-axis John Tucker Must Die (0 to 5.00), y-axis The Black Knight (0 to 5.00)]
  20. item id { user id 1 => { 1 => 1.0, 2 => 0.0, score ... }, ... }
  21. [[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
  22. Problem 1 It was far too slow to calculate on the fly (obvious)
  23. SELECT * FROM "users" WHERE ("users"."id" = 2) SELECT * FROM "books" SELECT * FROM "users" SELECT "user_books".* FROM "user_books" WHERE ("user_books".user_id IN (1,2,3,4,5,6,7,8,9,10)) SELECT * FROM "books" WHERE ("books"."id" IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5)) SELECT * FROM "books" WHERE ("books"."id" IN (20,3,19,6)) All books All user_books
  24. Solution Cache the dataset Build offline rake recommendations:build
  25. SELECT * FROM "user_books" WHERE ("user_books".user_id = 2) SELECT * FROM "books" WHERE ("books"."id" = 5) SELECT * FROM "books" WHERE ("books"."id" = 4) SELECT * FROM "books" WHERE ("books"."id" = 8) SELECT * FROM "books" WHERE ("books"."id" = 7) SELECT * FROM "books" WHERE ("books"."id" = 2) SELECT * FROM "books" WHERE ("books"."id" = 1)
  26. Problem 2 Fetching the dataset took too long since it was so massive
  27. Solution Split up the cache by item
  28. Rails.cache.write( "aar_books_1", scores )
  29. Problem 3 The dataset was so big it crashed Ruby!
  30. Solution Get rid of ActiveRecord Only deal with integers
  31. items = options[:on_class].connection.select_values( "SELECT id from #{options[:on_class].table_name}" ).collect(&:to_i)
  32. Problem 4 It still crashed Ruby!
  33. { 1 => { 1 => 1.0, 2 => 0.0, ... }, ... }
  34. Solution Remove unnecessary cruft from dataset
  35. { 1 => { 1 => 1.0, ... }, ... }
  36. Problem 5 It was too slow
  37. Solution Re-write the slow bits in C
  38. Details • RubyInline • Implemented Pearson • Monkey patched original Ruby methods • Very fast
  39. InlineC = Module.new do inline do |builder| builder.c ' #include <math.h> #include "ruby.h" double c_sim_pearson(VALUE items) { Ruby Object
  40. InlineC = Module.new do inline do |builder| builder.c ' #include <math.h> #include "ruby.h" double c_sim_pearson(VALUE items) { No Floats :(
  41. Hash Lookup if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) { prefs1_item = 0.0; } else { prefs1_item = NUM2DBL(prefs1_item_ob); }
  42. Conversion return num / den;
  43. Design Designs • Not too many relationships • Not too many ‘items’ • Similarity matrix for items, not users
  44. Changing data
  45. Scaling Even Further • K Means clustering • Split cluster by category
  46. Adding ratings ActiveRecord::Schema.define(:version => 1) do create_table "books", :force => true do |t| t.string "name" t.datetime "created_at" t.datetime "updated_at" end create_table "user_books", :force => true do |t| t.integer "user_id", :null => false t.integer "book_id", :null => false t.integer "rating", :default => 0 end create_table "users", :force => true do |t| t.string "name" t.datetime "created_at" t.datetime "updated_at" end end
  47. class User < ActiveRecord::Base has_many :user_books has_many :books, :through => :user_books acts_as_recommendable :books, :through => :user_books, :score => :rating end
  48. That’s it
  49. Improvements? • Better API • Perform calculations over a cluster (EC2) using Map/Nanite
  50. class AARN < Nanite::Actor expose :sim_pearson def sim_pearson(item1, item2) Optimizations.c_sim_pearson(item1, item2) end end
  51. Questions? http://eribium.org/blog twitter : maccman email/jabber: maccman@gmail.com http://github.com/maccman/acts_as_recommendable http://rubyurl.com/kUpk