Acts As Recommendable

RubyManor talk on using Recommendation systems in production.

  • Awesome plugin.

    I created a basic, more simple similarity matching plugin.

    http://www.freezzo.com/2009/06/04/acts_as_similar-a-basic-similarity-activerecord-plugin/

Acts As Recommendable Presentation Transcript

  • Recommendations in Production Alex MacCaw
  • Netflix Prize
  • Amazon.com, Facebook, Last.fm, StumbleUpon, Google Suggest, iTunes, Rotten Tomatoes, Yelp
  • Google Search
  • Chicken or Egg
  • Google Reader • IMDB
  • Acts As Recommendable
  • Types of recommendations • Content Based • User Based • Item Based
  • Programming Collective Intelligence
  • Has Many Through Relationship
  • User has many Books through UserBooks. User has many UserBooks; Book has many UserBooks. A UserBook can have a score (rating).
  • User
    class User < ActiveRecord::Base
      has_many :user_books
      has_many :books, :through => :user_books
      acts_as_recommendable :books, :through => :user_books
    end
  • Gives you: User#similar_users, User#recommended_books, Book#similar_books
  • The algorithms • Manhattan Distance • Euclidean distance • Cosine • Pearson correlation coefficient • Jaccard • Levenshtein
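The slide names these measures without defining them. As a rough illustration, here is a minimal Ruby sketch of four of them, assuming each party's ratings are a Hash of item id => score. The `Similarity` module and its method names are hypothetical and are not the plugin's API.

```ruby
# Hypothetical helpers, not part of acts_as_recommendable's API.
# Each method takes two Hashes mapping item id => numeric score.
module Similarity
  module_function

  # Item ids rated by both parties.
  def shared_keys(a, b)
    a.keys & b.keys
  end

  # Manhattan distance: sum of absolute differences over shared items.
  def manhattan(a, b)
    shared_keys(a, b).sum { |k| (a[k] - b[k]).abs }
  end

  # Euclidean distance: straight-line distance over shared items.
  def euclidean(a, b)
    Math.sqrt(shared_keys(a, b).sum { |k| (a[k] - b[k])**2 })
  end

  # Cosine of the angle between the rating vectors (missing ratings count as 0).
  def cosine(a, b)
    dot = shared_keys(a, b).sum { |k| a[k] * b[k] }
    norm_a = Math.sqrt(a.values.sum { |v| v**2 })
    norm_b = Math.sqrt(b.values.sum { |v| v**2 })
    return 0.0 if norm_a.zero? || norm_b.zero?

    dot / (norm_a * norm_b)
  end

  # Jaccard index: overlap of the rated item sets, ignoring the scores.
  def jaccard(a, b)
    union = a.keys | b.keys
    return 0.0 if union.empty?

    shared_keys(a, b).size.to_f / union.size
  end
end
```

Manhattan and Euclidean are distances (smaller means closer), while cosine and Jaccard are similarities (larger means closer), so they cannot be swapped without flipping the sort order.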
  • How does it work?
  • Strategy • Map data into Euclidean Space • Calculate similarity • Use similarities to recommend
  •         The Black Knight  John Tucker Must Die
    James   4                 5
    Jonah   3                 2
    George  5                 3
    Alex    4                 2
  • [Plot, shown on two consecutive slides: x-axis John Tucker Must Die, y-axis The Black Knight, both from 0 to 5.00]
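To make the "map into Euclidean space, then calculate similarity" steps concrete, the sketch below treats each user in the ratings table as a point with one coordinate per film and measures how far apart the points are. The `1 / (1 + distance)` conversion is an assumption (a common choice, not confirmed by the slides), and the method names are hypothetical.

```ruby
# Ratings from the slide: user => { film => score }.
RATINGS = {
  "James"  => { "The Black Knight" => 4, "John Tucker Must Die" => 5 },
  "Jonah"  => { "The Black Knight" => 3, "John Tucker Must Die" => 2 },
  "George" => { "The Black Knight" => 5, "John Tucker Must Die" => 3 },
  "Alex"   => { "The Black Knight" => 4, "John Tucker Must Die" => 2 }
}.freeze

# Euclidean distance between two users over the films both have rated.
def distance(ratings, user_a, user_b)
  a = ratings.fetch(user_a)
  b = ratings.fetch(user_b)
  shared = a.keys & b.keys
  Math.sqrt(shared.sum { |film| (a[film] - b[film])**2 })
end

# Assumed conversion: distance 0 becomes similarity 1.0,
# and larger distances shrink toward 0.
def similarity(ratings, user_a, user_b)
  1.0 / (1.0 + distance(ratings, user_a, user_b))
end

# Every other user, ordered from most to least similar.
def nearest(ratings, user)
  (ratings.keys - [user])
    .map { |other| [other, similarity(ratings, user, other)] }
    .sort_by { |_, score| -score }
end
```

With this data, `nearest(RATINGS, "Alex")` ranks Jonah first (distance 1), then George, then James (distance 3).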
  • Dataset layout: { item id => { user id => score } }, for example { 1 => { 1 => 1.0, 2 => 0.0, ... }, ... }
  • [[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
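The last step of the strategy, using similarities to recommend, could look roughly like the following. This is a sketch of a similarity-weighted average over `[id, similarity]` pairs like those on the slide; the `recommend` method and its argument shapes are assumptions, not the plugin's implementation.

```ruby
# similarities: { item_id => [[other_item_id, similarity], ...] }
# user_scores:  { item_id => score } for a single user
# Returns up to `limit` pairs of [item_id, predicted_score], best first.
def recommend(similarities, user_scores, limit = 5)
  totals  = Hash.new(0.0) # similarity-weighted sum of the user's scores
  weights = Hash.new(0.0) # sum of similarities, used to normalise

  user_scores.each do |item_id, score|
    similarities.fetch(item_id, []).each do |other_id, sim|
      # Skip items the user already has, and items with no positive similarity.
      next if user_scores.key?(other_id) || sim <= 0

      totals[other_id]  += sim * score
      weights[other_id] += sim
    end
  end

  totals
    .map { |id, total| [id, total / weights[id]] }
    .sort_by { |_, predicted| -predicted }
    .first(limit)
end
```

Dividing by the summed similarities keeps an item from ranking highly just because it is weakly similar to many things the user rated.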
  • Problem 1 It was far too slow to calculate on the fly (obvious)
  • SELECT * FROM "users" WHERE ("users"."id" = 2)
    SELECT * FROM "books"
    SELECT * FROM "users"
    SELECT "user_books".* FROM "user_books" WHERE ("user_books".user_id IN (1,2,3,4,5,6,7,8,9,10))
    SELECT * FROM "books" WHERE ("books"."id" IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5))
    SELECT * FROM "books" WHERE ("books"."id" IN (20,3,19,6))
    Slide annotations: All books, All user_books
  • Solution Cache the dataset. Build offline: rake recommendations:build
  • SELECT * FROM "user_books" WHERE ("user_books".user_id = 2)
    SELECT * FROM "books" WHERE ("books"."id" = 5)
    SELECT * FROM "books" WHERE ("books"."id" = 4)
    SELECT * FROM "books" WHERE ("books"."id" = 8)
    SELECT * FROM "books" WHERE ("books"."id" = 7)
    SELECT * FROM "books" WHERE ("books"."id" = 2)
    SELECT * FROM "books" WHERE ("books"."id" = 1)
  • Problem 2 Fetching the dataset took too long since it was so massive
  • Solution Split up the cache by item
  • Rails.cache.write("aar_books_1", scores)
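Splitting the cache by item means a request only loads the scores for the one item it needs. The sketch below shows the idea with a tiny in-memory stand-in for `Rails.cache` so that it runs outside Rails. `MemoryCache`, `cache_key`, `write_scores` and `read_scores` are hypothetical names, and the key layout is inferred from the slide's `"aar_books_1"`.

```ruby
# Stand-in for Rails.cache so the sketch runs outside Rails.
# It mimics only the read/write calls used below.
class MemoryCache
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
    true
  end

  def read(key)
    @data[key]
  end
end

# Key layout inferred from the slide's "aar_books_1".
def cache_key(table_name, item_id)
  "aar_#{table_name}_#{item_id}"
end

# Offline step: store one entry per item instead of one huge blob.
def write_scores(cache, table_name, all_scores)
  all_scores.each do |item_id, scores|
    cache.write(cache_key(table_name, item_id), scores)
  end
end

# Online step: fetch only the entry for the item being viewed.
def read_scores(cache, table_name, item_id)
  cache.read(cache_key(table_name, item_id)) || []
end
```

In a Rails app the `cache` argument would presumably be `Rails.cache`, with `write_scores` called from the offline build task.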
  • Problem 3 The dataset was so big it crashed Ruby!
  • Solution Get rid of ActiveRecord. Only deal with integers.
  • items = options[:on_class].connection.select_values(
      "SELECT id from #{options[:on_class].table_name}"
    ).collect(&:to_i)
  • Problem 4 It still crashed Ruby!
  • { 1 => { 1 => 1.0, 2 => 0.0, ... }, ... }
  • Solution Remove unnecessary cruft from dataset
  • { 1 => { 1 => 1.0, ... }, ... }
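Comparing the two dataset slides, the difference is that the `2 => 0.0` entry has gone, which suggests the "cruft" was zero scores. A sketch of that pruning step, assuming a missing entry can be read back as 0.0 later (the method name is hypothetical, and dropping items left with no scores is also an assumption):

```ruby
# dataset: { item_id => { user_id => score } }
# Returns a new Hash without zero scores; the input is left untouched.
def prune_zero_scores(dataset)
  dataset.each_with_object({}) do |(item_id, scores), pruned|
    kept = scores.reject { |_, score| score.zero? }
    # Assumption: an item with no remaining scores is dropped entirely.
    pruned[item_id] = kept unless kept.empty?
  end
end
```

For sparse data, where most users have not rated most items, this removes the bulk of the entries.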
  • Problem 5 It was too slow
  • Solution Re-write the slow bits in C
  • Details • RubyInline • Implemented Pearson • Monkey patched original Ruby methods • Very fast
  • InlineC = Module.new do
      inline do |builder|
        builder.c '
          #include <math.h>
          #include "ruby.h"
          double c_sim_pearson(VALUE items) {
    Slide annotation: Ruby Object
  • InlineC = Module.new do
      inline do |builder|
        builder.c '
          #include <math.h>
          #include "ruby.h"
          double c_sim_pearson(VALUE items) {
    Slide annotation: No Floats :(
  • Hash Lookup
    if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) {
      prefs1_item = 0.0;
    } else {
      prefs1_item = NUM2DBL(prefs1_item_ob);
    }
  • Conversion return num / den;
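The slides show only fragments of the C routine: a hash lookup that falls back to 0.0, and a final `num / den`. For reference, a pure-Ruby Pearson correlation with the same two features might look like this. It is a sketch under those assumptions, not the plugin's code, and the three-argument signature is hypothetical.

```ruby
# prefs1, prefs2: { id => score }; items: ids to compare over.
# A missing score is read as 0.0, mirroring the C fragment's fallback.
def sim_pearson(prefs1, prefs2, items)
  n = items.size
  return 0.0 if n.zero?

  sum1 = sum2 = sum1_sq = sum2_sq = p_sum = 0.0
  items.each do |item|
    a = prefs1.fetch(item, 0.0).to_f
    b = prefs2.fetch(item, 0.0).to_f
    sum1 += a
    sum2 += b
    sum1_sq += a * a
    sum2_sq += b * b
    p_sum += a * b
  end

  num = p_sum - (sum1 * sum2 / n)
  spread = (sum1_sq - sum1**2 / n) * (sum2_sq - sum2**2 / n)
  # Zero (or rounding-induced negative) spread means no usable correlation.
  return 0.0 if spread <= 0

  num / Math.sqrt(spread)
end
```

The result ranges from -1.0 (opposite tastes) through 0.0 (no linear relationship) to 1.0 (identical trend), regardless of whether one party rates consistently higher than the other.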
  • Design Designs • Not too many relationships • Not too many ‘items’ • Similarity matrix for items, not users
  • Changing data
  • Scaling Even Further • K Means clustering • Split cluster by category
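The slide only names K-means; it does not say how the plugin would apply it. As a reminder of what the algorithm does, here is a minimal sketch that takes the starting centroids as an argument so that the result is deterministic. All method names are hypothetical.

```ruby
# points and centroids are Arrays of equal-length numeric Arrays.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Group points by the index of their nearest centroid.
def assign(points, centroids)
  points.group_by do |point|
    centroids.each_index.min_by { |i| squared_distance(point, centroids[i]) }
  end
end

# Component-wise mean of a non-empty list of points.
def mean(points)
  points.transpose.map { |dim| dim.sum.to_f / dim.size }
end

# Alternate assignment and centroid updates until nothing moves.
# Returns [centroids, clusters], where clusters is { centroid index => points }.
def k_means(points, centroids, max_iterations = 100)
  clusters = {}
  max_iterations.times do
    clusters = assign(points, centroids)
    updated = centroids.each_with_index.map do |old, i|
      clusters.key?(i) ? mean(clusters[i]) : old # an empty cluster stays put
    end
    break if updated == centroids

    centroids = updated
  end
  [centroids, clusters]
end
```

Once items are grouped this way, similarities only need to be computed within a cluster rather than across the whole dataset, which is presumably the scaling benefit the slide has in mind.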
  • Adding ratings
    ActiveRecord::Schema.define(:version => 1) do
      create_table "books", :force => true do |t|
        t.string   "name"
        t.datetime "created_at"
        t.datetime "updated_at"
      end

      create_table "user_books", :force => true do |t|
        t.integer "user_id", :null => false
        t.integer "book_id", :null => false
        t.integer "rating",  :default => 0
      end

      create_table "users", :force => true do |t|
        t.string   "name"
        t.datetime "created_at"
        t.datetime "updated_at"
      end
    end
  • class User < ActiveRecord::Base
      has_many :user_books
      has_many :books, :through => :user_books
      acts_as_recommendable :books, :through => :user_books, :score => :rating
    end
  • That’s it
  • Improvements? • Better API • Perform calculations over a cluster (EC2) using Map/Nanite
  • class AARN < Nanite::Actor
      expose :sim_pearson

      def sim_pearson(item1, item2)
        Optimizations.c_sim_pearson(item1, item2)
      end
    end
  • Questions?
    http://eribium.org/blog
    twitter: maccman
    email/jabber: maccman@gmail.com
    http://github.com/maccman/acts_as_recommendable
    http://rubyurl.com/kUpk