
Recommendation engines

  1. RECOMMENDING STUFF IN RUBY (there’s life beyond the CRUD)
     @herval
  2. ABOUT ME
     ‣ Making software since the 90’s
     ‣ Retired “startup guy”
     ‣ Pivot
  3. ARTIFICIAL INTELLIGENCE
     "the design of systems that perceive their environment and take actions to maximize its chances of success" - John McCarthy (1956)
  4. ARTIFICIAL INTELLIGENCE
     ‣ End goal: achieve superhuman intelligence in a machine
     ‣ The "golden dream" of the 80s
     ‣ The “holy grail” of the Singularitarians
  5. ZOMG ROBOTS!
  6. WELL, SORT OF.
  7. “BUT WHAT SHOULD I USE AI FOR...?”
     ‣ Genetic algorithms: pick the most cost-effective combination of flights between NYC and Tashkent*
     ‣ Natural Language Processing: "Siri, call me sexy!" (iPhone)
     ‣ Machine Vision: unlock your Android phone just by looking at it
     ‣ Neural networks: "automatically detect and flag NSFW pictures" (Flickr)
     ‣ Classifiers: "I got a high fever and my knees hurt. Am I dying?" (WebMD)
     * that’s in Uzbekistan, if you’re asking
  8. But that’s all SO COMPLICATED… Isn’t there anything a bit more practical?
     - some pragmatic programmer
  9. ARTIFICIAL “INTELLIGENCE”
     ‣ Basic techniques
     ‣ Far from “intelligent”, still very useful
     ‣ Data clustering: “classify blogs in categories based on their content” (Flipboard)
     ‣ Recommendation engines: “people who watched The X-Files also liked...”
  10. RECOMMENDATION ENGINES (Finally!)
  11. IN A NUTSHELL...
      ‣ Collect ratings/preferences from users (e.g. reviews, likes)
      ‣ Compare people to other people based on what they like
      ‣ Offer stuff similar people already liked
  12. STEP 1: BUILD A DATASET
      Boil preferences down to numbers*
      ‣ User liked a game = 0.5 point
      ‣ User purchased a game = 1 point
      ‣ User played a game = 2 points
      ‣ User reviewed a game = 3 points
      ‣ User liked, purchased, played and reviewed a game = 6.5 points
      * Resist the urge to use 1-5 star ratings
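The point scheme above could be turned into code along these lines. This is only a sketch: `INTERACTION_POINTS` and `preference_score` are hypothetical names that do not appear in the talk, and the weights are simply the ones listed on the slide.

```ruby
# Hypothetical weights, copied from the slide; tune them for your own domain.
INTERACTION_POINTS = {
  liked:     0.5,
  purchased: 1.0,
  played:    2.0,
  reviewed:  3.0
}.freeze

# Sums the points for every distinct interaction a user had with one game.
# Unknown interaction types raise KeyError instead of silently scoring 0.
def preference_score(interactions)
  interactions.uniq.sum { |kind| INTERACTION_POINTS.fetch(kind) }
end

puts preference_score([:liked, :purchased])                      # 1.5
puts preference_score([:liked, :purchased, :played, :reviewed])  # 6.5
```

Counting each interaction type once per game is a design choice made for this sketch; a real system might instead reward repeated plays.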
  13. STEP 1: BUILD A DATASET

          [
            Entity.new("Lisa", { "Prince of Persia" => 2.5, "Doom" => 3.5, "Castle Wolfenstein" => 3.0,
                                 "Rise of the Triad" => 3.5, "Commander Keen" => 2.5, "Duke Nukem" => 3.0 }),
            Entity.new("Larry", { "Prince of Persia" => 3.0, "Doom" => 3.5, "Castle Wolfenstein" => 1.5,
                                  "Rise of the Triad" => 5.0, "Duke Nukem" => 3.0, "Commander Keen" => 3.5 }),
            Entity.new("Robert", { "Prince of Persia" => 2.5, "Doom" => 3.0, "Rise of the Triad" => 3.5,
                                   "Duke Nukem" => 4.0 }),
            Entity.new("Claudia", { "Doom" => 3.5, "Castle Wolfenstein" => 3.0, "Duke Nukem" => 4.5,
                                    "Rise of the Triad" => 4.0, "Commander Keen" => 2.5 }),
            Entity.new("Mark", { "Prince of Persia" => 3.0, "Doom" => 4.0, "Castle Wolfenstein" => 2.0,
                                 "Rise of the Triad" => 3.0, "Duke Nukem" => 3.0, "Commander Keen" => 2.0 }),
            Entity.new("Jane", { "Prince of Persia" => 3.0, "Doom" => 4.0, "Duke Nukem" => 3.0,
                                 "Rise of the Triad" => 5.0, "Commander Keen" => 3.5 }),
            Entity.new("John", { "Doom" => 4.5, "Commander Keen" => 1.0, "Rise of the Triad" => 4.0 })
          ]
  14. STEP 2: COMPARE PEOPLE
      ‣ Compare each person to every other person, generating a similarity score
      ‣ Euclidean distance between each pair of ratings (formula image not captured in this transcript)
      ‣ Many other distance calculation algorithms exist: linear distance, Jaccard, Manhattan, Tanimoto, etc.
      ‣ Better algorithm = more interesting results
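As a sketch of the score used in this step, and assuming it matches what the `distance` function on the next slide computes (an inverted sum of squared differences over the set $G$ of games both people rated, without the square root of a true Euclidean distance), the similarity between persons $a$ and $b$ can be written as:

```latex
\[
  \mathrm{sim}(a, b) = \frac{1}{1 + \sum_{g \in G} \left(a_g - b_g\right)^2},
  \qquad \mathrm{sim}(a, b) = 0 \ \text{if } G = \emptyset
\]

% Worked example with John and Mark from the Step 1 dataset
% (shared games: Doom, Commander Keen, Rise of the Triad):
\[
  (4.5 - 4.0)^2 + (1.0 - 2.0)^2 + (4.0 - 3.0)^2 = 0.25 + 1 + 1 = 2.25,
  \qquad \mathrm{sim} = \frac{1}{1 + 2.25} \approx 0.31
\]
```

That value lines up with the 30% match reported for Mark in Step 3. The symbols $a_g$, $b_g$ and $G$ are notation introduced here, not taken from the slides.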
  15. STEP 2: COMPARE PEOPLE

          # Returns a similarity score between person1 and person2, derived from the
          # Euclidean distance between their ratings: 1.0 means identical ratings on
          # the games they share, values near 0 mean very different tastes
          def distance(person1, person2)
            rated_by_both = person1.ratings.select { |game, _score| person2.ratings[game] }

            # if they have no ratings in common, return 0
            return 0.0 if rated_by_both.empty?

            # add up the squares of all the differences
            sum_of_squares = 0.0
            rated_by_both.each do |game, score|
              sum_of_squares += (score - person2.ratings[game]) ** 2
            end

            1.0 / (1.0 + sum_of_squares)
          end
  16. STEP 3: SIMILAR USERS
      Grab the n users with the highest level of similarity (in other words, the people closest to you, according to the distance algorithm from step 2)
  17. STEP 3: SIMILAR USERS

          # Returns up to 6 best matching people (most similar preferences), best first;
          # note that [0..5] keeps six entries
          def top_matches(person, all_ratings)
            other_people = all_ratings.select { |person2| person2.name != person.name }
            other_people.collect do |other_person|
              [
                other_person,
                distance(person, other_person) # change this to use other algorithms
              ]
            end.sort_by { |sim| sim[1] }.reverse[0..5]
          end

          # People similar to John:
          # Mark (30% match)
          # Robert (28% match)
          # Claudia (23% match)
          # Lisa (22% match)
          # Jane (11% match)
          # Larry (10% match)
  18. STEP 3: SIMILAR USERS (same code and output as slide 17)
      ! Achievement unlocked: “people you should follow: John, Mary”
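The comment "change this to use other algorithms" in `top_matches` invites swapping the metric. As one illustration, here is a minimal sketch of a Manhattan-based similarity, one of the alternatives named in Step 2. `manhattan_similarity` is a hypothetical name, and the `Entity` struct below is only an assumed stand-in for the talk's class (a name plus a ratings hash).

```ruby
# Minimal stand-in for the talk's Entity class (assumed shape: name + ratings hash).
Entity = Struct.new(:name, :ratings)

# Similarity based on Manhattan distance: sum the absolute rating differences
# over the games both people rated, then invert so that 1.0 means "identical"
# and values near 0 mean "very different". Returns 0.0 with no games in common.
def manhattan_similarity(person1, person2)
  shared_games = person1.ratings.keys & person2.ratings.keys
  return 0.0 if shared_games.empty?

  total_difference = shared_games.sum do |game|
    (person1.ratings[game] - person2.ratings[game]).abs
  end
  1.0 / (1.0 + total_difference)
end

john = Entity.new("John", { "Doom" => 4.5, "Commander Keen" => 1.0, "Rise of the Triad" => 4.0 })
mark = Entity.new("Mark", { "Doom" => 4.0, "Commander Keen" => 2.0, "Rise of the Triad" => 3.0 })

puts manhattan_similarity(john, mark) # 1 / (1 + 0.5 + 1.0 + 1.0), roughly 0.286
```

Because it has the same arguments and the same "higher is more similar" convention as `distance`, it could be dropped into `top_matches` without other changes. Compared with squared differences, it penalizes a single large disagreement less heavily.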
  19. STEP 4: RECOMMENDING A GAME
      ‣ Grab each user’s ratings of games you haven’t rated
      ‣ Multiply each rating by how similar the other user is to you (opinions from people similar to you weigh more)
      ‣ Grab the highest numbers
  20. STEP 4: RECOMMENDING A GAME

          # Gets recommendations for a person by using a weighted average of every other user's ratings
          def recommendations(person, other_people)
            similarities = {}
            other_people.each do |other_person|
              similarity = distance(person, other_person)

              # ignore scores of zero or lower
              next if similarity <= 0

              other_person.ratings.each do |other_person_game, other_person_score|
                # only score what I haven't rated yet
                next if person.ratings[other_person_game]

                similarity_for_game = similarities[other_person_game] ||= { :weighted => 0, :sum => 0 }

                # sum of weighted rating times similarity and total similarity
                similarity_for_game[:weighted] += other_person_score * similarity
                similarity_for_game[:sum] += similarity
              end
            end

            # normalize list and sort by highest scores first
            similarities.collect do |game_name, score|
              [ game_name, (score[:weighted] / score[:sum]) ]
            end.sort_by { |sim| sim[1] }.reverse
          end

          # Recommended games for John:
          # Duke Nukem
          # Prince of Persia
          # Castle Wolfenstein
  21. STEP 4: RECOMMENDING A GAME (same code and output as slide 20)
      ! Achievement unlocked: “games recommended to John: Pac Man, Doom 3”
  22. STEP 5: SIMILAR GAMES
      ‣ Invert users x preferences, then use the exact same algorithm as steps 2 and 3 to find similar games based solely on people’s interactions (“item-based filtering”).

          # User has many ratings
          "Larry"  => { "Prince of Persia" => 3.0, "Doom" => 3.5, "Castle Wolfenstein" => 1.5 },
          "Robert" => { "Prince of Persia" => 2.5, "Doom" => 3.0 },
          "Jane"   => { "Prince of Persia" => 3.0, "Doom" => 4.0 },
          "John"   => { "Doom" => 4.5 }

          # Game rated by many users
          "Prince of Persia"   => { "Larry" => 3.0, "Robert" => 2.5, "Jane" => 3.0 },
          "Doom"               => { "Larry" => 3.5, "Robert" => 3.0, "Jane" => 4.0, "John" => 4.5 },
          "Castle Wolfenstein" => { "Larry" => 1.5, "Mark" => 2.0 }

      ‣ Cross-compare everything. This might take a very long time for a large number of games…
      Hint: saving this data to persistent storage leads to very fast recommendation lookups (that’s what most recommendation engines store, in fact)
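The inversion itself is a small nested loop. The sketch below assumes the ratings live in a plain hash of hashes, as in the excerpt above, and uses a hypothetical helper name, `invert_ratings`. It only includes the four users shown in the user-side excerpt, so "Mark" (who comes from the full dataset) does not appear in its result.

```ruby
# Turns { user => { game => score } } into { game => { user => score } }.
def invert_ratings(ratings_by_user)
  ratings_by_game = Hash.new { |hash, game| hash[game] = {} }
  ratings_by_user.each do |user, ratings|
    ratings.each do |game, score|
      ratings_by_game[game][user] = score
    end
  end
  ratings_by_game.default_proc = nil # stop creating entries when unknown games are looked up
  ratings_by_game
end

ratings_by_user = {
  "Larry"  => { "Prince of Persia" => 3.0, "Doom" => 3.5, "Castle Wolfenstein" => 1.5 },
  "Robert" => { "Prince of Persia" => 2.5, "Doom" => 3.0 },
  "Jane"   => { "Prince of Persia" => 3.0, "Doom" => 4.0 },
  "John"   => { "Doom" => 4.5 }
}

# Prints the users who rated Doom, with their scores.
p invert_ratings(ratings_by_user)["Doom"]
```

Once the data is keyed by game, each game can be treated like a "person" whose ratings come from users, which is why the same similarity code can be reused.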
  23. STEP 5: SIMILAR GAMES

          # Create a dictionary of games showing which other games they
          # are most similar to. This should be run often and cached for reuse
          def calculate_similar_games(game_ratings)
            Hash[game_ratings.collect do |game|
              [
                game.name,
                top_matches(game, game_ratings)
              ]
            end]
          end

          # Similar games:
          # Prince of Persia: Commander Keen (40%), Duke Nukem (28%), Castle Wolfenstein (22%), Doom (22%), Rise of the Triad (9%)
          # Doom: Prince of Persia (22%), Duke Nukem (18%), Rise of the Triad (16%), Castle Wolfenstein (10%), Commander Keen (5%)
          # Castle Wolfenstein: Prince of Persia (22%), Commander Keen (18%), Duke Nukem (15%), Doom (10%), Rise of the Triad (6%)
          # Rise of the Triad: Doom (16%), Duke Nukem (10%), Prince of Persia (9%), Castle Wolfenstein (6%), Commander Keen (5%)
          # Commander Keen: Prince of Persia (40%), Castle Wolfenstein (18%), Duke Nukem (14%), Rise of the Triad (5%), Doom (5%)
          # Duke Nukem: Prince of Persia (28%), Doom (18%), Castle Wolfenstein (15%), Commander Keen (14%), Rise of the Triad (10%)
  24. STEP 5: SIMILAR GAMES (same code and output as slide 23)
      ! Achievement unlocked: “Doom is similar to Daikatana and Quake”
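The advice to run the similarity calculation periodically and cache the result could look roughly like the sketch below. It is an illustration only: it assumes the matrix is stored as game name => list of `[similar game name, similarity]` pairs (names rather than `Entity` objects, so it serializes cleanly), it uses a JSON file in the temp directory as a hypothetical stand-in for real persistent storage, and the similarity values are the rounded percentages shown in the Step 5 output.

```ruby
require "json"
require "tmpdir"

# Assumed shape of the pre-computed matrix: game name => list of
# [similar game name, similarity] pairs. Values are rounded from the slide.
similar_games = {
  "Doom"             => [["Prince of Persia", 0.22], ["Duke Nukem", 0.18]],
  "Prince of Persia" => [["Commander Keen", 0.40], ["Duke Nukem", 0.28]]
}

# Hypothetical cache location; a real app might use Redis or a database table.
cache_path = File.join(Dir.tmpdir, "similar_games.json")

# Run this part from a periodic job: recompute, then overwrite the cache.
File.write(cache_path, JSON.generate(similar_games))

# At request time, load the cached matrix instead of recomputing distances.
cached = JSON.parse(File.read(cache_path))
cached["Doom"].each do |game_name, similarity|
  puts "#{game_name}: #{(similarity * 100).round}%"
end
```

Splitting the work this way keeps the expensive cross-comparison out of the request path; lookups then cost one read plus a short loop.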
  25. BONUS STAGE: FASTER RECOMMENDATIONS (if you’re still with me)
      ‣ A slightly tweaked version of the algorithm from step 4: just use the pre-calculated similarities instead of computing distances in the loop
      ‣ Up to 10x faster in a pure Ruby implementation
  26. BONUS STAGE: FASTER RECOMMENDATIONS (if you’re still with me)

          # this is very similar to the recommendations() algorithm,
          # except we use a pre-calculated similar_games_matrix instead of
          # calculating distances here
          def recommended_games(similar_games_matrix, user)
            similarities = {}
            user.ratings.each do |game_name, user_rating|
              # Loop over pre-cached game similarities to the current game
              similar_games_matrix[game_name].each do |game, similarity|
                # Ignore if this user has already rated this similar game
                next if user.ratings[game.name]

                score_for_game = similarities[game.name] ||= { :weighted => 0, :sum => 0 }

                # Weighted sum of rating times similarity and sum of similarities
                score_for_game[:weighted] += similarity * user_rating
                score_for_game[:sum] += similarity
              end
            end

            # Divide each total score by total weighting to get an average
            # Return the rankings from highest to lowest
            similarities.collect do |game_name, score|
              [ game_name, (score[:weighted] / score[:sum]) ]
            end.sort_by { |sim| sim[1] }.reverse
          end
  27. BONUS STAGE: FASTER RECOMMENDATIONS (same code as slide 26)
      ! Achievement unlocked: EPIC WIN!
  28. That’s awesome and stuff, but... do these come in little boxes?
      - that pragmatic programmer
  29. Yes, we have RubyGems
      ‣ recommendify: top choice according to RubyToolbox
      ‣ recommendable: “Rails-compatible”
      ‣ acts_as_recommended: doesn’t require Redis (not actively maintained)
  30. “RECOMMENDIFY” EXAMPLE

          class GameRecommender < Recommendify::Base
            # store only the top 10 neighbors per item
            max_neighbors 10

            # define an input data set "game_ratings". we'll add "user_id->game_id"
            # pairs to this input and use the jaccard coefficient to retrieve a
            # "users that liked game i1 also liked game i2" list
            input_matrix :game_ratings,
              :similarity_func => :jaccard,
              :weight => 5.0
          end

          recommender = GameRecommender.new

          # add `user_id->game_id` interactions to the game_ratings input
          # you can add data incrementally and call recommender.process! to update
          # the similarity matrix at any time.
          recommender.game_ratings.add_set("John", ["Duke Nukem", "Doom", "Quake"])
          recommender.game_ratings.add_set("Mary", ["Prince of Persia", "Doom"])

          # Calculate all elements of the similarity matrix
          recommender.process!

          # retrieve similar games to "Doom"
          recommender.for("Doom")
          # => [ <Recommendify::Neighbor item_id:"Duke Nukem" similarity:0.23>, (...) ]
  31. “RECOMMENDABLE” EXAMPLE

          class User < ActiveRecord::Base
            recommends :books, :movies, :games
          end

          >> friend.like(Movie.where(:name => "2001: A Space Odyssey").first)
          >> friend.like(Book.where(:title => "A Clockwork Orange").first)
          >> friend.like(Book.where(:title => "Brave New World").first)
          >> friend.like(Book.where(:title => "One Flew Over the Cuckoo's Nest").first)
          >> user.like(Book.where(:title => "A Clockwork Orange").first)
          >> user.recommended_books
          => [#<Book title: "Brave New World">, #<Book title: "One Flew Over the Cuckoo's Nest">]
          >> user.recommended_movies
          => [#<Movie name: "2001: A Space Odyssey">]
  32. CLOSING REMARKS
      ‣ Gems are cool, but you’ll have to dive into the code for better results
      ‣ Crossing social filtering with other AI techniques (e.g. content classification) produces dramatically better results
  33. ZOMG I NEED TO KNOW MORE
      Code from this presentation: https://gist.github.com/herval/4992503
      Stuff to read:
      ‣ AI Application Programming: http://ai-app-prog.rubyforge.org
      ‣ Programming Collective Intelligence: http://amzn.to/XtANMl (great book for noobs)
      Ready-to-use Gems:
      ‣ https://github.com/paulasmuth/recommendify
      ‣ https://github.com/davidcelis/recommendable
      ‣ https://github.com/Draiken/acts_as_recommended
      Serious AI algorithms in Ruby:
      ‣ https://github.com/SergioFierens/ai4r
      ‣ http://web.media.mit.edu/~dustin/papers/ai_ruby_plugins/
      ‣ https://github.com/kanwei/algorithms
  34. QUESTCHUNS?