SlideShare a Scribd company logo
RECOMMENDING STUFF IN RUBY
     (there’s life beyond the CRUD)




              @herval
ABOUT ME
‣Making software since
 the 90’s
‣Retired “startup guy”
‣Pivot
ARTIFICIAL INTELLIGENCE

"the design of systems that
perceive their environment and
take actions to maximize its
chances of success" 
                     - John McCarthy (1956)
ARTIFICIAL INTELLIGENCE

‣End goal: achieve superhuman
 intelligence in a machine
‣The "golden dream" of the 80's
‣The “holy grail” of the Singularitarians
ZOMG ROBOTS!
WELL, SORT OF.
“BUT WHAT SHOULD I USE AI FOR...?”
                      Pick the most cost-effective combination of flights
Genetic algorithms
                                   between NYC and Tashkent*

Natural Language
                                "Siri, call me sexy!" (iPhone)
   Processing


 Machine Vision         Unlock your Android phone just by looking at it



 Neural networks     "automatically detect and flag NSFW pictures" (Flickr)


                          "I got a high fever and my knees hurt. Am I
   Classifiers
                                        dying?" (WebMD)

                                                           * that’s in uzbekistan, if you’re asking
But that’s all SO COMPLICATED…
Isn’t there anything a bit more
          practical?
                  - some pragmatic programmer
ARTIFICIAL “INTELLIGENCE”
‣Basic techniques
‣Far from “intelligent”, still very useful

                   “classify blogs in categories based
 Data clustering
                      on their content” (Flipboard)

  Recommendation     “people who watched The X-Files
     engines                 also liked...”
RECOMMENDATION ENGINES



        (Finally!)
IN A NUTSHELL...

‣Collect ratings/preferences from users (eg.:
 reviews, likes)
‣Compare people to other people based on what
 they like
‣Offer stuff similar people already liked
STEP 1: BUILD A DATASET
Boil preferences down to numbers*
‣ User liked a game = 0.5 point
‣ User purchased a game = 1 point
‣ User played a game = 2 points
‣ User reviewed a game = 3 points
‣ User liked, purchased, played and
  reviewed a game = 6.5 points



* Resist the urge to use 1-5 star ratings
STEP 1: BUILD A DATASET
[
   Entity.new('Lisa', {
     'Prince of Persia' => 2.5, 'Doom' => 3.5, 'Castle Wolfenstein' => 3.0, 'Rise of the Triad' => 3.5, 'Commander Keen' =>
2.5, 'Duke Nukem' => 3.0
   }),
  Entity.new('Larry', {
     'Prince of Persia' => 3.0, 'Doom' => 3.5, 'Castle Wolfenstein' => 1.5, 'Rise of the Triad' => 5.0, 'Duke Nukem' => 3.0,
'Commander Keen' => 3.5
   }),
  Entity.new('Robert', {
     'Prince of Persia' => 2.5, 'Doom' => 3.0, 'Rise of the Triad' => 3.5, 'Duke Nukem' => 4.0
   }),
  Entity.new('Claudia', {
     'Doom' => 3.5, 'Castle Wolfenstein' => 3.0, 'Duke Nukem' => 4.5, 'Rise of the Triad' => 4.0, 'Commander Keen' => 2.5
   }),
  Entity.new('Mark', {
     'Prince of Persia' => 3.0, 'Doom' => 4.0, 'Castle Wolfenstein' => 2.0, 'Rise of the Triad' => 3.0, 'Duke Nukem' => 3.0,
'Commander Keen' => 2.0
   }),
  Entity.new('Jane', {
     'Prince of Persia' => 3.0, 'Doom' => 4.0, 'Duke Nukem' => 3.0, 'Rise of the Triad' => 5.0, 'Commander Keen' => 3.5
   }),
  Entity.new('John', {
     'Doom' => 4.5, 'Commander Keen' => 1.0, 'Rise of the Triad' => 4.0
   })
]
STEP 2: COMPARE PEOPLE
‣ Compare each person to one another, generating a
  similarity score

‣ Euclidian distance between each pair of ratings:




‣ Many other distance calculation algorithms exist: linear distance, jaccard,
  manhattan, tanimoto, etc
‣ Better algorithm = more interesting results
STEP 2: COMPARE PEOPLE
# Returns the euclidian distance between person1 and person2
def distance(person1, person2)
  rated_by_both = person1.ratings.select { |game| person2.ratings[game] }

 # if they have no ratings in common, return 0
 return 0.0 if rated_by_both.empty?

 # add up the squares of all the differences
 sum_of_squares = 0.0
 person1.ratings.collect do |game, score|
   person2_score = person2.ratings[game]
   next if !person2_score
   sum_of_squares += ((score - person2_score) ** 2)
 end

  1.0 / (1.0 + sum_of_squares)
end
STEP 3: SIMILAR USERS

    Grab the n users with the
   highest level of similarity
  (in other words, the people closest to you,
according to the distance algorithm from step 2)
STEP 3: SIMILAR USERS
# Returns the 5 best matching people (most similar preferences)
def top_matches(person, all_ratings)
  other_people = all_ratings.select { |person2| person2.name != person.name }
  other_people.collect do |other_person|
    [
      other_person,
      distance(person, other_person) # change this to use other algorithms
    ]
  end.sort_by { |sim| sim[1] }.reverse[0..5]
end
#   People similar to John:
#   Mark (30% match)
#   Robert (28% match)
#   Claudia (23% match)
#   Lisa (22% match)
#   Jane (11% match)
#   Larry (10% match)
STEP 3: SIMILAR USERS
# Returns the 5 best matching people (most similar preferences)
def top_matches(person, all_ratings)
  other_people = all_ratings.select { |person2| person2.name != person.name }
  other_people.collect do |other_person|
    [
      other_person,
      distance(person, other_person) # change this to use other algorithms
    ]
  end.sort_by { |sim| sim[1] }.reverse[0..5]
end
#   People similar to John:
#   Mark (30% match)
#   Robert (28% match)




                              !
#   Claudia (23% match)


                                  Achievement unlocked
#   Lisa (22% match)
#   Jane (11% match)
#   Larry (10% match)

                                  “people you should follow: John, Mary”
STEP 4: RECOMMENDING A GAME

‣Grab each user’s ratings to games you
 haven’t rated
‣Multiply that by how similar the other
 user is to you(opinions from people similar to you weight more)



‣Grab the highest numbers
STEP 4: RECOMMENDING A GAME
# Gets recommendations for a person by using a weighted average of every other user's ratings
def recommendations(person, other_people)
  similarities = {}

  other_people.each do |other_person|
    similarity = distance(person, other_person)

   # ignore scores of zero or lower
   next if similarity <= 0

    other_person.ratings.each do |other_person_game, other_person_score|
      # only score what I haven't rated yet
      next if person.ratings[other_person_game]

      similarity_for_game = similarities[other_person_game] ||= { :weighted => 0, :sum => 0 }

     # sum of weighted rating times similarity and total similarity
     similarity_for_game[:weighted] += other_person.ratings[other_person_game] * similarity
     similarity_for_game[:sum] += similarity
   end
 end

  # normalize list and sort by highest scores first                                     #   Recommended games for John:
  similarities.collect do |game_name, score|                                            #   Duke Nukem
    [ game_name, (score[:weighted] / score[:sum]) ]
  end.sort_by { |sim| sim[1] }.reverse                                                  #   Prince of Persia
end                                                                                     #   Castle Wolfenstein
STEP 4: RECOMMENDING A GAME
# Gets recommendations for a person by using a weighted average of every other user's ratings
def recommendations(person, other_people)
  similarities = {}

  other_people.each do |other_person|
    similarity = distance(person, other_person)

   # ignore scores of zero or lower
   next if similarity <= 0

    other_person.ratings.each do |other_person_game, other_person_score|
      # only score what I haven't rated yet
      next if person.ratings[other_person_game]

      similarity_for_game = similarities[other_person_game] ||= { :weighted => 0, :sum => 0 }

     # sum of weighted rating times similarity and total similarity
     similarity_for_game[:weighted] += other_person.ratings[other_person_game] * similarity
     similarity_for_game[:sum] += similarity
   end
 end




                               !         Achievement unlocked                        # Recommended games for John:
  # normalize list and sort by highest scores first
  similarities.collect do |game_name, score|                                         # Duke Nukem
    [ game_name, (score[:weighted] / score[:sum]) ]
  end.sort_by { |sim| sim[1] }.reverse   “games recommended to                 John: #Pac Man,Persia 3”
                                                                                       Prince of   Doom
end                                                                                  # Castle Wolfenstein
STEP 5: SIMILAR GAMES
‣ Invert users x preferences, then use the exact same algorithm as
  step 1 to find similar games based solely on people’s
  interactions (“item-based filtering”).
                # User has many ratings
                "Larry" => { "Prince of Persia" => 3.0, "Doom" => 3.5, "Castle Wolfenstein" => 1.5 },
                "Robert" => { "Prince of Persia" => 2.5, "Doom" => 3.0 },
                "Jane" => { "Prince of Persia" => 3.0, "Doom" => 4.0 },
                "John" => { "Doom" => 4.5 }

                # Game rated by many users
                "Prince of Persia" => { "Larry"=>3.0, "Robert"=>2.5, "Jane"=>3.0 },
                "Doom" => { "Larry"=>3.5, "Robert"=>3.0, "Jane"=>4.0, "John"=>4.5 },
                "Castle Wolfenstein" => { "Larry"=>1.5, "Mark"=>2.0 }



‣ Cross-compare everything. This might take a very long time for a
   large number of games…
Hint: save this data on a persistent storage will lead to very fast recommendation lookups
  (that’s what most recommendation engines save, in fact)
STEP 5: SIMILAR GAMES
# Create a dictionary of games showing which other games they
# are most similar to. This should be run often and cached for reuse
def calculate_similar_games(game_ratings)
  Hash[game_ratings.collect do |game|
    [
       game.name,
      top_matches(game, game_ratings)
    ]
  end]
end
#   Similar games:
#   Prince of Persia: Commander Keen (40%), Duke Nukem (28%), Castle Wolfenstein (22%), Doom (22%), Rise of the Triad (9%)
#   Doom: Prince of Persia (22%), Duke Nukem (18%), Rise of the Triad (16%), Castle Wolfenstein (10%), Commander Keen (5%)
#   Castle Wolfenstein: Prince of Persia (22%), Commander Keen (18%), Duke Nukem (15%), Doom (10%), Rise of the Triad (6%)
#   Rise of the Triad: Doom (16%), Duke Nukem (10%), Prince of Persia (9%), Castle Wolfenstein (6%), Commander Keen (5%)
#   Commander Keen: Prince of Persia (40%), Castle Wolfenstein (18%), Duke Nukem (14%), Rise of the Triad (5%), Doom (5%)
#   Duke Nukem: Prince of Persia (28%), Doom (18%), Castle Wolfenstein (15%), Commander Keen (14%), Rise of the Triad (10%)
STEP 5: SIMILAR GAMES
# Create a dictionary of games showing which other games they
# are most similar to. This should be run often and cached for reuse
def calculate_similar_games(game_ratings)
  Hash[game_ratings.collect do |game|
    [
       game.name,
      top_matches(game, game_ratings)
    ]
  end]
end



                                !
#   Similar games:


                                            Achievement unlocked
#   Prince of Persia: Commander Keen (40%), Duke Nukem (28%), Castle Wolfenstein (22%), Doom (22%), Rise of the Triad (9%)
#   Doom: Prince of Persia (22%), Duke Nukem (18%), Rise of the Triad (16%), Castle Wolfenstein (10%), Commander Keen (5%)
#   Castle Wolfenstein: Prince of Persia (22%), Commander Keen (18%), Duke Nukem (15%), Doom (10%), Rise of the Triad (6%)
#   Rise of the Triad: Doom (16%), Duke Nukem (10%), Prince of Persia (9%), Castle Wolfenstein (6%), Commander Keen (5%)
#                                           “Doom is similar to Daikatana and Quake”
    Commander Keen: Prince of Persia (40%), Castle Wolfenstein (18%), Duke Nukem (14%), Rise of the Triad (5%), Doom (5%)
#   Duke Nukem: Prince of Persia (28%), Doom (18%), Castle Wolfenstein (15%), Commander Keen (14%), Rise of the Triad (10%)
BONUS STAGE: FASTER RECOMMENDATIONS
               (if you’re still with me)




‣A slightly tweaked version of the algorithm
 on step 2: just use the pre-calculated
 similarities instead of doing distances in
 the loop
‣Up to 10x faster in a pure Ruby
  implementation
BONUS STAGE: FASTER RECOMMENDATIONS
                                              (if you’re still with me)

# this is very similar to the recommendations() algorithm,
# except we use a pre-calculated similar_games_matrix instead of
# calculating distances here
def recommended_games(similar_games_matrix, user)
  similarities = {}
  user.ratings.each do |game_name, user_rating|
    # Loop over pre-cached game similarities to the current game
    similar_games_matrix[game_name].each do |game, similarity|
      # Ignore if this user has already rated this similar game
      next if user.ratings[game.name]

      score_for_game = similarities[game.name] ||= { :weighted => 0, :sum => 0 }

     # Weighted sum of rating times similarity and sum of similarities
     score_for_game[:weighted] += similarity * user_rating
     score_for_game[:sum] += similarity
   end
 end

  # Divide each total score by total weighting to get an average
  # Return the rankings from highest to lowest
  similarities.collect do |game_name, score|
    [ game_name, (score[:weighted] / score[:sum]) ]
  end.sort_by { |sim| sim[1] }.reverse
end
BONUS STAGE: FASTER RECOMMENDATIONS
                                              (if you’re still with me)

# this is very similar to the recommendations() algorithm,
# except we use a pre-calculated similar_games_matrix instead of
# calculating distances here
def recommended_games(similar_games_matrix, user)
  similarities = {}
  user.ratings.each do |game_name, user_rating|
    # Loop over pre-cached game similarities to the current game
    similar_games_matrix[game_name].each do |game, similarity|
      # Ignore if this user has already rated this similar game
      next if user.ratings[game.name]

      score_for_game = similarities[game.name] ||= { :weighted => 0, :sum => 0 }

     # Weighted sum of rating times similarity and sum of similarities
     score_for_game[:weighted] += similarity * user_rating
     score_for_game[:sum] += similarity
   end
 end

  # Divide each total score by total weighting to get an average




                                !
  # Return the rankings from highest to lowest
  similarities.collect do |game_name, score|

  end.sort_by { |sim| sim[1] }.reverse
end
                                          Achievement unlocked
    [ game_name, (score[:weighted] / score[:sum]) ]


                                          EPIC WIN!
Recommendation engines
That’s awesome and stuff, but...
do these come in little boxes?
                  - that pragmatic programmer
Yes, we have RubyGems

                       top choice according to
    recommendify
                              RubyToolbox


  recommendable          “Rails-compatible”


                      doesn’t require Redis (not
acts_as_recommended
                          actively maintained)
“RECOMMENDIFY” EXAMPLE
class GameRecommender < Recommendify::Base
    # store only the top 10 neighbors per item
    max_neighbors 10

      # define an input data set "game_ratings". we'll add "user_id->game_id"
      # pairs to this input and use the jaccard coefficient to retrieve a
      # "users that liked game i1 also liked game i2" list
      input_matrix :game_ratings,
      :similarity_func => :jaccard,
      :weight => 5.0
end

recommender = GameRecommender.new

# add `order_id->product_id` interactions to the order_item_sim input
# you can add data incrementally and call RecommendedItem.process! to update
# the similarity matrix at any time.
recommender.game_ratings.add_set("John", ["Duke Nukem", "Doom", "Quake"])
recommender.game_ratings.add_set("Mary", ["Prince of Persia", "Doom"])

# Calculate all elements of the similarity matrix
recommender.process!

# retrieve similar games to "Doom"
recommender.for("Doom")
=> [ <Recommendify::Neighbor item_id:"Duke Nukem" similarity:0.23>, (...) ]
“RECOMMENDABLE” EXAMPLE
class User < ActiveRecord::Base
  recommends :books, :movies, :games
end

>>   friend.like(Movie.where(:name => "2001: A Space Odyssey").first)
>>   friend.like(Book.where(:title => "A Clockwork Orange").first)
>>   friend.like(Book.where(:title => "Brave New World").first)
>>   friend.like(Book.where(:title => "One Flew Over the Cuckoo's Next").first)
>>   user.like(Book.where(:title => "A Clockwork Orange").first)

>> user.recommended_books
=> [#<Book title: "Brave New World">, #<Book title: "One Flew Over the Cuckoo's
Nest">]

>> user.recommended_movies
=> [#<Movie name: "A Clockwork Orange">]
CLOSING REMARKS

‣Gems are cool, but you’ll have to dive into
 the code for better results
‣Crossing social filtering with other AI
 techniques (e.g.: content classification)
 produces dramatically better results
ZOMG I NEED TO KNOW MORE
Code from this presentation: https://gist.github.com/herval/4992503

Stuff to read:
‣ AI Application Programming: http://ai-app-prog.rubyforge.org
‣ Programming Collective Intelligence: http://amzn.to/XtANMl
 (great book for noobs)



Ready-to-use Gems
‣ https://github.com/paulasmuth/recommendify
‣ https://github.com/davidcelis/recommendable
‣ https://github.com/Draiken/acts_as_recommended

Serious AI algorithms in Ruby
‣ https://github.com/SergioFierens/ai4r
‣ http://web.media.mit.edu/~dustin/papers/ai_ruby_plugins/
‣ https://github.com/kanwei/algorithms
QUESTCHUNS?

More Related Content

What's hot

Main presentation
Main presentationMain presentation
Main presentation
Josh Brownsword
 
1. case study
1. case study1. case study
1. case study
jack-morton
 
High concept document for 'The Nightmare'
High concept document for 'The Nightmare'High concept document for 'The Nightmare'
High concept document for 'The Nightmare'
Sai Narayan
 
1. case study
1. case study1. case study
1. case study
William Thirlaway
 
Writing a gaming proposal
Writing a gaming proposalWriting a gaming proposal
Writing a gaming proposal
J'ette Novakovich
 
1. case study
1. case study1. case study
1. case study
jack-morton
 
Work flow
Work flowWork flow
Work flow
Deightonater
 
Task 2 research
Task 2 researchTask 2 research
Task 2 research
amybrockbank
 
Task 2 research
Task 2 researchTask 2 research
Task 2 research
amybrockbank
 
6 lecture (game elements ) 2 2-2021
6 lecture (game elements ) 2 2-20216 lecture (game elements ) 2 2-2021
6 lecture (game elements ) 2 2-2021
Durgesh Pandey
 
Green My Place Game7 Switch Search
Green My Place Game7 Switch SearchGreen My Place Game7 Switch Search
Green My Place Game7 Switch Search
Ben Cowley
 
Unit 72 my computer game user guide
Unit 72 my computer game user guideUnit 72 my computer game user guide
Unit 72 my computer game user guide
Timophy marsbar
 
FMP #3 Research Made Finally
FMP #3 Research Made FinallyFMP #3 Research Made Finally
FMP #3 Research Made Finally
JoeDuffy28
 
FMP #3 Research Completed
FMP #3 Research CompletedFMP #3 Research Completed
FMP #3 Research Completed
JoeDuffy28
 
LAFS Game Design 10 - Fun and Accessability
LAFS Game Design 10 - Fun and AccessabilityLAFS Game Design 10 - Fun and Accessability
LAFS Game Design 10 - Fun and Accessability
David Mullich
 
The Principles of Game Design
The Principles of Game DesignThe Principles of Game Design
The Principles of Game Design
InstantTechInfo
 
Vision Wall X Box - 68 of 261
Vision Wall X Box - 68  of 261Vision Wall X Box - 68  of 261
Vision Wall X Box - 68 of 261
Dr. Bobbie E. Emerald Way
 
Joys of Absence: Emotion, Emotion Display, and Interaction Tension in Video G...
Joys of Absence: Emotion, Emotion Display, and Interaction Tension in Video G...Joys of Absence: Emotion, Emotion Display, and Interaction Tension in Video G...
Joys of Absence: Emotion, Emotion Display, and Interaction Tension in Video G...
Sebastian Deterding
 
Action puzzle games
Action puzzle gamesAction puzzle games
Action puzzle games
Martin Grider
 

What's hot (19)

Main presentation
Main presentationMain presentation
Main presentation
 
1. case study
1. case study1. case study
1. case study
 
High concept document for 'The Nightmare'
High concept document for 'The Nightmare'High concept document for 'The Nightmare'
High concept document for 'The Nightmare'
 
1. case study
1. case study1. case study
1. case study
 
Writing a gaming proposal
Writing a gaming proposalWriting a gaming proposal
Writing a gaming proposal
 
1. case study
1. case study1. case study
1. case study
 
Work flow
Work flowWork flow
Work flow
 
Task 2 research
Task 2 researchTask 2 research
Task 2 research
 
Task 2 research
Task 2 researchTask 2 research
Task 2 research
 
6 lecture (game elements ) 2 2-2021
6 lecture (game elements ) 2 2-20216 lecture (game elements ) 2 2-2021
6 lecture (game elements ) 2 2-2021
 
Green My Place Game7 Switch Search
Green My Place Game7 Switch SearchGreen My Place Game7 Switch Search
Green My Place Game7 Switch Search
 
Unit 72 my computer game user guide
Unit 72 my computer game user guideUnit 72 my computer game user guide
Unit 72 my computer game user guide
 
FMP #3 Research Made Finally
FMP #3 Research Made FinallyFMP #3 Research Made Finally
FMP #3 Research Made Finally
 
FMP #3 Research Completed
FMP #3 Research CompletedFMP #3 Research Completed
FMP #3 Research Completed
 
LAFS Game Design 10 - Fun and Accessability
LAFS Game Design 10 - Fun and AccessabilityLAFS Game Design 10 - Fun and Accessability
LAFS Game Design 10 - Fun and Accessability
 
The Principles of Game Design
The Principles of Game DesignThe Principles of Game Design
The Principles of Game Design
 
Vision Wall X Box - 68 of 261
Vision Wall X Box - 68  of 261Vision Wall X Box - 68  of 261
Vision Wall X Box - 68 of 261
 
Joys of Absence: Emotion, Emotion Display, and Interaction Tension in Video G...
Joys of Absence: Emotion, Emotion Display, and Interaction Tension in Video G...Joys of Absence: Emotion, Emotion Display, and Interaction Tension in Video G...
Joys of Absence: Emotion, Emotion Display, and Interaction Tension in Video G...
 
Action puzzle games
Action puzzle gamesAction puzzle games
Action puzzle games
 

Similar to Recommendation engines

cai
caicai
Intro to programming games with clojure
Intro to programming games with clojureIntro to programming games with clojure
Intro to programming games with clojure
Juio Barros
 
Badiya haihn
Badiya haihnBadiya haihn
Badiya haihn
amitp26
 
Adversarial Search and Game-Playing .ppt
Adversarial Search and Game-Playing .pptAdversarial Search and Game-Playing .ppt
Adversarial Search and Game-Playing .ppt
0fakerxfaker0
 
Node meetup feb_20_12
Node meetup feb_20_12Node meetup feb_20_12
Node meetup feb_20_12
jafar104
 
ZaCon 4 (2012) - Game Hacking
ZaCon 4 (2012) - Game HackingZaCon 4 (2012) - Game Hacking
ZaCon 4 (2012) - Game Hacking
HypnZA
 
MINI-MAX ALGORITHM.pptx
MINI-MAX ALGORITHM.pptxMINI-MAX ALGORITHM.pptx
MINI-MAX ALGORITHM.pptx
NayanChandak1
 
Ai
AiAi
Artificial intelligence games
Artificial intelligence gamesArtificial intelligence games
Artificial intelligence games
Sujithmlamthadam
 

Similar to Recommendation engines (9)

cai
caicai
cai
 
Intro to programming games with clojure
Intro to programming games with clojureIntro to programming games with clojure
Intro to programming games with clojure
 
Badiya haihn
Badiya haihnBadiya haihn
Badiya haihn
 
Adversarial Search and Game-Playing .ppt
Adversarial Search and Game-Playing .pptAdversarial Search and Game-Playing .ppt
Adversarial Search and Game-Playing .ppt
 
Node meetup feb_20_12
Node meetup feb_20_12Node meetup feb_20_12
Node meetup feb_20_12
 
ZaCon 4 (2012) - Game Hacking
ZaCon 4 (2012) - Game HackingZaCon 4 (2012) - Game Hacking
ZaCon 4 (2012) - Game Hacking
 
MINI-MAX ALGORITHM.pptx
MINI-MAX ALGORITHM.pptxMINI-MAX ALGORITHM.pptx
MINI-MAX ALGORITHM.pptx
 
Ai
AiAi
Ai
 
Artificial intelligence games
Artificial intelligence gamesArtificial intelligence games
Artificial intelligence games
 

More from Herval Freire

Stress Testing at Twitter: a tale of New Year Eves
Stress Testing at Twitter: a tale of New Year EvesStress Testing at Twitter: a tale of New Year Eves
Stress Testing at Twitter: a tale of New Year Eves
Herval Freire
 
So you think JSON is cool?
So you think JSON is cool?So you think JSON is cool?
So you think JSON is cool?
Herval Freire
 
Scala na soundcloud [QCon]
Scala na soundcloud [QCon]Scala na soundcloud [QCon]
Scala na soundcloud [QCon]
Herval Freire
 
Startup Don'ts
Startup Don'tsStartup Don'ts
Startup Don'ts
Herval Freire
 
Sollipsis Publishing Catalog
Sollipsis   Publishing CatalogSollipsis   Publishing Catalog
Sollipsis Publishing Catalog
Herval Freire
 
Sollipsis Venture Forum
Sollipsis   Venture ForumSollipsis   Venture Forum
Sollipsis Venture Forum
Herval Freire
 
Sollipsis Premium Games
Sollipsis   Premium GamesSollipsis   Premium Games
Sollipsis Premium Games
Herval Freire
 
Sollipsis Catalog
Sollipsis   CatalogSollipsis   Catalog
Sollipsis Catalog
Herval Freire
 
Padrões De Projeto e Anti Patterns
Padrões De Projeto e Anti PatternsPadrões De Projeto e Anti Patterns
Padrões De Projeto e Anti Patterns
Herval Freire
 
Biofeedback - Slides CBIS
Biofeedback - Slides CBISBiofeedback - Slides CBIS
Biofeedback - Slides CBIS
Herval Freire
 
Tag Libraries
Tag LibrariesTag Libraries
Tag Libraries
Herval Freire
 
[ “Love", :Ruby ].each { |i| p i }
[ “Love", :Ruby ].each { |i| p i }[ “Love", :Ruby ].each { |i| p i }
[ “Love", :Ruby ].each { |i| p i }
Herval Freire
 

More from Herval Freire (12)

Stress Testing at Twitter: a tale of New Year Eves
Stress Testing at Twitter: a tale of New Year EvesStress Testing at Twitter: a tale of New Year Eves
Stress Testing at Twitter: a tale of New Year Eves
 
So you think JSON is cool?
So you think JSON is cool?So you think JSON is cool?
So you think JSON is cool?
 
Scala na soundcloud [QCon]
Scala na soundcloud [QCon]Scala na soundcloud [QCon]
Scala na soundcloud [QCon]
 
Startup Don'ts
Startup Don'tsStartup Don'ts
Startup Don'ts
 
Sollipsis Publishing Catalog
Sollipsis   Publishing CatalogSollipsis   Publishing Catalog
Sollipsis Publishing Catalog
 
Sollipsis Venture Forum
Sollipsis   Venture ForumSollipsis   Venture Forum
Sollipsis Venture Forum
 
Sollipsis Premium Games
Sollipsis   Premium GamesSollipsis   Premium Games
Sollipsis Premium Games
 
Sollipsis Catalog
Sollipsis   CatalogSollipsis   Catalog
Sollipsis Catalog
 
Padrões De Projeto e Anti Patterns
Padrões De Projeto e Anti PatternsPadrões De Projeto e Anti Patterns
Padrões De Projeto e Anti Patterns
 
Biofeedback - Slides CBIS
Biofeedback - Slides CBISBiofeedback - Slides CBIS
Biofeedback - Slides CBIS
 
Tag Libraries
Tag LibrariesTag Libraries
Tag Libraries
 
[ “Love", :Ruby ].each { |i| p i }
[ “Love", :Ruby ].each { |i| p i }[ “Love", :Ruby ].each { |i| p i }
[ “Love", :Ruby ].each { |i| p i }
 

Recommendation engines

  • 1. RECOMMENDING STUFF IN RUBY (there’s life beyond the CRUD) @herval
  • 2. ABOUT ME ‣Making software since the 90’s ‣Retired “startup guy” ‣Pivot
  • 3. ARTIFICIAL INTELLIGENCE "the design of systems that perceive their environment and take actions to maximize its chances of success"  - John McCarthy (1956)
  • 4. ARTIFICIAL INTELLIGENCE ‣End goal: achieve superhuman intelligence in a machine ‣The "golden dream" of the 80's ‣The “holy grail” of the Singularitarians
  • 7. “BUT WHAT SHOULD I USE AI FOR...?” Pick the most cost-effective combination of flights Genetic algorithms between NYC and Tashkent* Natural Language "Siri, call me sexy!" (iPhone) Processing Machine Vision Unlock your Android phone just by looking at it Neural networks "automatically detect and flag NSFW pictures" (Flickr) "I got a high fever and my knees hurt. Am I Classifiers dying?" (WebMD) * that’s in uzbekistan, if you’re asking
  • 8. But that’s all SO COMPLICATED… Isn’t there anything a bit more practical? - some pragmatic programmer
  • 9. ARTIFICIAL “INTELLIGENCE” ‣Basic techniques ‣Far from “intelligent”, still very useful “classify blogs in categories based Data clustering on their content” (Flipboard) Recommendation “people who watched The X-Files engines also liked...”
  • 11. IN A NUTSHELL... ‣Collect ratings/preferences from users (eg.: reviews, likes) ‣Compare people to other people based on what they like ‣Offer stuff similar people already liked
  • 12. STEP 1: BUILD A DATASET Boil preferences down to numbers* ‣ User liked a game = 0.5 point ‣ User purchased a game = 1 point ‣ User played a game = 2 points ‣ User reviewed a game = 3 points ‣ User liked, purchased, played and reviewed a game = 6.5 points * Resist the urge to use 1-5 star ratings
  • 13. STEP 1: BUILD A DATASET [ Entity.new('Lisa', { 'Prince of Persia' => 2.5, 'Doom' => 3.5, 'Castle Wolfenstein' => 3.0, 'Rise of the Triad' => 3.5, 'Commander Keen' => 2.5, 'Duke Nukem' => 3.0 }), Entity.new('Larry', { 'Prince of Persia' => 3.0, 'Doom' => 3.5, 'Castle Wolfenstein' => 1.5, 'Rise of the Triad' => 5.0, 'Duke Nukem' => 3.0, 'Commander Keen' => 3.5 }), Entity.new('Robert', { 'Prince of Persia' => 2.5, 'Doom' => 3.0, 'Rise of the Triad' => 3.5, 'Duke Nukem' => 4.0 }), Entity.new('Claudia', { 'Doom' => 3.5, 'Castle Wolfenstein' => 3.0, 'Duke Nukem' => 4.5, 'Rise of the Triad' => 4.0, 'Commander Keen' => 2.5 }), Entity.new('Mark', { 'Prince of Persia' => 3.0, 'Doom' => 4.0, 'Castle Wolfenstein' => 2.0, 'Rise of the Triad' => 3.0, 'Duke Nukem' => 3.0, 'Commander Keen' => 2.0 }), Entity.new('Jane', { 'Prince of Persia' => 3.0, 'Doom' => 4.0, 'Duke Nukem' => 3.0, 'Rise of the Triad' => 5.0, 'Commander Keen' => 3.5 }), Entity.new('John', { 'Doom' => 4.5, 'Commander Keen' => 1.0, 'Rise of the Triad' => 4.0 }) ]
  • 14. STEP 2: COMPARE PEOPLE ‣ Compare each person to one another, generating a similarity score ‣ Euclidian distance between each pair of ratings: ‣ Many other distance calculation algorithms exist: linear distance, jaccard, manhattan, tanimoto, etc ‣ Better algorithm = more interesting results
  • 15. STEP 2: COMPARE PEOPLE # Returns the euclidian distance between person1 and person2 def distance(person1, person2) rated_by_both = person1.ratings.select { |game| person2.ratings[game] } # if they have no ratings in common, return 0 return 0.0 if rated_by_both.empty? # add up the squares of all the differences sum_of_squares = 0.0 person1.ratings.collect do |game, score| person2_score = person2.ratings[game] next if !person2_score sum_of_squares += ((score - person2_score) ** 2) end 1.0 / (1.0 + sum_of_squares) end
  • 16. STEP 3: SIMILAR USERS Grab the n users with the highest level of similarity (in other words, the people closest to you, according to the distance algorithm from step 2)
  • 17. STEP 3: SIMILAR USERS # Returns the 5 best matching people (most similar preferences) def top_matches(person, all_ratings) other_people = all_ratings.select { |person2| person2.name != person.name } other_people.collect do |other_person| [ other_person, distance(person, other_person) # change this to use other algorithms ] end.sort_by { |sim| sim[1] }.reverse[0..5] end # People similar to John: # Mark (30% match) # Robert (28% match) # Claudia (23% match) # Lisa (22% match) # Jane (11% match) # Larry (10% match)
  • 18. STEP 3: SIMILAR USERS # Returns the 5 best matching people (most similar preferences) def top_matches(person, all_ratings) other_people = all_ratings.select { |person2| person2.name != person.name } other_people.collect do |other_person| [ other_person, distance(person, other_person) # change this to use other algorithms ] end.sort_by { |sim| sim[1] }.reverse[0..5] end # People similar to John: # Mark (30% match) # Robert (28% match) ! # Claudia (23% match) Achievement unlocked # Lisa (22% match) # Jane (11% match) # Larry (10% match) “people you should follow: John, Mary”
  • 19. STEP 4: RECOMMENDING A GAME ‣Grab each user’s ratings to games you haven’t rated ‣Multiply that by how similar the other user is to you(opinions from people similar to you weight more) ‣Grab the highest numbers
  • 20. STEP 4: RECOMMENDING A GAME # Gets recommendations for a person by using a weighted average of every other user's ratings def recommendations(person, other_people) similarities = {} other_people.each do |other_person| similarity = distance(person, other_person) # ignore scores of zero or lower next if similarity <= 0 other_person.ratings.each do |other_person_game, other_person_score| # only score what I haven't rated yet next if person.ratings[other_person_game] similarity_for_game = similarities[other_person_game] ||= { :weighted => 0, :sum => 0 } # sum of weighted rating times similarity and total similarity similarity_for_game[:weighted] += other_person.ratings[other_person_game] * similarity similarity_for_game[:sum] += similarity end end # normalize list and sort by highest scores first # Recommended games for John: similarities.collect do |game_name, score| # Duke Nukem [ game_name, (score[:weighted] / score[:sum]) ] end.sort_by { |sim| sim[1] }.reverse # Prince of Persia end # Castle Wolfenstein
  • 21. STEP 4: RECOMMENDING A GAME # Gets recommendations for a person by using a weighted average of every other user's ratings def recommendations(person, other_people) similarities = {} other_people.each do |other_person| similarity = distance(person, other_person) # ignore scores of zero or lower next if similarity <= 0 other_person.ratings.each do |other_person_game, other_person_score| # only score what I haven't rated yet next if person.ratings[other_person_game] similarity_for_game = similarities[other_person_game] ||= { :weighted => 0, :sum => 0 } # sum of weighted rating times similarity and total similarity similarity_for_game[:weighted] += other_person.ratings[other_person_game] * similarity similarity_for_game[:sum] += similarity end end ! Achievement unlocked # Recommended games for John: # normalize list and sort by highest scores first similarities.collect do |game_name, score| # Duke Nukem [ game_name, (score[:weighted] / score[:sum]) ] end.sort_by { |sim| sim[1] }.reverse “games recommended to John: #Pac Man,Persia 3” Prince of Doom end # Castle Wolfenstein
  • 22. STEP 5: SIMILAR GAMES ‣ Invert users x preferences, then use the exact same algorithm as step 1 to find similar games based solely on people’s interactions (“item-based filtering”). # User has many ratings "Larry" => { "Prince of Persia" => 3.0, "Doom" => 3.5, "Castle Wolfenstein" => 1.5 }, "Robert" => { "Prince of Persia" => 2.5, "Doom" => 3.0 }, "Jane" => { "Prince of Persia" => 3.0, "Doom" => 4.0 }, "John" => { "Doom" => 4.5 } # Game rated by many users "Prince of Persia" => { "Larry"=>3.0, "Robert"=>2.5, "Jane"=>3.0 }, "Doom" => { "Larry"=>3.5, "Robert"=>3.0, "Jane"=>4.0, "John"=>4.5 }, "Castle Wolfenstein" => { "Larry"=>1.5, "Mark"=>2.0 } ‣ Cross-compare everything. This might take a very long time for a large number of games… Hint: save this data on a persistent storage will lead to very fast recommendation lookups (that’s what most recommendation engines save, in fact)
  • 23. STEP 5: SIMILAR GAMES # Create a dictionary of games showing which other games they # are most similar to. This should be run often and cached for reuse def calculate_similar_games(game_ratings) Hash[game_ratings.collect do |game| [ game.name, top_matches(game, game_ratings) ] end] end # Similar games: # Prince of Persia: Commander Keen (40%), Duke Nukem (28%), Castle Wolfenstein (22%), Doom (22%), Rise of the Triad (9%) # Doom: Prince of Persia (22%), Duke Nukem (18%), Rise of the Triad (16%), Castle Wolfenstein (10%), Commander Keen (5%) # Castle Wolfenstein: Prince of Persia (22%), Commander Keen (18%), Duke Nukem (15%), Doom (10%), Rise of the Triad (6%) # Rise of the Triad: Doom (16%), Duke Nukem (10%), Prince of Persia (9%), Castle Wolfenstein (6%), Commander Keen (5%) # Commander Keen: Prince of Persia (40%), Castle Wolfenstein (18%), Duke Nukem (14%), Rise of the Triad (5%), Doom (5%) # Duke Nukem: Prince of Persia (28%), Doom (18%), Castle Wolfenstein (15%), Commander Keen (14%), Rise of the Triad (10%)
  • 24. STEP 5: SIMILAR GAMES # Create a dictionary of games showing which other games they # are most similar to. This should be run often and cached for reuse def calculate_similar_games(game_ratings) Hash[game_ratings.collect do |game| [ game.name, top_matches(game, game_ratings) ] end] end ! # Similar games: Achievement unlocked # Prince of Persia: Commander Keen (40%), Duke Nukem (28%), Castle Wolfenstein (22%), Doom (22%), Rise of the Triad (9%) # Doom: Prince of Persia (22%), Duke Nukem (18%), Rise of the Triad (16%), Castle Wolfenstein (10%), Commander Keen (5%) # Castle Wolfenstein: Prince of Persia (22%), Commander Keen (18%), Duke Nukem (15%), Doom (10%), Rise of the Triad (6%) # Rise of the Triad: Doom (16%), Duke Nukem (10%), Prince of Persia (9%), Castle Wolfenstein (6%), Commander Keen (5%) # “Doom is similar to Daikatana and Quake” Commander Keen: Prince of Persia (40%), Castle Wolfenstein (18%), Duke Nukem (14%), Rise of the Triad (5%), Doom (5%) # Duke Nukem: Prince of Persia (28%), Doom (18%), Castle Wolfenstein (15%), Commander Keen (14%), Rise of the Triad (10%)
  • 25. BONUS STAGE: FASTER RECOMMENDATIONS (if you’re still with me) ‣A slightly tweaked version of the algorithm on step 2: just use the pre-calculated similarities instead of doing distances in the loop ‣Up to 10x faster in a pure Ruby implementation
  • 26. BONUS STAGE: FASTER RECOMMENDATIONS (if you’re still with me) # this is very similar to the recommendations() algorithm, # except we use a pre-calculated similar_games_matrix instead of # calculating distances here def recommended_games(similar_games_matrix, user) similarities = {} user.ratings.each do |game_name, user_rating| # Loop over pre-cached game similarities to the current game similar_games_matrix[game_name].each do |game, similarity| # Ignore if this user has already rated this similar game next if user.ratings[game.name] score_for_game = similarities[game.name] ||= { :weighted => 0, :sum => 0 } # Weighted sum of rating times similarity and sum of similarities score_for_game[:weighted] += similarity * user_rating score_for_game[:sum] += similarity end end # Divide each total score by total weighting to get an average # Return the rankings from highest to lowest similarities.collect do |game_name, score| [ game_name, (score[:weighted] / score[:sum]) ] end.sort_by { |sim| sim[1] }.reverse end
  • 27. BONUS STAGE: FASTER RECOMMENDATIONS (if you’re still with me) # this is very similar to the recommendations() algorithm, # except we use a pre-calculated similar_games_matrix instead of # calculating distances here def recommended_games(similar_games_matrix, user) similarities = {} user.ratings.each do |game_name, user_rating| # Loop over pre-cached game similarities to the current game similar_games_matrix[game_name].each do |game, similarity| # Ignore if this user has already rated this similar game next if user.ratings[game.name] score_for_game = similarities[game.name] ||= { :weighted => 0, :sum => 0 } # Weighted sum of rating times similarity and sum of similarities score_for_game[:weighted] += similarity * user_rating score_for_game[:sum] += similarity end end # Divide each total score by total weighting to get an average ! # Return the rankings from highest to lowest similarities.collect do |game_name, score| end.sort_by { |sim| sim[1] }.reverse end Achievement unlocked [ game_name, (score[:weighted] / score[:sum]) ] EPIC WIN!
  • 29. That’s awesome and stuff, but... do these come in little boxes? - that pragmatic programmer
  • 30. Yes, we have RubyGems top choice according to recommendify RubyToolbox recommendable “Rails-compatible” doesn’t require Redis (not acts_as_recommended actively maintained)
  • 31. “RECOMMENDIFY” EXAMPLE class GameRecommender < Recommendify::Base # store only the top 10 neighbors per item max_neighbors 10 # define an input data set "game_ratings". we'll add "user_id->game_id" # pairs to this input and use the jaccard coefficient to retrieve a # "users that liked game i1 also liked game i2" list input_matrix :game_ratings, :similarity_func => :jaccard, :weight => 5.0 end recommender = GameRecommender.new # add `order_id->product_id` interactions to the order_item_sim input # you can add data incrementally and call RecommendedItem.process! to update # the similarity matrix at any time. recommender.game_ratings.add_set("John", ["Duke Nukem", "Doom", "Quake"]) recommender.game_ratings.add_set("Mary", ["Prince of Persia", "Doom"]) # Calculate all elements of the similarity matrix recommender.process! # retrieve similar games to "Doom" recommender.for("Doom") => [ <Recommendify::Neighbor item_id:"Duke Nukem" similarity:0.23>, (...) ]
  • 32. “RECOMMENDABLE” EXAMPLE class User < ActiveRecord::Base recommends :books, :movies, :games end >> friend.like(Movie.where(:name => "2001: A Space Odyssey").first) >> friend.like(Book.where(:title => "A Clockwork Orange").first) >> friend.like(Book.where(:title => "Brave New World").first) >> friend.like(Book.where(:title => "One Flew Over the Cuckoo's Next").first) >> user.like(Book.where(:title => "A Clockwork Orange").first) >> user.recommended_books => [#<Book title: "Brave New World">, #<Book title: "One Flew Over the Cuckoo's Nest">] >> user.recommended_movies => [#<Movie name: "A Clockwork Orange">]
  • 33. CLOSING REMARKS ‣Gems are cool, but you’ll have to dive into the code for better results ‣Crossing social filtering with other AI techniques (e.g.: content classification) produces dramatically better results
  • 34. ZOMG I NEED TO KNOW MORE Code from this presentation: https://gist.github.com/herval/4992503 Stuff to read: ‣ AI Application Programming: http://ai-app-prog.rubyforge.org ‣ Programming Collective Intelligence: http://amzn.to/XtANMl (great book for noobs) Ready-to-use Gems ‣ https://github.com/paulasmuth/recommendify ‣ https://github.com/davidcelis/recommendable ‣ https://github.com/Draiken/acts_as_recommended Serious AI algorithms in Ruby ‣ https://github.com/SergioFierens/ai4r ‣ http://web.media.mit.edu/~dustin/papers/ai_ruby_plugins/ ‣ https://github.com/kanwei/algorithms