Wakoopa Recommendation Engine on AWS

The slides of the talk I gave at AWS start-up event in Amsterdam.

Published in: Technology, Business

  1. Menno van der Sman, Lead Developer. Coen Stevens, Recommendation Engineer.
  2. Mission: Discover software & games
  3. Updates
  4. Searching powered by
  5. Recommendations. Codename: Ludwig
  6. How to get started? Research: Amazon, Netflix etc. Mathemagicians: Peter Tegelaar & Coen Stevens. Ludwig: created recommender system in Ruby running on EC2.
  7. Challenges when building your first recommender system
  8. Data: what do we have? Usage (implicit): noisy, only positive feedback, easy to collect. Ratings (explicit): accurate, positive and negative feedback, hard to collect.
  9. Item-Based Collaborative Filtering. User software usage matrix (rows: Users, columns: Software items): 220 90 180 22 280 12 42 80 175 210 210 45 165 14 35 195 13 25 100 50 185 35 190 60 65 185
  10. Classified user software usage matrix (1, 2, 3) (rows: Users, columns: Software items): 3 2 2 2 3 2 1 2 3 3 2 3 2 1 2 2 3 2 3 2 2 2 3 1 2 3
  11. How do we predict the probability that I would like to use Gmail? Software items 3 2 2 2 3 2 1 2 Users 3 3 ? 2 3 2 1 2 2 3 2 3 2 2 2 3 1 2 3
  12. Calculate the similarities between Gmail and the other software items. Software items 3 2 2 2 3 2 1 2 Users 3 3 2 3 2 1 2 2 3 2 3 2 2 2 3 1 2 3. Cosine Similarity(Firefox, Gmail)
  13. Calculate the similarities between Gmail and the other software items. Gmail similarities 0.6 3 2 2 2 0.8 3 2 1 2 1.0 3 3 2 3 0.4 2 1 2 2 3 2 0.4 3 2 2 2 3 0.3 1 2 3 0.3
  14. Calculate the predicted value for Gmail. Gmail similarities and user usage: 0.6 3 0.8 3 1.0 0.4 2 0.4 0.3 3 0.3
  15. Calculate the predicted value for Gmail. We take only the ‘K’ most similar items (say 2). Gmail similarities and user usage: 0.6 3 0.8 3 1.0 0.4 2 0.4 0.3 3 0.3. Prediction: (0.6·3 + 0.8·3) / (0.6 + 0.8 + 0.4 + 0.4 + 0.3 + 0.3) = 1.5
  16. Calculate all unknown values and show the Top-N recommendations to each user. Software items 3 2 ? 2 ? ? 2 3 2 1 ? 2 ? ? Users 3 3 ? 2 ? 3 ? 2 1 2 2 3 2 ? ? 3 2 2 ? 2 3 ? 1 ? 2 ? ? 3
  17. Metrics: measure for success. Space complexity: O(m + Kn). Computational complexity: O(m + n²). Performance: Root Mean Squared Error.
  18. Evaluating the approach. Maximize (performance / cost). This is easy with EC2.
  19. Why EC2? Low cost. Flexibility. Ease of use.
  20. Infrastructure. Wakoopa: Repository, Application, Database. EC2: Computing power, Big Database. Links between them: checkout, ssh tunnel.
  21. 1 evening, 3 speakers, 100 developers. Pre-register on www.recked.org
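
The item-based collaborative filtering steps on slides 12 to 16 can be sketched roughly as follows. This is a minimal Python illustration, not the Ludwig code (which the slides say was written in Ruby). The function names, the use of `None` for "not used", and the pairing of each similarity with a usage class are assumptions read off the flattened slide text. The prediction mirrors the slide arithmetic: the numerator uses only the K most similar items the user has used, while the denominator sums every similarity shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two item columns (one entry per user, 0 = no usage)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item nobody used is similar to nothing
    return dot / (norm_a * norm_b)


def predict(similarities, usage, k=2):
    """Predicted value for a target item, following the arithmetic on slide 15.

    similarities: similarity of the target item to each other item.
    usage: the user's usage class (1, 2, 3) per other item, None if unused.
    """
    used = [(s, u) for s, u in zip(similarities, usage) if u is not None]
    # Keep only the K most similar items the user has actually used.
    top_k = sorted(used, key=lambda pair: pair[0], reverse=True)[:k]
    numerator = sum(s * u for s, u in top_k)
    denominator = sum(similarities)  # the slide sums all similarities here
    return numerator / denominator if denominator else 0.0


def top_n(predictions, n):
    """Names of the n items with the highest predicted value."""
    return sorted(predictions, key=predictions.get, reverse=True)[:n]


# Numbers from slides 14 and 15 (the 1.0 entry, Gmail compared with itself, is left out).
sims = [0.6, 0.8, 0.4, 0.4, 0.3, 0.3]
usage = [3, 3, 2, None, 3, None]  # assumed pairing of usage classes to similarities
gmail_prediction = predict(sims, usage, k=2)
```

With K = 2 the two contributing items are the ones with similarity 0.8 and 0.6, which reproduces the slide's value of about 1.5. Running `predict` for every unknown cell and passing the results to `top_n` gives the per-user recommendation list described on slide 16.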

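Slide 17 names Root Mean Squared Error as the performance metric. A minimal Python sketch of that metric is below. The function name and the idea of comparing predictions against held-out known usage classes are assumptions, since the slides do not say how the evaluation set was built.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and known values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower value means the predicted usage classes sit closer to the known ones, so it can serve as the "performance" term when weighing performance against cost.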