The Netflix prize: yet another million dollar problem
Slides from an introductory talk on machine learning, and why mathematicians should take interest in it.

This is a very basic introduction, for math undergraduates & other curious minds.

Published in: Education

  1. 1. The Netflix Prize: yet another million dollar problem. David Bessis, Ecole Normale Supérieure, 27/01/2010. Sections: The Problem; Strategies; Some Funny New Science.
  2. 2-20. 7 + 1 Million Dollar Problems
     Millennium Prize Problems:
       • Funded in 2000 by the Clay Mathematics Institute.
       • Seven classical open problems in Mathematics.
       • Fuzzy rules: solutions must "be published in a refereed mathematics publication of worldwide repute" and "have general acceptance in the mathematics community two years after".
       • The Poincaré conjecture was solved by Perelman in 2003. No award yet.
     Netflix Prize:
       • Funded in 2006 by the DVD rental company Netflix.
       • A problem in Applied Mathematics, Computer Science, Psychology (do we really care?): Some Funny New Science.
       • Reasonably clear rules.
       • Prize awarded in September 2009.
  21. 21. The Problem Rules Strategies Competition Some Funny New Science Context Netflix has an “all-you-can-eat” pricing model. David Bessis The Netflix Prize: yet another million dollar problem
  22. 22. The Problem Rules Strategies Competition Some Funny New Science Context Netflix has an “all-you-can-eat” pricing model. They need their users to watch a lot of movies. David Bessis The Netflix Prize: yet another million dollar problem
  23. 23. The Problem Rules Strategies Competition Some Funny New Science Context Netflix has an “all-you-can-eat” pricing model. They need their users to watch a lot of movies. Beyond a few obvious choices, people don’t know what they want to watch. David Bessis The Netflix Prize: yet another million dollar problem
  24. 24. The Problem Rules Strategies Competition Some Funny New Science Context Netflix has an “all-you-can-eat” pricing model. They need their users to watch a lot of movies. Beyond a few obvious choices, people don’t know what they want to watch. Collaborative filtering: recommending products based on prior evaluations by other users (just like Amazon does). David Bessis The Netflix Prize: yet another million dollar problem
  25. 25. The Problem Rules Strategies Competition Some Funny New Science Context Netflix has an “all-you-can-eat” pricing model. They need their users to watch a lot of movies. Beyond a few obvious choices, people don’t know what they want to watch. Collaborative filtering: recommending products based on prior evaluations by other users (just like Amazon does). The Netflix prize is a collaborative filtering competition: David Bessis The Netflix Prize: yet another million dollar problem
  26. 26. The Problem Rules Strategies Competition Some Funny New Science Context Netflix has an “all-you-can-eat” pricing model. They need their users to watch a lot of movies. Beyond a few obvious choices, people don’t know what they want to watch. Collaborative filtering: recommending products based on prior evaluations by other users (just like Amazon does). The Netflix prize is a collaborative filtering competition: Based on a huge dataset of actual ratings by Netflix users. David Bessis The Netflix Prize: yet another million dollar problem
  27. 27. The Problem Rules Strategies Competition Some Funny New Science Context Netflix has an “all-you-can-eat” pricing model. They need their users to watch a lot of movies. Beyond a few obvious choices, people don’t know what they want to watch. Collaborative filtering: recommending products based on prior evaluations by other users (just like Amazon does). The Netflix prize is a collaborative filtering competition: Based on a huge dataset of actual ratings by Netflix users. Open to almost everyone. David Bessis The Netflix Prize: yet another million dollar problem
  28. Context
      - Netflix has an “all-you-can-eat” pricing model. They need their users to watch a lot of movies.
      - Beyond a few obvious choices, people don’t know what they want to watch.
      - Collaborative filtering: recommending products based on prior evaluations by other users (just like Amazon does).
      - The Netflix Prize is a collaborative filtering competition:
          - based on a huge dataset of actual ratings by Netflix users;
          - open to almost everyone;
          - endowed with a $1,000,000 prize.
  34. The Dataset
      - The user space U consists of 480 189 users (identified by a meaningless non-sequential integral id).
      - The movie space M consists of 17 770 movies (identified by integers 1, ..., 17 770; the associated list of titles and release years is provided, and this data is meaningful and minable).
      - The date space D spans the period Oct. 1998 – Dec. 2005 (extremely meaningful data; no time of day is provided).
      - The rating space R is {1, 2, 3, 4, 5} (“stars”).
      - The training dataset T contains 100 480 507 quadruples (u, m, d, r) ∈ U × M × D × R.
      - The qualifying dataset Q contains 2 817 131 triples (u, m, d) ∈ U × M × D.
  41. The Challenge
      - Open to everyone except Netflix employees and their relatives and residents of Cuba, Iran, Syria, North Korea, Myanmar and Sudan.
      - Participants can join efforts in teams.
      - They can upload their predictions up to once a day.
      - Predictions are maps from the qualifying set Q to the interval [1, 5].
      - The metric used to benchmark predictions is the RMSE (“root of mean square error”):
        RMSE = \sqrt{ \frac{1}{|Q|} \sum_{q \in Q} |\text{predicted rating for } q - \text{actual rating for } q|^2 }
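The RMSE metric named on this slide can be illustrated with a minimal Python sketch. The function name `rmse` and the toy numbers are illustrative assumptions, not part of the talk or of the actual Netflix scoring code.

```python
import math

def rmse(predicted, actual):
    """Root of the mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(total / len(predicted))

# Toy example (made-up numbers): three predictions in [1, 5]
# compared against three actual star ratings.
toy_predictions = [3.5, 4.0, 2.0]
toy_actual = [4, 4, 1]
toy_score = rmse(toy_predictions, toy_actual)
```

Because errors are squared before averaging, one large miss costs more than several small ones.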
  47. Typical RMSEs
      - Theoretically, the RMSE need not exceed 2: always predicting 3 is never off by more than 2 stars.
      - Users tend to view and rate movies they like, so they typically give 3, 4 or 5 stars rather than 1 or 2 (the above upper bound is unrealistically pessimistic).
      - A basic prediction consists of mapping a triple (u, m, d) to the average rating obtained by the movie m. It achieves 1.0540.
      - At the beginning of the Challenge, Netflix’s in-house prediction system Cinematch achieved 0.9514 (roughly a 10% improvement over this basic prediction).
      - Netflix set the following target: obtain a further 10% improvement over Cinematch.
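The basic movie-average prediction described on this slide could look like the sketch below. The function name, the toy quadruples, and the fallback to the global mean for a movie with no training ratings are assumptions made for illustration.

```python
from collections import defaultdict

def movie_average_predictor(training):
    """Build a predictor from (user, movie, date, rating) quadruples.

    The returned function maps a triple (u, m, d) to the average rating
    of movie m. Falling back to the global mean for an unseen movie is
    an assumption of this sketch, not something stated in the talk.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _user, movie, _date, rating in training:
        sums[movie] += rating
        counts[movie] += 1
    global_mean = sum(sums.values()) / sum(counts.values())
    averages = {movie: sums[movie] / counts[movie] for movie in sums}

    def predict(user, movie, date):
        return averages.get(movie, global_mean)

    return predict

# Toy training set (made-up data): two ratings for movie 1, one for movie 2.
toy_training = [
    (101, 1, "2005-01-01", 4),
    (102, 1, "2005-01-02", 5),
    (101, 2, "2005-01-03", 2),
]
predict = movie_average_predictor(toy_training)
```

The predictor ignores both the user and the date, which is why it is only a baseline.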
  52. Very Smart Rules 1: a Cryptographic Trick
      - Netflix has secretly partitioned the qualifying set Q = Q1 ⊔ Q2 into two subsets of (approximately) equal sizes.
      - The RMSE achieved on Q1 is revealed to participants (there is a public leaderboard).
      - The RMSE achieved on Q2 is used to determine the winner.
      - This prevented participants from “learning from the oracle”.
      - The goal was to achieve 0.8572.
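The two-subset scoring rule can be sketched from the organizer's side as follows. How Netflix actually drew the partition is not described in the talk, so the seeded shuffle, the function names and the toy data are all assumptions.

```python
import math
import random

def split_qualifying(qualifying_ids, seed):
    """Secretly partition the qualifying ids into two halves (Q1, Q2).

    A seeded shuffle is an assumption of this sketch.
    """
    ids = list(qualifying_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])

def rmse_on(subset, predictions, truth):
    """RMSE restricted to the entries of one subset."""
    total = sum((predictions[q] - truth[q]) ** 2 for q in subset)
    return math.sqrt(total / len(subset))

# Toy data (made up): ten qualifying entries, a constant submission of 3 stars.
toy_ids = list(range(10))
toy_truth = {q: 1 + q % 5 for q in toy_ids}
toy_submission = {q: 3.0 for q in toy_ids}

q1, q2 = split_qualifying(toy_ids, seed=2006)
public_rmse = rmse_on(q1, toy_submission, toy_truth)   # shown on the leaderboard
private_rmse = rmse_on(q2, toy_submission, toy_truth)  # decides the winner
```

Since participants never learn which entries belong to Q2, tuning a submission against the leaderboard feedback only fits Q1.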
  60. Very Smart Rules 2: Crowd Psychology Tricks
      - The Challenge opened on October 2, 2006.
      - Annual $50,000 prizes were offered to current leaders, provided they made their current methodology public.
      - The Challenge was to last for 30 more days after the goal was achieved.
      - The winner would be the team with the best RMSE after this 30-day period (no backstabbing arXiv-style “I posted first” effect).
      - Every detail was carefully anticipated (even the possibility of a tie).
      - These smart rules, together with the $1,000,000 prize, attracted thousands of participants.
  66. Timeline
      - October 2006: Cinematch RMSE = 0.9514.
      - October 2007: team KorBell leads with 0.8712 (8.43% improvement).
      - October 2008: team “BellKor in BigChaos” (two teams merging efforts) leads with 0.8616 (9.44% improvement).
      - June 26, 2009: the goal is achieved.
      - July 26, 2009: Netflix stops gathering solutions.
      - September 18, 2009: the winner is announced.
  70. The winning team
      Three teams combined their results to win the competition:
      - BellKor: Bob Bell (AT&T), Yehuda Koren (Yahoo), Chris Volinsky (AT&T)
      - BigChaos: Michael Jahrer (Commendo research and consulting), Andreas Töscher (Commendo research and consulting)
      - Pragmatic Theory: Martin Chabbert (Pragmatic Theory), Martin Piotte (Pragmatic Theory)
      Their winning submission achieved an RMSE of 0.8567 (10.06% improvement over Cinematch).
      Another team, The Ensemble, achieved the same RMSE...
      ...and lost because their submission was posted 24 minutes later!
  78. Computer implementation
      Memory requirements:
      - Movies can be encoded on 2 bytes (17 770 < 256^2).
      - Viewers can be encoded on 3 bytes (480 189 < 256^3).
      - Dates can be encoded on 2 bytes.
      - A triple (m, v, d) can be encoded on 7 bytes.
      - 700 MB suffice to store the dataset.
      It is possible (necessary) to work in RAM. Commodity hardware is sufficient.
      (I have some Ruby code to interactively play with the dataset.)
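The byte counts above can be checked with a small packing sketch. It assumes viewers have been renumbered to a dense index and dates converted to a day count from some chosen start date. Both conventions, and the function names, are hypothetical, and the sketch leaves open where the rating itself is stored.

```python
MOVIE_BYTES, VIEWER_BYTES, DATE_BYTES = 2, 3, 2
RECORD_BYTES = MOVIE_BYTES + VIEWER_BYTES + DATE_BYTES  # 7 bytes per triple

def pack_triple(movie, viewer_index, day_index):
    """Encode a triple (m, v, d) on exactly 7 bytes.

    viewer_index and day_index are assumed to be small non-negative
    integers (dense renumbering, days since a chosen start date).
    int.to_bytes raises OverflowError if a value does not fit.
    """
    return (movie.to_bytes(MOVIE_BYTES, "big")
            + viewer_index.to_bytes(VIEWER_BYTES, "big")
            + day_index.to_bytes(DATE_BYTES, "big"))

def unpack_triple(record):
    """Inverse of pack_triple."""
    movie = int.from_bytes(record[0:2], "big")
    viewer_index = int.from_bytes(record[2:5], "big")
    day_index = int.from_bytes(record[5:7], "big")
    return movie, viewer_index, day_index

# Largest movie id and largest dense viewer index from the dataset slide,
# with an arbitrary day count.
sample = pack_triple(17770, 480188, 2600)

# Size of the whole training set at 7 bytes per record.
TRAINING_RECORDS = 100_480_507
dataset_bytes = RECORD_BYTES * TRAINING_RECORDS
```

At 7 bytes per record the training set takes about 703 MB, in line with the 700 MB quoted on the slide.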
  89. Remarks
      - About 200 ratings per user.
      - This is likely caused by Cinematch’s data gathering procedure: users sometimes rate tens of movies on a single day.
      - This causes an insanely huge bias within the dataset (movies are perceived differently when rated individually or within a rating spree), not fully exploited by the winners. Netflix, do you read me?
      - Some movies were rated by hundreds of thousands of viewers, some by just a few (long-tail distribution).
      - Similarly, one user rated all the movies, and many rated just a few.
      - Let F be the set consisting of the final 9 ratings of each individual user. Then F = Q ⊔ P, with P ⊂ T publicly tagged by Netflix (the “probe” set).
      - Q is a random draw of 2/3 of F.
      - Q resembles P but is very dissimilar from T.
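The construction of F, Q and P described above can be mimicked on toy data. This is only a sketch of the idea: the function names, the use of integer day counts, and the seeded shuffle are assumptions, and Netflix's actual procedure may differ in its details.

```python
import random
from collections import defaultdict

def final_ratings(ratings, per_user=9):
    """Return F: the last `per_user` ratings (by date) of every user.

    `ratings` holds (user, movie, day, stars) quadruples with integer days.
    """
    by_user = defaultdict(list)
    for user, movie, day, stars in ratings:
        by_user[user].append((day, movie, stars))
    final = []
    for user, rows in by_user.items():
        rows.sort()  # chronological order (ties broken by movie id)
        for day, movie, stars in rows[-per_user:]:
            final.append((user, movie, day, stars))
    return final

def split_final(final, seed, qualifying_share=2 / 3):
    """Randomly split F into Q (about 2/3) and P (the rest)."""
    shuffled = list(final)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * qualifying_share)
    return shuffled[:cut], shuffled[cut:]

# Toy data (made up): two users, twelve ratings each, one rating per day.
toy_ratings = [(u, m, m, 1 + m % 5) for u in (1, 2) for m in range(12)]
toy_final = final_ratings(toy_ratings)
toy_q, toy_p = split_final(toy_final, seed=2006)
```

Both Q and P come from the most recent ratings of each user, which is one way to see why they resemble each other more than they resemble T as a whole.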
  95. Algorithms
      The machine learning toolbox consists of many methods:
      - Clustering methods.
      - Regressions.
      - Latent parameters methods (SVD).
      - Neural networks.
      - SVM.
      - ...
  101. 101. Practical issues The Problem Regressions Strategies Latent factors Some Funny New Science Tuning and Blending Beginner’s mistakes Underestimate the volume effect. Think conceptually and discretely rather than globally and continuously. Put users and movies into categories (clustering introduces unwanted discontinuities). Learn from the probe. Dealing with 100 000 000 data isn’t a logic puzzle. It resembles Thermodynamics. David Bessis The Netflix Prize: yet another million dollar problem
  102-109. Linear regression
  Suppose all viewers in $X$ have rated all movies in $Y$: the rating matrix is $(r_{x,y})_{(x,y) \in X \times Y}$.
  Suppose you want to model the ratings given to a particular movie $y_0$ based on the ratings given to the movies in $Y' = Y \setminus \{y_0\}$.
  A linear regression is a natural way to do that.
  Write $(r_{x,y}) = (C_y)_{y \in Y}$, where the $C_y$ are the column vectors.
  Performing the linear regression consists of approximating $C_{y_0}$ by its orthogonal projection $\hat{C}_{y_0}$ onto the subspace spanned by the $(C_y)_{y \in Y'}$.
  Clearly, there exists a unique solution.
  It optimizes RMSE.
  Write the formula!
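  As a minimal sketch of the projection described on this slide, the weights can be obtained from the normal equations $(A^T A)\lambda = A^T C_{y_0}$, where the columns of $A$ are the $C_y$ for the other movies. The helper names (`solve`, `project`) and the toy ratings below are assumptions made for illustration; they do not come from the talk, and the columns are assumed to be linearly independent.

```python
def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def project(target, columns):
    """Return (weights, projection) of `target` onto the span of `columns`."""
    gram = [[dot(ci, cj) for cj in columns] for ci in columns]  # A^T A
    rhs = [dot(ci, target) for ci in columns]                   # A^T target
    weights = solve(gram, rhs)
    proj = [sum(w * c[x] for w, c in zip(weights, columns))
            for x in range(len(target))]
    return weights, proj


# Hypothetical toy data: 4 viewers, 2 predictor movies, 1 movie to model.
columns = [[5.0, 3.0, 4.0, 1.0], [4.0, 2.0, 5.0, 2.0]]
target = [5.0, 2.0, 4.0, 1.0]
weights, c_hat = project(target, columns)
```

  The residual `target - c_hat` is orthogonal to every predictor column, which is exactly the property that makes the projection the least-squares (hence RMSE-optimal) fit.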
  110-118. Real life problems 1: missing data
  Not all viewers have seen all movies.
  Worse, there are virtually no complete rectangular blocks within the dataset.
  Regression by viewers or by movies? It is better to do regression by movies.
  Normalize ratings: replace the rating $r_{v,m}$ by the meaningful signal, i.e., the difference $\tilde{r}_{v,m}$ between $r_{v,m}$ and the average rating for $m$.
  Then it becomes natural to set $\tilde{r}_{v,m}$ to 0 when $v$ hasn't rated $m$.
  Actually, whether or not $v$ has rated $m$ is meaningful information!
  Add normalized bit columns to account for that.
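  The normalization step can be sketched as follows. The data layout (a dict keyed by `(viewer, movie)`), the function names, and the toy ratings are assumptions for illustration. In particular, "normalized bit columns" is read here as 0/1 "has rated" indicators with their mean subtracted; the slide does not spell out the exact normalization.

```python
def center_by_movie(ratings, viewers, movies):
    """Per movie: rating minus that movie's average, and 0.0 where the viewer gave no rating."""
    columns = {}
    for m in movies:
        seen = [ratings[(v, m)] for v in viewers if (v, m) in ratings]
        mean = sum(seen) / len(seen) if seen else 0.0
        columns[m] = [ratings[(v, m)] - mean if (v, m) in ratings else 0.0
                      for v in viewers]
    return columns


def rated_bit_columns(ratings, viewers, movies):
    """Per movie: centered 0/1 indicator of whether each viewer rated it (assumed normalization)."""
    columns = {}
    for m in movies:
        bits = [1.0 if (v, m) in ratings else 0.0 for v in viewers]
        share = sum(bits) / len(bits)  # fraction of viewers who rated m
        columns[m] = [b - share for b in bits]
    return columns


# Hypothetical toy data: 3 viewers, 2 movies, 4 known ratings.
viewers = ["a", "b", "c"]
movies = ["m1", "m2"]
ratings = {("a", "m1"): 5, ("b", "m1"): 3, ("a", "m2"): 4, ("c", "m2"): 2}

centered = center_by_movie(ratings, viewers, movies)
bits = rated_bit_columns(ratings, viewers, movies)
```

  Both families of columns can then be fed to the regression side by side, so that the fact of having rated a movie contributes its own weight.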
  119-124. Real life problems 2: the curse of dimensionality
  We all know that Lagrange interpolators are not to be used on noisy data. Rather, one should look at best-fitting polynomials of a given low degree.
  Similarly, the curse of dimensionality asserts that:
  With high-dimensionality datasets, one will always find stupid predictors, making perfect predictions on the dataset, and failing to generalize.
  By looking at my audience today, what should I be able to infer?
  - That having long hair is a reasonably good gender predictor?
  - That wearing a grey sweater is a reasonably good gender predictor?
  Dilemma: overlearning vs underlearning.
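  The remark about Lagrange interpolators can be checked on a toy example. The sketch below uses made-up data (points on the line $y = x$ with alternating noise of $\pm 0.5$) and hypothetical helper names; it compares the degree-4 interpolant with a best-fitting straight line at a point outside the sample.

```python
def lagrange(xs, ys, t):
    """Evaluate at t the Lagrange interpolating polynomial through the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (t - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Made-up noisy samples of y = x (noise alternates +0.5 / -0.5).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.5, 0.5, 2.5, 2.5, 4.5]

slope, intercept = fit_line(xs, ys)
interp_at_5 = lagrange(xs, ys, 5.0)   # perfect on the sample, far off outside it
line_at_5 = slope * 5.0 + intercept   # imperfect on the sample, close outside it
```

  On this data the interpolant reproduces every sample exactly yet predicts 20.5 at $t = 5$, where the underlying line gives 5; the straight-line fit predicts 5.1.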
  125-128. Ridge regression (aka Tikhonov regularization)
  Linear regression: given vectors $x, y_1, \dots, y_n \in \mathbb{R}^m$, find $\lambda_1, \dots, \lambda_n$ that minimize $\| x - \sum_i \lambda_i y_i \|^2$.
  When $n$ is large (with respect to $m$), the linear system is underdetermined: there are more unknowns $\lambda_i$ than equations. Overfitting occurs.
  A telltale sign of overfitting is the presence of $\lambda_i$'s with huge absolute values compensating each other.
  Ridge regression (Tikhonov regularization): find $\lambda_1, \dots, \lambda_n$ that minimize $\| x - \sum_i \lambda_i y_i \|^2 + \varepsilon \sum_i |\lambda_i|^2$, where $\varepsilon$ is a well-adjusted (small) penalty coefficient.
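  A minimal sketch of the penalized problem, assuming it is solved through the regularized normal equations $(Y^T Y + \varepsilon I)\lambda = Y^T x$. The function names, the value `eps=0.1`, and the two nearly collinear toy vectors are illustrative assumptions, chosen so that the unpenalized weights visibly compensate each other.

```python
def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def ridge(x, ys, eps):
    """Weights minimizing ||x - sum_i l_i y_i||^2 + eps * sum_i l_i^2 (eps=0: plain regression)."""
    n = len(ys)
    gram = [[dot(ys[i], ys[j]) + (eps if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    rhs = [dot(y, x) for y in ys]
    return solve(gram, rhs)


# Hypothetical toy data: two nearly collinear predictors.
ys = [[1.0, 0.0], [1.0, 0.01]]
x = [1.0, 1.0]

plain = ridge(x, ys, eps=0.0)   # roughly [-99, 100]: huge weights cancelling out
shrunk = ridge(x, ys, eps=0.1)  # both weights stay below 1 in absolute value
```

  The penalty trades a slightly worse fit on the sample for weights of reasonable size; in practice `eps` would be tuned on held-out data rather than fixed by hand.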
  129-133. Assigning attributes to movies
  Assume that movies differ by their amount of certain qualities:
  - Violence.
  - Sex.
  - Anything else? Maybe not.
