
Movie vs movie

What are your top ten favorite movies of all time? This is a very difficult question. But why? Irmak Sirer explains the challenges of measuring how much we like movies, books, songs, or products, combining insights from sources as diverse as the Netflix Prize, Duncan Watts' social experiments, and the beginnings of Facebook. The better we get at measuring and ranking levels of enjoyment, the better we can customize websites, sort search results, find other people with similar tastes, and recommend products. So can we overcome these challenges? Drumroll... Yes, we can.

  1. 1. Irmak Sirer movievsmovie.datasco.pe
  2. 2. How much do we like things?
  3. 3. AGE 7 Oh cool. Pretty good. Space and stuff.
  4. 4. AGE 14 Omigod Omigod Omigod. Epic masterpiece is epic!!!!1! I'm in love with Leia.
  5. 5. AGE 17 WTF?
  6. 6. AGE 30 When you think about it, it's not that good.
  7. 7. AGE 30 When you think about it, it's not that good. Ah, who am I kidding? It's amazing. I'm still in love with Leia.
  8. 8. I mean... look at her.
  9. 9. What determines how much I like a movie?
  10. 10. What determines how much I like a movie? Is my reaction to a movie / book / song predictable?
  11. 11. How much will I like The Book of Eli?
  12. 12. 2006 Cinematch 1 billion user ratings 55,000 movies
  13. 13. Cinematch I have a soulmate in taste Irmak
  14. 14. Cinematch I have a soulmate in taste Irmak Frrmack
  15. 15. Cinematch I have a soulmate in taste Watched the same movies Irmak Frrmack
  16. 16. Cinematch I have a soulmate in taste Watched the same movies Gave the exact same ratings Irmak Frrmack
  17. 17. Cinematch I have a soulmate in taste Watched the same movies Gave the exact same ratings Except The Book of Eli Irmak Frrmack
  18. 18. Cinematch I have a soulmate in taste Frrmack watched The Book of Eli Irmak Frrmack
  19. 19. Cinematch I have a soulmate in taste Oh man, it was… Irmak Frrmack
  20. 20. Cinematch I have a soulmate in taste Oh man, it was… FANTASTIC! Irmak Frrmack
  21. 21. Cinematch I have a soulmate in taste Oh man, it was… FANTASTIC! Predict Irmak Frrmack
  22. 22. No perfect soulmates in real life Irmak
  23. 23. No perfect soulmates in real life Almost soulmate 1 Irmak
  24. 24. No perfect soulmates in real life Almost soulmate 1 Irmak Almost soulmate 2
  25. 25. No perfect soulmates in real life Almost soulmate 1 Irmak Almost soulmate 3 Almost soulmate 2
  26. 26. No perfect soulmates in real life Almost soulmate 1 Irmak Almost soulmate 2 Almost soulmate 3 Almost soulmate 4
  27. 27. No perfect soulmates in real life 87% soulmate Irmak 74% soulmate 82% soulmate 95% soulmate
  28. 28. No perfect soulmates in real life Irmak
  29. 29. No perfect soulmates in real life Irmak
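
The "almost soulmate" idea on the slides above can be sketched as a similarity-weighted average of the neighbors' ratings. This is only a minimal illustration and not Cinematch itself: the function name, the star ratings for The Book of Eli, and the use of the soulmate percentages directly as weights are all assumptions made for the example.

```python
def predict_rating(neighbors):
    """Similarity-weighted average of neighbors' ratings.

    neighbors: list of (similarity, rating) pairs, similarity in (0, 1].
    """
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight == 0:
        raise ValueError("need at least one neighbor with positive similarity")
    return sum(sim * rating for sim, rating in neighbors) / total_weight


# Similarities from the slide (87%, 74%, 82%, 95%); the star ratings are made up.
almost_soulmates = [(0.87, 5), (0.74, 3), (0.82, 4), (0.95, 5)]
prediction = predict_rating(almost_soulmates)  # a bit above 4.3 stars
```

The closer a neighbor's taste is to yours, the more their opinion pulls the prediction, which is why the 74% soulmate's lukewarm rating barely dents the result.
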
  30. 30. Cinematch Works well for movies that everybody rates
  31. 31. Cinematch Quite bad with movies that only a few people rate
  32. 32. Cinematch Some movies are especially difficult to predict Biggest error source: popular but weird 15% of all errors from ONE movie
  33. 33. Trivial: Mean score of everyone
  34. 34. Trivial: Mean score of everyone Error: (RMSE) 1.0540 stars
  35. 35. Trivial: Mean score of everyone Error: (RMSE) 1.0540 stars Cinematch Error: (RMSE) 0.9525 stars
  36. 36. Trivial: Mean score of everyone Error: (RMSE) 1.0540 stars 9.6% Cinematch Error: (RMSE) 0.9525 stars
  37. 37. Trivial: Mean score of everyone Error: (RMSE) 1.0540 stars 9.6% Cinematch Error: (RMSE) 0.9525 stars Better rankings → Better recommendations
  38. 38. Trivial: Mean score of everyone Error: (RMSE) 1.0540 stars 9.6% Cinematch Error: (RMSE) 0.9525 stars Better rankings → Better recommendations +8.6% → +1200% people watch top recommendation (BigChaos Netflix Prize Report)
  39. 39. Cinematch Error: 0.9525 stars
  40. 40. Cinematch Error: 0.9525 stars $1,000,000 for a 10% improvement 2006
  41. 41. Cinematch Error: 0.9525 stars Bring it down to: Error: 0.8563 stars $1,000,000 for a 10% improvement 2006
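
For readers who want to check the percentages, here is a small sketch of how RMSE and the relative improvements quoted on the slides could be computed. The helper names and the toy ratings are assumptions for illustration; only the four RMSE figures come from the slides.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction, as a fraction of the baseline error."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Figures quoted on the slides.
cinematch_vs_mean = improvement(1.0540, 0.9525)  # about 0.096, the 9.6% on the slide
prize_target = improvement(0.9525, 0.8563)       # about 0.101, just past the 10% bar

# Toy data (made up): always predicting 3 stars.
toy_error = rmse([3, 3, 3, 3], [1, 3, 5, 3])     # sqrt(8 / 4)
```
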
  42. 42. BellKor’s Pragmatic Chaos
  43. 43. How did they do it?
  44. 44. How did they do it?
  45. 45. How did they do it? Before: Solid assumptions You have a certain taste. Your taste dictates a hidden rating for Book of Eli. When you watch it, this rating is revealed to you.
  46. 46. How did they do it? Before: Solid assumptions WRONG You have a certain taste. Your taste dictates a hidden rating for Book of Eli. When you watch it, this rating is revealed to you.
  47. 47. How did they do it? After: Your rating changes with time.
  48. 48. How did they do it? After: Your rating changes with time. It depends on...
  49. 49. How did they do it? After: Your rating changes with time. It depends on... how many you rated that day, your average rating for the day, which movies you rated on this day, the Netflix prediction you were shown
  50. 50. Trivial: Mean score of everyone Error: 1.0540 stars Cinematch Error: 0.9525 stars Y. Koren, The BellKor Solution to the Netflix Grand Prize. 2009
  51. 51. Trivial: Mean score of everyone Error: 1.0540 stars Cinematch Error: 0.9525 stars Your time dependent rating tendencies Y. Koren, The BellKor Solution to the Netflix Grand Prize. 2009
  52. 52. Trivial: Mean score of everyone Error: 1.0540 stars Cinematch Error: 0.9525 stars Your time dependent rating tendencies Error: 0.9278 stars Y. Koren, The BellKor Solution to the Netflix Grand Prize. 2009
  53. 53. Trivial: Mean score of everyone Error: 1.0540 stars Cinematch Error: 0.9525 stars 12.0% Your time dependent rating tendencies Error: 0.9278 stars Y. Koren, The BellKor Solution to the Netflix Grand Prize. 2009
  54. 54. Trivial: Mean score of everyone Error: 1.0540 stars Cinematch Error: 0.9525 stars 12.0% Your time dependent rating tendencies Error: 0.9278 stars without looking at which movies you like/hate! Y. Koren, The BellKor Solution to the Netflix Grand Prize. 2009
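
To make "time dependent rating tendencies" concrete, here is a deliberately simplified sketch of a predictor that never looks at which movie is being rated. It is not the BellKor model: the fallback order (this user on this day, then this user overall, then everyone), the function names, and the toy ratings are all assumptions chosen to illustrate the idea that your mood on a given day carries predictive signal.

```python
from collections import defaultdict
from statistics import mean


def fit_baseline(ratings):
    """ratings: list of (user, day, stars). Returns a predict(user, day) function."""
    global_mean = mean(stars for _, _, stars in ratings)

    by_user = defaultdict(list)
    by_user_day = defaultdict(list)
    for user, day, stars in ratings:
        by_user[user].append(stars)
        by_user_day[(user, day)].append(stars)

    user_mean = {u: mean(v) for u, v in by_user.items()}
    user_day_mean = {k: mean(v) for k, v in by_user_day.items()}

    def predict(user, day):
        # Most specific information first: this user on this day,
        # then this user overall, then everyone.
        if (user, day) in user_day_mean:
            return user_day_mean[(user, day)]
        return user_mean.get(user, global_mean)

    return predict


# Made-up ratings: generous on Tuesday, harsh on Friday.
ratings = [
    ("irmak", "tue", 5), ("irmak", "tue", 4),
    ("irmak", "fri", 2), ("irmak", "fri", 1),
    ("frrmack", "tue", 3),
]
predict = fit_baseline(ratings)
```

Here `predict("irmak", "tue")` and `predict("irmak", "fri")` differ by three stars even though no movie was ever consulted.
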
  55. 55. What does this suggest?
  56. 56. What does this suggest? We cannot compare a movie with all others we've seen.
  57. 57. What does this suggest? We cannot compare a movie with all others we've seen. We compare it to a limited set.
  58. 58. What does this suggest? We cannot compare a movie with all others we've seen. We compare it to a limited set. Liking (real time & remembered) depends on time and mood.
  59. 59. What does this suggest? We cannot compare a movie with all others we've seen. We compare it to a limited set. Liking (real time & remembered) depends on time and mood. Other people's opinions affect our own (followers / hipsters)
  60. 60. What does this suggest? We cannot compare Book of Eli with all movies we've seen. We compare it to a limited set. Liking (real time & remembered) depends on time and mood. Other people's opinions affect our own (followers / hipsters)
  61. 61. An experiment Music Lab: A website for downloading music
  62. 62. An experiment Same website: Music download and rating M.J. Salganik, P.S. Dodds, D.J. Watts. Science, 311:854-856, 2006
  63. 63. An experiment Music Lab: A website for downloading music Alternative A: Other people's ratings invisible
  64. 64. An experiment Music Lab: A website for downloading music Alternative A: Other people's ratings invisible More or less equal ratings
  65. 65. An experiment Music Lab: A website for downloading music Alternative A: Other people's ratings invisible Alternative B: All ratings visible More or less equal ratings
  66. 66. An experiment Music Lab: A website for downloading music Alternative A: Other people's ratings invisible Alternative B: All ratings visible More or less equal ratings Several songs snowball in popularity
  67. 67. An experiment Music Lab: A website for downloading music Alternative A: Other people's ratings invisible Alternative B: All ratings visible More or less equal ratings Several songs snowball in popularity It's different songs for each trial
  68. 68. Social influence plays a big part in determining hits and misses
  69. 69. Problems with rating movies We cannot compare a movie with all others we've seen. We compare it to a limited set. Liking (real time & remembered) depends on time and mood. Other people's opinions affect our own.
  70. 70. Degree of liking is sensitive and vague Amazing! Tuesday 3am Total garbage Sunday 12pm
  71. 71. Degree of liking is sensitive and vague Liking (real time & remembered) depends on time and mood. Other people's opinions affect our own.
  72. 72. Degree of liking is sensitive and vague Dependent on many other environmental factors besides our taste
  73. 73. Degree of liking is sensitive and vague We cannot compare a movie with all others we've seen. We compare it to a limited set.
  74. 74. Degree of liking is sensitive and vague Difficult to describe accurately and consistently with a number
  75. 75. Predicting aside, can I even reliably rate & rank movies I’ve seen in terms of enjoyment?
  76. 76. What are your top twenty movies? Irmak Frrmack
  77. 77. What are your top twenty movies? Well… Ummm… Irmak Frrmack
  78. 78. What are your top twenty movies? Well… Ummm… I like Star Wars. Irmak Frrmack
  79. 79. Degree of liking is sensitive and vague Can’t we do something about this?
  80. 80. Degree of liking is sensitive and vague
  81. 81. “Enjoyment” from a movie is very high dimensional information
  82. 82. “Enjoyment” from a movie is very high dimensional information Rating means projecting this onto a single dimension
  83. 83. ?
  84. 84. But sometimes you just want to do the best projection you can What is my top twenty?
  85. 85. Degree of liking is sensitive and vague We cannot compare a movie with all others we've seen. We compare it to a limited set.
  86. 86. Trying to rate Star Wars
  87. 87. Trying to rate Star Wars
  88. 88. Trying to rate Star Wars 1 Map enjoyment to a specific scale
  89. 89. Trying to rate Star Wars 1 Map enjoyment to a specific scale
  90. 90. Trying to rate Star Wars 1 Map enjoyment to a specific scale
  91. 91. Trying to rate Star Wars 2 Choose corresponding rating for this degree of liking
  92. 92. Trying to rate Star Wars But we cannot keep this entire history of enjoyment in mind
  93. 93. Trying to rate Star Wars But we cannot keep this entire history of enjoyment in mind We fuzzily remember a small subset
  94. 94. Trying to rate Star Wars But we cannot keep this entire history of enjoyment in mind We fuzzily remember a small subset We map based on this subset
  95. 95. Trying to rate Star Wars But we cannot keep this entire history of enjoyment in mind We fuzzily remember a small subset We map based on this subset
  96. 96.   SAMPLING
  97. 97. BIASED SAMPLING
  98. 98. Tuesday
  99. 99. Tuesday
  100. 100. Friday
  101. 101. Friday
  102. 102. Degree of liking is sensitive and vague Can’t we do something about this?
  103. 103. We can certainly handle single comparisons ?
  104. 104. We can certainly handle single comparisons
  105. 105. We can certainly handle single comparisons less vague
  106. 106. We can certainly handle single comparisons little information
  107. 107. I can manually compare it with all others
  108. 108. And find exactly where it belongs: right after Indiana Jones, right before The Princess Bride
  109. 109. Full ranking: Compare all pairs
  110. 110. 1,000,000 comparisons? That’s a bit too much effort for me
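
A full ranking by pairwise comparison needs every unordered pair, which is n(n-1)/2 comparisons for n movies. The short sketch below (helper names are assumptions) shows that a million comparisons corresponds to a collection of roughly 1,400 movies.

```python
import math


def pair_count(n_items):
    """Number of distinct unordered pairs among n_items items."""
    return n_items * (n_items - 1) // 2


def items_for_pairs(n_pairs):
    """Smallest collection size whose full pairwise comparison needs at least n_pairs."""
    n = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
    while pair_count(n) < n_pairs:
        n += 1
    return n


top_twenty_pairs = pair_count(20)              # 190 comparisons
movies_for_million = items_for_pairs(1_000_000)  # 1415 movies
```
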
  111. 111. We don’t need all of them
  112. 112. We don’t need all of them If
  113. 113. We don’t need all of them If ,
  114. 114. We don’t need all of them If , I have some information about
  115. 115. Compare a random sample of pairs
  116. 116. Use a ranking algorithm that utilizes all the information Good idea!
  117. 117. Elo rating system
  118. 118. Elo rating system
  119. 119. Elo rating system
  120. 120. Elo rating system 7.00 “hotness”
  121. 121. Elo rating system -1.50 7.00 “hotness” range +1.50
  122. 122. Elo rating system -1.50 7.00 +1.50 -1.50 8.00 +1.50
  123. 123. Elo rating system -1.50 7.00 7.12 +1.50 -1.50 8.00 7.68 +1.50
  124. 124. Elo rating system -1.50 7.00 7.12 +1.50 -1.50 8.00 7.68 +1.50
  125. 125. Elo rating system -1.50 7.00 7.12 +1.50 -1.50 8.00 7.68 +1.50
  126. 126. Elo rating system -1.50 7.00 36% to win +1.50 -1.50 8.00 64% to win +1.50
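
The 36% / 64% split can be reproduced with the standard logistic Elo expectation. The `scale=4.0` parameter is an assumption: chess Elo divides the rating gap by 400, and dividing by 4.0 on this 0-to-10 scale happens to match the slide's numbers for a one-point gap. The slides do not state which formula was actually used.

```python
def win_probability(rating_a, rating_b, scale=4.0):
    """Logistic Elo expectation that A beats B.

    scale=4.0 is an assumed value chosen to match the slide's 36% / 64%.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


underdog = win_probability(7.00, 8.00)  # about 0.36
favorite = win_probability(8.00, 7.00)  # about 0.64
```
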
  127. 127. Elo rating system How do we find out what these ranges are?
  128. 128. Elo rating system 5.00 5.00 5.00 5.00 5.00 5.00 Start with the same guess for every contender
  129. 129. Elo rating system ? 5.00 5.00
  130. 130. Elo rating system 5.00 5.00
  131. 131. Elo rating system 5.12 4.88 Update the best guesses accordingly
  132. 132. Elo rating system ? 5.12 5.00
  133. 133. Elo rating system 5.24 4.88
  134. 134. Elo rating system ? 5.24 5.00
  135. 135. Elo rating system 5.14 5.10
  136. 136. We don’t need all comparisons If , I have some information about
  137. 137. Elo rating system ? 7.61 4.02
  138. 138. Elo rating system ? 7.61 4.02 89% to win 11% to win
  139. 139. Elo rating system 7.61 4.02 +.02 -.02 89% to win 11% to win
  140. 140. Elo rating system 7.61 4.02 -.53 +.53 89% to win 11% to win
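
The update step on these slides follows the usual Elo rule: the winner gains, and the loser drops, in proportion to how surprising the result was. In the sketch below, `k=0.24` and `scale=4.0` are assumptions picked so that two 5.00 contenders move to 5.12 and 4.88 as on the earlier slide; the step sizes shown for the 7.61 vs 4.02 example appear to be illustrative and are not reproduced exactly.

```python
def expected_score(rating_a, rating_b, scale=4.0):
    """Logistic Elo expectation that A beats B (scale is an assumed value)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(winner, loser, k=0.24, scale=4.0):
    """Return the new (winner, loser) ratings after one comparison."""
    surprise = 1.0 - expected_score(winner, loser, scale)
    delta = k * surprise
    return winner + delta, loser - delta


first = elo_update(5.00, 5.00)     # two unknowns: a moderate step each way
second = elo_update(5.12, 5.00)    # the slight favorite wins again
expected = elo_update(7.61, 4.02)  # strong favorite wins: tiny adjustment
upset = elo_update(4.02, 7.61)     # underdog wins: much larger adjustment
```

Rounded to two decimals, `first` is (5.12, 4.88) and `second` is (5.24, 4.88), matching the sequence on the slides.
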
  141. 141. Elo rating system 9.07 8.42 6.40 4.88 4.20 3.03 We now have scores on a single scale
  142. 142. Elo rating system 9.07 8.42 6.40 4.88 4.20 3.03 We now have scores on a single scale (estimates of people’s appreciation levels)
  143. 143. Elo rating system 9.07 (1) 8.42 (2) 6.40 (3) 4.88 (4) 4.20 (5) 3.03 (6) and a ranking
  144. 144. Degree of liking is sensitive and vague Can we somehow apply this to movies, then?
  145. 145. We can do better
  146. 146. We can do better Bayesian ranking algorithms
  147. 147. We can do better Bayesian ranking algorithms Glicko (The Elo Killer) 1999
  148. 148. We can do better Bayesian ranking algorithms Glicko (The Elo Killer) 1999 TrueSkill™ 2007
  149. 149. Bayesian ranking + - 4.46 + - 4.01
  150. 150. Degree of liking is sensitive and vague Liking (real time & remembered) depends on time and mood. Other people's opinions affect our own.
  151. 151. Bayesian ranking + - 4.46 + - 4.01
  152. 152. Bayesian ranking + - 4.46 82% to win + - 4.01 15% to win 3% to draw
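
One way to get win, draw, and loss probabilities from two uncertain scores is to treat each score as a Gaussian belief and look at the distribution of their difference. The sketch below is loosely in the spirit of Glicko and TrueSkill but is not either algorithm: the standard deviations of 0.3 and the draw margin of 0.05 are made-up values, so the output is not expected to match the slide's 82% / 3% / 15% exactly.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def outcome_probabilities(mu_a, sigma_a, mu_b, sigma_b, draw_margin=0.05):
    """Return (P(A wins), P(draw), P(B wins)).

    Each movie's score is a Gaussian belief (mean, standard deviation).
    The difference of two independent Gaussians is Gaussian, and a draw is
    declared when that difference falls within +/- draw_margin.
    """
    diff_mean = mu_a - mu_b
    diff_std = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    p_b_wins = normal_cdf((-draw_margin - diff_mean) / diff_std)
    p_a_wins = 1.0 - normal_cdf((draw_margin - diff_mean) / diff_std)
    p_draw = 1.0 - p_a_wins - p_b_wins
    return p_a_wins, p_draw, p_b_wins


# Means from the slide; the uncertainties (0.3) are assumptions.
p_win, p_draw, p_loss = outcome_probabilities(4.46, 0.3, 4.01, 0.3)
```

Widening either uncertainty pulls the win probabilities toward each other, which is the practical difference from plain Elo's single best guess.
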
  153. 153. Bayesian ranking ?
  154. 154. Bayesian ranking Elo: ? Best guess for the center 4.3
  155. 155. Bayesian ranking Bayesian: ? It could be centered around 4.3
  156. 156. Bayesian ranking Bayesian: ? It could also be centered around 4.2
  157. 157. Bayesian ranking Bayesian: ? or centered around 4.4
  158. 158. Bayesian ranking Bayesian: ? Less likely but even around 4.5
  159. 159. Bayesian ranking [Plot: probability vs. score, axis 3.5 to 5, centered at 4.3]
  160. 160. Bayesian ranking [Plot: probability vs. score, axis 3.5 to 5, centered at 4.3, width labeled "uncertainty"]
  161. 161. [Plot: probability vs. score, axis 2.0 to 5] Few comparisons: Lots of uncertainty (anything from 2.3 to 4.5 is quite possible)
  162. 162. [Plot: probability vs. score, axis 2.0 to 5] After many comparisons: Quite sure (pretty much between 4.11 and 4.18)
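
The narrowing of the curve can be illustrated with a brute-force Bayesian update over a grid of candidate scores: every candidate center starts equally plausible, and each comparison reweights the candidates by how well they explain the outcome. This is a teaching sketch, not Glicko or TrueSkill. The logistic likelihood, `scale=1.0`, the opponents' scores, and the outcomes are all assumptions.

```python
import math


def likelihood(score, opponent, won, scale=1.0):
    """Probability of the observed outcome if the movie's true score is `score`."""
    p_win = 1.0 / (1.0 + 10 ** ((opponent - score) / scale))
    return p_win if won else 1.0 - p_win


def posterior(grid, comparisons, scale=1.0):
    """Posterior over candidate scores, starting from a uniform prior."""
    weights = [1.0] * len(grid)
    for opponent, won in comparisons:
        weights = [w * likelihood(s, opponent, won, scale)
                   for w, s in zip(weights, grid)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_std(grid, probs):
    """Mean and standard deviation of a discrete distribution on the grid."""
    mu = sum(s * p for s, p in zip(grid, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(grid, probs))
    return mu, math.sqrt(var)


grid = [1.0 + 0.05 * i for i in range(81)]     # candidate scores 1.0 .. 5.0
few = [(2.3, True), (4.5, False)]              # two made-up comparisons
opponents = [2.0 + 0.075 * i for i in range(40)]
many = few + [(opp, opp < 4.15) for opp in opponents]  # wins below 4.15, loses above

_, std_few = mean_and_std(grid, posterior(grid, few))
_, std_many = mean_and_std(grid, posterior(grid, many))
```

With only two comparisons the posterior stays broad; after forty more it concentrates in a narrow band, mirroring the two plots above.
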
  163. 163. Bayesian ranking ?
  164. 164. Bayesian ranking Lord of the Rings Star Wars 2.0 3.0 4.0 5.0
  165. 165. Bayesian ranking Lord of the Rings Star Wars 2.0 3.0 4.0 5.0
  166. 166. How did they do it? After: Your rating changes with time. A small, constant increase in uncertainty before each comparison [Plot: probability vs. score, axis 3.5 to 5, width labeled "uncertainty"]
  167. 167. Degree of liking is sensitive and vague Great! We have a system!
  168. 168. How many is too many? I don’t want to spend too much time on this
  169. 169. Minimum Effort Maximum Information
  170. 170. Minimum Effort Maximum Information [Five scales, each with ticks at 1, 3, 5]
  171. 171. Minimum Effort Maximum Information
  172. 172. Minimum Effort Maximum Information
  173. 173. Minimum Effort Maximum Information Not reliable by itself Still carries a lot of information
  174. 174. Minimum Effort Maximum Information [One scale with ticks at 1, 3, 5]
  175. 175. Minimum Effort Maximum Information [Two scales, each with ticks at 1, 3, 5]
  176. 176. What else can we do? I don’t want to spend too much time on this
  177. 177. Minimum Effort Maximum Information ?
  178. 178. Minimum Effort Maximum Information ? 98% to win 1% to draw 1% to win
  179. 179. Minimum Effort Maximum Information ? 98% to win: Did not learn anything new
  180. 180. Minimum Effort Maximum Information ? 2% to win: Quite a bit of new information
  181. 181. Minimum Effort Maximum Information ? I can calculate the expected amount of information from a comparison!
  182. 182. Minimum Effort Maximum Information
  183. 183. Minimum Effort Maximum Information Certain about both movies Won’t learn a lot
  184. 184. Minimum Effort Maximum Information Certain about both movies Won’t learn a lot
  185. 185. Minimum Effort Maximum Information Certain about both movies Won’t learn a lot Don’t know much about either Will learn a lot regardless of outcome
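
One simple proxy for "expected amount of information" is the Shannon entropy of the predicted outcome: a comparison whose result is nearly certain teaches little on average, while an open one teaches a lot. The slides do not say which measure the site actually uses, so treat the function names and the choice of entropy as assumptions.

```python
import math


def surprisal_bits(p):
    """Information gained (in bits) when an outcome of probability p occurs."""
    return -math.log2(p)


def expected_information_bits(outcome_probs):
    """Shannon entropy: the average surprisal over all possible outcomes."""
    return sum(p * surprisal_bits(p) for p in outcome_probs if p > 0)


lopsided = expected_information_bits([0.98, 0.01, 0.01])  # one movie almost surely wins
even = expected_information_bits([0.49, 0.02, 0.49])      # outcome genuinely open

boring_result = surprisal_bits(0.98)   # the favorite won: almost nothing learned
shocking_result = surprisal_bits(0.02) # the upset: several bits of news
```

Picking the next pair to show would then amount to choosing the comparison with the highest expected information, which favors movies whose scores are still uncertain.
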
  186. 186. What are your top twenty movies? Irmak Frrmack
  187. 187. movievsmovie.datasco.pe
  188. 188. Quantifying human reactions is hard: books, celebrities, songs, tv shows, food, importance of issues, politicians, what to spend ‘fun’ budget on, products, teams in different sports
  189. 189. Degree of liking is sensitive and vague Amazing! Tuesday 3am Total garbage Sunday 12pm
  190. 190. Quantifying reactions is very useful
  191. 191. Quantifying reactions is very useful: customized websites, sorting search results, recommendations, connecting with other people of similar tastes, identifying meaningful groups of similar products / people, understanding your own preferences
  192. 192. Quantifying human reactions is hard Start with a rating, pose the correct comparisons
  193. 193. Quantifying human reactions is hard Start with a rating, pose the correct comparisons Every decision gets us closer
  194. 194. Degree of liking is sensitive and vague Amazing! Tuesday 3am Total garbage Sunday 12pm
  195. 195. Many comparisons for a movie over different days average out mood and other factors
  196. 196. Many comparisons for a movie over different days average out mood and other factors We can’t do much about social influence, but we should just accept that as a natural part of how much we like things
  197. 197. Degree of liking is sensitive and vague Amazing! Tuesday 3am Total garbage Sunday 12pm
  198. 198. A great way of collecting desired data is to make it fun
  199. 199. movievsmovie.datasco.pe
  200. 200. Thanks
