The Wonders of Big Data: how Big Data will put the personal back in e-commerce
The Perils of Big Data: how overfitting and a lack of domain knowledge can lead to suboptimal solutions
User Experiments: how user evaluations can be used to create meaningful experiences
A Note on Privacy: how to avoid this looming danger of our Big Data future
Improvement means reducing the error in predicting user ratings.
Error = root mean square error (RMSE) between system rating and user rating.
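The RMSE metric from the note above can be sketched in a few lines of Python; the sample ratings here are hypothetical, not taken from the Netflix data:

```python
import math

def rmse(predicted, actual):
    """Root mean square error between system ratings and user ratings."""
    assert len(predicted) == len(actual)
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    )

# Hypothetical predictions vs actual ratings on a 1-5 star scale.
print(rmse([3.5, 4.0, 2.0], [4, 4, 3]))  # 0.6454972243679028
```

The Netflix Prize target was a 10% reduction of exactly this quantity relative to Netflix's own system.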
Older movies have higher average rating.
Averages are understandable. Bayes and multinomial maybe. Leaders’ models not at all!
Nobody will use these hybrids in a real system
We have a “ground truth” problem: it is easy to overfit models on some quirk in the data. We want to make sure we adapt to general human behavior and, ultimately, that we make our users happy.
Framework for user-centric evaluation, using the example of recommender systems.
If we just have more accurate algorithms, our recommendations will automatically be better!
Also link to Xavier’s blog posts about Netflix.
Ask who knows A/B testing.
But even that is not enough
Also add the Target horror story
I think transparency and control will not help, because people are kind of broken.
Transparency should make people avoid bad privacy practices and endorse good privacy practices.
Control is an illusion, because we can easily influence people’s decisions
People are boundedly rational. Here is another example:
This idea is interesting, because if people don’t choose what is best for them, then why don’t we just push them in the right direction?
Big data - A critical appraisal
+ Big Data: A critical appraisal
Thomas Debeauvais (tdebeauv@uci.edu)
Bart Knijnenburg (bart.firstname.lastname@example.org)
+ 2 Outline
The Wonders of Big Data
The Perils of Big Data
User Experiments
A Note on Privacy
+ The Wonders of Big Data
How Big Data will put the personal back in e-commerce
+ 4 Large vs small datasets
Data from most/all of your customers: everything is significant!
More than just an educated guess: this is what really happens!
Large datasets can improve business intelligence
+ 5 The Netflix challenge
Recommendations seen as Netflix’s strongest asset
$1M prize if 10% better than Netflix’s Cinematch
2006-2009
Data: 18k movies, 500k users, 100M ratings
+ 6 The Netflix challenge
Netflix’s rationale: “Improve our ability to connect people to the movies they love”
Improve recommendations = improve satisfaction and retention
Small R&D team, slow progress: $1M will pay for itself
Based on Padhraic Smyth’s report at http://www.ics.uci.edu/~smyth/courses/cs277/slides/netflix_overview.pdf
+ 7 Matrix approximation
Distinguish noise from signal: variance and eigenvalues
Singular value decomposition: Ratings(m×n) = U(m×n) E(n×n) V(n×n)
Rank-k approximation: Ratings(m×n) ≈ U(m×k) E(k×k) V(k×n)
[Diagram: the m-users × n-movies Ratings matrix factored into U (m×k), E (k×k), and V (k×n)]
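The rank-k approximation on this slide can be sketched with NumPy's SVD; the tiny ratings matrix is synthetic (the real one was ~500k users × 18k movies):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy ratings matrix: m=6 users, n=4 movies, 1-5 stars.
ratings = rng.integers(1, 6, size=(6, 4)).astype(float)

# Full SVD: ratings = U @ diag(s) @ Vt, singular values sorted descending.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Rank-k approximation: keep only the k largest singular values.
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# approx is the best rank-2 approximation of ratings (in Frobenius norm);
# the rows of Vt[:k] place each movie in a k-dimensional latent space,
# which is what the k=2 plot on the next slide visualizes.
print(np.linalg.norm(ratings - approx))
```

Dropping the small singular values is exactly the "noise vs signal" step: the discarded components carry the least variance.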
+ 8 Plot of V with k=2
[Scatter plot of movies on the first two latent factors: one axis runs from lowbrow comedies and horror aimed at a male or adolescent audience to serious drama/comedy with a strong female lead; the other from mainstream, formulaic to independent, quirky, critically acclaimed] [Koren et al. 2009]
+ 10 Take-aways
Matrix decomposition yields meaningful movie categories! For example: lowbrow, quirky, indie, strong female lead
Older movies are rated higher. So ...? Should we recommend older movies more often or less often? Why are they rated higher?
+ The Perils of Big Data
How overfitting and a lack of domain knowledge can lead to suboptimal solutions
+ 12 What about random?
“We were demonstrating our new recommender to a client. They were amazed by how well it predicted their preferences!”
“Later we found out that we forgot to activate the algorithm: the system was giving completely random recommendations.”
+ 14 Model complexity
“Our winning entries consist of more than 100 different predictor sets” [Koren et al. 2009]
Only 10% better than Netflix. Why? Intrinsic noise.
Example: children watch cartoons, so Mum is recommended cartoons. Should Netflix implement a “switch user” feature?
Domain knowledge!
+ 15 More gotchas
Obvious truisms and correlation fallacies: still present in large datasets. Domain knowledge!
Overfitting: simple models that make sense vs complex models that fit the data
+ User Experiments
How user evaluations can be used to create meaningful experiences
+ 17 Offline evaluations
Calibration/Evaluation:
Gather rating data
Remove 10% of the ratings of each user
Optimize the algorithm to predict those 10%
Execution:
Predict the rating of unknown items
Recommend items with highest predicted rating
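The calibration protocol on this slide (hold out 10% of each user's ratings, predict them, score with RMSE) can be sketched as follows; the data is synthetic and the "algorithm" is just an item-mean baseline standing in for a real recommender:

```python
import math
import random

random.seed(1)

# Hypothetical data: 50 users, each rating 20 movies (1-5 stars).
ratings = {u: {m: random.randint(1, 5) for m in range(20)}
           for u in range(50)}

# Calibration: remove 10% of each user's ratings as a held-out test set.
train, test = {}, {}
for u, movies in ratings.items():
    held_out = set(random.sample(sorted(movies), k=len(movies) // 10))
    test[u] = {m: movies[m] for m in held_out}
    train[u] = {m: r for m, r in movies.items() if m not in held_out}

# Stand-in "algorithm": predict each movie's mean training rating.
movie_sums, movie_counts = {}, {}
for movies in train.values():
    for m, r in movies.items():
        movie_sums[m] = movie_sums.get(m, 0) + r
        movie_counts[m] = movie_counts.get(m, 0) + 1
predict = {m: movie_sums[m] / movie_counts[m] for m in movie_sums}
global_mean = sum(movie_sums.values()) / sum(movie_counts.values())

# Evaluation: RMSE on the held-out 10% (fall back to the global mean
# for movies that never appear in training).
errs = [(predict.get(m, global_mean) - r) ** 2
        for u in test for m, r in test[u].items()]
print(math.sqrt(sum(errs) / len(errs)))
```

A real Netflix Prize entry would replace the item-mean predictor with matrix factorization, but the held-out evaluation loop stays the same.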
+ 18 Offline evaluations
http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
Problem: offline evaluations may not give the same outcome as online evaluations (Cosley et al., 2002; McNee et al., 2002). Solution: test with real users (A/B testing).
Problem: higher rating does not mean good recommendation (McNee et al., 2006). Solution: consider other behaviors (consumption, retention).
Problem: the algorithm counts for only 5% of the relevance of a recommender system (Francisco Martin, 2009). Solution: A/B test other aspects (interaction, presentation).
+ 19 Online evaluations
Testing a recommender against a random videoclip system (A/B test)
Expectation: consumption (number of clips watched from beginning to end + total viewing time) will increase
Reality: the number of clicked clips and total viewing time went down!
Insight: the recommender is more effective; more clips are watched from beginning to end; users browse less, consume more
[Path model: personalized recommendations (OSA) increase perceived recommendation quality (SSA), perceived system effectiveness (EXP), and choice satisfaction (EXP), but decrease clips clicked and total viewing time]
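Comparing the two arms of such an A/B test usually comes down to a two-sample test on a behavioral metric. A minimal sketch, using entirely hypothetical per-session viewing times and a simple Welch-style z statistic (a real analysis would use a proper t-test and far more sessions):

```python
import math
from statistics import mean, stdev

# Hypothetical total viewing times (minutes) per session, per condition.
random_clips = [12.0, 8.5, 15.0, 9.0, 11.5, 14.0, 10.0, 13.5, 9.5, 12.5]
recommender  = [10.0, 7.5, 13.0, 8.0, 9.5, 12.0, 8.5, 11.0, 7.0, 10.5]

def welch_z(a, b):
    """Approximate two-sample z statistic (Welch); fine for a sketch."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

z = welch_z(random_clips, recommender)
# A positive z here mirrors the slide's "reality": total viewing
# time was LOWER in the recommender condition.
print(round(z, 2))
```

The point of the slide stands regardless of the test used: a significant drop in raw viewing time is not evidence of a worse system until you interpret it alongside the other measures.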
+ 20 Behavior vs Questionnaires
Behavior is hard to interpret: the relationship between behavior and satisfaction is not always trivial
Questionnaires are a better predictor of long-term retention: with behavior only, you will need to run for a long time
Questionnaire data is more robust: fewer participants needed
+ 21 A guide to user experiments
http://bit.ly/recsys2011short
http://bit.ly/recsystutorialhandout
“Is my system good?” What does good mean? We need to define measures.
“Does my system score high on this satisfaction scale?” What does high mean? We need to compare it against something.
“Does my system score higher than this other system?” Say we find that it scores higher on satisfaction... why does it? Apply the concept of ceteris paribus.
+ 22 An example…
We compared three recommender systems (three different algorithms)
System effectiveness scale:
The system has no real benefit for me.
I would recommend the system to others.
The system is useful.
I can save time using the system.
I can find better TV programs without the help of the system.
+ 23 An example… The mediating variables tell the entire story
+ 24 An example…
[Path models:
Matrix Factorization recommender with explicit feedback (MF-E), versus generally most popular (GMP): OSA → (+) perceived recommendation quality (SSA) → (+) perceived system effectiveness (EXP)
Matrix Factorization recommender with implicit feedback (MF-I), versus most popular (GMP): OSA → (+) perceived recommendation variety (SSA) → (+) perceived recommendation quality → (+) perceived system effectiveness]
+ A Note on Privacy
How to avoid this looming danger of our Big Data future
+ 27 Privacy concerns
Second Netflix challenge: anonymized dataset
Lawsuit from a closeted lesbian mother in California
Netflix withdraws its second challenge
http://arstechnica.com/tech-policy/2012/07/class-action-lawsuit-settlement-forces-netflix-privacy-changes/
+ 28 Privacy directive
Transparency: “companies should provide clear descriptions of [...] why they need the data, how they will use it” (informed consent)
Control: “companies should offer consumers clear and simple choices [...] about personal data collection, use, and disclosure” (user empowerment)
+ 30 Control Paradox
“bewildering tangle of options” (New York Times, 2010)
“labyrinthian controls” (U.S. Consumer Magazine, 2012)
Researchers asked: “What do your privacy settings mean?” 86% of Facebook users got it wrong!
+ 31 Control Paradox
http://bit.ly/chi2013privacy
Introducing an “extreme” sharing option: the options were Nothing - City - Block; we add the option Exact
Expected: some will choose Exact instead of Block
Unexpected: sharing increases across the board!
[Chart: benefits vs privacy trade-off across the options Nothing (N), City (C), Block (B), Exact (E)]
+ 33 Idea: nudging People do not always choose what is best for them Idea: use defaults to “nudge” users in the right direction
+ 34 What is the right direction?
“More information = better, e.g. for personalization”: techniques to increase disclosure cause reactance in the more privacy-minded users
“Privacy is an absolute right”: makes it more difficult for less privacy-minded users to enjoy the benefits that disclosure would provide
+ 35 It depends on the user!
“What is best for consumers depends upon characteristics of the consumer. An outcome that maximizes consumer welfare may be suboptimal for some consumers in a context where there is heterogeneity in preferences” (Smith, Goldstein & Johnson, 2009)
+ 36 Privacy Adaptation Procedure
http://bit.ly/privdim
Idea: personalize users’ privacy settings!
Automatic defaults in line with the user’s “disclosure profile”: using big data to improve big data privacy
Relieves some of the burden of the privacy decision: the right privacy-related information, the right amount of control
“Realistic empowerment”
+ Conclusions
The wonders of Big Data: Big Data can be used to create powerful personalized e-commerce experiences
The Perils of Big Data: Big Data solutions will only work if the developers have an adequate amount of domain knowledge
User Experiments: Big Data solutions need to be tested on real users, with a focus on user experience
A Note on Privacy: Big Data can raise privacy concerns, but it can at the same time be used to alleviate these concerns
+ Questions?