Big data - A critical appraisal

Invited talk by Bart Knijnenburg and Thomas Debeauvais at the IIBA OC dinner meeting

  • Some links:

    Bart Knijnenburg: http://www.usabart.nl/ - twitter: @usabart
    Thomas Debeauvais: http://www.ics.uci.edu/~tdebeauv/

    Padhraic Smyth’s report on the Netflix challenge: http://www.ics.uci.edu/~smyth/courses/cs277/slides/netflix_overview.pdf

    Xavier Amatriain's blog on Netflix data science: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html

    Bart's work on user experiments: http://bit.ly/recsys2011short - http://bit.ly/recsystutorialhandout - http://bit.ly/tedxbart

    Bart's work on privacy: http://bit.ly/chi2013privacy - http://bit.ly/privdim

    The Nudge book: http://www.amazon.com/Nudge-Improving-Decisions-Health-Happiness/dp/014311526X
  • The Wonders of Big Data: how Big Data will put the personal back in e-commerce. The Perils of Big Data: how overfitting and a lack of domain knowledge can lead to suboptimal solutions. User Experiments: how user evaluations can be used to create meaningful experiences. A Note on Privacy: how to avoid this looming danger of our Big Data future.
  • Improvement means reducing the error in predicting user ratings; error = the root mean square error (RMSE) between the system's predicted rating and the user's actual rating (see the formula after these notes).
  • Older movies have a higher average rating.
  • ASK QUESTIONS?
  • Averages are understandable. Bayes and multinomial, maybe. The leaders' models, not at all!
  • Nobody will use these hybrids in a real system
  • ASK QUESTIONS?
  • We have a “ground truth” problem. It is easy to overfit models on some quirk in the data. We want to make sure we adapt to general human behavior and, ultimately, that we make our users happy. Framework for user-centric evaluation, using the example of recommender systems.
  • If we just have more accurate algorithms, our recommendations will automatically be better!
  • Also link to Xavier’s blog posts about Netflix. Ask who knows A/B testing.
  • But even that is not enough
  • ASK QUESTIONS?
  • Also add the Target horror story
  • I think transparency and control will not help, because people are kind of broken. Transparency should make people avoid bad privacy practices and endorse good privacy practices.
  • Control is an illusion, because we can easily influence people’s decisions
  • People are boundedly rational. Here is another example:
  • This idea is interesting, because if people don’t choose what is best for them, then why don’t we just push them in the right direction?
  • ASK QUESTIONS?
  • ASK QUESTIONS?
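
For reference, the error measure these notes refer to can be written out explicitly. A minimal statement of the RMSE used as the Netflix challenge's improvement criterion, where T is the set of held-out (user, item) pairs, r_ui is the user's actual rating, and r̂_ui is the system's prediction:

```latex
% RMSE over the held-out set T of (user, item) pairs
\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left(\hat{r}_{ui} - r_{ui}\right)^2}
```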
Transcript

    1. Big Data: A critical appraisal. Thomas Debeauvais (tdebeauv@uci.edu) and Bart Knijnenburg (bart.k@uci.edu).
    2. Outline: The Wonders of Big Data; The Perils of Big Data; User Experiments; A Note on Privacy.
    3. The Wonders of Big Data: how Big Data will put the personal back in e-commerce.
    4. Large vs. small datasets: Everything is significant! Data from most or all of your customers. More than just an educated guess: this is what really happens! Large datasets can improve business intelligence.
    5. The Netflix challenge: Recommendations are seen as Netflix's strongest asset. $1M prize for a 10% improvement over Netflix's own Cinematch. Ran 2006-2009. Data: 18k movies, 500k users, 100M ratings.
    6. The Netflix challenge. Netflix's rationale: "Improve our ability to connect people to the movies they love." Better recommendations mean better satisfaction and retention. A small R&D team makes slow progress, so the $1M will pay for itself. Based on Padhraic Smyth's report at http://www.ics.uci.edu/~smyth/courses/cs277/slides/netflix_overview.pdf
    7. Matrix approximation. Distinguish noise from signal: variance and eigenvalues. Singular value decomposition: Ratings(m×n) = U(m×n) E(n×n) V(n×n). Rank-k approximation: Ratings(m×n) ≈ U(m×k) E(k×k) V(k×n). [Diagram: the m-users × n-movies Ratings matrix written as the product of U, E, and V with the inner dimensions truncated to k.] (See the numpy sketch after the transcript.)
    8. Plot of V with k=2 [Koren et al. 2009]. [Scatter plot of movies on the two latent dimensions: one axis runs from lowbrow comedies and horror with a male or adolescent audience to serious drama or comedy with a strong female lead; the other from independent, quirky, critically acclaimed to mainstream, formulaic.]
    9. Bias is information. [Smyth 2010]
    10. Take-aways: Matrix decomposition gives meaningful movie categories! For example: lowbrow, quirky, indie, strong female lead. Older movies are rated higher. So...? Should we recommend older movies more often or less often? Why are they rated higher?
    11. The Perils of Big Data: how overfitting and a lack of domain knowledge can lead to suboptimal solutions.
    12. What about random? "We were demonstrating our new recommender to a client. They were amazed by how well it predicted their preferences!" "Later we found out that we forgot to activate the algorithm: the system was giving completely random recommendations."
    13. Tradeoffs.
    14. Model complexity: "Our winning entries consist of more than 100 different predictor sets" [Koren et al. 2009]. Only 10% better than Netflix. Why? Intrinsic noise. Example: children watch cartoons, so Mum gets recommended cartoons. Should Netflix implement a "switch user" feature? Domain knowledge!
    15. More gotchas: Obvious truisms and correlation fallacies are still present in large datasets; domain knowledge! Overfitting: simple models that make sense vs. complex models that fit the data.
    16. User Experiments: how user evaluations can be used to create meaningful experiences.
    17. Offline evaluations. Calibration/evaluation: gather rating data, remove 10% of each user's ratings, and optimize the algorithm to predict those held-out 10%. Execution: predict the rating of unknown items and recommend the items with the highest predicted rating. (See the hold-out sketch after the transcript.)
    18. Offline evaluations (http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html). Problems and solutions: Offline evaluations may not give the same outcome as online evaluations (Cosley et al., 2002; McNee et al., 2002), so test with real users (A/B testing). A higher rating does not mean a good recommendation (McNee et al., 2006), so consider other behaviors (consumption, retention). The algorithm counts for only 5% of the relevance of a recommender system (Francisco Martin, 2009), so A/B test other aspects (interaction, presentation).
    19. Online evaluations: testing a recommender against a random videoclip system (A/B test). Expectation: consumption (viewing time) will increase. Reality: the number of clicked clips and total viewing time went down! Insight: the recommender is actually more effective: more clips are watched from beginning to end; users browse less, consume more. [Path model relating personalized recommendations (OSA) to perceived recommendation quality (SSA), perceived system effectiveness and choice satisfaction (EXP), and behaviors such as number of clips watched from beginning to end, total number of clips clicked, and viewing time.]
    20. Behavior vs. questionnaires: Behavior is hard to interpret; the relationship between behavior and satisfaction is not always trivial. Questionnaires are a better predictor of long-term retention; with behavior only, you will need to run for a long time. Questionnaire data is more robust; fewer participants are needed.
    21. A guide to user experiments (http://bit.ly/recsys2011short, http://bit.ly/recsystutorialhandout). "Is my system good?" What does good mean? We need to define measures. "Does my system score high on this satisfaction scale?" What does high mean? We need to compare it against something. "Does my system score higher than this other system?" Say we find that it scores higher on satisfaction... why does it? Apply the concept of ceteris paribus.
    22. An example... We compared three recommender systems (three different algorithms). System effectiveness scale: "The system has no real benefit for me." "I would recommend the system to others." "The system is useful." "I can save time using the system." "I can find better TV programs without the help of the system."
    23. An example... The mediating variables tell the entire story.
    24. An example... [Two path models: a Matrix Factorization recommender with explicit feedback (MF-E) and one with implicit feedback (MF-I), each compared against the generally most popular recommender (GMP). In both, the algorithm (OSA) affects perceived recommendation variety and perceived recommendation quality (SSA), which in turn affect perceived system effectiveness (EXP).]
    25. A Note on Privacy: how to avoid this looming danger of our Big Data future.
    26. Personalization... with control.
    27. Privacy concerns: The second Netflix challenge used an anonymized dataset. A lawsuit from a Californian closeted lesbian mum led Netflix to withdraw the second challenge. http://arstechnica.com/tech-policy/2012/07/class-action-lawsuit-settlement-forces-netflix-privacy-changes/
    28. Privacy directive. Transparency (informed consent): "companies should provide clear descriptions of [...] why they need the data, how they will use it." Control (user empowerment): "companies should offer consumers clear and simple choices [...] about personal data collection, use, and disclosure."
    29. Transparency Paradox.
    30. Control Paradox: "bewildering tangle of options" (New York Times, 2010); "labyrinthian controls" (U.S. Consumer Magazine, 2012). Researchers asked: "what do your privacy settings mean?" 86% of Facebook users got it wrong!
    31. Control Paradox (http://bit.ly/chi2013privacy). Introducing an "extreme" sharing option: to the options Nothing - City - Block, add the option Exact. Expected: some will choose Exact instead of Block. Unexpected: sharing increases across the board! [Chart plotting the options Nothing (N), City (C), Block (B), and Exact (E) along privacy and benefits axes.]
    32. Bounded rationality. [Quiz slide listing answer options: A 25%?, B 37%?, C 53%?, D 0%?]
    33. Idea: nudging. People do not always choose what is best for them. Idea: use defaults to "nudge" users in the right direction.
    34. What is the right direction? "More information = better, e.g. for personalization": techniques to increase disclosure cause reactance in the more privacy-minded users. "Privacy is an absolute right": it becomes more difficult for less privacy-minded users to enjoy the benefits that disclosure would provide.
    35. It depends on the user! "What is best for consumers depends upon characteristics of the consumer. An outcome that maximizes consumer welfare may be suboptimal for some consumers in a context where there is heterogeneity in preferences" (Smith, Goldstein & Johnson, 2009).
    36. Privacy Adaptation Procedure (http://bit.ly/privdim). Idea: personalize users' privacy settings! Automatic defaults in line with each user's "disclosure profile": using big data to improve big data privacy. This relieves some of the burden of the privacy decision: the right privacy-related information and the right amount of control. "Realistic empowerment."
    37. Conclusions. The Wonders of Big Data: Big Data can be used to create powerful personalized e-commerce experiences. The Perils of Big Data: Big Data solutions will only work if the developers have an adequate amount of domain knowledge. User Experiments: Big Data solutions need to be tested on real users, with a focus on user experience. A Note on Privacy: Big Data can raise privacy concerns, but it can at the same time be used to alleviate these concerns.
    38. Questions? [The closing slide repeats the conclusions from slide 37.]
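
To make slide 7's matrix approximation concrete, here is a minimal numpy sketch of a rank-k SVD approximation. The toy ratings matrix and the choice k = 2 are illustrative assumptions, not data from the deck; note also that plain SVD assumes a fully observed matrix, whereas real ratings data is mostly missing, which is why the Netflix Prize entries used factorization methods adapted for incomplete data.

```python
import numpy as np

# Toy ratings matrix: m=4 users x n=4 movies (real data would be mostly missing).
ratings = np.array([[5., 4., 1., 1.],
                    [4., 5., 1., 2.],
                    [1., 1., 5., 4.],
                    [2., 1., 4., 5.]])

# Full singular value decomposition: ratings = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Rank-k approximation: keep only the k largest singular values,
# i.e. Ratings(m*n) ~ U(m*k) E(k*k) V(k*n) from slide 7.
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(approx, 2))  # close to the original, with the "noise" smoothed away
```

The k columns of U (and rows of Vt) are the latent dimensions that slide 8 plots: with k = 2, every movie gets two coordinates, which in the Netflix data turned out to correspond to interpretable categories.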
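Slide 17's offline evaluation protocol, sketched in Python under assumed data structures: ratings is a list of (user, item, rating) tuples and predict is any black-box rating predictor; both are hypothetical names, not anything from the deck. The score is the RMSE defined after the notes above.

```python
import random
from collections import defaultdict
from math import sqrt

def split_per_user(ratings, holdout_frac=0.1, seed=42):
    """Hold out ~10% of each user's (user, item, rating) tuples as a test set."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for triple in ratings:
        by_user[triple[0]].append(triple)
    train, test = [], []
    for user_ratings in by_user.values():
        rng.shuffle(user_ratings)
        n_test = max(1, int(len(user_ratings) * holdout_frac))
        test.extend(user_ratings[:n_test])
        train.extend(user_ratings[n_test:])
    return train, test

def rmse(predict, test):
    """RMSE of predict(user, item) against the held-out ratings."""
    return sqrt(sum((predict(u, i) - r) ** 2 for u, i, r in test) / len(test))
```

Calibration then amounts to tuning the algorithm on train to minimize rmse on test; slides 18-20 are precisely the warning that optimizing this single offline number does not guarantee satisfied users online.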
