Explaining the User Experience of Recommender Systems with User Experiments

A talk I gave at the Netflix offices on July 2nd, 2012.

Please do not use any of the slides or their contents without my explicit permission (bart@usabart.nl for inquiries).


  1. The User Experience of Recommender Systems
  2. Introduction: What I want to do today
  3. Introduction: Bart Knijnenburg
     - Current: PhD candidate, Informatics, UC Irvine; research intern, Samsung R&D
     - Past: researcher & teacher, TU Eindhoven; MSc Human Technology Interaction, TU Eindhoven; M Human-Computer Interaction, Carnegie Mellon
  4. Introduction: Bart Knijnenburg
     - UMUAI paper "Explaining the User Experience of Recommender Systems" (first UX framework in RecSys)
     - Founder of UCERSTI: workshop on user-centric evaluation of recommender systems and their interfaces
     - This year: tutorial on user experiments
  5. Introduction
     My goal:
     - Introduce my framework for user-centric evaluation of recommender systems
     - Explain how the framework can help in conducting user experiments
     My approach:
     - Start with "beyond the five stars"
     - Explain how to improve upon this
     - Introduce a 21st century approach to user experiments
  6. Introduction
     "What is a user experiment?"
     "A user experiment is a scientific method to investigate how and why system aspects influence the users' experience and behavior."
  7. Introduction: What I want to do today
     - Beyond the 5 stars: the Netflix approach
     - Towards user experience: how to improve upon the Netflix approach
     - Evaluation framework: a framework for user-centric evaluation
     - Measurement: measuring subjective valuations
     - Analysis: statistical evaluation of results
     - Some examples: findings based on the framework
  8. Beyond the 5 stars: The Netflix approach
  9. Beyond the 5 stars
     Offline evaluations may not give the same outcome as online evaluations (Cosley et al., 2002; McNee et al., 2002)
     Solution: test with real users
  10. Beyond the 5 stars [diagram: System (algorithm) -> Interaction (rating)]
  11. Beyond the 5 stars
     Higher accuracy does not always mean higher satisfaction (McNee et al., 2006)
     Solution: consider other behaviors ("beyond the five stars")
  12. Beyond the 5 stars [diagram: System (algorithm) -> Interaction (rating, consumption, retention)]
  13. Beyond the 5 stars
     The algorithm accounts for only 5% of the relevance of a recommender system (Francisco Martin, RecSys 2009 keynote)
     Solution: test those other aspects ("beyond the five stars")
  14. Beyond the 5 stars [diagram: System (algorithm, interaction, presentation) -> Interaction (rating, consumption, retention)]
  15. Beyond the 5 stars
     Start with a hypothesis: algorithm/feature/design X will increase member engagement and ultimately member retention
     Design a test: develop a solution or prototype; think about dependent & independent variables, control, significance…
     Execute the test
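To make the hypothesize-test-execute loop concrete, here is a minimal sketch (my own illustration, not Netflix's actual pipeline; all counts and variable names are made up) of how a retention hypothesis could be checked after an A/B test, using a two-proportion z-test from statsmodels.

```python
# Hypothetical A/B test readout: did feature X change member retention?
# The counts are invented illustration data, not real Netflix numbers.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

retained = np.array([4120, 4310])   # members retained in control / treatment
exposed = np.array([10000, 10000])  # members assigned to each condition

z_stat, p_value = proportions_ztest(count=retained, nobs=exposed)
print(f"retention: control={retained[0] / exposed[0]:.1%}, "
      f"treatment={retained[1] / exposed[1]:.1%}, z={z_stat:.2f}, p={p_value:.3f}")
```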
  16. Beyond the 5 stars
     Let data speak for itself: many different metrics; we ultimately trust member engagement (e.g. hours of play) and retention
     Is this good enough? I think it is not
  17. Towards user experience: How to improve upon the Netflix approach
  18. User experience
     "Testing a recommender against a static videoclip system, the number of clicked clips and total viewing time went down!"
  19. User experience [path model relating personalized recommendations (OSA) to the number of clips clicked, total viewing time, the number of clips watched from beginning to end, perceived recommendation quality (SSA), perceived system effectiveness (EXP), and choice satisfaction (EXP)]
     Knijnenburg et al.: "Receiving Recommendations and Providing Feedback", EC-Web 2010
  20. User experience
     - Behavior is hard to interpret: the relationship between behavior and satisfaction is not always trivial
     - User experience is a better predictor of long-term retention: with behavior only, you will need to run for a long time
     - Questionnaire data is more robust: fewer participants needed
  21. User experience
     - Measure subjective valuations with questionnaires: perception and experience
     - Triangulate these data with behavior: find out how they mediate the effect on behavior
     - Measure every step in your theory: create a chain of mediating variables
  22. User experience
     Perception: do users notice the manipulation?
     - Understandability, perceived control
     - Perceived recommendation diversity and quality
     Experience: self-relevant evaluations of the system
     - Process-related (e.g. choice difficulty)
     - System-related (e.g. system effectiveness)
     - Outcome-related (e.g. choice satisfaction)
  23. User experience [diagram: System (algorithm, interaction, presentation) -> Perception (usability, quality, appeal) -> Experience (system, process, outcome) -> Interaction (rating, consumption, retention)]
  24. User experience
     Personal and situational characteristics may have an important impact (Adomavicius et al., 2005; Knijnenburg et al., 2012)
     Solution: measure those as well
  25. User experience [diagram: the chain of slide 23, surrounded by Situational Characteristics (routine, system trust, choice goal) and Personal Characteristics (gender, privacy, expertise)]
  26. Evaluation framework: A framework for user-centric evaluation
  27. Framework
     Framework for user-centric evaluation of recommenders [the full framework diagram of slide 25]
     Knijnenburg et al.: "Explaining the user experience of recommender systems", UMUAI 2012
  28. Framework [framework diagram]
     Objective System Aspects (OSA):
     - visual / interaction design
     - recommender algorithm
     - presentation of recommendations
     - additional features
  29. Framework [framework diagram]
     User Experience (EXP): different aspects may influence different things
     - interface -> system evaluations
     - preference elicitation method -> choice process
     - algorithm -> quality of the final choice
  30. Framework [framework diagram]
     Subjective System Aspects (SSA): perception
     - Link OSA to EXP (mediation)
     - Increase the robustness of the effects of OSAs on EXP
     - Show how and why OSAs affect EXP
  31. Framework [framework diagram]
     Personal Characteristics (PC):
     - Users' general disposition
     - Beyond the influence of the system
     - Typically stable across tasks
  32. Framework [framework diagram]
     Situational Characteristics (SC):
     - How the task influences the experience
     - Also beyond the influence of the system
     - Here used for evaluation, not for augmenting algorithms!
  33. Framework [framework diagram]
     Interaction (INT): observable behavior
     - browsing, viewing, log-ins
     - Final step of evaluation, but not the goal of the evaluation
     - Triangulate with EXP
  34. Framework [the full framework diagram]
  35. Framework
     - Has suggestions for measurement scales (e.g. how can we measure something like "satisfaction"?)
     - Provides a good starting point for causal relations (e.g. how and why do certain system aspects influence the user experience?)
     - Useful for integrating existing work (e.g. how do recommendation list length, diversity and presentation each have an influence on user experience?)
  36. Framework
     "Statisticians, like artists, have the bad habit of falling in love with their models." (George Box)
  37. Measurement: Measuring subjective valuations
  38. Measurement
     "To measure satisfaction, we asked users whether they liked the system (on a 5-point rating scale)."
  39. Measurement
     Does the question mean the same to everyone?
     - John likes the system because it is convenient
     - Mary likes the system because it is easy to use
     - Dave likes it because the recommendations are good
     A single question is not enough to establish content validity; we need a multi-item measurement scale
  40. Measurement
     Measure at least 5 items per concept; at least 2-3 should remain after removing bad items
     It is most convenient to use statements to which the participant can agree/disagree on a 5- or 7-point scale:
     "I am satisfied with this system." (Completely disagree, somewhat disagree, neutral, somewhat agree, completely agree)
  41. Measurement
     Use both positively and negatively phrased items
     - They make the questionnaire less "leading"
     - They help filter out bad participants
     - They explore the "flip-side" of the scale
     The word "not" is easily overlooked
     - Bad: "The recommendations were not very novel"
     - Better: "The recommendations were not very novel" (with "not" emphasized)
     - Best: "The recommendations felt outdated"
  42. Measurement
     Choose simple over specialized words: participants may have no idea they are using a "recommender system"
     Use as few words as possible, but always use complete sentences
  43. Measurement
     Use design techniques that help to improve recall and provide appropriate time references
     - Bad: "I liked the signup process" (one week ago)
     - Good: provide a screenshot
     - Best: ask the question immediately afterwards
     Avoid double-barreled items
     - Bad: "The recommendations were relevant and fun"
  44. Measurement
     "We asked users ten 5-point scale questions and summed the answers."
  45. Measurement
     Is the scale really measuring a single thing?
     - 5 items measure satisfaction, the other 5 convenience
     - The items are not related enough to make a reliable scale
     Are two scales really measuring different things?
     - They are so closely related that they actually measure the same thing
     We need to establish convergent and discriminant validity; this makes sure the scales are unidimensional
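As a quick first check that a set of items hangs together as one scale, before any factor analysis, one can compute Cronbach's alpha. A minimal sketch with simulated responses; the item names (sat1-sat5) are hypothetical.

```python
# Rough internal-consistency check: Cronbach's alpha for a candidate scale.
# Responses are simulated; in practice this is a DataFrame of Likert answers.
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(42)
latent = rng.normal(size=200)                    # simulated "true" satisfaction
items = pd.DataFrame({f"sat{i}": latent + rng.normal(scale=0.8, size=200)
                      for i in range(1, 6)})     # five noisy satisfaction items
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")
```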
  46. Measurement
     Solution: factor analysis
     - Define latent factors, specify how items "load" on them
     - Factor analysis will determine how well the items "fit"
     - It will give you suggestions for improvement
     Benefits of factor analysis:
     - Establishes convergent and discriminant validity
     - Outcome is a normally distributed measurement scale
     - The scale captures the "shared essence" of the items
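A minimal exploratory sketch of this idea, assuming the factor_analyzer package and a DataFrame of questionnaire items; the item names and data are invented. The talk (and the UMUAI paper) uses confirmatory factor analysis within a structural equation model, so this only illustrates how loadings reveal which items belong to which factor.

```python
# Exploratory factor analysis sketch: do the items load on the intended factors?
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer  # pip install factor_analyzer

rng = np.random.default_rng(0)
n = 300
satisfaction = rng.normal(size=n)
convenience = rng.normal(size=n)
responses = pd.DataFrame(
    {**{f"sat{i}": satisfaction + rng.normal(scale=0.7, size=n) for i in range(1, 6)},
     **{f"conv{i}": convenience + rng.normal(scale=0.7, size=n) for i in range(1, 6)}})

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")  # oblique rotation: factors may correlate
fa.fit(responses)
loadings = pd.DataFrame(fa.loadings_, index=responses.columns,
                        columns=["factor1", "factor2"])
print(loadings.round(2))  # items with low or cross-loadings are candidates for removal
```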
  47. Measurement [factor analysis diagram: items collct22-collct27, gipc16-gipc21, and ctrl28-ctrl32 loading on the latent factors "Collection concerns", "General privacy concerns", and "Control concerns"]
  48. Measurement [the same diagram after removing bad items: only collct22, collct24-collct27, gipc17, gipc18, gipc21, ctrl28 and ctrl29 remain]
  49. Analysis: Statistical evaluation of results
  50. Analysis
     Manipulation -> perception: do these two algorithms lead to a different level of perceived quality?
     Method: t-test [bar chart: perceived quality for conditions A and B]
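A minimal sketch of this step with SciPy, assuming each participant's perceived-quality score comes from the questionnaire scale; the data are simulated.

```python
# Manipulation -> perception: compare perceived quality between algorithms A and B.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
quality_a = rng.normal(loc=-0.2, scale=1.0, size=60)  # perceived-quality scores, algorithm A
quality_b = rng.normal(loc=0.3, scale=1.0, size=60)   # perceived-quality scores, algorithm B

t_stat, p_value = stats.ttest_ind(quality_a, quality_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```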
  51. Analysis
     Perception -> experience: does perceived quality influence system effectiveness?
     Method: linear regression [scatter plot: system effectiveness vs. recommendation quality]
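A minimal sketch of the regression step using statsmodels' formula interface; the column names and data are invented.

```python
# Perception -> experience: regress system effectiveness on perceived recommendation quality.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"quality": rng.normal(size=120)})
df["effectiveness"] = 0.6 * df["quality"] + rng.normal(scale=0.8, size=120)

model = smf.ols("effectiveness ~ quality", data=df).fit()
print(model.summary().tables[1])  # the slope of `quality` is the effect of interest
```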
  52. Analysis
     Manipulation -> perception -> experience: does the algorithm influence effectiveness via perceived quality?
     Method: path model [diagram: personalized recommendations (OSA) -> perceived recommendation quality (SSA) -> perceived system effectiveness (EXP)]
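The path model itself belongs in an SEM tool (see slide 55), but the mediation logic can be approximated with two regressions, a simple stand-in used here for illustration: the manipulation predicts perceived quality, and perceived quality (with the manipulation controlled) predicts effectiveness. Names and data are invented.

```python
# Manipulation -> perception -> experience: a simple two-regression mediation sketch.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 200
personalized = rng.integers(0, 2, size=n)                      # 0 = baseline, 1 = personalized
quality = 0.5 * personalized + rng.normal(scale=1.0, size=n)   # SSA
effectiveness = 0.7 * quality + rng.normal(scale=1.0, size=n)  # EXP
df = pd.DataFrame({"personalized": personalized, "quality": quality,
                   "effectiveness": effectiveness})

a_path = smf.ols("quality ~ personalized", data=df).fit()                  # OSA -> SSA
b_path = smf.ols("effectiveness ~ quality + personalized", data=df).fit()  # SSA -> EXP, OSA controlled
print(f"a-path: {a_path.params['personalized']:.2f}, b-path: {b_path.params['quality']:.2f}")
```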
  53. Analysis
     Two manipulations -> perception: what is the combined effect of list diversity and list length on perceived recommendation quality?
     Method: factorial ANOVA [bar chart: perceived quality for low/high diversification at 5, 10 and 20 items]
     Willemsen et al.: "Not just more of the same", submitted to TiiS
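A minimal sketch of a 2 x 3 factorial ANOVA (diversification x list length) on perceived quality with statsmodels; the factor levels mirror the slide, the data are simulated.

```python
# Two manipulations -> perception: factorial ANOVA for diversification x list length.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(4)
rows = []
for diversification in ["low", "high"]:
    for length in [5, 10, 20]:
        base = 0.3 if diversification == "high" else 0.0
        for score in base + rng.normal(scale=0.5, size=40):  # perceived quality per condition
            rows.append({"diversification": diversification, "length": length,
                         "quality": score})
df = pd.DataFrame(rows)

model = smf.ols("quality ~ C(diversification) * C(length)", data=df).fit()
print(anova_lm(model, typ=2))  # main effects and the interaction
```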
  54. Analysis
     Manipulation x personal characteristic -> outcome: do experts and novices rate these two interaction methods differently in terms of usefulness?
     Method: ANCOVA [line plot: system usefulness vs. domain knowledge for case-based PE and attribute-based PE]
     Knijnenburg & Willemsen: "Understanding the Effect of Adaptive Preference Elicitation Methods", RecSys 2009
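A minimal sketch of the corresponding ANCOVA-style model: preference elicitation method as a categorical factor, domain knowledge as a continuous covariate, and their interaction. Column names and data are invented.

```python
# Manipulation x personal characteristic -> outcome: method x domain-knowledge interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 200
method = rng.choice(["case_based", "attribute_based"], size=n)
knowledge = rng.normal(size=n)                                 # standardized domain knowledge
usefulness = (0.5 * knowledge * (method == "attribute_based")
              - 0.3 * knowledge * (method == "case_based")
              + rng.normal(scale=0.8, size=n))
df = pd.DataFrame({"method": method, "knowledge": knowledge, "usefulness": usefulness})

model = smf.ols("usefulness ~ C(method) * knowledge", data=df).fit()
print(model.summary().tables[1])  # the interaction term tests whether experts and novices differ
```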
  55. Analysis
     Use only one method: structural equation models
     - A combination of factor analysis and path models
     - The statistical method of the 21st century
     - Statistical evaluation of causal effects
     - Report as causal paths
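Structural equation models are typically fit with lavaan in R; in Python, the semopy package accepts a similar lavaan-style description ("=~" for loadings, "~" for structural paths). A minimal sketch under those assumptions; the item names, the simulated data and the tiny model are my own illustration, not the model from the UMUAI paper.

```python
# SEM sketch: measurement model (factor analysis) and path model estimated together.
# Assumes `pip install semopy`; "=~" defines loadings, "~" defines regressions.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(6)
n = 300
personalized = rng.integers(0, 2, size=n)
quality = 0.5 * personalized + rng.normal(size=n)
effectiveness = 0.7 * quality + rng.normal(size=n)
df = pd.DataFrame({"personalized": personalized})
for i in range(1, 4):
    df[f"q{i}"] = quality + rng.normal(scale=0.6, size=n)        # perceived-quality items
    df[f"e{i}"] = effectiveness + rng.normal(scale=0.6, size=n)  # effectiveness items

model_desc = """
quality =~ q1 + q2 + q3
effectiveness =~ e1 + e2 + e3
quality ~ personalized
effectiveness ~ quality
"""
model = semopy.Model(model_desc)
model.fit(df)
print(model.inspect())  # estimated loadings and path coefficients
```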
  56. Analysis
     We compared three recommender systems (three different algorithms)
     The MF-I and MF-E algorithms are more effective. Why?
  57. Analysis
     The mediating variables show the entire story
     Knijnenburg et al.: "Explaining the user experience of recommender systems", UMUAI 2012
  58. Analysis [structural equation model: the matrix factorization recommenders with explicit feedback (MF-E) and implicit feedback (MF-I), each versus a most-popular baseline (GMP; OSA), increase perceived recommendation variety and perceived recommendation quality (SSA), which in turn increase perceived system effectiveness (EXP); these constructs relate to the number of items rated, the number of items clicked in the player, the number of items rated higher than the predicted rating, and the ratings of clicked items (INT), with participants' age as a personal characteristic]
     Knijnenburg et al.: "Explaining the user experience of recommender systems", UMUAI 2012
  59. Analysis
     "All models are wrong, but some are useful." (George Box)
  60. Some examples: Findings based on the framework
  61. Some examples
     Giving users many options:
     - increases the chance they find something nice
     - also increases the difficulty of making a decision
     Result: choice overload
     Solution: provide short, diversified lists of recommendations
  62. Some examples [structural equation model: Top-20 and Lin-20 recommendation lists (each versus Top-5; OSA) increase perceived recommendation variety and quality (SSA) but also choice difficulty (EXP); quality increases and difficulty decreases choice satisfaction (EXP), with movie expertise as a personal characteristic]
  63. Some examples [figure]
  64. Some examples [bar charts: perceived quality, choice difficulty, and choice satisfaction for 5, 10 and 20 items, at low and high diversification]
  65. Some examples
     Experts and novices differ in their decision-making process
     - Experts want to leverage their domain knowledge
     - Novices make decisions by choosing from examples
     Result: they want different preference elicitation methods
     Solution: tailor the preference elicitation method to the user; a needs-based approach seems to work well for everyone
  66. Some examples [figure]
  67. Some examples [line plot: system usefulness vs. domain knowledge for case-based PE and attribute-based PE]
  68. Some examples [overview of line plots: control, understandability, user interface satisfaction, perceived system effectiveness, and choice satisfaction vs. domain knowledge for the TopN, Sort, Explicit, Implicit and Hybrid interaction methods]
  69. Some examples [line plot: control vs. domain knowledge per interaction method]
  70. Some examples [line plot: understandability vs. domain knowledge per interaction method]
  71. Some examples [line plot: user interface satisfaction vs. domain knowledge per interaction method]
  72. Some examples [line plot: perceived system effectiveness vs. domain knowledge per interaction method]
  73. Some examples [line plot: choice satisfaction vs. domain knowledge per interaction method]
  74. Some examples
     Social recommenders limit the neighbors to the users' friends
     Result:
     - More control: friends can be rated as well
     - Better inspectability: recommendations can be traced
     We show that each of these effects exists; they both increase overall satisfaction
  75. Some examples [screenshots: "Rate likes" and "Weigh friends" interfaces]
  76. Some examples [figure]
  77. Some examples [structural equation model for the inspectability and control study: control (rating items or friends vs. no control) and inspectability (full graph vs. list only; OSA) improve perceived control and perceived recommendation quality (SSA), which improve understandability and satisfaction with the system (EXP); personal characteristics include familiarity with recommenders, music expertise and trusting propensity; interaction measures include inspection time, number of known recommendations and average rating]
  78. Introduction: I discussed how to do user experiments with my framework
     - Beyond the 5 stars: the Netflix approach is a step in the right direction
     - Towards user experience: we need to measure subjective valuations as well
     - Evaluation framework: with the framework you can put this all together
     - Measurement: measure subjective valuations with questionnaires and factor analysis
     - Analysis: use structural equation models to test comprehensive causal models
     - Some examples: I demonstrated the findings of some studies that used this approach
  79. See you in Dublin! bart.k@uci.edu www.usabart.nl @usabart
  80. Resources
     User-centric evaluation
     - Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C.: Explaining the User Experience of Recommender Systems. UMUAI 2012.
     - Knijnenburg, B.P., Willemsen, M.C., Kobsa, A.: A Pragmatic Procedure to Support the User-Centric Evaluation of Recommender Systems. RecSys 2011.
     Choice overload
     - Willemsen, M.C., Graus, M.P., Knijnenburg, B.P., Bollen, D.: Not Just More of the Same: Preventing Choice Overload in Recommender Systems by Offering Small Diversified Sets. Submitted to TiiS.
     - Bollen, D.G.F.M., Knijnenburg, B.P., Willemsen, M.C., Graus, M.P.: Understanding Choice Overload in Recommender Systems. RecSys 2010.
  81. Resources
     Preference elicitation methods
     - Knijnenburg, B.P., Reijmer, N.J.M., Willemsen, M.C.: Each to His Own: How Different Users Call for Different Interaction Methods in Recommender Systems. RecSys 2011.
     - Knijnenburg, B.P., Willemsen, M.C.: The Effect of Preference Elicitation Methods on the User Experience of a Recommender System. CHI 2010.
     - Knijnenburg, B.P., Willemsen, M.C.: Understanding the Effect of Adaptive Preference Elicitation Methods on User Satisfaction of a Recommender System. RecSys 2009.
     Social recommenders
     - Knijnenburg, B.P., Bostandjiev, S., O'Donovan, J., Kobsa, A.: Inspectability and Control in Social Recommender Systems. RecSys 2012.
  82. Resources
     User feedback and privacy
     - Knijnenburg, B.P., Kobsa, A.: Making Decisions about Privacy: Information Disclosure in Context-Aware Recommender Systems. Submitted to TiiS.
     - Knijnenburg, B.P., Willemsen, M.C., Hirtbach, S.: Getting Recommendations and Providing Feedback: The User Experience of a Recommender System. EC-Web 2010.
