Improving user experience in recommender systems

How latent feature diversification can decrease choice difficulty and improve choice satisfaction
Talk at Graz University of Technology - Nov 3, 2015

  1. Improving user experience in recommender systems: How latent feature diversification can decrease choice difficulty and improve choice satisfaction. Martijn Willemsen. Talk at Institute for Software Technology, Graz University of Technology, Nov 3, 2015.
  2. Information overload. Recommendation.
  3. Choice Overload in Recommenders. Recommenders reduce information overload… But large personalized sets cause choice overload! Top-N of all highly ranked items: "What should I choose? These are all very attractive!"
  4. Choice Overload. Seminal example of choice overload (from Iyengar and Lepper, 2000): satisfaction decreases with larger sets, as increased attractiveness is counteracted by choice difficulty. Larger set: more attractive, 3% sales. Smaller set: less attractive, 30% sales, higher purchase satisfaction.
  5. Choice Overload in Recommenders (Bollen, Knijnenburg, Willemsen & Graus, RecSys 2010). [Figure: structural equation model with path coefficients linking the manipulations (Top-20 vs Top-5 recommendations, Lin-20 vs Top-5 recommendations) to perceived recommendation variety, perceived recommendation quality, choice difficulty, choice satisfaction and movie expertise. Legend: Objective System Aspects (OSA), Subjective System Aspects (SSA), Experience (EXP), Personal Characteristics (PC), Interaction (INT). Bar chart: choice satisfaction for Top-5, Top-20 and Lin-20.]
  6. A solution: diversification. There is a tradeoff between similarity and diversity in finding relevant items (Smyth & McClave, 2001). Diversification remedies the high similarity of Top-N lists, but it reduces the overall quality (accuracy) of the list. Many studies only look at data, not at real users: simulated users interact more efficiently with a diverse set (Bridge & Kelly, 2006). Some studies look at how actual users perceive and evaluate diversity: Ziegler et al. (2005) diversified based on an external ontology. Diversity reduced the accuracy of the recommendations, but coverage and satisfaction increased!
  7. Goal of the current research. Extend existing work by: diversification based on psychological mechanisms; testing the algorithm with real users and measuring their perceptions of and experiences with the algorithm. Outline of the talk: our user-centric evaluation framework; psychology behind choice overload; latent feature diversification algorithm; two user studies to test our diversification.
  8. User-Centric Framework. Computer scientists (and marketing researchers) like to study behavior… (they hate asking the user, or just cannot (A/B tests)).
  9. User-Centric Framework. Psychologists and HCI people are mostly interested in experience…
  10. User-Centric Framework. Though it helps to triangulate experience with behavior…
  11. User-Centric Framework. Our framework adds the intermediate construct of perception, which explains why behavior and experience change due to our manipulations.
  12. User-Centric Framework. And adds personal and situational characteristics. Relations modeled using factor analysis and SEM. Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C. (2012). Explaining the User Experience of Recommender Systems. User Modeling and User-Adapted Interaction (UMUAI), vol. 22, pp. 441-504. http://bit.ly/umuai
  13. Latent feature diversification: from Psy to CS. Joint work with Mark Graus and Bart Knijnenburg.
  14. Psychology behind choice overload. More options provide more benefits in terms of finding the right option… but result in higher costs/effort. Objective effort: more comparisons required. Cognitive effort: increased potential regret, larger expectations for larger sets, many tradeoffs.
  15. Research on choice overload. Choice overload is not omnipresent: a meta-analysis (Scheibehenne et al., JCR 2010) suggests an overall effect size of zero. Choice overload is stronger when there are no strong prior preferences and when items differ little in attractiveness. Prior studies did not control for the diversity of the item set. Can we reduce choice difficulty and overload by using personalized diversified item sets, while controlling for attractiveness?
  16. Diversification and attractiveness. Camera example: suppose Peter thinks resolution (MP) and zoom are equally important. The user vector shows the preference direction. Equi-preference line: the set of equally attractive options (orthogonal to the user vector). Diversify on the equi-preference line!
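The equi-preference idea on this slide can be illustrated with a small sketch. It assumes a weighted-additive utility (the dot product of the user vector and the item's attribute scores); the camera names and attribute values are hypothetical and only chosen so that the items lie on one line orthogonal to Peter's vector.

```python
def utility(user, item):
    """Weighted-additive utility: dot product of user weights and item attributes."""
    return sum(w * x for w, x in zip(user, item))

# Peter weighs resolution and zoom equally (assumed, illustrative weights).
peter = (1.0, 1.0)

# Hypothetical cameras as (resolution, zoom) scores; each pair sums to 20.
cameras = {"A": (16.0, 4.0), "B": (10.0, 10.0), "C": (4.0, 16.0)}

utilities = {name: utility(peter, attrs) for name, attrs in cameras.items()}
print(utilities)  # every camera gets the same utility for Peter

# The difference between two equally attractive cameras is orthogonal
# to the user vector: its dot product with Peter's weights is zero.
difference = tuple(a - c for a, c in zip(cameras["A"], cameras["C"]))
print(utility(peter, difference))
```

Moving along this line changes what the cameras look like (high resolution versus high zoom) without changing how attractive they are to Peter, which is why diversifying along it keeps attractiveness constant.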
  17. Choice Difficulty and Diversity. Larger sets are often more difficult because of the increased uniformity of these sets (Fasolo et al., 2009; Reutskaja et al., 2009). Larger item sets have many similar options: small inter-product distances and small tradeoffs. High density! Choice difficulty is related to lack of justification. [Figure: cameras plotted on Resolution (MP) and Zoom; high density, small tradeoffs.]
  18. Choice difficulty and trade-offs. As item sets become more diverse (less dense), tradeoff size increases. Tradeoffs are effortful… you give up one aspect for another. But they can be justified very easily! [Figure: two plots of cameras on Resolution (MP) and Zoom; one with high density and small tradeoffs, one with low density and large tradeoffs.]
  19. Double Mediation Model for difficulty (Scholten and Sherman, JEP:G 2006). U-shaped relation between diversity and difficulty: choosing from a uniform set is hard to justify but has no difficult tradeoffs; choosing from a diverse set encompasses difficult tradeoffs but is easy to justify. Does this also apply to personalized item sets? How can we apply this to recommender system algorithms? What features to diversify on? [Figure: difficulty plotted against diversity, from uniform to diverse.]
  20. Features in Matrix Factorization. Latent features as a means of diversification! "Latent features are preference dimensions related to real world concepts (e.g. escapist/serious)" (Koren, Bell and Volinsky, 2009). Users and items are described as vectors of latent features, parallel to how choice sets are described in MAUT (multi-attribute utility theory) in consumer psychology.
  21. Explaining Matrix Factorization. Map users and items to a joint latent factor space of dimensionality f. Each item is a vector $q_i$, each user a vector $p_u$. Predicted rating of user u for item i: $\hat{r}_{ui} = q_i^T p_u$. How to find the dimensions? SVD: singular value decomposition.
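  22. Matrix Factorization algorithms. Each user is a vector $p_u$, each item is a vector $q_i$. [Figure: user-by-item rating matrix for Jack, Dylan, Olivia and Mark against Usual Suspects, Titanic, DieHard and Godfather, with unknown ratings marked "?".]

     | $p_u$  | Dim 1 | Dim 2 |
     |--------|-------|-------|
     | Jack   | 3     | -1    |
     | Dylan  | 1.4   | .2    |
     | Olivia | -2.5  | -.8   |
     | Mark   | -2    | -1.5  |

     | $q_i$ | Usual Suspects | Titanic | DieHard | Godfather |
     |-------|----------------|---------|---------|-----------|
     | Dim 1 | 1.6            | -1      | 5       | 0.2       |
     | Dim 2 | 1              | 1       | .3      | -.2       |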
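As a minimal sketch of the prediction rule from the previous slide, the code below computes the dot product of item and user vectors using the two-dimensional values transcribed from this slide. The helper names `predict` and `rank_items` are made up for illustration, and the resulting scores are raw dot products: a practical MF model would typically add bias terms and keep predictions on the rating scale, which this slide does not cover.

```python
# Latent vectors as transcribed from the slide (illustrative, two dimensions).
user_vectors = {
    "Jack": (3.0, -1.0),
    "Dylan": (1.4, 0.2),
    "Olivia": (-2.5, -0.8),
    "Mark": (-2.0, -1.5),
}
item_vectors = {
    "Usual Suspects": (1.6, 1.0),
    "Titanic": (-1.0, 1.0),
    "DieHard": (5.0, 0.3),
    "Godfather": (0.2, -0.2),
}

def predict(user, item):
    """Predicted score: dot product q_i^T p_u of the item and user vectors."""
    p_u = user_vectors[user]
    q_i = item_vectors[item]
    return sum(q * p for q, p in zip(q_i, p_u))

def rank_items(user):
    """Items ordered from highest to lowest predicted score for this user."""
    return sorted(item_vectors, key=lambda item: predict(user, item), reverse=True)

for user in user_vectors:
    scores = [(item, round(predict(user, item), 2)) for item in rank_items(user)]
    print(user, scores)
```

Users whose vectors point in the same direction as an item's vector get a high predicted score for it, which is what makes the latent dimensions usable as preference dimensions.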
  23. Two-dimensional latent feature space and diversification. [Figure: Jack, Mark, Olivia and Dylan plotted in the two-dimensional latent feature space.]
  24. Greedy Diversity Algorithm. 10-dimensional MF model. Take the personalized top-200. Low diversification: the K items closest to the centroid. Greedy algorithm: select K items with the highest inter-item distance (using city-block distance). Medium: select the maximally diverse items from the 100 items closest to the centroid. High: select from all items in the top-200.
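The three diversification levels can be sketched as follows. This is one plausible reading of the slide, not the authors' implementation: it assumes the greedy step starts from the item closest to the centroid and then repeatedly adds the candidate with the largest summed city-block distance to the items already selected. The function names and the random 10-dimensional test data are hypothetical.

```python
import random

def city_block(a, b):
    """City-block (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def centroid(vectors):
    """Per-feature mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def closest_to_centroid(items, k):
    """Ids of the k items nearest to the centroid of all items."""
    center = centroid(list(items.values()))
    return sorted(items, key=lambda i: city_block(items[i], center))[:k]

def greedy_diverse(items, k):
    """Greedy selection of k items with large inter-item distances.

    Assumption: seed with the item nearest the centroid, then add the
    candidate with the largest summed distance to the selected items.
    """
    selected = closest_to_centroid(items, 1)
    while len(selected) < min(k, len(items)):
        remaining = [i for i in items if i not in selected]
        best = max(
            remaining,
            key=lambda i: sum(city_block(items[i], items[s]) for s in selected),
        )
        selected.append(best)
    return selected

def diversified_list(top_items, k, level):
    """Return k item ids from the personalized top set at the given level."""
    if level == "low":
        return closest_to_centroid(top_items, k)
    if level == "medium":
        pool = closest_to_centroid(top_items, 100)
        return greedy_diverse({i: top_items[i] for i in pool}, k)
    if level == "high":
        return greedy_diverse(top_items, k)
    raise ValueError(f"unknown diversification level: {level}")

def mean_pairwise_distance(ids, items):
    """Average city-block distance over all pairs of selected items."""
    pairs = [(a, b) for n, a in enumerate(ids) for b in ids[n + 1:]]
    return sum(city_block(items[a], items[b]) for a, b in pairs) / len(pairs)

# Hypothetical stand-in for a user's top-200 items in a 10-dimensional model.
random.seed(0)
top_items = {f"movie_{n}": [random.gauss(0, 1) for _ in range(10)] for n in range(200)}

for level in ("low", "medium", "high"):
    chosen = diversified_list(top_items, 5, level)
    print(level, round(mean_pairwise_distance(chosen, top_items), 2))
```

Restricting the medium level to the 100 items nearest the centroid caps how far apart the selected items can be, which gives an intermediate spread between the low and high sets.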
  26. The algorithm. Measure of density: AFSR. Density/tradeoffs on the features capture how items are distributed in the feature space. Average Factor Score Range (AFSR) is based on the density metric used by Fasolo et al. (2009), where X is the set of items i and D is the number of features. It captures the distribution of items in the feature space and their tradeoffs better than standard similarity measures.
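The formula itself did not survive in the transcript. The sketch below assumes the reading suggested by the name: for each of the D features, take the range (maximum minus minimum) of the factor scores over the items in X, then average those ranges over the features. The function name `afsr` and the example sets are illustrative.

```python
def afsr(item_vectors):
    """Average Factor Score Range (assumed definition).

    For every feature, compute max - min over the items in the set,
    then average these ranges over all features.
    """
    columns = list(zip(*item_vectors))
    return sum(max(column) - min(column) for column in columns) / len(columns)

# A spread-out set: feature ranges are 3 and 2, so the average is 2.5.
diverse_set = [(0, 0), (1, 2), (3, 1)]
print(afsr(diverse_set))  # 2.5

# A dense set of similar items has a much smaller average range.
dense_set = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
print(afsr(dense_set) < afsr(diverse_set))  # True
```

Under this reading, a higher AFSR means the set spans more of each latent dimension, so items differ more and the tradeoffs between them are larger.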
  27. Selection of initial set. Top-200 was selected as a balanced initial set: large differences in AFSR scores for the 3 levels of diversification; high average predicted rating (4.48); range in predicted ratings (0.546) lower than the error of the MF model (MAE = .656, RMSE = .854). Final check: how does attractiveness vary within and between the sets? More diverse sets are by nature more likely to capture high-ranked items…

     | Set size | Diversification | Rating | AFSR  |
     |----------|-----------------|--------|-------|
     | 5        | Low             | 4.505  | 0.295 |
     | 5        | Medium          | 4.529  | 0.634 |
     | 5        | High            | 4.561  | 1.210 |
     | 20       | Low             | 4.537  | 0.586 |
     | 20       | Medium          | 4.558  | 1.005 |
     | 20       | High            | 4.604  | 1.615 |
  28. System characteristics. MF recommender based on the MyMedia project. 10M MovieLens dataset: movies from 1994; 5.6M ratings for 70k users and 5.4k movies. RMSE of 0.854, MAE of 0.656. Movies are shown with title and predicted rating; hovering the mouse over the title reveals additional information: short synopsis, cast, director and image.
  29. Study 1a: Check diversification algorithm. Does our diversification affect the subjective experience with the recommendations? Do people perceive the diversity? Does it affect attractiveness and choice difficulty? Each participant inspects 3 lists (low, mid and high diversification), order counterbalanced. No choices are made from the lists.
  30. Study 1a: design/procedure. Pre-questionnaire: personal characteristics. Rating task to train the system (10 ratings). Assess three lists of recommendations. Within subjects: low / mid / high diversification. Between subjects: number of items (5, 10, 15, 20, 25). After each list we measured: perceived diversity and attractiveness; expected trade-off difficulty and choice difficulty.
  31. Study 1a: Participants and manipulation checks. 97 participants from an online database, paid for participation. Mean age: 29 years; 52 females and 45 males. Low, medium and high diversification differed in the feature score range. Average predicted ratings of the sets were not different!

     | Diversity | Average AFSR (SE) | Avg. predicted rating (SE) |
     |-----------|-------------------|----------------------------|
     | Low       | 0.959 (0.015)     | 4.486 (0.042)              |
     | Medium    | 1.273 (0.016)     | 4.486 (0.041)              |
     | High      | 1.744 (0.024)     | 4.527 (0.039)              |
  32. Study 1a: Structural Equation Model.
  33. Study 1a: Structural Equation Model. [Figure: two charts by diversification level (low, mid, high): standardized scores for attractiveness and diversity; scale differences for choice difficulty and tradeoff difficulty.]
  34. Study 1a: Conclusions & Discussion. Diversifying on latent features increases attractiveness/diversity, reduces trade-off difficulty (at high diversification) and reduces choice difficulty (linearly). No evidence for the U-shaped difficulty model: high diversity does not result in trade-off conflicts (perhaps due to the nature of the domain/MF?). No effect of number of items: small sets benefit as much from diversification. Diversification on MF features seems promising to increase attractiveness! [Figure: scale differences for choice difficulty and tradeoff difficulty by diversification level (low, mid, high).]
  35. Study 1b: No choice satisfaction. In Study 1a, no actual choice was made. This explains the limited effect of number of items, and we could not measure choice satisfaction or justification-based processes. New experiment with choice (and choice satisfaction), with diversification and list length as two factors. List length: 5, 10 and 20 items. Low and high diversity (no medium). We expect choice difficulty to be more prominent for low diversity sets.
  36. Study 1b: design/procedure. Pre-questionnaire: personal characteristics. Rating task to train the system (10 ratings). Choose one item from a list of recommendations. Between subjects: 2 levels of diversification (low / high) × 3 lengths (5, 10 or 20 items). Afterwards we measured: perceived diversity and attractiveness; choice difficulty and choice satisfaction. 87 participants from an online database, paid for participation. Mean age: 29 years; 41 females and 46 males.
  37. Questionnaire items. Perceived recommendation diversity: 5 items, e.g. "The list of movies was varied". Perceived recommendation attractiveness: 5 items, e.g. "The list of recommendations was attractive". Choice satisfaction: 6 items, e.g. "I think I would enjoy watching the chosen movie". Choice difficulty: 5 items, e.g. "It was easy to select a movie".
  38. Study 1b: Structural Equation Model.
  39. Study 1b: Structural Equation Model. [Figure: two charts of standardized scores by list length (5, 10, 20) for low and high diversity: perceived diversity and satisfaction.]
  40. Study 1b: Results. A long list is more difficult (cognitive effort and objective effort (hovers)) but also more satisfying in itself. Diversity influences choice satisfaction in important ways: diversity increases attractiveness and reduces difficulty, and these can increase satisfaction (but only for short lists). Our diversification improved the 5-item list: 5 diverse items are as satisfactory as 10 or 20 items and less difficult! Less effort needed (hovers). Using latent feature diversification, one does not need long item lists… [Figure: standardized satisfaction scores by list length (5, 10, 20) for low and high diversity.]
  41. But… Our studies show that low or high diversification from the centroid of the Top-200 works. However, these sets were optimized for diversity, not for prediction accuracy: most item sets thus do not contain the best predicted items. So how does this compare to standard Top-N lists? Slight modification of our algorithm: diversify starting from the best predicted item in the top-N set (Top-1) rather than the centroid. Diversification and list length as two factors in a choice overload experiment. List sizes: 5 and 20. Diversification: none (top 5/20), medium, high.
  42. Properties of the item sets. The algorithm balances between maximum diversity and highest rank. Every iteration: weigh highest rank (1-α) against highest diversification (α). α = 0: top-N; α = 1: maximally diverse (α = 0.3 gives good medium diversity).

     | Set size | Diversification    | Avg. AFSR | Avg. rank |
     |----------|--------------------|-----------|-----------|
     | 5        | High (α = 1)       | 1.380     | 78.284    |
     | 5        | Medium (α = 0.3)   | 1.096     | 7.484     |
     | 5        | Top-N (α = 0)      | 0.774     | 3.000     |
     | 20       | High (α = 1)       | 1.793     | 89.380    |
     | 20       | Medium (α = 0.3)   | 1.486     | 17.849    |
     | 20       | Top-N (α = 0)      | 1.270     | 10.500    |
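The slide does not spell out how rank and diversity are combined, so the following is only a sketch under stated assumptions: candidates are ordered by predicted rating, selection starts from the Top-1 item, and each iteration picks the candidate with the lowest weighted sum of its rank position (weight 1-α) and its diversity position (weight α), where diversity is the summed city-block distance to the items already selected. The function name `diversify_from_top` and the example vectors are hypothetical.

```python
def city_block(a, b):
    """City-block (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def diversify_from_top(ranked_vectors, k, alpha):
    """Select k indices from candidates ordered best-predicted first.

    Assumed scoring: (1 - alpha) * rank position + alpha * diversity position,
    with position 0 being the best rank or the most distant candidate.
    """
    selected = [0]  # start from the best predicted item (Top-1)
    while len(selected) < min(k, len(ranked_vectors)):
        remaining = [i for i in range(len(ranked_vectors)) if i not in selected]

        def distance_to_selected(i):
            return sum(city_block(ranked_vectors[i], ranked_vectors[s]) for s in selected)

        by_diversity = sorted(remaining, key=distance_to_selected, reverse=True)
        diversity_pos = {i: pos for pos, i in enumerate(by_diversity)}
        rank_pos = {i: pos for pos, i in enumerate(remaining)}
        best = min(
            remaining,
            key=lambda i: (1 - alpha) * rank_pos[i] + alpha * diversity_pos[i],
        )
        selected.append(best)
    return selected

# Hypothetical candidates, best predicted first: three near-identical items
# followed by two items far away in the feature space.
ranked = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (-5.0, 4.0)]

print(diversify_from_top(ranked, 3, alpha=0.0))  # plain top-N: [0, 1, 2]
print(diversify_from_top(ranked, 3, alpha=1.0))  # maximally diverse: [0, 3, 4]
```

Using positions rather than raw ratings and distances keeps the two criteria on the same scale; this is a design choice of the sketch, and other normalizations would also satisfy the α = 0 and α = 1 endpoints described on the slide.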
  43. Design/procedure Study 2. 159 participants from an online database. Rating task to train the system (15 ratings). Choose one item from a list of recommendations. Between subjects: 3 levels of diversification, 2 lengths. Afterwards we measured: perceptions (perceived diversity and attractiveness); experience (choice difficulty and choice satisfaction); behavior (total views / unique items considered).
  44. Structural Equation Model.
  45. Perceived diversity & attractiveness. Perceived diversity increases with diversification, similarly for 5 and 20 items. Perceived diversity increases attractiveness. Perceived difficulty goes down with diversification; perceived attractiveness goes up with diversification. The diverse 5-item set excels: just as satisfying as 20 items, less difficult to choose from, less cognitive load…! [Figure: two charts of standardized scores by diversification (none, med, high) for 5 and 20 items: perceived diversity and choice satisfaction.]
  46. Choice characteristics of the chosen option (mean and std. err.). With higher diversity, there is no difference in the position of the chosen option, resulting in a less 'optimal' choice in terms of predicted rating, without a reduction in choice satisfaction!

     | Set      | Diversity     | List position | Rating      | Rank          |
     |----------|---------------|---------------|-------------|---------------|
     | 5 items  | None (top 5)  | 3.60 (0.27)   | 4.51 (0.07) | 3.60 (0.27)   |
     | 5 items  | Medium        | 4.41 (0.59)   | 4.41 (0.07) | 14.52 (5.37)  |
     | 5 items  | High          | 4.19 (0.27)   | 4.30 (0.07) | 77.59 (12.76) |
     | 20 items | None (top 20) | 10.15 (0.92)  | 4.45 (0.05) | 10.15 (0.92)  |
     | 20 items | Medium        | 10.33 (1.18)  | 4.40 (0.08) | 17.7 (2.68)   |
     | 20 items | High          | 9.93 (1.07)   | 4.16 (0.07) | 72.22 (11.84) |
  47. Conclusions: reducing choice difficulty and overload. Diversity reduces choice difficulty: less uniform sets are easier to choose from, and latent feature diversification is easy to implement. Diversity can improve choice satisfaction, even when the diversified list has movies with lower predicted ratings than standard top-N lists. No need for larger item sets: offering small, personalized, diversified item sets might be the key to helping decision makers cope with too much choice! Psychological theory can inform how to improve the output of recommender algorithms.
  48. What you should take away… Psychological theory can inform new ways of diversifying algorithm output or eliciting preferences. But also: by working with recommenders and algorithms we could enhance psychological theory, since personalized item sets give control. User-centric evaluation helps to assess effectiveness. It is a lot of work… and we need user studies… But linking subjective to objective measures might help future studies that cannot do user studies. The user-centric framework allows us to understand WHY particular approaches work or not. Concept of mediation: user perception helps understanding.
  49. Questions? Contact: Martijn Willemsen, @MCWillemsen, M.C.Willemsen@tue.nl, www.martijnwillemsen.nl. Thanks to my co-authors: Mark Graus and Bart Knijnenburg.
