Recommendations
and feedback
The user-experience of a
recommender system




Acknowledgements
Martijn Willemsen
Eindhoven University of Technology


Stefan Hirtbach
European Microsoft Innovation Center GmbH



MyMedia
European Commission FP7 project
Beyond algorithms
Two premises for successful
recommender systems




Recommender systems
Recommend items to users
based on their stated preferences
(e.g. books, movies, laptops)


Users indicate preferences
by rating presented items
(e.g. from one to five stars)


Predict the users’ rating value of new items...
then present items with the highest predicted rating
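As a minimal sketch of this predict-then-rank loop (illustrative only; `predict_rating` is a hypothetical stand-in, since the slides don't specify the prediction model):

```python
# Predict, then rank: score every unrated item and present the top n.
# predict_rating is a hypothetical placeholder for the actual model.

def top_n(user, items, rated, predict_rating, n=10):
    """Return the n unrated items with the highest predicted rating."""
    candidates = [item for item in items if item not in rated]
    candidates.sort(key=lambda item: predict_rating(user, item), reverse=True)
    return candidates[:n]
```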
Current situation



[Diagram: More → Better → Better experience]
Two premises
Premise 1 | Users want to receive
recommendations
Do recommendations have any effect on the user experience at all?
Compare a system with vs. without recommendations


Premise 2 | Users will provide preference
feedback
Without feedback, no recommendations
What causes them to do this, and what inhibits them?
Analyze users’ feedback behavior and intentions
Evaluating the
user experience
Hypotheses based on
existing research




Effect of accuracy
Premise 1 | Users want to receive
recommendations

Users are able to notice differences in prediction
accuracy
But... higher accuracy can lead to lower usefulness of
recommendations


Distinction between perception and evaluation
of recommendation quality
Constructs and hypotheses

Perception
Perceived recommendation quality

Evaluation
Choice satisfaction
Perceived system effectiveness

Questionnaires and process data

[Path model: Personalized vs. random → H1 (+) → Perceived recommendation quality; Perceived recommendation quality → H2a (+) → Choice satisfaction; Perceived recommendation quality → H2b (+) → Perceived system effectiveness. Choice satisfaction and perceived system effectiveness are grouped under "User experience".]
Feedback
Premise 2 | Users will provide preference
feedback

Satisfaction increases feedback intentions
However, only a minority is willing to give up personal information
in return for a personalized experience (Teltzrow & Kobsa)


Privacy decreases feedback intentions
However, most people are usually or always comfortable disclosing
personal taste preferences (Ackerman et al.)
Constructs and hypotheses

Feedback
Willingness to provide feedback

Privacy
System-specific privacy concerns
Trust in technology

Process data
Actual feedback behavior

[Path model: Choice satisfaction → H3a (+) → Intention to provide feedback; Perceived system effectiveness → H3b (+) → Intention to provide feedback; General trust in technology → H4 → System-specific privacy concerns → H5 → Intention to provide feedback]
A model of user experience

[Full hypothesized path model: Personalized vs. random → H1 (+) → Perceived recommendation quality; → H2a (+) → Choice satisfaction; → H2b (+) → Perceived system effectiveness; Choice satisfaction → H3a (+) → Intention to provide feedback; Perceived system effectiveness → H3b (+) → Intention to provide feedback; General trust in technology → H4 → System-specific privacy concerns → H5 → Intention to provide feedback]
Experiment
Test with an actual recommender system

Two versions of the system:
One that provides personalized recommendations
One that provides random clips as ‘recommendations’

[The slide repeats the hypothesized path model, highlighting ‘Personalized vs. random’ as the manipulated factor]
An online
experiment
Testing the hypotheses using
the Microsoft ClipClub
system




Setup
Online experiment
Conducted by EMIC in Germany,
September and October, 2009
Two slightly modified versions of the MSN ClipClub system


43 participants
25 in the random and 18 in the
personalized condition
65% male, all German
Average age of 31 (SD = 9.45)
System
Microsoft ClipClub
Lifestyle & entertainment video clips


Changes
Recommendations section highlighted
Pre-experimental instruction


Rating probe
No rating for five minutes: ask the user to rate the current item
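A sketch of the rating-probe logic; the five-minute threshold is from the slide, while the function and callback names are illustrative:

```python
# Rating probe: if the user has not rated anything for five minutes,
# ask them to rate the clip they are currently watching.
import time

PROBE_AFTER_SECONDS = 5 * 60  # five minutes, per the slide

def maybe_probe(last_rating_time, current_clip, show_rating_prompt):
    """last_rating_time: epoch seconds of the user's most recent rating."""
    if time.time() - last_rating_time >= PROBE_AFTER_SECONDS:
        show_rating_prompt(current_clip)  # illustrative UI callback
```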
Employed algorithm
Vector Space Model Engine
Use the tags associated with a clip to create a tag vector for each clip
Create a tag vector for the subset of clips rated by the user
Recommend clips whose tag vector is similar to the user’s tag vector
Older ratings are logarithmically discounted, as are older items
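The engine itself is not public, so the following is only a hedged sketch of the tag-vector approach outlined above; the exact discounting formula and all names are assumptions:

```python
# Tag-based vector-space recommendation, following the slide: build a tag
# vector per clip, aggregate a profile vector over the user's rated clips
# (older ratings log-discounted; exact form assumed), then recommend the
# clips whose tag vector has the highest cosine similarity to the profile.
import math
from collections import Counter

def tag_vector(tags):
    """Tag vector of a clip: each associated tag contributes weight 1."""
    return Counter(tags)

def profile_vector(rated):
    """rated: list of (tags, rating, days_ago) for the user's rated clips."""
    profile = Counter()
    for tags, rating, days_ago in rated:
        weight = rating / (1.0 + math.log(1 + days_ago))  # assumed discount
        for tag in tags:
            profile[tag] += weight
    return profile

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(profile, catalog, n=5):
    """catalog: {clip_id: tags}; return the n most similar clips."""
    ranked = sorted(catalog,
                    key=lambda c: cosine(profile, tag_vector(catalog[c])),
                    reverse=True)
    return ranked[:n]
```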
Experimental procedure
Each participant:
entered demographic details
was shown an instruction on how to use the system
used the system freely for at least 30 minutes
completed the questionnaires
entered an email address for the raffle


Rating items
Users could rate items and inspect recommendations at any time, in any order
Rating probes ensured at least 6 ratings per participant, unless ignored
Questionnaires
40 statements                                 Choice satisfaction
                                              9 items, e.g. “The videos I chose fitted my
Agree or disagree on a 5-point                preference”
scale
                                              General trust in technology
Factor Analysis in two batches                4 items, e.g. “I’m less confident when I use
                                              technology”, reverse-coded

                                              System-specific privacy concern
6 factors                                     5 items, e.g. “I feel confident that ClipClub
Recommendation set quality                    respects my privacy”
7 items, e.g. “The recommended videos fitted   Intention to rate items
my preference”
                                              5 items, e.g. “I like to give feedback on the
System effectiveness                          items I’m watching”
6 items, e.g. “The recommender is useless”,
reverse-coded
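As an illustration of this scale-construction step (the slides don't name a tool; scikit-learn, the file name, and the column names below are assumptions):

```python
# Reverse-code negatively worded items, then run an exploratory factor
# analysis with 6 factors. File and column names are hypothetical.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

responses = pd.read_csv("questionnaire.csv")  # one column per statement, 1-5

# Flip a 1..5 scale for reverse-coded items such as "The recommender is useless"
reverse = ["recommender_is_useless", "less_confident_with_technology"]
responses[reverse] = 6 - responses[reverse]

fa = FactorAnalysis(n_components=6, rotation="varimax")
fa.fit(responses.values)
loadings = pd.DataFrame(fa.components_.T, index=responses.columns)
print(loadings.round(2))  # each item should load on its intended factor
```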
Process data
All clicks were logged
In order to link subjective metrics to observable behavior


Process data measures
Total viewing time
Number of clicked clips
Number of completed clips
Number of self-initiated ratings
Number of canceled rating requests
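A sketch of deriving these measures from the raw click log; the log schema (column and event names) is assumed, not taken from the slides:

```python
# Aggregate the per-user process-data measures from a raw click log.
# Assumed columns: user, event, clip_id, seconds.
import pandas as pd

log = pd.read_csv("clicks.csv")

def count(event):
    """Events of one type per user."""
    return log[log["event"] == event].groupby("user").size()

per_user = pd.DataFrame({
    "total_viewing_time":     log.groupby("user")["seconds"].sum(),
    "clicked_clips":          count("click"),
    "completed_clips":        count("complete"),
    "self_initiated_ratings": count("rate"),
    "canceled_rating_probes": count("probe_cancel"),
}).fillna(0)
```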
Results
Back to the path model




Path model results

H1:  Personalized vs. random → Perceived recommendation quality: .696 (.276)*
H2a: Perceived recommendation quality → Choice satisfaction: .572 (.125)***
H2b: Perceived recommendation quality → Perceived system effectiveness: .515 (.135)***
H3a: Choice satisfaction → Intention to provide feedback: .346 (.125)**
H3b: Perceived system effectiveness → Intention to provide feedback: .296 (.123)*
H4:  General trust in technology → System-specific privacy concerns: -.268 (.156)¹
H5:  System-specific privacy concerns → Intention to provide feedback: -.255 (.113)*

(path coefficients, with standard errors in parentheses)
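To make the analysis step concrete, here is a hedged sketch of fitting this path model in Python with semopy; the slides don't say which software was actually used, and the variable names and data file are assumptions:

```python
# Path analysis of the hypothesized model (H1-H5).
import pandas as pd
from semopy import Model

# Regressions mirror the arrows: H1; H2a/H2b; H4; H3a/H3b + H5.
spec = """
perceived_quality ~ personalized
choice_satisfaction ~ perceived_quality
system_effectiveness ~ perceived_quality
privacy_concerns ~ trust_in_technology
intention_feedback ~ choice_satisfaction + system_effectiveness + privacy_concerns
"""

data = pd.read_csv("experiment.csv")  # hypothetical: one row per participant
model = Model(spec)
model.fit(data)
print(model.inspect())  # estimates, standard errors, p-values per path
```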
Effect of personalization

Users notice personalization
Personalized recommendations increase perceived recommendation quality (H1)

Users like better recommendations
Higher perceived quality increases choice satisfaction (H2a) and system effectiveness (H2b)

Users browse less, but watch more
Number of clips watched entirely is higher in the personalized condition
Number of clicked clips and total viewing time are negatively correlated with perceived system effectiveness
Feedback

Better experience increases feedback
Choice satisfaction and system effectiveness increase feedback intentions (H3a, H3b)

Privacy decreases feedback
Users with a higher system-specific privacy concern have a lower feedback intention (H5)

Effect of trust in technology
Privacy concerns increase when users have a lower trust in technology (H4)
Intention-behavior gap
Number of canceled rating probes
Significantly lower in the personalized condition
Negatively correlated with intention to provide feedback


Total number of provided ratings
Not significantly correlated with users’ intention to provide feedback
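A sketch of the correlation check behind these statements (Pearson correlations; the merged per-participant table and its column names are hypothetical):

```python
# Intention-behavior gap: correlate the questionnaire-based intention
# factor with the logged feedback behavior. Column names are assumed.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("participants.csv")

for behavior in ["canceled_rating_probes", "total_ratings"]:
    r, p = pearsonr(df["intention_to_rate"], df[behavior])
    print(f"intention_to_rate vs {behavior}: r = {r:.2f}, p = {p:.3f}")
```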
To summarize...

[The full path model with its coefficients is shown again; see “Path model results” above]
Future work
Lessons learned, new ideas

[The slide previews the evaluation framework figure, shown in full under “Consider a framework” below]
Remaining questions
True for all recommender systems?
Results should be confirmed in several other systems, with more participants and a more diverse sample


Other influences?
Incorporate other aspects to get a more detailed understanding of
the mechanisms underlying the user-recommender interaction


Other algorithms?
Test differences between algorithms that only moderately differ in
accuracy
Consider a framework

[Evaluation framework figure:

Situational characteristics (“Things about the situation (that matter)”): Domain knowledge, Choice goal
Objective system aspects (“What the system does”): Recommendations, Interaction, Capabilities, Quality of assets
Subjective system aspects (“How I perceive the system”): Interaction usability, Perceived quality, Appeal
Experience (“How I perceive the interaction”): Hedonic experience, Usefulness, Trust, Outcome evaluation
Interaction (“The objective effect of using the system”): Purchase/view, System use
Behavior: “How the system influences my interaction and my perception thereof”
Personal characteristics (“Things about me (that matter)”): Trust/distrust, Social factors, Control]
Field trials
Full-scale test of the framework
Four different partners, three different countries
Trials are conducted over a longer time period
Each compares at least three systems (mainly different algorithms)
Questionnaires and process data


Core of evaluation is the same
Algorithm -> perceived recommendation quality -> system
effectiveness
Each partner adds measures of personal interest
Want more?
RecSys’10 workshop
User-Centric Evaluation of Recommender Systems and their Interfaces (UCERSTI)
Barcelona, September 26-30

Line-up:
7 paper presentations
2 keynotes (Francisco Martin, Pearl Pu)
Panel discussion with 5 prominent researchers

Editor's Notes

  1. First I want to thank my co-authors and sponsor
  2. Your typical recommender system works like this:
  3. Right now, researchers seem to focus on algorithmic performance. They believe that better algorithms lead to a better experience. Is that really true?
  4. It can only be true under two assumptions: 1. users want to get personalized recommendations, and 2. they will provide enough feedback to make this possible. To answer these questions, we need to evaluate the user experience, not the algorithm!
  5. What existing evidence do we have? Increased recommendation accuracy is noticeable, but doesn’t always lead to a better UX. McNee et al.: the algorithm with the best predictions was rated least helpful. Torres et al.: the algorithm with the lowest accuracy resulted in the highest satisfaction. Ziegler et al.: diversifying the recommendation set resulted in lower accuracy but a more positive evaluation.
  6. Let’s say we have two systems, one with personalized recommendations and one without: Perception tests whether we are able to notice the difference. Evaluation tests whether this increases our satisfaction with the system and, ultimately, our choices. These are measured by questionnaires, but we can also look at process data: effective systems may show decreased browsing and overall viewing time, and in better systems, users will watch more clips from beginning to end.
  7. The more beneficial it seems to be, the more feedback users will provide (Spiekermann et al.; Brodie, Karat & Karat; Kobsa & Teltzrow). Minority = between 40 and 50% in an overview of privacy surveys. Privacy concerns reduce users’ willingness to disclose personal information (Metzger et al.; Teltzrow & Kobsa). Most people = 80% of the respondents of a detailed survey. Users’ actual feedback behavior may differ from their intentions (Spiekermann et al.).
  8. So now we look at why users provide preference information. We already know choice satisfaction and perceived system effectiveness, and we hypothesize that a better experience increases the intention to provide feedback. However, privacy concerns may reduce feedback intention, and privacy concerns may be higher for those who don’t trust technology in general. Process data: due to the intention-behavior gap, actual feedback may only be moderately correlated with feedback intentions.
  9. So let’s review the hypotheses (laser-point): Personalized recommendations should have a perceivably higher quality. This should in turn increase the user experience of the system and the outcome (choices). A better experience in turn increases the intention to provide feedback. However...
  10. Tip: use two conditions to control the causal relations and to single out the effect Also: log behavioral data and triangulate this with the constructs
  11. Content and system are in German. To explain the rating feature and its effect on recommendations. Opening the recommendations before rating any items showed a similar explanation. Participants were allowed to close this pop-up without rating. After rating, participants were taken to the recommendations.
  12. (the length of the vector depends on the impact the tags have) (in terms of cosine similarity)
  13. Allowing ample opportunity for their feedback behavior to be influenced by their user experience. Unless they ignored the rating probe. The median number of ratings per user was 15.
  14. Tip for UX researchers: you cannot measure UX concepts with a single question. Measurement is far more robust if you construct a scale based on several questions. Exploratory Factor Analysis validates the intended conceptual structure. Finally, test the model with path analysis (mediation on steroids).
  15. Measures 1-2: browsing (bad); measure 3: consumption (good); measures 4-5: feedback.
  16. The model has a good fit, with a non-significant χ² of 13.210 (df = 13, p = .4317), a CFI of .996 and an RMSEA between 0 and 0.153 (90% confidence interval).
  17. Let’s review that one more time:
  18. We’ve been developing a framework for this type of research, and validated it in several field trials.
  19. E.g. advertisement (MS): fewer clips clicked (fewer ads started) but maybe higher retention (more ads watched in full)? Watch out for our future papers!
  20. Advantages of fitting a model: steps in between reduce variability!