Recommendations
and feedback
The user-experience of a
recommender system




Acknowledgements
Martijn Willemsen
Eindhoven University of Technology


Stefan Hirtbach
European Microsoft Innovation Center GmbH



MyMedia
European Commission FP7 project
Beyond algorithms
Two premises for successful
recommender systems




Recommender systems
Recommend items to users
based on their stated preferences
(e.g. books, movies, laptops)


Users indicate preferences
by rating presented items
(e.g. from one to five stars)


Predict the users’ rating value of new items...
then present items with the highest predicted rating
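As a minimal sketch of this predict-then-rank loop (illustrative only; `predict_rating` is a hypothetical stand-in, since the slides don't specify the prediction model):

```python
# Predict, then rank: score every unrated item and present the top n.
# predict_rating is a hypothetical placeholder for the actual model.

def top_n(user, items, rated, predict_rating, n=10):
    """Return the n unrated items with the highest predicted rating."""
    candidates = [item for item in items if item not in rated]
    candidates.sort(key=lambda item: predict_rating(user, item), reverse=True)
    return candidates[:n]
```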
Current situation



[Diagram: More → Better → Better experience]
Two premises
Premise 1 | Users want to receive
recommendations
Do recommendations have any effect on the user experience at all?
Compare a system with vs. without recommendations


Premise 2 | Users will provide preference
feedback
Without feedback, no recommendations
What causes them to do this, and what inhibits them?
Analyze users’ feedback behavior and intentions
Evaluating the
user experience
Hypotheses based on
existing research




Effect of accuracy
Premise 1 | Users want to receive
recommendations

Users are able to notice differences in prediction
accuracy
But... higher accuracy can lead to lower usefulness of
recommendations


Distinction between perception and evaluation
of recommendation quality
Constructs and hypotheses

Perception
Perceived recommendation quality

Evaluation
Choice satisfaction
Perceived system effectiveness

Questionnaires and process data

[Path model: Personalized vs. random → H1 (+) → Perceived recommendation quality; Perceived recommendation quality → H2a (+) → Choice satisfaction; Perceived recommendation quality → H2b (+) → Perceived system effectiveness. Choice satisfaction and perceived system effectiveness are grouped under "User experience".]
Feedback
Premise 2 | Users will provide preference
feedback

Satisfaction increases feedback intentions
However, only a minority is willing to give up personal information
in return for a personalized experience (Teltzrow & Kobsa)


Privacy decreases feedback intentions
However, most people are usually or always comfortable disclosing
personal taste preferences (Ackerman et al.)
Constructs and hypotheses

Feedback
Willingness to provide feedback

Privacy
System-specific privacy concerns
Trust in technology

Process data
Actual feedback behavior

[Path model: Choice satisfaction → H3a (+) → Intention to provide feedback; Perceived system effectiveness → H3b (+) → Intention to provide feedback; General trust in technology → H4 → System-specific privacy concerns → H5 → Intention to provide feedback]
A model of user experience

[Full hypothesized path model: Personalized vs. random → H1 (+) → Perceived recommendation quality; → H2a (+) → Choice satisfaction; → H2b (+) → Perceived system effectiveness; Choice satisfaction → H3a (+) → Intention to provide feedback; Perceived system effectiveness → H3b (+) → Intention to provide feedback; General trust in technology → H4 → System-specific privacy concerns → H5 → Intention to provide feedback]
Experiment
Test with an actual recommender system

Two versions of the system:
One that provides personalized recommendations
One that provides random clips as ‘recommendations’

[The slide repeats the hypothesized path model, highlighting ‘Personalized vs. random’ as the manipulated factor]
An online
experiment
Testing the hypotheses using
the Microsoft ClipClub
system




Setup
Online experiment
Conducted by EMIC in Germany,
September and October, 2009
Two slightly modified versions of the MSN ClipClub system


43 participants
25 in the random and 18 in the
personalized condition
65% male, all German
Average age of 31 (SD = 9.45)
System
Microsoft ClipClub
Lifestyle & entertainment video clips


Changes
Recommendations section highlighted
Pre-experimental instruction


Rating probe
No rating for five minutes: ask the user to rate the current item
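A sketch of the rating-probe logic; the five-minute threshold is from the slide, while the function and callback names are illustrative:

```python
# Rating probe: if the user has not rated anything for five minutes,
# ask them to rate the clip they are currently watching.
import time

PROBE_AFTER_SECONDS = 5 * 60  # five minutes, per the slide

def maybe_probe(last_rating_time, current_clip, show_rating_prompt):
    """last_rating_time: epoch seconds of the user's most recent rating."""
    if time.time() - last_rating_time >= PROBE_AFTER_SECONDS:
        show_rating_prompt(current_clip)  # illustrative UI callback
```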
Employed algorithm
Vector Space Model Engine
Use the tags associated with a clip to create a tag vector for each clip
Create a tag vector for the subset of clips rated by the user
Recommend clips whose tag vector is similar to the user’s tag vector
Older ratings are logarithmically discounted, as are older items
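The engine itself is not public, so the following is only a hedged sketch of the tag-vector approach outlined above; the exact discounting formula and all names are assumptions:

```python
# Tag-based vector-space recommendation, following the slide: build a tag
# vector per clip, aggregate a profile vector over the user's rated clips
# (older ratings log-discounted; exact form assumed), then recommend the
# clips whose tag vector has the highest cosine similarity to the profile.
import math
from collections import Counter

def tag_vector(tags):
    """Tag vector of a clip: each associated tag contributes weight 1."""
    return Counter(tags)

def profile_vector(rated):
    """rated: list of (tags, rating, days_ago) for the user's rated clips."""
    profile = Counter()
    for tags, rating, days_ago in rated:
        weight = rating / (1.0 + math.log(1 + days_ago))  # assumed discount
        for tag in tags:
            profile[tag] += weight
    return profile

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(profile, catalog, n=5):
    """catalog: {clip_id: tags}; return the n most similar clips."""
    ranked = sorted(catalog,
                    key=lambda c: cosine(profile, tag_vector(catalog[c])),
                    reverse=True)
    return ranked[:n]
```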
Experimental procedure
Each participant:
entered demographic details
was shown an instruction on how to use the system
used the system freely for at least 30 minutes
completed the questionnaires
entered an email address for the raffle


Rating items
Users could rate items and inspect recommendations at any time, in any order
Rating probes ensured at least 6 ratings per participant, unless ignored
Questionnaires
40 statements                                 Choice satisfaction
                                              9 items, e.g. “The videos I chose fitted my
Agree or disagree on a 5-point                preference”
scale
                                              General trust in technology
Factor Analysis in two batches                4 items, e.g. “I’m less confident when I use
                                              technology”, reverse-coded

                                              System-specific privacy concern
6 factors                                     5 items, e.g. “I feel confident that ClipClub
Recommendation set quality                    respects my privacy”
7 items, e.g. “The recommended videos fitted   Intention to rate items
my preference”
                                              5 items, e.g. “I like to give feedback on the
System effectiveness                          items I’m watching”
6 items, e.g. “The recommender is useless”,
reverse-coded
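As an illustration of this scale-construction step (the slides don't name a tool; scikit-learn, the file name, and the column names below are assumptions):

```python
# Reverse-code negatively worded items, then run an exploratory factor
# analysis with 6 factors. File and column names are hypothetical.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

responses = pd.read_csv("questionnaire.csv")  # one column per statement, 1-5

# Flip a 1..5 scale for reverse-coded items such as "The recommender is useless"
reverse = ["recommender_is_useless", "less_confident_with_technology"]
responses[reverse] = 6 - responses[reverse]

fa = FactorAnalysis(n_components=6, rotation="varimax")
fa.fit(responses.values)
loadings = pd.DataFrame(fa.components_.T, index=responses.columns)
print(loadings.round(2))  # each item should load on its intended factor
```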
Process data
All clicks were logged
In order to link subjective metrics to observable behavior


Process data measures
Total viewing time
Number of clicked clips
Number of completed clips
Number of self-initiated ratings
Number of canceled rating requests
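A sketch of deriving these measures from the raw click log; the log schema (column and event names) is assumed, not taken from the slides:

```python
# Aggregate the per-user process-data measures from a raw click log.
# Assumed columns: user, event, clip_id, seconds.
import pandas as pd

log = pd.read_csv("clicks.csv")

def count(event):
    """Events of one type per user."""
    return log[log["event"] == event].groupby("user").size()

per_user = pd.DataFrame({
    "total_viewing_time":     log.groupby("user")["seconds"].sum(),
    "clicked_clips":          count("click"),
    "completed_clips":        count("complete"),
    "self_initiated_ratings": count("rate"),
    "canceled_rating_probes": count("probe_cancel"),
}).fillna(0)
```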
Results
Back to the path model




Path model results

H1:  Personalized vs. random → Perceived recommendation quality: .696 (.276)*
H2a: Perceived recommendation quality → Choice satisfaction: .572 (.125)***
H2b: Perceived recommendation quality → Perceived system effectiveness: .515 (.135)***
H3a: Choice satisfaction → Intention to provide feedback: .346 (.125)**
H3b: Perceived system effectiveness → Intention to provide feedback: .296 (.123)*
H4:  General trust in technology → System-specific privacy concerns: -.268 (.156)¹
H5:  System-specific privacy concerns → Intention to provide feedback: -.255 (.113)*

(path coefficients, with standard errors in parentheses)
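To make the analysis step concrete, here is a hedged sketch of fitting this path model in Python with semopy; the slides don't say which software was actually used, and the variable names and data file are assumptions:

```python
# Path analysis of the hypothesized model (H1-H5).
import pandas as pd
from semopy import Model

# Regressions mirror the arrows: H1; H2a/H2b; H4; H3a/H3b + H5.
spec = """
perceived_quality ~ personalized
choice_satisfaction ~ perceived_quality
system_effectiveness ~ perceived_quality
privacy_concerns ~ trust_in_technology
intention_feedback ~ choice_satisfaction + system_effectiveness + privacy_concerns
"""

data = pd.read_csv("experiment.csv")  # hypothetical: one row per participant
model = Model(spec)
model.fit(data)
print(model.inspect())  # estimates, standard errors, p-values per path
```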
Effect of personalization

Users notice personalization
Personalized recommendations increase perceived recommendation quality (H1)

Users like better recommendations
Higher perceived quality increases choice satisfaction (H2a) and system effectiveness (H2b)

Users browse less, but watch more
Number of clips watched entirely is higher in the personalized condition
Number of clicked clips and total viewing time are negatively correlated with perceived system effectiveness
Feedback

Better experience increases feedback
Choice satisfaction and system effectiveness increase feedback intentions (H3a, H3b)

Privacy decreases feedback
Users with a higher system-specific privacy concern have a lower feedback intention (H5)

Effect of trust in technology
Privacy concerns increase when users have a lower trust in technology (H4)
Intention-behavior gap
Number of canceled rating probes
Significantly lower in the personalized condition
Negatively correlated with intention to provide feedback


Total number of provided ratings
Not significantly correlated with users’ intention to provide feedback
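A sketch of the correlation check behind these statements (Pearson correlations; the merged per-participant table and its column names are hypothetical):

```python
# Intention-behavior gap: correlate the questionnaire-based intention
# factor with the logged feedback behavior. Column names are assumed.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("participants.csv")

for behavior in ["canceled_rating_probes", "total_ratings"]:
    r, p = pearsonr(df["intention_to_rate"], df[behavior])
    print(f"intention_to_rate vs {behavior}: r = {r:.2f}, p = {p:.3f}")
```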
To summarize...

[The full path model with its coefficients is shown again; see “Path model results” above]
Future work
Lessons learned, new ideas

[The slide previews the evaluation framework figure, shown in full under “Consider a framework” below]
Remaining questions
True for all recommender systems?
Results should be confirmed in several other systems, with more participants and a more diverse sample


Other influences?
Incorporate other aspects to get a more detailed understanding of
the mechanisms underlying the user-recommender interaction


Other algorithms?
Test differences between algorithms that only moderately differ in
accuracy
Consider a framework

[Evaluation framework figure:

Situational characteristics (“Things about the situation (that matter)”): Domain knowledge, Choice goal
Objective system aspects (“What the system does”): Recommendations, Interaction, Capabilities, Quality of assets
Subjective system aspects (“How I perceive the system”): Interaction usability, Perceived quality, Appeal
Experience (“How I perceive the interaction”): Hedonic experience, Usefulness, Trust, Outcome evaluation
Interaction (“The objective effect of using the system”): Purchase/view, System use
Behavior: “How the system influences my interaction and my perception thereof”
Personal characteristics (“Things about me (that matter)”): Trust/distrust, Social factors, Control]
Field trials
Full-scale test of the framework
Four different partners, three different countries
Trials are conducted over a longer time period
Each compares at least three systems (mainly different algorithms)
Questionnaires and process data


Core of evaluation is the same
Algorithm -> perceived recommendation quality -> system
effectiveness
Each partner adds measures of personal interest
Want more?
RecSys’10 workshop
User-Centric Evaluation of Recommender Systems and their Interfaces (UCERSTI)
Barcelona, September 26-30

Line-up:
7 paper presentations
2 keynotes (Francisco Martin, Pearl Pu)
Panel discussion with 5 prominent researchers

Editor's Notes

  1. First I want to thank my co-authors and sponsor
  2. Your typical recommender system works like this:
  3. Right now, researchers seem to focus on algorithmic performance. They believe that better algorithms lead to a better experience. Is that really true?
  4. It can only be true under two assumptions: 1. users want to get personalized recommendations, and 2. they will provide enough feedback to make this possible. To answer these questions, we need to evaluate the user experience, not the algorithm!
  5. What existing evidence do we have? Increased recommendation accuracy is noticeable, but doesn’t always lead to a better UX. McNee et al.: the algorithm with the best predictions was rated least helpful. Torres et al.: the algorithm with the lowest accuracy resulted in the highest satisfaction. Ziegler et al.: diversifying the recommendation set resulted in lower accuracy but a more positive evaluation.
  6. Let’s say we have two systems, one with personalized recommendations and one without: Perception tests whether we are able to notice the difference. Evaluation tests whether this increases our satisfaction with the system and, ultimately, our choices. These are measured by questionnaires, but we can also look at process data: effective systems may show decreased browsing and overall viewing time, and in better systems, users will watch more clips from beginning to end.
  7. The more beneficial it seems to be, the more feedback users will provide (Spiekermann et al.; Brodie, Karat & Karat; Kobsa & Teltzrow). Minority = between 40 and 50% in an overview of privacy surveys. Privacy concerns reduce users’ willingness to disclose personal information (Metzger et al.; Teltzrow & Kobsa). Most people = 80% of the respondents of a detailed survey. Users’ actual feedback behavior may differ from their intentions (Spiekermann et al.).
  8. So now we look at why users provide preference information. We already know choice satisfaction and perceived system effectiveness, and we hypothesize that a better experience increases the intention to provide feedback. However, privacy concerns may reduce feedback intention, and privacy concerns may be higher for those who don’t trust technology in general. Process data: due to the intention-behavior gap, actual feedback may only be moderately correlated with feedback intentions.
  9. So let’s review the hypotheses (laser-point): Personalized recommendations should have a perceivably higher quality. This should in turn increase the user experience of the system and the outcome (choices). A better experience in turn increases the intention to provide feedback. However...
  10. Tip: use two conditions to control the causal relations and to single out the effect Also: log behavioral data and triangulate this with the constructs
  11. Content and system are in German. To explain the rating feature and its effect on recommendations. Opening the recommendations before rating any items showed a similar explanation. Participants were allowed to close this pop-up without rating. After rating, participants were taken to the recommendations.
  12. (the length of the vector depends on the impact the tags have) (in terms of cosine similarity)
  13. Allowing ample opportunity for their feedback behavior to be influenced by their user experience. Unless they ignored the rating probe. The median number of ratings per user was 15.
  14. Tip for UX researchers: you cannot measure UX concepts with a single question. Measurement is far more robust if you construct a scale based on several questions. Exploratory Factor Analysis validates the intended conceptual structure. Finally, test the model with path analysis (mediation on steroids).
  15. Measures 1-2: browsing (bad); measure 3: consumption (good); measures 4-5: feedback.
  16. The model has a good fit, with a non-significant χ² of 13.210 (df = 13, p = .4317), a CFI of .996 and an RMSEA between 0 and 0.153 (90% confidence interval).
  17. Let’s review that one more time:
  18. We’ve been developing a framework for this type of research, and validated it in several field trials.
  19. E.g. advertisement (MS): fewer clips clicked (fewer ads started) but maybe higher retention (more ads watched in full)? Watch out for our future papers!
  20. Advantages of fitting a model: steps in between reduce variability!