1. A User-centric Evaluation of Recommender Algorithms for an Event Recommendation System
Simon Dooms, Toon De Pessemier, Luc Martens (@sidooms)
Ghent University, UCERSTI 2, 10/23/2011
2. Introduction: Recommender System? (slide bullets not preserved in this export; see editor's note 2)
3. The Experiment: Day 1 invitation mail, Day 28 reminder mail, Day 41 end tracking, Day 45 send out recommendations, Day 50 reminder mail, Day 56 closed questionnaire
4. Feedback
5. Algorithms: content-based (CB), user-based collaborative filtering (UBCF), SVD, hybrid (CB + UBCF), random
6. Questionnaire: Accuracy, Familiarity, Novelty, Diversity, Transparency, Satisfaction, Trust, Usefulness
7. Results: 232 users filled out the questionnaire, 193 users were retained for the analysis
8. Average scores per question and algorithm: Accuracy, Familiarity, Novelty, Diversity, Transparency, Satisfaction, Trust, Usefulness
9. Average scores per question and algorithm: Accuracy, Satisfaction
10. Correlations: Accuracy, Familiarity, Novelty, Diversity, Transparency, Satisfaction, Trust, Usefulness
11. Regressions
12. Conclusions: the hybrid recommender (CB + UBCF) was the overall best algorithm; SVD came in last, sometimes even behind the random recommender; divided opinions and limited user profiles are possible explanations; accuracy and transparency turned out to be the main influencers of user satisfaction; diversity was valued less than the other aspects
13. Simon Dooms, Toon De Pessemier, Luc Martens
A User-centric Evaluation of Recommender Algorithms for an Event Recommendation System
With the support of IWT Vlaanderen and FWO Vlaanderen


Editor's Notes

  1. I am Simon Dooms and I work at Ghent University in Belgium together with Toon De Pessemier and our supervisor Luc Martens. Our presentation is about a user-centric evaluation experiment where we wanted to find an optimal recommender system for a big events website in Belgium.
  2. So we had this Belgian website that contained over 30,000 cultural events, including movie releases, theater shows, exhibitions, fairs and so on. How do we help people who are browsing the website find the things they are really interested in without making them search for it themselves? For that we use a recommender system. But as Guy Shani put it in the RecSys tutorial last year (2010), there is no silver bullet. There is no universal recommender system that we can put into this website and that is guaranteed to deliver the best result. So to find out which system, and more specifically which algorithm, would be best for this situation, we needed to try some out and evaluate them in some way. We opted for a user-centric evaluation with 5 different recommender algorithms, and this presentation is about how we did this and what results we came up with.
  3. Let me first show you the timeline of the experiment. Suppose that this arrow reflects the duration of the experiment from beginning to end. The first thing we did was recruit users for our experiment. We did this by sending out an invitation mail to all subscribers of the newsletter and by putting banners on the website to attract people's attention. If someone wanted to take part in the experiment, they had to create an account on the website and click a checkbox indicating that we could use their browsing data as a basis for recommendations. We explained to them that we would track their behavior on the website for at least 30 days and would then generate some recommendations based on that data. They would then be asked to fill out a questionnaire regarding the quality of the recommendations. After we started recruiting users for the experiment, we started tracking them on the website and logged any data we found relevant. 28 days after the first invitation we sent out a reminder to the users who had registered for the experiment but had not been active on the website so far. 41 days after the first mail we wrapped up our data and used it as input for 5 different recommendation algorithms. Some days later we alerted the users that their recommendations were available, and they were asked to complete a questionnaire about their quality. Again we reminded the non-responsive users by mail. 56 days after the start of the experiment, we closed the online questionnaire and started analyzing the results. But before I get into that, first some more details about the setup of the experiment.
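For reference, here is a minimal sketch (not the authors' code) of the timeline above expressed as a day-offset table; the helper function and the start date used in the example are purely illustrative.

```python
from datetime import date, timedelta

# Milestones of the experiment as day offsets from the invitation mail (day 1),
# taken from the timeline described above.
EXPERIMENT_SCHEDULE = {
    1: "send invitation mail and put banners on the website",
    28: "remind registered but inactive users",
    41: "stop tracking and freeze the collected feedback data",
    45: "send out the recommendation lists",
    50: "remind users who have not yet filled out the questionnaire",
    56: "close the online questionnaire and start the analysis",
}

def milestone_dates(start):
    """Map every milestone to a calendar date, given the experiment's start date."""
    return {start + timedelta(days=offset - 1): action
            for offset, action in EXPERIMENT_SCHEDULE.items()}

# Illustrative start date; the actual start date is not given in the talk.
for day, action in sorted(milestone_dates(date(2011, 1, 1)).items()):
    print(day.isoformat(), action)
```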
  4. Feedback. So we tracked our users over a period of 41 days. In that period we wanted to learn about their preferences and behavioral patterns, so let me show you exactly what we logged. This is what an event detail page looks like. Every one of the 30,000 events has a page like this. It contains some detailed information about the event itself, like the title, a short description, the date, location, prices and so on. Now, some activities you can do on this page actually indicate a user preference for this event: clicking the like button, the share on Facebook and Twitter buttons, mailing this event to a friend, printing this event, looking at the itinerary, asking for more dates and locations of this event, and finally clicking the link that shows even more detailed info about the event. Every one of these activities indicates some user preference towards this event. If users click like we are absolutely certain that they like it, but what if they mail or print this event? We have tried to put a value on every possible feedback indicator, ranging from 1 if we are absolutely sure that a user likes the event, down to 0.3 if we are far less sure. We assigned this 0.3 value to the activity of browsing to the event page. To aggregate multiple feedback values that a user may have expressed on the same event, we used the max function. This means that if a user first browses to this event page, the system logs an interest of 0.3 for this event; if they also print the event, the system logs a 0.6 feedback value. So the maximum value is always the final value. That is what we did for the first 41 days: we logged all this data.
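The weighting and max-aggregation scheme just described can be sketched in a few lines. Only the values explicitly mentioned in the talk (like = 1.0, print = 0.6, page view = 0.3) come from the source; the remaining weights and the action names are illustrative assumptions.

```python
# Implicit feedback weights per activity on an event detail page.
# Values marked "assumption" are not given in the talk.
FEEDBACK_WEIGHTS = {
    "like": 1.0,
    "share_facebook": 0.8,   # assumption
    "share_twitter": 0.8,    # assumption
    "mail_to_friend": 0.7,   # assumption
    "print": 0.6,
    "view_itinerary": 0.5,   # assumption
    "ask_more_dates": 0.5,   # assumption
    "detail_click": 0.4,     # assumption
    "page_view": 0.3,
}

def aggregate_feedback(actions_per_event):
    """Combine multiple actions on the same event with the max() rule:
    the strongest signal a user ever expressed becomes the final feedback value."""
    return {event: max(FEEDBACK_WEIGHTS[action] for action in actions)
            for event, actions in actions_per_event.items()}

# Example from the note: browsing (0.3) followed by printing (0.6) yields 0.6.
print(aggregate_feedback({"event_42": ["page_view", "print"]}))  # {'event_42': 0.6}
```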
  5. Then we wanted to use this data as input for some recommendation algorithms. Remember that we wanted to try out multiple algorithms and compare them in a user-centric way. So we took 5 very common recommendation algorithms and provided each of them with the collected input. Because the setting of this online experiment allowed the gathering of both user feedback and item description data, we were able to implement both content-based and collaborative filtering algorithms, and that also gave us the opportunity to implement a simple hybrid recommender which combines the best recommendations of both. Every algorithm was asked to generate 8 recommendations for every user in the experiment. When this was finished we matched every user randomly to one of the algorithms that was able to generate a list of at least 8 recommendations. We had to be careful, because not every algorithm was able to generate such recommendations for every user. If, for example, a user did not show a sufficient amount of overlap with other users, then user-based collaborative filtering would have a hard time recommending something to this user. By involving the random recommender in the experiment, there was always at least one algorithm that was able to provide recommendations for every user, and it of course also provided a nice baseline for the comparison with the other algorithms.
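A sketch of that matching step is shown below, assuming each algorithm's output for a user is simply a list of recommended event ids; the function and variable names are illustrative, not the authors' code.

```python
import random

ALGORITHMS = ["content_based", "user_based_cf", "svd", "hybrid", "random"]

def assign_algorithm(rec_lists, min_recs=8):
    """Randomly match one user to an algorithm that produced a full list.

    rec_lists: {algorithm_name: [recommended event ids]} for a single user.
    The random recommender can always fill a list, so `capable` is never empty.
    """
    capable = [name for name in ALGORITHMS
               if len(rec_lists.get(name, [])) >= min_recs]
    return random.choice(capable)
```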
  6. When the recommendations were in place, the participants of the experiment were asked to fill out a questionnaire about the quality of the recommendations they received. To come up with good questions about the quality of the recommendation system, we looked into the ResQue framework that Pearl Pu presented last year (2010) in this very workshop. We selected a total of 14 questions regarding various topics that we found to be of interest. Specifically for this research there were 8 relevant ones. You can find them in the paper. Here we show the qualitative attributes that the questions address. Our main goal was to find out which recommender would have the highest user satisfaction and, if possible, why. This is something that we cannot learn from offline analysis using metrics like precision and recall.
  7. First, about the number of users. We sent out invitations for this experiment to almost 60,000 registered users of the website. We had an initial response of about 1%, so about 600 users indicated that they wanted to take part in the experiment. We logged all of these users for 41 days and generated recommendations for them. Unfortunately, of these 612 users, only 232 actually filled out the questionnaire. To prevent any sort of bias we only wanted to consider users for whom every algorithm was able to generate recommendations. In that case every user had the same chance of being matched with any of the 5 available algorithms. The downside of this was that we had to eliminate another 39 users. The remaining 193 users were randomly matched with a recommendation algorithm, and each of them filled out the questionnaire.
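The bias-avoidance filter described here can be written compactly over the recommendation lists; again this is a sketch under assumed data structures, not the actual analysis code.

```python
def retain_unbiased_users(respondents, recs_per_user, algorithms, min_recs=8):
    """Keep only questionnaire respondents for whom *every* algorithm could
    generate a full recommendation list, so that each retained user had an
    equal chance of being matched with any of the algorithms.

    recs_per_user: {user_id: {algorithm_name: [recommended event ids]}}
    """
    return [user for user in respondents
            if all(len(recs_per_user[user].get(algo, [])) >= min_recs
                   for algo in algorithms)]
```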
  8. A very easy and visual way to look at these responses is to show the average score for every question and every algorithm. On the x-axis we find the questions, on the y-axis the average score that was given. Note that the questions about Diversity and Transparency were on a reversed scale, so lower means more. By just looking at these average scores we can already distinguish a clear overall winner. The green color reaches the highest values for almost every question except for familiarity and diversity. The winner for diversity, so to say, clearly was the random algorithm, which is of course hardly a shocking fact. If we were to appoint a runner-up, the second best algorithm, it would probably be the purple/pink color, which stands for user-based collaborative filtering. This makes sense of course, since the hybrid is a merge of UBCF and CB. Another observation we can make is that SVD is actually a clear loser according to this chart. If you pay attention to the turquoise color you can see that it almost always ranks lowest together with the random recommender. More surprisingly, in the case of the accuracy question it ranks even lower than the random recommender. So people were under the impression that the random recommender was better able to provide accurate suggestions than the SVD algorithm was. I have to add, however, that this result was, as you can see from the confidence intervals, not found to be statistically significant. Still funny though. We looked closer into this observation and found out that the answers to the questions about the SVD algorithm were widely separated between very good and very bad, so on average this shows up as a mediocre result. The low scores that SVD obtained may be coming from users with limited profiles, but we have not yet fully explored this idea.
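For readers who want to reproduce this kind of chart, a sketch of the underlying aggregation is given below. It assumes the questionnaire answers live in a long-format pandas DataFrame with columns algorithm, question and score, and it uses a t-distribution confidence interval; both the column names and the choice of interval are assumptions.

```python
import pandas as pd
from scipy import stats

def mean_scores_with_ci(answers, confidence=0.95):
    """Average score per (algorithm, question) plus a confidence interval.

    answers: long-format DataFrame with columns 'algorithm', 'question', 'score'.
    Returns a DataFrame indexed by (algorithm, question) with columns
    'mean', 'ci_low' and 'ci_high'.
    """
    def summarize(scores):
        mean = scores.mean()
        half_width = stats.sem(scores) * stats.t.ppf((1 + confidence) / 2,
                                                     len(scores) - 1)
        return pd.Series({"mean": mean,
                          "ci_low": mean - half_width,
                          "ci_high": mean + half_width})
    return (answers.groupby(["algorithm", "question"])["score"]
                   .apply(summarize)
                   .unstack())
```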
  9. The last thing I want to show you with this graph is that there is some hint of correlation between the questions. If you look at the Accuracy question and the Satisfaction question, for example, these are probably highly correlated because their score patterns look almost identical.
  10. Now we thought it would be interesting to find out which questions were correlated and which weren't, so we computed the complete correlation matrix. This is what it looks like. As a correlation metric we used the two-tailed Pearson correlation, so values lie between -1 and 1. A value of zero indicates no correlation; 1 and -1 indicate positive and negative correlations respectively. Our suspicion that accuracy and satisfaction were highly correlated now turns out to be true. If we look more closely at the correlation values we can in fact see that question Q8 (satisfaction) correlates with most of the other questions except for Q5, which dealt with diversity. A similar trend can be noticed for Q10 and Q13. When we look at the correlation values of the diversity question Q5, we see a different situation. It turns out that the answers to the diversity question were completely unrelated to any other question in the experiment. We found this to be a rather surprising observation. We must be careful not to confuse correlation with causation, but still, the data suggest a strong relation between user satisfaction, accuracy, and trust. To get one step closer to causality we performed a simple regression analysis.
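A sketch of how such a matrix can be computed with SciPy's two-tailed Pearson correlation is shown below; it assumes the per-user answers are collected in a DataFrame with one column per attribute, which is an assumption about the data layout rather than the authors' actual code.

```python
import pandas as pd
from scipy.stats import pearsonr

def correlation_matrix(scores):
    """Pairwise two-tailed Pearson correlations between questionnaire attributes.

    scores: DataFrame with one row per user and one column per attribute
    (Accuracy, ..., Usefulness). Returns (r, p): correlation coefficients in
    [-1, 1] and the corresponding two-tailed p-values.
    """
    cols = scores.columns
    r = pd.DataFrame(index=cols, columns=cols, dtype=float)
    p = pd.DataFrame(index=cols, columns=cols, dtype=float)
    for a in cols:
        for b in cols:
            r.loc[a, b], p.loc[a, b] = pearsonr(scores[a], scores[b])
    return r, p
```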
  11. In such an analysis we try to predict one attribute by using all the other ones as input to the regression function. You can find more details in the paper about the specific method we used; for now I will just show you the results. On the left side of the arrow is the attribute we are trying to predict, on the right side the attributes that the regression method came up with. Each of these attributes has its own coefficient of course, but I left them out to simplify. The R squared between brackets is called the coefficient of determination and it indicates how well the proposed model fits, with values between 0 and 1, where 1 is a perfect fit. Let's zoom in on the most relevant line, which is the line for question Q8, where user satisfaction is predicted. This shows that we can in fact predict user satisfaction based on accuracy and transparency. We consider the Q10 and Q13 questions about trust and usefulness more as a result of Q8 rather than influencers. This would in fact explain the remarkably low scores of the SVD algorithm in our experiment. Because the inner workings of the SVD algorithm are the most obscure (the most black box), this algorithm will have a low transparency and therefore a low user satisfaction. If we look at how diversity (Q5) is predicted, we notice the same trend as we did on the last slide. It seems that diversity cannot be predicted from any other attribute.
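The regression step could look roughly as follows, using an ordinary least squares fit from statsmodels as a stand-in for whatever method the paper actually used (see the paper for the details); the column names are assumptions.

```python
import statsmodels.api as sm

def predict_attribute(scores, target="Satisfaction"):
    """Predict one questionnaire attribute from all the others with OLS.

    scores: DataFrame with one row per user and one column per attribute.
    Returns the fitted coefficients and the coefficient of determination R^2,
    which lies between 0 and 1 (1 being a perfect fit).
    """
    predictors = sm.add_constant(scores.drop(columns=[target]))
    model = sm.OLS(scores[target], predictors).fit()
    return model.params, model.rsquared
```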
  12. Time to conclude what we have learned. We started off describing our online user-centric evaluation experiment in which we implemented 5 popular recommendation algorithms and had users evaluate them by means of a questionnaire based on the framework of Pearl Pu. The hybrid recommender, which combined content-based and collaborative filtering recommendations, turned out to be the overall best algorithm, while SVD surprisingly came in last, sometimes even after the random recommender. We came up with two possible explanations for this observation. One is that opinions about the algorithm were divided between very good and very bad, leaving only an average end result; the very bad opinions may then have been caused by insufficient user profiles. For the second reason I need my next bullet point, which is that a combination of accuracy and transparency seemed to be the defining influencers of user satisfaction in the end. If we keep in mind that SVD is a very black-box type of algorithm, then it is clear that its transparency will be very low, and therefore possibly also the user satisfaction linked to it. And finally, to conclude our conclusions, it seems that the users in the experiment did not value the diversity of a recommendation list as much as the other aspects of the recommendation system. We are planning to explore this further by means of some focus groups, allowing us to focus more on the reasoning behind some of the results we have presented today.
  13. And with that I conclude my presentation. I hope you found it interesting, and if you have any questions, feel free to contact me.