Recommender system challenges such as the Netflix Prize, the KDD Cup, and others have contributed vastly to the development and adoption of recommender systems. Each year a number of challenges or contests are organized covering different aspects of recommendation. In this tutorial and panel, we present some of the factors involved in successfully organizing a challenge, whether for purely research-related reasons, for industrial purposes, or to widen the scope of recommender systems applications.
Past, present, and future of Recommender Systems: an industry perspective - Xavier Amatriain
Keynote for the ACM Intelligent User Interfaces conference in 2016 in Sonoma, CA. I start with the past by talking about the recommender problem and the Netflix Prize. Then I go into the present and the future by talking about approaches that go beyond rating prediction and ranking, finishing with some of the most important lessons learned over the years. Throughout the talk I put special emphasis on the relation between algorithms and the user interface.
Product Recommendations Enhanced with Reviews - maranlar
Tutorial presented by Muthusamy Chelliah (Flipkart, India) and Sudeshna Sarkar (IIT Kharagpur, India) at ACM RecSys 2017 https://recsys.acm.org/recsys17/tutorials/#content-tab-1-3-tab
E-commerce websites commonly deploy recommender systems that make use of user activity (e.g., ratings, views, and purchases) or content (product descriptions). These recommender systems can benefit enormously by also exploiting the information contained in customer reviews. Reviews capture the experience of multiple customers with diverse preferences, often on the fine-grained level of specific features of products. Reviews can also identify consumers’ preferences for product features and provide helpful explanations. The usefulness of reviews is evidenced by the prevalence of their use by customers to support shopping decisions online. With the appropriate techniques, recommender systems can benefit directly from user reviews.
This tutorial will present a range of techniques that allow recommender systems in e-commerce websites to take full advantage of reviews. Topics covered include text mining methods for feature-specific sentiment analysis of products, topic models and distributed representations that bridge the vocabulary gap between user reviews and product descriptions, and recommender algorithms that use review information to address the cold-start problem.
The tutorial sessions will be interspersed with examples from an online marketplace (i.e., Flipkart) and experience with using data mining and Natural Language Processing techniques (e.g., matrix factorization, LDA, word embeddings) from Web-scale systems.
Overview of recommender (recommendation) systems: RFM concepts in brief; item-based and user-based collaborative filtering; content-based recommendation; product association recommender systems; stereotype recommendation, with its advantages and limitations; customer lifetime; and the recommender system analysis and solving cycle.
This is part 1 of the tutorial Xavier and Deepak gave at RecSys 2016. You can find the second part at http://www.slideshare.net/xamat/recsys-2016-tutorial-lessons-learned-from-building-reallife-recommender-systems
Value stream mapping for complex processes (innovation, Lean, service design) - Teemu Toivonen
A value stream mapping method for complex processes, which integrates theory of inventive problem solving (TRIZ) and service design concepts into traditional Lean thinking. This method is especially suited to services and digital processes.
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame... - Alan Said
Video available here http://www.youtube.com/watch?v=1jHxGCl8RXc
Recommender systems research is often based on comparisons of predictive accuracy: the better the evaluation scores, the better the recommender.
However, it is difficult to compare results from different recommender systems due to the many options in design and implementation of an evaluation strategy.
Additionally, algorithmic implementations can diverge from the standard formulation due to manual tuning and modifications that work better in some situations.
In this work we compare common recommendation algorithms as implemented in three popular recommendation frameworks.
To provide a fair comparison, we have complete control of the evaluation dimensions being benchmarked: dataset, data splitting, evaluation strategies, and metrics.
We also include results using the internal evaluation mechanisms of these frameworks.
Our analysis points to large differences in recommendation accuracy across frameworks and strategies, i.e. the same baselines may perform orders of magnitude better or worse across frameworks.
Our results show the necessity of clear guidelines when reporting evaluation of recommender systems to ensure reproducibility and comparison of results.
Building Search and Personalization at Nordstrom Rack | Hautelook - Lucidworks
Ecommerce customers are accustomed to an accurate, smooth, and timely search experience. Behind the curtains is a chaotic battle to operationalize a hefty strategy, optimize a fickle infrastructure, and rally troops around a single vision. Hear how Nordstrom Rack | Hautelook braved this with help from Lucidworks Fusion.
Advanced Project Data Analytics for Improved Project Delivery - Mark Constable
Data analytics is already beginning to impact how projects are delivered. We can now automate minute-taking and action capture, use Flow to chase progress, and let Power BI reduce the burden of reporting.
But we are just scratching the surface. It won't be long before we can leverage the rich dataset of experience to predict which risks are likely to occur, understand which WBS elements will be susceptible to variance, deduce what the optimum resource profile looks like, and define a schedule by leveraging data from the projects that have gone before.
The role of a project professional is about to change dramatically. In this webinar we will explore the challenges and opportunities, and how we should respond. It’s a call-to-action for the community to mobilise, help to reshape project delivery and understand the implications for you and your organisation.
Presenter Martin Paver is a Chartered Project Professional, APM Fellow and Chartered Engineer. In December 2017 he established the London Project Data Analytics meetup, which has quickly spread across the UK and expanded to 3000+ members. Martin has major project experience, including leading a $billion project with a team of 220 and a multi-billion PMO with a team of 50. He has a detailed grasp of project management and combines this with a broad understanding of recent developments in the field of data science. He is on a mission to ensure that the project management profession readies itself for a transformed future.
Learning outcomes:
- Understand the implications of advanced data analytics on project delivery
- Understand the scope of which functions it is likely to impact
- Help you to develop a strategy for how you engage with it
- Understand how to leverage the benefits and opportunities that will emerge from it
Presenter:
Martin Paver, CEO & Founder, Projecting Success Ltd
Knowledge Management in Healthcare Analytics - Gregory Nelson
The promise of actionable analytics in healthcare poses an inherent challenge as we seek to accelerate the time it takes to go from question to insight to action. The velocity of change, the demand for bigger data, the allure of advanced algorithms, the need for deeper insights, and the cost of inaction make knowledge capture and reuse an all too elusive goal.
In an evolving environment, healthcare organizations need to find ways to make greater use of prior investments in analytics products by reusing the commonalities of proven designs, metadata, business rules, captured learnings, and collaborative insights and applying them to future analytics products. By doing so in a strategic manner, they will be able to create rapid and efficient analytics processes and better manage time to value and reuse.
In this presentation, authors from two very different health systems with two very different patient populations will share their perspectives of the value of knowledge management and discuss the role of analytics in driving towards a learning health system. The authors will highlight opportunities and challenges using examples across clinical, financial, and operational domains.
The future for performance management, quality and true continuous improvement for local council planning services. Uses much of the data that councils already send to government, supplements it with some new approaches to customer and quality feedback, and brings it all together in one tidy, holistic report.
Modern Perspectives on Recommender Systems and their Applications in Mendeley - Maya Hristakeva
Presentation given for one of Pearson's Data Research teams. It motivates the use of recommender systems, describes common approaches to building and evaluating them and gives examples of how they are used in Mendeley. Joint work with Kris Jack, Chief Data Scientist at Mendeley.
The goal of this presentation is to give attendees a deeper understanding of usability testing so they can leverage it in their own work. The material will shed light on what is important to the research buyer and will help the research provider to better understand how to plan, moderate, and report on a usability study. It will also provide information on where they can go to learn more about this very practical qualitative method.
Kay will cover what a usability test is and when to use it, the key planning steps, the language around it, and the unique insights this method produces. She will also discuss the various approaches a market researcher can take when running a usability study at different points in a product’s development (e.g., concept, early prototype, released product).
Similar to Best Practices in Recommender System Challenges (20)
Replication of Recommender Systems Research - Alan Said
Course held at the 2017 ACM RecSys Summer School at the Free University of Bozen-Bolzano by Alejandro Bellogin (@abellogin) and Alan Said (@alansaid).
http://recommenders.net/rsss2017/
The Magic Barrier of Recommender Systems - No Magic, Just Ratings - Alan Said
Recommender Systems need to deal with different types of users who represent their preferences in various ways. This difference in user behaviour has a deep impact on the final performance of the recommender system, where some users may receive either better or worse recommendations depending, mostly, on the quantity and the quality of the information the system knows about the user. Specifically, the inconsistencies of the user impose a lower bound on the error the system may achieve when predicting ratings for that particular user.
In this work, we analyse how the consistency of user ratings (coherence) may predict the performance of recommendation methods. More specifically, our results show that our definition of coherence is correlated with the so-called magic barrier of recommender systems, and thus, it could be used to discriminate between easy users (those with a low magic barrier) and difficult ones (those with a high magic barrier).
We report experiments where the rating prediction error for the more coherent users is lower than that of the less coherent ones.
We further validate these results by using a public dataset, where the magic barrier is not available, in which we obtain similar performance improvements.
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems - Alan Said
The evaluation of recommender systems is crucial for their development. In today's recommendation landscape there are many standardized recommendation algorithms and approaches; however, there exists no standardized method for experimental setup of evaluation -- not even for widely used measures such as precision and root-mean-squared error. This creates a setting where comparison of recommendation results using the same datasets becomes problematic. In this paper, we propose an evaluation protocol specifically developed with the recommendation use-case in mind, i.e. the recommendation of one or several items to an end user. The protocol attempts to closely mimic a scenario of a deployed (production) recommendation system, taking specific user aspects into consideration and allowing a comparison of small and large scale recommendation systems. The protocol is evaluated on common recommendation datasets and compared to traditional recommendation settings found in research literature. Our results show that the proposed model can better capture the quality of a recommender system than traditional evaluation does, and is not affected by characteristics of the data (e.g., size, sparsity, etc.).
Information Retrieval and User-centric Recommender System Evaluation - Alan Said
Poster describing the ERCIM-funded project on IR- and user-centric recommender system evaluation currently being undertaken in the Information Access group at CWI.
Presented at UMAP 2013.
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco... - Alan Said
Collaborative filtering recommender systems often use nearest neighbor methods to identify candidate items. In this paper we present an inverted neighborhood model, k-Furthest Neighbors, to identify less ordinary neighborhoods for the purpose of creating more diverse recommendations. The approach is evaluated two-fold, once in a traditional information retrieval evaluation setting where the model is trained and validated on a split train/test set, and once through an online user study (N=132) to identify users' perceived quality of the recommender. A standard k-nearest neighbor recommender is used as a baseline in both evaluation settings. Our evaluation shows that even though the proposed furthest neighbor model is outperformed in the traditional evaluation setting, the perceived usefulness of the algorithm shows no significant difference in the results of the user study.
A 3D Approach to Recommender System Evaluation - Alan Said
In this work we describe an approach to multi-objective recommender system evaluation based on a previously introduced 3D benchmarking model. The benchmarking model takes user-centric, business-centric and technical constraints into consideration in order to provide a means of comparison of recommender algorithms in similar scenarios. We present a comparison of three recommendation algorithms deployed in a user study using this 3D model and compare to standard evaluation methods. The proposed approach simplifies benchmarking of recommender systems and allows for simple multi-objective comparisons.
Estimating the Magic Barrier of Recommender Systems: A User Study - Alan Said
Recommender systems are commonly evaluated by trying to predict known, withheld, ratings for a set of users. Measures such as the Root-Mean-Square Error are used to estimate the quality of the recommender algorithms. This process, however, does not acknowledge the inherent rating inconsistencies of users. In this paper we present the first results from a noise measurement user study for estimating the magic barrier of recommender systems, conducted on a commercial movie recommendation community. The magic barrier is the expected squared error of the optimal recommendation algorithm, or, the lowest error we can expect from any recommendation algorithm. Our results show that the barrier can be estimated by collecting the opinions of users on already rated items.
Users and Noise: The Magic Barrier of Recommender Systems - Alan Said
Recommender systems are crucial components of most commercial websites to keep users satisfied and to increase revenue. Thus, a lot of effort is made to improve recommendation accuracy. But when is the best possible performance of the recommender reached? The magic barrier refers to some unknown level of prediction accuracy a recommender system can attain. The magic barrier reveals whether there is still room for improving prediction accuracy or indicates that further improvement is meaningless. In this work, we present a mathematical characterization of the magic barrier based on the assumption that user ratings are afflicted with inconsistencies - noise. In a case study with a commercial movie recommender, we investigate the inconsistencies of the user ratings and estimate the magic barrier in order to assess the actual quality of the recommender system.
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality - Alan Said
Short paper presentation at the workshop on Intelligent Techniques for Web Personalization (ITWP 2011) at the International Joint Conference on Artificial Intelligence (IJCAI-11).
Best Practices in Recommender System Challenges
1. Recommender Systems Challenges
Best Practices
Tutorial & Panel
ACM RecSys 2012, Dublin
September 10, 2012
2. About us
• Alan Said - PhD Student @ TU-Berlin
o Topics: RecSys Evaluation
o @alansaid
o URL: www.alansaid.com
• Domonkos Tikk - CEO @ Gravity R&D
o Topics: Machine Learning methods for RecSys
o @domonkostikk
o http://www.tmit.bme.hu/tikk.domonkos
• Andreas Hotho - Prof. @ Uni. Würzburg
o Topics: Data Mining, Information Retrieval, Web Science
o http://www.is.informatik.uni-wuerzburg.de/staff/hotho
3. General Motivation
"RecSys is nobody's home conference. We
come from CHI, IUI, SIGIR, etc."
Joe Konstan - RecSys 2010
RecSys is our home conference - we
should evaluate accordingly!
4. Outline
• Tutorial
o Introduction to concepts in challenges
o Execution of a challenge
o Conclusion
• Panel
Experiences of participating in and organizing challenges
Yehuda Koren
Darren Vengroff
Torben Brodt
What is the motivation for RecSys Challenges?
Part 1
7. Motivation of stakeholders
(diagram) find relevant content; easy navigation; serendipity, discovery; user service; increase revenue; target the user with the right content; engage users; facilitate goals of stakeholders; get recognized
8. Evaluation in terms of the business
(diagram) business reporting; online evaluation (A/B test); casting into a research problem
9. Context of the contest
• Selection of metrics
• Domain dependent
• Offline vs. online evaluation
• IR-centric evaluation (a MAP@k sketch follows this slide)
o RMSE
o MAP
o F1
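Since MAP is named above and reappears as MAP@500 in the competition highlights below, a minimal Python sketch of how MAP@k is typically computed may help. This is an illustrative sketch under our own assumptions (toy data, hypothetical function names), not the evaluation code of any particular challenge.

# Minimal sketch of MAP@k for top-n recommendation evaluation.
# Illustrative only: names and toy data are assumptions, not taken
# from any specific challenge or framework.

def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user: mean of precision@i over the ranks i that hit."""
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(recs_by_user, relevant_by_user, k=500):
    """Mean AP@k over all users that have at least one relevant item."""
    users = [u for u in recs_by_user if relevant_by_user.get(u)]
    return sum(average_precision_at_k(recs_by_user[u],
                                      relevant_by_user[u], k)
               for u in users) / len(users)

# Toy example: two users, ranked recommendation lists vs. held-out items.
recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
held_out = {"u1": {"a", "c"}, "u2": {"z"}}
print(map_at_k(recs, held_out, k=3))  # ~0.583 on this toy data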
11. RecSys Competition Highlights
(comparison of challenge properties across competitions) large scale; organization; RMSE; 3-stage setup; prize; selection by review; runtime limits; real traffic; revenue increase; offline; MAP@500; metadata available; larger in dimensions; no ratings
12. Recurring Competitions
• ACM KDD Cup (2007, 2011, 2012)
• ECML/PKDD Discovery Challenge (2008 onwards)
o 2008 and 2009: tag recommendation in social bookmarking (incl. online evaluation task)
o 2011: video lectures
• CAMRa (2010, 2011, 2012)
14. Research & Industry
Important for both
• Industry has the data and research needs data
• Industry needs better approaches, but this costs
• Research has ideas but has no systems and/or data to do the evaluation
Don't exploit participants
Don't be too greedy
16. Standard Challenge Setting
• organizer defines the recommender setting, e.g. tag recommendation in BibSonomy
• provide data
o with features or
o raw data
o construct your own data
• fix the way to do the evaluation (a scoring sketch follows this slide)
• define the goal, e.g. reach a certain improvement (F1)
• motivate people to participate: e.g. promise a lot of money ;-)
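As a rough illustration of the "fix the way to do the evaluation" and "define the goal (F1)" steps, here is a minimal sketch of organizer-side scoring of a submission against a withheld test set. The file format, the paths, and the 0.30 F1 target are hypothetical assumptions for illustration; this is not the actual harness of the BibSonomy challenge or any other contest.

# Minimal organizer-side scoring sketch: compare each submission's
# top-n predictions against withheld ground truth and check an F1 goal.
# Formats, paths, and the 0.30 target are illustrative assumptions.
import csv

def load_topn(path):
    """CSV rows of the form: user_id, item1;item2;... -> dict of ranked lists."""
    with open(path, newline="") as f:
        return {row[0]: row[1].split(";") for row in csv.reader(f)}

def f1_at_n(predictions, truth, n=5):
    """Macro-averaged precision/recall @ n over users, combined into F1."""
    precisions, recalls = [], []
    for user, relevant in truth.items():
        recommended = set(predictions.get(user, [])[:n])
        hits = len(recommended & set(relevant))
        precisions.append(hits / n)
        recalls.append(hits / len(relevant))
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    return 2 * p * r / (p + r) if p + r else 0.0

truth = load_topn("withheld_test_set.csv")       # hypothetical path
submission = load_topn("team42_submission.csv")  # hypothetical path
score = f1_at_n(submission, truth, n=5)
print("F1@5 = %.4f, goal reached: %s" % (score, score >= 0.30))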
17. Typical contest settings
• offline
o everyone gets access to the dataset
o in principle it is a prediction task; the user can't be influenced
o privacy of the users within the data is a big issue
o results from offline experimentation have limited predictive power for online user behavior
• online
o after a first learning phase the recommender is plugged into a real system
o users can be influenced, but only by the selected system
o comparison of different systems is not completely fair
• further ways
o user study
18. Example online setting (BibSonomy)
Balby Marinho, L.; Hotho, A.; Jäschke, R.; Nanopoulos, A.; Rendle, S.; Schmidt-Thieme, L.; Stumme, G.; Symeonidis, P.: Recommender Systems for Social Tagging Systems. Springer, 2012 (SpringerBriefs in Electrical and Computer Engineering). ISBN 978-1-4614-1893-1
19. Which evaluation measures?
• Root Mean Squared Error (RMSE)
• Mean Absolute Error (MAE)
• Typical IR measures
o precision @ n-items
o recall @ n-items
o False Positive Rate
o F1 @ n-items
o Area Under the ROC Curve (AUC)
• non-quality measures
o server answer time
o understandability of the results
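A rough illustration of several of the measures just listed, as a plain-Python sketch with toy data; the helper names and numbers are our own assumptions, not taken from any framework.

# Rating-prediction and IR measures from the list above, on toy data.
# Pure-Python sketch; inputs are parallel lists of true/predicted ratings.
import math

def rmse(truth, predictions):
    """Root Mean Squared Error over paired ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, predictions))
                     / len(truth))

def mae(truth, predictions):
    """Mean Absolute Error over paired ratings."""
    return sum(abs(t - p) for t, p in zip(truth, predictions)) / len(truth)

def precision_recall_at_n(recommended, relevant, n):
    """IR measures @ n-items for a single user's ranked list."""
    hits = len(set(recommended[:n]) & set(relevant))
    return hits / n, hits / len(relevant)

y_true = [4.0, 3.0, 5.0, 2.0]
y_pred = [3.5, 3.0, 4.0, 2.5]
print("RMSE:", rmse(y_true, y_pred))  # ~0.61
print("MAE: ", mae(y_true, y_pred))   # 0.5
print(precision_recall_at_n(["a", "b", "c"], {"a", "c", "d"}, n=3))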
20. Discussion of measures: RMSE vs. Precision
• RMSE is not necessarily the king of metrics, as RMSE is easy to optimize on
• What about top-n?
• But RMSE is not influenced by popularity the way top-n is
• What about user-centric stuff?
• Ranking-based measure in KDD Cup 2011, Track 2
21. Results influenced by ...
• target of the recommendation (users, resources, etc.)
• evaluation methodology (leave-one-out, time-based split, random sample, cross validation) - see the splitting sketch after this slide
• evaluation measure
• design of the application (online setting)
• the selected part of the data and its preprocessing (e.g. p-core vs. long tail)
• scalability vs. quality of the model
• features and content accessible and usable for the recommendation
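To make the evaluation-methodology point concrete, here is a minimal sketch of three of the splitting strategies named above applied to the same interaction log; each produces a different test set and can therefore yield different scores for the same algorithm. The field layout and the toy data are illustrative assumptions.

# Sketch: three splitting strategies from the slide above, applied to
# one interaction log. Field layout and data are illustrative.
import random

log = [  # (user, item, timestamp)
    ("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 3),
    ("u2", "a", 1), ("u2", "d", 2), ("u2", "e", 3),
]

def random_split(events, test_ratio=0.2, seed=42):
    """Random sample: shuffle, then hold out the last test_ratio share."""
    events = events[:]
    random.Random(seed).shuffle(events)
    cut = int(len(events) * (1 - test_ratio))
    return events[:cut], events[cut:]

def time_based_split(events, cutoff):
    """Train on everything up to the cutoff, test on what comes after."""
    train = [e for e in events if e[2] <= cutoff]
    test = [e for e in events if e[2] > cutoff]
    return train, test

def leave_one_out(events):
    """Withhold each user's most recent event as the test set."""
    last = {}
    for e in sorted(events, key=lambda e: e[2]):
        last[e[0]] = e
    test = set(last.values())
    return [e for e in events if e not in test], list(test)

for name, (train, test) in [("random", random_split(log)),
                            ("time-based", time_based_split(log, cutoff=2)),
                            ("leave-one-out", leave_one_out(log))]:
    print(name, "-> test:", test)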
22. Don't forget...
• the effort to organize a challenge is very big
• preparing data takes time
• answering questions takes even more time
• participants are creative, which requires reaction
• it takes time to compute the evaluation and check the results
• prepare proceedings with the outcome
• ...
24. Challenges are good since they...
• ... are focused on solving a single problem
• ... have many participants
• ... create common evaluation criteria
• ... have comparable results
• ... bring real-world problems to research
• ... make it easy to crown a winner
• ... are cheap (even with a $1M prize)
26. Is that the complete truth?
• Why?
Because using standard information retrieval metrics we cannot evaluate recommender system concepts like:
• user interaction
• perception
• satisfaction
• usefulness
• any metric not based on accuracy/rating prediction and negative predictions
• scalability
• engineering
27. We can't catch everything offline
Scalability
Presentation
Interaction
28. The difference between IR and RS
Information retrieval systems answer a need: a query
Recommender systems identify the user's needs
29. Should we organize more challenges?
• Yes - but before we do that, think of:
o What is the utility of Yet Another Dataset - aren't there enough already?
o How do we create a real-world-like challenge?
o How do we get real user feedback?
30. Take home message
• Real needs of users and content providers are better reflected in online evaluation
• Consider technical limitations as well
• Challenges advance the field a lot
o Matrix factorization & ensemble methods in the Netflix Prize
o Evaluation measure and objective in the KDD Cup 2011
31. Related events at RecSys
• Workshops
o Recommender Utility Evaluation
o RecSys Data Challenge
• Paper Sessions
o Multi-Objective Recommendation and Human Factors - Mon. 14:30
o Implicit Feedback and User Preference - Tue. 11:00
o Top-N Recommendation - Wed. 14:30
• More challenges:
o www.recsyswiki.com/wiki/Category:Competition
33. Panel
• Torben Brodt
o Plista
o Organizing Plista Contest
• Yehuda Koren
o Google
o Member of winning team of the Netflix Prize
• Darren Vengroff
o RichRelevance
o Organizer of RecLab Prize
34. Questions
• How does recommendation influence the user and the system?
• How can we quantify the effects of the UI?
• How should we translate what we've presented into an actual challenge?
• Should we focus on the long tail or the short head?
• Evaluation measures, click rate, wtf@k
• How to evaluate conversion rate?