Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks
1. Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks
by Evgeny Frolov¹ and Ivan Oseledets¹,²
¹Skolkovo Institute of Science and Technology
²Institute of Numerical Mathematics of the Russian Academy of Sciences
2. “Almost” cold-start problem
Recommendations are insensitive to negative “signal”.
Is this a good list of recommendations? (figure: new user)
Shift of recommendations paradigm: users may share not only what they love, but also what they hate.
3. Why does the standard approach fail?
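Pure SVD* of the matrix of ratings $A$ (users × movies):
$$A \approx U \Sigma V^T$$
New user row: $\mathbf{p}^T$ in $A$, $\mathbf{q}^T$ in $U$.
Folding-in:
$$\mathbf{r}^T = \mathbf{q}^T \Sigma V^T \approx \mathbf{p}^T V \Sigma^{-1} \Sigma V^T = \mathbf{p}^T V V^T$$
$$\mathbf{r} \approx V V^T \mathbf{p}$$
$\mathbf{r}$: vector of predicted item scores (approximate update to SVD generated by $\mathbf{p}$)
Top-$n$ recommendations task:
$$\mathrm{toprec}(\mathbf{p}, n) := \arg\max^{n} \mathbf{r}$$
$$\arg\max V V^T (0, \dots, 0, \mathbf{2}, 0, \dots, 0)^T \equiv \arg\max V V^T (0, \dots, 0, \mathbf{5}, 0, \dots, 0)^T$$
*P. Cremonesi, Y. Koren, R. Turrin, "Performance of Recommender Algorithms on Top-N Recommendation Tasks", 2010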
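The equivalence above can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up random rating matrix; the data, the rank `k = 4`, and the item index are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix: 30 users x 12 movies, ratings 1..5, 0 = unknown (made-up data).
A = rng.integers(1, 6, size=(30, 12)) * (rng.random((30, 12)) < 0.4)

# Truncated SVD: keep the top-k right singular vectors as item factors V.
k = 4
_, _, Vt = np.linalg.svd(A.astype(float), full_matrices=False)
V = Vt[:k].T  # shape (n_items, k)

def toprec(p, n):
    """Fold in a new user's preference vector p and return top-n item indices."""
    r = V @ (V.T @ p)  # r ≈ V V^T p
    return np.argsort(-r, kind="stable")[:n]

# A new user rates a single movie (index 3): once with 2 stars, once with 5.
p_low = np.zeros(A.shape[1])
p_low[3] = 2
p_high = np.zeros(A.shape[1])
p_high[3] = 5

top_low, top_high = toprec(p_low, 5), toprec(p_high, 5)
print(top_low, top_high)
```

Both calls should print the same item list: scaling a single-entry vector only rescales the scores, so the ranking does not depend on whether the rating was low or high.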
4. How to solve this problem?
Typical approach: rating elicitation
hard to pick the most representative items
increases barrier to entry (not effortless for the user)
non-personalized user experience
Requirements:
meaningful recommendations even from a single feedback
respect feedback polarity
no heuristics, no side information
generalize well to other scenarios (not only cold-start)
5. Restating the problem
Standard model
$$User \times Item \to Rating$$
Technique: Matrix factorization
ratings are cardinal values
CoFFee (proposed approach): Collaborative Full Feedback model
$$User \times Item \times Rating \to Relevance\ Score$$
Technique: Tensor Factorization based on Tucker Decomposition*
$$\mathcal{A} \approx \mathcal{G} \times_1 U \times_2 V \times_3 W$$
[Figure: users × items matrix of ratings (standard model) next to the tensor used by CoFFee]
* T. G. Kolda and B. W. Bader, “Tensor Decompositions and Applications”, 2009
6. Recommendations in real-time
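$$\mathcal{A} \approx \mathcal{G} \times_1 U \times_2 V \times_3 W$$
[Figure: Tucker decomposition of the tensor $\mathcal{A}$ (users mode shown) into the core $\mathcal{G}$ and factors $U$, $V$, $W$; approximate row update $\mathbf{q}^T$]
Higher order folding-in: “Shades of ratings”
$$R \approx V V^T P W W^T$$
$P$: matrix of new user preferences
$R$: items relevance matrix
Compare to SVD: $\mathbf{r} \approx V V^T \mathbf{p}$
$W$ embeds ratings onto latent feature space!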
7. “Shades” of ratings
Model is equally sensitive to any kind of feedback.
Granular view of user preferences, concerning all possible ratings.
$$R \approx V V^T P W W^T$$
[Figure: relevance scores over movies × ratings (1 to 5); labels: ranking task, rating prediction]
More dense colors correspond to higher relevance score.
9. Undesired positivity bias in evaluation
$$\mathrm{Precision} = \frac{1}{\#(\text{test users})} \sum_{\text{test users}} \frac{\#(\text{recommended items} \cap \text{holdout items})}{\#(\text{recommended items})}$$
$$DCG = \sum_i \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
$rel_i$: true rating of a recommended item at position $i$
Need to distinguish between relevant and irrelevant recommendations.
Implicit assumption: all recommendations are interesting to the user.
Low ratings do not express enjoyment!
10. Redefining metrics
[Figure: rating scale with “+” marks on positive ratings]
Relevance based
$$Precision = \frac{tp}{tp + fp}, \qquad Recall = \frac{tp}{tp + fn}$$
[Figure: holdout items and recommendations, split into tp, fp, tn, fn]
“presumption of innocence”
Ranking based
$$DCG = \sum_p \frac{2^{r_p} - 1}{\log_2(p + 1)}, \qquad p : \{r_p \ge \text{positivity threshold}\}$$
$r_p$: value of positive feedback
New metric: Discounted Cumulative Loss
$$DCL = \sum_n \frac{2^{-r_n} - 1}{-\log_2(n + 1)}, \qquad n : \{0 < r_n < \text{positivity threshold}\}$$
$r_n$: value of negative feedback
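The redefined metrics can be sketched for a single user as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the function name `evaluate`, the default threshold of 4, and the choice to skip recommended items that have no holdout rating (one reading of “presumption of innocence”) are all assumptions.

```python
import math

def evaluate(recommended, holdout, threshold=4):
    """recommended: ranked list of item ids; holdout: dict item id -> true rating."""
    tp = fp = 0
    dcg = dcl = 0.0
    for pos, item in enumerate(recommended, start=1):
        rating = holdout.get(item)
        if rating is None:
            continue  # no feedback on this item: neither rewarded nor penalized
        if rating >= threshold:
            tp += 1
            dcg += (2 ** rating - 1) / math.log2(pos + 1)
        else:
            fp += 1
            dcl += (2 ** -rating - 1) / -math.log2(pos + 1)
    # Positive holdout items that were not recommended are false negatives.
    fn = sum(1 for item, r in holdout.items()
             if r >= threshold and item not in recommended)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "dcg": dcg, "dcl": dcl}

scores = evaluate(recommended=["a", "b", "c", "d"],
                  holdout={"a": 5, "c": 2, "e": 4})
print(scores)
# {'precision': 0.5, 'recall': 0.5, 'dcg': 31.0, 'dcl': 0.375}
```

Here item "a" is a hit at position 1, item "c" is a low-rated recommendation at position 3 and contributes to DCL, and item "e" is a missed positive.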
13. Key takeaways
Standard evaluation metrics are biased towards positive effects of recommendations.
Negative feedback is a valuable source of information and shouldn’t be neglected.
It’s more natural to treat users’ feedback as an ordinal rather than a cardinal concept.
Tensor methods are effective for this kind of problem, giving you speed and quality.
The proposed CoFFee model can help to alleviate rating elicitation problems.
14. Polara framework
fast and easy-to-use
feature-rich and extensible
actively developed
MyMediaLite support (extended with folding-in)
https://github.com/evfro/polara
“RecSys for Humans”
General conclusion: many models are unable to properly handle the polarity of user feedback without additional heuristics and manual tweaking.
Key idea: represent ratings as an additional (categorical) variable and encode observations as a multidimensional array (tensor):
Each interaction can now be encoded with 3 indices instead of two, as we take rating information into account in addition to users and items.
We call this multidimensional array a tensor and use efficient tensor-based techniques to factorize it.
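A small sketch of the encoding, assuming NumPy and a handful of made-up (user, item, rating) triplets on a 1 to 5 scale. It contrasts the standard matrix, where the rating is a value, with the tensor, where the rating is an index.

```python
import numpy as np

# Hypothetical observations: (user, item, rating) triplets with ratings 1..5.
ratings = [(0, 0, 5), (0, 2, 1), (1, 1, 3), (2, 0, 2), (2, 2, 5)]

n_users = 1 + max(u for u, _, _ in ratings)
n_items = 1 + max(i for _, i, _ in ratings)
n_ratings = 5

# Standard model: a user x item matrix whose entries are rating values.
matrix = np.zeros((n_users, n_items))
# Tensor model: a user x item x rating array whose entries are 0 or 1.
tensor = np.zeros((n_users, n_items, n_ratings))

for user, item, rating in ratings:
    matrix[user, item] = rating
    tensor[user, item, rating - 1] = 1.0  # the rating is an index, not a value
```

In the tensor, a 1-star and a 5-star rating are simply different positions along the third axis, so neither is treated as a smaller or larger amount of the same signal.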
Computing a tensor-based model might be time-consuming, so we propose an efficient way to compute recommendations quickly, based on a generalization of the known folding-in technique to higher order.
Tucker Decomposition obtained with HOOI
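For reference, a compact dense-array sketch of HOOI (higher-order orthogonal iteration), assuming NumPy. The slides do not show the authors' implementation; the helper names, ranks, iteration count, and toy data below are assumptions, and a practical version would exploit the sparsity of the rating tensor.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: that axis becomes rows, the rest are flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    """Multiply tensor T by matrix M along `mode` (the n-mode product)."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def leading_vectors(M, rank):
    """Leading left singular vectors of M."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :rank]

def hooi(T, ranks, n_iter=10):
    """Tucker decomposition T ≈ G x1 U x2 V x3 W via HOOI (dense arrays only)."""
    # HOSVD initialization: leading singular vectors of each unfolding.
    factors = [leading_vectors(unfold(T, m), r) for m, r in enumerate(ranks)]
    for _ in range(n_iter):
        for m in range(T.ndim):
            # Project onto all other factors, then refresh factor m.
            Y = T
            for other in range(T.ndim):
                if other != m:
                    Y = mode_dot(Y, factors[other].T, other)
            factors[m] = leading_vectors(unfold(Y, m), ranks[m])
    core = T
    for m in range(T.ndim):
        core = mode_dot(core, factors[m].T, m)
    return core, factors

rng = np.random.default_rng(2)
T = (rng.random((20, 10, 5)) < 0.1).astype(float)  # toy binary tensor
core, (U, V, W) = hooi(T, ranks=(4, 3, 2))

# Reconstruct the low-rank approximation G x1 U x2 V x3 W.
approx = core
for m, F in enumerate((U, V, W)):
    approx = mode_dot(approx, F, m)
```

The factors come out with orthonormal columns, which is what makes the folding-in formula a plain sequence of matrix products.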
This uncovers new recommendation scenarios beyond “users who like this also like…”