Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 workshop at ACM Recsys 2012

Recommender systems add value to vast content resources by matching users with items of interest. In recent years, immense progress has been made in recommendation techniques. Evaluation, however, has not kept pace, and this threatens to impede the further development of recommender systems. In this paper we propose an approach that addresses this impasse by formulating a novel evaluation concept that adopts aspects from both recommender systems research and industry. Our model can express the quality of a recommender algorithm from three perspectives: the end consumer (user), the service provider, and the vendor (considering both business and technical aspects). We review current benchmarking activities and point out their shortcomings, which are addressed by our model. We also explain how our 3D benchmarking framework would apply to a specific use case.



  1. Recommender systems evaluation: a 3D benchmark. Alan Said (TU Berlin), Domonkos Tikk (Gravity R&D), Yue Shi (TU Delft), Martha Larson (TU Delft), Klára Stumpf (Gravity R&D), Paolo Cremonesi (Politecnico di Milano / Moviri)
  2. Motivation
     • Current recsys evaluation benchmarks are insufficient:
       – mostly focused on IR measures (RMSE, MAP@X, precision/recall)
       – do not consider the needs of all stakeholders (users, content provider, recsys vendor)
       – technological and business requirements are mostly overlooked
     • Proposal: a 3D Recommender System Benchmarking Model
  3. Stakeholders: users, content/service provider, recommender vendor
  4. The Proposed 3D model
  5. Recent benchmarks (1)
     • pros:
       – large scale
       – very well organized
     • cons:
       – qualitative assessment of recommendation simplified to RMSE
       – rating prediction (not ranking)
       – no focus on direct business and technical parameters (scalability, robustness, reactivity)
  6. Recent benchmarks (2)
     • pros:
       – constraints on training and response time
       – real traffic (only planned)
       – major driver: revenue increase
     • cons:
       – only business goals, but otherwise unclear optimization criteria
       – user needs are neglected
       – organization
  7. Recent benchmarks (3)
     • pros:
       – availability of additional metadata (compared to KDD Cup 2011)
       – not rating based (implicit feedback)
       – ranking-based evaluation metric (MAP@500)
     • cons:
       – offline evaluation
       – size does not matter anymore (lower interest)
       – no business requirements or technical constraints
  8. 3D MODEL
  9. User requirements
     • functional (quality-related)
       – relevant, interesting, novel, diverse, serendipitous, context-aware, ethical, etc.
     • non-functional (technology-related)
       – real-time
       – usability-related
  10. Business requirements
     • business model
       – for-profit: revenue stream
       – non-profit: award driven (reputation, community building)
     • KPI depends on the application area
       – revenue increase
       – CTR (click-through rate)
       – raise awareness of content or service
  11. Technical constraints
     • data driven
       – availability of user feedback (e.g. satellite TV)
     • system driven
       – hardware/software limitations (device-dependent)
     • scalability
       – typical response time
     • robustness
  12. Example: VoD recommendation scenario (TV)
     – user: easy content exploration, context-awareness (time, viewer identification)
     – business: increase VoD sales & awareness (user base)
     – technical: middleware, HW/SW of the provider, response time
  13. Conclusion
     • Recommendation tasks have many aspects that are typically overlooked.
     • Tasks define the important user, business, and technical quality measures:
       – fulfilment of all is required at a certain level
       – a trade-off is usually required
     • Proposal: our 3D evaluation concept enables more comprehensive evaluation.
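
The IR measures the slides criticize current benchmarks for relying on (RMSE on slides 2 and 5, MAP@X on slides 2 and 7) can be sketched as follows. This is a minimal illustrative sketch; the function names, the AP@K normalization convention, and the toy data are assumptions, not material from the talk.

```python
# Illustrative sketch of the IR measures named in the slides:
# RMSE (rating prediction), precision@K and MAP@K (ranking).
import math

def rmse(predicted, actual):
    """Root mean squared error over (user, item) rating pairs."""
    errors = [(predicted[k] - actual[k]) ** 2 for k in actual]
    return math.sqrt(sum(errors) / len(errors))

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return len([item for item in ranked[:k] if item in relevant]) / k

def average_precision_at_k(ranked, relevant, k):
    """AP@K: precision@i averaged over ranks i holding a relevant item.
    Normalizing by min(|relevant|, k) is one common convention."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

# MAP@K is the mean of average_precision_at_k over all users.
```

A benchmark built only on these measures illustrates the slides' point: nothing here captures business KPIs (revenue, CTR) or technical constraints (response time, scalability).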
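
Two of the functional user requirements from slide 9, diversity and novelty, are commonly quantified as intra-list diversity and mean self-information. The sketch below uses one such formulation under illustrative assumptions (Jaccard similarity over genre sets, popularity counts); the slides do not prescribe these definitions.

```python
# Sketch of two user-facing quality measures from the user-requirements
# slide: intra-list diversity and novelty. Genre sets and popularity
# counts are illustrative assumptions.
import math
from itertools import combinations

def jaccard(a, b):
    """Similarity of two items via their attribute (e.g. genre) sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def intra_list_diversity(items, attrs):
    """Mean pairwise dissimilarity across a recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - jaccard(attrs[i], attrs[j]) for i, j in pairs) / len(pairs)

def novelty(items, popularity, n_users):
    """Mean self-information of the list: rarer items score higher."""
    return sum(-math.log2(popularity[i] / n_users) for i in items) / len(items)
```

Measures like these would sit on the user axis of the proposed 3D model, alongside business KPIs and technical constraints on the other two axes.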
