Recommender systems in a nutshell
A Short Tale about the Long Tail
Overview talk on recommender systems from different perspectives. All math is out.

1. Recommender systems in a nutshell: A Short Tale about the Long Tail

2. The Plan
• Examples
• Why bother?
• Long tail
• Recommender systems
• Zvooq Recommender Platform

3. Disclaimer
• There is plenty of information on RS
• The technology is quite mature; you can get an RS out of the box in almost any programming framework
• It's easy to use one as a black box and fail, precisely because you still have to think about certain things
• This talk is about those things

4. Examples - Amazon

5. Examples - Amazon
• Consumer goods of all types
• Suggest items based on
– different activity in the past (buying, browsing)
– news
– similarity
• Support catalogue exploration
• Explain recommendations

6. Examples - Netflix

7. Examples - Netflix
• The interface is a set of rows, one row per recommender system (signal)
• Based mostly on movie ratings
• Predicts ratings for unseen films
• Uses UX specifics (multiple users of one home cinema)

8. Examples - Last.FM

9. Examples - Foursquare

10. Examples - Twitter

11. Recommender Systems: where infinite options meet limited capabilities (time, money, attention, viewport size)

12. Why bother?

13. Why bother?
• Consumer perspective
– what to buy/use?
– user satisfaction
• Producer perspective
– promote things and get consumers' attention
– increase demand, compete with other producers
• Business perspective
– optimize for core business values: costs, revenue or betterness
– business settings vary and aren't always aligned with customers or producers

14. Any "default" interface may be optimized
• Consumers optimize for satisfaction
– may be satisfied by the popular items
• Producers optimize for demand
– ideally, they would like to lock customers and the business onto themselves and cheat the game
• Businesses:
– optimize to reduce negative scale factors (e.g. number of deals) and increase positive ones
– marketplace businesses optimize for market volume and growth

15. The Long Tail

16. The Long Tail
"Forget squeezing millions from a few megahits at the top of the charts. The future of entertainment is in the millions of niche markets at the shallow end of the bitstream."
Chris Anderson, Wired, 2004
[Chart: items ranked by popularity in descending order (captioned "SELECT count(buys) FROM items ORDER BY count DESC;"), with a marker where the physical shelf restriction cuts off the curve]

17. The Long Tail
• Supply-driven factors
– distribution channels (the limited space of physical shelves)
• Demand-driven factors
– discovery channels (mass media, limited attention span, interfaces with a limited viewport)
– preferences / taste
– quality of content
• It is not possible to solve all of them

18. The Long Tail
Too good to be true, too many power laws to fight

19. The Long Tail
• Consumers almost don't suffer from the thin tail; producers suffer a lot
• In media, where the producer/consumer border is blurred, the whole ecosystem suffers
• Helping to discover new stuff and elicit preferences creates a lot of niche communities/movements

20. Recommender Systems

21. The Search Model
[Diagram: a user with a problem formulates a query; relevance matching against documents yields answer(s)]

22. Search to Discover
• One needs to formulate the question
– known unknowns only
• When the search paradigm fails:
– lack of preferences
– lack of domain knowledge
– lack of query-result relevance

23. Possible shortcuts
• Suggest a query
• Mine the social layer
• Apply non-relevance scoring
• Recommender systems are all about non-relevance scoring

24. Recommender model
• Lets you solve problems without knowing the domain, even without preferences (unknown unknowns)
[Diagram: items and users go into the recommender system, which produces a list of recommendations]

25. IR vs. RS
• IR is more about remembering what you don't know and finding an answer to a question; RS is more about discovering what you are not aware of.
• The current web is biased towards search (thanks, Google): people start by thinking up a question instead of looking around.

26. Recommender Systems and Interfaces
• RS and the interface solve the same problem: provide access to data given the restrictions of the device and the human.
• Just as there is no 'no interface' setting, there is no 'no RS' setting, since the viewport is limited anyway. Things that are there by default are 'recommended'.
• If you don't know about RS or don't think about RS, you still have the problem.
• Better to know!

27. Decisions to make
• What data to mine?
• How to build the recommendations?
– that is, how to pick a subset and order it
• How to evaluate?
– that is, how to tune and optimize
• How to present the results?

28. What data to mine?
[Diagram: data sources around users and items]
• Items: metadata and content, item features
• Users: demographic and social data, user features, social connections
• Preferences: explicit or implicit
• Context: explicit or implicit
• History over time, for both users and items, enabling evolution-based signals

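To make this inventory concrete, below is a minimal sketch of these data sources as plain Python dataclasses; every field name is an illustrative assumption rather than something the slides prescribe.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Item:
        item_id: str
        metadata: dict = field(default_factory=dict)    # metadata and content
        features: list = field(default_factory=list)    # item features

    @dataclass
    class User:
        user_id: str
        demographics: dict = field(default_factory=dict)  # demographic data
        friends: list = field(default_factory=list)       # social connections

    @dataclass
    class Event:
        """One explicit or implicit preference signal, with its context."""
        user_id: str
        item_id: str
        timestamp: float                      # history over time
        rating: Optional[float] = None        # explicit preference, if any
        context: dict = field(default_factory=dict)  # e.g. device, location
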
29. How to recommend?
[Diagram: the data map from the previous slide, annotated with approaches]
• CF-based user similarity
• CF-based item similarity
• Content-based user similarity
• Content-based item similarity
• Model-based prediction
• The preference-driven approaches form collaborative filtering; the content/feature-driven ones help with the cold start problem

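Since the next slide works through the CF branch, here is a minimal sketch of the content-based branch instead: item similarity computed from metadata tags alone, which is what sidesteps the cold start problem for brand-new items. The tag sets and the Jaccard measure are my assumptions.

    # Content-based item similarity: items are compared by their metadata,
    # not by who consumed them, so a track with zero listens still gets
    # neighbours. The items and tags below are made up for illustration.
    item_tags = {
        "new_track":   {"jazz", "piano", "instrumental"},  # no listens yet
        "old_track_a": {"jazz", "saxophone"},
        "old_track_b": {"metal", "guitar"},
    }

    def jaccard(a, b):
        """Overlap of two tag sets, in [0, 1]."""
        return len(a & b) / len(a | b)

    target = item_tags["new_track"]
    ranked = sorted(
        (name for name in item_tags if name != "new_track"),
        key=lambda name: jaccard(target, item_tags[name]),
        reverse=True,
    )
    print(ranked)  # ['old_track_a', 'old_track_b'] -- jazz beats metal here
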
30. Collaborative Filtering Example!

            oranges  celery  meat
    Alice      1       1      0
    Bob        1       0      1
    John       ?       ?      1

• User-based CF: Bob is more similar to John than Alice => John likes oranges, but not celery.
• Item-based CF: Celery is unlike meat, oranges are somewhere in between => John doesn't like celery, maybe 0.5 for oranges.
• Model-based CF: Apparently, for John, meat > oranges >> celery.
[Matrices: the completed rating matrix and its factorization into latent user and item factors]

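The toy matrix is small enough to run the user-based and item-based variants end to end. A minimal sketch with numpy follows; cosine similarity and zero-filling John's unknowns are assumptions, chosen so the numbers line up with the bullets above.

    import numpy as np

    # Rows: Alice, Bob, John; columns: oranges, celery, meat.
    # John's unknowns are filled with 0 for the similarity computation.
    R = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])

    def cosine(a, b):
        # Cosine similarity; the epsilon guards against zero vectors.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    # User-based CF: compare John (row 2) with Alice (row 0) and Bob (row 1).
    print(cosine(R[2], R[0]))  # 0.0   -> no overlap with Alice
    print(cosine(R[2], R[1]))  # ~0.71 -> Bob is the nearest neighbour, so
                               # John inherits Bob's row: oranges yes, celery no

    # Item-based CF: compare columns against meat (column 2).
    print(cosine(R[:, 2], R[:, 1]))  # 0.0 -> celery is unlike meat
    print(cosine(R[:, 2], R[:, 0]))  # 0.5 -> oranges somewhere in between,
                                     # hence "maybe 0.5 for oranges" above

    # Model-based CF (low-rank factorization) would instead learn latent
    # factors that reproduce R; on a 3x3 toy matrix they are badly
    # underdetermined, so that variant is left out of this sketch.
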
31. Summary
• General or personalized recommendations
• Collaborative filtering
– what do people similar to you use? (kNN for each request)
– what items are similar to the items you use? (heavy offline computation)
– model-based methods (heavy offline computation)
• Cold start problem
– how to assess new items?
– what to recommend to new users?
• Exploration/Exploitation
– accuracy on history vs. discovery

32. More things to keep in mind (AKA "a very long slide")
• Data sparsity and aggregation
• Popularity bias
• Filter bubble problem
• Hubness
• Choosing between good options is hard and dissatisfying
• Preference/Quality problem
• Robustness
• A sense of control
• Discoverability

33. How to present results?
• Interface:
– explicit: easy to attract and explain, lots of WTF moments, doesn't work as a discovery channel
– hidden: hard to explain, low trust per se, but augments existing discovery channels
• Explaining recommendations:
– important not only to increase user trust, but also because expected and perceived utility differ
• Interface matters:
– only a small share of actual user satisfaction depends on the algorithms

34. How to evaluate and optimize?
• Evaluation alone drives algorithm selection and parameter optimization
• Different evaluation settings result in different algorithms being chosen
• Offline evaluation - historical data
• Online evaluation - A/B testing on live users

35. Offline evaluation
• Rating prediction and top-K recommenders
• Cross-validation vs. backtesting
• Caveat: you are trying to make the long tail thick while at the same time fitting to the historically thin long tail
• Additional diversity, freshness and long-tail distribution metrics may apply
• Primary goal: tune algorithm parameters

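As a concrete illustration of the top-K and backtesting bullets, here is a minimal sketch of precision@K over a time-based split; the toy event log and the popularity baseline are assumptions. Fittingly, the baseline scores zero on this data, which is the thin-tail caveat above in miniature.

    from collections import defaultdict

    # (user, item, timestamp) interaction log -- hypothetical data.
    events = [("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 9),
              ("u2", "a", 3), ("u2", "d", 8)]

    split_time = 5  # backtesting: train on the past, test on the future
    train = [(u, i) for u, i, t in events if t < split_time]
    test = defaultdict(set)
    for u, i, t in events:
        if t >= split_time:
            test[u].add(i)

    def precision_at_k(recommend, k=2):
        """Mean precision@k over users with held-out interactions."""
        scores = []
        for user, held_out in test.items():
            recs = recommend(user, k)
            scores.append(len(set(recs) & held_out) / k)
        return sum(scores) / len(scores)

    def popular(user, k):
        # Non-personalized baseline: globally frequent items from training.
        counts = defaultdict(int)
        for _, i in train:
            counts[i] += 1
        return sorted(counts, key=counts.get, reverse=True)[:k]

    print(precision_at_k(popular))  # 0.0 -- popularity misses the held-out tail
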
36. Online evaluation
• Primary goal: make decisions on algorithms
• Within-subjects and between-subjects designs
• Metrics to optimize:
– retention, ARPU, taste evolution
• Statistical significance

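For the statistical-significance bullet, a minimal sketch of a between-subjects retention comparison using a two-proportion z-test; the counts are made up and the choice of test is my assumption.

    from math import sqrt, erf

    def two_proportion_z(success_a, n_a, success_b, n_b):
        """Two-sided z-test for a difference in two retention rates."""
        p_a, p_b = success_a / n_a, success_b / n_b
        p = (success_a + success_b) / (n_a + n_b)       # pooled rate
        se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))    # standard error
        z = (p_b - p_a) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
        return z, p_value

    # Arm A: current recommender, arm B: candidate (hypothetical counts).
    z, p = two_proportion_z(success_a=420, n_a=1000, success_b=465, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.3f}")  # ship B only if p is small enough
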
37. Domain-specific recommendation
• Music
– augmentive (a lot of contexts)
– cheap to discover and fail
– too cheap to bother making ratings
• Videos
– quite reliable rating systems
– expected/experienced utility may differ
• Books
– a huge time investment; expensive to fail and discover
– evolution is more important than preference
• News and events
– unique objects; metadata and proper aggregation are more important than pure CF

38. Zvooq Case

39. Zvooq Case

40. Zvooq Case (now)

41. If you listened to this, you may also be interested in…
• The Long Tail: Why the Future of Business is Selling Less of More by Chris Anderson
• Recommender Systems: An Introduction
• Music Recommendation and Discovery: The Long Tail, Long Fail and Long Play by Oscar Celma
• Recommender Systems Handbook
• http://recommenderbook.net

42. Next talk
• Thursday 08.08.2013, 20:00
• Speaker: Vladimir Belikov
• The more technical side
• The decisions we took and how to make them better
