Recommender systems in a nutshell
A Short Tale about the Long Tail
The Plan
• Examples
• Why bother?
• Long tail
• Recommender systems
• Zvooq Recommender Platform
Disclaimer
• There is plenty of information on recommender systems (RS)
• The technology is quite mature; you can get an RS out of the box in almost any programming framework
• It’s easy to use something as a black box and fail just because you skipped thinking about certain things
• This talk is about these things
Examples - Amazon
• Consumer goods of all types
• Suggest items based on
– different activity in the past (buying, browsing)
– news
– similarity
• Support catalogue exploration
• Explain recommendations
Examples - Netflix
• The interface is a set of rows, one row per
different recommender system (signal)
• Based mostly on movie ratings
• Predicts ratings for unseen films
• Accounts for UX specifics (multiple users of one home cinema)
Examples - Last.FM
Examples - Foursquare
Examples - Twitter
Recommender Systems
Where infinite options meet limited capabilities: time, money, attention, viewport size
Why bother?
• Consumer perspective
– what to buy/use?
– user satisfaction
• Producer perspective
– promote things and get attention of consumers
– increase demand, compete with other producers
• Business perspective
– optimize for core business values: costs, revenue or
betterness
– business settings may vary and aren’t always aligned
with customers or producers
Any “default” interface may be optimized
• Consumers optimize for satisfaction
– may be satisfied by the popular items
• Producers optimize for demand
– ideally, would like to lock in customers and the business, and game the system
• Business:
– a business optimizes to reduce negative scale factors (e.g. number of deals) and increase positive ones
– a marketplace business optimizes for market volume and growth
The Long Tail
“Forget squeezing millions from a few megahits at the top of the charts. The future of entertainment is in the millions of niche markets at the shallow end of the bitstream.”
Chris Anderson, Wired, 2004
[Figure: popularity of items in descending order (SELECT count(buys) FROM items ORDER BY count DESC;), with the physical shelf restriction marked]
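The popularity curve and the shelf restriction can be made concrete with a small sketch. It ranks items by purchase count and measures how much demand falls outside the top items that fit on a shelf. The purchase log, the item names and the `tail_share` helper are illustrative assumptions, not data from the talk.

```python
from collections import Counter


def tail_share(purchases, shelf_size):
    """Fraction of all purchases that go to items outside the top `shelf_size`."""
    counts = Counter(purchases)
    # Purchase counts in descending order of popularity.
    ranked = [n for _, n in counts.most_common()]
    total = sum(ranked)
    return sum(ranked[shelf_size:]) / total if total else 0.0


# Hypothetical purchase log: one item ID per purchase.
purchases = ["hit"] * 50 + ["b"] * 20 + ["c"] * 10 + [f"niche{i}" for i in range(20)]
share = tail_share(purchases, shelf_size=3)
print(f"{share:.0%} of purchases go to items that do not fit on the shelf")
```

In this toy log, twenty niche items sold once each add up to a fifth of all purchases, which is demand a three-item shelf cannot serve.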
The Long Tail
• Supply-driven factors
– Distribution channels (limited space of physical
shelves)
• Demand-driven factors
– Discovery channels (mass-media, limited attention
span, interfaces with a limited viewport)
– Preferences / taste
– Quality of content
• It is not possible to solve all of them
The Long Tail
Too good to be true,
too many power laws to fight
The Long Tail
• Consumers barely suffer from a thin tail; producers suffer a lot
• In media, where the producer/consumer border is blurred, the whole ecosystem suffers
• Help users discover new things and elicit preferences; create many niche communities and movements
Recommender Systems
The Search Model
[Diagram: user problem -> query -> relevance matching against documents -> answer(s)]
Search to Discover
• One needs to formulate the question
– known unknowns only
• When the search paradigm fails:
– lack of preferences
– lack of domain knowledge
– lack of query-result relevance
Possible shortcuts
• Suggest a query
• Mine social layer
• Apply non-relevance scoring
• Recommender systems are all about non-relevance scoring
Recommender model
• Lets users solve problems without knowing the domain, or even their own preferences (unknown unknowns)
[Diagram: users and items -> recommender system -> list of recommendations]
IR vs. RS
• IR is more like recalling what you don’t know, i.e. finding an answer to a question; RS is more like discovering what you are not aware of.
• The current web is biased towards search (thanks, Google). People start by thinking up a question instead of looking around.
Recommender Systems and Interfaces
• RS and interface solve the same problem: provide access to data given the restrictions of device and human.
• Just as there’s no ‘no interface’ setting, there’s no ‘no RS’ setting, since the viewport is limited anyway. Things that are there by default are ‘recommended’.
• If you don’t know about RS or don’t think about RS, you still have a problem.
• Better to know!
Decisions to make
• What data to mine?
• How to build the recommendations?
– That is, how to pick a subset and order it
• How to evaluate?
– That is, how to tune and optimize
• How to present the results?
What data to mine?
• Preferences (users × items): explicit or implicit
• Item features: metadata and content
• User features: demographic and social data
• Social connections (users × users)
• Context: explicit or implicit
• History (users over time): evolution-based
How to recommend?
• Preferences (users × items, explicit or implicit)
– CF-based user similarity
– CF-based item similarity
– model-based prediction
• Item features (metadata and content)
– content-based item similarity
• User features (demographic and social data)
– content-based user similarity
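As a rough illustration of content-based item similarity, the sketch below compares items by the overlap of their metadata tags (Jaccard similarity). The tag sets, the item names and the choice of Jaccard are assumptions made for the example; the talk does not prescribe a particular measure.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical item metadata (not from the talk).
ITEM_TAGS = {
    "track_a": {"rock", "guitar", "90s"},
    "track_b": {"rock", "guitar", "live"},
    "track_c": {"jazz", "piano"},
}


def most_similar(item, k=2):
    """Return the k items whose tags overlap most with `item`, best first."""
    others = [
        (other, jaccard(ITEM_TAGS[item], tags))
        for other, tags in ITEM_TAGS.items()
        if other != item
    ]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:k]


similar = most_similar("track_a")
print(similar)
```

Because only metadata is used, this kind of similarity can be computed for an item that nobody has rated yet.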
Collaborative Filtering
Cold Start Problem
Collaborative Filtering Example!
        oranges  celery  meat
Alice   1        1       0
Bob     1        0       1
John    ?        ?       1
• User-based CF: Bob is more similar to John than Alice is => John likes oranges, but not celery.
• Item-based CF: Celery is unlike meat, oranges are somewhere in between => John doesn’t like celery, maybe 0.5 for oranges.
• Model-based CF: Apparently, for John, meat > oranges >> celery.
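The user-based and item-based reasoning above can be sketched on the same three-by-three table. The similarity measures are assumptions chosen for the toy data (agreement on co-rated items for users, cosine over fully rated rows for items); real systems use many variants, and the item-based scores here are unnormalized.

```python
import math

# Ratings from the slide; None marks an unknown rating.
ITEMS = ["oranges", "celery", "meat"]
RATINGS = {
    "Alice": [1, 1, 0],
    "Bob": [1, 0, 1],
    "John": [None, None, 1],
}


def user_similarity(a, b):
    """1 - mean absolute difference over co-rated items (0 if none)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return 1.0 - sum(abs(x - y) for x, y in pairs) / len(pairs)


def predict_user_based(target):
    """Similarity-weighted average of the other users' ratings."""
    sims = {u: user_similarity(RATINGS[target], r)
            for u, r in RATINGS.items() if u != target}
    preds = {}
    for j, item in enumerate(ITEMS):
        if RATINGS[target][j] is not None:
            continue  # already rated by the target user
        raters = [(s, RATINGS[u][j]) for u, s in sims.items()
                  if RATINGS[u][j] is not None]
        weight = sum(s for s, _ in raters)
        preds[item] = sum(s * r for s, r in raters) / weight if weight else None
    return preds


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def score_item_based(target):
    """Score unrated items by similarity to the items the target rated."""
    # Item vectors are built only from users with no missing ratings.
    full = [u for u, r in RATINGS.items() if None not in r]
    column = lambda j: [RATINGS[u][j] for u in full]
    scores = {}
    for j, item in enumerate(ITEMS):
        if RATINGS[target][j] is not None:
            continue
        scores[item] = sum(cosine(column(j), column(i)) * r
                           for i, r in enumerate(RATINGS[target])
                           if r is not None)
    return scores


user_based = predict_user_based("John")
item_based = score_item_based("John")
print(user_based)
print(item_based)
```

With these choices the user-based variant copies Bob's tastes, while the item-based variant puts oranges between celery and meat, in line with the bullets above.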
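Model-based CF (heavy offline computation)
Ratings, with John's unknowns filled in as 0.5:
1    1    0
1    0    1
0.5  0.5  1
Factor matrices (3×2 and 2×3):
-0.6  -0.5
-0.5   0.8
-0.6  -0.3

-0.7  -0.4  -0.6
 0.2   0.7  -0.7
Product of the factors:
0.3  -0.1   0.7
0.5   0.8  -0.3
0.4   0     0.6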
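The numbers on this slide can be checked with a few lines of code: multiplying the 3×2 matrix by the 2×3 matrix reproduces the last 3×3 matrix after rounding to one decimal. The slide does not say how the factors were computed or which row belongs to which user, so this sketch only verifies the arithmetic; the names `LEFT`, `RIGHT` and `matmul` are illustrative.

```python
# Factor matrices copied from the slide (values as printed, already rounded).
LEFT = [[-0.6, -0.5],
        [-0.5, 0.8],
        [-0.6, -0.3]]            # 3 x 2
RIGHT = [[-0.7, -0.4, -0.6],
         [0.2, 0.7, -0.7]]       # 2 x 3


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Low-rank scores: each entry is a dot product of a row factor and a column factor.
product = [[round(v, 1) for v in row] for row in matmul(LEFT, RIGHT)]
for row in product:
    print(row)
```

The last row of the product (0.4, 0.0, 0.6) is consistent with the ordering meat > oranges >> celery given for John.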
Summary
• General or personalized recommendations
• Collaborative filtering
– what do people similar to you use? (kNN for each request)
– what items are similar to items you use? (kNN for each request)
– model-based methods (heavy offline computation)
• Cold start problem
– how to assess new items?
– what to recommend to new users?
• Exploration/Exploitation
– accuracy on history vs. discovery
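One common way to trade exploitation against exploration is an epsilon-greedy list: most slots go to the best-scored items, and each slot has a small chance of showing a random item from the tail instead. This is a generic sketch, not the approach described in the talk; the function name, its parameters and the item IDs are assumptions.

```python
import random


def recommend(ranked_items, tail_items, k, epsilon, rng=None):
    """Fill k slots; each slot explores a random tail item with probability epsilon."""
    rng = rng or random.Random()
    ranked = list(ranked_items)                      # best first (exploitation)
    pool = [i for i in tail_items if i not in ranked]  # candidates for exploration
    result = []
    while len(result) < k and (ranked or pool):
        explore = bool(pool) and rng.random() < epsilon
        if explore or not ranked:
            # Pick a random tail item without replacement.
            result.append(pool.pop(rng.randrange(len(pool))))
        else:
            result.append(ranked.pop(0))
    return result


# Hypothetical item IDs.
greedy = recommend(["a", "b", "c", "d"], ["x", "y", "z"], k=3, epsilon=0.0)
exploring = recommend(["a", "b", "c", "d"], ["x", "y", "z"], k=3, epsilon=1.0,
                      rng=random.Random(0))
print(greedy, exploring)
```

Setting epsilon to 0 reproduces the purely history-driven list, while larger values give up some accuracy on history for more discovery.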
More things to keep in mind
(AKA “a very long slide”)
• Data sparsity and aggregation
• Popularity bias
• Filter bubble problem
• Hubness
• Choosing between good options is hard and
dissatisfying
• Preference/Quality problem
• Robustness
• A sense of control
• Discoverability
How to present results?
• Interface:
– explicit: easy to attract attention and explain, lots of WTF, doesn’t work as a discovery channel
– hidden: hard to explain, low trust per se, but augments existing discovery channels
• Explaining recommendations:
– important not only to increase user trust, but also because expected and perceived utility differ
• Interface matters:
– only a small share of actual user satisfaction depends on the algorithms
How to evaluate and optimize?
• Only evaluation drives algorithm selection and parameter optimization
• Different evaluation settings lead to different algorithms being chosen
• Offline evaluation
– historical data
• Online evaluation
– A/B testing on live users
Offline evaluation
• Rating prediction and top-K recommenders
• Cross-validation vs. backtesting
• Caveat: we try to make the long tail thick while at the same time fitting to the historically thin long tail
• Additional diversity, freshness and long-tail
distribution metrics may apply
• Primary goal: tune algorithm parameters
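A minimal sketch of an offline setup along these lines: a backtesting split by time, precision@k for a top-K recommender, and a simple long-tail metric. The event format `(user, item, timestamp)`, the sample data and all function names are assumptions for illustration.

```python
def backtest_split(events, cutoff):
    """Train on events strictly before `cutoff`, test on the rest (no future leakage)."""
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff]
    return train, test


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items the user actually consumed later."""
    if k <= 0:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / k


def tail_share(recommended, head_items):
    """Share of recommended items that are outside the popular head."""
    if not recommended:
        return 0.0
    return sum(1 for item in recommended if item not in head_items) / len(recommended)


# Hypothetical interaction log: (user, item, timestamp).
events = [("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 5), ("u1", "d", 6)]
train, test = backtest_split(events, cutoff=4)
relevant = {item for _, item, _ in test}
recommended = ["c", "x", "d", "y"]   # output of some recommender trained on `train`

p_at_2 = precision_at_k(recommended, relevant, k=2)
tail = tail_share(recommended, head_items={"c"})
print(p_at_2, tail)
```

Unlike random cross-validation, the time-based split never trains on events that happen after the ones being predicted. Reporting the tail metric next to precision shows when a parameter change buys accuracy only by recommending the head.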
Online evaluation
• Primary goal: make decisions on algorithms
• Within-subjects and Between-subjects
• Metrics to optimize:
– retention, ARPU, taste evolution
• Statistical significance
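For a between-subjects A/B test on a binary metric such as retention, significance can be checked with a two-proportion z-test. The sketch below uses only the standard library; the user counts are made up, and the test itself is one common choice rather than something the talk specifies.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: retained users out of 1000 per variant.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the difference is unlikely to be noise; it does not say the difference is large enough to matter for the business metric.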
Domain-specific recommendation
• Music
– augmentive (a lot of contexts)
– cheap to discover and fail
– too cheap to bother rating
• Videos
– quite reliable rating systems
– expected/experienced utility may be different
• Books
– huge time investment, expensive to fail and discover
– evolution is more important than preference
• News and events
– unique objects; metadata and proper aggregation are more important than pure CF
Zvooq Case
Zvooq Case (now)
If you listened to this, you may also be interested in…
• The Long Tail: Why the Future of Business Is Selling Less of More by Chris Anderson
• Recommender systems: An Introduction
• Music Recommendation and Discovery: The
Long Tail, Long Fail and Long Play by Oscar
Celma
• Recommender Systems Handbook
• http://recommenderbook.net
Next talk
• Thursday 08.08.2013, 20:00
• Speaker: Vladimir Belikov
• More technical side
• Decisions we made and how to improve on them

Story boards and shot lists for my a level piece
 

Recommender Systems in a nutshell

  • 1. Recommender systems in a nutshell A Short Tale about the Long Tail
  • 2. The Plan • Examples • Why bother? • Long tail • Recommender systems • Zvooq Recommender Platform
  • 3. Disclaimer • There is plenty of information on recommender systems (RS) • The technology is quite mature; you can get an RS out of the box in almost any programming framework • It’s easy to use something as a black box and fail just because you didn’t think about certain things • This talk is about these things
  • 5. Examples - Amazon • Consumer goods of all types • Suggest items based on – different activity in the past (buying, browsing) – news – similarity • Support catalogue exploration • Explain recommendations
  • 7. Examples - Netflix • The interface is a set of rows, one row per recommender system (signal) • Based mostly on movie ratings • Predicts ratings for unseen films • Uses UX specifics (multiple users of one home cinema)
  • 11. Recommender Systems: where infinite options meet limited capabilities (time, money, attention, viewport size)
  • 13. Why bother? • Consumer perspective – what to buy/use? – user satisfaction • Producer perspective – promote things and get the attention of consumers – increase demand, compete with other producers • Business perspective – optimize for core business values: costs, revenue or betterness – business settings may vary and aren’t always aligned with customers or producers
  • 14. Any “default” interface may be optimized • Consumers optimize for satisfaction – may be satisfied by the popular items • Producers optimize for demand – ideally, would like to lock customers and the business in to them, cheating the game • Business: – businesses optimize to reduce negative scale factors (e.g. number of deals) and increase positive ones – marketplace businesses optimize for market volume and growth
  • 16. The Long Tail “Forget squeezing millions from a few megahits at the top of the charts. The future of entertainment is in the millions of niche markets at the shallow end of the bitstream.” Chris Anderson, Wired, 2004 [Figure: popularity curve, i.e. SELECT count(buys) FROM items ORDER BY count DESC; annotated with the physical shelf restriction]
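The popularity curve on this slide is simply the items ranked by purchase count. A minimal Python sketch of the same ranking follows; the purchase log, the item names and the `head_size` cutoff standing in for the physical shelf are illustrative assumptions, not data from the talk.

```python
from collections import Counter

# Hypothetical purchase log: one entry per purchase, the value is the item bought.
purchases = ["hit"] * 6 + ["popular"] * 3 + ["niche_a", "niche_b", "niche_c"]

def popularity_ranking(log):
    """Return (item, buys) pairs sorted by descending number of buys."""
    return Counter(log).most_common()

def head_share(log, head_size):
    """Share of all purchases captured by the `head_size` most popular items."""
    ranking = popularity_ranking(log)
    head_buys = sum(buys for _, buys in ranking[:head_size])
    return head_buys / len(log)

ranking = popularity_ranking(purchases)
# A "shelf" with room for only 2 items captures this share of all purchases.
shelf_share = head_share(purchases, head_size=2)
```

Everything ranked below the cutoff is the tail: items that a limited shelf (or a limited viewport) never shows.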
  • 17. The Long Tail • Supply-driven factors – Distribution channels (limited space of physical shelves) • Demand-driven factors – Discovery channels (mass-media, limited attention span, interfaces with a limited viewport) – Preferences / taste – Quality of content • It is not possible to solve all of them
  • 18. The Long Tail Too good to be true, too many power laws to fight
  • 19. The Long Tail • Consumers hardly suffer from the thin tail; producers suffer a lot • In media, where the producer/consumer border is blurred, the whole ecosystem suffers • Goal: help people discover new stuff and elicit preferences, create a lot of niche communities/movements
  • 22. Search to Discover • One needs to formulate the question – known unknowns only • When the search paradigm fails: – lack of preferences – lack of domain knowledge – lack of query-result relevance
  • 23. Possible shortcuts • Suggest a query • Mine social layer • Apply non-relevance scoring • Recommender systems are all about non-relevance scoring
  • 24. Recommender model • Allows solving problems without knowing the domain, even without knowing the preferences (unknown unknowns) [Diagram: items and users feed into the recommender system, which outputs a list of recommendations]
  • 25. IR vs. RS • IR (information retrieval) is more about looking up what you know you don’t know, i.e. finding an answer to a question; RS is more about discovering what you are not aware of. • The current web is biased towards search (thanks, Google). People start by thinking up a question instead of looking around.
  • 26. Recommender Systems and Interfaces • RS and the interface solve the same problem: provide access to data given the restrictions of the device and the human. • Just as there is no ‘no interface’ setting, there is no ‘no RS’ setting, since the viewport is limited anyway. Things that are there by default are ‘recommended’. • If you don’t know about RS or don’t think about RS, you still have a problem. • Better to know!
  • 27. Decisions to make • What data to mine? • How to build the recommendations? – That is, how to pick a subset and order it • How to evaluate? – That is, how to tune and optimize • How to present the results?
  • 28. What data to mine? • Preferences, explicit or implicit (users × items) • Metadata and content (items × features) • Demographic and social data (users × features) • Social connections (users × users) • Context, explicit or implicit • History over time (evolution-based)
  • 29. How to recommend? • From preferences, explicit or implicit (collaborative filtering): CF-based user similarity, CF-based item similarity, model-based prediction • From item metadata and content: content-based item similarity • From demographic and social data: content-based user similarity • Cold start problem
  • 30. Collaborative Filtering Example! Ratings (oranges, celery, meat): Alice 1, 1, 0; Bob 1, 0, 1; John ?, ?, 1 • User-based CF: Bob is more similar to John than Alice is => John likes oranges, but not celery. • Item-based CF: Celery is unlike meat, oranges are somewhere in between => John doesn’t like celery, maybe 0.5 for oranges. • Model-based CF: Apparently, for John, meat > oranges >> celery. [Figure: the rating matrix with John’s unknowns set to 0.5, rows (1, 1, 0), (1, 0, 1), (0.5, 0.5, 1), shown with its numeric factor matrices]
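The user-based and item-based reasoning on this slide can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method from the talk: similarity is taken to be the fraction of co-rated positions on which two vectors agree, user-based CF copies from the single nearest neighbour, and item-based CF scores an item by its similarity to the items John already rated. The function names are hypothetical.

```python
# Toy ratings from the slide; None marks an unknown rating.
ITEMS = ["oranges", "celery", "meat"]
RATINGS = {
    "Alice": [1, 1, 0],
    "Bob": [1, 0, 1],
    "John": [None, None, 1],
}

def agreement(a, b):
    """Fraction of co-rated positions where two rating vectors agree (assumed similarity)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def user_based(target):
    """Copy the missing ratings from the single most similar user (1-nearest-neighbour)."""
    others = {u: r for u, r in RATINGS.items() if u != target}
    nearest = max(others, key=lambda u: agreement(RATINGS[target], others[u]))
    return {
        item: others[nearest][i]
        for i, item in enumerate(ITEMS)
        if RATINGS[target][i] is None
    }

def item_based(target):
    """Score each unrated item by its similarity-weighted rating over the target's rated items."""
    def column(i):
        # Ratings of item i by every user except the target.
        return [r[i] for u, r in RATINGS.items() if u != target]

    rated = [i for i, r in enumerate(RATINGS[target]) if r is not None]
    scores = {}
    for i, item in enumerate(ITEMS):
        if RATINGS[target][i] is None:
            scores[item] = sum(
                agreement(column(i), column(j)) * RATINGS[target][j] for j in rated
            ) / len(rated)
    return scores

john_user_based = user_based("John")
john_item_based = item_based("John")
```

With these definitions Bob is John's nearest neighbour, so the user-based variant copies Bob's ratings, while the item-based variant puts oranges halfway between celery and meat, matching the slide's informal argument.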
  • 31. Summary • General or personalized recommendations • Collaborative filtering – what do people similar to you use? – what items are similar to items you use? – model-based methods • Cold start problem – how to assess new items? – what to recommend to new users? • Exploration/Exploitation – accuracy on history vs. discovery (Slide annotations on the methods: kNN for each request; heavy offline computation)
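As an illustration of the "model-based methods" bullet and of why they imply offline computation, here is a minimal matrix-factorization sketch trained with plain stochastic gradient descent on the toy ratings from the previous slide. The rank, learning rate, epoch count and seed are arbitrary assumptions, and the sketch does not claim to reproduce the numbers shown on the slides.

```python
import random

USERS = ["Alice", "Bob", "John"]
ITEMS = ["oranges", "celery", "meat"]
# Known ratings as (user index, item index, rating); John's unknown cells are absent.
KNOWN = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0),
    (1, 0, 1.0), (1, 1, 0.0), (1, 2, 1.0),
    (2, 2, 1.0),
]

def dot(p, q):
    """Inner product of a user factor vector and an item factor vector."""
    return sum(a * b for a, b in zip(p, q))

def squared_error(P, Q):
    """Sum of squared errors over the known ratings only."""
    return sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in KNOWN)

def factorize(rank=2, epochs=500, lr=0.05, seed=0):
    """Fit user factors P and item factors Q with plain SGD (hyperparameters are guesses)."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in USERS]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in ITEMS]
    initial = squared_error(P, Q)
    for _ in range(epochs):
        for u, i, r in KNOWN:
            err = r - dot(P[u], Q[i])
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * err * qi
                Q[i][k] += lr * err * pu
    return P, Q, initial, squared_error(P, Q)

P, Q, error_before, error_after = factorize()
# Predicted scores for John's row, including the two cells he never rated.
john_scores = {item: dot(P[2], Q[i]) for i, item in enumerate(ITEMS)}
```

The training loop is the offline part; serving a recommendation afterwards is only a dot product per item. On a matrix this small the predictions for the unknown cells depend heavily on initialization and hyperparameters, so they should not be read as meaningful.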
  • 32. More things to keep in mind (AKA “a very long slide”) • Data sparsity and aggregation • Popularity bias • Filter bubble problem • Hubness • Choosing between good options is hard and dissatisfying • Preference/Quality problem • Robustness • A sense of control • Discoverability
  • 33. How to present results? • Interface: – explicit: easy to attract and explain, lots of WTF, doesn’t work as a discovery channel – hidden: hard to explain, low trust per se, but augments existing discovery channels • Explaining recommendations: – important not only to increase user trust, but also because of the difference between expected and perceived utility • Interface matters: – only a very small part of actual user satisfaction depends on the algorithms
  • 34. How to evaluate and optimize? • Only evaluation affects algorithm selection and parameter optimization • Different evaluation settings result in different algorithms used • Offline evaluation – historical data • Online evaluation – A/B testing on live users
  • 35. Offline evaluation • Rating prediction and top-K recommenders • Cross-validation vs. backtesting • Caveat: we are trying to make the long tail thick while at the same time fitting to the historically thin long tail • Additional diversity, freshness and long-tail distribution metrics may apply • Primary goal: tune algorithm parameters
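A small sketch of two offline metrics for a top-K recommender: precision at K on held-out history, and the share of recommendations that land in the tail. The data and the `head` set of "popular" items are hypothetical, and these particular metric definitions are one common choice rather than the ones used in the talk.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually consumed later."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(item in relevant for item in top_k) / len(top_k)

def tail_share(recommended, popular_items, k):
    """Fraction of the top-k recommendations that fall outside the popular head."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(item not in popular_items for item in top_k) / len(top_k)

# Hypothetical backtest: recommendations were built on data before a cutoff date,
# `held_out` is what the user consumed after it.
recommended = ["a", "b", "c", "d"]
held_out = {"b", "d", "x"}
head = {"a", "b"}  # assumed set of popular items

p_at_3 = precision_at_k(recommended, held_out, k=3)
tail_at_3 = tail_share(recommended, head, k=3)
```

Tracking both numbers side by side makes the caveat visible: a recommender can raise precision on history simply by recommending the head, which drives the tail share down.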
  • 36. Online evaluation • Primary goal: make decisions on algorithms • Within-subjects and Between-subjects • Metrics to optimize: – retention, ARPU, taste evolution • Statistical significance
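For the "statistical significance" bullet, a minimal sketch of a two-sided two-proportion z-test, which is one standard way to compare a rate such as retention between two arms of a between-subjects A/B test. The counts are made up, and the normal approximation assumes fairly large samples with pooled rates strictly between 0 and 1.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two rates.

    Returns (z, p_value), using the pooled-variance normal approximation.
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution, expressed via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: retained users out of users exposed to each algorithm.
z, p = two_proportion_z_test(success_a=200, n_a=1000, success_b=250, n_b=1000)
```

A small `p` suggests the difference between the two algorithms is unlikely to be noise; the threshold to act on remains a business decision.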
  • 37. Domain-specific recommendation • Music – augmentative (used in a lot of contexts) – cheap to discover and fail – too cheap to bother making ratings • Videos – quite reliable rating systems – expected/experienced utility may be different • Books – huge time investment, expensive to fail and discover – evolution is more important than preference • News and events – unique objects; metadata and proper aggregation are more important than pure CF
  • 41. If you listened to this you may also be interested in… • The Long Tail: Why the Future of Business Is Selling Less of More by Chris Anderson • Recommender Systems: An Introduction • Music Recommendation and Discovery: The Long Tail, Long Fail and Long Play by Oscar Celma • Recommender Systems Handbook • http://recommenderbook.net
  • 42. Next talk • Thursday 08.08.2013, 20:00 • Speaker: Vladimir Belikov • More technical side • Decisions we took and how to make them better

Editor's Notes

  1. Image source: http://amazon.com
  2. Image source: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
  3. guy with a spotlight in a treasure room, for recommender systems – guy with a lantern
  4. guy with lantern!
  5. Alien vs. Predator