SlideShare a Scribd company logo
1 of 22
Download to read offline
KDD
KDD’
’10 Tutorial:
10 Tutorial:
Recommender Problems for
Recommender Problems for
Web Applications
Web Applications
Deepak Agarwal and Bee-Chung Chen
Yahoo! Research
2
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Agenda
• Focus:
– Recommender problems for dynamic
dynamic, time
time-
-sensitive
sensitive applications
• Content Optimization
• Introduction (20 min, Deepak)
– Content optimization, match-making, example applications
• Offline components (40 min, Deepak)
– Collaborative filtering (CF), methods for cold-start
• Online components + initialization (70 min, Bee-Chung)
– Time-series, online/incremental methods, explore/exploit (bandit)
• Evaluation methods (15 min, Deepak)
• Challenges (5 min, Deepak)
3
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Content Optimization
• Goal
– Effectively and “pro-actively” learn from user interactions with
content that are displayed to maximize our objectives.
• A new scientific discipline at the interface of
– Large scale Machine Learning & Statistics
• Offline Models
• Online Models
• Collaborative Filtering
• Explore/Exploit
– Multi-Objective Optimization in the presence of Uncertainty
• Click-rates (CTR), Engagement,….
– User Understanding
• Profile construction
– Content Understanding
• Topics, “aboutness”, entities, follow-up of something, breaking news,…
4
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Content Optimization: High level flowchart
• Flow
– Understand content (Offline)
– Serve content to optimize our objectives (Online)
– quickly learn from feedback obtained using ML/Statistics (Offline +
Online)
– Constantly enhance our content inventory to improve future
performance (Offline)
– Constantly enhance our user understanding to improve future
performance (Offline + Online)
– Iterate
5
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Some examples
• Simple version
– I have an important module on my page, content inventory is
obtained from a third party source which is further refined through
editorial oversight. Can I algorithmically recommend content on this
module? I want to drive up total CTR on this module
• More advanced
– I got X% lift in CTR. But I have additional information on other
downstream utilities (e.g. dwell time). Can I increase downstream
utility without losing too many clicks?
• Highly advanced
– There are multiple modules running on my website. How do I take
a holistic approach and perform a simultaneous optimization?
6
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Recommend applications
Recommend search queries
Recommend news article
Recommend packages:
Image
Title, summary
Links to other pages
Pick 4 out of a pool of K
K = 20 ~ 40
Dynamic
Routes traffic other pages
7
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Problems in this example
• Optimize CTR on different modules together in a holistic
way
– Today Module, Trending Now, Personal Assistant, News, Ads
– Treat them as independent?
• For a given module
– Optimize some combination of CTR, downstream engagement and
perhaps revenue.
8
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Single module CTR optimization problem
• Display “best” articles for each user visit
• Best - Maximize User Satisfaction, Engagement
– BUT Hard to obtain quick feedback to measure these
• Approximation
– Maximize utility based on immediate feedback (click rate) subject
to constraints (relevance, freshness, diversity)
• Inventory of articles?
– Created by human editors
– Small pool (30-50 articles) but refreshes periodically
9
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Recommendation: A Match-making Problem
Opportunity
Users, queries,
pages, …
Item Inventory
Articles, web page,
ads, …
Use an automated algorithm
to select item(s) to show
Get feedback (click, time spent,..)
Refine the models
Repeat (large number of times)
Measure metric(s) of interest
(Total clicks, Total revenue,…)
• Recommendation problems
• Search: Web, Vertical
• Online advertising
• …
10
Deepak Agarwal & Bee-Chung Chen @ KDD’10
• Items: Articles, web pages, ads, modules, queries, users, updates, etc.
• Opportunities: Users, query keywords, pages, etc.
• Metric (e.g., editorial score, CTR, revenue, engagement)
– Currently, most applications are single-objective
– May be multi-objective optimization (maximize X subject to Y, Z,..)
• Properties of the item pool
– Size (e.g., all web pages vs. 40 stories)
– Quality of the pool (e.g., anything vs. editorially selected)
– Lifetime (e.g., mostly old items vs. mostly new items)
Important Factors affecting solution in Match-making Problems
11
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Factors affecting Solution continued
• Properties of the opportunities
– Pull: Specified by explicit, user-driven query (e.g., keywords, a form)
– Push: Specified by implicit context (e.g., a page, a user, a session)
– Size (e.g., user base); continuity (e.g., session vs. single event)
• Properties of the feedback on the matches made
– Types and semantics of feedback (e.g., click, vote)
– Latency (e.g., available in 5 minutes vs. 1 day)
– Volume (e.g., 100K per day vs. 300M per day)
• Constraints specifying legitimate matches (e.g., business
rules)
• Available Metadata (e.g., link graph, various user/item attributes)
12
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Recommendation vs. Other Match-Making Problems
Ads
Anything
(except for ads)
Anything
(except for ads)
Items
Sponsored search
Content match
Behavior targeting
Display advertising
(non-guaranteed)
Web search
Vertical search
Recommend articles,
friends, feeds to
users
Recommend related
items given an item
Examples
Push
Pull (explicit)
Users specify their info
needs
Push (implicit)
The system guesses
users info needs
Opportunities
Revenue
Relevance to the query
User engagement
Main Metric
Advertising
Search
Recommendation
13
Deepak Agarwal & Bee-Chung Chen @ KDD’10
More on Recommendation vs. Search
• Recommendation
– User intent: See something “interesting” (browse mode, implicit)
• The system tries to guess what a user likes on an entity/topic
page
– No query reformulation (unless we suggest related topics/entities)
– False +ve more costly than false –ve
• Showing a bad article is a worse than missing a good one
• Search
– User intent: Explicit, users express what they want
– Users can reformulate queries
– False –ve more costly (but depends on the query)
• Users want to get the results they are looking for
14
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Modeling: Key components
Offline
(Logistic, GBDT,..)
Feature construction
Content: IR, clustering, taxonomy, entity,..
User profiles: clicks, views, social, community,..
Online
(Fine resolution
Corrections)
(item, user level)
(Quick updates)
Explore/Exploit
(Adaptive sampling)
Initialize
15
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Modeling Problems that has received attention
• Univariate response (e.g. click); single objective (e.g.
maximize CTR)
• Our solution
– Initialize online through offline models
– Learn “corrections” to offline models at very granular levels (user,
item) and learn rapidly in an online fashion
– Online correction models have reduced dimension through clever
representations of parameters and by exploiting the fallback
mechanism to coarser models
– The models are tightly coupled with Explore-exploit to ensure fast
convergence to areas of high valued response
Example Application:
Example Application:
Today Module on Yahoo! Homepage
Today Module on Yahoo! Homepage
Currently in production powered by some methods
discussed in this tutorial
17
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Recommend packages:
Image
Title, summary
Links to other pages
Pick 4 out of a pool of K
K = 20 ~ 40
Dynamic
Routes traffic other pages
1
1 2
2 3
3 4
4
18
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Problem definition
• Display “best” articles for each user visit
• Best - Maximize User Satisfaction, Engagement
– BUT Hard to obtain quick feedback to measure these
• Approximation
– Maximize utility based on immediate feedback (click rate) subject
to constraints (relevance, freshness, diversity)
• Inventory of articles?
– Created by human editors
– Small pool (30-50 articles) but refreshes periodically
19
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Where are we today?
• Before this research
– Articles created and selected for display by editors
• After this research
– Article placement done through statistical models
• How successful ?
"Just look at our homepage, for example. Since we began pairing our content
optimization technology with editorial expertise, we've seen click-through rates
in the Today module more than double. ----- Carol Bartz, CEO Yahoo! Inc (Q4,
2009)
20
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Main Goals
• Methods to select most popular articles
– This was done by editors before
• Provide personalized article selection
– Based on user covariates
– Based on per user behavior
• Scalability: Methods to generalize in small traffic scenarios
– Today module part of most Y! portals around the world
– Also syndicated to sources like Y! Mail, Y! IM etc
21
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Similar applications
• Goal: Use same methods for selecting most popular, personalization
across different applications at Y!
• Good news! Methods generalize, already in use
22
Deepak Agarwal & Bee-Chung Chen @ KDD’10
Next few hours
Prior estimation,
dimension reduction
Prior estimation
Intelligent Initialization
Bandits with covariates
Multi-armed bandits
Explore/Exploit
Incremental CF,
online regression
Time-series models
Online Models
Collaborative filtering
(cold-start problem)
Offline Models
Personalized
Recommendation
Most Popular
Recommendation

More Related Content

Similar to Recommender Problems Introduction

114 20 Product Prioritization Frameworks
114   20 Product Prioritization Frameworks114   20 Product Prioritization Frameworks
114 20 Product Prioritization FrameworksProductCamp Boston
 
Building an Excellent Web Startup
Building an Excellent Web StartupBuilding an Excellent Web Startup
Building an Excellent Web Startupmatthewhyatt
 
Small Business SEO Tips and Strategies For 2013 - Chaosmap.com
Small Business SEO Tips and Strategies For 2013 - Chaosmap.comSmall Business SEO Tips and Strategies For 2013 - Chaosmap.com
Small Business SEO Tips and Strategies For 2013 - Chaosmap.comJon Rognerud Chaosmap Digital
 
What Makes a Great Engineering Culture?
What Makes a Great Engineering Culture?What Makes a Great Engineering Culture?
What Makes a Great Engineering Culture?Edmond Lau
 
Lean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launchLean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launchPeople10 Technosoft Private Limited
 
Agile and data driven product development oleh Dhiku VP Product KMK Online
Agile and data driven product development oleh Dhiku VP Product KMK OnlineAgile and data driven product development oleh Dhiku VP Product KMK Online
Agile and data driven product development oleh Dhiku VP Product KMK OnlineRein Mahatma
 
A feature guide to QUT's Digital Workplace (Intranets2016)
A feature guide to QUT's Digital Workplace (Intranets2016)A feature guide to QUT's Digital Workplace (Intranets2016)
A feature guide to QUT's Digital Workplace (Intranets2016)Andy McBride
 
Google Analytics: MVPs and Game-Changing New Features
Google Analytics: MVPs and Game-Changing New FeaturesGoogle Analytics: MVPs and Game-Changing New Features
Google Analytics: MVPs and Game-Changing New FeaturesBrian Alpert
 
User Experience Design: an Overview
User Experience Design: an OverviewUser Experience Design: an Overview
User Experience Design: an OverviewJulie Grundy
 
Aegora mvp-presentation v1-1
Aegora mvp-presentation v1-1Aegora mvp-presentation v1-1
Aegora mvp-presentation v1-1Aegora
 
Ria Sankar - How to Build Winning Products - Product School Bellevue - 83018
Ria Sankar - How to Build Winning Products - Product School Bellevue - 83018 Ria Sankar - How to Build Winning Products - Product School Bellevue - 83018
Ria Sankar - How to Build Winning Products - Product School Bellevue - 83018 Ria Sankar
 
The Ultimate Website Development Roadmap
The Ultimate Website Development RoadmapThe Ultimate Website Development Roadmap
The Ultimate Website Development RoadmapAdina Zaiontz
 
UX Webinar: Always Be Testing
UX Webinar: Always Be TestingUX Webinar: Always Be Testing
UX Webinar: Always Be TestingCharity Dynamics
 
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...museums and the web
 
Website Presentation
Website PresentationWebsite Presentation
Website PresentationAndrew Berg
 
How should we change our approach to content creation to fit agile ways of wo...
How should we change our approach to content creation to fit agile ways of wo...How should we change our approach to content creation to fit agile ways of wo...
How should we change our approach to content creation to fit agile ways of wo...CharityComms
 
Digital Marketing Course Week 3: Digital Assets
Digital Marketing Course Week 3: Digital AssetsDigital Marketing Course Week 3: Digital Assets
Digital Marketing Course Week 3: Digital AssetsAyca Turhan
 
Behemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesBehemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesPhilipp Klöckner
 

Similar to Recommender Problems Introduction (20)

114 20 Product Prioritization Frameworks
114   20 Product Prioritization Frameworks114   20 Product Prioritization Frameworks
114 20 Product Prioritization Frameworks
 
Building an Excellent Web Startup
Building an Excellent Web StartupBuilding an Excellent Web Startup
Building an Excellent Web Startup
 
Small Business SEO Tips and Strategies For 2013 - Chaosmap.com
Small Business SEO Tips and Strategies For 2013 - Chaosmap.comSmall Business SEO Tips and Strategies For 2013 - Chaosmap.com
Small Business SEO Tips and Strategies For 2013 - Chaosmap.com
 
What Makes a Great Engineering Culture?
What Makes a Great Engineering Culture?What Makes a Great Engineering Culture?
What Makes a Great Engineering Culture?
 
Lean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launchLean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launch
 
Agile and data driven product development oleh Dhiku VP Product KMK Online
Agile and data driven product development oleh Dhiku VP Product KMK OnlineAgile and data driven product development oleh Dhiku VP Product KMK Online
Agile and data driven product development oleh Dhiku VP Product KMK Online
 
A feature guide to QUT's Digital Workplace (Intranets2016)
A feature guide to QUT's Digital Workplace (Intranets2016)A feature guide to QUT's Digital Workplace (Intranets2016)
A feature guide to QUT's Digital Workplace (Intranets2016)
 
Google Analytics: MVPs and Game-Changing New Features
Google Analytics: MVPs and Game-Changing New FeaturesGoogle Analytics: MVPs and Game-Changing New Features
Google Analytics: MVPs and Game-Changing New Features
 
User Experience Design: an Overview
User Experience Design: an OverviewUser Experience Design: an Overview
User Experience Design: an Overview
 
Aegora mvp-presentation v1-1
Aegora mvp-presentation v1-1Aegora mvp-presentation v1-1
Aegora mvp-presentation v1-1
 
Cs548 s15 showcase_web_mining
Cs548 s15 showcase_web_miningCs548 s15 showcase_web_mining
Cs548 s15 showcase_web_mining
 
Ria Sankar - How to Build Winning Products - Product School Bellevue - 83018
Ria Sankar - How to Build Winning Products - Product School Bellevue - 83018 Ria Sankar - How to Build Winning Products - Product School Bellevue - 83018
Ria Sankar - How to Build Winning Products - Product School Bellevue - 83018
 
The Ultimate Website Development Roadmap
The Ultimate Website Development RoadmapThe Ultimate Website Development Roadmap
The Ultimate Website Development Roadmap
 
UX Webinar: Always Be Testing
UX Webinar: Always Be TestingUX Webinar: Always Be Testing
UX Webinar: Always Be Testing
 
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
 
Website Presentation
Website PresentationWebsite Presentation
Website Presentation
 
Recsys 2016
Recsys 2016Recsys 2016
Recsys 2016
 
How should we change our approach to content creation to fit agile ways of wo...
How should we change our approach to content creation to fit agile ways of wo...How should we change our approach to content creation to fit agile ways of wo...
How should we change our approach to content creation to fit agile ways of wo...
 
Digital Marketing Course Week 3: Digital Assets
Digital Marketing Course Week 3: Digital AssetsDigital Marketing Course Week 3: Digital Assets
Digital Marketing Course Week 3: Digital Assets
 
Behemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesBehemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge Websites
 

Recently uploaded

Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 

Recently uploaded (20)

Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 

Recommender Problems Introduction

  • 1. KDD KDD’ ’10 Tutorial: 10 Tutorial: Recommender Problems for Recommender Problems for Web Applications Web Applications Deepak Agarwal and Bee-Chung Chen Yahoo! Research
  • 2. 2 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Agenda • Focus: – Recommender problems for dynamic dynamic, time time- -sensitive sensitive applications • Content Optimization • Introduction (20 min, Deepak) – Content optimization, match-making, example applications • Offline components (40 min, Deepak) – Collaborative filtering (CF), methods for cold-start • Online components + initialization (70 min, Bee-Chung) – Time-series, online/incremental methods, explore/exploit (bandit) • Evaluation methods (15 min, Deepak) • Challenges (5 min, Deepak)
  • 3. 3 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Content Optimization • Goal – Effectively and “pro-actively” learn from user interactions with content that are displayed to maximize our objectives. • A new scientific discipline at the interface of – Large scale Machine Learning & Statistics • Offline Models • Online Models • Collaborative Filtering • Explore/Exploit – Multi-Objective Optimization in the presence of Uncertainty • Click-rates (CTR), Engagement,…. – User Understanding • Profile construction – Content Understanding • Topics, “aboutness”, entities, follow-up of something, breaking news,…
  • 4. 4 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Content Optimization: High level flowchart • Flow – Understand content (Offline) – Serve content to optimize our objectives (Online) – quickly learn from feedback obtained using ML/Statistics (Offline + Online) – Constantly enhance our content inventory to improve future performance (Offline) – Constantly enhance our user understanding to improve future performance (Offline + Online) – Iterate
  • 5. 5 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Some examples • Simple version – I have an important module on my page, content inventory is obtained from a third party source which is further refined through editorial oversight. Can I algorithmically recommend content on this module? I want to drive up total CTR on this module • More advanced – I got X% lift in CTR. But I have additional information on other downstream utilities (e.g. dwell time). Can I increase downstream utility without losing too many clicks? • Highly advanced – There are multiple modules running on my website. How do I take a holistic approach and perform a simultaneous optimization?
  • 6. 6 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Recommend applications Recommend search queries Recommend news article Recommend packages: Image Title, summary Links to other pages Pick 4 out of a pool of K K = 20 ~ 40 Dynamic Routes traffic other pages
  • 7. 7 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Problems in this example • Optimize CTR on different modules together in a holistic way – Today Module, Trending Now, Personal Assistant, News, Ads – Treat them as independent? • For a given module – Optimize some combination of CTR, downstream engagement and perhaps revenue.
  • 8. 8 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Single module CTR optimization problem • Display “best” articles for each user visit • Best - Maximize User Satisfaction, Engagement – BUT Hard to obtain quick feedback to measure these • Approximation – Maximize utility based on immediate feedback (click rate) subject to constraints (relevance, freshness, diversity) • Inventory of articles? – Created by human editors – Small pool (30-50 articles) but refreshes periodically
  • 9. 9 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Recommendation: A Match-making Problem Opportunity Users, queries, pages, … Item Inventory Articles, web page, ads, … Use an automated algorithm to select item(s) to show Get feedback (click, time spent,..) Refine the models Repeat (large number of times) Measure metric(s) of interest (Total clicks, Total revenue,…) • Recommendation problems • Search: Web, Vertical • Online advertising • …
  • 10. 10 Deepak Agarwal & Bee-Chung Chen @ KDD’10 • Items: Articles, web pages, ads, modules, queries, users, updates, etc. • Opportunities: Users, query keywords, pages, etc. • Metric (e.g., editorial score, CTR, revenue, engagement) – Currently, most applications are single-objective – May be multi-objective optimization (maximize X subject to Y, Z,..) • Properties of the item pool – Size (e.g., all web pages vs. 40 stories) – Quality of the pool (e.g., anything vs. editorially selected) – Lifetime (e.g., mostly old items vs. mostly new items) Important Factors affecting solution in Match-making Problems
  • 11. 11 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Factors affecting Solution continued • Properties of the opportunities – Pull: Specified by explicit, user-driven query (e.g., keywords, a form) – Push: Specified by implicit context (e.g., a page, a user, a session) – Size (e.g., user base); continuity (e.g., session vs. single event) • Properties of the feedback on the matches made – Types and semantics of feedback (e.g., click, vote) – Latency (e.g., available in 5 minutes vs. 1 day) – Volume (e.g., 100K per day vs. 300M per day) • Constraints specifying legitimate matches (e.g., business rules) • Available Metadata (e.g., link graph, various user/item attributes)
  • 12. 12 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Recommendation vs. Other Match-Making Problems Ads Anything (except for ads) Anything (except for ads) Items Sponsored search Content match Behavior targeting Display advertising (non-guaranteed) Web search Vertical search Recommend articles, friends, feeds to users Recommend related items given an item Examples Push Pull (explicit) Users specify their info needs Push (implicit) The system guesses users info needs Opportunities Revenue Relevance to the query User engagement Main Metric Advertising Search Recommendation
  • 13. 13 Deepak Agarwal & Bee-Chung Chen @ KDD’10 More on Recommendation vs. Search • Recommendation – User intent: See something “interesting” (browse mode, implicit) • The system tries to guess what a user likes on an entity/topic page – No query reformulation (unless we suggest related topics/entities) – False +ve more costly than false –ve • Showing a bad article is a worse than missing a good one • Search – User intent: Explicit, users express what they want – Users can reformulate queries – False –ve more costly (but depends on the query) • Users want to get the results they are looking for
  • 14. 14 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Modeling: Key components Offline (Logistic, GBDT,..) Feature construction Content: IR, clustering, taxonomy, entity,.. User profiles: clicks, views, social, community,.. Online (Fine resolution Corrections) (item, user level) (Quick updates) Explore/Exploit (Adaptive sampling) Initialize
  • 15. 15 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Modeling Problems that has received attention • Univariate response (e.g. click); single objective (e.g. maximize CTR) • Our solution – Initialize online through offline models – Learn “corrections” to offline models at very granular levels (user, item) and learn rapidly in an online fashion – Online correction models have reduced dimension through clever representations of parameters and by exploiting the fallback mechanism to coarser models – The models are tightly coupled with Explore-exploit to ensure fast convergence to areas of high valued response
  • 16. Example Application: Example Application: Today Module on Yahoo! Homepage Today Module on Yahoo! Homepage Currently in production powered by some methods discussed in this tutorial
  • 17. 17 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Recommend packages: Image Title, summary Links to other pages Pick 4 out of a pool of K K = 20 ~ 40 Dynamic Routes traffic other pages 1 1 2 2 3 3 4 4
  • 18. 18 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Problem definition • Display “best” articles for each user visit • Best - Maximize User Satisfaction, Engagement – BUT Hard to obtain quick feedback to measure these • Approximation – Maximize utility based on immediate feedback (click rate) subject to constraints (relevance, freshness, diversity) • Inventory of articles? – Created by human editors – Small pool (30-50 articles) but refreshes periodically
  • 19. 19 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Where are we today? • Before this research – Articles created and selected for display by editors • After this research – Article placement done through statistical models • How successful ? "Just look at our homepage, for example. Since we began pairing our content optimization technology with editorial expertise, we've seen click-through rates in the Today module more than double. ----- Carol Bartz, CEO Yahoo! Inc (Q4, 2009)
  • 20. 20 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Main Goals • Methods to select most popular articles – This was done by editors before • Provide personalized article selection – Based on user covariates – Based on per user behavior • Scalability: Methods to generalize in small traffic scenarios – Today module part of most Y! portals around the world – Also syndicated to sources like Y! Mail, Y! IM etc
  • 21. 21 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Similar applications • Goal: Use same methods for selecting most popular, personalization across different applications at Y! • Good news! Methods generalize, already in use
  • 22. 22 Deepak Agarwal & Bee-Chung Chen @ KDD’10 Next few hours Prior estimation, dimension reduction Prior estimation Intelligent Initialization Bandits with covariates Multi-armed bandits Explore/Exploit Incremental CF, online regression Time-series models Online Models Collaborative filtering (cold-start problem) Offline Models Personalized Recommendation Most Popular Recommendation