ICML’11 Tutorial:
Recommender Problems for Web Applications
Deepak Agarwal and Bee-Chung Chen
Yahoo! Research
Other Significant Y! Labs Contributors

• Content Targeting
    –   Pradheep Elango
    –   Rajiv Khanna
    –   Raghu Ramakrishnan
    –   Xuanhui Wang
    –   Liang Zhang


• Ad Targeting
    – Nagaraj Kota




  Deepak Agarwal & Bee-Chung Chen @ ICML’11   2
Agenda

• Topic of Interest
    – Recommender problems for dynamic, time-sensitive applications
       • Content Optimization (main focus today), Online Advertising,
         Movie recommendation, shopping,…
• Introduction (20 min, Deepak)
• Offline components (40 min, Deepak)
    – Regression, Collaborative filtering (CF), …
• Online components + initialization (70 min, Bee-Chung)
    – Time-series, online/incremental methods, explore/exploit (bandit)
• Evaluation methods + Multi-Objective (10-15 min, Deepak)
• Challenges (5-10 min, Deepak)


Three components we will focus on today

• Defining the problem
    – Formulate objectives whose optimization achieves some long-term
      goals for the recommender system
         • E.g., how to serve content to optimize audience reach and engagement,
           or some combination of engagement and revenue?

• Modeling (to estimate some critical inputs)
    – Predict rates of some positive user interaction(s) with items based
      on data obtained from historical user-item interactions
       • E.g. Click rates, average time-spent on page, etc
       • Could be explicit feedback like ratings
• Experimentation
    – Create experiments to collect data proactively to improve models;
      this helps converge to the best choice(s) cheaply and rapidly.
       • Explore and Exploit (continuous experimentation)
       • Design of experiments (DOE): testing hypotheses while avoiding bias inherent in data
Modern Recommendation Systems

• Goal
    – Serve the right item to a user in a given context to optimize long-
      term business objectives
• A scientific discipline that involves
    – Large scale Machine Learning & Statistics
         • Offline Models (capture global & stable characteristics)
         • Online Models (incorporate dynamic components)
         • Explore/Exploit (active and adaptive experimentation)
    – Multi-Objective Optimization
         • Click-rates (CTR), Engagement, advertising revenue, diversity, etc
    – Inferring user interest
         • Constructing User Profiles
    – Natural Language Processing to understand content
         • Topics, “aboutness”, entities, follow-up of something, breaking news,…


Some examples from content optimization

• Simple version
    – I have a content module on my page, content inventory is obtained
      from a third party source which is further refined through editorial
      oversight. Can I algorithmically recommend content on this
      module? I want to improve overall click-rate (CTR) on this module
• More advanced
    – I got X% lift in CTR. But I have additional information on other
      downstream utilities (e.g. advertising revenue). Can I increase
      downstream utility without losing too many clicks?
• Highly advanced
    – There are multiple modules running on my webpage. How do I
      perform a simultaneous optimization?




[Figure: page screenshot with the following callouts]

• Recommend search queries
• Recommend packages:
    – Image
    – Title, summary
    – Links to other pages
• Pick 4 out of a pool of K
    – K = 20 ~ 40
    – Dynamic
• Routes traffic to other pages
• Recommend applications
• Recommend news article
Problems in this example

• Optimize CTR on multiple modules
    – Today Module, Trending Now, Personal Assistant, News
    – Simple solution: Treat modules as independent, optimize
      separately. May not be the best when there are strong correlations.



• For any single module
    – Optimize some combination of CTR, downstream engagement,
      and perhaps advertising revenue.




Online Advertising

[Diagram: ad-serving flow]
• User → Page (Publisher) → Ad Network
• Advertisers → Bids → Auction
• ML/Statistical model → Response rates (click, conversion, ad-view) → Auction
• Auction: Select argmax f(bid, response rates) → Recommend best ad(s) → Ads on Page
• User feedback: click, conversion
• Ad Network examples: Yahoo, Google, MSN, …; ad exchanges (RightMedia, DoubleClick, …)
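
The selection step "argmax f(bid, response rates)" can be sketched as follows. This is a minimal illustration, not the method used by any particular ad network: it assumes f is the expected revenue per impression (bid times predicted click rate), and the `Ad` class, the ad names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float            # advertiser's bid per click (hypothetical units)
    predicted_ctr: float  # click probability from the ML/statistical model


def expected_revenue(ad: Ad) -> float:
    """One possible f(bid, response rate): expected revenue per impression."""
    return ad.bid * ad.predicted_ctr


def select_best_ads(ads, k=1):
    """Return the k highest-scoring ads (the argmax when k == 1)."""
    return sorted(ads, key=expected_revenue, reverse=True)[:k]


# Hypothetical candidates: a high bid does not win if its response rate is low.
candidates = [
    Ad("ad_a", bid=2.00, predicted_ctr=0.010),
    Ad("ad_b", bid=0.50, predicted_ctr=0.050),
    Ad("ad_c", bid=1.00, predicted_ctr=0.015),
]
best = select_best_ads(candidates, k=1)[0]
print(best.name, expected_revenue(best))
```

Other choices of f (for example, weighting conversions differently from clicks) fit the same structure by swapping the scoring function.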
Recommender problems in general

  • Example applications
      • Search: Web, Vertical
      • Online Advertising
      • Content
      • …
  • Item Inventory: articles, web pages, ads, …
  • Context: query, page, …
  • Loop with the USER
      • Use an automated algorithm to select item(s) to show
      • Get feedback (click, time spent, …)
      • Refine the models
      • Repeat (large number of times)
  • Optimize metric(s) of interest (Total clicks, Total revenue, …)
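
The select / get feedback / refine / repeat loop can be sketched with a simple simulation. This is only an illustration under stated assumptions: the selection rule is epsilon-greedy (one of many explore/exploit schemes, chosen here for brevity), feedback is a simulated click, and the function name, click rates, and parameters are all hypothetical.

```python
import random


def run_epsilon_greedy(true_ctrs, n_visits=20000, epsilon=0.1, seed=0):
    """Simulate the serve/feedback/refine loop over a fixed item pool."""
    rng = random.Random(seed)
    n_items = len(true_ctrs)
    shows = [0] * n_items
    clicks = [0] * n_items

    def estimated_ctr(i):
        # Items never shown get an optimistic estimate so they are tried.
        return clicks[i] / shows[i] if shows[i] else 1.0

    for _ in range(n_visits):
        # Select an item: explore at random with probability epsilon,
        # otherwise exploit the item with the best current estimate.
        if rng.random() < epsilon:
            item = rng.randrange(n_items)
        else:
            item = max(range(n_items), key=estimated_ctr)
        # Get feedback: a simulated click drawn from the item's true rate.
        clicked = rng.random() < true_ctrs[item]
        # Refine the model: here, just update the running counts.
        shows[item] += 1
        clicks[item] += int(clicked)
    return shows, clicks


shows, clicks = run_epsilon_greedy([0.02, 0.05, 0.10])
print("impressions per item:", shows)
print("total clicks:", sum(clicks))
```

The metric of interest in this sketch is total clicks; a revenue metric would only change what is accumulated in the feedback step.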


Important Factors

  • Items: Articles, ads, modules, movies, users, updates, etc.

  • Context: query keywords, pages, mobile, social media, etc.

  • Metric to optimize (e.g., relevance score, CTR, revenue, engagement)
      – Currently, most applications are single-objective
      – Could be multi-objective optimization (maximize X subject to Y, Z,..)


  • Properties of the item pool
      – Size (e.g., all web pages vs. 40 stories)
      – Quality of the pool (e.g., anything vs. editorially selected)
      – Lifetime (e.g., mostly old items vs. mostly new items)




Factors affecting Solution (continued)

• Properties of the context
    – Pull: Specified by explicit, user-driven query (e.g., keywords, a form)
    – Push: Specified by implicit context (e.g., a page, a user, a session)
       • Most applications are somewhere on the continuum between pull and push

• Properties of the feedback on the matches made
    – Types and semantics of feedback (e.g., click, vote)
    – Latency (e.g., available in 5 minutes vs. 1 day)
    – Volume (e.g., 100K per day vs. 300M per day)

• Constraints specifying legitimate matches
    – E.g., business rules, diversity rules, editorial voice
    – Multiple objectives
• Available Metadata (e.g., link graph, various user/item attributes)


Predicting User-Item Interactions (e.g. CTR)

• Myth: We have so much data on the web that, if only we could
  process it, the problem would be solved
    – Number of things to learn increases with sample size
       • Rate of increase is not slow
    – Dynamic nature of systems makes things worse
    – We want to learn things quickly and react fast


• Data is sparse in web recommender problems
    – We lack enough data to learn all we want to learn and as quickly
      as we would like to learn
    – Several Power laws interacting with each other
       • E.g. User visits power law, items served power law
               – Bivariate Zipf: Owen & Dyer, 2011


Can Machine Learning help?

• Fortunately, there are group behaviors that generalize to
  individuals & they are relatively stable
    – E.g. Users in San Francisco tend to read more baseball news


• Key issue: Estimating such groups
    – Coarse group : more stable but does not generalize that well.
    – Granular group: less stable with few individuals
    – A good grouping structure hits the “sweet spot” between the two


• Another big advantage on the web
    – Intervene and run small experiments on a small population to
      collect data that helps rapid convergence to the best choice(s)
         • We don’t need to learn all user-item interactions, only those that are good.


Predicting user-item interaction rates

[Diagram: modeling pipeline]
• Feature construction
    – Content: IR, clustering, taxonomy, entity, …
    – User profiles: clicks, views, social, community, …
• Offline (Logistic, Boosting, …)
    – Captures stable characteristics at coarse resolutions
    – Initializes the online component
• Online (quick updates)
    – Finer resolution corrections (item, user level)
• Explore/Exploit (adaptive sampling)
    – Helps rapid convergence to best choices
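
One way to picture "offline initializes online" is a Beta-Binomial update: the offline model's coarse CTR prediction becomes a prior, and item-level click counts correct it as they arrive. This is a sketch under that assumption, not necessarily the model used in the tutorial; `CtrEstimate`, `init_from_offline`, the prior strength, and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CtrEstimate:
    alpha: float  # pseudo-count of clicks
    beta: float   # pseudo-count of non-clicks

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, clicks: int, views: int) -> "CtrEstimate":
        """Online correction: fold newly observed item-level counts into the estimate."""
        return CtrEstimate(self.alpha + clicks, self.beta + (views - clicks))


def init_from_offline(offline_ctr: float, prior_strength: float) -> CtrEstimate:
    """Turn a coarse offline prediction into a prior worth `prior_strength` views."""
    return CtrEstimate(offline_ctr * prior_strength,
                       (1.0 - offline_ctr) * prior_strength)


# Offline model predicts 4% CTR for a new item; trust it like 100 views.
estimate = init_from_offline(offline_ctr=0.04, prior_strength=100)
print(round(estimate.mean, 4))  # starts at the offline prediction

# The item then gets 30 clicks in 300 views; the estimate moves toward 10%.
estimate = estimate.update(clicks=30, views=300)
print(round(estimate.mean, 4))
```

A larger prior strength makes the estimate lean on the offline model for longer; a smaller one lets online data dominate sooner.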

Post-click: An example in Content Optimization

[Diagram: post-click flow]
• Recommender + EDITORIAL → content
• Clicks on FP links influence downstream supply distribution
• AD SERVER → DISPLAY ADVERTISING → Revenue
• Downstream engagement (Time spent)




Serving Content on Front Page: Click Shaping

• What do we want to optimize?
•   Current: Maximize clicks (maximize downstream supply from FP)
•   But consider the following
      – Article 1: CTR = 5%, utility per click = 5
      – Article 2: CTR = 4.9%, utility per click = 10
         • By promoting 2, we lose 1 click per 1,000 visits and gain 240 utils
•   If we do this for a large number of visits, can we lose some clicks but obtain
    significant gains in utility?
      – E.g. lose 5% relative CTR, gain 40% in utility (revenue, engagement, etc.)
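
The two-article trade-off can be worked through numerically from the stated CTRs and per-click utilities. The sketch below is only arithmetic; the 1,000-visit horizon and the function name `expected_totals` are assumptions made for illustration.

```python
def expected_totals(ctr, utility_per_click, visits):
    """Expected clicks and expected utility over a number of visits."""
    clicks = ctr * visits
    return clicks, clicks * utility_per_click


visits = 1000  # assumed horizon, chosen so the click difference is a whole click
clicks_1, utility_1 = expected_totals(0.05, 5, visits)    # Article 1
clicks_2, utility_2 = expected_totals(0.049, 10, visits)  # Article 2

clicks_lost = clicks_1 - clicks_2        # about 1 click per 1,000 visits
utility_gained = utility_2 - utility_1   # about 240 utils per 1,000 visits

print(f"clicks lost:    {clicks_lost:.1f}")
print(f"utility gained: {utility_gained:.1f}")
```

Under these numbers, promoting Article 2 gives up roughly 2% of clicks in relative terms while nearly doubling expected utility.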




Example Application: Today Module on Yahoo! Homepage

Currently in production, powered by some methods discussed in this tutorial

[Figure: Today Module screenshot with positions labeled 1 to 4]
• Recommend packages:
    – Image
    – Title, summary
    – Links to other pages
• Pick 4 out of a pool of K
    – K = 20 ~ 40
    – Dynamic
• Routes traffic to other pages



Problem definition

• Display “best” articles for each user visit
• Best: maximize user satisfaction and engagement
    – BUT it is hard to obtain quick feedback to measure these


• Approximation
    – Maximize utility based on immediate feedback (click rate) subject
      to constraints (relevance, freshness, diversity)
• Inventory of articles?
    – Created by human editors
    – Small pool (30-50 articles) but refreshes periodically
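
The approximation "maximize click rate subject to constraints" can be sketched as a greedy fill of the module's slots. This is an illustrative simplification, not the production method: the only constraint modeled is a hypothetical diversity rule (at most two articles per topic), and the article pool, topics, and CTR estimates are made up.

```python
def pick_slots(pool, n_slots=4, max_per_topic=2):
    """Greedily take the highest-CTR articles while respecting a per-topic cap."""
    chosen = []
    per_topic = {}
    for article in sorted(pool, key=lambda a: a["ctr"], reverse=True):
        topic = article["topic"]
        if per_topic.get(topic, 0) >= max_per_topic:
            continue  # diversity rule: skip once a topic has reached its cap
        chosen.append(article)
        per_topic[topic] = per_topic.get(topic, 0) + 1
        if len(chosen) == n_slots:
            break
    return chosen


# Hypothetical editor-created pool with estimated click rates.
pool = [
    {"id": "a1", "topic": "sports", "ctr": 0.060},
    {"id": "a2", "topic": "sports", "ctr": 0.055},
    {"id": "a3", "topic": "sports", "ctr": 0.050},
    {"id": "a4", "topic": "finance", "ctr": 0.040},
    {"id": "a5", "topic": "news", "ctr": 0.030},
    {"id": "a6", "topic": "news", "ctr": 0.020},
]
picked = [article["id"] for article in pick_slots(pool)]
print(picked)
```

Here the third sports article is passed over despite its higher CTR, which is the cost of the diversity constraint in clicks.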




Where are we today?

• Before this research
    – Articles created and selected for display by editors
• After this research
    – Article placement done through statistical models
• How successful?
 "Just look at our homepage, for example. Since we began pairing our content
   optimization technology with editorial expertise, we've seen click-through rates
   in the Today module more than double." (Carol Bartz, CEO, Yahoo! Inc., Q4 2009)




Main Goals

• Methods to select most popular articles
    – This was done by editors before


• Provide personalized article selection
    – Based on user covariates
    – Based on per user behavior


• Scalability: Methods to generalize in small traffic scenarios
    – Today module is part of most Y! portals around the world
    – Also syndicated to sources like Y! Mail, Y! IM, etc.




Similar applications

•   Goal: Use the same methods for most-popular selection and personalization
    across different applications at Y!
•   Good news! The methods generalize and are already in use




Next few hours



                                   Most Popular          Personalized
                                   Recommendation        Recommendation

  Offline Models                                         Collaborative filtering
                                                         (cold-start problem)

  Online Models                    Time-series models    Incremental CF,
                                                         online regression

  Intelligent Initialization       Prior estimation      Prior estimation,
                                                         dimension reduction

  Explore/Exploit                  Multi-armed bandits   Bandits with covariates





Recommender Systems Tutorial (Part 1) -- Introduction

  • 1. ICML’11 Tutorial: Recommender Problems for Web Applications Deepak Agarwal and Bee-Chung Chen Yahoo! Research
  • 2. Other Significant Y! Labs Contributors • Content Targeting – Pradheep Elango – Rajiv Khanna – Raghu Ramakrishnan – Xuanhui Wang – Liang Zhang • Ad Targeting – Nagaraj Kota Deepak Agarwal & Bee-Chung Chen @ ICML’11 2
  • 3. Agenda • Topic of Interest – Recommender problems for dynamic, time-sensitive applications • Content Optimization (main focus today), Online Advertising, Movie recommendation, shopping, … • Introduction (20 min, Deepak) • Offline components (40 min, Deepak) – Regression, Collaborative filtering (CF), … • Online components + initialization (70 min, Bee-Chung) – Time-series, online/incremental methods, explore/exploit (bandit) • Evaluation methods + Multi-Objective (10-15 min, Deepak) • Challenges (5-10 min, Deepak)
  • 4. Three components we will focus on today • Defining the problem – Formulate objectives whose optimization achieves some long-term goals for the recommender system • E.g., how to serve content to optimize audience reach and engagement, or to optimize some combination of engagement and revenue? • Modeling (to estimate some critical inputs) – Predict rates of some positive user interaction(s) with items based on data obtained from historical user-item interactions • E.g., click rates, average time spent on page, etc. • Could be explicit feedback like ratings • Experimentation – Create experiments to collect data proactively to improve models, which helps in converging to the best choice(s) cheaply and rapidly • Explore and Exploit (continuous experimentation) • DOE (design of experiments: testing hypotheses by avoiding bias inherent in data)
  • 5. Modern Recommendation Systems • Goal – Serve the right item to a user in a given context to optimize long-term business objectives • A scientific discipline that involves – Large-scale Machine Learning & Statistics • Offline Models (capture global & stable characteristics) • Online Models (incorporate dynamic components) • Explore/Exploit (active and adaptive experimentation) – Multi-Objective Optimization • Click rates (CTR), engagement, advertising revenue, diversity, etc. – Inferring user interest • Constructing User Profiles – Natural Language Processing to understand content • Topics, “aboutness”, entities, follow-up of something, breaking news, …
  • 6. Some examples from content optimization • Simple version – I have a content module on my page; the content inventory is obtained from a third-party source and further refined through editorial oversight. Can I algorithmically recommend content on this module? I want to improve the overall click rate (CTR) on this module. • More advanced – I got X% lift in CTR, but I have additional information on other downstream utilities (e.g., advertising revenue). Can I increase downstream utility without losing too many clicks? • Highly advanced – There are multiple modules running on my webpage. How do I perform a simultaneous optimization?
  • 7. [Figure: annotated page screenshot] Recommend search queries • Recommend packages (image; title, summary; links to other pages): pick 4 out of a pool of K, K = 20 ~ 40; dynamic; routes traffic to other pages • Recommend applications • Recommend news articles
  • 8. Problems in this example • Optimize CTR on multiple modules – Today Module, Trending Now, Personal Assistant, News – Simple solution: treat modules as independent and optimize each separately. This may not be the best approach when there are strong correlations between modules. • For any single module – Optimize some combination of CTR, downstream engagement, and perhaps advertising revenue.
  • 9. Online Advertising [Figure: ad-serving flow] • Advertisers submit bids to the Ad Network • ML/statistical model estimates response rates (click, conversion, ad-view) • Auction: select argmax f(bid, response rates) • Recommend the best ad(s) on the publisher's page to the user, who may click and convert • Examples: Yahoo, Google, MSN, …; ad exchanges (RightMedia, DoubleClick, …)
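The selection step "argmax f(bid, response rates)" can be illustrated with a small sketch. The slide leaves f unspecified; the code below assumes f is expected revenue per impression, bid × predicted click rate, which is one common choice and not necessarily the one the authors have in mind. The `Ad` type, the ad names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    name: str
    bid: float  # amount paid per click (hypothetical units)
    ctr: float  # predicted click rate from some response model


def expected_revenue(ad: Ad) -> float:
    # Assumed scoring function: f(bid, response rate) = bid * CTR.
    return ad.bid * ad.ctr


def select_best_ad(ads: list[Ad]) -> Ad:
    # argmax over the candidate ads under the assumed f.
    return max(ads, key=expected_revenue)


candidates = [
    Ad("ad_a", bid=2.00, ctr=0.010),
    Ad("ad_b", bid=0.50, ctr=0.050),
    Ad("ad_c", bid=1.00, ctr=0.020),
]
best = select_best_ad(candidates)
print(best.name, expected_revenue(best))
```

Under this assumed f, the ad with the highest bid does not win: a lower bid paired with a higher predicted click rate yields more expected revenue per impression.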
  • 10. Recommender problems in general • Example applications: Search (Web, Vertical), Online Advertising, Content, … • Item inventory: articles, web pages, ads, … • Context: query, page, … • For each USER: use an automated algorithm to select item(s) to show, get feedback (click, time spent, …), refine the models, repeat (large number of times) • Optimize metric(s) of interest (total clicks, total revenue, …)
  • 11. Important Factors • Items: Articles, ads, modules, movies, users, updates, etc. • Context: query keywords, pages, mobile, social media, etc. • Metric to optimize (e.g., relevance score, CTR, revenue, engagement) – Currently, most applications are single-objective – Could be multi-objective optimization (maximize X subject to Y, Z,..) • Properties of the item pool – Size (e.g., all web pages vs. 40 stories) – Quality of the pool (e.g., anything vs. editorially selected) – Lifetime (e.g., mostly old items vs. mostly new items)
  • 12. Factors affecting Solution (continued) • Properties of the context – Pull: Specified by explicit, user-driven query (e.g., keywords, a form) – Push: Specified by implicit context (e.g., a page, a user, a session) • Most applications are somewhere on the continuum between pull and push • Properties of the feedback on the matches made – Types and semantics of feedback (e.g., click, vote) – Latency (e.g., available in 5 minutes vs. 1 day) – Volume (e.g., 100K per day vs. 300M per day) • Constraints specifying legitimate matches – e.g., business rules, diversity rules, editorial voice – Multiple objectives • Available metadata (e.g., link graph, various user/item attributes)
  • 13. Predicting User-Item Interactions (e.g. CTR) • Myth: We have so much data on the web that, if we can only process it, the problem is solved – The number of things to learn increases with sample size • The rate of increase is not slow – The dynamic nature of these systems makes things worse – We want to learn things quickly and react fast • Data is sparse in web recommender problems – We lack enough data to learn all we want to learn, as quickly as we would like to learn it – Several power laws interact with each other • E.g., user visits follow a power law, and items served follow a power law – Bivariate Zipf: Owen & Dyer, 2011
  • 14. Can Machine Learning help? • Fortunately, there are group behaviors that generalize to individuals, and they are relatively stable – E.g., users in San Francisco tend to read more baseball news • Key issue: Estimating such groups – Coarse group: more stable but does not generalize that well – Granular group: less stable, with few individuals – Getting a good grouping structure means hitting the “sweet spot” • Another big advantage on the web – Intervene and run small experiments on a small population to collect data that helps rapid convergence to the best choice(s) • We don’t need to learn all user-item interactions, only those that are good
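The tension between coarse and granular groups can be made concrete with a small sketch. It assumes a simple pseudo-count smoothing scheme in which a granular group's click rate is pulled toward the rate of its coarse parent group. This scheme is an illustration only, not a method taken from the slides, and `shrunk_ctr`, `prior_views`, and all numbers are hypothetical.

```python
def shrunk_ctr(clicks: int, views: int, parent_ctr: float,
               prior_views: float = 100.0) -> float:
    # Treat the coarse group's CTR as `prior_views` pseudo-views that are
    # added to the granular group's own data before computing the rate.
    return (clicks + prior_views * parent_ctr) / (views + prior_views)


coarse_ctr = 0.04  # stable estimate from a large, coarse group

# Granular group with very little data: raw CTR is 1/5 = 0.20.
sparse = shrunk_ctr(clicks=1, views=5, parent_ctr=coarse_ctr)

# Granular group with plenty of data: raw CTR is 600/10000 = 0.06.
dense = shrunk_ctr(clicks=600, views=10_000, parent_ctr=coarse_ctr)

print(round(sparse, 4), round(dense, 4))
```

With few observations the estimate stays close to the coarse group's rate; with many observations it follows the granular group's own data. The value of `prior_views` controls where that hand-over happens.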
  • 15. Predicting user-item interaction rates • Feature construction – Content: IR, clustering, taxonomy, entity, … – User profiles: clicks, views, social, community, … • Offline (captures stable characteristics at coarse resolutions) (Logistic, Boosting, …) • Offline models initialize the Online component • Online (finer-resolution corrections) (item, user level) (quick updates) • Explore/Exploit (adaptive sampling) (helps rapid convergence to best choices)
  • 16. Post-click: An example in Content Optimization [Figure: post-click flow] • Recommender and editorial content • Clicks on FP links influence downstream supply distribution • Ad server: display advertising, revenue • Downstream engagement (time spent)
  • 17. Serving Content on Front Page: Click Shaping • What do we want to optimize? • Current: Maximize clicks (maximize downstream supply from FP) • But consider the following – Article 1: CTR = 5%, utility per click = 5 – Article 2: CTR = 4.9%, utility per click = 10 • By promoting Article 2, we lose about 1 click per 1,000 visits but gain 5 utils per click • If we do this for a large number of visits, can we lose some clicks but obtain significant gains in utility? – E.g., lose 5% relative CTR, gain 40% in utility (revenue, engagement, etc.)
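The two-article example can be checked with a few lines of arithmetic. The sketch assumes each visit shows exactly one article and that utility accrues only when the article is clicked; the function name and the 1,000-visit horizon are illustrative choices.

```python
def per_visit(ctr: float, utility_per_click: float) -> tuple[float, float]:
    # Expected clicks and expected utility for a single visit.
    return ctr, ctr * utility_per_click


clicks_1, util_1 = per_visit(0.05, 5)    # Article 1
clicks_2, util_2 = per_visit(0.049, 10)  # Article 2

visits = 1_000
click_loss = (clicks_1 - clicks_2) * visits  # clicks given up by showing Article 2
util_gain = (util_2 - util_1) * visits       # utility gained by showing Article 2

print(f"clicks lost per {visits} visits: {click_loss:.2f}")
print(f"utility gained per {visits} visits: {util_gain:.2f}")
print(f"relative CTR change: {(clicks_2 - clicks_1) / clicks_1:+.1%}")
print(f"relative utility change: {(util_2 - util_1) / util_1:+.1%}")
```

With these numbers, expected utility per visit is 0.25 for Article 1 and 0.49 for Article 2, so a small relative drop in clicks buys a large relative gain in utility.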
  • 18. Example Application: Today Module on Yahoo! Homepage Currently in production, powered by some methods discussed in this tutorial
  • 19. [Figure: Today Module] Recommend packages (image; title, summary; links to other pages) in slots 1 2 3 4 • Pick 4 out of a pool of K, K = 20 ~ 40 • Dynamic • Routes traffic to other pages
  • 20. Problem definition • Display “best” articles for each user visit • Best - Maximize User Satisfaction, Engagement – BUT Hard to obtain quick feedback to measure these • Approximation – Maximize utility based on immediate feedback (click rate) subject to constraints (relevance, freshness, diversity) • Inventory of articles? – Created by human editors – Small pool (30-50 articles) but refreshes periodically
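One way to picture "maximize click rate subject to constraints" is a greedy pick of the highest-CTR articles with a cap on how many may share a topic. This is a hypothetical heuristic for a single diversity rule, not the formulation used in the tutorial; the field names, the cap, and the article pool are all made up for illustration.

```python
from collections import Counter


def pick_articles(pool, k=4, max_per_topic=2):
    """Greedy: take articles in decreasing estimated CTR, skipping any
    article whose topic has already reached the cap."""
    chosen = []
    per_topic = Counter()
    for article in sorted(pool, key=lambda a: a["ctr"], reverse=True):
        if per_topic[article["topic"]] < max_per_topic:
            chosen.append(article)
            per_topic[article["topic"]] += 1
        if len(chosen) == k:
            break
    return chosen


pool = [
    {"id": "a1", "topic": "sports", "ctr": 0.060},
    {"id": "a2", "topic": "sports", "ctr": 0.055},
    {"id": "a3", "topic": "sports", "ctr": 0.050},
    {"id": "a4", "topic": "finance", "ctr": 0.040},
    {"id": "a5", "topic": "news", "ctr": 0.030},
    {"id": "a6", "topic": "news", "ctr": 0.010},
]
selected = [a["id"] for a in pick_articles(pool)]
print(selected)
```

The third sports article is skipped despite its higher estimated CTR, which is the kind of click loss a diversity constraint accepts by design.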
  • 21. Where are we today? • Before this research – Articles created and selected for display by editors • After this research – Article placement done through statistical models • How successful? "Just look at our homepage, for example. Since we began pairing our content optimization technology with editorial expertise, we've seen click-through rates in the Today module more than double." (Carol Bartz, CEO, Yahoo! Inc., Q4 2009)
  • 22. Main Goals • Methods to select most popular articles – This was done by editors before • Provide personalized article selection – Based on user covariates – Based on per user behavior • Scalability: Methods to generalize in small traffic scenarios – Today module part of most Y! portals around the world – Also syndicated to sources like Y! Mail, Y! IM etc
  • 23. Similar applications • Goal: Use same methods for selecting most popular, personalization across different applications at Y! • Good news! Methods generalize, already in use
  • 24. Next few hours (each row: Most Popular Recommendation / Personalized Recommendation) • Offline Models: (none listed) / Collaborative filtering (cold-start problem) • Online Models: Time-series models / Incremental CF, online regression • Intelligent Initialization: Prior estimation / Prior estimation, dimension reduction • Explore/Exploit: Multi-armed bandits / Bandits with covariates
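To give a feel for the explore/exploit entry, here is a minimal simulation of a multi-armed bandit for most-popular recommendation. It uses epsilon-greedy, chosen only because it is short to write; it is not necessarily one of the schemes covered in the tutorial. The click rates, round count, and function name are hypothetical, and clicks are simulated as independent Bernoulli draws.

```python
import random


def epsilon_greedy(true_ctrs, rounds=10_000, epsilon=0.1, seed=0):
    """Serve one article per round; return per-article serve and click counts."""
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms
    clicks = [0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: serve a uniformly random article.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: serve the article with the best empirical CTR so far
            # (unserved articles count as 0.0; ties go to the lowest index).
            estimates = [c / p if p else 0.0 for c, p in zip(clicks, pulls)]
            arm = max(range(n_arms), key=estimates.__getitem__)
        pulls[arm] += 1
        if rng.random() < true_ctrs[arm]:  # simulated user click
            clicks[arm] += 1
    return pulls, clicks


# Hypothetical articles; the gaps are large so the simulation separates them.
pulls, clicks = epsilon_greedy([0.05, 0.10, 0.30])
print(pulls, clicks)
```

A fixed epsilon keeps spending a constant share of traffic on exploration, which is the cost that more adaptive bandit schemes try to reduce.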

Editor's Notes

  1. This only shows one scenario: that of content match. Let’s add Sponsored Search (replace Content with Query) and have a new slide for display advertising. This also does not provide information on the revenue model (shall we add it here or later?).