Causal inference in Recommender Systems




What is the impact of a recommender system? In a typical three-way interaction between users, items and the platform, a recommender system can have differing impacts on the three stakeholders, and there can be multiple metrics based on utility, diversity, and fairness. One way to measure impact is through randomized A/B tests, but experiments are costly and can only be applied for short-term outcomes. This talk describes a unifying framework based on causality that can be used to answer such questions. Using the example of a recommender system's effect on increasing sales for a platform, I will discuss the four steps that form the basis of a causal analysis: modeling the causal mechanism, identifying the correct estimand, estimation, and finally checking robustness of the obtained estimates. Utilizing independence assumptions common in click log data, this process led to a new method for estimating the impact of recommendations, called the split-door causal criterion. In the latter half of the talk, I will show how the four steps can be used to address other questions such as selection bias, missing data, and fairness questions about a recommender system.
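
The four steps named above can be illustrated on simulated data. The following is a minimal sketch, not the method from the talk: the data-generating process, the `interest` confounder, all probabilities, and the function names are hypothetical, and the estimator is plain stratification on an observed confounder (the talk's case study deals with unobserved demand, which this toy setup sidesteps).

```python
import random

random.seed(0)
TRUE_EFFECT = 0.10  # hypothetical lift in purchase probability caused by a recommendation


def simulate(n=100_000):
    """Step 1 (model): interest drives both exposure and purchase (a confounder)."""
    rows = []
    for _ in range(n):
        interest = 1 if random.random() < 0.5 else 0
        shown = 1 if random.random() < (0.8 if interest else 0.2) else 0
        p_buy = 0.05 + 0.30 * interest + TRUE_EFFECT * shown
        bought = 1 if random.random() < p_buy else 0
        rows.append((interest, shown, bought))
    return rows


def rate(values):
    return sum(values) / len(values)


def naive_diff(rows):
    """Observed difference in purchase rate, ignoring the confounder."""
    treated = [b for _, s, b in rows if s == 1]
    control = [b for _, s, b in rows if s == 0]
    return rate(treated) - rate(control)


def adjusted_diff(rows):
    """Steps 2-3 (identify, estimate): backdoor adjustment by stratifying on interest."""
    total = 0.0
    for level in (0, 1):
        stratum = [(s, b) for i, s, b in rows if i == level]
        treated = [b for s, b in stratum if s == 1]
        control = [b for s, b in stratum if s == 0]
        total += len(stratum) / len(rows) * (rate(treated) - rate(control))
    return total


def placebo_diff(rows):
    """Step 4 (robustness): a random placebo treatment should show roughly no effect."""
    fake = [(i, 1 if random.random() < 0.5 else 0, b) for i, _, b in rows]
    return adjusted_diff(fake)


rows = simulate()
naive = naive_diff(rows)
adjusted = adjusted_diff(rows)
placebo = placebo_diff(rows)
print(f"naive={naive:.3f} adjusted={adjusted:.3f} placebo={placebo:.3f}")
```

In this setup the naive difference overstates the effect because interested users are both more likely to see the recommendation and more likely to buy; the stratified estimate should land near `TRUE_EFFECT`, and the placebo estimate near zero.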



  1. 1. Causal Inference in Recommender Systems Amit Sharma Senior Researcher, Microsoft Research India @amt_shrma Invited Talk: REVEAL Workshop @ACM RecSys 2020
  2. 2. 2 Quartz
  3. 3. How to evaluate a recommender system? Accuracy • Is the predicted rating similar to a user’s rating? • Does the user click on a recommendation? Coverage • Does the system exclude certain items from recommendation? Diversity • Does the system recommend items different from each other? Insufficient for the questions we want to answer. Does the recommender system increase revenue? Does it shape what people buy or consume? Does it create “echo chambers” or make people more polarized?
  4. 4. Simple example: The “Harry Potter” Problem Suppose a recommender always recommends the next book by the same author. High accuracy and high coverage system. Diversity can also be high if user reads diverse genres of books. Harry Potter 2 By J.K. Rowling The Road By Cormac McCarthy
  5. 5. A causal view of a recommender system. Key question: What would be the outcome metric in a world without the recommendation algorithm? Recommendation algorithm: a policy or intervention, $P(Rec \mid UserContext)$. Evaluating the algorithm: the causal effect of the intervention, $P(Outcome \mid do(Rec))$, which compares $P(Outcome \mid do(Rec = 1))$ with $P(Outcome \mid do(Rec = 0))$. Causal impact of recommender $= P(Outcome \mid do(Rec = 1)) - P(Outcome \mid do(Rec = 0))$
  6. 6. Comparing to a counterfactual world provides new, causal metrics Serendipity: “Recommendation helps the user find a surprisingly interesting item they might not have otherwise discovered” ---Herlocker et al. 2004 (TOIS) But so far we lacked the tools to measure such metrics. Accuracy Coverage Increase in Clicks or Revenue Fairness by Parity Diversity Fairness by Equal Opportunity
  7. 7. Today’s talk: How to estimate causal metrics for a recommender system? 1. Case study: Estimate the impact of Amazon’s recommendation engine Describe the four steps of causal analysis: 1. Model causal mechanisms in a system. 2. Identify the correct metric to estimate. 3. Estimate the metric. 4. Check robustness of the estimate to unobserved confounding. 2. New, causal metrics: How a causal inference view enables us to ask new questions about a recommender system? DoWhy: A Python library for causal inference that implements the four steps.
  8. 8. Causal Impact: How many additional views does a recommender system bring? Metrics: Accuracy; Increase in Clicks or Revenue
  9. 9. Hypothetical experiment: Randomized A/B test. Treatment (A): Observed. Control (B): Counterfactual world. Can we develop an offline metric?
  10. 10. Step 1: Modeling the causal mechanism and identifying the confounding factors. Graph nodes: Demand for The Road, Visits to The Road, Rec. visits to No Country for Old Men, Demand for No Country for Old Men
  11. 11. Observed activity is almost surely an overestimate of the causal effect. Observed activity from recommender: causal and convenience clicks, out of all page visits. Activity without recommender: ?
  12. 12. Step 2: Identification--Is there a way to recover the causal effect from observed data? Naïve: $E[Y \mid X]$. To remove convenience clicks, we need a proxy for unobserved demand. “Backdoor criterion”: $E[wY \mid X]$, where the weight $w = 1/P(X = 1 \mid UserContext)$ captures the demand of the user (inverse propensity weighting). But this method depends on accurately capturing the unknown user context. Graph nodes: Demand for Product, Visits to Product ($X$), Visits to Recommended product ($Y$), Demand for Recommended product
  13. 13. Finding a demand proxy using natural experiments: Split outcome into recommender (primary) and direct visits. All visits to a recommended product: recommender visits and direct visits (search visits, direct browsing). Auxiliary outcome: proxy for unobserved demand for the recommended product. Graph nodes: Demand for Product, Visits to Product ($X$), Rec. visits to Y ($Y_R$), Demand for Recommended product, Direct visits to Y ($Y_D$)
  14. 14. Example: Product X’s visits change but the direct visits to recommended product Y are constant (Accept)
  15. 15. Example: Product’s visits change and direct visits to recommended product also change similarly (Reject)
  16. 16. Leads to the “split-door” criterion. Criterion: Observed visits through a recommended link are causal only when $X \perp\!\!\!\perp Y_D$. Graph nodes: Demand for focal product ($U_X$), Visits to focal product ($X$), Rec. visits ($Y_R$), Direct visits ($Y_D$), Demand for rec. product ($U_Y$)
  17. 17. More formally, the criterion is based on do-calculus over the causal graph. Graph nodes: Unobserved variables ($U_X$), Cause ($X$), Outcome ($Y_R$), Auxiliary outcome ($Y_D$), Unobserved variables ($U_Y$)
  18. 18. Step 3: Estimation with logs from the Bing toolbar Out of which 20 K products have at least 10 visits on any one day
  19. 19. Implementing the split-door criterion on $\langle X, Y_D \rangle$ pairs, $t = 15$ days
  20. 20. Estimate the metric over valid split-door pairs of products. Using the split-door criterion, we obtained 23,000 natural experiments for over 12,000 products (about half of all products, ~20K).
  21. 21. Step 4: Check robustness of the estimate to unobserved confounding. What if there is an unobserved confounder that affects the recommendation click-throughs but not the direct visits? • Select plausible values for the confounder. • Simulate how robust the estimate is.
  22. 22. Summary: Same process of causal analysis can be applied to develop metrics for new problems • Does a system provide same accuracy/performance across demographics? • Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz (WWW 2017). Auditing Search Engines for Differential Satisfaction Across Demographics. • How to measure long-term outcomes due to a system that cannot be measured by randomized experiments? • If you have a new product, which people to send the recommendation to such that number of purchases is maximized (limited budget to send recommendations)? • Email for a copy.
  23. 23. Thank you!! • Try DoWhy, a Python library for causal inference that implements the four steps of causal analysis • Upcoming book on Causal Inference in ML systems (w/ Emre Kiciman): • Papers • Sharma, Amit, Jake M. Hofman, and Duncan J. Watts. "Estimating the causal impact of recommendation systems from observational data." Proc. ACM EC 2015. • Sharma, Amit, Jake M. Hofman, and Duncan J. Watts. "Split-door criterion: Identification of causal effects through auxiliary outcomes." The Annals of Applied Statistics (2018). Amit Sharma, Microsoft Research India @amt_shrma
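
The split-door idea from slides 13 to 20 can be sketched on synthetic click logs. This is only an illustration under stated assumptions: the traffic model, the 10% convenience-click share, `TRUE_CTR`, and the use of a Pearson-correlation cutoff (`THRESHOLD`) as a stand-in for a proper independence test are all hypothetical and are not taken from the talk or the papers. Only the 15-day window comes from the slides.

```python
import random
from math import sqrt

random.seed(7)
DAYS = 15         # window length shown on slide 19
TRUE_CTR = 0.05   # hypothetical causal click-through rate of the recommendation
THRESHOLD = 0.5   # hypothetical cutoff on |corr(X, Y_D)|, standing in for an independence test


def pearson(a, b):
    """Sample Pearson correlation between two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def make_pair(confounded):
    """Daily series (X, Y_D, Y_R) for one focal/recommended product pair."""
    x, y_d, y_r = [], [], []
    for _ in range(DAYS):
        shock = random.gauss(0, 20)  # demand shock to the focal product
        visits = max(1.0, 100 + shock + random.gauss(0, 5))
        if confounded:
            # Shared demand: direct visits to Y move with X, and some
            # recommendation clicks are convenience clicks, not causal ones.
            direct = 50 + 0.5 * shock + random.gauss(0, 2.5)
            convenience = 0.10 * visits
        else:
            direct = 50 + random.gauss(0, 10)  # independent of X
            convenience = 0.0
        x.append(visits)
        y_d.append(direct)
        y_r.append(TRUE_CTR * visits + convenience)
    return x, y_d, y_r


def ctr(pairs):
    """Recommendation click-throughs per focal-product visit, pooled over pairs."""
    clicks = sum(sum(p[2]) for p in pairs)
    visits = sum(sum(p[0]) for p in pairs)
    return clicks / visits


pairs = [make_pair(confounded=(i % 2 == 1)) for i in range(400)]
# Split-door check: keep a pair only if X looks independent of direct visits Y_D.
accepted = [p for p in pairs if abs(pearson(p[0], p[1])) < THRESHOLD]

naive_ctr = ctr(pairs)
split_door_ctr = ctr(accepted)
print(f"accepted {len(accepted)} of {len(pairs)} pairs")
print(f"naive CTR={naive_ctr:.3f} split-door CTR={split_door_ctr:.3f}")
```

Pooling all pairs mixes causal and convenience clicks, so the naive click-through rate overstates the effect; restricting to pairs that pass the check should recover a value close to `TRUE_CTR` in this simulation. A short 15-day window makes any correlation test noisy, so some valid pairs are rejected by chance.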

Editor's Notes

  • What is the impact of a recommender system?
    The truth obviously lies somewhere in the middle. Both are exaggerated.
  • Key question
  • Nothing new.
  • Suppose you are Amazon. While the concepts are general, they are best understood through an example.
    Causal: how much activity
    Suppose you want to improve recommendation. One of the metrics you want is for novel recommendation
  • And Ideally, we would want such an estimate for every product.
    And in many cases, infeasible.
    E.g. considerable effect on user experience.

    Question: rec has value
    Question: can randomize order. Or show random recommendations: why costly?
    Answer: can do, but we need an offline metric, which can also be used to train a new algorithm.
  • But if you just think about it, obs. CTR is almost surely an overestimate.
    It is helpful to think about in terms of causal and convenience. By design, a recommender system shows similar products,
  • In our case, it is page visits due to recommender and direct visits.
  • Story: yd is instrument. Not coming automatically but more validating.
    Say, and this actually happened: Oprah invited the Road book.
  • Everything that is affecting pr outcome should affect auxiliary.
    Can think of as giving us exclusion. But more broadly, serves to remove this arrow.
  • Observed effect is also the causal effect.
  • But we can actually do more general.
  • Improve quality of image.
  • All products is it method?
    Baseline: A method that can generate valid instrument

  • Can discover those that we would not think of.