Taking recommender systems as an example domain, I will show that data mining can be used to augment popular techniques such as instrumental variables by searching for large and sudden shocks in time series data. Applying this method to system logs for Amazon's "People who bought this also bought" recommendations, we are able to analyze over 4,000 unique products that experience such shocks. This leads to a more accurate estimate of the impact of the recommender system: at least 75% of recommendation click-throughs would likely occur even in the absence of recommendations, which calls into question popular industry estimates based on observed click-through rates.

Finally, this shock-based approach can be generalized to derive a data-driven identification strategy for finding natural experiments in time series data. This method too reveals a similar overestimate for the impact of recommendation systems.

Published in:
Data & Analytics

- 1. Data mining for causal inference. AMIT SHARMA, Postdoctoral Researcher, Microsoft Research (joint work with JAKE HOFMAN and DUNCAN WATTS, Microsoft Research). http://www.amitsharma.in @amt_shrma
- 2. My research. Analyzing the effect of online systems: ◦ Recommender systems [WWW ’13, EC ’15, CSCW ’15] ◦ Social news feeds [CSCW ’16] ◦ Web search. Methodological: ◦ Threats to large-scale observational studies [WWW ’16b] ◦ Mining for natural experiments [EC ’15] ◦ New identification strategies suited for fine-grained data ◦ Testing assumptions for validity of an instrumental variable ◦ Gaps between prediction and understanding [WWW ’16a, ICWSM ’16]
- 3. How much do they change user behavior?
- 4. Naively, up to 30% of traffic comes from recommendations
- 5. Naively, up to 30% of traffic comes from recommendations. “Burton Snowboard, a sports retailer, reported that personalized product recommendations have driven nearly 25% of total sales since it began offering them in 2008. Prior to this, Burton’s customer recommendations consisted of items from its list of top-selling products.”
- 6. Example: product browsing on Amazon.com
- 9. Counterfactual browsing: no recommendations
- 11. Problem: Correlated demand may drive page visits, even without recommendations
- 12. The problem of correlated demand. [Diagram nodes: demand for winter accessories; visits to winter hat; rec. visits to winter gloves]
- 13. Goal: Estimate the causal effect. [Figure: observed click-throughs, split into causal and convenience, compared with click-throughs without the recommender: convenience only, size unknown (?)]
- 14. Ideal experiment: A/B test with Treatment (A) and Control (B). But experiments may be costly, hamper user experience, and require full access to the system.
- 15. Using natural variations to simulate an experiment
- 16. Studying sudden spikes, “shocks” to demand for a book [Carmi et al. 2012]
- 17. The same author’s recommended book may also have a shock
- 18. Past work uses statistical models to control for confounds. Carmi et al. [2012], Oestreicher and Sundararajan [2012] and Lin [2013] construct “complementary sets” of similar, non-recommended products. Garfinkel et al. [2006] and Broder et al. [2015] compare to model-predicted clicks without recommendations. But: 1. These assumptions are hard to verify. 2. Finding examples of valid shocks requires ingenuity and restricts researchers to very specific categories.
- 19. This talk: Using data mining for natural experiments. I. Data-driven instrumental variables: the “Shock-IV” method, mining for sudden spikes (“shocks”) in data. II. General data-driven identification strategy for time series data: the “Split-door” criterion, generalizing the idea of shocks. Throughout, we will use Amazon’s recommendation system as an example.
- 20. I. Shock-IV: Mining for valid natural experiments
- 21. Distinguishing between recommendation and direct traffic. All visits to a product split into recommender visits and direct visits (search visits and direct browsing). Direct visits serve as a proxy for unobserved demand.
- 22. The Shock-IV strategy: Searching for valid shocks
- 23. The Shock-IV strategy: Filtering out invalid shocks
- 24. Why does it work? Shock as an instrumental variable. [Causal diagram nodes: Demand, Sudden shock, Focal visits (X), Rec. visits (Y), Direct visits (Y)]
- 25. Computing the causal estimate: Causal CTR (𝜌) = (increase in recommendation clicks) / (increase in visits to focal product). *Same as the Wald estimator for instrumental variables.
- 26. Application to Amazon.com, using Bing toolbar logs, Sept 2013-May 2014
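
The ratio on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `causal_ctr`, its parameters, and the visit counts are hypothetical and do not come from the talk. It assumes the "increase" is measured as the shock-period count minus a pre-shock baseline.

```python
def causal_ctr(focal_before, focal_during, rec_before, rec_during):
    """Wald-style ratio: extra recommendation clicks per extra focal visit.

    All four arguments are visit counts (hypothetical inputs):
    focal_* are visits to the focal product, rec_* are click-throughs
    to the recommended product, before and during the shock.
    """
    delta_focal = focal_during - focal_before
    delta_rec = rec_during - rec_before
    if delta_focal <= 0:
        raise ValueError("no increase in focal visits, so this is not a shock")
    return delta_rec / delta_focal


# Made-up numbers: 500 extra focal visits bring 25 extra recommendation clicks.
rho = causal_ctr(focal_before=40, focal_during=540, rec_before=6, rec_during=31)
print(rho)
```
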
- 27. Recreating sequence of page visits by a user (timestamp, URL, user action):
  - 2014-01-20 09:04:10, http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders : user searches for George Saunders
  - 2014-01-20 09:04:15, http://www.amazon.com/dp/0812984250/ref=sr_1_1 : user clicks on the first search result
  - 2014-01-20 09:05:01, http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2 : user clicks on the second recommendation
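
As a rough sketch of how such log lines might be labeled, the Python below classifies the three URLs from the slide. The function `classify_visit` and the prefix rules are assumptions made for illustration: the slide only shows that `ref=sr_1_1` was a search-result click and `ref=pd_sim_b_2` a recommendation click, and generalizing to the `sr_` and `pd_sim` prefixes is a guess, not the authors' documented pipeline.

```python
import re
from urllib.parse import urlparse

# Assumption: product pages carry a 10-character ID after "/dp/".
PRODUCT_RE = re.compile(r"/dp/([A-Z0-9]{10})")
REF_RE = re.compile(r"ref=([A-Za-z0-9_]+)")


def classify_visit(url):
    """Return (product_id, source) for one logged URL (hypothetical helper)."""
    path = urlparse(url).path
    product = PRODUCT_RE.search(path)
    product_id = product.group(1) if product else None
    ref = REF_RE.search(path)
    tag = ref.group(1) if ref else ""
    if product_id is None:
        source = "search_query" if path.startswith("/s/") else "other"
    elif tag.startswith("pd_sim"):   # assumed marker for a recommendation click
        source = "recommendation"
    elif tag.startswith("sr_"):      # assumed marker for a search-result click
        source = "search_result"
    else:
        source = "direct"
    return product_id, source


session = [
    "http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders",
    "http://www.amazon.com/dp/0812984250/ref=sr_1_1",
    "http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2",
]
labels = [classify_visit(url) for url in session]
for label in labels:
    print(label)
```
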
- 30. I. Weekly and seasonal patterns in traffic, nearly tripling in holidays
- 31. II. 30% of all pageviews come through recommendations
- 32. III. Books and eBooks are the most popular categories by far
- 33. IV. Apparel and shoes see a substantially higher fraction of visits through recommendations
- 34. Shock-IV: Finding shocks in user visit data. We look for focal products with large and sudden increases in views relative to typical traffic. Size of shock exceeds: ◦ 5 times the median traffic ◦ 5 times the previous day's traffic ◦ 5 times the mean of the last 7 days. Shocked product has: ◦ Visits from at least 10 unique users during the shock ◦ Non-zero visits for at least five out of seven days before and after the shock
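
The criteria above could be applied to a daily series roughly as follows. This is a minimal sketch, not the authors' code: the name `find_shocks` and the sample data are invented, the median is assumed to be taken over the whole series, and "five out of seven days before and after" is read as five active days in each of the two 7-day windows.

```python
from statistics import mean, median


def find_shocks(visits, users, factor=5, min_users=10, min_active_days=5):
    """Return indices of days that satisfy the slide's shock criteria.

    visits[i]: page visits to the focal product on day i.
    users[i]:  unique visitors to the focal product on day i.
    """
    typical = median(visits)  # assumption: median over the full series
    shocks = []
    for day in range(7, len(visits) - 7):
        before = visits[day - 7:day]
        after = visits[day + 1:day + 8]
        big_enough = (
            visits[day] > factor * typical
            and visits[day] > factor * visits[day - 1]
            and visits[day] > factor * mean(before)
        )
        enough_users = users[day] >= min_users
        steady_traffic = (
            sum(v > 0 for v in before) >= min_active_days
            and sum(v > 0 for v in after) >= min_active_days
        )
        if big_enough and enough_users and steady_traffic:
            shocks.append(day)
    return shocks


# Made-up series with one obvious spike on day 7.
visits = [4, 5, 3, 6, 4, 5, 4, 60, 9, 6, 5, 4, 5, 6, 4]
users = [3, 4, 3, 5, 3, 4, 3, 41, 7, 5, 4, 3, 4, 5, 3]
shock_days = find_shocks(visits, users)
print(shock_days)
```
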
- 35. Shock-IV: Ensuring exclusion restriction. The recommended product (Y) should have constant direct visits during the time of the shock. (1-β): ratio of the maximum 14-day variation in visits to a recommended product to the size of the shock for the focal product. ◦ β = 1: direct traffic to Y is stable relative to the shock to the focal product. ◦ β = 0: direct traffic to Y varies no less than the shock to the focal product.
- 36. How to choose 𝛽? [Figure panels: Accept, Reject] Select 𝛽 = 0.7
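
One way to read the β definition is sketched below. Several details are assumptions, since the slide does not spell them out: "maximum 14-day variation" is taken as max minus min of Y's direct visits over a 14-day window centered on the shock, the shock size is passed in by the caller, and β is clipped to the range [0, 1]. The name `exclusion_score` is hypothetical.

```python
def exclusion_score(direct_visits_y, shock_day, shock_size, window=14):
    """Return beta = 1 - (variation in Y's direct visits / shock size).

    direct_visits_y: daily direct visits to the recommended product Y.
    shock_day:       index of the shock day for the focal product.
    shock_size:      size of the focal product's shock (assumed given).
    """
    half = window // 2
    start = max(0, shock_day - half)
    segment = direct_visits_y[start:shock_day + half]
    variation = max(segment) - min(segment)  # assumed meaning of "variation"
    beta = 1 - variation / shock_size
    return max(0.0, min(1.0, beta))


# Made-up data: Y's direct traffic wobbles by 4 visits against a shock of 50.
direct_y = [10, 12, 11, 9, 10, 13, 11, 12, 10, 11, 9, 12, 10, 11, 10]
beta = exclusion_score(direct_y, shock_day=7, shock_size=50)
print(beta, beta >= 0.7)
```

A shock would then be kept only if its β clears a chosen threshold; treating the threshold as "at least 0.7" is an interpretation of the value selected in the talk.
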
- 37. Using the method, obtain >4000 natural experiments!
- 38. Estimating the causal clickthrough rate (𝜌)
- 39. Causal click-through rate by product category
- 40. Estimating fraction of observed click-throughs that are causal. Compare the number of estimated causal clicks to all observed recommendation clicks (non-shock period).
- 41. Only a quarter of the observed click-throughs are causal At β = 0.7, only 25% of recommendation traffic is caused by the recommender.
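
The comparison can be illustrated with a small calculation. The sketch assumes that estimated causal clicks equal the causal CTR times the focal product's visits in the non-shock period; the slides do not state this formula, and the function name `causal_fraction` and the numbers are invented.

```python
def causal_fraction(rho, focal_visits, observed_rec_clicks):
    """Share of observed recommendation clicks attributed to the recommender.

    rho:                 causal click-through rate from the shock analysis.
    focal_visits:        visits to the focal product in the non-shock period.
    observed_rec_clicks: recommendation click-throughs observed in that period.
    """
    if observed_rec_clicks <= 0:
        raise ValueError("need at least one observed recommendation click")
    estimated_causal_clicks = rho * focal_visits  # assumed estimator
    return min(1.0, estimated_causal_clicks / observed_rec_clicks)


# Made-up numbers: 30 estimated causal clicks out of 120 observed clicks.
share = causal_fraction(rho=0.03, focal_visits=1000, observed_rec_clicks=120)
print(round(share, 2))
```
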
- 42. Generalization? Shocks may be due to discounts or sales. Lower CTR may be due to the holiday season.
- 43. Local average treatment effect (LATE), not fully generalizable. Shocked products are not a representative sample of all products, nor are the users who participate in them. • Fortunately, the Shock-IV method covers roughly one-fifth of all products with at least 10 visits on any single day. • Causal estimates are consistent with experimental findings (e.g., Belluf et al. [2012])
- 44. Summary: Shock-IV method. I. Mining for instruments allows us to study a much larger sample of natural experiments. II. Fine-grained data allowed us to test for exclusion restriction directly. A simple, scalable method for causal inference. ◦ Can be used for improving recommender systems through causal metrics. ◦ Can be applied to other domains, such as online ads. ◦ Can be used for finding potential instruments.
- 45. II. Generalizing Shock-IV: “Split-door” criterion
- 46. Let’s have a look at the model again. [Causal diagram nodes: Demand, Sudden shock, Focal visits (X), Rec. visits (Y), Direct visits (Y)]
- 47. [Figure: focal product and recommended product panels, each labeled Accept]
- 48. The split-door criterion. Instead of searching for shocks, check whether direct traffic for Y is independent of visits to X. [Causal diagram nodes: Demand, Focal visits (X), Rec. visits (Y), Direct visits (YD)]
- 49. More formal: Why does it work? [Causal diagram nodes: Demand, Focal visits (X), Rec. visits (Y), Direct visits (YD)]
- 50. Two possibilities, both remove the effect of common demand. [Two causal diagrams, each with nodes: Demand, Focal visits (X), Rec. visits (Y), Dir. visits (YD)]
- 51. Sidenote: Split-door criterion generalizes Shock-IV. By capturing shocks, we were essentially capturing a notion of independence between X and 𝑌𝐷. Split-door will admit all valid shocks, as well as other variations.
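
A very rough stand-in for the split-door check is sketched below. It is not the test used in the talk: it swaps in a plain Pearson correlation with a hypothetical cutoff (`max_abs_corr`), and zero correlation is weaker than independence. The function names and the series are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def passes_split_door(focal_x, direct_y, max_abs_corr=0.1):
    """Accept a time window if Y's direct visits look unrelated to X's visits.

    max_abs_corr is a hypothetical cutoff; a real analysis would use a
    proper statistical test of independence.
    """
    return abs(pearson(focal_x, direct_y)) <= max_abs_corr


focal_x = [10, 30, 12, 28, 11, 29, 10, 30]
unrelated_y = [5, 5, 6, 6, 5, 5, 6, 6]    # does not track X
tracking_y = [4, 12, 5, 11, 4, 12, 4, 12]  # rises and falls with X
print(passes_split_door(focal_x, unrelated_y))
print(passes_split_door(focal_x, tracking_y))
```
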
- 52. Applying to logs from Amazon recommendations
- 53. Summary: A general identification criterion. The split-door criterion admits a broader sample of natural experiments than shocks. It automatically tests for valid identification and can be used whenever 𝑌𝐷 is separable. Applications: evaluate the relationship between any two time series, e.g. social media and news, ads and search.
- 54. Conclusion. The majority of traffic from recommendations may not be causal, but simply convenience. Two data-driven methods: • Shock-IV: An IV-based method for mining exclusion-valid instruments from observational data • Split-door: A general identification strategy for time series data.
- 55. More generally, data mining can augment causal inference methods. Two workflows: (1) Hypothesize about a natural variation → argue why it resembles a randomized experiment → compute causal effect. (2) Develop tests for validity of natural variation → mine for such valid variations in observational data → compute causal effect.
- 56. Thank you! AMIT SHARMA, MICROSOFT RESEARCH. @amt_shrma http://www.amitsharma.in Reference: Sharma, A., Hofman, J. M., & Watts, D. J. (2015). Estimating the causal impact of recommendation systems from observational data. In Proceedings of the Sixteenth ACM Conference on Economics and Computation.
