
From prediction to causation: Causal inference in online systems


Predictive modeling and machine learning have been widely successful: such models can suggest movies, songs, or games to try, identify people to target with ads, predict customer churn, detect fraud, identify health risks, and so on. However, predictive models are not well equipped to answer questions about cause and effect, which are the logical next step once correlations have been established: what should one do next to improve key metrics or goals? For example, what is the effect of a recommender algorithm? If a product recommendation system is changed or removed, what will be the effect on people's purchases? If having more friends is correlated with higher activity on a social network, would encouraging users to add more friends increase their activity?

This tutorial won't give all the answers, but it will provide a principled way to reason about causal effects and to estimate them. In the first half of the tutorial, I will present an overview of counterfactual reasoning and common methods for causal inference. The second half is hands-on: a practical example of estimating the causal impact of a recommender system, progressing from simple methods to more complex ones, with the side goal of appreciating and learning from common pitfalls in causal inference. Code and resources for the tutorial are available at: https://github.com/amit-sharma/causal-inference-tutorial/

Amit Sharma


Published in: Data & Analytics

From prediction to causation: Causal inference in online systems

Slide 1. amshar@microsoft.com | http://www.github.com/amit-sharma/causal-inference-tutorial
Slide 6. Use these correlations to make a predictive model: Future Activity = f(number of friends, logins in past month)
Slide 19. Overall CTR: Old Algorithm (A) 50/1000 (5%); New Algorithm (B) 54/1000 (5.4%)
Slide 20. CTR split by user activity: Low-activity: A 10/400 (2.5%) vs. B 4/200 (2%); High-activity: A 40/600 (6.6%) vs. B 50/800 (6.2%). [Bar chart: CTR for low- and high-activity users]
Slide 21. Is Algorithm A better?

                               Old Algorithm (A)   New Algorithm (B)
    CTR for Low-Activity users    10/400 (2.5%)        4/200 (2%)
    CTR for High-Activity users   40/600 (6.6%)       50/800 (6.2%)
    Total CTR                     50/1000 (5%)        54/1000 (5.4%)
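Slides 19-21 are an instance of Simpson's paradox: the old algorithm A has the higher CTR within each activity stratum, yet the new algorithm B has the higher CTR overall, because B's impressions skew toward high-activity users, who click more. The arithmetic can be checked directly; a quick sketch in Python (the tutorial's own code is in R), using the numbers from the slide:

```python
from fractions import Fraction

# (clicks, impressions) per algorithm and activity stratum, from slide 21.
data = {
    "A": {"low": (10, 400), "high": (40, 600)},
    "B": {"low": (4, 200),  "high": (50, 800)},
}

def ctr(clicks, impressions):
    return Fraction(clicks, impressions)

# Within each stratum, A beats B...
for stratum in ("low", "high"):
    assert ctr(*data["A"][stratum]) > ctr(*data["B"][stratum])

# ...but pooled over strata, B beats A, because B's impressions
# skew toward the high-activity (high-CTR) stratum.
def pooled(algo):
    clicks = sum(c for c, _ in data[algo].values())
    imps = sum(n for _, n in data[algo].values())
    return Fraction(clicks, imps)

print(float(pooled("A")), float(pooled("B")))  # 0.05 0.054
```

The reversal disappears once you compare within strata, which is exactly why the stratified estimates later in the tutorial matter.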
Slide 23. Average comment length decreases over time. But for each yearly cohort of users, comment length increases over time.
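The comment-length observation on slide 23 is the same kind of aggregation reversal, playing out over time: if each yearly cohort writes longer comments as it ages, but every new cohort starts shorter and is larger, the site-wide average can still fall. A toy illustration with entirely made-up numbers:

```python
# Hypothetical mean comment lengths: cohort -> {year: (mean_length, num_users)}.
# Every cohort's mean length rises year over year, yet the overall
# average falls because shorter-starting, larger cohorts keep joining.
cohorts = {
    2012: {2012: (100, 10), 2013: (110, 10), 2014: (120, 10)},
    2013: {2013: (70, 40), 2014: (80, 40)},
    2014: {2014: (40, 200)},
}

def overall_mean(year):
    pairs = [v[year] for v in cohorts.values() if year in v]
    total = sum(m * n for m, n in pairs)
    users = sum(n for _, n in pairs)
    return total / users

means = [overall_mean(y) for y in (2012, 2013, 2014)]
print(means)  # [100.0, 78.0, 49.6] -- strictly decreasing
assert means[0] > means[1] > means[2]
```

Conditioning on the cohort (a confounder of time and comment length) reverses the trend, just as conditioning on activity level did for the two algorithms.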
Slide 27. http://plato.stanford.edu/entries/causation-mani/
Slide 28. http://plato.stanford.edu/entries/causation-counterfactual/
Slide 41. Dunning (2002), Rosenzweig-Wolpin (2000)
Slides 55-56. Does the new Algorithm B increase CTR for recommendations on the Windows Store, compared to the old Algorithm A?
Slide 66. Propensity(NewAlgo | User_i) = Logistic(a_cat1, a_cat2, ..., a_catn). Compare CTR between users with the same propensity score.
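Slide 66's propensity-score idea can be sketched end to end. The sketch below is a minimal, dependency-free Python illustration (the tutorial itself uses R): it assumes a single confounder x that drives both treatment assignment (seeing the new algorithm) and clicking, fits the logistic propensity model by plain gradient ascent, then compares click rates between treated and control users within propensity-score strata. The data, the +0.05 uplift, and all names here are invented for illustration; a real analysis would use glm in R or an established library.

```python
import math
import random

random.seed(0)

# Synthetic users: covariate x (think activity level) raises both the
# chance of seeing the new algorithm and the chance of clicking,
# confounding the naive treated-vs-control comparison.
users = []
for _ in range(5000):
    x = random.random()
    treated = random.random() < 0.2 + 0.6 * x          # active users treated more
    clicked = random.random() < 0.1 + 0.3 * x + (0.05 if treated else 0.0)
    users.append((x, int(treated), int(clicked)))

# Fit Propensity(treated | x) = Logistic(w0 + w1 * x) by gradient ascent
# on the log-likelihood (gradient is sum of (t - p) times the feature).
w0 = w1 = 0.0
for _ in range(300):
    g0 = g1 = 0.0
    for x, t, _c in users:
        p = 1 / (1 + math.exp(-(w0 + w1 * x)))
        g0 += t - p
        g1 += (t - p) * x
    w0 += 0.001 * g0
    w1 += 0.001 * g1

def propensity(x):
    return 1 / (1 + math.exp(-(w0 + w1 * x)))

# Stratify on the propensity score and compare click rates within strata;
# each within-stratum difference estimates the true +0.05 uplift.
def stratum_effects(n_bins=5):
    effects = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        inside = [(t, c) for x, t, c in users if lo <= propensity(x) < hi]
        treated = [c for t, c in inside if t]
        control = [c for t, c in inside if not t]
        if treated and control:
            effects.append(sum(treated) / len(treated) - sum(control) / len(control))
    return effects

print(stratum_effects())
```

The point of the stratification is that within a stratum, treated and control users have (approximately) the same probability of having been treated, so the comparison is no longer confounded by x.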
Slide 69. [Diagram: user u's ego network (friends f1-f5) contrasted with a matched set of non-friends (n1-n5)]
Slide 73. http://tylervigen.com/spurious-correlations
Slide 75. http://www.github.com/amit-sharma/causal-inference-tutorial | amshar@microsoft.com
Slide 76. https://www.github.com/amit-sharma/causal-inference-tutorial
Slide 82.

    > nrow(user_app_visits_A)
    [1] 1000000
    > length(unique(user_app_visits_A$user_id))
    [1] 10000
    > length(unique(user_app_visits_A$product_id))
    [1] 990
    > length(unique(user_app_visits_A$category))
    [1] 10
Slide 85.

    > library(dplyr)  # provides summarise()
    > user_app_visits_B <- read.csv("user_app_visits_B.csv")
    > naive_observational_estimate <- function(user_visits) {
    +   # Naive observational estimate: simply the fraction of visits
    +   # that resulted in a recommendation click-through.
    +   est <- summarise(user_visits,
    +     naive_estimate = sum(is_rec_visit) / length(is_rec_visit))
    +   return(est)
    + }
    > naive_observational_estimate(user_app_visits_A)
      naive_estimate
    [1] 0.200768
    > naive_observational_estimate(user_app_visits_B)
      naive_estimate
    [1] 0.226467
Slide 87.

    > stratified_by_activity_estimate(user_app_visits_A)
    Source: local data frame [4 x 2]
      activity_level stratified_estimate
    1              1           0.1248852
    2              2           0.1750483
    3              3           0.2266394
    4              4           0.2763522
    > stratified_by_activity_estimate(user_app_visits_B)
    Source: local data frame [4 x 2]
      activity_level stratified_estimate
    1              1           0.1253469
    2              2           0.1753933
    3              3           0.2257211
    4              4           0.2749867

Slide 88.

    > stratified_by_category_estimate(user_app_visits_A)
    Source: local data frame [10 x 2]
      category stratified_estimate
    1        1           0.1758294
    2        2           0.2276829
    3        3           0.2763157
    4        4           0.1239860
    5        5           0.1767163
    …
    > stratified_by_category_estimate(user_app_visits_B)
    Source: local data frame [10 x 2]
      category stratified_estimate
    1        1           0.2002127
    2        2           0.2517528
    3        3           0.3021371
    4        4           0.1503150
    5        5           0.1999519
    …
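The stratified estimates above are computed with dplyr-style group-by/summarise in R. For readers following along without R, the same computation is a one-pass group-by; a minimal Python sketch using the tutorial's column names (is_rec_visit, activity_level) on a tiny made-up sample:

```python
from collections import defaultdict

def stratified_estimate(rows, stratum_key):
    """Fraction of visits that are recommendation click-throughs,
    computed separately within each stratum (e.g. activity_level)."""
    clicks = defaultdict(int)
    visits = defaultdict(int)
    for row in rows:
        s = row[stratum_key]
        clicks[s] += row["is_rec_visit"]
        visits[s] += 1
    return {s: clicks[s] / visits[s] for s in sorted(visits)}

# Tiny invented sample in the shape of user_app_visits_A.
sample = [
    {"activity_level": 1, "is_rec_visit": 0},
    {"activity_level": 1, "is_rec_visit": 1},
    {"activity_level": 2, "is_rec_visit": 1},
    {"activity_level": 2, "is_rec_visit": 1},
]
print(stratified_estimate(sample, "activity_level"))  # {1: 0.5, 2: 1.0}
```

Stratifying by more covariates quickly thins out each stratum, which is what motivates collapsing them into a single propensity score.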
Slide 93.

    > naive_observational_estimate(user_app_visits_A)
      naive_estimate
    [1] 0.200768
    > ranking_discontinuity_estimate(user_app_visits_A)
      discontinuity_estimate
    [1] 0.121362

About 40% of app visits coming from recommendation click-throughs are not causal: they could have happened even without the recommendation system.
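The ranking_discontinuity_estimate function itself lives in the linked repository; the underlying idea is a regression-discontinuity design. Products ranked just above the top-k recommendation cutoff get shown, those ranked just below do not, but the two groups are otherwise nearly identical, so the jump in visit rate at the cutoff estimates the recommendation's causal effect. A hypothetical Python sketch, where the cutoff k, the ranks, and the visit rates are all invented:

```python
def discontinuity_estimate(products, k):
    """Regression-discontinuity sketch: compare visit rates for products
    just inside the shown top-k (ranks k-1, k) against products just
    outside it (ranks k+1, k+2). Near the cutoff the products are nearly
    identical, so the difference isolates the recommendation's effect."""
    shown = [p["visit_rate"] for p in products if p["rank"] in (k - 1, k)]
    hidden = [p["visit_rate"] for p in products if p["rank"] in (k + 1, k + 2)]
    return sum(shown) / len(shown) - sum(hidden) / len(hidden)

# Hypothetical data: ranks 1..6, top-3 shown as recommendations.
products = [{"rank": r, "visit_rate": v}
            for r, v in [(1, .30), (2, .25), (3, .20),
                         (4, .12), (5, .11), (6, .05)]]
print(discontinuity_estimate(products, 3))  # (0.25+0.20)/2 - (0.12+0.11)/2 = 0.11
```

In the slide's numbers, the discontinuity estimate (0.12) is roughly 60% of the naive estimate (0.20), which is where the "40% of click-throughs are not causal" conclusion comes from.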
Slide 95. amshar@microsoft.com
