
The Impact of Computing Systems | Causal inference in practice

Computing and machine learning systems are affecting almost all parts of our lives and the society at large. How do we formulate and estimate the impact of these systems? This talk introduces causal inference as a methodology to answer such questions and provides examples of applying it to estimating impact of recommender systems, online social media feeds, search engines and interventions in public health in India.

Published in: Data & Analytics


  1. 1. The Impact of Computing Systems: Causal inference in practice Amit Sharma Microsoft Research www.amitsharma.in Twitter: @amt_shrma Email: amshar@microsoft.com Summer School on Human-Centered AI http://www.hcixb.org/
  2. 2. I. How little we know about the systems we build II. How can causal inference help?
  3. 3. Computing systems are a part of life 3
  4. 4. What is the impact of these systems on our lives? Efficiency, Convenience, Inclusion, Fairness, Accountability, Transparency
  5. 5. What will be the impact of computing systems on their lives?
  6. 6. (New?) social science of a world mediated by computing systems Programming Data science Machine learning Sensors and Systems Sociology Psychology Ethics Political Science Economics Development Studies
  7. 7. Many different communities • Human Computer Interaction (HCI) • Human Factors in Computing Systems (CHI) • Computer Supported Cooperative Work (CSCW) • Science and Technology Studies (STS) • Computational Social Science (CSS) • Information & Communication Technology and Development (ICTD) • Computing and Sustainable Societies (COMPASS)
  8. 8. People + Computing
  9. 9. My path “Intelligent systems that help people” Recommendation systems Social networking platforms Prediction Can we predict what you’ll be interested in? “How much do recommender systems shape people’s decisions?” “How much does a social NewsFeed influence people’s information access?” “How do recommender systems affect sellers on a platform?” “How do you know that recommendations are having a positive impact?” Causation Can we estimate the effect of our recommendations?
  10. 10. I. How little we know about the systems we build II. How can causal inference help?
  11. 11. 1. What’s the right decision? Use the social feed to predict a user’s future activity (e.g., Likes). • Future Likes = f(items in social feed) + ε Highly predictive model. “Would changing what a person sees in their feed change what they Like?” a) Yes b) No c) Maybe, maybe not
  12. 12. Prediction != Decision-making Would changing what people see in the feed affect what a user Likes? Maybe, maybe not (!) [Two causal diagrams: “Items in Social Feed → Items liked by a user” (predictability due to feed influence) versus “Homophily → Items in Social Feed, Items liked by a user” (predictability due to homophily).]
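The homophily ambiguity on this slide can be sketched with a toy simulation (illustrative numbers only, not the talk's data): Likes are caused solely by a latent preference, and the feed merely mirrors that preference, so the feed predicts Likes perfectly while having no causal effect. Intervening on the feed, by randomizing it, makes the predictive power vanish.

```python
import random

random.seed(0)

def simulate(feed_from_preference=True):
    """Each user has a latent preference; Likes depend only on that preference.
    When the feed mirrors the preference (homophily), it predicts Likes
    even though it has no causal effect on them."""
    agree = 0
    n = 10_000
    for _ in range(n):
        pref = random.random() < 0.5  # latent interest in a topic
        feed = pref if feed_from_preference else (random.random() < 0.5)
        like = pref                   # Likes are caused by preference only
        agree += (feed == like)
    return agree / n

print(simulate(True))   # observational: feed "predicts" Likes perfectly
print(simulate(False))  # intervention: randomize the feed, prediction vanishes
```

The same data-generating process supports answer (c) on the previous slide: high predictive accuracy is consistent with both a large and a zero causal effect of the feed.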
  13. 13. 2. Which algorithm is better? 16
  14. 14. Comparing old versus new algorithm 17 Old Algorithm (A): 50/1000 (5%); New Algorithm (B): 54/1000 (5.4%)
  15. 15. Change in success rate by activity level 18 • Low-activity users: Old Algorithm (A) 10/400 (2.5%) vs. New Algorithm (B) 4/200 (2%) • High-activity users: Old Algorithm (A) 40/600 (6.6%) vs. New Algorithm (B) 50/800 (6.2%) [Bar chart of success rate (SR) omitted.]
  16. 16. Is Algorithm A better? Which algorithm will you choose? • CTR for low-activity users: A 10/400 (2.5%) vs. B 4/200 (2%) • CTR for high-activity users: A 40/600 (6.6%) vs. B 50/800 (6.2%) • Total CTR: A 50/1000 (5%) vs. B 54/1000 (5.4%) 19
  17. 17. Is Algorithm A still better? Simpson’s paradox • CTR for low-activity users: A: low-income 1/200 (0.5%), high-income 9/200 (4.5%); B: low-income 4/100 (4%), high-income 0/100 (0%) • CTR for high-activity users: A: low-income 10/500 (2%), high-income 30/100 (30%); B: low-income 45/600 (7.5%), high-income 5/200 (2.5%) • Total CTR: A 50/1000 (5%) vs. B 54/1000 (5.4%) 20
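The reversal on these two slides can be checked directly from the numbers shown: B wins on aggregate CTR, yet A wins within every activity stratum.

```python
def ctr(clicks, n):
    return clicks / n

# Numbers from the slides: (clicks, impressions) per activity stratum
A = {"low": (10, 400), "high": (40, 600)}
B = {"low": (4, 200),  "high": (50, 800)}

# Aggregated, B looks better...
agg_A = ctr(sum(c for c, _ in A.values()), sum(n for _, n in A.values()))  # 50/1000
agg_B = ctr(sum(c for c, _ in B.values()), sum(n for _, n in B.values()))  # 54/1000
print(agg_A, agg_B)  # 0.05 vs 0.054

# ...but A wins inside every stratum: Simpson's paradox
for s in ("low", "high"):
    print(s, ctr(*A[s]), ctr(*B[s]))
```

The aggregate flips because B was shown disproportionately to high-activity users, who click more regardless of the algorithm.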
  18. 18. E.g., Algorithm A could have been shown at different times than B. There could be other hidden causal variations. Answer (as usual): Maybe, maybe not. 21
  19. 19. Average comment length decreases over time. Example: Simpson’s paradox in Reddit 22 But for each yearly cohort of users, comment length increases over time.
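A toy cohort model (made-up numbers, not Reddit's data) reproduces the pattern on the slide: every cohort's comment length grows with tenure, yet the overall average falls because newer cohorts start out writing shorter comments.

```python
# Hypothetical starting comment length (chars) per yearly cohort of users
cohorts = {2010: 120, 2012: 80, 2014: 50}
GROWTH = 5  # chars gained per year of tenure, same for every cohort

def overall_avg(year):
    """Average comment length across all cohorts active by `year`."""
    active = [start + GROWTH * (year - joined)
              for joined, start in cohorts.items() if joined <= year]
    return sum(active) / len(active)

# Each cohort improves over time, but the site-wide average declines
print(overall_avg(2010), overall_avg(2012), overall_avg(2014))
```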
  20. 20. 23
  21. 21. I. How little we know about the systems we build II. How can causal inference help?
  22. 22. Causality: An enigma that has attracted scholars for centuries 25
  23. 23. What is the effect of a taxi-app’s matching algorithm on people’s incomes? What is the effect of algorithmic screening on a patient’s health? What is the influence of an online social feed on a person’s behavior? From interventions to algorithmic interventions
  24. 24. A practical definition. Definition: X causes Y iff changing X leads to a change in Y, keeping everything else constant. The causal effect is the magnitude by which Y is changed by a unit change in X. This is called the “interventionist” interpretation of causality. 27 http://plato.stanford.edu/entries/causation-mani/
  25. 25. Thinking of “counterfactuals”
  26. 26. Powerful statistical frameworks 29 For more details, check out the KDD tutorial on causal inference by Emre Kiciman and me: https://causalinference.gitlab.io/kdd-tutorial/
  27. 27. Running example: Estimating effect of an algorithm 30
  28. 28. Lookback: Need answers to “what if” questions 31 http://plato.stanford.edu/entries/causation-counterfactual/
  29. 29. Ideal experiment 32
  30. 30. Methods for answering causal questions 33
  31. 31. Randomizing algorithm assignment: A/B test 34
  32. 32. Randomization removes hidden variation 35
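A small simulation (hypothetical numbers) illustrates why randomization removes hidden variation: with coin-flip A/B assignment the naive difference in click rates recovers the true effect, while self-selected assignment driven by a hidden activity level badly inflates it.

```python
import random

random.seed(1)

TRUE_EFFECT = 0.01  # algorithm B raises click probability by 1 point

def run(randomized):
    """Estimate B's effect as the naive difference in click rates."""
    stats = {True: [0, 0], False: [0, 0]}  # treated -> [clicks, n]
    for _ in range(200_000):
        active = random.random() < 0.5  # hidden activity level
        if randomized:
            treated = random.random() < 0.5  # A/B test: coin flip
        else:
            # self-selection: active users end up on B more often
            treated = random.random() < (0.8 if active else 0.2)
        p = (0.06 if active else 0.02) + (TRUE_EFFECT if treated else 0)
        stats[treated][0] += random.random() < p
        stats[treated][1] += 1
    (cb, nb), (ca, na) = stats[True], stats[False]
    return cb / nb - ca / na

print(run(randomized=True))   # close to the true 0.01
print(run(randomized=False))  # inflated by the hidden activity level
```

Under randomization, activity level is balanced across the two arms by design, so it cancels out of the difference; under self-selection it does not.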
  33. 33. But randomized experiments can be infeasible, costly, or even unethical… 36
  34. 34. So how about comparing with a similar user instead of a random one? 37
  35. 35. Continuing example: Effect of Algorithm on CTR 38 Does new Algorithm B increase CTR for recommendations on Windows Store, compared to old algorithm A?
  36. 36. Previous example: Effect of Algorithm on CTR Does new Algorithm B increase CTR for recommendations on Windows Store, compared to old algorithm A? 39
  37. 37. Assumptions to estimate effect of Algorithm 40
  38. 38. General method: Conditioning on variables 41
  39. 39. Tricky to find correct variables to condition on. Fortunately, graphical models make it precise. 42
  40. 40. Backdoor criterion: Condition on enough variables to cover all backdoor paths 43
  41. 41. Algorithm: Stratification 44
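A minimal sketch of the stratification estimate, reusing the CTR numbers from the Simpson's-paradox slides: compute the A-vs-B difference within each activity stratum, then average the differences weighted by the stratum's share of users.

```python
# Stratified (backdoor-adjusted) effect of B vs. A on CTR,
# conditioning on activity level as on the earlier slides.
A = {"low": (10, 400), "high": (40, 600)}  # (clicks, impressions)
B = {"low": (4, 200),  "high": (50, 800)}

total = sum(n for _, n in A.values()) + sum(n for _, n in B.values())
effect = 0.0
for s in ("low", "high"):
    w = (A[s][1] + B[s][1]) / total  # P(stratum)
    effect += w * (B[s][0] / B[s][1] - A[s][0] / A[s][1])

print(effect)  # negative: B is worse once activity level is held fixed
```

This reverses the naive aggregate comparison (5% vs. 5.4%), and it is valid only under the assumption that activity level covers all backdoor paths between algorithm assignment and clicks.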
  42. 42. I. How little we know about the systems we build II. How can causal inference help?
  43. 43. Example 1: Causal effect of a social news feed Amit Sharma, Dan Cosley (2016). Distinguishing Between Personal Preferences and Social Influence in Online Activity Feeds (Honorable Mention for Best Paper award) . Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing.
  44. 44. Example 1: Causal effect of a social newsfeed 47 [Diagram: a user u’s ego network of friends f1–f5 versus non-friends n1–n5.]
  45. 45. Example 2: Is a search engine fair to all its users? Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz (2017). Auditing Search Engines for Differential Satisfaction Across Demographics. Proceedings of the 26th International Conference on World Wide Web (Industry Track).
  46. 46. Tricky: straightforward optimization can lead to differential performance • The search engine uses a standard metric: time spent on the clicked result page as an indicator of satisfaction. • Goal: estimate the difference in user satisfaction between two demographic groups. • Suppose older users issue more “retirement planning” queries (e.g., 80% of the query’s users are aged >50 years, 10% are aged <30 years).
  47. 47. Overall metrics can hide differential satisfaction • Average user satisfaction for “retirement planning” may be high. But: • Average satisfaction for younger users = 0.7 • Average satisfaction for older users = 0.2
  48. 48. Overall metrics across Demographics Four metrics: Graded Utility (GU) Reformulation Rate (RR) Successful Click Count (SCC) Page Click Count (PCC)
  49. 49. Pitfalls with overall metrics • They conflate two separate effects: • Natural demographic variation caused by differing traits among the demographic groups, e.g.: • Different queries issued • Different information need for the same query • Even at the same satisfaction level, demographic A tends to click more than demographic B • Systemic differences in user satisfaction due to the search engine itself
  50. 50. Utilize work from causal inference Information Need Demographics Metric User satisfaction Query Search Results
  51. 51. I. Context Matching: selecting for activity with near-identical context Information Need Demographics Metric User satisfaction Query Search Results Context
  52. 52. Information Need Demographics Metric User satisfaction Query Search Results Context For any two users from different demographics, 1. Same Query 2. Same Information Need: 1. Control for user intent: same final SAT click 2. Only consider navigational queries 3. Identical top-8 Search Results 1.2 M impressions, 19K unique queries, 617K users
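The context-matching idea on the preceding slides can be sketched as grouping impressions by (query, top results) and comparing the metric across demographics only within cells that both groups share. The records below are hypothetical and the code is a sketch of the idea, not the paper's pipeline.

```python
from collections import defaultdict

# Hypothetical impression records: (query, results_key, demographic, metric)
impressions = [
    ("retirement planning", "r1", "older",   0.2),
    ("retirement planning", "r1", "younger", 0.7),
    ("retirement planning", "r1", "older",   0.3),
    ("weather",             "r2", "younger", 0.9),
]

# Match on (query, top results) so both groups face near-identical context
cells = defaultdict(lambda: defaultdict(list))
for query, results, demo, metric in impressions:
    cells[(query, results)][demo].append(metric)

# Compare demographics only inside matched cells
for key, by_demo in cells.items():
    if len(by_demo) == 2:  # context observed for both demographics
        means = {d: sum(v) / len(v) for d, v in by_demo.items()}
        print(key, means)
```

The "weather" cell is dropped because only one demographic saw it; at the scale of the slide (1.2M impressions, 19K queries) enough matched cells remain to estimate the within-context gap.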
  53. 53. Age-wise differences in metrics disappear
  54. 54. Example 3: Effect of a recommendation system 57
  55. 55. Confounding: Observed click-throughs may be due to correlated demand 58 Demand for The Road Visits to The Road Rec. visits to No Country for Old Men Demand for No Country for Old Men
  56. 56. Observational click-through rate overestimates causal effect 59 Amit Sharma, Jake M Hofman, Duncan J Watts (2018). Split-door criterion: Identification of causal effects through auxiliary outcomes. The Annals of Applied Statistics.
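A stylized simulation (not the split-door estimator itself, and with made-up numbers) shows how correlated demand inflates the observational click-through rate above the true causal rate: users who already want the recommended item visit it through the recommendation link anyway.

```python
import random

random.seed(2)

CAUSAL_P = 0.02  # true probability a shown recommendation causes a click

clicks = views = 0
for _ in range(100_000):
    demand = random.random() < 0.3           # correlated demand for both items
    visits_focal = demand or (random.random() < 0.1)
    if visits_focal:
        views += 1
        # Observed clicks mix the causal effect with demand-driven visits
        # that would have happened anyway, routed through the rec link
        clicked = (random.random() < CAUSAL_P) or (demand and random.random() < 0.2)
        clicks += clicked

print(clicks / views)  # observational CTR, well above the 2% causal rate
```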
  57. 57. Example 4: Prioritizing tuberculosis patients for followup • TB is the leading infectious cause of death globally • TB treatment takes 6 months or more • Poor adherence to treatment increases the risk of relapse, drug resistance, and death • India’s government TB program has used Directly Observed Treatment (DOT) to monitor adherence, but DOT is effort-intensive for patients and providers Jackson A Killian, Bryan Wilder, Amit Sharma, Vinod Choudhary, Bistra Dilkina, Milind Tambe (2019). Learning to Prescribe Interventions for Tuberculosis Patients using Digital Adherence Data. Proc. KDD 2019.
  58. 58. Background: How 99Dots works * Slide content sourced from Everwell.
  59. 59. Combination of Caller ID and numbers called shows that doses are in patient’s hands. Background: How 99Dots works * Slide content sourced from Everwell.
  60. 60. Two questions •“How to help health workers reprioritize their interventions?” • “Looking at a week’s data, can we predict adherence for the next week?”
  61. 61. Machine learning task • Input (t−7, t): demographic features (age, gender, location); call details (number of calls, time of calls, days between calls, etc.) • Output (t, t+7): number of calls in the next week. The model obtains an AUC of nearly 0.85.
  62. 62. A tale of two worlds • Person makes no calls in week 1, receives an intervention, and starts making calls in week 2 • Person makes no calls in week 1, receives an intervention, and still makes no calls in week 2
  63. 63. A causal model for interventions Person’s Behavior (t) Health worker’s intervention Call to 99Dots (t) Person’s Behavior (t-1) Call to 99Dots (t-1)
  64. 64. Domain-based filtering solution • 99Dots records suggested attention level for each patient • High: 4 or more calls missed in the last week • Medium: 1 to 4 calls missed in the last week • Low: No missed calls Medium -> High? • Given last week’s data, can we predict whether a person moves from Medium to High attention ?
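The attention-level rule on this slide can be written down directly. Note that the slide's thresholds ("Medium: 1 to 4", "High: 4 or more") overlap at exactly 4 missed calls; the sketch below resolves that tie upward to "high", which is an assumption, not something the slide specifies.

```python
def attention_level(missed_calls_last_week):
    """Map missed doses (as recorded via 99DOTS calls) to a suggested
    attention level, following the thresholds on the slide.
    Assumption: exactly 4 missed calls counts as high attention."""
    if missed_calls_last_week >= 4:
        return "high"
    if missed_calls_last_week >= 1:
        return "medium"
    return "low"

print(attention_level(0), attention_level(2), attention_level(5))
```

The medium-to-high prediction task then amounts to forecasting whether this label will cross from "medium" to "high" given the previous week's call features.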
  65. 65. A more complex model with lower predictive accuracy, but it is able to save more missed doses
  66. 66. Example 5: What is the effect of peer support on mental health forums?
  67. 67. Talklife: thousands of “counselling” conversations online • A social network for peer support • People experiencing mental distress can post on Talklife and get support from their peers. • Global network, but also has Indian users • Can we identify patterns of successful peer support conversations? “Moments of cognitive change” Yada Pruksachatkun, Sachin R. Pendse, Amit Sharma (2019). Moments of Change: Analyzing Peer-Based Cognitive Support in Online Mental Health Forums. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems.
  68. 68. Summary People + Computing • Our lives are being mediated by computing systems, often using predictive models. • The impact can shape the future of our society! • But their impact is far from obvious. • Naïve prediction metrics can lead us astray. Need causal reasoning + understanding context
  69. 69. Thank you Amit Sharma @amt_shrma www.amitsharma.in • Our lives are being mediated by computing systems, often using predictive models. • The impact can shape the future of our society! • But their impact is far from obvious. • Naïve prediction metrics can lead us astray. Need causal reasoning + understanding context
