
Shparkley: Scaling Shapley with Apache Spark



The Shapley algorithm is an interpretation algorithm that is well recognized by both industry and academia. However, given its exponential runtime complexity, existing implementations take a very long time to generate feature contributions for a single instance, and the algorithm has found limited practical use in industry.



  1. Shparkley: Scaling Shapley values with Apache Spark. Cristine Dewar and Xiang Huang.
  2. Agenda: Introduction. What is a Shapley value? How did we implement it? How does the algorithm perform?
  3. Introduction: Cristine Dewar is an applied machine learning scientist on Affirm’s fraud ML team. Xiang Huang is an applied machine learning scientist working on underwriting problems for Affirm.
  4. Introduction: Affirm offers point-of-sale loans to our customers. Our applied machine learning team creates models for credit risk decisioning and fraud detection, and builds recommendation systems to personalize a customer’s experience.
  5. Introduction: For both fraud and credit, it is extremely important to have a model that is fair and interpretable.
  6. Introduction: We have millions of rows of data and hundreds of features. We need a solution that allows us to interpret how our models are impacting individual users at scale and can serve up results quickly.
  7. What we need: We need a solution that: ▪ Allows us to interpret the effect of features on individual users ▪ Does so in a timely manner
  8. What we need: In cooperative game theory, there is a problem of how to allocate the surplus of resources generated by the cooperation of the players.
  9. What we need: We want the following properties when allocating marginal contribution to players: ▪ Symmetry ▪ Dummy ▪ Additivity ▪ Efficiency
  10. What we need: Symmetry - two players that contribute equally will be paid out equally.
  11. What we need: Dummy - a player that does not contribute will have a value of zero.
  12. What we need: Additivity - a player’s average marginal contribution across individual games is the same as evaluating that player on the entire season. Example: Avg(0.3, 0.2, 0.25, 0.25) = 0.25.
  13. What we need: Efficiency - the marginal contributions for each feature, summed with the average prediction, give that sample’s prediction. Example: average prediction 0.5, contributions +0.3, -0.4, -0.2, +0.1, so this user’s prediction is 0.5 + 0.3 - 0.4 - 0.2 + 0.1 = 0.3.
  14. What we need: ▪ Symmetry ▪ Dummy ▪ Additivity ▪ Efficiency
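A minimal numerical sanity check of the two worked examples above, in plain Python; the numbers are the ones shown on the slides:

```python
# Efficiency: the average prediction plus the per-feature marginal
# contributions reproduces this user's prediction.
average_prediction = 0.5
contributions = [0.3, -0.4, -0.2, 0.1]
assert abs(average_prediction + sum(contributions) - 0.3) < 1e-9

# Additivity (season analogy): the player's average marginal contribution
# across individual games matches evaluating the player on the whole season.
per_game = [0.3, 0.2, 0.25, 0.25]
assert abs(sum(per_game) / len(per_game) - 0.25) < 1e-9
```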
  15. Shapley values
  16. What is a Shapley value? A Shapley value is a way to define payments proportional to each player’s marginal contribution for all members of the group.
  17. Four-feature model example: FICO score, number of delinquencies, loan amount, repaid Affirm.
  18. MATH! The Shapley value equation: $\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \bigl(v(S \cup \{j\}) - v(S)\bigr)$, where $\phi_j$ is the marginal contribution of feature j, $v(S \cup \{j\})$ is the score with feature j, $v(S)$ is the score prior to adding feature j, $|N|$ is the number of features, $|S|$ is feature j’s place in the permutation order, the sum over subsets S covers the possible permutation orders for feature j, and the factorial term is the fraction of permutations with the features in that order.
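To make the formula concrete, here is a brute-force sketch that enumerates every subset of the other features. The names (exact_shapley, the toy effects table, and the coalition-scoring lambda) are illustrative assumptions, not code from the talk:

```python
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, features, j):
    """Brute-force Shapley value of feature j: the sum over all subsets S of
    the other features of |S|! * (|N| - |S| - 1)! / |N|! * (v(S u {j}) - v(S))."""
    others = [f for f in features if f != j]
    n = len(features)
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            total += weight * (value_fn(set(subset) | {j}) - value_fn(set(subset)))
    return total

# Toy additive "game": a coalition's score is the sum of fixed per-feature effects.
effects = {"fico": 0.3, "loan_amount": -0.4, "delinquencies": -0.2, "repaid_affirm": 0.1}
score = lambda coalition: sum(effects[f] for f in coalition)

shapley = {f: exact_shapley(score, list(effects), f) for f in effects}
# For this additive game each Shapley value recovers the feature's own effect,
# and the values sum to the score of the full coalition (efficiency).
assert abs(sum(shapley.values()) - score(effects)) < 1e-9
```

The double loop over subsets is what makes the exact computation exponential in the number of features, which is the scaling problem the rest of the talk addresses.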
  19. Why does permutation order matter? We are not only trying to see how well a feature works on its own; we are trying to measure how well a feature collaborates.
  20. Permutations: the possible orderings of the features, by position (1st, 2nd, 3rd, 4th).
  21. Permutations: grouped by subset size, |S| = 1, |S| = 2, |S| = 3, |S| = 4.
  22. MATH! The Shapley value equation again, highlighting the marginal contribution of feature j: the score with feature j minus the score prior to adding feature j, at each place in the permutation order over the |N| features.
  23. Comparing performance: the score of each feature subset is compared against the score of the same subset without the new feature, starting from the score with no features.
  24. MATH! The Shapley value equation once more: the marginal contributions are weighted by the feature’s place in the permutation order across the |N| features.
  25. Approximate: approximate by suppressing the permuted feature’s contribution, making it noise.
  26. Makes sense, sounds great: a way to get the marginal contribution for individual rows, not just a generalized feature importance. But even with approximation it seems very computationally expensive; how do we deal with that?
  27. Implementation in Apache Spark
  28. Shparkley implementation: Monte Carlo approximation for the Shapley value. Example instance, Joe: FICO score 660, loan amount $500, repaid Affirm: Yes, delinquencies: 2. Goal: the Shapley value for FICO score from the black-box model.
  29. Shparkley implementation: sample a feature order and a background instance. Joe (instance to explain): FICO 660, $500, Yes, 2 delinquencies. Sally (sampled background row): FICO 700, $300, Yes, 0 delinquencies.
  30. Shparkley implementation: using the sampled order, build two hybrid instances that take some feature values from Joe and the rest from Sally: (660, $500, Yes, 0) and (700, $500, Yes, 0).
  31. Shparkley implementation: the first hybrid (660, $500, Yes, 0) is the instance with Joe’s FICO score; the second (700, $500, Yes, 0) is the instance without Joe’s FICO score, taking Sally’s instead.
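A minimal sketch of the sampling step in the Joe/Sally walkthrough above, in the Štrumbelj and Kononenko style of Monte Carlo approximation. The model, instance, and feature names are hypothetical placeholders rather than the Shparkley API; the model is assumed to score a dict of feature values:

```python
import random

def one_mc_sample(model, instance, background_row, feature_names, j):
    """One Monte Carlo sample of feature j's marginal contribution.

    Sample a feature order; features before j take their values from the
    instance being explained (Joe), j and everything after it from the
    sampled background row (Sally), except that in the "with j" hybrid
    feature j keeps the instance's value."""
    order = list(feature_names)
    random.shuffle(order)
    pos = order.index(j)

    with_j, without_j = {}, {}
    for k, name in enumerate(order):
        source = instance if k < pos else background_row
        with_j[name] = without_j[name] = source[name]
    with_j[j] = instance[j]            # Joe's FICO score
    without_j[j] = background_row[j]   # Sally's FICO score

    # Averaging this difference over many (background row, order) samples
    # estimates feature j's Shapley value.
    return model.predict(with_j) - model.predict(without_j)
```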
  32. Shparkley implementation: broadcast the instance to investigate (FICO 600, $300, repaid: No, 1 delinquency) to every partition of the sampled background dataset.
  33. Shparkley implementation: for each row in the partition (e.g. FICO 660, $1000, repaid: Yes, 0 delinquencies), compute Shapley contributions for the instance to investigate (FICO 600, $300, repaid: No, 1 delinquency).
  34. Shparkley implementation: sample a permutation order and order both the row in the partition and the instance to investigate by it, e.g. (0, 660, Yes, 1000) and (1, 600, No, 300).
  35. Shparkley implementation: build a feature set with the instance’s loan amount, (1, 600, 300, Yes), and a feature set without it that keeps the row’s loan amount, (1, 600, 1000, Yes).
  36. Shparkley implementation: score both feature sets with the black-box model. With the instance’s loan amount the output is 0.8; without it, 0.7. The marginal contribution from this row is 0.8 - 0.7 = 0.1.
  37. Shparkley implementation: groupby(feature).agg(...) - for each feature, take the weighted mean of its marginal contributions (MC) to obtain its Shapley value.
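A condensed PySpark sketch of the pipeline in the slides above: broadcast the instance to investigate, generate per-row marginal contributions for every feature inside mapPartitions (scoring each hybrid once and reusing the predictions across features), then take a weighted mean per feature. Function names, column names, and the "weight" column are illustrative assumptions, and the model is assumed to be a picklable object with a predict method; this is not Shparkley's actual interface:

```python
import random
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def row_contributions(background_row, instance, model, feature_names):
    """For one background row and one sampled permutation, walk from
    'all values from the background row' to 'all values from the instance',
    scoring each intermediate hybrid once; adjacent score differences yield a
    marginal contribution for every feature, so predictions are reused."""
    order = list(feature_names)
    random.shuffle(order)
    hybrid = dict(background_row)
    prev_score = model.predict(hybrid)
    weight = background_row.get("weight", 1.0)  # optional per-row weight
    for name in order:
        hybrid[name] = instance[name]  # switch this feature to the instance's value
        score = model.predict(hybrid)
        yield (name, float(score - prev_score), weight)
        prev_score = score

def shparkley_style_explain(background_df, instance, model, feature_names):
    instance_b = spark.sparkContext.broadcast(instance)  # ship instance to every partition

    def partition_fn(rows):
        for row in rows:
            yield from row_contributions(row.asDict(), instance_b.value, model, feature_names)

    contribs = background_df.rdd.mapPartitions(partition_fn).toDF(
        ["feature", "marginal_contribution", "weight"]
    )
    # Shapley value per feature: weighted mean of its marginal contributions.
    return contribs.groupBy("feature").agg(
        (F.sum(F.col("marginal_contribution") * F.col("weight")) / F.sum("weight"))
        .alias("shapley_value")
    )
```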
  38. Shparkley implementation ▪ Highlights ○ Spark-based implementation that scales with datasets ○ Leverages runtime advantages from batch prediction ○ Reuses predictions to calculate Shapley values for all features ○ Shapley values with weight support
  39. Runtime and convergence: Shparkley convergence of the Monte Carlo error.
  40. Runtime and convergence: runtime comparison with the shap BruteForce Explainer vs. Shparkley. Value difference (%) and rank difference per feature: Fico Score 3.7% (rank difference 0); No. of Delinquencies 1.1%; Length on Credit Report 2.9%; No. of Inquiries in Last Six Months 5.1%; Loan Amount 0.4%; User Has Repaid Affirm 2.5%; Merchant Category 0.5%. Cluster config: 10 machines (1 master, 9 workers); machine spec: r5.4xlarge EC2 instance (16 cores, 128 GB memory).
  41. Conclusion: Compared to a brute-force explanation, our implementation: ▪ improves the runtime by 50-60x ▪ shows minimal difference in Shapley values. Our open-source implementation by Niloy Gupta, Isaac Joseph, Adam Johnston, Xiang Huang, and Cristine Dewar is available at github.com/Affirm/shparkley
  42. Questions? affirm.com/careers
  43. References: ● Interpretable Machine Learning: Shapley Values ● An Efficient Explanation of Individual Classifications using Game Theory ● SHAP (SHapley Additive exPlanations)
