Square's Machine Learning Infrastructure and Applications - Rong Yan

Published on

http://www.hakkalabs.co/articles/squares-machine-learning-infrastructure-applications

Published in: Software

  1. 1. Machine Learning @ Square. Rong Yan. May 15, 2014
  2. 2. Birth of Square
  3. 3. Payment: Stand, Reader, Payment Device, Payment Aggregation, Risk Model
  4. 4. Payment Commerce Cash Market
  5. 5. Our Mission: Make commerce easy.
  6. 6. Payment Data Commerce The Next Big Thing
  7. 7. Scale: 3M+ Readers, $15B+ Annualized
  8. 8. Offline and Online: Amount, Location, Item Desc., Card #, Credit Score, Friends, Activity, History, Inventory
  9. 9. Sales Volume
  10. 10. Haircut Price
  11. 11. Turn Data into Business Value: Fraud Detection, Business Insight, Customer Relation, Information Discovery
  12. 12. Fraud Detection @ Square
  13. 13. Fraud Detection in the payment flow: 150,000 active sellers per day; Risk ML (fraud detection); suspect ~2000 sellers; Risk Ops (transaction review); bank clears for settlement
  14. 14. Near-real-time ML Architecture: Payments, Merchant, Devices, Bank Accounts; Machine Learning (300+ features); Suspicions
  15. 15. Feature Generation. Card not present: Yes; PAN Diversity: 0.05; Use iPhone: No
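
The three example features on this slide could be derived from raw payment records roughly as follows. This is only a sketch: the `Payment` record, its field names, and the definition of PAN diversity as distinct card numbers divided by payment count are assumptions made for illustration, not definitions taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Payment:
    # Hypothetical record; the field names are illustrative, not Square's schema.
    card_number: str    # PAN (primary account number)
    card_present: bool  # False when the card was keyed in rather than swiped
    device: str         # e.g. "iPhone", "iPad", "Android"


def generate_features(payments):
    """Turn one seller's recent payments into a flat feature dict."""
    n = len(payments)
    if n == 0:
        raise ValueError("need at least one payment")
    distinct_pans = len({p.card_number for p in payments})
    return {
        "card_not_present": any(not p.card_present for p in payments),
        # Assumed definition: distinct cards divided by number of payments.
        "pan_diversity": distinct_pans / n,
        "use_iphone": any(p.device == "iPhone" for p in payments),
    }
```

Under that assumed definition, a seller who ran the same keyed-in card 20 times on an iPad would produce the values shown on the slide.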
  16. 16. Decision Tree Model. Easy to interpret; dimension reduction; very powerful in ensemble. Example tree: Decline Rate >= 0.1 (No/Yes); Amount <= $10000 (No/Yes); Business Type = Auto repair (No/Yes); leaf scores 0.9, 0.6
  17. 17. Random Forests: Decision Tree Ensemble. Tree 1: Decline Rate <= 0.1 (No/Yes); Amount <= $10000 (No/Yes); Business Type = Auto repair (No/Yes); leaf scores 0.9, 0.6; output: Bad, 0.9. Tree 2: Success Rate <= 0.2 (No/Yes); Age >= 20 (No/Yes); Amount <= $1000 (No/Yes); leaf scores 0.4, 0.7; output: Good, 0.4. Tree N: Decline Rate <= 0.3 (No/Yes); Amount <= $20000 (No/Yes); Age <= 22 (No/Yes); leaf scores 0.8, 0.6; output: Bad, 0.6. Mode for classification = Bad. Average for regression = 0.63. Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.
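
The aggregation step on this slide can be checked with a few lines of Python. The function name `forest_predict` is ours; the three (label, score) pairs are the per-tree outputs shown on the slide.

```python
from collections import Counter
from statistics import mean


def forest_predict(tree_outputs):
    """Combine per-tree (label, score) pairs into forest-level predictions."""
    labels = [label for label, _ in tree_outputs]
    scores = [score for _, score in tree_outputs]
    majority_label = Counter(labels).most_common(1)[0][0]  # mode, for classification
    average_score = mean(scores)                           # average, for regression
    return majority_label, average_score


# Per-tree outputs as shown on the slide.
label, score = forest_predict([("Bad", 0.9), ("Good", 0.4), ("Bad", 0.6)])
print(label, round(score, 2))  # prints: Bad 0.63
```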

  18. 18. Random Forests - Build each Tree: All data
  19. 19. Random Forests - Build each Tree: All data; Samples
  20. 20. Random Forests - Build each Tree: All data; Samples; Features (Dollar Amount, Connected with bad user, Business Type, Decline Rate, Time of Day, Location); Randomly select sqrt(n) features
  21. 21. Random Forests - Build each Tree: All data; Samples; Features (Dollar Amount, Connected with bad user, Business Type, Decline Rate, Time of Day, Location); Randomly select sqrt(n) features; Best split: feature and value (Decline Rate <= 0.1, No/Yes, 0.4 / 0.6)
  22. 22. Random Forests - Build each Tree: All data; Samples; Features (Dollar Amount, Connected with bad user, Business Type, Decline Rate, Time of Day, Location); Randomly select sqrt(n) features; Best split: feature and value (Decline Rate <= 0.1, No/Yes, 0.4 / 0.6); Grow Tree
  23. 23. Random Forests - Build each Tree: All data; Samples; Features (Dollar Amount, Connected with bad user, Business Type, Decline Rate, Time of Day, Location); Randomly select sqrt(n) features; Best split: feature and value (Decline Rate <= 0.1, No/Yes, 0.4 / 0.6); Grow Tree; When sample size is small, STOP
  24. 24. Random Forests - Build each Tree: All data; Samples; Features (Dollar Amount, Connected with bad user, Business Type, Decline Rate, Time of Day, Location); Randomly select sqrt(n) features; Best split: feature and value (Decline Rate <= 0.1, No/Yes, 0.4 / 0.6); Grow Tree; When sample size is small, STOP; Repeat these steps multiple times to create a forest
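
The build steps on these slides (sample the data, pick sqrt(n) features at random, take the best split, grow until the sample is small, repeat) can be sketched in plain Python. This is a toy illustration, not Square's implementation: it assumes numeric features and 0/1 labels, uses Gini impurity as the split criterion (the slides do not name one), and all function names and the `min_samples` default are ours.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, rng, min_samples=5):
    """Grow one tree; a leaf stores the fraction of positive labels."""
    leaf = sum(labels) / len(labels)
    if len(rows) < min_samples:              # "when sample size is small, STOP"
        return leaf
    n_features = len(rows[0])
    k = max(1, int(math.sqrt(n_features)))   # randomly select sqrt(n) features
    split = best_split(rows, labels, rng.sample(range(n_features), k))
    if split is None:                        # no split reduces impurity
        return leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], rng, min_samples),
            grow_tree([r for r, _ in right], [y for _, y in right], rng, min_samples))


def build_forest(rows, labels, n_trees=25, seed=0):
    """Repeat the steps on a fresh bootstrap sample for every tree."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict(forest, row):
    """Average the leaf scores of all trees for one row."""
    def walk(node):
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        return node
    return sum(walk(tree) for tree in forest) / len(forest)
```

Here the feature subset is redrawn at every node, as in Breiman's paper; the slides do not say whether the subset is drawn per node or per tree.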
  25. 25. Boosting Trees: Tree 1
  26. 26. Boosting Trees: Tree 1; Tree 2 (Help Tree 1)
  27. 27. Boosting Trees: Tree 1; Tree 2 (Help Tree 1); Tree 3 (Help Tree 1, 2); Tree 4 (Help Tree 1, 2, 3). Stop when no help needed. 0 weights all samples
  28. 28. Boosting Trees: Tree 1; Tree 2 (Help Tree 1); Tree 3 (Help Tree 1, 2); Tree 4 (Help Tree 1, 2, 3). 8.0 + (-2.0) + 1.0 + 0.5 = 7.5
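
The idea that each tree "helps" the ones before it, and that the final score is the sum of the per-tree outputs, can be sketched as follows. This toy version assumes squared-error loss, one-dimensional inputs, and single-split trees (stumps); `fit_stump` and `boost` are illustrative names, not functions from the talk.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump under squared error (1-D inputs)."""
    thresholds = sorted(set(xs))[:-1]  # dropping the max keeps the right side non-empty
    if not thresholds:
        raise ValueError("need at least two distinct x values")
    best = None
    for t in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, n_trees=4, learning_rate=1.0):
    """Fit each new stump to what the previous stumps still get wrong."""
    trees = []
    predictions = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        if all(abs(r) < 1e-9 for r in residuals):  # "stop when no help needed"
            break
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    # The final model is the sum of the per-tree outputs.
    return lambda x: sum(learning_rate * tree(x) for tree in trees)


# The per-tree contributions on the slide add up the same way.
assert sum([8.0, -2.0, 1.0, 0.5]) == 7.5
```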
  29. 29. Boosting Trees - Algorithm. Objective function: Loss. Friedman, J. H. (1999). "Greedy Function Approximation: A Gradient Boosting Machine."
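
The formula on this slide did not survive extraction. For orientation only, the standard gradient boosting formulation from the Friedman paper cited on the slide can be written as below; the notation is ours and may differ from what the slide showed.

```latex
\begin{aligned}
F^{*} &= \arg\min_{F} \sum_{i=1}^{N} L\bigl(y_i, F(x_i)\bigr), \\
r_{im} &= -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}}, \\
F_m(x) &= F_{m-1}(x) + \nu \, \rho_m \, h_m(x).
\end{aligned}
```

Here $L$ is the loss, $h_m$ is the tree fit to the pseudo-residuals $r_{im}$ at round $m$, $\rho_m$ is the step size found by line search, and $\nu$ is the shrinkage (learning rate). For squared-error loss $L = \tfrac{1}{2}(y - F)^2$, the pseudo-residual reduces to the ordinary residual $y_i - F_{m-1}(x_i)$.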
  30. 30. Results - Precision (precision at a fixed recall level). Model: April / May / June. Random Forest: 76% / 77% / 80%. Boosting Trees: 85% / 82% / 88%. Relative improvement of Boosting Trees over Random Forest: +11.8% / +6.5% / +10%
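
The talk does not show how the metric was computed; one common way to measure precision at a fixed recall level is sketched below. The function names are ours, and score ties are ignored for simplicity. The second helper reproduces the improvement percentages on the slide as relative gains.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the highest score threshold whose recall reaches target_recall."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank from most to least suspicious, then flag items one at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for flagged, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            return true_pos / flagged
    return total_pos / len(labels)


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction."""
    return (new - old) / old


# Boosting Trees vs. Random Forest, April / May / June.
print([round(100 * relative_gain(b, r), 1) for b, r in [(85, 76), (82, 77), (88, 80)]])
# prints: [11.8, 6.5, 10.0]
```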
  31. 31. Results - Fraud Detection Recall. [Figure; axes: # Payments to Reject, Fraud $ Prevented; labels: Easy, Medium, Hard]
  32. 32. Data Sampling. Highly biased in label distribution: less than 1 in 1000. Weighted training: higher weights on positive samples => oscillation; lower weights on negative samples => no real gain. Solution: keep negative:positive ratio between 3:1 and 10:1; scale the final model if calibration is needed. Less data requires fewer resources to train. Observed +10% improvement from 20:1 to 3:1
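
A minimal sketch of the sampling approach on this slide: keep every positive, keep a fixed number of negatives per positive, and rescale the model's scores afterwards if calibrated probabilities are needed. The function names are ours. The rescaling shown is the usual prior-shift correction for negative downsampling, which is an assumption; the slide does not say how Square scales the final model.

```python
import random


def downsample_negatives(rows, labels, ratio=3, seed=0):
    """Keep all positives and at most `ratio` negatives per positive.

    Returns (rows, labels, keep_rate), where keep_rate is the fraction
    of negatives that survived sampling.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos:
        raise ValueError("need at least one positive sample")
    n_keep = min(len(neg), ratio * len(pos))
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)
    keep_rate = n_keep / len(neg) if neg else 1.0
    return [rows[i] for i in kept], [labels[i] for i in kept], keep_rate


def calibrate(p_sampled, keep_rate):
    """Map a probability learned on downsampled data back to the full population."""
    return p_sampled / (p_sampled + (1.0 - p_sampled) / keep_rate)
```

For example, with 10 positives among 1000 rows and a 3:1 ratio, the sampled base rate is 0.25, and `calibrate(0.25, 30 / 990)` maps it back to the original base rate of 0.01.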
  33. 33. Productionalize Machine Learning
  34. 34. Startup Architecture ‣ Ruby-on-Rails + MySQL ‣ MySQL replication ‣ Tied to production schema ‣ Hard to do complex analysis
  35. 35. Scale it up: SOA + Data Warehouse ‣ Java services ‣ APIs ‣ HDFS
  36. 36. Scale it up: Data Transport ‣ Append-only feeds ‣ Kafka ‣ Replication ‣ Protocol buffers
  37. 37. Highly Available: Payments, Merchant, Devices, Bank Accounts; Suspicions
  38. 38. Parallel Environments and Data Integrity: Blue, Green, VIP, upstream
  39. 39. Other ML @ Square: Square Random Forest, Learning Management, Recommendation
  40. 40. Square Random Forest. RF Learner | Implementation | Time (Train / Test)
 RiskML Random Forest (Built on Scikit-Learn) | C / Cython / Python (Open Source + Square Code) | 72 minutes
 WiseRF | C++ (Proprietary) | 23 minutes
 Square Random Forest | Java (Square Code) | 15 minutes
 Note: time reported on 3M training and 15M testing data
  41. 41. Learning Management System ‣ Support non-sophisticated users ‣ Fast ad-hoc analytics ‣ Accessible to everyone for easy model generation and evaluation ‣ Tracks results to ensure different models can be compared
  42. 42. Square Market Recommendation 10x conversion rate vs. random baseline
  43. 43. ML @ Square. rongyan@squareup.com
