May 15, 2014

Rong Yan

Machine Learning @ Square
Birth of Square

Payment devices: Reader and Stand
Payment aggregation and risk model
From payment to commerce: Cash and Market
Our Mission

Make commerce easy.

Payment → Data → Commerce
The Next Big Thing

Scale: 3M+ Readers, $15B+ annualized payment volume
Offline and Online

Amount, Location, Item Description, Card #
Credit Score, Friends, Activity History, Inventory
Sales Volume, Haircut Price
Turn Data into Business Value

Fraud Detection
Business Insight
Customer Relation
Information Discovery
Fraud Detection @ Square
Fraud Detection in the payment flow

Payments (150,000 active sellers per day)
→ Risk ML fraud detection
→ Suspect ~2,000 sellers
→ Risk Ops transaction review
→ Bank clears for settlement
Near-real-time ML Architecture

Merchant Devices, Bank Accounts → Machine Learning (300+ features) → Suspicions

Example features: Card not present: Yes · PAN diversity: 0.05 · Uses iPhone: No
Feature Generation

Easy to interpret
Dimension reduction
Very powerful in ensemble
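As a rough illustration of features like the ones above (card not present, PAN diversity, iPhone), here is a hypothetical sketch; the input schema and field names are assumptions, not Square's code:

```python
# Hypothetical sketch of feature generation for one payment.
# Feature names mirror the slide's examples; the input schema is assumed.

def generate_features(payment, seller_history):
    """Turn a raw payment plus recent seller history into model features."""
    distinct_cards = len({p["card_number"] for p in seller_history})
    return {
        # Was the card physically swiped, or keyed/online?
        "card_not_present": payment["entry_method"] != "swiped",
        # Fraction of distinct PANs (card numbers) across recent payments.
        "pan_diversity": distinct_cards / max(len(seller_history), 1),
        # Device signal: was the payment taken on an iPhone?
        "use_iphone": payment["device"] == "iphone",
    }

example = generate_features(
    {"entry_method": "keyed", "device": "android"},
    [{"card_number": "4111111111111111"}] * 20,
)
print(example)  # {'card_not_present': True, 'pan_diversity': 0.05, 'use_iphone': False}
```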

Decision Tree Model (example)

Decline Rate >= 0.1?
 └ Yes: Amount <= $10000?
     └ Yes: Business Type = Auto repair?
         ├ Yes: score 0.9
         └ No:  score 0.6
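The same example tree, written out as Python; the branch directions and the default score for the branches not shown on the slide are assumptions:

```python
# The example decision tree above, as plain Python.
# Branch directions and scores are illustrative, not Square's actual model.

def tree_score(decline_rate, amount, business_type):
    """Walk the example tree and return a fraud score in [0, 1]."""
    if decline_rate >= 0.1:
        if amount <= 10_000:
            return 0.9 if business_type == "Auto repair" else 0.6
    # Branches not shown on the slide: assume a low default score.
    return 0.1

print(tree_score(decline_rate=0.2, amount=5_000, business_type="Auto repair"))  # 0.9
```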
Random Forests: Decision Tree Ensemble

Tree 1 .. Tree N, each splitting on its own features, e.g.:
  Tree 1: Decline Rate <= 0.1, Amount <= $10000, Business Type = Auto repair → Bad, 0.9
  Tree 2: Success Rate <= 0.2, Age >= 20, Amount <= $1000 → Good, 0.4
  Tree N: Decline Rate <= 0.3, Amount <= $20000, Age <= 22 → Bad, 0.6

Combine the trees' outputs:
  Mode for classification = Bad
  Average for regression = (0.9 + 0.4 + 0.6) / 3 = 0.63

Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.

Random Forests - Build each Tree

1. Bootstrap: draw samples from all data
2. Randomly select sqrt(n) of the features
   (e.g., from Dollar Amount, Connected with bad user, Business Type,
   Decline Rate, Time of Day, Location)
3. Find the best split: feature and value
   (e.g., Decline Rate <= 0.1, yielding child scores 0.4 and 0.6)
4. Grow the tree recursively on each side of the split
5. STOP when the sample size is small
6. Repeat these steps multiple times to create a forest

Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.
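The same recipe, sketched with scikit-learn's RandomForestClassifier, whose knobs map one-to-one onto the steps above; the data and hyperparameter values are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features: [decline_rate, amount, is_auto_repair]
X = np.array([[0.20, 5000, 1], [0.00, 50, 0], [0.30, 9000, 0], [0.05, 20, 0]])
y = np.array([1, 0, 1, 0])  # 1 = fraud

model = RandomForestClassifier(
    n_estimators=100,      # step 6: repeat tree-building to create a forest
    bootstrap=True,        # step 1: each tree sees a bootstrap sample
    max_features="sqrt",   # step 2: random sqrt(n) features per split
    min_samples_leaf=2,    # step 5: stop when the sample size is small
).fit(X, y)

# Prediction aggregates over the trees (vote / averaged probability).
print(model.predict_proba([[0.25, 4000, 1]])[:, 1])
```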
Boosting Trees

Tree 1 starts by weighting all samples equally; each new tree is trained to
help the trees before it:
  Tree 2 helps Tree 1
  Tree 3 helps Tree 1, 2
  Tree 4 helps Tree 1, 2, 3
Stop when no help is needed.

The final score is the sum of the trees' outputs:
  7.5 = 8.0 + (-2.0) + 1.0 + 0.5
Boosting Trees - Algorithm

Objective function: minimize the total loss of the additive model
  L = Σᵢ loss(yᵢ, F(xᵢ)),  with F(x) = Σₘ fₘ(x);
each new tree fₘ is fit to the negative gradient of the loss at the current
prediction (for squared loss, simply the residuals of the trees so far).

Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." 1999.
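Under that formulation, a compact sketch for squared loss, where the negative gradient is just the residual; names and hyperparameters are illustrative, not Square's implementation:

```python
# Minimal gradient-boosting sketch for squared loss (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_trees=100, learning_rate=0.1):
    """Fit trees sequentially; each one 'helps' the trees before it."""
    prediction = np.full(len(y), y.mean())   # Tree 0: a constant base score
    trees = []
    for _ in range(n_trees):
        residual = y - prediction            # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees, y.mean()

def predict(trees, base, X, learning_rate=0.1):
    # Final score = base + sum of tree outputs (cf. 7.5 = 8.0 - 2.0 + 1.0 + 0.5)
    return base + learning_rate * sum(t.predict(X) for t in trees)
```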
Results - Precision

Precision at a fixed recall level:

Model            April    May     June
Random Forest    76%      77%     80%
Boosting Trees   85%      82%     88%
Improvement      +11.8%   +6.5%   +10%
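For reference, the metric itself can be read off the precision-recall curve; a small scikit-learn sketch (the 0.5 recall target and variable names are placeholders):

```python
# Sketch: precision at a fixed recall level, from model scores.
from sklearn.metrics import precision_recall_curve

def precision_at_recall(y_true, y_score, target_recall=0.5):
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    # Highest precision achievable while keeping recall >= target.
    return precision[recall >= target_recall].max()
```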
Results - Fraud Detection Recall

[Chart: Fraud $ prevented vs. # payments to reject, with easy, medium,
and hard regions of fraud]
Data Sampling

Highly biased label distribution
- Fraud is rare: fewer than 1 in 1,000 samples is positive

Weighted training
- Higher weights on positive samples => oscillation
- Lower weights on negative samples => no real gain

Solution: downsample negatives (see the sketch below)
- Keep the negative:positive ratio between 3:1 and 10:1
- Scale the final model's scores if calibration is needed

Less data requires fewer resources to train.
Observed +10% improvement going from a 20:1 to a 3:1 ratio.
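A minimal sketch of that downsampling plus the score recalibration it calls for; the correction formula is the standard prior correction for undersampled negatives (β = fraction of negatives kept), not necessarily the scaling Square used:

```python
# Sketch: downsample negatives to a fixed negative:positive ratio, then
# correct scores for the sampling bias. Illustrative, not Square's code.
import random

def downsample(samples, labels, neg_pos_ratio=3):
    """Keep all positives and enough random negatives for the target ratio."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = random.sample(neg, min(len(neg), neg_pos_ratio * len(pos)))
    idx = pos + kept_neg
    return [samples[i] for i in idx], [labels[i] for i in idx]

def calibrate(score, beta):
    """Map a score from the downsampled model back to the true base rate;
    beta is the fraction of negatives kept (standard undersampling correction)."""
    return beta * score / (beta * score - score + 1)
```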
Productionalize Machine Learning

Startup Architecture
‣ Ruby-on-Rails + MySQL
‣ MySQL replication
‣ Tied to production schema
‣ Hard to do complex analysis

Scale it up: SOA + Data Warehouse
‣ Java services
‣ APIs
‣ HDFS

Scale it up: Data Transport
‣ Append-only feeds
‣ Kafka
‣ Replication
‣ Protocol buffers
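A hypothetical sketch of how those pieces could fit together in an append-only feed; the topic name, the Payment message, and the payment_pb2 module are invented for illustration:

```python
# Hypothetical sketch: each payment event is serialized as a protocol
# buffer and appended to a Kafka topic. `payment_pb2` would be generated
# by protoc from a .proto schema; none of these names are Square's.
from kafka import KafkaProducer          # pip install kafka-python
import payment_pb2                       # hypothetical generated protobuf code

producer = KafkaProducer(bootstrap_servers="kafka:9092")

def publish_payment(payment_id, amount_cents, seller_id):
    event = payment_pb2.Payment(
        payment_id=payment_id,
        amount_cents=amount_cents,
        seller_id=seller_id,
    )
    # Append-only: events are only ever added, never updated in place,
    # so downstream consumers (warehouse, risk ML) can replay the feed.
    producer.send("payments", event.SerializeToString())
```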
Payments: Highly Available

[Diagram: Merchant Devices and Bank Accounts feed Machine Learning,
which outputs Suspicions; the scoring pipeline is built to be highly available.]
Parallel Environments and Data Integrity

Blue and Green environments run in parallel; a VIP routes upstream traffic
to one of them.
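A toy sketch of the idea, with all names hypothetical: serve from the active environment, shadow-score in the other, and compare outputs to check data integrity before flipping the VIP:

```python
# Toy blue/green sketch for model serving (all names hypothetical).
# Serve from the active environment; shadow-score with the other and log
# disagreements to verify data integrity before switching traffic.

class BlueGreenScorer:
    def __init__(self, blue_model, green_model, active="blue"):
        self.models = {"blue": blue_model, "green": green_model}
        self.active = active

    def score(self, features):
        other = "green" if self.active == "blue" else "blue"
        live = self.models[self.active].score(features)
        shadow = self.models[other].score(features)
        if abs(live - shadow) > 0.1:   # illustrative integrity threshold
            print("integrity check: environments disagree:", live, shadow)
        return live                     # only the active environment serves

    def flip(self):
        """Switch the VIP: the shadow environment becomes live."""
        self.active = "green" if self.active == "blue" else "blue"
```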
Other ML @ Square

‣ Square Random Forest
‣ Learning Management
‣ Recommendation
Square Random Forest

RF Learner              | Implementation               | Time (Train / Test)
RiskML Random Forest    | C / Cython / Python          | 72 minutes
(built on Scikit-Learn) | (Open Source + Square Code)  |
WiseRF                  | C++ (Proprietary)            | 23 minutes
Square Random Forest    | Java (Square Code)           | 15 minutes

Note: times reported on 3M training and 15M testing samples.
Learning Management System

‣ Supports non-expert users
‣ Fast ad-hoc analytics
‣ Accessible to everyone for easy model generation and evaluation
‣ Tracks results so that different models can be compared
Square Market Recommendation

10x conversion rate vs. a random baseline
ML @ Square

rongyan@squareup.com

Square's Machine Learning Infrastructure and Applications - Rong Yan