May 15, 2014

Rong Yan

Machine Learning @ Square
Birth of Square

Payment devices: Reader and Stand
Payment aggregation and risk model
From payment to commerce: Cash and Market
Our Mission

Make commerce easy.

Payment → Data → Commerce
The Next Big Thing

Scale: 3M+ Readers, $15B+ annualized payment volume
Offline and Online

Amount, Location, Item Description, Card #
Credit Score, Friends, Activity History, Inventory
Sales Volume, Haircut Price
Turn Data into Business Value

Fraud Detection
Business Insight
Customer Relation
Information Discovery
Fraud Detection @ Square
Fraud Detection in the payment flow

Payments (150,000 active sellers per day)
→ Risk ML fraud detection
→ Suspect ~2,000 sellers
→ Risk Ops transaction review
→ Bank clears for settlement
Near-real-time ML Architecture

Merchant Devices, Bank Accounts → Machine Learning (300+ features) → Suspicions

Example features: Card not present: Yes · PAN diversity: 0.05 · Uses iPhone: No
Feature Generation

Easy to interpret
Dimension reduction
Very powerful in ensemble
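As a rough illustration of features like the ones above (card not present, PAN diversity, iPhone), here is a hypothetical sketch; the input schema and field names are assumptions, not Square's code:

```python
# Hypothetical sketch of feature generation for one payment.
# Feature names mirror the slide's examples; the input schema is assumed.

def generate_features(payment, seller_history):
    """Turn a raw payment plus recent seller history into model features."""
    distinct_cards = len({p["card_number"] for p in seller_history})
    return {
        # Was the card physically swiped, or keyed/online?
        "card_not_present": payment["entry_method"] != "swiped",
        # Fraction of distinct PANs (card numbers) across recent payments.
        "pan_diversity": distinct_cards / max(len(seller_history), 1),
        # Device signal: was the payment taken on an iPhone?
        "use_iphone": payment["device"] == "iphone",
    }

example = generate_features(
    {"entry_method": "keyed", "device": "android"},
    [{"card_number": "4111111111111111"}] * 20,
)
print(example)  # {'card_not_present': True, 'pan_diversity': 0.05, 'use_iphone': False}
```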

Decision Tree Model (example)

Decline Rate >= 0.1?
 └ Yes: Amount <= $10000?
     └ Yes: Business Type = Auto repair?
         ├ Yes: score 0.9
         └ No:  score 0.6
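The same example tree, written out as Python; the branch directions and the default score for the branches not shown on the slide are assumptions:

```python
# The example decision tree above, as plain Python.
# Branch directions and scores are illustrative, not Square's actual model.

def tree_score(decline_rate, amount, business_type):
    """Walk the example tree and return a fraud score in [0, 1]."""
    if decline_rate >= 0.1:
        if amount <= 10_000:
            return 0.9 if business_type == "Auto repair" else 0.6
    # Branches not shown on the slide: assume a low default score.
    return 0.1

print(tree_score(decline_rate=0.2, amount=5_000, business_type="Auto repair"))  # 0.9
```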
Random Forests: Decision Tree Ensemble

Tree 1 .. Tree N, each splitting on its own features, e.g.:
  Tree 1: Decline Rate <= 0.1, Amount <= $10000, Business Type = Auto repair → Bad, 0.9
  Tree 2: Success Rate <= 0.2, Age >= 20, Amount <= $1000 → Good, 0.4
  Tree N: Decline Rate <= 0.3, Amount <= $20000, Age <= 22 → Bad, 0.6

Combine the trees' outputs:
  Mode for classification = Bad
  Average for regression = (0.9 + 0.4 + 0.6) / 3 = 0.63

Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.

Random Forests - Build each Tree

1. Bootstrap: draw samples from all data
2. Randomly select sqrt(n) of the features
   (e.g., from Dollar Amount, Connected with bad user, Business Type,
   Decline Rate, Time of Day, Location)
3. Find the best split: feature and value
   (e.g., Decline Rate <= 0.1, yielding child scores 0.4 and 0.6)
4. Grow the tree recursively on each side of the split
5. STOP when the sample size is small
6. Repeat these steps multiple times to create a forest

Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.
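The same recipe, sketched with scikit-learn's RandomForestClassifier, whose knobs map one-to-one onto the steps above; the data and hyperparameter values are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features: [decline_rate, amount, is_auto_repair]
X = np.array([[0.20, 5000, 1], [0.00, 50, 0], [0.30, 9000, 0], [0.05, 20, 0]])
y = np.array([1, 0, 1, 0])  # 1 = fraud

model = RandomForestClassifier(
    n_estimators=100,      # step 6: repeat tree-building to create a forest
    bootstrap=True,        # step 1: each tree sees a bootstrap sample
    max_features="sqrt",   # step 2: random sqrt(n) features per split
    min_samples_leaf=2,    # step 5: stop when the sample size is small
).fit(X, y)

# Prediction aggregates over the trees (vote / averaged probability).
print(model.predict_proba([[0.25, 4000, 1]])[:, 1])
```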
Boosting Trees

Tree 1 starts by weighting all samples equally; each new tree is trained to
help the trees before it:
  Tree 2 helps Tree 1
  Tree 3 helps Tree 1, 2
  Tree 4 helps Tree 1, 2, 3
Stop when no help is needed.

The final score is the sum of the trees' outputs:
  7.5 = 8.0 + (-2.0) + 1.0 + 0.5
Boosting Trees - Algorithm

Objective function: minimize the total loss of the additive model
  L = Σᵢ loss(yᵢ, F(xᵢ)),  with F(x) = Σₘ fₘ(x);
each new tree fₘ is fit to the negative gradient of the loss at the current
prediction (for squared loss, simply the residuals of the trees so far).

Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." 1999.
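Under that formulation, a compact sketch for squared loss, where the negative gradient is just the residual; names and hyperparameters are illustrative, not Square's implementation:

```python
# Minimal gradient-boosting sketch for squared loss (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_trees=100, learning_rate=0.1):
    """Fit trees sequentially; each one 'helps' the trees before it."""
    prediction = np.full(len(y), y.mean())   # Tree 0: a constant base score
    trees = []
    for _ in range(n_trees):
        residual = y - prediction            # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees, y.mean()

def predict(trees, base, X, learning_rate=0.1):
    # Final score = base + sum of tree outputs (cf. 7.5 = 8.0 - 2.0 + 1.0 + 0.5)
    return base + learning_rate * sum(t.predict(X) for t in trees)
```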
Results - Precision

Precision at a fixed recall level:

Model            April    May     June
Random Forest    76%      77%     80%
Boosting Trees   85%      82%     88%
Improvement      +11.8%   +6.5%   +10%
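For reference, the metric itself can be read off the precision-recall curve; a small scikit-learn sketch (the 0.5 recall target and variable names are placeholders):

```python
# Sketch: precision at a fixed recall level, from model scores.
from sklearn.metrics import precision_recall_curve

def precision_at_recall(y_true, y_score, target_recall=0.5):
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    # Highest precision achievable while keeping recall >= target.
    return precision[recall >= target_recall].max()
```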
Results - Fraud Detection Recall

[Chart: Fraud $ prevented vs. # payments to reject, with easy, medium,
and hard regions of fraud]
Data Sampling

Highly biased label distribution
- Fraud is rare: fewer than 1 in 1,000 samples is positive

Weighted training
- Higher weights on positive samples => oscillation
- Lower weights on negative samples => no real gain

Solution: downsample negatives (see the sketch below)
- Keep the negative:positive ratio between 3:1 and 10:1
- Scale the final model's scores if calibration is needed

Less data requires fewer resources to train.
Observed +10% improvement going from a 20:1 to a 3:1 ratio.
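A minimal sketch of that downsampling plus the score recalibration it calls for; the correction formula is the standard prior correction for undersampled negatives (β = fraction of negatives kept), not necessarily the scaling Square used:

```python
# Sketch: downsample negatives to a fixed negative:positive ratio, then
# correct scores for the sampling bias. Illustrative, not Square's code.
import random

def downsample(samples, labels, neg_pos_ratio=3):
    """Keep all positives and enough random negatives for the target ratio."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = random.sample(neg, min(len(neg), neg_pos_ratio * len(pos)))
    idx = pos + kept_neg
    return [samples[i] for i in idx], [labels[i] for i in idx]

def calibrate(score, beta):
    """Map a score from the downsampled model back to the true base rate;
    beta is the fraction of negatives kept (standard undersampling correction)."""
    return beta * score / (beta * score - score + 1)
```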
Productionalize Machine Learning

Startup Architecture
‣ Ruby-on-Rails + MySQL
‣ MySQL replication
‣ Tied to production schema
‣ Hard to do complex analysis

Scale it up: SOA + Data Warehouse
‣ Java services
‣ APIs
‣ HDFS

Scale it up: Data Transport
‣ Append-only feeds
‣ Kafka
‣ Replication
‣ Protocol buffers
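A hypothetical sketch of how those pieces could fit together in an append-only feed; the topic name, the Payment message, and the payment_pb2 module are invented for illustration:

```python
# Hypothetical sketch: each payment event is serialized as a protocol
# buffer and appended to a Kafka topic. `payment_pb2` would be generated
# by protoc from a .proto schema; none of these names are Square's.
from kafka import KafkaProducer          # pip install kafka-python
import payment_pb2                       # hypothetical generated protobuf code

producer = KafkaProducer(bootstrap_servers="kafka:9092")

def publish_payment(payment_id, amount_cents, seller_id):
    event = payment_pb2.Payment(
        payment_id=payment_id,
        amount_cents=amount_cents,
        seller_id=seller_id,
    )
    # Append-only: events are only ever added, never updated in place,
    # so downstream consumers (warehouse, risk ML) can replay the feed.
    producer.send("payments", event.SerializeToString())
```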
Payments: Highly Available

[Diagram: Merchant Devices and Bank Accounts feed Machine Learning,
which outputs Suspicions; the scoring pipeline is built to be highly available.]
Parallel Environments and Data Integrity

Blue and Green environments run in parallel; a VIP routes upstream traffic
to one of them.
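A toy sketch of the idea, with all names hypothetical: serve from the active environment, shadow-score in the other, and compare outputs to check data integrity before flipping the VIP:

```python
# Toy blue/green sketch for model serving (all names hypothetical).
# Serve from the active environment; shadow-score with the other and log
# disagreements to verify data integrity before switching traffic.

class BlueGreenScorer:
    def __init__(self, blue_model, green_model, active="blue"):
        self.models = {"blue": blue_model, "green": green_model}
        self.active = active

    def score(self, features):
        other = "green" if self.active == "blue" else "blue"
        live = self.models[self.active].score(features)
        shadow = self.models[other].score(features)
        if abs(live - shadow) > 0.1:   # illustrative integrity threshold
            print("integrity check: environments disagree:", live, shadow)
        return live                     # only the active environment serves

    def flip(self):
        """Switch the VIP: the shadow environment becomes live."""
        self.active = "green" if self.active == "blue" else "blue"
```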
Other ML @ Square

‣ Square Random Forest
‣ Learning Management
‣ Recommendation
Square Random Forest

RF Learner              | Implementation               | Time (Train / Test)
RiskML Random Forest    | C / Cython / Python          | 72 minutes
(built on Scikit-Learn) | (Open Source + Square Code)  |
WiseRF                  | C++ (Proprietary)            | 23 minutes
Square Random Forest    | Java (Square Code)           | 15 minutes

Note: times reported on 3M training and 15M testing samples.
Learning Management System

‣ Supports non-expert users
‣ Fast ad-hoc analytics
‣ Accessible to everyone for easy model generation and evaluation
‣ Tracks results so that different models can be compared
Square Market Recommendation

10x conversion rate vs. a random baseline
ML @ Square

rongyan@squareup.com

Square's Machine Learning Infrastructure and Applications - Rong Yan