ML Infra at an Early Stage: Feature Services
Nick Handel, Head of Data Science
March 2019
"Machine Learning is 99% Infrastructure"
- Many people
Unfortunately, the infra is really hard...
Where should you start?
It will look something like this
Big tech companies are building incredible infrastructure
Source: Hidden Technical Debt in Machine Learning Systems
Source: Meet Michelangelo: Uber's Machine Learning Platform
Source: Bighead - Airbnb's End-to-End Machine Learning Platform
What about the rest of us?
● Public solutions are lagging
○ Big Cloud providers aren’t providing end-to-end solutions
○ There is no enterprise solution that goes end-to-end
○ There is no widely-adopted open source solution
● The option set for the rest of us:
○ Buy pieces and combine
■ Requires engineering and money
■ Some pieces of infra didn’t have solutions: feature stores
○ Build
■ Requires engineering and may lead to tech debt with scale
Data is at the center of ML Infra

[Diagram: pipeline stages Extract Data → Build Features → Train Models → Serve Models → Monitor Models, annotated with the data needs below]

● Connect to a range of data sources
● Monitor raw and transformed data; monitor for feature drift
● Collect and transform features for testing new model ideas
● Share model outputs as features in other models
● Cache production features for training and validation of point-in-time correctness
● Transform data consistently between inference and training
● Backfill historical features to test new ideas offline (not easy)
● Validate raw and transformed data (types, ranges, etc.)
● Collect features for many subjects (users, devices, markets, etc.)
1. Start basic
2. Build (or buy) a Feature Service
3. Mature the pieces that are important to your business
The Feature Service
Simple definition: a service for computing and managing ML data
In order of importance…
1. Framework
○ Reusable code
○ Consistency
○ Ease of development
2. Computation Engine
○ Service that builds features
○ Backfills new features for old inferences
3. Cache
○ Stores derived features
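For concreteness, here is a minimal Python sketch of those three pieces; the names (Feature, FeatureCache, compute_features) are illustrative assumptions, not the actual Branch framework.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Feature(ABC):
    """Framework: reusable, consistently defined feature code."""

    name: str
    version: int

    @abstractmethod
    def transform(self, raw: Dict[str, Any]) -> Any:
        """Turn raw extracted data into a feature value."""


class FeatureCache:
    """Cache: stores derived features, keyed by subject and feature version."""

    def __init__(self) -> None:
        self._store: Dict[tuple, Any] = {}

    def put(self, subject_id: str, feature: Feature, value: Any) -> None:
        self._store[(subject_id, feature.name, feature.version)] = value

    def get(self, subject_id: str, feature: Feature) -> Any:
        return self._store.get((subject_id, feature.name, feature.version))


def compute_features(subject_id: str, raw: Dict[str, Any],
                     features: List[Feature], cache: FeatureCache) -> Dict[str, Any]:
    """Computation engine: builds features (and can re-run to backfill old inferences)."""
    out = {}
    for feature in features:
        value = feature.transform(raw)
        cache.put(subject_id, feature, value)
        out[feature.name] = value
    return out
```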
Defining a Feature Service

[Architecture diagram: the Feature Service (a Flask app) sits between a Feature Repository and a DynamoDB cache; features are written on the inference path and read back for development, with a second write/read path added for training]
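A rough sketch of that architecture, assuming a Flask app in front of a DynamoDB table named "features"; the endpoint paths and payload shape are made up for illustration, not the actual Branch service.

```python
import time

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
# Assumed table keyed by (subject_id, feature_set_version).
table = boto3.resource("dynamodb").Table("features")


@app.route("/features", methods=["POST"])
def write_features():
    """Inference path: accept computed features, cache them, and return them."""
    payload = request.get_json()
    item = {
        "subject_id": payload["subject_id"],
        "feature_set_version": payload["feature_set_version"],
        "computed_at": int(time.time()),
        # Already-transformed values; real DynamoDB writes need numbers as Decimal.
        "features": payload["features"],
    }
    table.put_item(Item=item)
    return jsonify(item), 201


@app.route("/features/<subject_id>/<version>", methods=["GET"])
def read_features(subject_id, version):
    """Training / development path: read back what was cached at inference time."""
    resp = table.get_item(Key={"subject_id": subject_id,
                               "feature_set_version": version})
    return jsonify(resp.get("Item", {}))
```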
Life of a Feature

[Diagram: inference, training, model iteration, and feature iteration paths through the Feature Repository and the DynamoDB cache]

● Inference: calculate and cache features in production
● Model iteration: use cached features for model development
● Feature iteration (and for testing new features): calculate features in production, train with the new features, and save them to the cache
● Validate point-in-time correctness by running the training path on previously computed features
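A hedged sketch of that point-in-time correctness check: re-run the training-path transforms on the stored raw data and compare against what the production path cached at inference time. Function and field names here are assumptions.

```python
from typing import Any, Callable, Dict, List


def validate_point_in_time(
    cached_records: List[Dict[str, Any]],                 # rows cached by the inference path
    training_path: Callable[[Dict[str, Any]], Dict[str, Any]],
    tolerance: float = 1e-9,
) -> List[tuple]:
    """Return the (subject, feature, cached, recomputed) tuples that disagree."""
    mismatches = []
    for record in cached_records:
        recomputed = training_path(record["raw"])          # same transforms, run offline
        for name, cached_value in record["features"].items():
            new_value = recomputed.get(name)
            if isinstance(cached_value, float) and isinstance(new_value, float):
                ok = abs(cached_value - new_value) <= tolerance
            else:
                ok = cached_value == new_value
            if not ok:
                mismatches.append((record["subject_id"], name,
                                   cached_value, new_value))
    return mismatches
```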
Feature Definition

[Diagram: anatomy of a feature definition]

● Features are built on versioned extracts and transforms
● Multiple features from a single extract
● Chain of transformations
● Flexible methods for merge, join, and concat
● Custom one-off transforms, as flexible as Python
● Everything is built on ABCs with automated testing
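One way this might look in Python, with everything defined on ABCs: a feature is one versioned extract plus an ordered chain of transforms. The class names (Extract, Transform, FeatureDefinition) are assumptions for illustration, not the real framework.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List


class Extract(ABC):
    version: int

    @abstractmethod
    def run(self, subject_id: str) -> Iterable[Any]:
        """Pull raw records for one subject from a data source."""


class Transform(ABC):
    version: int

    @abstractmethod
    def apply(self, records: Any) -> Any:
        """One step in the chain: a filter, map, or reduce."""


class FeatureDefinition:
    """A feature = one versioned extract + an ordered chain of transforms."""

    def __init__(self, name: str, extract: Extract, transforms: List[Transform]):
        self.name = name
        self.extract = extract
        self.transforms = transforms
        # The feature version is derived from its parts, so changing any
        # extract or transform yields a new, distinct feature version.
        self.version = (extract.version, tuple(t.version for t in transforms))

    def compute(self, subject_id: str) -> Any:
        value = self.extract.run(subject_id)
        for transform in self.transforms:
            value = transform.apply(value)
        return value
```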
Defining Features

Python
● Approachable and fast enough for our inference needs (<10s)
● Keeps it simple

Versions
● Easy to manage at our stage
● Consistent transforms
● Different versions for different models

Transforms
● Reusable!
● Organized: Filter, Map, Reduce

Testing
● Code works
● Production models don't break
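As an illustration of the Filter, Map, Reduce organization and the kind of automated test that keeps production models from breaking, here is a toy transform chain with a pinned-output test; the concrete transforms are invented for the example.

```python
def filter_recent(events, days=30):
    """Filter: keep only events from the last `days` days."""
    return [e for e in events if e["age_days"] <= days]


def map_amounts(events):
    """Map: project each event to its amount."""
    return [e["amount"] for e in events]


def reduce_total(amounts):
    """Reduce: aggregate to a single feature value."""
    return sum(amounts)


def total_recent_amount(events):
    """Chain: filter, then map, then reduce."""
    return reduce_total(map_amounts(filter_recent(events)))


def test_total_recent_amount_is_stable():
    # Pin the output on a fixed input so a behavior change in any transform
    # (which would silently shift features under a production model) fails CI.
    events = [
        {"age_days": 3, "amount": 10.0},
        {"age_days": 45, "amount": 99.0},  # too old, filtered out
        {"age_days": 12, "amount": 2.5},
    ]
    assert total_recent_amount(events) == 12.5
```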
Where we are today

[Diagram: the same pipeline stages - Extract Data → Build Features → Train Models → Serve Models → Monitor Models - annotated with what is in place today]

● Validate input and output data of features
● Store transformed features at the point of inference for records
● Track metrics on features and monitor for drift
● Common feature transformation code
● Features accessible by SQL
● Backfill historical features at specific points in time (100%!!)
● Enable training on much larger datasets with previously computed features
● Share model outputs as features in other models (learned features)
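A sketch of what backfilling historical features at specific points in time might look like: for each past inference, recompute the feature definition over only the raw data that existed at that moment, so offline tests of new features match what production would have seen. All names here are illustrative.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Iterable, List


def backfill(
    inferences: Iterable[Dict[str, Any]],              # e.g. {"subject_id", "made_at"}
    load_raw_as_of: Callable[[str, datetime], Any],    # raw data visible at that time
    feature_fn: Callable[[Any], Dict[str, Any]],       # the (new) feature definition
) -> List[Dict[str, Any]]:
    rows = []
    for inference in inferences:
        as_of = inference["made_at"]
        # Only use raw data with timestamps <= as_of (point-in-time correctness).
        raw = load_raw_as_of(inference["subject_id"], as_of)
        rows.append({
            "subject_id": inference["subject_id"],
            "as_of": as_of,
            "features": feature_fn(raw),
        })
    return rows
```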
Prediction:
Feature stores will be the centerpiece of everyone's ML infra in 3 years
The Team
Dave Bernthal
Dennis Van Der Staay
Spencer Barton
Ting Ting Liu
Thank You!
Nick Handel
nick@branch.co
@nick_handel
Appendix
Branch’s ML Problem
● Long Feedback Signals
○ Problem: We make loans and get the signal back between 28 days and 1 year later
○ Solution: Make it possible to reconstruct features as they were at the time of inference
● Feature Drift
○ Problem: The way people use their mobile phones in developing markets changes constantly
○ Solution: Store features and adjust for feature drift (see the sketch after this list)
● Many data sources and types
○ Problem: We collect data from a variety of sources and types (raw text, network data, event streams, location, etc.)
○ Solution: Build a system for feature construction that unifies pipelines from different sources and types of transformations
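A simple, assumed drift check (not necessarily Branch's approach): compare each feature's recent distribution against its training-time baseline and flag large shifts in the mean.

```python
import statistics
from typing import Dict, List


def drifted_features(
    baseline: Dict[str, List[float]],   # feature name -> values at training time
    recent: Dict[str, List[float]],     # feature name -> values from recent inferences
    threshold: float = 3.0,             # flag shifts beyond 3 baseline std devs
) -> List[str]:
    flagged = []
    for name, base_values in baseline.items():
        base_mean = statistics.mean(base_values)
        base_std = statistics.pstdev(base_values) or 1e-9  # avoid divide-by-zero
        recent_mean = statistics.mean(recent.get(name, base_values))
        if abs(recent_mean - base_mean) / base_std > threshold:
            flagged.append(name)
    return flagged
```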
What's Next
● Learned Features
○ Model Storage is easy
○ Model Serving isn't trivial
● Monitoring
○ Concept drift is one of our primary ML challenges
● Auto ML
○ Input labels and output a model for production…
○ You already have the features!
